Please use this identifier to cite or link to this item: http://dx.doi.org/10.14279/depositonce-10447
Main Title: Dari Dataset for Named Entity Recognition DariNER2
Author(s): Zia, Ghezal Ahmad Jan
Type: Generic Research Data
References: http://dx.doi.org/10.14279/depositonce-10413
http://dx.doi.org/10.14279/depositonce-10437
http://dx.doi.org/10.14279/depositonce-10420
http://dx.doi.org/10.14279/depositonce-10532
Language Code: und
Abstract: DariNER2 is the release of the Dari sentence-level named-entity-annotated dataset, collected from Dari Azadi Radio. The goal of the project was to annotate a corpus comprising various genres of text (news, newsgroups, and interviews) in the Dari language with structural information (syntax). In addition, it was developed to support the handling of sentence-level ambiguity in Dari text. It contains 883 sentences and 22K words/tokens. It is manually annotated with the classes person (PER), location (LOC), organization (ORG), and miscellaneous (MISC).
URI: https://depositonce.tu-berlin.de/handle/11303/11562
http://dx.doi.org/10.14279/depositonce-10447
Issue Date: 8-Aug-2020
Date Available: 11-Aug-2020
DDC Class: 000 informatics, information science, general works
Subject(s): Dari Named Entity Recognition Corpus
Dari NLP Resources
License: https://choosealicense.com/licenses/gpl-3.0/
Notes: The file is encoded as UTF-8 and contains Arabic characters.
Appears in Collections: FG Modelle und Theorie Verteilter Systeme » Research Data

Files in This Item:
DariNER2.csv

Sentence-level Dari named-entity-annotated dataset for the Named Entity Recognition task.

Format: CSV | Size: 353.9 kB
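
The record states that the file is UTF-8 encoded CSV annotated with the PER, LOC, ORG, and MISC classes, but it does not describe the column layout. The following is a minimal sketch of how the label distribution might be inspected, under stated assumptions: the column names `token` and `tag` are hypothetical, the optional `B-`/`I-` prefix handling is a guess at a possible tagging scheme, and the sample rows are illustrative only and not taken from DariNER2.

```python
import csv
import io
from collections import Counter

# Entity classes named in the abstract.
ENTITY_CLASSES = {"PER", "LOC", "ORG", "MISC"}


def count_entity_tags(handle, tag_column):
    """Count entity-class labels in a CSV stream that has a header row.

    tag_column is a hypothetical column name; inspect the real header first.
    """
    reader = csv.DictReader(handle)
    counts = Counter()
    for row in reader:
        tag = (row.get(tag_column) or "").strip()
        # Assumption: tags may carry a BIO-style prefix such as "B-PER".
        if tag[:2] in ("B-", "I-"):
            tag = tag[2:]
        if tag in ENTITY_CLASSES:
            counts[tag] += 1
    return counts


# Illustrative rows only, not taken from DariNER2.
sample = io.StringIO("token,tag\nکابل,LOC\nدر,O\nاحمد,B-PER\n")
sample_counts = count_entity_tags(sample, tag_column="tag")
print(dict(sample_counts))
```

For the actual file, the same function could be called on a handle opened with `open("DariNER2.csv", encoding="utf-8", newline="")`, after reading the header row to find which column holds the labels.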

Items in DepositOnce are protected by copyright, with all rights reserved, unless otherwise indicated.