Main Title: Dari Dataset for Named Entity Recognition DariNER2
Author(s): Zia, Ghezal Ahmad Jan
Type: Generic Research Data
Language Code: und
Abstract: DariNER2 is the release of the Dari sentence-level named-entity-annotated dataset, collected from Dari Azadi Radio. The goal of the project was to annotate a corpus comprising various genres of text (news, newsgroups, and interviews) in the Dari language with structural information (syntax). It was also developed to support the handling of sentence-level ambiguity in Dari text. The dataset contains 883 sentences and 22K words/tokens. It was manually annotated using four classes: person (PER), location (LOC), organization (ORG), and miscellaneous (MISC).
Issue Date: 8-Aug-2020
Date Available: 11-Aug-2020
DDC Class: 000 informatics, information science, general works
Subject(s): Dari Named Entity Recognition Corpus
Dari NLP Resources
Notes: The file is encoded as UTF-8 and contains Arabic-script characters.
Appears in Collections: FG Modelle und Theorie Verteilter Systeme » Research Data

Files in This Item:

Sentence-level Dari Named Entity annotated dataset for Named Entity Recognition Task.

Format: CSV | Size: 353.9 kB
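
Since the file is a UTF-8 CSV annotated with the PER, LOC, ORG, and MISC classes, a first sanity check might be to count how often each class occurs. The following is only a minimal sketch: this record does not document the column layout, so the column name `tag`, the presence of a header row, and the possible `B-`/`I-` prefixes are all assumptions to be checked against the actual file.

```python
import csv
from collections import Counter
from typing import TextIO

# The four entity classes named in the abstract.
ENTITY_CLASSES = {"PER", "LOC", "ORG", "MISC"}


def count_entity_tags(handle: TextIO, tag_column: str = "tag") -> Counter:
    """Count entity-class labels in a CSV that has a header row.

    `tag_column` is an ASSUMED column name; inspect the real header first.
    A prefix such as "B-" or "I-" is stripped in case the file uses a
    BIO-style scheme (also an assumption, not stated in the record).
    """
    counts = Counter()
    for row in csv.DictReader(handle):
        label = (row.get(tag_column) or "").strip()
        base = label.split("-", 1)[-1]  # "B-PER" -> "PER", "PER" -> "PER"
        if base in ENTITY_CLASSES:
            counts[base] += 1
    return counts


def load_counts(path: str, tag_column: str = "tag") -> Counter:
    """Open the CSV at `path` and return the per-class label counts."""
    # "utf-8-sig" decodes plain UTF-8 and also skips a byte-order mark if present.
    with open(path, encoding="utf-8-sig", newline="") as handle:
        return count_entity_tags(handle, tag_column)
```

With the dataset saved locally under a hypothetical name such as `dariner2.csv`, calling `load_counts("dariner2.csv", tag_column=...)` with the real label column would return a `Counter` keyed by entity class.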

Items in DepositOnce are protected by copyright, with all rights reserved, unless otherwise indicated.