Dari Dataset for Named Entity Recognition DariNER2
Zia, Ghezal Ahmad Jan
FG Modelle und Theorie Verteilter Systeme
DariNER2 is a release of the Dari sentence-level named-entity annotated dataset, collected from Dari Azadi Radio. The goal of the project was to annotate a corpus comprising various genres of text (news, newsgroups, and interviews) in the Dari language with structural information (syntax). In addition, it was developed to support work on sentence-level ambiguity in Dari text. It contains 883 sentences and about 22K words/tokens. It was manually annotated using the classes person (PER), location (LOC), organization (ORG), and miscellaneous (MISC).
- The file is encoded as UTF-8 and contains Arabic-script characters.
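
Because the file is UTF-8 with Arabic-script text, it is safest to pass the encoding explicitly when reading it. The following is a minimal sketch, not the dataset's documented loader. It assumes a CoNLL-style layout (one token per line, the tag in the last whitespace-separated column, blank lines between sentences) and an optional `B-`/`I-` prefix on tags; the record does not state the actual file layout. The function names and the path `dariner2.txt` are hypothetical.

```python
from collections import Counter
from typing import Iterable

# The four entity classes named in the dataset description.
LABELS = {"PER", "LOC", "ORG", "MISC"}


def count_entity_labels(lines: Iterable[str]) -> Counter:
    """Count tagged tokens per entity class.

    Assumption: each non-blank line is "<token> ... <tag>", with the tag
    in the last column, optionally prefixed with "B-" or "I-".
    """
    counts: Counter = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # blank line (sentence boundary) or malformed line
        # Drop an optional "B-"/"I-" prefix; a bare "PER" is left unchanged.
        label = parts[-1].split("-", 1)[-1]
        if label in LABELS:
            counts[label] += 1
    return counts


def load_counts(path: str) -> Counter:
    """Read the file as UTF-8 so Arabic-script tokens decode correctly."""
    with open(path, encoding="utf-8") as handle:
        return count_entity_labels(handle)
```

With a local copy of the file, `load_counts("dariner2.txt")` would return the per-class token counts, provided the layout assumption above holds.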