Dari Dataset for Named Entity Recognition DariNER2

Zia, Ghezal Ahmad Jan

FG Modelle und Theorie Verteilter Systeme

DariNER2 is a release of a sentence-level, named-entity-annotated Dari dataset collected from Dari Azadi Radio. The goal of the project was to annotate a corpus of Dari text spanning several genres (news, newsgroups, and interviews) with structural information (syntax). It was also developed to support work on sentence-level ambiguity in Dari text. It contains 883 sentences and 22K words/tokens. It was manually annotated with four classes: person (PER), location (LOC), organization (ORG), and miscellaneous (MISC).
  • The file is encoded as UTF-8 and contains Arabic-script characters.
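
The record does not describe the file layout, so the following is only a minimal sketch of how such a file might be read in Python. It assumes, hypothetically, a CoNLL-style layout: one token per line, the entity tag in the last whitespace-separated column, a blank line between sentences, and `O` marking tokens outside any entity. The function names `read_conll` and `tag_counts` and the sample sentences are illustrative and are not part of the dataset.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# A sentence is a list of (token, tag) pairs.
Sentence = List[Tuple[str, str]]


def read_conll(lines: Iterable[str]) -> List[Sentence]:
    """Group token/tag lines into sentences.

    Assumed layout (not confirmed by the dataset record): one token per
    line, tag in the last column, blank line between sentences.
    """
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line closes the current sentence, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        parts = line.split()
        current.append((parts[0], parts[-1]))
    if current:
        sentences.append(current)
    return sentences


def tag_counts(sentences: List[Sentence]) -> Counter:
    """Count how often each tag occurs across all sentences."""
    return Counter(tag for sentence in sentences for _, tag in sentence)


# Made-up sample in the assumed layout; the tags follow the classes
# named in the description, plus the assumed "O" for non-entities.
sample = """احمد PER
در O
کابل LOC
است O

ناتو ORG
"""

sentences = read_conll(sample.splitlines())
counts = tag_counts(sentences)
print(len(sentences), dict(counts))
```

For a real file, the same function could be fed an open file handle, for example `open(path, encoding="utf-8")`, where `path` stands for wherever the downloaded file is stored. Passing the encoding explicitly avoids depending on the platform default, which matters for Arabic-script text.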