Dari Dataset for Named Entity Recognition DariNER2
dc.contributor.author | Zia, Ghezal Ahmad Jan | |
dc.date.accessioned | 2020-08-11T06:47:05Z | |
dc.date.available | 2020-08-11T06:47:05Z | |
dc.date.issued | 2020-08-08 | |
dc.description | The file is encoded as UTF-8 and contains Arabic-script characters. | en |
dc.description.abstract | DariNER2 is a release of the sentence-level Dari named-entity annotated dataset, collected from Dari Azadi Radio. The goal of the project was to annotate a corpus comprising various genres of text (news, newsgroups, and interviews) in the Dari language with structural information (syntax). It was also developed to help address sentence-level ambiguity in Dari text. The dataset contains 883 sentences and 22K words/tokens. It was manually annotated using four classes: person (PER), location (LOC), organization (ORG), and miscellaneous (MISC). | en |
dc.identifier.uri | https://depositonce.tu-berlin.de/handle/11303/11562 | |
dc.identifier.uri | http://dx.doi.org/10.14279/depositonce-10447 | |
dc.language.iso | und | en |
dc.relation.references | http://dx.doi.org/10.14279/depositonce-10413 | |
dc.relation.references | http://dx.doi.org/10.14279/depositonce-10437 | |
dc.relation.references | http://dx.doi.org/10.14279/depositonce-10420 | |
dc.relation.references | http://dx.doi.org/10.14279/depositonce-10532 | en |
dc.rights.uri | https://choosealicense.com/licenses/gpl-3.0/ | en |
dc.subject.ddc | 000 informatics, information science, general works | de |
dc.subject.other | Dari Named Entity Recognition Corpus | en |
dc.subject.other | Dari NLP Resources | en |
dc.title | Dari Dataset for Named Entity Recognition DariNER2 | en |
dc.type | Textual Data | en |
tub.accessrights.dnb | unknown | * |
tub.affiliation | Fak. 4 Elektrotechnik und Informatik::Inst. Softwaretechnik und Theoretische Informatik::FG Modelle und Theorie Verteilter Systeme | de |
tub.affiliation.faculty | Fak. 4 Elektrotechnik und Informatik | de |
tub.affiliation.group | FG Modelle und Theorie Verteilter Systeme | de |
tub.affiliation.institute | Inst. Softwaretechnik und Theoretische Informatik | de |
Files
Original bundle
- Name: DariNER2.csv
- Size: 353.9 KB
- Format: Comma-separated values
- Description: Sentence-level Dari Named Entity annotated dataset for the Named Entity Recognition task.
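As a minimal sketch of how the file might be inspected: the record states only that DariNER2.csv is UTF-8 encoded and uses the PER, LOC, ORG, and MISC classes. The column layout is not documented here, so the `tag_column` parameter, the "token,tag" layout, and the sample rows below are assumptions made purely for illustration, not the dataset's actual format.

```python
import csv
import io
from collections import Counter

# The four entity classes named in the abstract.
ENTITY_CLASSES = {"PER", "LOC", "ORG", "MISC"}


def count_entity_tags(lines, tag_column=1):
    """Count how often each entity class appears in one column of CSV rows.

    `lines` is any iterable of text lines (an open file or io.StringIO).
    `tag_column` is an ASSUMPTION: the real column layout of DariNER2.csv
    is not documented in this record.
    """
    counts = Counter()
    for row in csv.reader(lines):
        if len(row) <= tag_column:
            continue  # skip blank or short rows
        tag = row[tag_column].strip()
        # Accept bare tags ("PER") and BIO-style tags ("B-PER", "I-PER").
        label = tag.split("-", 1)[-1]
        if label in ENTITY_CLASSES:
            counts[label] += 1
    return counts


# Made-up sample rows in a hypothetical "token,tag" layout (not real data).
sample = io.StringIO("Kabul,LOC\nin,O\nAhmad,PER\n")
sample_counts = count_entity_tags(sample)

# For the real file, open it as UTF-8, as the record specifies:
# with open("DariNER2.csv", encoding="utf-8", newline="") as f:
#     print(count_entity_tags(f))
```

If the real file stores tags in a different column, or one sentence per row rather than one token per row, the parsing step would need to be adapted accordingly.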
License bundle
- Name: license.txt
- Size: 2.71 KB
- Format: Item-specific license agreed to upon submission