Dari Dataset for Named Entity Recognition DariNER2

dc.contributor.author: Zia, Ghezal Ahmad Jan
dc.date.accessioned: 2020-08-11T06:47:05Z
dc.date.available: 2020-08-11T06:47:05Z
dc.date.issued: 2020-08-08
dc.description: File is encoded as UTF-8 with Arabic characters. [en]
dc.description.abstract: DariNER2 is the release of the Dari sentence-level named-entity annotated dataset, collected from Dari Azadi Radio. The goal of the project was to annotate a corpus comprising various genres of text (news, newsgroups, and interviews) in the Dari language with structural information (syntax). In addition, it was developed to support sentence-level ambiguity in Dari text. It contains 883 sentences (22K words/tokens). It was manually annotated using the person (PER), location (LOC), organization (ORG), and miscellaneous (MISC) classes. [en]
dc.identifier.uri: https://depositonce.tu-berlin.de/handle/11303/11562
dc.identifier.uri: http://dx.doi.org/10.14279/depositonce-10447
dc.language.iso: und [en]
dc.relation.references: http://dx.doi.org/10.14279/depositonce-10413
dc.relation.references: http://dx.doi.org/10.14279/depositonce-10437
dc.relation.references: http://dx.doi.org/10.14279/depositonce-10420
dc.relation.references: http://dx.doi.org/10.14279/depositonce-10532 [en]
dc.rights.uri: https://choosealicense.com/licenses/gpl-3.0/ [en]
dc.subject.ddc: 000 informatics, information science, general works [de]
dc.subject.other: Dari Named Entity Recognition Corpus [en]
dc.subject.other: Dari NLP Resources [en]
dc.title: Dari Dataset for Named Entity Recognition DariNER2 [en]
dc.type: Textual Data [en]
tub.accessrights.dnb: unknown [*]
tub.affiliation: Fak. 4 Elektrotechnik und Informatik::Inst. Softwaretechnik und Theoretische Informatik::FG Modelle und Theorie Verteilter Systeme [de]
tub.affiliation.faculty: Fak. 4 Elektrotechnik und Informatik [de]
tub.affiliation.group: FG Modelle und Theorie Verteilter Systeme [de]
tub.affiliation.institute: Inst. Softwaretechnik und Theoretische Informatik [de]

Files

Original bundle
Name: DariNER2.csv
Size: 353.9 KB
Format: Comma-separated values
Description: Sentence-level Dari Named Entity annotated dataset for the Named Entity Recognition task.
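
A minimal sketch of how the entity classes in DariNER2.csv might be tallied after download, offered as an illustration only. The record states that the file is UTF-8 encoded CSV and that the classes are PER, LOC, ORG, and MISC; it does not document the column layout. The column names `token` and `tag`, the BIO-style prefixes (`B-`, `I-`), and the helper names `read_rows` and `count_entity_tags` are all assumptions, so check the real header before relying on them.

```python
import csv
import io
from collections import Counter

# Entity classes named in the dataset abstract.
ENTITY_CLASSES = ("PER", "LOC", "ORG", "MISC")


def count_entity_tags(rows, tag_column="tag"):
    """Count how often each entity class occurs in an iterable of row dicts.

    `tag_column` is a hypothetical column name; check the real CSV header.
    A prefixed tag such as "B-PER" or "I-LOC" (an assumed scheme) is
    reduced to its class suffix before counting.
    """
    counts = Counter()
    for row in rows:
        tag = (row.get(tag_column) or "").strip()
        label = tag.split("-")[-1]
        if label in ENTITY_CLASSES:
            counts[label] += 1
    return counts


def read_rows(path, encoding="utf-8"):
    """Yield CSV rows as dicts; the record states the file is UTF-8."""
    with open(path, encoding=encoding, newline="") as handle:
        yield from csv.DictReader(handle)


# Tiny in-memory sample with made-up placeholder tokens, not real dataset rows.
sample = io.StringIO("token,tag\nw1,B-PER\nw2,O\nw3,B-LOC\nw4,I-LOC\n")
sample_counts = count_entity_tags(csv.DictReader(sample))
print(dict(sample_counts))
```

Once the actual header has been confirmed, the same function could be applied to the downloaded file with `count_entity_tags(read_rows("DariNER2.csv"), tag_column=...)`. Passing the column name as a parameter keeps the sketch usable whatever the real layout turns out to be.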
License bundle
Name: license.txt
Size: 2.71 KB
Format: Item-specific license agreed upon to submission
Description:
