Please use this identifier to cite or link to this item: http://dx.doi.org/10.14279/depositonce-10532
Main Title: Dari Dataset for Coreference Resolution
Author(s): Zia, Ghezal Ahmad Jan
Other Contributor(s): Amini, Fazel
Type: Generic Research Data
URI: https://depositonce.tu-berlin.de/handle/11303/11644
http://dx.doi.org/10.14279/depositonce-10532
License: https://choosealicense.com/licenses/gpl-3.0/
Abstract: DariCoref is a Dari corpus annotated for anaphoric relations; all documents were collected from Dari VOA and Azadi Radio. The annotation scheme follows OntoNotes and WikiCoref. Each markable is annotated with a coreference type (Identical, Attributive, or Copular) and a mention type (Named Entity, Noun Phrase, or Pronominal). This is the first annotation effort for Dari; it concentrates on very specific types of written text, mainly newswire, and resources for Dari texts are otherwise lacking. We therefore present a freely available resource devised for coreference resolution algorithms dedicated to Dari texts. The annotation was carried out with the MMAX2 tool.
Subject(s): DariCoref
Dari NLP Resources
Dari Coreference Resolution Dataset
Issue Date: 8-Sep-2020
Date Available: 14-Sep-2020
References: 10.14279/depositonce-10447
10.14279/depositonce-10413
10.14279/depositonce-10437
10.14279/depositonce-10420
Language Code: en
DDC Class: 000 informatics, information science, general works
TU Affiliation(s): Fak. 4 Elektrotechnik und Informatik » Inst. Softwaretechnik und Theoretische Informatik » FG Modelle und Theorie Verteilter Systeme
Appears in Collections: Technische Universität Berlin » Research Data

Files in This Item:
DariCoref.zip

DariCoref is a Dari corpus annotated for anaphoric relations; all documents were collected from Azadi Radio and Dari VOA. The annotation scheme follows OntoNotes and WikiCoref.

Format: ZIP Archive | Size: 7.44 MB
Description.txt
Format: Text | Size: 672 B


Items in DepositOnce are protected by copyright, with all rights reserved, unless otherwise indicated.