Please use this identifier to cite or link to this item: http://dx.doi.org/10.14279/depositonce-10420
Main Title: Dari Dataset for Part-of-Speech
Author(s): Zia, Ghezal Ahmad Jan
Other Contributor(s): Amini, Fazel
Type: Textual Data
URI: https://depositonce.tu-berlin.de/handle/11303/11536
http://dx.doi.org/10.14279/depositonce-10420
License: https://choosealicense.com/licenses/lgpl-3.0/
Abstract: This dataset supports part-of-speech tagging for the Dari language. It can also be used for other natural language processing tasks on Dari text. The dataset has a size of 12K and was annotated manually. The tagset used in this dataset is the Universal Tagger.
Subject(s): Dari POS Corpus
Dari Information Extraction
Issue Date: 25-Jul-2020
Date Available: 28-Jul-2020
References: http://dx.doi.org/10.14279/depositonce-10437
http://dx.doi.org/10.14279/depositonce-10413
http://dx.doi.org/10.14279/depositonce-10447
http://dx.doi.org/10.14279/depositonce-10532
Language Code: und
DDC Class: 000 informatics, information science, general works
Notes: The file is encoded as UTF-8 and contains Arabic-script characters.
TU Affiliation(s): Fak. 4 Elektrotechnik und Informatik » Inst. Softwaretechnik und Theoretische Informatik » FG Modelle und Theorie Verteilter Systeme
Appears in Collections: Technische Universität Berlin » Research Data

Files in This Item:
DariPOS_dataset.csv

Dari Part-of-Speech Tagging Dataset

Format: CSV | Size: 182.96 kB
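
Because the file is a UTF-8 CSV containing Arabic-script text, it helps to open it with an explicit encoding. The following is a minimal sketch in Python using only the standard library. It assumes a comma delimiter and makes no assumption about column names or order, since the record does not describe them. The helper names read_rows and column_counts are hypothetical.

```python
import csv
from collections import Counter


def read_rows(path):
    """Read every row of a UTF-8 CSV file as a list of strings."""
    # "utf-8-sig" reads plain UTF-8 and also drops a byte-order mark if present.
    # Assumption: fields are comma-separated (the csv module default).
    with open(path, encoding="utf-8-sig", newline="") as handle:
        return list(csv.reader(handle))


def column_counts(rows, column):
    """Count the values in one column, skipping rows that are too short."""
    # Assumption: the caller has inspected the file and knows which
    # column index holds the values of interest (for example, the tags).
    return Counter(row[column] for row in rows if len(row) > column)
```

Calling read_rows("DariPOS_dataset.csv") and printing the first few rows is a reasonable first step to see how the columns are laid out before choosing an index for column_counts.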

Items in DepositOnce are protected by copyright, with all rights reserved, unless otherwise indicated.