Automatic glossary term extraction from large-scale requirements specifications

dc.contributor.author: Gemkow, Tim
dc.contributor.author: Conzelmann, Miro
dc.contributor.author: Hartig, Kerstin
dc.contributor.author: Vogelsang, Andreas
dc.date.accessioned: 2018-06-19T08:44:32Z
dc.date.available: 2018-06-19T08:44:32Z
dc.date.issued: 2018
dc.description.abstract: Creating glossaries for large corpora of requirements is an important but expensive task. Glossary term extraction methods often focus on achieving a high recall rate and, therefore, favor linguistic processing for extracting glossary term candidates while neglecting the benefits of reducing the number of candidates with statistical filter methods. However, especially for large datasets, reducing the correspondingly large number of candidates may be crucial. This paper demonstrates how to automatically extract relevant domain-specific glossary term candidates from a large body of requirements, the CrowdRE dataset. Our hybrid approach combines linguistic processing and statistical filtering for extracting and reducing glossary term candidates. In a twofold evaluation, we examine the impact of our approach on the quality and quantity of extracted terms. We provide a ground truth for a subset of the requirements and show that a substantial degree of recall can be achieved. Furthermore, we advocate requirements coverage as an additional quality metric to assess the term reduction that results from our statistical filters. Results indicate that with a careful combination of linguistic and statistical extraction methods, a fair balance between later manual efforts and a high recall rate can be achieved. [en]
dc.identifier.eissn: 2332-6441
dc.identifier.isbn: 978-1-5386-7418-5
dc.identifier.isbn: 978-1-5386-7419-2
dc.identifier.issn: 1090-705X
dc.identifier.uri: https://depositonce.tu-berlin.de/handle/11303/7951
dc.identifier.uri: http://dx.doi.org/10.14279/depositonce-7113
dc.language.iso: en [en]
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/ [en]
dc.subject.ddc: 004 Datenverarbeitung; Informatik [de]
dc.subject.other: requirements engineering [en]
dc.subject.other: natural language processing [en]
dc.subject.other: glossary term extraction [en]
dc.subject.other: Crowd RE [en]
dc.title: Automatic glossary term extraction from large-scale requirements specifications [en]
dc.type: Conference Object [en]
dc.type.version: acceptedVersion [en]
dcterms.bibliographicCitation.doi: 10.1109/RE.2018.00052
dcterms.bibliographicCitation.originalpublishername: IEEE [en]
dcterms.bibliographicCitation.originalpublisherplace: New York [en]
dcterms.bibliographicCitation.pageend: 417
dcterms.bibliographicCitation.pagestart: 412
dcterms.bibliographicCitation.proceedingstitle: 2018 IEEE 26th International Requirements Engineering Conference (RE) [en]
tub.accessrights.dnb: free [en]
tub.affiliation: Fak. 4 Elektrotechnik und Informatik::Inst. Telekommunikationssysteme::FG IT-basierte Fahrzeuginnovationen [de]
tub.affiliation.faculty: Fak. 4 Elektrotechnik und Informatik [de]
tub.affiliation.group: FG IT-basierte Fahrzeuginnovationen [de]
tub.affiliation.institute: Inst. Telekommunikationssysteme [de]
tub.publisher.universityorinstitution: Technische Universität Berlin [en]

Files

Original bundle
Name: 2018_gemkow_et-al.pdf
Size: 619.14 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 4.9 KB
Format: Item-specific license agreed upon to submission
