Improving Trace Link Recovery using Semantic Relation Graphs and Spreading Activation
[Context & Motivation] Trace Link Recovery aims to identify and link related existing requirements to support further engineering tasks. Existing approaches are mainly based on algebraic Information Retrieval or machine learning. [Question/Problem] Machine learning approaches usually require reasonably large, labeled datasets for training. Algebraic Information Retrieval approaches, such as the distance between tf-idf scores, also work on smaller datasets without training but are limited in their ability to consider the context of semantic statements. [Principal Ideas/Results] In this work, we revise our existing Trace Link Recovery approach, which is based on an explicit representation of the content of requirements as a semantic relation graph and uses Spreading Activation to answer trace queries over this graph. The approach generates sorted candidate lists and is fully automated: it includes an NLP pipeline that transforms unrestricted natural language requirements into a graph, and it does not require any external knowledge bases or other resources. [Contribution] To improve performance, we take a detailed look at five common datasets and adapt the graph structure and the semantic search algorithm. The predictive power varies strongly with the selected configuration. With the best tested configuration, the approach achieves a mean average precision of 50%, a Lag of 30%, and a recall of 90%.
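To make the idea of answering a trace query by Spreading Activation more concrete, the following is a minimal sketch, not the approach's actual implementation. It assumes a toy undirected graph in which requirement nodes (`R1` to `R4`) are connected to concept nodes by weighted edges; the node names, edge weights, and the `decay`, `threshold`, and `max_steps` parameters are all hypothetical choices for illustration. The abstract does not specify the real graph structure or activation function, so those details are not reproduced here.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency list from (node_a, node_b, weight) triples."""
    graph = defaultdict(list)
    for a, b, weight in edges:
        graph[a].append((b, weight))
        graph[b].append((a, weight))
    return dict(graph)


def spread_activation(graph, seeds, decay=0.5, threshold=0.01, max_steps=4):
    """Propagate activation from the seed nodes through the graph.

    In each step, every node activated in the previous step sends a pulse of
    energy * edge_weight * decay to each neighbor. Pulses below `threshold`
    are dropped. Activation received by a node accumulates over all steps.
    """
    activation = defaultdict(float)
    for node, energy in seeds.items():
        activation[node] += energy
    frontier = dict(seeds)
    for _ in range(max_steps):
        next_frontier = defaultdict(float)
        for node, energy in frontier.items():
            for neighbor, weight in graph.get(node, []):
                pulse = energy * weight * decay
                if pulse >= threshold:
                    next_frontier[neighbor] += pulse
        if not next_frontier:
            break
        for node, energy in next_frontier.items():
            activation[node] += energy
        frontier = next_frontier
    return dict(activation)


def rank_candidates(activation, requirement_nodes, query):
    """Return (requirement, score) pairs sorted by descending activation.

    The query itself and requirements that received no activation are omitted.
    """
    scored = [
        (req, activation.get(req, 0.0))
        for req in requirement_nodes
        if req != query
    ]
    scored = [(req, score) for req, score in scored if score > 0.0]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical toy graph: requirements linked to the concepts they mention.
edges = [
    ("R1", "user", 1.0),
    ("R1", "login", 1.0),
    ("R2", "login", 1.0),
    ("R2", "password", 1.0),
    ("R3", "password", 1.0),
    ("R3", "report", 1.0),
    ("R4", "report", 1.0),
]
graph = build_graph(edges)
requirements = ["R1", "R2", "R3", "R4"]

# Trace query: which requirements are related to R1?
activation = spread_activation(graph, seeds={"R1": 1.0})
ranking = rank_candidates(activation, requirements, query="R1")
for requirement, score in ranking:
    print(requirement, score)
```

In this toy setting, `R2` shares a concept with the query directly and ranks first, `R3` is reached only through an intermediate requirement and ranks lower, and `R4` lies beyond the four propagation steps and does not appear in the candidate list. Changing `decay` or `max_steps` changes how far activation travels, which mirrors, in a simplified way, the abstract's observation that results depend strongly on the selected configuration.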
Published in: Requirements Engineering: Foundation for Software Quality: 27th International Working Conference, REFSQ 2021, Essen, Germany, April 12–15, 2021, Proceedings, DOI 10.1007/978-3-030-73128-1_3, Springer