Isolating Concerns in Requirements Using Latent Semantic Analysis

Raphael Kin Dick Suen and Elisa L. A. Baniassad
Department of Computer Science & Engineering, Chinese University of Hong Kong, China.
{kdsuen, elisa}@cse.cuhk.edu.hk

Abstract

Latent Semantic Analysis (LSA) is an approach developed by Thomas Landauer et al. [LSA]. It applies statistical analysis of a large text corpus to assess the similarity of terms described in a sample input text. Other work on Aspects, Concerns and Requirements [Theme/Doc, Sampaio] has shown that depicting the relationships between concepts described in text is helpful to a developer who is identifying a set of early concerns. The Latent Semantic Analysis approach may facilitate this activity, since finding clusters of requirements is the first step in locating broadly scoped properties that affect many other requirements. In this work, we applied the publicly available LSA tools, in combination with our own manipulation of the results of those tools, to determine whether LSA is a promising technique for Aspect-Oriented Requirements Analysis (AORA). We found that LSA is indeed a useful approach; however, more work is needed to compare it against other natural-language-based approaches such as [Sampaio] to determine whether it is the best approach, or could be used in concert with such approaches. This paper provides preliminary evidence of the appropriateness of LSA for AORA.

1. Introduction

Latent Semantic Analysis (LSA) is an approach for determining the similarity of terms or portions of text based on statistical analysis of a large text corpus [LSA]. One of the most intriguing aspects of LSA is that it operates language independently, meaning that it can be applied to Chinese, German, English or Java with equal efficacy. As such, LSA has been used for analysis of programs for the sake of semi-automated clustering [Maletic], and analysis [Marcus].
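
To make the underlying idea concrete, the following is a minimal sketch of LSA-style similarity, not the tool used in this work. It rests on several simplifying assumptions: the term space is built from the input texts themselves rather than from a large external corpus, raw term counts are used without any weighting, and the name `lsa_similarity`, the parameter `k` and the example texts are purely illustrative.

```python
import numpy as np


def lsa_similarity(docs, k=2):
    """Cosine similarity between documents in a k-dimensional latent space.

    Illustrative only: the space is derived from `docs` themselves,
    whereas LSA normally derives it from a large reference corpus.
    """
    # Build a term-by-document matrix of raw counts.
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    index = {t: i for i, t in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, toks in enumerate(tokenized):
        for t in toks:
            counts[index[t], j] += 1.0

    # Truncated SVD: keep only the k largest singular values.
    u, s, vt = np.linalg.svd(counts, full_matrices=False)
    k = min(k, len(s))
    doc_vectors = (np.diag(s[:k]) @ vt[:k, :]).T  # one row per document

    # Cosine similarity between the reduced document vectors.
    norms = np.linalg.norm(doc_vectors, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for empty vectors
    unit = doc_vectors / norms
    return unit @ unit.T


# Hypothetical requirement-like texts, invented for this example.
docs = [
    "customer adds pet to cart",
    "customer removes pet from cart",
    "admin updates pet inventory",
    "admin views inventory report",
]
print(np.round(lsa_similarity(docs, k=2), 2))
```

Reducing to `k` dimensions is what lets texts that share no literal terms still score as similar; with `k` equal to the number of documents, the result reduces to ordinary cosine similarity on term counts.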

AORA can be divided into two main activities: determining the concern decomposition of a set of requirements, and assessing which concerns have an impact over other concerns. Those ‘broadly scoped’ concerns are considered ‘crosscutting concerns’, or, by some definitions, aspects. Such concerns are described in a requirement called an aspectual requirement, which describes the influence of one concern (the aspect) over another concern (the base) [EA].

Determining the dominant concern decomposition of a set of requirements involves clustering requirements that refer to similar concepts. This can be done manually, but doing so is time-consuming and prone to error. Alternatively, lexical approaches for visualized keyword-searching [Theme/Doc] have been shown to be helpful. Finally, preliminary work is showing that natural language processing can be used to enhance the keyword-search approach [Sampaio]. LSA, however, may provide the ability to semi-automatically cluster the requirements set to reveal the dominant concerns and their associated requirements.

LSA may also prove helpful for identifying aspectual requirements. In revealing dominant concerns, it may highlight requirements that relate to more than one concern. These could be analyzed to see if they do indeed describe the influence of one concern over another.

These two possibilities make LSA interesting to investigate for assisting in Aspect-Oriented Requirements Analysis (AORA). The goal of this work is to apply currently available LSA tools¹ and assess whether they provide help in visually clustering requirements into concerns, and in revealing aspectual requirements.

We assessed the LSA approach in terms of three criteria:

• whether it is better than the lexical approach for pairing requirements and keywords;
• whether it is better than the lexical approach for isolating cohesive clusters of requirements;
• whether it is effective at highlighting aspectual requirements, and hence at revealing crosscutting concerns.
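
The idea of highlighting requirements that relate to more than one concern can be sketched as a simple filter over similarity scores. This is a hypothetical illustration rather than the procedure used in this work: the function `candidate_aspectual`, the cut-off `threshold`, the concern names and every score below are invented for the example.

```python
def candidate_aspectual(similarity, concerns, threshold=0.5):
    """Return requirements whose similarity to two or more concerns
    meets the threshold, mapped to the concerns they are linked to."""
    flagged = {}
    for req_id, scores in similarity.items():
        linked = [c for c, s in zip(concerns, scores) if s >= threshold]
        if len(linked) >= 2:
            flagged[req_id] = linked
    return flagged


# Hypothetical concerns and made-up similarity scores (not measured data).
concerns = ["cart", "inventory", "logging"]
similarity = {
    "R1": [0.82, 0.10, 0.05],
    "R2": [0.15, 0.77, 0.12],
    "R3": [0.61, 0.58, 0.09],
}
print(candidate_aspectual(similarity, concerns))
# {'R3': ['cart', 'inventory']}
```

A requirement flagged in this way is only a candidate: it still has to be read to decide whether it really describes the influence of one concern over another.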

We first describe the approach we used in obtaining and visualizing LSA results for the Pet Store case study [Petstore] (Section 2: Approach). We then provide the analysis of the three assessment criteria described above (Section 3: Analysis). Next, we discuss the results and make additional observations (Section 4: Discussion). Finally, we discuss related work (Section 5: Related Work) and conclude (Section 6: Conclusions & Future Work). We also provide an appendix that contains the Pet Store requirements and concerns we used in the case study.

2. Approach

To apply the LSA approach for AORA, we used the on-line tools for LSA, and then applied graphing filters to the output of those tools. We first describe the use of the LSA tools, and then describe the graph filters applied.

2.1. Case Study: The Pet Store

As a basis for assessing LSA for AORA, we used the Pet Store case study [Petstore]. We used requirements from the on-line document, and maintained their original point-form organization. The set of requirements we used is shown in Appendix A.

¹ Available at http://lsa.colorado.edu/

Submission to the Early-Aspects OOPSLA’05 Workshop

2.2. Text Processing using LSA

The LSA website provides five applications, each of which provides textual similarity analysis based on statistical analysis of a particular corpus, or “term space”. “Near Neighbors”