Automated User Story Generation with Test Case Specification Using Large Language Model

1st Tajmilur Rahman
Computer Science
University of Saskatchewan
Saskatoon, SK, Canada
qoy860@usask.ca

2nd Yuecai Zhu
Enterprise Data Platform
Bell Mobility
Montreal, QC, Canada
yuecai.zhu@bell.ca

Abstract—The modern software engineering era is moving fast with the assistance of artificial intelligence (AI), especially large language models (LLMs). Researchers have already started automating many parts of the software development workflow. Requirements Engineering (RE) is a crucial phase that begins the software development cycle through multiple discussions on a proposed scope of work documented in different forms. The RE phase ends with a list of user stories for each unit task identified through those discussions; these are usually created and tracked in a project management tool such as Jira or Azure DevOps. In this research we developed a tool, “GeneUS”, that uses GPT-4.0 to automatically create user stories from a requirements document, the outcome of the RE phase. The output is provided in JSON format, leaving the possibility open for downstream integration with popular project management tools. Analyzing requirements documents takes significant effort and multiple meetings with stakeholders. We believe that automating this process will reduce the load on software engineers and increase their productivity, since they will be able to spend their time on other prioritized tasks.

Index Terms—Prompt Engineering, LLM, User Story, Automated Software Engineering

I. INTRODUCTION

A “user story” [1] is commonly used in the Agile software development process. It is a description of a unit task that covers what to develop, why users/stakeholders need it, and how it should be developed. In addition, a user story typically includes functional and non-functional constraints, acceptance criteria, a clear definition of when a task can be marked as “Done”, and often test case and coverage specifications.

Traditionally, software engineers create a requirements document during the RE phase after several back-and-forth meetings with the stakeholders. Engineers then distill these requirements into individual tasks by creating and adding user stories to the project management system. This process is effort-intensive and demands a large amount of developer time. Senior developers, team leads, QA leads, project managers, and scrum masters mostly remain busy interpreting clients’ statements into specific unit tasks that are easy for developers to understand.

Leveraging large language models (LLMs) to automate software engineering processes and development activities is becoming extremely popular and is evolving rapidly in both academia and industry. To have LLMs automatically complete tasks, we need a mechanism for sending appropriate instructions to the model. Such instructions are called prompts, and the crafting of prompts is commonly known as prompt engineering [2], a fundamental methodology in the field of responsive AI. As the development of LLMs progresses, the importance of prompt engineering becomes increasingly evident, and designing suitable prompts for specific tasks has emerged as a meaningful research direction. Our study involves extensive work on prompt engineering to generate user stories from high-level, semi-detailed requirements specifications.
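To make this concrete, the following minimal Python sketch shows the kind of prompt-driven generation our study builds on: a requirements document is sent to an LLM together with an instruction to produce user stories in JSON. The prompt wording, model identifier, and file name are illustrative assumptions, not the exact GeneUS implementation.

# A minimal sketch of prompt-driven user story generation. The prompt text,
# model name, and file path are illustrative assumptions, not GeneUS itself.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Read the requirements document produced during the RE phase (assumed path).
with open("requirements.txt", encoding="utf-8") as f:
    requirements = f.read()

prompt = (
    "From the requirements document below, derive one user story per unit "
    "task. For each story include a description, acceptance criteria, a "
    "definition of done, and test case specifications. Return a JSON array.\n\n"
    f"Requirements:\n{requirements}"
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)  # JSON-formatted user stories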
This research is a unique contribution to the automation of software engineering processes using LLMs. To the best of our knowledge, no study has yet been conducted to generate user stories with the necessary functional and test specifications automatically using LLMs.

Pre-trained LLMs, including ChatGPT [3] and Google PaLM [4], are not as intelligent as a human. Simply asking the LLM to provide user stories and test cases from the requirements document does not produce a desirable and useful outcome. To overcome this challenge, we propose a prompting technique, Refine and Thought (RaT), a specialized version of Chain of Thought (CoT) prompting [5]. RaT prompting instructs the LLM to filter out meaningless tokens and refine away redundant information from the input within the thought chain (a hypothetical sketch of such a prompt appears at the end of this section). RaT improves a pre-trained LLM’s handling of redundant information and meaningless tokens and significantly improves the generated user stories and test cases in our application.

To validate the results, we developed the RUST (Readability, Understandability, Specifiability, Technical-aspects) survey questionnaire and sent it to 50 developers of various backgrounds with experience in creating, reading, and tracking user stories in Agile software development environments. Quantitative analysis of the survey participants’ feedback shows that the performance of our approach is highly acceptable, with only 5% missing technical details, 1% ambiguous task descriptions, and 0.5% duplicates.

A. Contributions

The ultimate outcome of this research is a tool, “GeneUS”, that takes a requirements document from users and delivers the generated user stories in JSON format.
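For illustration, a single generated user story in JSON form might resemble the following sketch; all field names and values here are our assumptions, and the actual GeneUS schema may differ.

{
  "id": "US-001",
  "title": "Log in with email and password",
  "description": "As a registered user, I want to log in with my email and password so that I can access my account.",
  "acceptance_criteria": [
    "Valid credentials take the user to the dashboard",
    "Invalid credentials show an error without revealing which field was wrong"
  ],
  "definition_of_done": "Implemented, peer reviewed, and all listed test cases pass",
  "test_cases": [
    { "id": "TC-001", "input": "valid email and password", "expected": "login succeeds" },
    { "id": "TC-002", "input": "valid email, wrong password", "expected": "error message shown" }
  ]
}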
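Likewise, the RaT prompting introduced earlier can be pictured as a staged instruction that refines the input before reasoning over it. The template below is a hypothetical Python sketch of that idea; the wording is our assumption, not the paper's verbatim prompt.

# A hypothetical Refine and Thought (RaT) prompt template. The staging follows
# the technique described in this paper; the exact wording is an assumption.
RAT_PROMPT = """\
Step 1 (Refine): Read the requirements document below. Discard meaningless
tokens (boilerplate, headers, filler phrases) and merge redundant statements
so that only distinct, meaningful requirements remain.

Step 2 (Thought): For each refined requirement, reason step by step about
what to develop, why users/stakeholders need it, and how it can be verified.

Step 3 (Generate): Produce one user story per unit task, each with a
description, acceptance criteria, a definition of done, and test case
specifications. Output a JSON array of user stories.

Requirements document:
{requirements}
"""

# Usage: RAT_PROMPT.format(requirements=requirements) replaces the plain
# prompt in the earlier sketch.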