Automatically Generating Natural Language Documentation for Methods

Christian D. Newman, Rochester Institute of Tech, New York, USA, cnewman@se.rit.edu
Natalia Dragan, Kent State University, Ohio, USA, ndragan@kent.edu
Michael L. Collard, The University of Akron, Ohio, USA, collard@uakron.edu
Jonathan I. Maletic, Kent State University, Ohio, USA, jmaletic@kent.edu
Michael J. Decker, Bowling Green State University, Ohio, USA, mdecke@bgsu.edu
Drew T. Guarnera, Kent State University, Ohio, USA, dguarner@kent.edu
Nahla Abid, Taibah University, Saudi Arabia, nabid@kent.edu

Abstract— A tool that automatically generates natural language documentation summaries for methods is presented. The approach builds on the authors' prior work on method stereotypes and on the source code analysis framework srcML. First, each method is automatically assigned one or more stereotypes based on static analysis and a set of heuristics. Then, the approach uses the stereotype information, static analysis, and predefined templates to generate a natural language summary for each method. This summary is automatically added to the code base as a comment for each method. The predefined templates are designed to produce a generic summary for specific method stereotypes.

Keywords—Documentation, Stereotype, Method Summarization

I. INTRODUCTION

There are many techniques that automatically generate summaries/documentation directly from source code [1][2][3][4][5][6][7][8]. These approaches can produce high-quality results. However, their performance depends on high-quality identifiers and method names, because they rely on Natural Language Processing (NLP) techniques, which are not always reliable on source code [12], [13]. Consequently, if identifiers and methods are poorly named, these approaches may fail to generate accurate comments, or any reasonable comments at all. Recently, Abid et al. [9][10] proposed an approach that uses stereotypes [11] in the automatic generation of documentation for methods.
This approach does not depend on NLP at all, and it is the one we apply in this work. Instead, the summaries for methods are generated using predefined fill-in-the-blank sentence phrases, known as templates. We constructed the templates specifically for different method stereotypes. Stereotypes reflect the basic meaning and behavior of a method and include concepts such as predicate, set, and factory. In previous work by the authors on method stereotypes, a fully automated approach to assigning stereotypes to methods was developed and evaluated [11].

Our tool, MethodMan (short for Method Manual), leverages stereotype information and custom template phrases. We constructed a custom template phrase for each stereotype to form the summaries. After the appropriate templates are matched to the method, they are filled with data to form the complete summary. To fill in the templates, static analysis and fact extraction are performed on the method using the srcML [14][15] infrastructure (www.srcML.org). The generation of the summaries is fully automated. Each summary starts with a short, precise description of the main responsibility of the method. It also includes additional information about external objects, the properties modified, and a list of function calls along with their stereotypes.

II. DATA SOURCES USED

The only input the approach uses is the source code file(s). It also relies on srcML and on the stereotype information derived from that code. srcML can accurately parse a single file or a code fragment. As such, the entire system does not need to compile in order to produce documentation for one class or method.

III. APPROACH

MethodMan takes source code as input and, for each method, automatically generates a documentation summary that is placed as a comment block above the method. An example of an automatically generated documentation summary for one method is presented in Fig. 1. The member function calcWidthParm() belongs to the class BinAxsLinear from the open-source system HippoDraw.
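The pipeline described above has three steps: fact extraction, stereotype assignment, and template filling. A minimal Python sketch can chain these steps for a single method. It is an illustration under stated assumptions, not MethodMan's implementation:

- The `MethodFacts` record is a hypothetical stand-in for facts that a srcML-based static analysis might extract.
- `assign_stereotype` applies a small, simplified subset of stereotype rules.
- The `TEMPLATES` wording and the example method `area` are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MethodFacts:
    """Hypothetical facts a srcML-based static analysis might extract for one method."""
    name: str
    return_type: str
    is_const: bool
    returned_member: Optional[str] = None  # data member returned as-is, if any
    members_read: list[str] = field(default_factory=list)
    members_modified: list[str] = field(default_factory=list)
    parameters: list[str] = field(default_factory=list)
    calls: list[tuple[str, str]] = field(default_factory=list)  # (callee, callee stereotype)


def assign_stereotype(m: MethodFacts) -> str:
    """Simplified, hypothetical rules; a real stereotype classifier uses many more."""
    if m.is_const:
        if m.returned_member is not None:
            return "get"        # const method that returns a data member directly
        if m.return_type == "bool":
            return "predicate"  # const method that computes a Boolean
        if m.return_type != "void":
            return "property"   # const method that computes some other value
    elif len(m.members_modified) == 1:
        return "set"            # modifies exactly one data member
    elif len(m.members_modified) > 1:
        return "command"        # modifies several data members
    return "unclassified"


# Hypothetical fill-in-the-blank templates, one per stereotype.
TEMPLATES = {
    "get": "{name} returns data member {member}.",
    "predicate": "{name} returns a Boolean value computed from {inputs}.",
    "property": "{name} returns a computed value based on {inputs}.",
    "set": "{name} modifies data member {modified}.",
    "command": "{name} modifies data members {modified}.",
    "unclassified": "{name} does not match any stereotype template.",
}


def summarize(m: MethodFacts) -> str:
    """Fill the stereotype's template and wrap it, plus the call list, in a comment block."""
    stereotype = assign_stereotype(m)
    inputs = ", ".join(m.members_read + m.parameters) or "no inputs"
    sentence = TEMPLATES[stereotype].format(
        name=m.name,
        member=m.returned_member,
        inputs=inputs,
        modified=", ".join(m.members_modified),
    )
    lines = ["/**", f" * {sentence}"]
    if m.calls:
        lines.append(" * Calls:")
        lines.extend(f" *   {callee} ({kind})" for callee, kind in m.calls)
    lines.append(" */")
    return "\n".join(lines)


# Usage with an invented const method `area` that reads two data members.
area = MethodFacts(
    name="area",
    return_type="double",
    is_const=True,
    members_read=["width", "height"],
    calls=[("isValid", "predicate")],
)
print(summarize(area))
```

Because there is one template per stereotype, the summary wording depends only on the assigned stereotype and the extracted facts. It does not depend on how the method or its identifiers are named, which is the property the introduction contrasts with NLP-based approaches.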
The summary starts with a short description (e.g., the first three lines in Fig. 1), which emphasizes the computed value along with the data members and parameters used in the computation. This is followed by a list of the calls made from the method, along with the stereotype of each called method.

To automatically construct a summary, two issues must be addressed. The first is determining what information should be included in the summary. The second is how to present this information efficiently. Previous studies [7][8] on source-code summarization have investigated the former issue in depth, and their findings serve as a set of guidelines for building our automatic source-code summarization tool. First, the method summary should include at least a verb (i.e., describes the action performed by the method) and an object (i.e., describes