MIT Artificial Intelligence Laboratory, September 2002

Sketch Recognition in Software Design

Tracy Hammond, Krzysztof Gajos, Randall Davis, Howard Shrobe

The Problem: Sketching is a natural and integral part of software design. Software developers use sketching to aid in brainstorming ideas, visualizing program organization, and understanding requirements. Unfortunately, when it comes to coding the system, the drawings are left behind. We see sketch recognition as a way to bridge that gap. In addition to the vast amount of information conveyed by a sketch, a wealth of other design information may be voiced during a software design meeting. We can capture the spoken and visual design information by videotaping the meeting and any whiteboards used. By indexing these videos, we make it easy to retrieve the videotaped information without watching the entire video from start to finish.

Motivation: We want to allow software design meetings to continue as they are, with software designers discussing the design and drawing free-hand sketches of these designs on a whiteboard. Using our system, designers can sketch naturally, as we place few requirements on the sketcher. We recognize and interpret these diagrams using sketch recognition. Because the diagrams are interpreted, we can provide natural editing capabilities, allowing users to edit their original strokes in an intuitive way. For instance, a designer can drag a drawn class from its center to move all of the strokes used to draw the class, as well as stretch and skew the strokes used to create an attached arrow. The interpreted diagrams are used to automatically generate stub code using a software engineering tool. Software design meetings are videotaped to capture visual and spoken design information unobtrusively. When drawn items are interpreted, we use these understood sketch events to index the videotape of the software design meeting.
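The editing behavior described above can be illustrated with a minimal sketch (not the system's actual code): a recognized object keeps its underlying raw strokes, so dragging the object simply translates every stroke in the group by the same offset. The function name and stroke representation here are our own illustrative assumptions.

```python
def translate_strokes(strokes, dx, dy):
    """Shift every point of every stroke by (dx, dy).

    Each stroke is a list of (x, y) points; moving a recognized
    object moves all of its raw strokes together (illustrative only).
    """
    return [[(x + dx, y + dy) for (x, y) in stroke] for stroke in strokes]
```

For example, dragging a two-stroke glyph two units right and three units up shifts every point of both strokes by that offset.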
We decided to design our application as a Metaglue agent, since the Metaglue agent architecture provides support for multi-modal interactions through speech, gesture, and graphical user interfaces [2]. The Metaglue agent architecture also provides mechanisms for resource discovery and management, which allows us to use available video agents or screen-capture agents in a Metaglue-supported room.

We have selected UML-type diagrams because they are a de facto standard for depicting software applications. Within UML [1] we focused on class diagrams, first because of their central role in describing program structure, and second because many of the symbols used in class diagrams are quite similar and hence offer an interesting challenge for sketch recognition. We added several symbols for agent design, since many of the applications created in the Intelligent Room [6] of the MIT AI Lab are agent-based.

Previous Work: Work at Berkeley by Hse [7] has shown that users prefer a single-stroke sketch-based user interface to a mouse-and-palette based tool for UML design.

One company [3] has developed a gesture-based diagramming tool, Ideogramic UML™, which allows users to sketch UML diagrams. The tool is based on a graffiti-like implementation and requires users to draw each gesture in one stroke, in the direction and style specified by the user manual. As a consequence, some of the gestures drawn only loosely resemble the output glyph. For example, ϕ is the stroke used to indicate an actor, drawn by the system as a stick figure.

Work at Queen's University has developed a system to recognize sketches of UML diagrams using a distance metric [8]. Each glyph (square, circle, or line) is classified based on the total stroke length compared to the perimeter of its bounding box (e.g., if the stroke length is approximately equal to the perimeter of the bounding box, it is classified as a square). The shape of the stroke is not considered.
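The Queen's University metric can be sketched as follows. This is our own reconstruction of the idea described in [8], not their code; the decision thresholds are illustrative assumptions, chosen from the geometry (a diagonal line yields a length-to-perimeter ratio well below 1/2, an inscribed circle yields roughly π/4 ≈ 0.785, and a square traced along its bounding box yields roughly 1.0).

```python
import math

def stroke_length(points):
    """Total polyline length of a stroke given as (x, y) points."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

def bounding_box_perimeter(points):
    """Perimeter of the stroke's axis-aligned bounding box."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return 2 * ((max(xs) - min(xs)) + (max(ys) - min(ys)))

def classify_glyph(points):
    """Classify a stroke as 'line', 'circle', or 'square' purely by
    comparing stroke length to bounding-box perimeter; the actual
    shape of the stroke is never examined (thresholds assumed)."""
    ratio = stroke_length(points) / bounding_box_perimeter(points)
    if ratio < 0.6:
        return "line"
    if ratio < 0.9:
        return "circle"
    return "square"
```

Because shape is ignored, any stroke whose length happens to approximate its bounding-box perimeter would be labeled a square, which is the limitation the passage above points out.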
Approach: We have created Tahuti [4], a system for recognizing hand-drawn sketches of UML class diagrams. Users can sketch UML-type diagrams on a whiteboard in the same way they are drawn on paper, and have the diagrams recognized by the computer. The system differs from graffiti-based approaches to this task in that it recognizes objects by how they look, not by how they are drawn. While sketching, the sketcher can seamlessly switch between the interpreted designs and the original strokes (see figure). Editing commands operate identically in the two views.
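The order-independence at the heart of this approach can be shown with a minimal sketch (our own illustration, not Tahuti's implementation): a feature computed from the combined set of ink points, such as the bounding box below, is unchanged by the order in which strokes were drawn or the direction in which each was traced.

```python
def bounding_box(strokes):
    """Axis-aligned bounding box of the combined ink.

    Computed from the set of points alone, so it is invariant to
    stroke order and drawing direction -- the property an
    appearance-based recognizer relies on (illustrative only).
    """
    points = [p for stroke in strokes for p in stroke]
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))
```

A gesture-based recognizer, by contrast, conditions on the stroke sequence itself, which is why it must dictate how each symbol is drawn.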