Newman, N.C., Porter, A.L., Newman, D., Courseault-Trumbach, C., and Bolan, S.D., Comparing Methods to Extract Technical Content for Technological Intelligence, PICMET (Portland International Conference on Management of Engineering and Technology), Vancouver, 2012.

Comparing Methods to Extract Technical Content for Technological Intelligence

Nils C. Newman
IISC
P.O. Box 77691
Atlanta, GA 30357
newman@iisco.com

Alan L. Porter
Georgia Institute of Technology
Atlanta, GA 30332

David Newman
University of California, Irvine
Irvine, CA 92697

Cherie Courseault
University of New Orleans
New Orleans, LA 70148

Stephanie D. Bolan
Georgia Institute of Technology
Atlanta, GA 30332

Abstract: We are developing indicators for the emergence of science and technology (S&T) topics. We are targeting various S&T information resources, including metadata (i.e., bibliographic information) and full text. We explore alternative text analysis approaches – principal components analysis (PCA) and topic modeling – to extract technical topic information. We analyze the topical content to pursue potential applications and innovation pathways. In this presentation we compare alternative ways of consolidating messy sets of key terms [e.g., using Natural Language Processing (NLP) on abstracts and titles, together with various keyword sets]. Our process includes combinations of stopword removal, fuzzy term matching, association rules, and tf-idf weighting. We compare PCA results to topic modeling results. Our key test set consists of 4104 Web of Science records on Dye-Sensitized Solar Cells (DSSCs). Results suggest good potential to enhance our technical intelligence payoffs from database searches on topics of interest.

1. Introduction

Tracking technologies or trying to determine their state has always been a challenging task. The globalization of research has only added to the difficulty. In the past, analysts primarily used expertise augmented by research to assess the state of technologies.
However, the increasing availability of electronic information about technology has opened up new possibilities to invert this process. Since the mid-1980s, researchers at the Technology Policy and Assessment Center at the Georgia Institute of Technology have been investigating the use of text mining to aid in the assessment of technologies [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]. This research is based on the premise that computer-processable electronic records (bibliographic journal abstracts, full-text journal articles, conference proceedings, etc.) can be effectively text mined and that the results of that mining can help determine the state of a technology. The process that evolved at Georgia Tech over more than 20 years of development uses the output of text mining to good effect, but overall, the techniques employed in the “Tech Mining” process still require a fair amount of analyst judgment and expertise in text mining [11]. The question today is can this Tech Mining