An Automatic Method for Extracting Citations from Google Books¹

Kayvan Kousha
Statistical Cybermetrics Research Group, School of Technology, University of Wolverhampton, Wulfruna Street, Wolverhampton WV1 1LY, UK. E-mail: k.kousha@wlv.ac.uk

Mike Thelwall
Statistical Cybermetrics Research Group, School of Technology, University of Wolverhampton, Wulfruna Street, Wolverhampton WV1 1LY, UK. E-mail: m.thelwall@wlv.ac.uk

Recent studies have shown that counting citations from books can help scholarly impact assessment and that Google Books (GB) is a useful source of such citation counts, despite its lack of a public citation index. Searching GB for citations produces approximate matches, however, and so its raw results need time-consuming human filtering. In response, this article introduces a method to automatically remove false and irrelevant matches from GB citation searches, as well as refinements to a previous manual GB citation extraction method. The method was evaluated by manually checking sampled GB results and by comparing citations to about 14,500 monographs in the Thomson Reuters Book Citation Index (BKCI) against automatically extracted citations from GB across 24 subject areas. GB citations were 103% to 137% as numerous as BKCI citations in the humanities, except for tourism (72%) and linguistics (91%), 46% to 85% in social sciences, but only 8% to 53% in the sciences. In all cases, however, GB found substantially more citing books than did BKCI, with BKCI's results coming predominantly from journal articles. The moderate correlations between the GB and BKCI citation counts in the social sciences and humanities, with most BKCI results coming from journal articles rather than books, suggest, however, that the two sources could measure different aspects of impact.
Introduction

Books are major scholarly outputs in many social sciences and humanities disciplines and are therefore important for research evaluation (e.g., Moed, 2005; Nederhof, 2006; Huang & Chang, 2008). For instance, about a third of the submissions in social sciences and humanities fields to the 2008 U.K. Research Assessment Exercise (RAE) were books in comparison to about 1% in the sciences (Kousha, Thelwall & Rezaie, 2011). Moreover, counting citations from books rather than journal articles can give different results when benchmarking authors (Cronin, Snyder & Atkins, 1997) and countries (Archambault et al., 2006) in the social sciences and humanities. This shows that citations from books are an important source of impact evidence that cannot be replaced by citations from journal articles.

The lack of a comprehensive index for the bibliographic references of books is therefore an issue for bibliometric monitoring of research in book-based disciplines. Almost two decades ago, this led to a call to include citations from books in academic citation databases (Garfield, 1996). Nevertheless, most previous quantitative investigations into the impact of book-based scholarship have counted citations from journal articles indexed in the commercial citation databases (Web of Science and Scopus) (e.g., Glänzel & Schoepflin, 1999; Butler & Visser, 2006; Bar-Ilan, 2010; Hammarfelt, 2011) rather than citations from other books, although some studies have manually extracted cited references from selected monographs for bibliometric analysis (e.g., Cullars, 1998; Krampen, Becker, Wahner & Montada, 2007). There have also been initiatives to use non-citation metrics for usage assessment of books, such as counting library holdings (“libcitations”) (White, Boell, Yu et al., 2009) and using library loan statistics (Cabezas-Clavijo et al., 2013).

Several attempts have been made to extract citations from academic books on a large scale for citation analysis or citation searching.
¹ This is a preprint of an article to be published in the Journal of the American Society for Information Science and Technology © copyright 2013 John Wiley & Sons, Inc.

In 2011 Thomson Reuters introduced the Book Citation Index, a set of citations from selected academic books and book chapters that could be added to the journal citations in the Web of Science (WoS). Whilst this is a valuable