Methods for Mining Messy Real World Data: Co-reference Identification Using Fuzzy Logic

by Stephen Brown, David Croft and Simon Coupland

Citation

Brown, Stephen, David Croft and Simon Coupland. 'Methods for Mining Messy Real World Data: Co-reference Identification Using Fuzzy Logic'. In: Clare Mills, Michael Pidd and Jessica Williams. Proceedings of the Digital Humanities Congress 2014. Studies in the Digital Humanities. Sheffield: The Digital Humanities Institute, 2014. Available online at: <https://www.dhi.ac.uk/openbook/chapter/dhc2014-brown>

Abstract

As the number and volume of online museum collections grow, there is an increasing imperative to improve their discoverability by finding ways of linking records that go beyond simple keyword searching. Keyword searches are inefficient because they are prone to errors of both omission and commission. A range of more sophisticated approaches has been developed for cross-collection searching, including metadata harvesting, data mining, Linked Data and Application Programming Interfaces (APIs), but these variously rely on the availability of well-structured, consistent and standardised data and a large corpus of text. While some heritage institutions have successfully implemented one or more of these approaches, pioneering the way for others to follow, the majority of online collection records are not amenable to such treatments because they employ different data schemas, often applied inconsistently, and are frequently fragmentary, imprecise and not machine-readable. A further challenge is that while there are many millions of individual object records, each containing many different fields, the entries in each separate field tend to be quite small, e.g. dates, person names and titles, making it difficult to apply corpus-based approaches to data analysis. Converting fragmentary, messy museum records into well-structured data is an unlikely prospect in the immediate future because of the costs and technical skills required. This paper