A PageRank based predictive model for the estimation of the archaeological potential of an urban area

Nevio Dubbini, Gabriele Gattiglia

N. Dubbini is with the University of Pisa, Mathematics Department, Pisa, Italy, nevio.dubbini@gmail.com. G. Gattiglia is with the University of Pisa, Archaeology Department, Pisa, Italy.

Abstract: We present the analysis of multi-faceted, GIS-managed data for determining the archaeological potential, i.e. a measure of the possibility that a more or less significant archaeological stratification is preserved. We used a sizable number of datasets in order to consider the problem of estimating archaeological potential in all of its aspects: archaeological data, building archaeological data, historical data, toponymic data, and geomorphological data. Because identifying relations among finds is a key issue for data mining in the archaeological interpretation process, we applied a modified version of the PageRank model: the criteria by which search engines assign importance to web pages are similar and are likewise based on relations. The procedure included a categorization of archaeological data, the assignment of initial values of potential to the available data through an automatic procedure, the creation of geomorphological facies maps, the definition of functional areas (i.e. the levels of spatial and functional organization: urban, suburban and rural areas), and the application of the PageRank based algorithm. The model was applied to the urban area of Pisa and tested against the data from 14 new cores. The map of archaeological potential consists of the composition of the 7 layers, one for each archaeological period under consideration: Protohistory, Etruscan period, Roman period, Late Roman period, Early Medieval period, Late Medieval period, Modern Age, Contemporary Age. The results, including the archaeological potential map, are to be considered the first steps towards an automatic, formally definable, and repeatable approach to the computation of archaeological potential.

Keywords: predictive modelling, archaeological potential, PageRank, archaeological GIS, geomorphology.

I. INTRODUCTION

This paper studies the problem of computing archaeological potential, the assumptions made to solve it, the mathematical model used, the software implementation, and the test of the algorithm on the case study of the urban area of Pisa. We based the mathematical model on PageRank because there is an analogy between the criteria used for attributing archaeological potential and the criteria used for assigning importance to web pages in search engine algorithms. The key issue in computing archaeological potential, from an abstract viewpoint, is the identification of relations among finds: the presence of a particular find near another could strengthen or weaken the probability that they form a more complex structure, and so strengthen or weaken the archaeological potential of the area. This is exactly the criterion upon which page ranking algorithms are based, whereby each web page attributes importance to the web pages it points to (via a link) and receives importance from the web pages it receives a link from. The reader can refer to [4] for further explanation of the choice of the mathematical model, and to [8] for a general mathematical introduction to PageRank models.

In the following we consider all the archaeological data as categorised: each find has been assigned to a category in order to characterize its salient features, to implement the algorithm effectively, and to make the results general enough to be applied in different contexts as well ([2], pp. 89-99).

II. DEFINITION OF ARCHAEOLOGICAL POTENTIAL

The archaeological potential of an area represents the probability that a more or less significant archaeological stratification is preserved.
It is computed by analysing a series of historical, archaeological and paleo-environmental data, with a degree of approximation that may vary according to the quantity and quality of the data provided. The archaeological potential of an area is independent of any subsequent intervention carried out there, which must be regarded as a contingent risk factor.

The process of defining overall urban archaeological potential consists in drawing up a series of predictive maps relating to the historical periods. The general criterion was to reconstruct stratigraphic intervals and integrate this information with both archaeological and geomorphological data: geological maps define stratigraphic units and sedimentary bodies, while geomorphological and paleogeographical maps show relief forms and define the geomorphic processes responsible for their genesis, in addition to recent modifications. Generally speaking, each morphological unit (or morphotype) can be more or less suitable for settlement. The diachronic evolution of the forms was subsequently characterised.

In archaeological terms, the following parameters were taken into consideration for the predictive definition of the city throughout its historical periods:
- typology of finds, inferred on the basis of the interpretation of the archaeological records [7];
- quality and quantity of the archaeographic data;
- spatial and typological relations among the finds, which allow the presence of further finds in areas that have not been archaeologically investigated to be identified in probabilistic terms;
- expert judgment;
- land use, including traces that are not strictly connected to constructions or settlements, such as agricultural and/or farming practices;
- historical data from written sources and maps.

Finally, we identified the following overall parameters that best determine urban archaeological potential:
- type of settlement, i.e. the presence of settlement structures and their different typology;
- density of settlement;
- multi-layering of deposits;
- removable or non-removable nature of the archaeological deposit;
- degree of preservation of the deposit, calculated according to the presence of anthropic and natural removals [1].

978-1-4799-3169-9/13/$31.00 ©2013 IEEE
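
The relational criterion behind the PageRank analogy, whereby each item passes importance to the items it links to and receives importance from the items linking to it, can be illustrated with a minimal sketch of standard PageRank computed by power iteration. This is not the modified version used in the paper: the find names, the links between them, the damping factor of 0.85 and the function name `pagerank` are all assumptions chosen for illustration only.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Standard PageRank by power iteration (illustrative, not the paper's model).

    `links` maps each node to the list of nodes it points to.
    Returns a dict of scores that sum to 1.
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(max_iter):
        # Nodes without outgoing links spread their score uniformly over all nodes.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        # Every node shares its damped score equally among the nodes it points to.
        for source, targets in links.items():
            if not targets:
                continue
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share

        delta = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical finds: an arrow A -> B means "A lends importance to B".
finds = {
    "wall": ["floor", "road"],
    "floor": ["wall"],
    "road": ["wall"],
    "pottery": ["floor"],
}

scores = pagerank(finds)
for name in sorted(scores, key=scores.get, reverse=True):
    print(f"{name}: {scores[name]:.4f}")
```

In this toy graph the wall ranks highest because it receives links from two other finds, while the pottery, which nothing points to, keeps only the baseline score. In an archaeological setting the links would encode the spatial and typological relations among finds rather than hyperlinks.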