Possibilistic Estimation of Distributions to Leverage Sparse Data in Machine Learning

Andrea G. B. Tettamanzi¹(✉), David Emsellem², Célia da Costa Pereira³, Alessandro Venerandi⁴, and Giovanni Fusco⁴

¹ Université Côte d'Azur, CNRS, Inria, I3S, Sophia Antipolis, France
andrea.tettamanzi@univ-cotedazur.fr
² Kinaxia SA, Sophia Antipolis, France
david.emsellem@kcitylabs.fr
³ Université Côte d'Azur, CNRS, I3S, Sophia Antipolis, France
celia.da-costa-pereira@univ-cotedazur.fr
⁴ Université Côte d'Azur, CNRS, ESPACE, Nice, France
{alessandro.venerandi,giovanni.fusco}@univ-cotedazur.fr

© Springer Nature Switzerland AG 2020. M.-J. Lesot et al. (Eds.): IPMU 2020, CCIS 1237, pp. 431–444, 2020. https://doi.org/10.1007/978-3-030-50146-4_32

Abstract. Prompted by an application in the area of human geography using machine learning to study housing market valuation based on the urban form, we propose a method based on possibility theory to deal with sparse data, which can be combined with any machine learning method to approach weakly supervised learning problems. More specifically, the solution we propose constructs a possibilistic loss function to account for an uncertain supervisory signal. Although the proposal is illustrated on a specific application, its basic principles are general. The proposed method is then empirically validated on real-world data.

Keywords: Possibility theory · Machine learning · Weakly supervised learning

1 Introduction

Supervised learning is the machine learning task of learning a function that maps an input to an output based on example input-output pairs [18]. Each example consists of an input record, which collects the values of a number of input variables, and the associated value of the output variable (also called the supervisory signal). The learnt function can then be used to "predict" the value of the output variable for new unlabeled input records, whose output value is not known.

In many real-world problems, obtaining a fully labeled dataset is expensive, difficult, or outright impossible. An entire subfield of machine learning, called weakly (or semi-) supervised learning, has thus emerged, which studies how datasets where the supervisory signal is not available or completely known