Abstract

Bayesian Networks (BNs) are probabilistic models widely used for classification in several fields. Their ability to deal with missing values and to integrate data and expert knowledge makes them an appropriate technique for classification in glaucoma. In this study, a set of supervised, unsupervised and semi-supervised BNs is explored and tested on independent longitudinal data. Glaucoma metrics based upon visual field test data and retinal image data are used to build BNs, which are then compared with several state-of-the-art classifiers in terms of performance. An anatomy-based BN is also tested and compared with the others for pre- and post-diagnosis. Further, a set of combined networks is explored in an attempt to exploit the specific features and abilities of different BNs. In general, BNs outperformed the traditional metrics and the other classifiers tested. Among the BNs, the BN supervised with the visual field metric and the semi-supervised networks trained on control patients outperformed the others, obtaining about 70% sensitivity at 90% specificity. Combining BNs seems to be the key to better classification performance in practice; however, investigating specific BN structures gives interesting insights into the data and into the mechanisms involved in the development of the disease.

1 Introduction

Glaucoma is a leading cause of blindness worldwide, but the mechanisms underlying its progression are still unclear. Early medication is known to be an effective way to slow the progression of the disease, so an early diagnosis is desirable [Yanoff and Duker, 2003]. Diagnosis with typical screening techniques, which rely on clinical judgement of functional tests and optical imaging, is insufficient: the high variability of the test results and the essential subjectivity of clinical judgements make this approach less sensitive than desired [Artes and Chauhan, 2005].
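The abstract reports results as sensitivity at a fixed 90% specificity. As an illustration only, the following Python sketch shows one common way of reading such an operating point off a classifier's scores. The function name `sensitivity_at_specificity`, the convention that higher scores indicate glaucoma, and the tie handling are assumptions, not necessarily the procedure used in this study.

```python
import numpy as np


def sensitivity_at_specificity(scores, labels, target_spec=0.90):
    """Sensitivity at the lowest cut-off giving at least `target_spec` specificity.

    Assumptions (illustrative): higher score = more likely glaucoma;
    labels are 1 for glaucoma and 0 for controls.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    neg = np.sort(scores[labels == 0])  # control scores, ascending
    pos = scores[labels == 1]           # glaucoma scores

    # Pick the control score below or at which at least target_spec of controls fall.
    k = max(int(np.ceil(target_spec * len(neg))) - 1, 0)
    threshold = neg[k]

    specificity = np.mean(neg <= threshold)  # controls correctly called negative
    sensitivity = np.mean(pos > threshold)   # cases correctly called positive
    return float(sensitivity), float(specificity), float(threshold)


# Toy data, invented for illustration.
controls = [i / 10 for i in range(10)]
cases = [0.5, 0.85, 0.95, 0.99, 0.3]
sens, spec, thr = sensitivity_at_specificity(controls + cases, [0] * 10 + [1] * 5)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} threshold={thr:.2f}")
# prints: sensitivity=0.60 specificity=0.90 threshold=0.80
```

Fixing specificity first mirrors a screening setting, where the false-positive rate among healthy subjects is usually the constraint and sensitivity is then compared across classifiers.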
Furthermore, there is no gold standard for the onset of glaucoma, only several clinical metrics used for classification. Among these, the most widely used are the Advanced Glaucoma Intervention Study score (AGIS) and the Moorfields Regression Analysis (MRA). The AGIS defect scoring system [Gaasterland et al., 1994] depends on the number and depth of clusters of adjacent depressed test sites in the total deviation printout of the STATPAC-2 analysis of the threshold visual field test. The visual field test measures the functional ability of an eye by presenting stimuli at different locations of the patient's visual field, and it is a routine test when screening for glaucoma. The MRA score [Garway-Heath, 2005] provides an automatic statistical classification technique for optic disc measurements from the Heidelberg Retinal Tomograph (HRT). It processes the raw HRT output by linearly combining Age, Optic Disc Area (ODA) and Retinal Rim Area (RRA) into one single parameter, which is then compared with a cut-off value to obtain a classification. Since recent screening instruments such as the HRT and the Humphrey Visual Field Analyser II (HFA II) provide a large amount of data that is not fully exploited, several Machine Learning (ML) approaches have been applied to glaucoma data in the last few years, with promising results [Goldbaum et al., 2002]. Although several ML classifiers have been used, the specific problems of identifying glaucoma in the clinic lead us to investigate an approach based on a model that better fits the issues and characteristics of the disease and of the related datasets. Bayesian Network models [Pearl, 1988] seem most appropriate for this analysis: they can integrate different datasets and model dependency relationships between variables, and they allow expert knowledge to be easily modelled as causal relations in the structure.
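A single-parameter score of the kind described for the MRA can be sketched as a linear combination of Age, ODA and RRA followed by a threshold. The class `LinearScore`, its weights, the cut-off and the convention that low scores are flagged are all hypothetical placeholders chosen for illustration; they are not the published MRA coefficients or decision rule.

```python
from dataclasses import dataclass


@dataclass
class LinearScore:
    """Hypothetical linear combination of Age, ODA and RRA with a cut-off.

    NOT the published MRA model: all weights and the cut-off are placeholders.
    """
    intercept: float
    w_age: float   # weight on age (years)
    w_oda: float   # weight on optic disc area (mm^2)
    w_rra: float   # weight on rim area (mm^2)
    cutoff: float

    def score(self, age: float, oda: float, rra: float) -> float:
        # Collapse the three measurements into one single parameter.
        return self.intercept + self.w_age * age + self.w_oda * oda + self.w_rra * rra

    def classify(self, age: float, oda: float, rra: float) -> str:
        # Assumed convention: scores below the cut-off are flagged as abnormal.
        if self.score(age, oda, rra) < self.cutoff:
            return "outside normal limits"
        return "within normal limits"


# Illustrative weights: score = RRA - (0.5 + 0.3 * ODA - 0.005 * Age)
rule = LinearScore(intercept=-0.5, w_age=0.005, w_oda=-0.3, w_rra=1.0, cutoff=-0.3)
print(rule.classify(age=60, oda=2.0, rra=1.4))  # within normal limits
print(rule.classify(age=60, oda=2.0, rra=0.4))  # outside normal limits
```

With these placeholder weights the score is the observed RRA minus an expected RRA adjusted for disc size and age, so an eye whose rim is much smaller than expected for its disc and age falls below the cut-off.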
Furthermore, BNs can handle missing values in the data, which can be exploited to train networks independently of the problematic clinical metrics described above. The output of a BN is highly intuitive, and the structure of the model can easily be represented and compared with expert knowledge. Finally, the performance of BN models on glaucoma data has already been partly explored, and they have proved to be a promising technique [Tucker et al., 2005]. In this paper, we will explore a set of supervised, unsupervised and semi-supervised BN classifiers based on visual field (VF) and HRT data, using independent test data. The structures and performances of supervised BN classifiers based on the classic metrics for classifying glaucoma (MRA and AGIS) will be evaluated and compared with unsupervised and semi-supervised BN classifiers. Among the supervised BNs explored, an expertise-driven network will also be evaluated. BN classifiers will be explored in relation to pre- and post-diagnosis data and compared with traditional ML classifiers in terms of performance.

Investigations of Clinical Metrics and Anatomical Expertise with Bayesian Network Models for Classification in Early Glaucoma

S. Ceccon (1), D. Garway-Heath (2), D. Crabb (3), A. Tucker (1)

(1) Department of Information Systems and Computing, Brunel University, London, UK
(2) NIHR Biomedical Research Centre for Ophthalmology, Moorfields Eye Hospital NHS, London, UK
(3) Department of Optometry and Visual Science, City University, London, UK
stefano.ceccon@brunel.ac.uk
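To illustrate how a BN classifier can cope with missing values, the sketch below uses the simplest BN structure, a naive Bayes network in which the class node is the only parent of each feature. This structure, the two discretised features `vf_defect` and `small_rim`, and all probabilities are hypothetical; the networks studied here are richer and partly learned from data. Exact inference marginalises out an unobserved feature, which in this structure amounts to dropping its factor from the product.

```python
from typing import Dict, Optional

# Hypothetical naive Bayes network: Class -> each discretised feature.
# All numbers are invented for illustration only.
PRIOR = {"glaucoma": 0.2, "healthy": 0.8}
CPT = {
    "vf_defect": {"glaucoma": {"yes": 0.7, "no": 0.3},
                  "healthy": {"yes": 0.1, "no": 0.9}},
    "small_rim": {"glaucoma": {"yes": 0.6, "no": 0.4},
                  "healthy": {"yes": 0.2, "no": 0.8}},
}


def posterior(evidence: Dict[str, Optional[str]]) -> Dict[str, float]:
    """Return P(Class | observed features).

    Features that are missing (absent or None) are marginalised out: in a
    naive Bayes structure sum_x P(x | c) = 1, so their factor drops out.
    """
    unnormalised = {}
    for c, p in PRIOR.items():
        for feature, table in CPT.items():
            value = evidence.get(feature)
            if value is not None:
                p *= table[c][value]  # multiply in P(feature = value | class)
        unnormalised[c] = p
    z = sum(unnormalised.values())
    return {c: p / z for c, p in unnormalised.items()}


print(posterior({"vf_defect": "yes", "small_rim": "yes"}))
print(posterior({"vf_defect": "yes", "small_rim": None}))  # HRT value missing
```

With both findings present the hypothetical posterior for glaucoma is 0.84; if the HRT value is missing, the network still returns a posterior (about 0.64) from the visual field evidence alone.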