Some Branches May Bear Rotten Fruits: Diversity Browsing VP-Trees

Daniel Jasbick¹, Lucio Santos², Daniel de Oliveira¹, and Marcos Bedo³(✉)

¹ Institute of Computing, UFF, Niterói, RJ, Brazil
danieljasbick@id.uff.br, danielcmo@ic.uff.br
² Federal Institute of North of Minas Gerais, Montes Claros, MG, Brazil
lucio.santos@ifnmg.edu.br
³ Fluminense Northwest Institute, UFF, St. A. Pádua, RJ, Brazil
marcosbedo@id.uff.br

Abstract. Diversified similarity searching embeds result diversification straight into the query procedure, which boosts the computational performance by orders of magnitude. While metric indexes have a hidden potential for perfecting such procedures, the construction of a suitable, fast, and incremental solution for diversified similarity searching is still an open issue. This study presents a novel index-and-search algorithm, coined diversity browsing, that combines an optimized implementation of the vantage-point tree (VP-Tree) index with the distance browsing search strategy and coverage-based query criteria. Our proposal maps data elements into VP-Tree nodes, which are incrementally evaluated for solving diversified neighborhood searches. Such an evaluation is based not only on the distance between the query and candidate objects but also on distances from the candidate to data elements (called influencers) in the partial search result. Accordingly, we take advantage of those distance-based relationships for pruning VP-Tree branches that are themselves influenced by elements in the result set. As a result, diversity browsing benefits from data indexing for (i) eliminating nodes without valid candidate elements, and (ii) examining the minimum number of partitions regarding the query element. Experiments with real-world datasets show our approach outperformed competitors GMC and GNE by at least 4.91 orders of magnitude, as well as the baseline algorithm BRID_k by at least 87.51%, regarding elapsed query time.
Keywords: Similarity searching · Result diversification · Metric spaces

M. Bedo: This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior – Brasil (CAPES) - Finance Code 001 and Research Support Foundation of Rio de Janeiro State - G. E-26/010.101237/2018.

© Springer Nature Switzerland AG 2020
S. Satoh et al. (Eds.): SISAP 2020, LNCS 12440, pp. 140–154, 2020.
https://doi.org/10.1007/978-3-030-60936-8_11

1 Introduction

Similarity searching is a widely employed paradigm supporting modern computational applications that rely on data that are “alike” but not “equal”,