Challenges of Navigational Queries: Finding Best Paths in Graphs

Louiqa Raschid 1, María-Esther Vidal 2, Yao Wu 1, Marelis Cardenas 2, and Natalia Marquez 2

1 University of Maryland
{louiqa,yaowu}@umiacs.umd.edu
2 Universidad Simón Bolívar
{mvidal,mcardenas,nmarquez}@ldc.usb.ve

Abstract. Life science sources are characterized by a complex graph of overlapping sources and multiple alternate links between sources. A (navigational) query may be answered by traversing multiple alternate paths between an origin and a target source. Paths may be characterized by several metrics, including the cardinality of objects of the target source (TOC), the cost of query evaluation of a plan for the path, and the user's preference for specific paths. Our challenge is finding the best paths among the set of all solutions, AllPaths, that meet some user-specified ranking criteria. If the user's ranking criteria are strict, then the problem is to find the Top K paths. If the user wants a trade-off among several metrics, then the problem is to find the Skyline paths, that is, those that are not dominated by other paths. NSearch is a naive solution. BFSrchOpt is a heuristic best-first search strategy. It uses a metric to rank partial solutions (subpaths) and (local) metrics to guide graph traversal, and produces BFPaths. We measure the precision and recall of BFPaths against the Top K% or Skyline of AllPaths. We study the impact of graph properties on the behavior of BFSrchOpt. BFSrchOpt can be orders of magnitude faster than NSearch.

1 Introduction

During the past few years, the number of Web-accessible biomolecular sources has increased rapidly. For a particular molecular concept, e.g., gene or protein, there may be several sources, each of which may have several links to other Web sources. To integrate data across these sources, users traverse alternate links and paths through sources.
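To make the distinction between Top K and Skyline paths concrete, the following is a minimal sketch rather than the NSearch or BFSrchOpt implementation. The class `PathMetrics`, the functions `dominates`, `skyline`, `top_k`, and `precision_recall`, and the sample values in `ALL_PATHS` are all hypothetical. The sketch also assumes, purely for illustration, that a higher TOC and a lower evaluation cost are preferred; the actual preference direction depends on the user's ranking criteria.

```python
from typing import Callable, List, NamedTuple, Tuple


class PathMetrics(NamedTuple):
    """Hypothetical record of the metrics attached to one path."""
    name: str
    toc: int      # target object cardinality (assumed: higher is better)
    cost: float   # query evaluation cost (assumed: lower is better)


def dominates(a: PathMetrics, b: PathMetrics) -> bool:
    """a dominates b if a is no worse on every metric and strictly better on one."""
    no_worse = a.toc >= b.toc and a.cost <= b.cost
    strictly_better = a.toc > b.toc or a.cost < b.cost
    return no_worse and strictly_better


def skyline(paths: List[PathMetrics]) -> List[PathMetrics]:
    """Return the paths that no other path dominates (input order preserved)."""
    return [p for p in paths if not any(dominates(q, p) for q in paths)]


def top_k(paths: List[PathMetrics], k: int,
          key: Callable[[PathMetrics], float]) -> List[PathMetrics]:
    """Return the k paths with the largest value of a single ranking key."""
    return sorted(paths, key=key, reverse=True)[:k]


def precision_recall(found: List[PathMetrics],
                     reference: List[PathMetrics]) -> Tuple[float, float]:
    """Precision and recall of a heuristic result set against a reference set."""
    found_names = {p.name for p in found}
    reference_names = {p.name for p in reference}
    hits = len(found_names & reference_names)
    precision = hits / len(found_names) if found_names else 0.0
    recall = hits / len(reference_names) if reference_names else 0.0
    return precision, recall


# Made-up metrics for four alternate paths between one origin and one target.
ALL_PATHS = [
    PathMetrics("P1", toc=120, cost=9.0),
    PathMetrics("P2", toc=80, cost=3.0),
    PathMetrics("P3", toc=70, cost=5.0),    # dominated by P2 on both metrics
    PathMetrics("P4", toc=150, cost=20.0),
]

if __name__ == "__main__":
    print([p.name for p in skyline(ALL_PATHS)])
    print([p.name for p in top_k(ALL_PATHS, 2, key=lambda p: p.toc)])
```

Under these assumptions, a strict ranking on TOC alone selects only the highest-cardinality paths, whereas the Skyline also retains a cheap path with fewer target objects, since no other path is better on both metrics at once. The helper `precision_recall` shows one way a heuristic result set could be scored against either reference set.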
Given a navigational query, the space of possible paths can be exponential in the number of sources and links that are relevant to the query [11]. Further, once paths have been identified, the user still has to explore all the results in the target sources, e.g., all the publications linked to some protein. Since this is a time-intensive exercise, it is important to help the user identify the best paths to answer a navigational query.

Suppose we consider a specific path. The number of target objects in the target source reached along the path is one metric characterizing the path. Scientists also have their own preferences for a specific source or link. For example,