Online Bipartite Perfect Matching With Augmentations

Kamalika Chaudhuri*, Constantinos Daskalakis†, Robert D. Kleinberg‡, and Henry Lin†

* Information Theory and Applications Center, U.C. San Diego. Email: kamalika@soe.ucsd.edu
† Division of Computer Science, U.C. Berkeley. Email: costis@cs.berkeley.edu, henrylin@eecs.berkeley.edu
‡ Computer Science Department, Cornell University. Email: rdk@cs.cornell.edu

Abstract—In this paper, we study an online bipartite matching problem, motivated by applications in wireless communication, content delivery, and job scheduling. In our problem, we have a bipartite graph G between n clients and n servers, which represents the servers to which each client can connect. Although the edges of G are unknown at the start, we learn the graph over time, as each client arrives and requests to be matched to a server. As each client arrives, she reveals the servers to which she can connect, and the goal of the algorithm is to maintain a matching between the clients who have arrived and the servers. Assuming that G has a perfect matching, which allows all clients to be matched to servers, the goal of the online algorithm is to minimize the switching cost: the total number of times a client needs to switch servers in order to maintain a matching at all times. Although no known algorithm is guaranteed to yield switching cost better than the trivial O(n^2) in the worst case, we show that the switching cost can be much lower in three natural settings. In our first result, we show that for any graph G with a perfect matching, if the clients arrive in random order, then the total switching cost is only O(n log n) with high probability. This bound is tight, as we show an example where the switching cost is Ω(n log n) in expectation.
In our second result, we show that if each client has edges to Θ(log n) uniformly random servers, then the total switching cost is even better; in this case, it is only O(n) with high probability, and we also have a lower bound of Ω(n / log n). In terms of the number of edges needed for each client, our result is tight, since Ω(log n) edges are needed to guarantee a perfect matching in G with high probability. In our last result, we derive the first algorithm known to yield total cost O(n log n), given that the underlying graph G is a forest. This is the first result known to match the existing lower bound for forests, which shows that any online algorithm must have switching cost Ω(n log n), even when G is restricted to be a forest.

I. INTRODUCTION

In this paper, we study an online bipartite matching problem, which models a scenario in which clients arrive over time and request permanent service from a given set of servers. As each client arrives, she announces a set of feasible servers capable of servicing her request, and our goal is to provide service to each client persistently by maintaining a matching at all times between the clients who have arrived and servers capable of servicing their requests. We would like to assign clients to servers permanently without ever having to reassign clients to different servers, but when a new client arrives we may be forced to reassign existing clients to alternative servers to ensure that all clients can receive service. As it is often more important to provide service to all clients, the goal of our algorithm will be to always maintain a matching between arrived clients and allowed servers, while minimizing the switching cost, the total number of times that clients are reassigned to different servers.

Our online bipartite matching problem has a wide variety of applications spanning diverse areas, including streaming content delivery, web hosting, remote data storage, job scheduling, and hashing.
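To make the switching cost concrete, the following minimal Python sketch processes clients one at a time and, whenever the new client cannot be matched directly, reassigns existing clients along an augmenting path. It is an illustration under stated assumptions rather than the algorithm analyzed in this paper: the function name `online_match_with_switches`, the input format (a list of feasible-server lists, one per arriving client), and the choice of a breadth-first search for a shortest augmenting path are all hypothetical choices made for this example.

```python
from collections import deque


def online_match_with_switches(arrivals):
    """Maintain a matching as clients arrive; count reassignments.

    arrivals[i] is the list of servers client i can connect to
    (hypothetical input format chosen for this sketch).
    Returns (server_of, switching_cost).
    """
    server_of = {}   # client -> server currently assigned
    client_of = {}   # server -> client currently assigned
    switching_cost = 0

    for client in range(len(arrivals)):
        # BFS over alternating paths starting at the new client.
        parent = {}              # server -> client that reached it
        queue = deque([client])
        free_server = None
        while queue and free_server is None:
            c = queue.popleft()
            for s in arrivals[c]:
                if s in parent:
                    continue
                parent[s] = c
                if s not in client_of:
                    free_server = s      # augmenting path found
                    break
                queue.append(client_of[s])

        if free_server is None:
            raise ValueError("arrived clients cannot all be matched")

        # Flip the path: each client on it takes the server it reached.
        s = free_server
        while True:
            c = parent[s]
            previous = server_of.get(c)
            if previous is not None:
                switching_cost += 1      # an existing client is reassigned
            server_of[c] = s
            client_of[s] = c
            if c == client:
                break
            s = previous

    return server_of, switching_cost


if __name__ == "__main__":
    # Client 0 takes server 0; client 1 can only use server 0,
    # so client 0 must switch to server 1.
    print(online_match_with_switches([[0, 1], [0]]))
```

In this sketch only reassignments of previously matched clients are counted, so the first assignment of a newly arrived client contributes nothing to the switching cost, matching the definition given above.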
We describe a few applications below, each of which can be modeled as an instance of the online bipartite matching problem described above. In the following examples, we always refer to the entities requesting service as clients and the entities providing service as servers for consistency. In examples where it is not clear, we mark in parentheses which entities are clients and which entities are servers.

• Streaming Content Delivery, Web Hosting, and Remote Data Storage: We have a set of servers capable of streaming content online, hosting web pages, or storing data remotely. A sequence of clients arrives requesting to have their content streamed online, their web pages hosted online, or their data stored online. Due to locality, security, cost, routing policy, or other reasons, the streaming content, web page, or data from each client can only be hosted at a subset of server locations.

• Job Scheduling: We have a set of servers with differing capabilities available to process job requests from persistent sources, that is, jobs that need to be processed over a long or indefinite period of time (e.g., protein folding, genomic research, SETI@home). A sequence of persistent job requests (clients) arrives, and each reveals a subset of servers capable of servicing its request.

• Hashing: We have locations in a hash table (servers) available to store data objects (clients), and a set of hash functions. Data objects arrive over time and can be assigned to a location in the hash table if one of the hash functions maps the data object to that location.

Note that in all the examples above, it is reasonable to assume that clients can be reassigned to different servers, but at a cost. For instance, in the streaming content example, clients