TurboSync: Clock Synchronization for Shared Media Networks via Principal Component Analysis with Missing Data
Ryad Ben-El-Kezadri*†‡, Giovanni Pau*, and Thomas Claveirole
* Computer Science Department - University of California, Los Angeles, CA 90095, US
CRISTAL, Université de la Manouba, Campus Manouba, 2010 - Tunisia
LIP6/CNRS - UPMC Univ Paris 06
e-mail: ryad@nrl.cs.ucla.edu, gpau@cs.ucla.edu, thomas.claveirole@lip6.fr
Abstract—Clock synchronization in shared media networks is particularly challenging because the operating conditions are dynamic and the resources limited. This paper presents TurboSync, an accurate and bandwidth-efficient synchronization scheme. Unlike traditional solutions that synchronize pairs of nodes, TurboSync is able to synchronize entire node clusters. TurboSync relies on principal component analysis with missing data. Packets are broadcast on the medium and their capture times at each node are used to compute the clock conversion parameters. To obtain a complete and usable set of capture times for each packet, our idea is to fill in the missing packet timestamps on the transmitters' side using an inference mechanism. TurboSync synchronizes all the clocks in the cluster at once, which leads to a coherent clock conversion system between the nodes. Our performance results show better accuracy than the RBS protocol.
Index Terms—Clock, Synchronization, Principal Component Analysis, Missing Data, Broadcast Media
I. INTRODUCTION
Synchronization in distributed networks has been studied for a long time. But as networks become faster, mobile, and wireless, new solutions are needed to support the tight timing constraints required by the new wireless radios [1] and monitoring tools. Without proper time management, the data collected by the network also lose part of their context. Hence, algorithms that perform well on short timescales are required.
The ideas behind TurboSync are motivated by two simple observations. Existing synchronization schemes rely on a specific broadcaster that transmits identifiable reference frames on the network. These packets are captured by the neighboring nodes, and their ids and capture times are sent back to the broadcaster. The broadcaster uses this information to compute the conversion parameters between each pair of node clocks. The conversion parameters are then distributed on the network so that each application can translate a time reference from any other node's time base to its own.
The first idea of TurboSync comes from the fact that the network already carries many frames that are uniquely identifiable and can be used instead of dedicated reference frames. Exploiting these frames makes the synchronization algorithm more bandwidth efficient because dedicated frames are no longer needed. Most broadcast control packets are reusable because they include a unique sequence number. However, control traffic is generated at all network ends, so the synchronization scheme has to deal with a situation where the role of the broadcaster is distributed over every node, which is a fundamental change in the algorithm design.
The second idea of TurboSync is to synchronize multiple nodes at once. The time observations captured at each node can be laid out in a matrix with rows representing packet ids and columns representing nodes. A synchronization scheme needs a complete data set to compute the conversion parameters between the nodes. The problem is that the capture time is not available on the transmitter side because the packet capture system does not record the true time at which a packet is sent over the air. As a result, every transmission observation in the matrix is flagged as missing data.
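As a minimal illustration of this matrix layout, the following Python sketch builds an observation matrix in which each packet's entry at its own transmitter is marked as missing, and then fits the conversion parameters for one pair of nodes in the classical pairwise manner. It is a sketch under stated assumptions, not the TurboSync algorithm itself: the linear clock model (local time = skew × true time + offset), the function names `build_observation_matrix` and `pairwise_fit`, and the use of NaN as the missing-data flag are all hypothetical choices made for illustration.

```python
import numpy as np


def build_observation_matrix(transmitters, true_times, offsets, skews,
                             jitter=0.0, seed=0):
    """Return a (packets x nodes) matrix of capture times.

    Rows are packets, columns are nodes. The entry of each packet at its
    own transmitter is set to NaN, since the sender has no capture time.
    """
    rng = np.random.default_rng(seed)
    true_times = np.asarray(true_times, dtype=float)
    # Assumed linear clock model: local_time = skew * true_time + offset.
    obs = np.outer(true_times, skews) + np.asarray(offsets, dtype=float)
    # Optional reception jitter (hypothetical Gaussian noise model).
    obs += jitter * rng.standard_normal(obs.shape)
    # Flag the transmitter's own observation of each packet as missing.
    obs[np.arange(len(transmitters)), transmitters] = np.nan
    return obs


def pairwise_fit(obs, i, j):
    """Least-squares line mapping node i's clock to node j's clock.

    Only packets observed by both nodes are used, as in a pairwise scheme.
    """
    both = ~np.isnan(obs[:, i]) & ~np.isnan(obs[:, j])
    skew, offset = np.polyfit(obs[both, i], obs[both, j], deg=1)
    return skew, offset


# Three nodes take turns transmitting, so missing entries are scattered
# over every column rather than gathered in a single one.
transmitters = [0, 1, 2, 0, 1, 2, 0, 1, 2]
obs = build_observation_matrix(
    transmitters,
    true_times=np.arange(9) * 10.0,
    offsets=[0.0, 5.0, -3.0],
    skews=[1.0, 1.0001, 0.9999],
)
print(obs)
print(pairwise_fit(obs, 0, 1))
```

With the transmitter role rotating between nodes, a pairwise fit between two nodes can only use the packets sent by a third node, which is why the distributed-broadcaster setting discards a large share of the observations when columns are processed two at a time.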
Traditional schemes use a single transmitter so that all the missing observations are gathered in one column, and they operate pairwise on the remaining columns to compute the conversion parameters between each pair of nodes. This restriction lets them work with simple and complete matrix structures. TurboSync is able to operate on matrices with missing data scattered across every column. Surprisingly, it has lower computational complexity than classical schemes, and it operates on all columns at once, which ultimately provides a coherent system of conversion parameters between the nodes' clocks. Thanks to such a conversion system, each time reference has a unique value in each node's time base, regardless of the sequence of time conversions applied to it.
Contrary to existing algorithms that operate on clock pairs via 2-D line fitting to compensate for offset and skew, our solution can process N clocks simultaneously through N-D line fitting. It uses the same inputs as the well-known RBS scheme [1] and does not introduce additional assumptions. However, it makes better use of the diversity of reference frames and corrects offset and skew jointly for all nodes.
The synchronization algorithms of RBS and TurboSync operate on single broadcast domains (clusters). To function as a cluster, the nodes must be physically close enough that every node hears every other node. In such a configuration, all nodes can decode the broadcast packets sent in the cluster because broadcast frames are generally transmitted with a robust modulation. TurboSync is compatible with both off-line and on-line clock synchronization