FTP Mirror Tracker: First steps towards URN. February 18, 2000

FTP Mirror Tracker: First Steps towards URN.

Martin Hamilton (martin@wwwcache.ja.net), JANET Web Cache Service and Department of Computer Science, Loughborough University, United Kingdom

Alexei Novikov (anovikov@heron.itep.ru), Institute of Theoretical and Experimental Physics, Moscow, Russia

Abstract

FTP Mirror Tracker¹ is a software package (written in Perl and C++) that enables transparent, user-controlled redirection to the nearest anonymous FTP mirror sites that are exact replicas of the original source. This redirection can be achieved either through a Web Cache server or by making HTTP requests to the FTP Mirror Tracker directly. The Mirror Tracker also has built-in URN support and can act as a URN resolver for FTP requests. Underlying the system is a MySQL database that records details of FTP mirror sites. In this report we explain how this database is constructed and show how it may be used, both directly by end users and under the policy-based control of Web Cache and mirror service administrators.

1 Introduction

Although FTP traffic on the modern Internet accounts for only a small fraction of request transactions, its bandwidth utilization is significant. For example, in every month of 1999 FTP accounted for between 7% and 11% of the incoming traffic on the JANET² network's links to the United States.

There is a long-standing Internet convention that particularly large sites (e.g. operating system distributions) or particularly popular ones (e.g. the Starr Report) will be widely replicated, usually through volunteer effort. The replication process, which typically takes place daily, is usually referred to as mirroring. Mirroring software exists for replicating Web (HTTP) content, FTP archives and (by prior arrangement) arbitrary content, e.g. GNU wget [2], mirror [3] and rsync [4]. In recent years some formalization of this role has taken place, e.g.
with the establishment of the UK Mirror Service [5] for JANET users and the AARNet2 Mirror Archive [6] for Australian academic and research users.

Localizing what might well be international (and chargeable) traffic to geographically and/or topologically nearby mirror sites is a challenging task. Both the size of individual mirror sites and the number of mirror sites worldwide are constantly increasing. Given the lack of common standards for sharing information about mirrors and mirrored sites, we are forced to fall back on a brute-force approach to learning about mirrors. Many manually produced lists of archive sites and mirror sites have existed at various times in the past, each usually maintained by an individual or a small group of people. The most well-known

1. <URL:http://squid.itep.ru/>, mirrored at <URL:http://wwwcache.ja.net/mirrors/MirrorTracker/>
2. JANET [1] is the UK’s Higher Education and Research Network