Dead Timestamp Identification in Stampede

Nissim Harel  nissim@cc.gatech.edu
Hasnain A. Mandviwala  mandvi@cc.gatech.edu
Kathleen Knobe  kath.knobe@hp.com
Umakishore Ramachandran  rama@cc.gatech.edu

College of Computing, Georgia Institute of Technology
HP Cambridge Research Lab

The work has been funded in part by an NSF ITR grant CCR-01-21638, NSF grant CCR-99-72216, Compaq Cambridge Research Lab, the Yamacraw project of the State of Georgia, and the Georgia Tech Broadband Institute. The equipment used in the experimental studies is funded in part by an NSF Research Infrastructure award EIA-99-72872, and Intel Corp.

Abstract

Stampede is a parallel programming system to support computationally demanding applications including interactive vision, speech and multimedia collaboration. The system alleviates concerns such as communication, synchronization, and buffer management in programming such real-time stream-oriented applications. Threads are loosely connected by channels which hold streams of items, each identified by a timestamp. There are two performance concerns when programming with Stampede. The first is space, namely, ensuring that memory is not wasted on items bearing a timestamp that will not be fully processed. The second is time, namely, ensuring that processing resources are not wasted on a timestamp that will not be fully processed. In this paper we introduce a single unifying framework, dead timestamp identification, that addresses both the space and time concerns simultaneously. Dead timestamps on a channel represent garbage. Dead timestamps at a thread represent computations that need not be performed. This framework has been implemented in the Stampede system. Experimental results showing the space advantage of this framework are presented. Using a color-based people tracker application, we show that the space advantage can be significant (up to 40%) compared to the previous GC techniques in Stampede.

1 Introduction

There is a class of emerging stream-oriented applications spanning interactive vision, speech, and multimedia collaboration that are computationally demanding and dynamic in their communication characteristics. Such applications are good candidates for the scalable parallelism exhibited by clusters of SMPs.

A major problem in implementing these kinds of applications in parallel is "buffer management", as (1) threads may not access their input in a strict stream-like manner, (2) newly created threads may have to re-analyze earlier data, (3) datasets from different sources need to be correlated temporally, and (4) not all the data that is produced at lower levels of the processing pipeline will necessarily be used at the higher levels, since computations performed become more sophisticated as we move through the pipeline. These features imply two requirements. First, data items must be meaningfully associated with time, and second, there must be a discipline of time that allows systematic reclamation of storage for data items (garbage collection).

Stampede is a parallel programming system designed and developed to simplify programming of such applications. The programming model of Stampede is simple and intuitive. A Stampede program consists of a dynamic collection of threads communicating timestamped data items through channels. Threads can be created to run anywhere in the cluster. Channels can be created anywhere in the cluster and have cluster-wide unique names. Threads can connect to these channels for doing input/output via get/put operations. A timestamp value is used as a name for a data item that a thread puts into or gets from a channel. The runtime system of Stampede takes care of the synchronization and communication inherent in these operations, as well as managing the storage for items put into or gotten from the channels.

1.1 Live and dead timestamps

Every item on a channel is uniquely indexed by a timestamp.
Typically a thread will get an item with a particular timestamp from an input connection, perform some processing¹ on the data in the item, and then put an item with that same timestamp onto one of its output connections.

¹ We use "processing a timestamp", "processing an item", and "processing a timestamped item" interchangeably to mean the same thing.
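
The get/process/put pattern described above can be illustrated with a minimal Python sketch. It is not Stampede's API: the `Channel` class, the `stage` function, and the `run_pipeline` helper are hypothetical names introduced only for illustration, and the channel is a single-process, in-memory stand-in for a cluster-wide one. The sketch shows items being named by timestamp, accessed in any order, and forwarded under the same timestamp.

```python
import threading


class Channel:
    """Hypothetical in-memory channel; items are named by integer timestamps."""

    def __init__(self, name):
        self.name = name
        self._items = {}                      # timestamp -> item
        self._cond = threading.Condition()

    def put(self, timestamp, item):
        with self._cond:
            if timestamp in self._items:
                raise KeyError(f"timestamp {timestamp} already used on {self.name}")
            self._items[timestamp] = item
            self._cond.notify_all()           # wake any thread blocked in get()

    def get(self, timestamp, timeout=None):
        # Block until an item with this timestamp exists; access need not be in order.
        with self._cond:
            if not self._cond.wait_for(lambda: timestamp in self._items, timeout):
                raise TimeoutError(f"no item {timestamp} on {self.name}")
            return self._items[timestamp]


def stage(inp, out, timestamps, process):
    """A thread body: get an item, process it, put the result under the same timestamp."""
    for ts in timestamps:
        item = inp.get(ts)
        out.put(ts, process(item))


def run_pipeline(frames):
    """Drive one stage over a list of inputs (assumed helper for this example)."""
    source, sink = Channel("frames"), Channel("results")
    timestamps = range(len(frames))
    worker = threading.Thread(target=stage, args=(source, sink, timestamps, str.upper))
    worker.start()
    for ts, frame in enumerate(frames):
        source.put(ts, frame)
    worker.join()
    return [sink.get(ts, timeout=1) for ts in timestamps]
```

Note that this sketch never removes an item from a channel, so its storage grows without bound. Deciding when an item can safely be reclaimed is the buffer management problem that the introduction raises.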