A Survey on Internet Traffic Archival Systems

Zhen Chen, Lin-yun Ruan, Jun Li, Shuai Ding, Fu-ye Han, Hang Li, Wen-yu Dong

1)(Research Institute of Information Technology, Tsinghua University, Beijing, 100084)
2)(Tsinghua National Laboratory for Information Science and Technology, Beijing, 100084)

Abstract- With the popularity of Internet applications and the widespread use of the mobile Internet, Internet traffic has grown rapidly over the past decades. Internet traffic archival systems (ITAS) for packets or flow records are increasingly used in network monitoring, network troubleshooting, and user behavior and experience analysis. In this paper, we survey the design and implementation of several typical traffic archival systems. We analyze and compare the architectures and key technologies underlying Internet traffic archival systems, summarize these key technologies, which include packet/flow capturing, packet/flow storage, and bitmap index encoding algorithms, and examine the packet/flow capturing technologies in depth. We then propose the design and implementation of the TiFaflow traffic archival system. Finally, we summarize and discuss future directions for Internet traffic archival systems.

Keywords: Internet Traffic; Big Data; Traffic Archival; Network Security; Traffic Acquisition; Packet Capturing; Columnar Storage; Bitmap Index; Bitmap Encoding Algorithm.

1. Introduction

1.1 Big data

Massive, high-speed, dynamic data appear in major application areas. In the field of Internet search, Google improves users' search experience through personalized search technology [1]. In this technique, a user's search behavior, including the Web access path and the access time of each page, is recorded in a huge database in real time. When users search for information, the search engine, which processes 4200 requests per second, continually queries these records.
In scientific experiments, the European Large Hadron Collider (LHC) produced 15 PB of data at rates of up to 1.5 GB/s [2]. Network monitoring, communication services, sensor networks, and financial services generate unbounded, continuous, rapid, real-time streaming data. Streaming data is massive, continuous, and real-time. Such data can only be scanned once, linearly; random access and complete local storage of the stream are not feasible, because on-line analysis must be fast and the resources of a real-time analysis system are limited. Once the data flow overloads the system, storage, query, and analysis are all significantly affected. A traditional database stores a collection of static relational data records for persistent storage and complex queries. Query operations are more frequent than insert, update, delete, and other operations, and the results