Just Another Day on Twitter: A Complete 24 Hours of Twitter Data

Jürgen Pfeffer¹, Daniel Matter¹, Kokil Jaidka², Onur Varol³, Afra Mashhadi⁴, Jana Lasser⁵,¹⁵, Dennis Assenmacher⁶, Siqi Wu⁷, Diyi Yang⁸, Cornelia Brantner⁹, Daniel M. Romero⁷, Jahna Otterbacher¹⁰, Carsten Schwemmer¹¹, Kenneth Joseph¹², David Garcia¹³, Fred Morstatter¹⁴

¹ School of Social Science and Technology, Technical University of Munich, Germany
² Centre for Trusted Internet and Community, National University of Singapore, Singapore
³ Computer Science Department, Sabanci University, Turkey
⁴ School of Science, Technology, Engineering & Mathematics, University of Washington (Bothell), USA
⁵ Faculty of Computer Science and Biomedical Engineering, Graz University of Technology, Austria
⁶ GESIS – Leibniz Institute for the Social Sciences, Germany
⁷ School of Information, University of Michigan, USA
⁸ Computer Science Department, Stanford University, USA
⁹ Department of Geography, Media and Communication, Karlstad University, Sweden
¹⁰ Faculty of Pure and Applied Sciences, Open University of Cyprus & CYENS CoE, Cyprus
¹¹ Department of Sociology, Ludwig Maximilian University of Munich, Germany
¹² Department of Computer Science and Engineering, University at Buffalo, USA
¹³ Department of Politics and Public Administration, University of Konstanz, Germany
¹⁴ Information Sciences Institute, University of Southern California, USA
¹⁵ Complexity Science Hub Vienna, Austria

Abstract
At the end of October 2022, Elon Musk concluded his acquisition of Twitter. In the weeks and months before that, several questions were publicly discussed that were not only of interest to the platform's future buyers but also highly relevant to the Computational Social Science research community. For example, how many active users does the platform have? What percentage of accounts on the site are bots? And what are the dominant topics and sub-topical spheres on the platform?
In a globally coordinated effort by 80 scholars to shed light on these questions, and to offer a dataset that will equip other researchers to do the same, we collected all 375 million tweets published within a 24-hour period starting on September 21, 2022. To the best of our knowledge, this is the first complete 24-hour Twitter dataset available to the research community. With it, the present work pursues two goals. First, we seek to answer the aforementioned questions and provide descriptive metrics about Twitter that can serve as references for other researchers. Second, we create a baseline dataset for future research that can be used to study the potential impact of the platform's ownership change.

Introduction
On March 21, 2006, Twitter's first CEO Jack Dorsey sent the first message on the platform. In the subsequent 16 years, close to 3 trillion tweets have been sent.¹ Roughly two-thirds of these are no longer available: either the senders deleted them, or the accounts (and all their tweets) have been banned from the platform or made private by their users, or the tweets are otherwise inaccessible via historical search with the v2 API endpoints. Using Twitter's counts/all API and the approaches described in this article, we estimate that about 900 billion public tweets were on the platform when Elon Musk acquired Twitter in October 2022 for $44B.²

¹ While we do not have an official source for this number, it represents an educated guess from a collaboration of dozens of Twitter scholars.
Copyright © 2023, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Besides its possible economic value, Twitter has been instrumental in studying human behavior with social media data, and the entire field of Computational Social Science (CSS) has relied heavily on data from Twitter.
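The figures quoted in the abstract and in the opening paragraph of the Introduction can be sanity-checked with simple arithmetic. The Python sketch below only reuses those reported numbers (375 million tweets in 24 hours; close to 3 trillion tweets sent, roughly two-thirds no longer accessible). It is an illustration, not the estimation procedure the authors used, and the helper names `average_rate` and `remaining_after_removal` are hypothetical.

```python
# Back-of-the-envelope checks using only figures quoted in the text.
# Illustrative sketch; this is NOT the authors' estimation method.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in the 24-hour collection window


def average_rate(total_tweets: int, seconds: int = SECONDS_PER_DAY) -> float:
    """Mean number of tweets per second over the collection window."""
    return total_tweets / seconds


def remaining_after_removal(total_sent: float, removed_fraction: float) -> float:
    """Tweets still accessible if `removed_fraction` of all sent tweets are gone."""
    return total_sent * (1 - removed_fraction)


if __name__ == "__main__":
    # 375 million tweets collected within 24 hours (abstract).
    rate = average_rate(375_000_000)
    print(f"average rate: {rate:,.0f} tweets/s")

    # Close to 3 trillion tweets sent, roughly two-thirds no longer accessible.
    remaining = remaining_after_removal(3e12, 2 / 3)
    print(f"still accessible: ~{remaining / 1e9:,.0f} billion tweets")
```

Run as a script, it prints an average of 4,340 tweets per second and roughly 1,000 billion (about 1 trillion) still-accessible tweets. The latter is of the same order of magnitude as the authors' estimate of about 900 billion public tweets; since both inputs are rough approximations, an exact match is not expected.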
At the AAAI International Conference on Web and Social Media (ICWSM), in the past two years alone (2021-2022), over 30 papers analyzed subsets of Twitter data on topics ranging from public and mental health to politics and partisanship. Indeed, since its emergence, researchers in the social sciences have described Twitter as a digital socioscope (i.e., a social telescope) (Mejova, Weber, and Macy 2015): “a massive antenna for social science that makes visible both the very large (e.g., global patterns of communications) and the very small (e.g., hourly changes in emotions)”. Beyond CSS, Twitter data is increasingly used in natural language processing and machine learning to train large pre-trained language models, such as Bernice (DeLucia et al. 2022), which uses 2.5 billion tweets to develop representations for Twitter-specific language, and TwHIN-BERT (Zhang et al. 2022), which leverages 7 billion tweets covering over 100 distinct languages to model short, noisy, user-generated text. Although Twitter data has fostered interdisciplinary re-

² https://www.nytimes.com/2022/10/27/technology/elon-musk-twitter-deal-complete.html
Proceedings of the Seventeenth International AAAI Conference on Web and Social Media (ICWSM 2023) 1073