Languages and Frameworks for Big Data Analysis

Marco Aldinucci, Maurizio Drocco, Claudia Misale, and Guy Tremblay

Overview

Boosted by the popularity of Big Data, new languages and frameworks for data analytics are appearing at an increasing pace. Each introduces its own concepts and terminology and advocates a (real or alleged) superiority in performance or expressiveness over its predecessors. Amid this hype, it can be difficult for a user approaching Big Data analytics (even an educated computer scientist) to form a clear picture of the programming model underlying these tools and the expressiveness they provide for solving user-defined problems.

To bring some order to the world of Big Data processing, a toolkit of models identifying their common features is introduced, starting from data layout. Data-processing applications are divided into batch and stream processing. Batch programs process one or more finite datasets to produce a finite output dataset, whereas stream programs process possibly unbounded sequences of data, called streams, in an incremental manner. Operations over streams may also have to respect a total ordering of the data (for instance, to represent time ordering).

Marco Aldinucci
Computer Science Department, University of Torino, Italy
e-mail: aldinuc@di.unito.it

Maurizio Drocco
Computer Science Department, University of Torino, Italy
e-mail: drocco@di.unito.it

Claudia Misale
Cognitive and Cloud, Data-Centric Solutions, IBM T.J. Watson Research Center, Yorktown Heights, New York, USA
e-mail: c.misale@ibm.com

Guy Tremblay
Département d'Informatique, Université du Québec à Montréal, Montréal (QC), Canada
e-mail: tremblay.guy@uqam.ca

Pre-print of: M. Aldinucci, M. Drocco, C. Misale, and G. Tremblay, in "Encyclopedia of Big Data Technologies," S. Sakr and A. Zomaya, Eds., Springer, 2019. ISBN 978-3-319-77524-1
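The batch vs. stream distinction above can be made concrete with a minimal sketch, not tied to any particular framework: the same word-count computation expressed in batch style (the whole finite dataset is available at once, producing one finite result) and in stream style (items arrive one at a time from a possibly unbounded source, with results emitted incrementally and in order). Function names here are illustrative, not from any of the surveyed tools.

```python
# Illustrative sketch: batch vs. stream processing of a word count.
from collections import Counter
from typing import Iterable, Iterator

def batch_word_count(dataset: list[str]) -> Counter:
    """Batch: the entire finite dataset is processed to yield one finite result."""
    counts = Counter()
    for line in dataset:
        counts.update(line.split())
    return counts

def stream_word_count(stream: Iterable[str]) -> Iterator[Counter]:
    """Stream: items are processed incrementally, in arrival order; the
    running result is emitted after each item. The source may be unbounded."""
    counts = Counter()
    for line in stream:           # the stream may never terminate
        counts.update(line.split())
        yield counts              # incremental output after each element

# Batch usage: one call, one final result.
batch = batch_word_count(["big data", "data analytics"])
print(batch["data"])  # 2

# Stream usage: consume a prefix of a (conceptually unbounded) stream.
partial = None
for snapshot in stream_word_count(iter(["big data", "data analytics"])):
    partial = snapshot            # last snapshot seen so far
print(partial["data"])  # 2
```

Note that the stream version never assumes the input ends: each `yield` delivers a result valid for the data seen so far, which is the incremental behavior the overview describes.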