Preserving Data Journalism: A Systematic Literature Review
Bahareh Heraviᵃ, Kathryn Cassidyᵇ, Edie Davisᶜ and Natalie Harrowerᵈ
ᵃ School of Information & Communication Studies, University College Dublin, Dublin, Ireland;
ᵇ Digital Repository of Ireland, Trinity College Dublin, Dublin, Ireland;
ᶜ The Library, Trinity College Dublin, Dublin, Ireland;
ᵈ Digital Repository of Ireland, Royal Irish Academy, Dublin, Ireland
ABSTRACT
News organisations have longstanding practices for archiving and
preserving their content. The emerging practice of data journalism
has led to the creation of complex new outputs, including dynamic
data visualisations that rely on distributed digital infrastructures.
Traditional news archiving does not yet have systems in place for
preserving these outputs, which means that we risk losing this
crucial part of reporting and news history. Following a systematic
approach to studying the literature in this area, this paper provides
a set of recommendations to address lacunae in the literature. This
paper contributes to the field by (1) providing a systematic study of
the literature in the fields, (2) providing a set of recommendations
for the adoption of long-term preservation of dynamic data
visualisations as part of the news publication workflow, and (3)
identifying concrete actions that data journalists can take
immediately to ensure that these visualisations are not lost.
KEYWORDS
data journalism; data-driven journalism; data visualisation; data visualization; digital preservation; digital archiving; software preservation
Introduction
“Journalism is the first rough draft of history” (widely attributed to Philip Graham, 1963),
and the archives of news organisations are an indispensable source for research into
global history. Traditional journalistic outputs are usually published in text and audiovisual
format, with news organisations having a longstanding history of archiving and preserving
these outputs on various media, for example, paper, tape, or hard disc drives,
depending on the historical time period and the original format of the output. Similarly,
memory institutions such as national libraries and archives generally hold large and
longstanding newspaper archives.
In the past decade, journalism has become more “quantitatively oriented” (Coddington
2015, 332), and increasingly incorporates “data journalism”, a practice which uses datasets,
computational tools and algorithms to create news stories (Heravi and Lorenz 2020). The
output of this type of journalism includes traditional text and audiovisual formats, but
also includes data visualisations and/or news applications. These visualisations communicate
key aspects of the story, and without them, the story is either incomplete or entirely
© 2021 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group
This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License
(http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any
medium, provided the original work is properly cited, and is not altered, transformed, or built upon in any way.
CONTACT Bahareh Heravi Bahareh.Heravi@ucd.ie
Supplemental data for this article can be accessed at https://doi.org/10.1080/17512786.2021.1903972.
JOURNALISM PRACTICE
https://doi.org/10.1080/17512786.2021.1903972