In a room so packed that people had to sit on the floor, three early adopters of data reporting explained the latest data trends and how to get data no one else has.
Sarah Cohen, Brant Houston, and Jennifer LaFleur are well-seasoned journalists who teach investigative reporting at universities across the US.
“I began doing data journalism, known as computer-assisted reporting, in 1986,” said Brant Houston, Professor and Knight Chair of Investigative Reporting at the University of Illinois, speaking at the Global Investigative Journalism Conference in Hamburg.
The three presenters’ careers have intersected at various points, and they all know each other from working at Investigative Reporters and Editors (IRE).
“We all worked together at some point at IRE and you can tell because we bicker all the time,” said Sarah Cohen, a professor at Arizona State University.
As data journalism teachers, they are on top of the latest developments in the field. Today, the hottest words in data reporting are machine learning and AI, Cohen said.
In general, journalists use these tools to deal with large document dumps and unstructured data. Algorithms can cluster data by topic, help select newsworthy parts of the data and clean it up.
More advanced machine learning tools can match images and sound and transcribe recordings.
The Wall Street Journal, for example, used language analysis tool Quid to discover thousands of fake comments on federal agency websites.
For transcribing interviews, the panel suggested AI-powered tools Otter and Trint, which are both paid services but have free trials for up to 30 hours of recordings.
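To make the idea of clustering documents by topic concrete, here is a minimal sketch in Python. It is not the method used by any newsroom or tool named above. It assumes a tiny made-up stop-word list, four invented headlines, and a simple greedy grouping by word overlap (cosine similarity), where real projects would typically rely on a machine learning library.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "for", "is", "at"}


def vectorize(text):
    """Turn a document into a bag-of-words count vector."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(docs, threshold=0.3):
    """Greedy single-pass clustering.

    Each document joins the first cluster whose seed document is similar
    enough; otherwise it starts a new cluster. The threshold is an
    assumption and would need tuning on real documents.
    """
    clusters = []  # list of (seed_vector, [document indices])
    for i, doc in enumerate(docs):
        vec = vectorize(doc)
        for seed, members in clusters:
            if cosine(vec, seed) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((vec, [i]))
    return [members for _, members in clusters]


# Invented sample headlines, not real data.
docs = [
    "Lead found in school drinking water",
    "School water tests show high lead levels",
    "Satellite images reveal shrinking wetlands",
    "Wetlands disappear in new satellite images",
]
groups = cluster(docs)
print(groups)
```

On these four headlines the two water stories and the two wetlands stories end up in separate groups. With thousands of leaked documents, the same principle lets a reporter skim one group at a time instead of reading everything in order.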
Sensors
Using sensors is a great way to get data when there is none publicly available.
“Sensors, especially in the US, are being used a lot,” said Houston.
In 2018, the Philadelphia Inquirer used sensors to measure how much lead there was in school water and air. Reporters brought their own testing equipment to the schools and enlisted teachers to help them gather the data.
Investigative center Reveal and the Center for Public Integrity measured exhaust fumes near schools and calculated the impact this might have on children’s health.
Houston is currently working on a project on pesticide drift around schools. He says using the right tools and methodology is key to getting good results.
“Everyone is thinking of using sensors,” Houston said. “You can get very cheap ones, I just highly recommend you research what types of sensor to use and figure out the right location.”
What’s so powerful about using sensors is that you are generating your own data.
“No one else will have this story,” said Cohen.
Because working with sensors can be complex, the panel suggested you either enlist a company to train or help you, or try to get a university involved.
Space Journalism
Journalists can also turn to space journalism – a concept that made some in the audience chuckle – to create data when there is none available.
Rather than astronautical reporting, space journalism refers to using satellites to gather data.
“Especially if you are in areas where you can’t get actual data, satellite data can be really powerful,” said Jennifer LaFleur, who teaches at the American University in Washington, DC.
With climate change and “wild weather patterns,” the importance of using satellite imagery for telling stories will only increase, she added.
Space journalism can be used to show pollution patterns, isolated communities, illicit development, drought, inequality, human migration routes, and gentrification patterns, for example.
LaFleur highlighted a Reuters story that used satellite images to illustrate the changing nature of the camps that house hundreds of thousands of Rohingya refugees in Bangladesh.
Satellites use reflection measurements to create visualizations of parts of the light spectrum that the human eye cannot perceive.
“Processing the data can be a little bit tricky,” LaFleur said. “But the tools to do this have gotten much better.”
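One common example of turning reflection measurements into something a reporter can use is a vegetation index. The sketch below computes the Normalized Difference Vegetation Index (NDVI), defined as (NIR - Red) / (NIR + Red), from red and near-infrared reflectance. The panel did not mention NDVI specifically, and the reflectance values here are invented. Real satellite bands would be read from image files with a geospatial library.

```python
def ndvi(red, nir):
    """Return NDVI for one pixel from red and near-infrared reflectance.

    Values near +1 suggest dense, healthy vegetation; values near zero
    or below suggest bare ground, built-up areas, or water.
    """
    total = nir + red
    if total == 0:
        return 0.0  # avoid division by zero for pixels with no signal
    return (nir - red) / total


# Made-up 2x2 grids of reflectance values (0 to 1), not real imagery.
red_band = [[0.10, 0.30], [0.08, 0.25]]
nir_band = [[0.50, 0.30], [0.40, 0.05]]

# Apply the index pixel by pixel.
ndvi_grid = [
    [ndvi(r, n) for r, n in zip(red_row, nir_row)]
    for red_row, nir_row in zip(red_band, nir_band)
]

for row in ndvi_grid:
    print([round(value, 2) for value in row])
```

Comparing such a grid for the same area across several years is one way to quantify changes such as drought or shrinking wetlands, though the processing of real imagery involves more steps than shown here.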
A few free resources for satellite data are:
One of the earliest investigations that used satellite data was a 2006 series by the St. Petersburg Times, a Florida newspaper suspended in 2014, that showed how wetlands in the US were disappearing and federal rules meant to protect them were rarely enforced. The series is no longer online but has been turned into a book called Paving Paradise.
Forensic Investigations
Some of the latest innovative data projects mash up open-source video, text, audio, and social media to recreate events.
“The real strength of a lot of these investigations are timelines,” Houston said. “They are timelines that reveal people were lying about what has happened.”
Two reporters who have used many types of data to create such timelines are Bellingcat’s Henk van Ess, who tracked a missile launcher, and New York Times reporter Malachy Browne, who recreated the disappearance of journalist Jamal Khashoggi.
But the latest tools available go beyond collecting and compiling data.
LaFleur highlighted a new tool that facilitates collaboration between organizations and reporters, of the kind behind projects such as the International Consortium of Investigative Journalists’ Implant Files and the Organized Crime and Corruption Reporting Project’s Troika Laundromat.
ProPublica’s Collaborate tool helps newsrooms organize and cooperate on big data projects by allowing them to “assign data points to individuals or newsrooms; track progress and keep notes around each data point; sort, filter and export the data; and automatically redact sensitive information,” according to its website.
Perhaps most importantly, the increasing availability of data tools means that more newsrooms can use data to tell new kinds of stories.
Argentinian outlet La Nacion developed tools for data sharing in their newsrooms.
“I went down there to speak to some journalists maybe 10 years ago and what I heard from them was ‘we can never do this, there is no data,’” LaFleur said.
“But what they have done through scraping and building their own databases [they have become] international leaders in data journalism and it is an all-woman team too, which is pretty cool,” she said.
La Nacion’s data sharing tool is called VozData.
Having witnessed data journalism develop from floppy disks in the 1980s to online collaborative tools such as VozData, Houston says the same principles from 40 years ago are still behind the latest data trends.
“From the beginning we have always been looking for patterns, trends, and outliers in data,” he said. “We continue to use software, old and new, that allows us to sift through, organize, and visualize those things more quickly.”
The most impressive innovations come from fusing different tools, Houston said.
“For me, it’s the mash-up of the tools rather than a specific tool that sometimes surprises me,” he said. “The tools have often been developed and used by other professions before we get to them.”
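The search for outliers that Houston describes can be illustrated with a short Python sketch. It flags values that sit far from the median, using the median absolute deviation (MAD) and the commonly cited cutoff of 3.5 for the modified z-score. The figures are invented, and the function name and cutoff are assumptions for illustration, not a method attributed to the panel.

```python
from statistics import median


def find_outliers(values, cutoff=3.5):
    """Return values whose modified z-score exceeds the cutoff.

    The modified z-score is 0.6745 * |x - median| / MAD, which is less
    distorted by extreme values than a mean-based score.
    """
    mid = median(values)
    mad = median(abs(v - mid) for v in values)
    if mad == 0:
        return []  # no spread around the median, so nothing to flag
    return [v for v in values if 0.6745 * abs(v - mid) / mad > cutoff]


# Invented example: monthly complaint counts for one agency.
complaints = [12, 14, 13, 15, 11, 13, 58]
outliers = find_outliers(complaints)
print(outliers)
```

A flagged value is a lead, not a finding: it may point to a real story or simply to a data entry error, which is why reporting still has to follow the calculation.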
Jelter Meers is a researcher and reporter at the Organized Crime and Corruption Reporting Project and a coordinator and editor at the Investigative Journalism Education Consortium. He helped organize the data and academic speaker tracks at #GIJC2017 and #GIJC2019.