The Data Slipstream Project
Managing terabytes of data from multiple sources can be costly in a number of ways, but the pressing need to use data to help combat climate change makes finding a solution urgent.
The Data Slipstream project aims to streamline the process for geoscientists handling satellite imagery, and is one of many projects helped by six new data-driven innovation hubs at the University of Edinburgh and Heriot-Watt University, as part of the Edinburgh and South East Scotland City Region Deal.
Dr Dave McKay is the Data Architect tasked with managing the development of the project.
“After a survey of researchers in the university that were using Earth observation images, we found they were going to various disparate sources to find that data, and sometimes they would pay quite a lot of money for it,” McKay explained.
“Sometimes it was technically free, but they’d have to pay to download it. Sometimes it was a little bit limited as to what they could do with it, plus the amount they could download was limited by the storage they had locally on their computers. So we wanted to help that situation by having one place they could go to to use the data, and we would go and source the data from various places for them, and elsewhere for computing power.”
Using the data storage and management capabilities of the Edinburgh International Data Facility, which EPCC at the University of Edinburgh hosts alongside UK National Supercomputing services, scientists at the university were uniquely placed to develop a platform that could handle and process vast volumes of data.
Images from satellites, like the European Space Agency’s Sentinel-2, go back as far as 2015, allowing scientists to track changes over time in areas such as land cover and icebergs melting. However, working with such enormous data sets is a technical challenge.
“The main idea was to produce something that people could use, all in one place, and also build systems that they could use for data analytics,” McKay explained.
“A lot of people in Earth observation data will think more like a data analyst than a computer programmer. They want to load the data and run an algorithm on it that finds something in that data and counts it, and they expect particular interfaces for that. They don’t want to programme the whole system, but they might want to programme how their part of the system works. So we want to give them that kind of access, give them enough computing power to run that kind of algorithm, and then give them a place where they could store the output of their analysis.”
The pilot programme, called ‘Single Tree’, linked radar and optical data and ran an algorithm over the combined data to predict whether each pixel of the image contained a tree. The images were 100 kilometres by 100 kilometres, with each pixel 10 metres by 10 metres, giving 10,000 pixels along each side and 100 million pixels in total.
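The scale of a single image follows directly from the tile and pixel sizes quoted for the pilot. A quick check of the arithmetic, with variable names chosen here purely for illustration:

```python
# Tile and pixel sizes as quoted for the Single Tree pilot.
TILE_SIDE_M = 100 * 1000   # 100 kilometres, in metres
PIXEL_SIDE_M = 10          # 10 metres

# Number of pixels along one edge of the tile.
pixels_per_side = TILE_SIDE_M // PIXEL_SIDE_M

# The tile is square, so the total is the edge count squared.
total_pixels = pixels_per_side ** 2

print(f"{pixels_per_side:,} pixels per side, {total_pixels:,} pixels in total")
```

Each image therefore has 10,000 pixels along each edge and 100 million pixels overall, and the algorithm makes a tree-or-no-tree prediction for every one of them.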
This involved fetching data from two different sources, fusing it and processing it. Because optical images are affected by cloud cover, McKay and his team took images of the same area over three months and overlaid them to build up a cloudless composite image.
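The article does not say how the team combined the overlaid images. One common approach, shown here only as an assumption, is to take the per-pixel median of the observations that were not cloudy. The function name `cloud_free_composite`, the per-date cloud masks and the toy values are all hypothetical, and plain Python lists are used to keep the sketch dependency-free; real 100 million-pixel tiles would need an array library.

```python
from statistics import median


def cloud_free_composite(images, cloud_masks):
    """Per-pixel median of the cloud-free observations (illustrative only).

    images:      list of 2D lists, one per acquisition date, all the same shape.
    cloud_masks: matching list of 2D lists of bools, True where a pixel is cloudy.
    Returns a 2D list; a pixel that is cloudy on every date is set to None.
    """
    rows, cols = len(images[0]), len(images[0][0])
    composite = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            # Keep only the values observed under clear sky at this pixel.
            clear = [
                img[r][c]
                for img, mask in zip(images, cloud_masks)
                if not mask[r][c]
            ]
            out_row.append(median(clear) if clear else None)
        composite.append(out_row)
    return composite


# Toy scene: one row of three pixels, imaged on three dates.
images = [
    [[0.25, 0.5, 0.875]],
    [[0.5, 0.75, 0.875]],
    [[0.875, 0.25, 0.875]],
]
cloud_masks = [
    [[False, False, True]],
    [[False, False, True]],
    [[True, False, True]],
]
result = cloud_free_composite(images, cloud_masks)
print(result)  # [[0.375, 0.5, None]]
```

The third pixel is cloudy on every date, so it stays empty. Taking images over a longer period, such as the three months used here, makes it more likely that every pixel is seen clearly at least once.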
It was so successful that it became a product in its own right.
“We had an industrial partner at that stage called Resilience Constellation, who plan a satellite constellation to help improve society’s climate change resilience,” McKay said.
“They saw that there was a saleable product there. So even though, to the researcher, it was a means to an end – the production of these cloud-free images – that actually became a product that is now available on their website to buyers: cloud-free images of Scotland.”
The potential for the Data Slipstream project is almost unlimited, given the computing power behind it.
“It’s ideas-driven, by people who are working in climate change applications, day in, day out. They’re the experts in that, they’re the kind of people that we want to work with to give us ideas,” he said.