From Space to Spreadsheet: The Troubled Waters methodology for analysing floods across Europe using Copernicus satellite data and other sources

Flood data in Europe is fragmented and inconsistent. By combining satellite data with public records, we built a decade-long dataset of flood impacts—revealing the most vulnerable regions and the urgent need for a unified EU disaster database.

May 31, 2025

By Konstantina Maltepioti

The Troubled Waters project set out to analyze the impact of flooding across Europe, assess the types of land and infrastructure affected, and ultimately identify the most vulnerable regions and communities over the past decade.

An initial search for open European flood data—covering basic indicators such as flood extent, fatalities, affected population, and damage to homes and infrastructure—revealed a fragmented and inconsistent landscape. Data was often incomplete, outdated, or missing key parameters necessary for long-term analysis.

To better understand the available data, we reached out to three institutions with published information on natural disasters: the European Environment Agency (EEA), the European Commission’s Copernicus Emergency Management Mapping service, and the Centre for Research on the Epidemiology of Disasters (CRED). It became clear that comprehensive, regularly updated official data on European floods, covering the parameters we needed for the decade, was not available.

Following their recommendations, we focused on three open datasets they referenced as reliable sources for flood data and its impact across Europe over the last decade: Hanze, Public EM-DAT, and the Copernicus Emergency Management Service (EMS) Mapping platform.

Data Limitations and Challenges of Our Three Sources

Flood extent data was virtually absent in both the Hanze and Public EM-DAT datasets. While some information on affected populations was available, it remained sparse and inconsistent. Detailed data on homes, land, and infrastructure affected was found only in the Copernicus Emergency Management Service (EMS) Mapping platform, and only for the last two years.

With the exception of Hanze, none of the datasets included accurate regional flood impact data. This required manual searching and cross-referencing with Eurostat’s NUTS classification to align regions with Hanze’s methodology and apply the European Commission’s urban-rural and coastal typology. This process helped identify which types of regions were most affected.

In terms of the number of floods, Hanze appears to have the most comprehensive dataset. Its data were sourced from media reports, the EU Commission, and academic literature.

Public EM-DAT data, sourced from the press (mainly AFP), the DFO database (until 2022), reinsurance companies, ECHO/EU news, and IFRC, were used only for the years 2021-2022 to bridge the gap between the Hanze dataset and Copernicus’ data. However, the number of floods recorded for 2022 seems unusually low, raising concerns about the completeness of the data.

Copernicus, by design, does not track all floods in Europe. It only responds to requests made by "authorized users" via the European Response Coordination Centre (ERCC), according to their website. In 2023, for instance, Germany and Bulgaria triggered Copernicus services to track their floods (EMSR701, EMSR693), but no data was published due to “remote sensing limitations.”

Our analysis focused on Copernicus data we gathered from the past two years, as it is the only source with detailed flood extent, infrastructure, and population impact data. EU member states were prioritized to ensure the relevance of findings to policy-making.

The final dataset spans a decade and covers both EU member states and other countries, such as Serbia. We use it to estimate the total number of European floods, affected regions, and people impacted. We do not compare specific years or countries, because the datasets track flood occurrences and affected population differently.

Collecting data from Copernicus: A step-by-step guide

We collected data on floods observed by Copernicus in 2023 and 2024 by first searching for all flood-related entries on their “Activations” page.

Each flood has a unique identification code and a dedicated webpage containing detailed information about the event’s impact.

During the course of this project, Copernicus changed the structure of its website. In the older version of a flood's webpage (see example, still active on March 30 2025), there is an option to access Copernicus’ open API, which is no longer shown in the newer versions of the flood webpages (see the same example in the new website). However, the data available through the API remains the same.

Once the unique IDs were collected from the “Activations” page, we used them to construct the URLs for each flood’s API. From there, we were able to extract data on flood extent, land use, infrastructure, buildings, and population affected, across all regions impacted.
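As a rough illustration of this step, the sketch below turns a list of activation codes into one API address per flood. The URL template is a placeholder we made up for the example, not the actual Copernicus endpoint; the real address is the one exposed on an activation's page. The function name is likewise hypothetical.

```python
# ASSUMPTION: placeholder template, NOT the real Copernicus API address.
# Replace it with the address shown on an activation's webpage.
API_URL_TEMPLATE = "https://example.org/api/activations/{code}/"


def build_api_urls(activation_codes, template=API_URL_TEMPLATE):
    """Return a {code: url} mapping, skipping blanks and duplicates."""
    urls = {}
    for raw in activation_codes:
        code = raw.strip().upper()  # tolerate stray spaces and lowercase input
        if code and code not in urls:
            urls[code] = template.format(code=code)
    return urls


# EMSR701 and EMSR693 are the activation codes mentioned earlier in this article.
urls = build_api_urls(["EMSR701", "emsr693 ", "EMSR701", ""])
for code, url in urls.items():
    print(code, url)
```

Cleaning the codes before building the addresses keeps a copy-paste slip in the ID list from producing duplicate or broken requests.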

Each flood typically affects more than one region. Copernicus refers to these as AOIs (Areas of Interest), which are defined zones of analysis that do not align with official administrative boundaries. Each AOI can include multiple observations, or “products,” as Copernicus calls them, since the AOIs can be observed for multiple days on different levels. Observations made after the actual flood date often record smaller impacted areas. These observations can be found in the ‘stats’ section of the ‘aois’ in each flood’s API.
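The nesting described above (a flood, its AOIs, and the products recorded for each AOI) can be flattened into one row per measurement. This is a minimal sketch: only the ‘aois’ and ‘stats’ keys come from the description above, while every other key (`code`, `name`, `date`, `items`, `label`, `value`, `unit`) and all sample values are assumptions made for illustration and may not match the real response.

```python
def flatten_products(activation):
    """Turn one activation's nested payload into one flat row per statistic."""
    rows = []
    for aoi in activation.get("aois", []):
        for product in aoi.get("stats", []):        # one entry per product
            for item in product.get("items", []):   # hypothetical key
                rows.append({
                    "activation": activation.get("code"),
                    "aoi": aoi.get("name"),
                    "product_date": product.get("date"),
                    "label": item.get("label"),
                    "value": item.get("value"),
                    "unit": item.get("unit"),
                })
    return rows


# Made-up payload, used only to show the shape of the output.
sample = {
    "code": "EMSR000",
    "aois": [
        {"name": "AOI01", "stats": [
            {"date": "2023-09-07", "items": [
                {"label": "Flooded area", "value": 10.0, "unit": "ha"},
                {"label": "Estimated population", "value": 120, "unit": "people"},
            ]},
        ]},
        {"name": "AOI02", "stats": [
            {"date": "2023-09-08", "items": [
                {"label": "Flooded area", "value": 4.5, "unit": "ha"},
            ]},
        ]},
    ],
}

rows = flatten_products(sample)
for row in rows:
    print(row)
```

Keeping the AOI name and product date on every row makes it possible to choose between products later without going back to the nested structure.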

You can use this notebook to download data for all AOIs and products from the API, even if you don’t have Python installed. Simply open the notebook in Google Colab, make the necessary edits to the unique IDs, and run the code.

If the API becomes unavailable, or for smaller-scale investigations, the same data can be accessed manually. Navigate to the “Statistics” button of an AOI in the Viewer page of a flood, or download the relevant Excel file from the “Crisis Information” section on the “Details & Download” page. This method also allows users to manually select which AOIs and products they want to include.

Understanding Copernicus’ data

Which AOIs or products do we keep?

To avoid double counting flooded areas, we must select AOIs (Areas of Interest) that do not overlap. The image below, which displays AOI products with orange boundaries (each polygon representing one product), clearly illustrates overlapping AOIs. This means we need to either choose products within a larger AOI or retain only the measurements from the larger AOI.

Screenshot from the webpage mapping the Thessaly flood in Greece, which started on 5 September 2023, retrieved on 30.3.2025, showing overlapping AOIs.

There is no single correct answer for which product to choose. Our goal is to prevent double counting. We prioritized AOI products that were closest to the event date and/or included the highest reported population impact in their statistics.
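One way to make this choice reproducible is to rank the candidate products by distance from the event date and to break ties with the reported population. The sketch below is only one possible formalization of the rule described above, not the exact procedure used in the project; the field names and the sample figures are invented for the example.

```python
from datetime import date


def pick_product(products, event_date):
    """Pick the product closest to the event date; break ties by population."""
    return min(
        products,
        key=lambda p: (abs((p["date"] - event_date).days), -p["population"]),
    )


# Made-up products for a single AOI; the event date is the start of the
# Thessaly flood mentioned above.
event_date = date(2023, 9, 5)
products = [
    {"id": "product_1", "date": date(2023, 9, 7), "population": 1200},
    {"id": "product_2", "date": date(2023, 9, 7), "population": 1500},
    {"id": "product_3", "date": date(2023, 9, 10), "population": 900},
]

chosen = pick_product(products, event_date)
print(chosen["id"])
```

Because the rule is explicit, a different editorial choice (for example, population first and date second) only requires swapping the two elements of the sort key.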

How is the affected population measured?

Population was a priority, as it also reflected flood extent. Copernicus estimates affected population using the Global Human Settlement Population (GHS-POP) Grid, part of the Global Human Settlement Layer (GHSL). This dataset combines census data with satellite imagery and distributes population across small grid cells, which makes it possible to estimate how many people live within the flooded hectares.

How is the flood extent measured?

Copernicus classifies both flooded areas and flood traces as part of the Observed Event Area, which is visible in the Viewer via the “Legend” option for each product.

Screenshot from the webpage mapping the Thessaly flood in Greece on 9 September 2023, retrieved on 25.3.2025.

A flood trace indicates that water was present during the flood event but had receded by the time satellite imagery was captured. In some cases, Copernicus includes flood traces in its measurements under “Maximum Water Extent.”

According to Copernicus: "Indeed the flood traces are included in the maximum water extent. We assume that there is a very high probability that flooding was present where traces of flooding are identified on the satellite image. Therefore we include flood traces in the maximum extent of the flood (maximum water extent)."

Both flood traces and flooded areas are counted when describing total flood extent. For instance, in reports such as Bulletins No. 167 and 166, Copernicus combines the two to represent total impact. One map of Valencia’s floods shows affected areas in blue and flood traces in light green, reporting over 53,000 hectares affected. However, the “Summary Table” on the Copernicus “Details & Download” page for the same flood lists only 22,638.6 hectares as affected, because the summary table does not include flood traces.

During our analysis, we discovered that the summary tables were not always up to date “due to evolving procedures and products during the activation period,” resulting in errors and inconsistencies in the data that Copernicus corrected after we contacted them. However, flood traces are still not included in the summary tables.

As a result, we did not rely on the summary tables for this investigation. Instead, we manually calculated the flood extent by reviewing all relevant product measurements for each AOI in the Copernicus “Statistics” section on the “Viewer” page, selecting the highest recorded flood extent.

An initial analysis during the project indicated that a flood’s statistics (e.g., on arable land, roads, pipelines) take flood traces, and not just flooded areas, into account.

Our method for measuring flood extent per AOI

Since Copernicus presents flood extent measurements inconsistently, we manually determined the highest flood extent per AOI using the following approach:

This approach ensured the most accurate possible measurement per AOI, while avoiding duplication. In each case, we selected the product with the highest reported extent per AOI.
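The selection of the highest reported extent per AOI can be sketched in a few lines. We did this step manually, so the code below is an illustration of the rule rather than the procedure we ran; the field names and figures are hypothetical.

```python
def highest_extent_per_aoi(rows):
    """Keep, for each (activation, AOI) pair, the product with the largest extent."""
    best = {}
    for row in rows:
        key = (row["activation"], row["aoi"])
        if key not in best or row["extent_ha"] > best[key]["extent_ha"]:
            best[key] = row
    return list(best.values())


# Made-up measurements: two products for AOI01, one for AOI02.
measurements = [
    {"activation": "EMSR000", "aoi": "AOI01", "product": "p1", "extent_ha": 100.0},
    {"activation": "EMSR000", "aoi": "AOI01", "product": "p2", "extent_ha": 250.5},
    {"activation": "EMSR000", "aoi": "AOI02", "product": "p1", "extent_ha": 80.0},
]

selected = highest_extent_per_aoi(measurements)
total_extent = sum(row["extent_ha"] for row in selected)
print(selected, total_extent)
```

Summing only the selected rows counts each AOI once, which is the point of the rule: adding up every product of the same AOI would count the same flooded land several times.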

How did we normalize the data?

The data retrieved from the Copernicus API needed to be both accessible and standardized for analysis. To organize the information from each observation, we created a “subcategory unit” column, which included the raw description of the affected element (e.g. roads, farmland), along with its broader category and unit of measurement. From this column, we derived four additional fields to structure the dataset: category unit, unit of measurement, and two normalized categories, one detailed and one broader.

The broader normalized category was created manually, based on editorial judgment. This allowed us to group similar entries under a common classification. For example, if data existed only for secondary roads in one AOI and only for primary roads in another, both were grouped under “road infrastructure.”
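A manual grouping like this is easiest to audit when it lives in a single lookup table. The sketch below uses the road example from the paragraph above; the mapping is illustrative and far from complete, and the "unclassified" fallback is our own assumption for entries that have not been reviewed yet.

```python
# Illustrative mapping from detailed labels to broader editorial categories.
BROADER_CATEGORY = {
    "primary roads": "road infrastructure",
    "secondary roads": "road infrastructure",
}


def broader_category(detailed_label):
    """Map a detailed label to its broader category, ignoring case and spacing."""
    key = " ".join(detailed_label.lower().split())
    return BROADER_CATEGORY.get(key, "unclassified")


print(broader_category("Primary roads"))
print(broader_category("  secondary   roads "))
print(broader_category("pipelines"))
```

Returning an explicit fallback instead of raising an error makes it easy to list every label that still needs an editorial decision.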

During this process, we observed that residential buildings, a key subcategory, were not consistently measured even for the same flood. In some products, they were reported in hectares; in others, by building count. Due to this inconsistency, we excluded residential building data from the final analysis.

We also found that Copernicus did not always apply the same unit of measurement to the same category across different floods, leading to further inconsistencies. Some infrastructure data was lost as a result.
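Mixed units of this kind can be flagged automatically before any totals are computed. The following sketch lists every category that appears with more than one unit; the field names and the sample rows are hypothetical, modeled on the residential buildings case described above.

```python
from collections import defaultdict


def inconsistent_categories(rows):
    """Return {category: sorted units} for categories reported in several units."""
    units = defaultdict(set)
    for row in rows:
        units[row["category"]].add(row["unit"])
    return {cat: sorted(found) for cat, found in units.items() if len(found) > 1}


# Made-up rows: buildings reported both by area and by count.
observations = [
    {"category": "residential buildings", "unit": "ha"},
    {"category": "residential buildings", "unit": "count"},
    {"category": "arable land", "unit": "ha"},
    {"category": "arable land", "unit": "ha"},
]

flagged = inconsistent_categories(observations)
print(flagged)
```

Categories that end up in the flagged set cannot be summed safely and need either a conversion rule or, as in our case, exclusion from the analysis.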

Identifying the region of an AOI

Copernicus AOIs do not correspond to official regional boundaries. Instead, they act as reference zones designated for satellite monitoring. To associate each AOI with an official region, we manually searched AOI names using online resources and Eurostat’s Local Administrative Units (LAU) dataset. This dataset links each locality to its corresponding NUTS 3 code, as per Eurostat’s NUTS classification, which also categorizes regions as coastal, rural, urban, or intermediate.
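Part of this lookup can be automated for AOI names that match a locality name exactly, leaving only the remainder for manual research. The sketch below assumes the LAU table has already been reduced to a simple name-to-code dictionary; the locality names and the codes (`XX001`, `XX002`) are placeholders, not real NUTS 3 codes.

```python
def normalize(name):
    """Lowercase a place name and collapse repeated whitespace."""
    return " ".join(name.casefold().split())


def match_aois(aoi_names, lau_to_nuts3):
    """Split AOI names into those found in the LAU lookup and those left over."""
    lookup = {normalize(k): v for k, v in lau_to_nuts3.items()}
    matched, unmatched = {}, []
    for name in aoi_names:
        code = lookup.get(normalize(name))
        if code:
            matched[name] = code
        else:
            unmatched.append(name)  # needs manual research
    return matched, unmatched


# Placeholder localities and codes, for illustration only.
lau_to_nuts3 = {"Town A": "XX001", "Town B": "XX002"}
matched, unmatched = match_aois(["town a", "Town  B", "River Valley"], lau_to_nuts3)
print(matched, unmatched)
```

AOI names often describe a river basin or a broader area rather than a municipality, so the unmatched list is expected to be long and the manual step remains necessary.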

Constructing the decade-long dataset

After creating the Copernicus dataset, we developed a smaller version to merge it with Public EM-DAT and Hanze. In the dataset spanning the decade, each row represents an impacted region.

To ensure consistency across the three sources, we standardized the dataset structure by including shared columns: a unique flood ID, year, date, country, affected regions, regional typologies, affected population, fatalities, flood type and cause (when available), and data sources.
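The shared structure can be expressed as a fixed list of columns plus one renaming table per source. In the sketch below, the shared column names follow the list above, but their exact spelling, the source-side column names, and the sample record are all assumptions made for illustration.

```python
SHARED_COLUMNS = [
    "flood_id", "year", "date", "country", "region", "typology",
    "affected_population", "fatalities", "flood_type", "cause", "source",
]


def harmonize(record, column_map, source):
    """Rename a source record's fields to the shared columns; missing ones stay None."""
    row = dict.fromkeys(SHARED_COLUMNS)
    for source_name, shared_name in column_map.items():
        if source_name in record:
            row[shared_name] = record[source_name]
    row["source"] = source
    return row


# Hypothetical column names for one source and a made-up record.
example_map = {"ID": "flood_id", "Year": "year", "Country": "country", "Deaths": "fatalities"}
example_record = {"ID": "F-001", "Year": 2021, "Country": "Serbia", "Deaths": 0, "Notes": "ignored"}

row = harmonize(example_record, example_map, source="Public EM-DAT")
print(row)
```

Filling absent fields with `None` rather than zero keeps "not reported" distinct from "reported as zero", which matters when the sources track fatalities and affected population differently.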

As mentioned earlier, we manually added the latest NUTS region codes to both the Copernicus and Public EM-DAT datasets. We used NUTS 3 codes, the smallest statistical regions available, to allow for classification by urban-rural and coastal-inland typologies using Eurostat’s NUTS regions file. We used the same file to update Hanze’s region codes, which corresponded to the year 2021, to ensure alignment with the latest classifications.

The most challenging part was translating region names into English, as Eurostat provides them in their respective national languages.

Official data gap on natural disasters

There is a clear need for a unified European dataset with standardized data collection methodologies, especially as natural disasters become an increasing environmental and humanitarian crisis. Addressing this data gap should be a priority in European policy.

Such data is typically collected by national authorities, and its availability depends on each country's policies and commitment to transparency. Our research showed that most data on deaths caused by floods originates from media publications and local reporting rather than official sources, while economic loss data, often gathered by insurance companies, remains unavailable to the public.

In this context, Copernicus satellite data stands out as an open, free, and reliable resource for tracking the impact of natural disasters. This methodology can also be adapted to support other European projects focused on disasters such as wildfires and earthquakes, or to strengthen local-level monitoring, preparedness, and response.

Jupyter notebooks for collection, merging, and analysis here.