# EuroCrops (Cloud-Native Geo distribution)

This dataset is a copy of the [EuroCrops](https://github.com/maja601/EuroCrops) dataset, offering the data in [cloud-native geospatial](https://cloudnativegeo.org) formats. The overview of the original dataset is:

> EuroCrops is a dataset collection combining all publicly available self-declared crop reporting datasets from countries of the European Union. The project is funded by the German Space Agency at DLR on behalf of the Federal Ministry for Economic Affairs and Climate Action (BMWK). This work is licensed under a [Creative Commons Attribution 4.0 International License](https://creativecommons.org/licenses/by/4.0/).

> Right now EuroCrops only includes vector data, but stay tuned for a version that includes satellite imagery!

> For any questions, please refer to our [FAQs](https://github.com/maja601/EuroCrops/wiki/FAQs) or use the Discussions/Issues to reach out to us. 

You can read more details on the dataset itself on the main EuroCrops sites at https://github.com/maja601/EuroCrops and https://www.eurocrops.tum.de/

## About the data formatting and structure

The original EuroCrops dataset is distributed as Shapefiles, one per country. It takes the original data from each country's provider as-is and adds 3-4 attributes that are consistent across all the country files. Because the source data did not share a common projection, the files come in a variety of projections. The folder named [unprojected](https://beta.source.coop/cholmes/eurocrops/browse/unprojected) contains the original data structure in different formats.

Note that a few French boundaries have points far outside of France, and the Romanian boundaries seem to overlap each other. We'll work with the source provider to try to clean those up, and will redistribute the data here when ready.

### Modifications

There are several different data types in this repository. The first set makes no changes to the original files, but provides them in alternate formats ([GeoParquet](https://geoparquet.org) and [FlatGeobuf](https://flatgeobuf.org)).

The second set reprojects all geometries into long/lat to make the data easy to work with as a single dataset (about half the source files were already in that projection; the rest used country-specific ones).

The third set of datasets is also in long/lat and additionally removes all attributes except those that the EuroCrops project made common across datasets:

| Attribute Name | Explanation                                                 |
| -------------- | ----------------------------------------------------------- |
| EC_trans_n     | The original crop name translated into English              |
| EC_hcat_n      | The machine-readable HCAT name of the crop                  |
| EC_hcat_c      | The 10-digit HCAT code indicating the hierarchy of the crop |

The NUTS3 attribute was not included, as it was not consistently present in the datasets. A future iteration of this dataset may add that or other country information to the data itself, so that the Parquet files can be partitioned on it.
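
Because `EC_hcat_c` encodes the crop hierarchy, one way to pull out a whole branch of related crops is to match on a code prefix. The sketch below assumes (not confirmed here) that crops in the same HCAT branch share leading digits, and it works on plain dictionaries keyed by the harmonized attribute names, however you loaded them. The helper names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Mapping


def filter_by_hcat_prefix(records: Iterable[Mapping], prefix: str) -> List[Mapping]:
    """Keep records whose EC_hcat_c code starts with `prefix`.

    Assumption: crops in the same HCAT branch share leading digits.
    """
    if not (prefix.isdigit() and 1 <= len(prefix) <= 10):
        raise ValueError("prefix must be 1 to 10 digits")
    # Codes may be read as integers or strings; normalise to a 10-digit string.
    return [r for r in records if str(r["EC_hcat_c"]).zfill(10).startswith(prefix)]


def crop_counts(records: Iterable[Mapping]) -> Counter:
    """Count records per harmonized crop name (EC_hcat_n)."""
    return Counter(r["EC_hcat_n"] for r in records)
```

For example, `crop_counts(filter_by_hcat_prefix(rows, "33"))` would tally the crop names in one branch, where `rows` is any iterable of attribute dictionaries and `"33"` is only a placeholder prefix.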

## Access the data

The easiest way to visualize the data is with this [Felt map](https://felt.com/map/Eurocrops-DRrR1lxoQEqvmZBz6e9APPC), where you can see (almost) all the data visualized and styled. You can also directly access [this PMTiles file](https://s3.us-west-2.amazonaws.com/us-west-2.opendata.source.coop/cholmes/eurocrops/eurocrops-all.pmtiles), and see it [shown with the PMTiles Viewer](https://protomaps.github.io/PMTiles/?url=https%3A%2F%2Fs3.us-west-2.amazonaws.com%2Fus-west-2.opendata.source.coop%2Fcholmes%2Feurocrops%2Feurocrops-all.pmtiles#map=2.36/41.89/31.42). Both contain only the harmonized attributes and are in the Web Mercator projection.
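
The PMTiles Viewer link passes the percent-encoded file URL in its `url` query parameter, with an optional `#map=zoom/lat/lon` fragment for the initial view. A minimal sketch for building the same kind of link, assuming the viewer keeps accepting that parameter; `pmtiles_viewer_link` is a hypothetical helper name.

```python
from typing import Optional
from urllib.parse import quote

VIEWER = "https://protomaps.github.io/PMTiles/"
EUROCROPS_PMTILES = (
    "https://s3.us-west-2.amazonaws.com/us-west-2.opendata.source.coop"
    "/cholmes/eurocrops/eurocrops-all.pmtiles"
)


def pmtiles_viewer_link(pmtiles_url: str, map_view: Optional[str] = None) -> str:
    """Build a PMTiles Viewer link; map_view is an optional 'zoom/lat/lon' string."""
    # Encode every reserved character (including ':' and '/') so the file URL
    # survives as a single query-parameter value.
    link = f"{VIEWER}?url={quote(pmtiles_url, safe='')}"
    if map_view:
        link += f"#map={map_view}"
    return link


if __name__ == "__main__":
    print(pmtiles_viewer_link(EUROCROPS_PMTILES, "2.36/41.89/31.42"))
```

Calling it with the EuroCrops PMTiles URL and `"2.36/41.89/31.42"` reproduces the viewer link above.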

The original shapefiles are in the [unprojected/shapefiles](https://beta.source.coop/cholmes/eurocrops/browse/unprojected/shapefiles) directory, and you can find alternate formats in [unprojected/flatgeobuf](https://beta.source.coop/cholmes/eurocrops/browse/unprojected/flatgeobuf) and [unprojected/geoparquet](https://beta.source.coop/cholmes/eurocrops/browse/unprojected/geoparquet).

The [geoparquet-projected/](https://beta.source.coop/cholmes/eurocrops/browse/geoparquet-projected) folder has all the data projected to long/lat, but retains all the original fields.

Finally, you can get the data as a [single FlatGeobuf file](https://data.source.coop/cholmes/eurocrops/eurocrops-harmonized-only.fgb) (10.8 GB) that is projected and has only the final harmonized fields. We're also making available a [DuckDB](https://duckdb.org) database to experiment with distributing the data directly in that format. The geometries will likely just show up as a blob that you'll have to parse with the [spatial extension](https://duckdb.org/docs/extensions/spatial).
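
To pull just one area out of the harmonized FlatGeobuf, one option is DuckDB's spatial extension. The sketch below only builds the SQL; it assumes that `ST_Read` can open the file over HTTPS once `httpfs` is loaded, that the geometry column comes back as `geom`, and that coordinates are long/lat. Whether the bounding-box filter avoids reading the full 10.8 GB depends on the DuckDB version and on the file carrying a spatial index, which has not been verified. Helper names are hypothetical.

```python
# Sketch only: builds SQL strings; executing them requires the `duckdb`
# Python package, which is deliberately not imported here.
from typing import List

HARMONIZED_FGB = "https://data.source.coop/cholmes/eurocrops/eurocrops-harmonized-only.fgb"

# Extensions for geospatial formats (spatial) and remote files (httpfs).
SETUP_STATEMENTS: List[str] = ["INSTALL spatial", "LOAD spatial", "INSTALL httpfs", "LOAD httpfs"]


def harmonized_bbox_query(
    min_lon: float, min_lat: float, max_lon: float, max_lat: float,
    url: str = HARMONIZED_FGB, limit: int = 1000,
) -> str:
    """Select harmonized fields for parcels intersecting a long/lat bounding box."""
    if min_lon >= max_lon or min_lat >= max_lat:
        raise ValueError("expected min < max on both axes")
    # float() + repr() guarantees the box corners are plain numeric literals.
    box = ", ".join(repr(float(v)) for v in (min_lon, min_lat, max_lon, max_lat))
    # Double any single quotes so the URL stays a valid SQL string literal.
    quoted_url = url.replace("'", "''")
    return (
        "SELECT EC_trans_n, EC_hcat_n, EC_hcat_c, geom\n"
        f"FROM ST_Read('{quoted_url}')\n"
        f"WHERE ST_Intersects(geom, ST_MakeEnvelope({box}))\n"
        f"LIMIT {int(limit)}"
    )
```

To run it, open a connection with `duckdb.connect()`, `execute` each entry of `SETUP_STATEMENTS`, then pass the query to `con.sql(...)`. For the DuckDB database, if the geometry blob turns out to be WKB, `ST_GeomFromWKB(column)` should convert it into a `GEOMETRY` value.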

## Dataset License

The data is licensed under [Creative Commons Attribution 4.0 International License](https://creativecommons.org/licenses/by/4.0/).

## Authors

* Maja Schneider
* Amelie Broszeit
* Marco Körner

### Additional Processing

* Chris Holmes

## Citation & DOI

From https://github.com/maja601/EuroCrops#reference

**Disclaimer**: The official reference will follow soon. If you use a country's data, please also cite that country's source.

```
@Misc{schneider2022eurocrops21,
 author     = {Schneider, Maja and K{\"o}rner, Marco},
 title      = {EuroCrops},
 DOI        = {10.5281/zenodo.6866846},
 type       = {Dataset},
 publisher  = {Zenodo},
 year       = {2022}
}
```

Additional references:

```
@InProceedings{Schneider2022Challenges,
  title     = {Challenges and Opportunities of Large Transnational Datasets: A Case Study on European Administrative Crop Data},
  author    = {Schneider, Maja and Marchington, Christian and K{\"o}rner, Marco},
  booktitle = {Workshop on Broadening Research Collaborations in ML (NeurIPS 2022)},
  year      = {2022}
}
```

```
@InProceedings{Schneider2022Harnessing,
  title         = {Harnessing Administrative Data Inventories to Create a Reliable Transnational Reference Database for Crop Type Monitoring},
  author        = {Schneider, Maja and K{\"o}rner, Marco},
  booktitle     = {IGARSS 2022-2022 IEEE International Geoscience and Remote Sensing Symposium},
  pages         = {5385--5388},
  year          = {2022},
  organization  = {IEEE}
}
```

```
@InProceedings{Schneider2021EPE,
  author        = {Schneider, Maja and Broszeit, Amelie and K{\"o}rner, Marco},
  booktitle     = {Proceedings of the Conference on Big Data from Space (BiDS)},
  title         = {{EuroCrops}: A Pan-European Dataset for Time Series Crop Type Classification},
  editor        = {Soille, Pierre and Loekken, Sveinung and Albani, Sergio},
  publisher     = {Publications Office of the European Union},
  date          = {2021-05-18},
  doi           = {10.2760/125905},
  eprint        = {2106.08151},
  eprintclass   = {eess.IV,cs.CV,cs.LG},
  eprinttype    = {arxiv}
}
```

```
@Misc{Schneider2021TEC,
  author       = {Schneider, Maja and K{\"o}rner, Marco},
  date         = {2021-06-15},
  title        = {{TinyEuroCrops}},
  doi          = {10.14459/2021MP1615987},
  organization = {Technical University of Munich (TUM)},
  type         = {Dataset},
  url          = {https://mediatum.ub.tum.de/1615987}
}
```