# Forecasts submitted to the Ecological Forecasting Initiative's NEON Challenge

***In development, experimental***

*At this time, canonical sources are still hosted on* `data.ecoforecast.org`.


## Resources

- [STAC Catalog](https://radiantearth.github.io/stac-browser/#/external/raw.githubusercontent.com/eco4cast/neon4cast-catalog/main/stac/catalog.json)


## Quickstart

Arrow provides an easy way to access remote Parquet files from most languages widely used in data science. Here we access all forecasts submitted to a particular theme. Users who need only a single model should specify it in the path for faster access; the STAC catalog can be used to explore the available models.


The examples below show 'cloud-native' connections to the data: 'lazy' connections that do not download the entire asset but let us filter, subset, and operate directly on the remote data product.

### R Access

```{r}
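library(arrow)

# Build the URI for every forecast submitted to one theme
base = "s3://anonymous@us-west-2.opendata.source.coop"
repo = "eco4cast/neon4cast-forecasts"
theme = "aquatics"
uri = glue::glue("{base}/{repo}/parquet/{theme}?region=us-west-2")

# Open a lazy connection; no forecast data is downloaded at this point
open_dataset(uri)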
```

### Python Access


```{python}
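import pyarrow.dataset as ds

# Build the URI for every forecast submitted to one theme
base = "s3://anonymous@us-west-2.opendata.source.coop"
repo = "eco4cast/neon4cast-forecasts"
theme = "aquatics"
uri = f"{base}/{repo}/parquet/{theme}?region=us-west-2"

# Open a lazy connection; no forecast data is downloaded at this point
ds.dataset(uri, format="parquet")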
```


### duckdb

At this time, `duckdb` access is substantially faster than `arrow`.


### R + duckdb

R users can get a dplyr-compatible lazy remote tibble as follows:

```{r}
# remotes::install_github("cboettig/duckdbfs")
library(duckdbfs)

# Build the URI for every forecast submitted to one theme
base = "s3://anonymous@us-west-2.opendata.source.coop"
repo = "eco4cast/neon4cast-forecasts"
theme = "aquatics"
uri = glue::glue("{base}/{repo}/parquet/{theme}?region=us-west-2")

# Lazy remote table; dplyr verbs are evaluated by duckdb
df = open_dataset(uri)
```

### Python + duckdb

[ibis](https://ibis-project.org/) provides a more Pythonic interface to SQL:

```{python}
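import ibis
con = ibis.duckdb.connect()

# Glob pattern matching every parquet file under one theme
base = "s3://us-west-2.opendata.source.coop"
repo = "eco4cast/neon4cast-forecasts"
theme = "aquatics"
uri = f"{base}/{repo}/parquet/{theme}/**"

# Enable S3 access in duckdb and set the bucket region
con.raw_sql("""
INSTALL httpfs;
LOAD httpfs;
SET s3_region='us-west-2';
""")

# Lazy table expression; data is read only when a query is executed
db = con.read_parquet(uri)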
```

