# EXIOBASE 3: Global Input-Output Database

EXIOBASE 3 is a global, multi-regional input-output database, offering a detailed view of the world economy and its environmental impacts. It covers 49 regions and 200 sectors from 1995 to 2022. This dataset is ideal for analyzing trade, economic structures, and environmental footprints.

## Data Structure

The data is stored in Parquet format and partitioned by `year`, `format` (ixi/pxp), and `matrix` type (Z, Y, F_satellite, F_impacts) to allow for efficient querying.

```
s3://us-west-2.opendata.source.coop/youssef-harby/exiobase-3/
└── 4588235/
    ├── metadata/
    │   ├── f_impacts_units_1995_ixi.parquet
    │   ├── ... (112 files)
    └── parquet/
        ├── year={1995,1996,...,2022}/
        │   ├── format={ixi,pxp}/
        │   │   ├── matrix={F_impacts,F_satellite,Y,Z}/
        │   │   │   └── data.parquet
```

## Quickstart with DuckDB

The following examples demonstrate how to effectively query the EXIOBASE 3 dataset using DuckDB.

### A Note on Performance

For complex queries, such as time-series analysis or combining large matrices, direct queries may time out. A more robust approach is to break down the query into smaller parts and combine the results in your client application (e.g., Python with Pandas, R, etc.). The examples below follow this best practice.

### Top 10 Exporters to the US (2022)

To calculate the total exports to the United States, you should query the inter-industry (`Z`) and final demand (`Y`) matrices separately and then sum the results.

**Step 1: Get Exports from Inter-Industry Transactions**
```sql
SELECT origin_region, SUM(value) AS total_z
FROM 's3://us-west-2.opendata.source.coop/youssef-harby/exiobase-3/4588235/parquet/year=2022/format=ixi/matrix=Z/data.parquet'
WHERE dest_region = 'US' AND origin_region <> 'US' AND origin_region NOT LIKE 'W%'
GROUP BY origin_region
ORDER BY total_z DESC
LIMIT 20;
```

**Step 2: Get Exports from Final Demand**
```sql
SELECT origin_region, SUM(value) AS total_y
FROM 's3://us-west-2.opendata.source.coop/youssef-harby/exiobase-3/4588235/parquet/year=2022/format=ixi/matrix=Y/data.parquet'
WHERE dest_region = 'US' AND origin_region <> 'US' AND origin_region NOT LIKE 'W%'
GROUP BY origin_region
ORDER BY total_y DESC
LIMIT 20;
```
*Combine the `total_z` and `total_y` results for each `origin_region` in your client to get the final ranking.*

### China-US Trade (1995-2022)

Similarly, for time-series analysis, query each matrix across all years and then join the results.

**Step 1: Get Time-Series from Inter-Industry Transactions**
```sql
SELECT year, SUM(value) AS z_value
FROM read_parquet('s3://us-west-2.opendata.source.coop/youssef-harby/exiobase-3/4588235/parquet/year=*/format=ixi/matrix=Z/data.parquet', hive_partitioning=1)
WHERE origin_region = 'CN' AND dest_region = 'US' AND year BETWEEN 1995 AND 2022
GROUP BY year
ORDER BY year;
```

**Step 2: Get Time-Series from Final Demand**
```sql
SELECT year, SUM(value) AS y_value
FROM read_parquet('s3://us-west-2.opendata.source.coop/youssef-harby/exiobase-3/4588235/parquet/year=*/format=ixi/matrix=Y/data.parquet', hive_partitioning=1)
WHERE origin_region = 'CN' AND dest_region = 'US' AND year BETWEEN 1995 AND 2022
GROUP BY year
ORDER BY year;
```
*Join the results on the `year` column and sum `z_value` and `y_value` to get the total annual exports.*

### Top 20 CO2 Emitters (2022)

This query identifies the top 20 regions by total CO2 emissions, summing all CO2-related stressors from the `F_satellite` matrix.

```sql
SELECT
    region,
    SUM(value) AS total_co2,
    FIRST(unit) AS unit
FROM 's3://us-west-2.opendata.source.coop/youssef-harby/exiobase-3/4588235/parquet/year=2022/format=ixi/matrix=F_satellite/data.parquet'
WHERE stressor LIKE '%CO2%'
GROUP BY region
ORDER BY total_co2 DESC
LIMIT 20;
```
