DBT: Diet Trase Coffee 2020 External
File location: s3://trase-storage/diet-trase/diet_trase_coffee_2020_external.parquet
DBT model name: diet_trase_coffee_2020_external
Explore on Metabase: Full table; summary statistics
DBT details
- Lineage
- Dbt path: trase_production.main.diet_trase_coffee_2020_external
- Containing yaml link: trase/data_pipeline/models/diet_trase/_schema.yml
- Model file: trase/data_pipeline/models/diet_trase/diet_trase_coffee_2020_external.py
- Tags: external, coffee, diet-trase-coffee
Description
Shareable version of the results with a selected set of fields and only including exports.
Details
| Column | Type | Description |
|---|---|---|
| year | BIGINT | |
| country_of_production | VARCHAR | |
| country_of_production_iso2 | VARCHAR | |
| port_of_export_name | VARCHAR | |
| hs6 | VARCHAR | |
| exporter_name | VARCHAR | |
| exporter_node_id | BIGINT | |
| exporter_group | VARCHAR | |
| importer_name | VARCHAR | |
| importer_group | VARCHAR | |
| country_of_first_import | VARCHAR | |
| country_of_first_import_iso2 | VARCHAR | |
| country_of_first_import_economic_bloc | VARCHAR | |
| mass_tonnes | DOUBLE | |
| fob | DOUBLE | |
Models / Seeds
model.trase_duckdb.diet_trase_coffee_2020
No called script or script source not found.
"""
Shareable version of the Diet Trase results with a selected set of fields and only including exports.
"""

import polars as pl


def model(dbt, cursor):
    dbt.config(
        materialized="external",
    )

    lf = dbt.ref("diet_trase_coffee_2020").pl(lazy=True)

    # keep exports only: drop rows flagged as domestic
    lf = lf.filter(~pl.col("is_domestic"))

    columns = [
        "year",
        "country_of_production",
        "country_of_production_iso2",
        "port_of_export_name",
        "hs6",
        "exporter_name",
        "exporter_node_id",
        "exporter_group",
        "importer_name",
        "importer_group",
        "country_of_first_import",
        "country_of_first_import_iso2",
        "country_of_first_import_economic_bloc",
        "mass_tonnes",
        "fob",
    ]
    lf = lf.select(columns)

    # group by every selected column except the two measures, then sum mass_tonnes and fob
    group_cols = [c for c in columns if c not in ("mass_tonnes", "fob")]
    lf = (
        lf.group_by(group_cols)
        .agg(
            [
                pl.sum("mass_tonnes").alias("mass_tonnes"),
                pl.sum("fob").alias("fob"),
            ]
        )
        .with_columns(
            [
                pl.col("mass_tonnes").round(4),
                pl.col("fob").round(2),
            ]
        )
    )
    return lf
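
The model filters out domestic rows, groups by the descriptive columns, sums `mass_tonnes` and `fob`, and rounds them to 4 and 2 decimal places. The sketch below walks through the same steps on a few made-up rows, using plain Python rather than polars so it runs without dependencies. The rows, the reduced key set `KEY_COLS`, and the function name `export_summary` are all hypothetical and only for illustration. Python's built-in `round` may also break ties differently from polars.

```python
from collections import defaultdict

# Hypothetical toy rows; the real model groups by 13 descriptive columns.
rows = [
    {"exporter_name": "A", "country_of_first_import": "GERMANY",
     "is_domestic": False, "mass_tonnes": 1.23456, "fob": 100.004},
    {"exporter_name": "A", "country_of_first_import": "GERMANY",
     "is_domestic": False, "mass_tonnes": 2.00001, "fob": 50.002},
    {"exporter_name": "B", "country_of_first_import": "BRAZIL",
     "is_domestic": True, "mass_tonnes": 9.0, "fob": 0.0},
]

# Assumed, shortened stand-in for the model's group_cols.
KEY_COLS = ("exporter_name", "country_of_first_import")


def export_summary(rows):
    """Drop domestic rows, sum the measures per key, then round them."""
    totals = defaultdict(lambda: [0.0, 0.0])
    for row in rows:
        if row["is_domestic"]:
            continue  # mirrors lf.filter(~pl.col("is_domestic"))
        key = tuple(row[c] for c in KEY_COLS)
        totals[key][0] += row["mass_tonnes"]
        totals[key][1] += row["fob"]
    # mirrors .round(4) on mass_tonnes and .round(2) on fob
    return {k: (round(m, 4), round(f, 2)) for k, (m, f) in totals.items()}


summary = export_summary(rows)
```

The two export rows for exporter "A" collapse into a single record, and the domestic row for exporter "B" does not appear in the result.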