Gold Ethiopia Coffee 2020
s3://trase-storage/ethiopia/trade/bol/2020/gold/gold_ethiopia_coffee_2020.parquet
Dbt path: trase_production.main_ethiopia_coffee.gold_ethiopia_coffee_2020
Explore on Metabase: Full table; summary statistics
Containing yaml file link: trase/data_pipeline/models/ethiopia/trade/bol/2020/_schema_ethiopia_coffee.yml
Model file link: trase/data_pipeline/models/ethiopia/trade/bol/2020/gold_ethiopia_coffee_2020.py
Dbt test runs & lineage: Test results ยท Lineage
Full dbt_docs page: Open in dbt docs (includes lineage graph -at the bottom right-, tests, and downstream dependencies)
Tags: gold, ethiopia, coffee, trade, 2020, diet-trase-coffee
gold_ethiopia_coffee_2020
Description
Pre-processing done initially through an R script called by silver_ethiopia_coffee_2020 which
* Loads the vendor excel file
* Cleans the country names
* Fixes 3 outliers based on exchange rate, and a record without FOB
* Checks against comtrade in total, and per country
Then a python file called by gold_ethiopia_coffee_2020:
* Cleans additional country names, using as reference the database
* Gets the related trader information from the database
* Note that a runbook ingested the missing traders
Details
| Column | Type | Description |
|---|---|---|
year |
INTEGER |
|
exporter_label |
VARCHAR |
|
exporter_node_id |
INTEGER |
|
exporter_group_name |
VARCHAR |
|
country_of_destination |
VARCHAR |
|
mass_tonnes |
DOUBLE |
|
fob |
DOUBLE |
|
importer_label |
VARCHAR |
|
port_of_export_label |
VARCHAR |
|
hs6 |
VARCHAR |
Models / Seeds
model.trase_duckdb.silver_ethiopia_coffee_2020
No called script or script source not found.
# gold_ethiopia_coffee_2020
from trase.data.ethiopia.trade.bol.y2020.gold import gold_ethiopia_coffee_2020
def model(dbt, session):
dbt.config(materialized="external")
# Declaring the sources so they appear in the documentation / lineage
silver_trade_data = dbt.ref("silver_ethiopia_coffee_2020")
df = gold_ethiopia_coffee_2020.get_gold_ethiopia_coffee_2020()
final_data = session.sql(
f"""
WITH final_fields AS (
SELECT
YEAR::INT AS year,
EXPORTER_NAME AS exporter_label,
EXPORTER_ID AS exporter_node_id,
EXPORTER_GROUP AS exporter_group_name,
COUNTRY_DESTINATION AS country_of_destination,
NET_WEIGHT/1000 AS mass_tonnes,
FOB_VALUE_IN_USD AS fob,
BUYER_NAME AS importer_label,
CAST(NULL AS VARCHAR) AS port_of_export_label,
HS_CODE AS hs6
FROM df
)
SELECT * FROM final_fields
"""
).arrow()
# Saving as arrow as pandas converts the port_of_export_label to int32 even with casting
return final_data