Diet Trase Coffee Trader Parents
s3://trase-storage/diet-trase/coffee_traders/diet_trase_coffee_trader_parents.parquet
Dbt path: trase_production.main.diet_trase_coffee_trader_parents
Explore on Metabase: Full table; summary statistics
Containing yaml file link: trase/data_pipeline/models/diet_trase/coffee_traders/_schema.yml
Model file link: trase/data_pipeline/models/diet_trase/coffee_traders/diet_trase_coffee_trader_parents.sql
Dbt test runs & lineage: Test results ยท Lineage
Full dbt_docs page: Open in dbt docs (includes lineage graph -at the bottom right-, tests, and downstream dependencies)
Tags: coffee, diet-trase, traders
diet_trase_coffee_trader_parents
Description
Helper table based on data manually built linking trader groups to parent companies
Details
| Column | Type | Description |
|---|---|---|
db_trader_name |
VARCHAR |
|
trader_node_id |
BIGINT |
|
db_group_name |
VARCHAR |
|
group_parent |
VARCHAR |
|
group_parent_start_date |
DATE |
|
group_parent_end_date |
DATE |
|
group_parent_source |
VARCHAR |
|
group_parent_notes |
VARCHAR |
Models / Seeds
source.trase_duckdb.trase-storage-raw.diet_trase_coffee_2020_trader_group_parents-
model.trase_duckdb.postgres_tradersMacros
macro.trase_duckdb.excel_extension
Sources
['trase-storage-raw', 'diet_trase_coffee_2020_trader_group_parents']
No called script or script source not found.
{{
config(
pre_hook='{{ excel_extension() }}',
materialized='external'
)
}}
WITH
excel_data AS (
SELECT DISTINCT
x."Old Company Name" AS trader_label,
x."Cleaned Company Name" AS initial_proposed_name,
x."Group Name" AS initial_proposed_group,
x."Parent Name" AS group_proposed_parent,
CAST(x."Start date" AS DATE) AS group_parent_start_date,
CAST(x."End date" AS DATE) AS group_parent_end_date,
x."Parent Source" AS group_parent_source,
x."Notes" AS group_parent_notes
FROM {{ source('trase-storage-raw', 'diet_trase_coffee_2020_trader_group_parents') }} AS x
WHERE trader_label NOT IN ('UNKNOWN', 'UNKNOWN AFFILIATION')
AND group_proposed_parent NOT IN ('M/S TATA COFFEE') -- filtering out ambigous parent for TATA COFFEE
),
db_traders AS (
SELECT
label AS db_label,
name AS db_trader_name,
json_extract_string(groups[1], '$.group') AS db_group_name,
trader_node_id
FROM {{ ref('postgres_traders') }}
CROSS JOIN UNNEST(labels) AS t(label)
),
trader_parent AS (
SELECT DISTINCT
db.db_trader_name,
db.trader_node_id,
db.db_group_name,
e.group_proposed_parent AS group_parent,
-- The following are shared values, sometimes specified only once for each trader
ANY_VALUE(e.group_parent_start_date) AS group_parent_start_date,
ANY_VALUE(e.group_parent_end_date) AS group_parent_end_date,
ANY_VALUE(e.group_parent_source) AS group_parent_source,
ANY_VALUE(e.group_parent_notes) AS group_parent_notes
FROM excel_data AS e
LEFT JOIN db_traders AS db
ON e.trader_label = db.db_label
GROUP BY ALL
),
-- Used for debugging but not for final output
duplicated AS (
SELECT
db_trader_name,
COUNT(*) AS cnt
FROM trader_parent
GROUP BY 1
HAVING cnt > 1
)
SELECT *
FROM trader_parent
--WHERE db_trader_name IN (SELECT db_trader_name FROM duplicated)
ORDER BY db_trader_name