DBT: Diet Trase Coffee Trader Parents
File location: s3://trase-storage/diet-trase/coffee_traders/diet_trase_coffee_trader_parents.parquet
DBT model name: diet_trase_coffee_trader_parents
Explore on Metabase: Full table; summary statistics
Explore dependencies/lineage: link
Description
Helper table linking coffee trader groups (as known in the Trase PostgreSQL
database) to their parent companies, based on a manually curated Excel
workbook (CompanyWorkbookMappingsFinal_08_09_25.xlsx).
The Excel workbook lists, for each historical trader label, the cleaned
company name, group, and parent, along with validity dates,
a source for the parent attribution, and free-text notes. This model
joins those rows to postgres_traders via the trader label, so the
output exposes the canonical DB trader name and node id alongside the
proposed parent.
Notes:
- Rows with
trader_labelequal to'UNKNOWN'or'UNKNOWN AFFILIATION'are excluded. - The ambiguous
'M/S TATA COFFEE'parent row is excluded. - The join is a
LEFT JOINfrom the Excel data topostgres_traders, so Excel entries that do not match any DB trader label appear with NULLdb_trader_name,trader_node_id, anddb_group_name. - A trader can appear multiple times if the workbook records more than one parent for it (for example, across different validity periods).
Details
| Column | Type | Description |
|---|---|---|
db_trader_name |
VARCHAR |
Canonical trader name from postgres_traders.name, matched to the Excel Old Company Name via the trader's labels array. `NULL`` when the Excel label has no match in the DB. |
trader_node_id |
BIGINT |
Trase node id of the matched trader in postgres_traders. NULL when the Excel label has no match in the DB. |
db_group_name |
VARCHAR |
Group name associated with the trader in the DB, extracted from the first element of postgres_traders.groups ($.group). May differ from the workbook's proposed group. |
group_parent |
VARCHAR |
Parent company proposed for the trader's group, from the workbook's Parent Name column. |
group_parent_start_date |
DATE |
Start date of the parent attribution, from the workbook's Start date column. Aggregated with ANY_VALUE since the workbook sometimes specifies it on only one of several rows for the same (trader, parent) pair. |
group_parent_end_date |
DATE |
End date of the parent attribution, from the workbook's End date column. Aggregated with ANY_VALUE (see group_parent_start_date). |
group_parent_source |
VARCHAR |
Source supporting the parent attribution, from the workbook's Parent Source column. Aggregated with ANY_VALUE. |
group_parent_notes |
VARCHAR |
Free-text notes about the parent attribution, from the workbook's Notes column. Aggregated with ANY_VALUE. |
No data tests defined 🧐
Models
Sources
Macros
No called script or script source not found.
-- Links coffee trader groups (as known in `postgres_traders`) to their parent
-- companies, based on a manually curated Excel workbook.
--
-- The workbook (`CompanyWorkbookMappingsFinal_08_09_25.xlsx`) is keyed on
-- historical trader labels ("Old Company Name") and supplies, for each label,
-- a proposed parent company plus optional validity dates, source, and notes.
-- We join those rows to `postgres_traders` through the trader's `labels`
-- array to expose the canonical DB trader name, node id, and group alongside
-- the parent attribution.
--
-- The `excel_extension` pre-hook installs the DuckDB extension required to
-- read the Excel source.
{{
config(
pre_hook='{{ excel_extension() }}',
materialized='external'
)
}}
WITH
-- Workbook rows, with placeholder labels and an ambiguous TATA COFFEE parent
-- removed. Column aliases rename the workbook headers into snake_case fields
-- used by the rest of the model.
excel_data AS (
SELECT DISTINCT
x."Old Company Name" AS trader_label,
x."Cleaned Company Name" AS initial_proposed_name,
x."Group Name" AS initial_proposed_group,
x."Parent Name" AS group_proposed_parent,
CAST(x."Start date" AS DATE) AS group_parent_start_date,
CAST(x."End date" AS DATE) AS group_parent_end_date,
x."Parent Source" AS group_parent_source,
x."Notes" AS group_parent_notes
FROM {{ source('trase-storage-raw', 'diet_trase_coffee_2020_trader_group_parents') }} AS x
WHERE trader_label NOT IN ('UNKNOWN', 'UNKNOWN AFFILIATION')
AND group_proposed_parent NOT IN ('M/S TATA COFFEE') -- filtering out ambigous parent for TATA COFFEE
),
-- One row per (trader, label) in `postgres_traders`. The `labels` array is
-- unnested so each historical label can be matched against the workbook's
-- `trader_label`. `db_group_name` is read from the first entry of the
-- `groups` JSON array.
db_traders AS (
SELECT
label AS db_label,
name AS db_trader_name,
json_extract_string(groups[1], '$.group') AS db_group_name,
trader_node_id
FROM {{ ref('postgres_traders') }}
CROSS JOIN UNNEST(labels) AS t(label)
),
-- Join workbook entries to DB traders by label. LEFT JOIN: workbook rows that
-- don't match any DB trader still come through, with NULL db_* fields. The
-- workbook may carry several rows per (trader, parent) — for example, only
-- one of them filled in the start/end date — so we group and pick any value.
trader_parent AS (
SELECT DISTINCT
db.db_trader_name,
db.trader_node_id,
db.db_group_name,
e.group_proposed_parent AS group_parent,
-- The following are shared values, sometimes specified only once for each trader
ANY_VALUE(e.group_parent_start_date) AS group_parent_start_date,
ANY_VALUE(e.group_parent_end_date) AS group_parent_end_date,
ANY_VALUE(e.group_parent_source) AS group_parent_source,
ANY_VALUE(e.group_parent_notes) AS group_parent_notes
FROM excel_data AS e
LEFT JOIN db_traders AS db
ON e.trader_label = db.db_label
GROUP BY ALL
),
-- Debug-only CTE: traders that end up with more than one row in the output
-- (e.g. multiple parents over different validity periods). Not part of the
-- final SELECT — uncomment the WHERE clause below to inspect them.
duplicated AS (
SELECT
db_trader_name,
COUNT(*) AS cnt
FROM trader_parent
GROUP BY 1
HAVING cnt > 1
)
SELECT *
FROM trader_parent
--WHERE db_trader_name IN (SELECT db_trader_name FROM duplicated)
ORDER BY db_trader_name
-
Dbt path:
trase_production.main.diet_trase_coffee_trader_parents -
Containing yaml link: trase/data_pipeline/models/diet_trase/coffee_traders/_schema.yml
-
Model file: trase/data_pipeline/models/diet_trase/coffee_traders/diet_trase_coffee_trader_parents.sql
-
Tags:
coffee,diet-trase,traders