Skip to content

Diet Trase Coffee Trader Parents

s3://trase-storage/diet-trase/coffee_traders/diet_trase_coffee_trader_parents.parquet

Dbt path: trase_production.main.diet_trase_coffee_trader_parents

Explore on Metabase: Full table; summary statistics

Containing yaml file link: trase/data_pipeline/models/diet_trase/coffee_traders/_schema.yml

Model file link: trase/data_pipeline/models/diet_trase/coffee_traders/diet_trase_coffee_trader_parents.sql

Dbt test runs & lineage: Test results ยท Lineage

Full dbt_docs page: Open in dbt docs (includes lineage graph -at the bottom right-, tests, and downstream dependencies)

Tags: coffee, diet-trase, traders


diet_trase_coffee_trader_parents

Description

Helper table based on data manually built linking trader groups to parent companies


Details

Column Type Description
db_trader_name VARCHAR
trader_node_id BIGINT
db_group_name VARCHAR
group_parent VARCHAR
group_parent_start_date DATE
group_parent_end_date DATE
group_parent_source VARCHAR
group_parent_notes VARCHAR

Models / Seeds

  • source.trase_duckdb.trase-storage-raw.diet_trase_coffee_2020_trader_group_parents
  • model.trase_duckdb.postgres_traders

    Macros

    • macro.trase_duckdb.excel_extension

Sources

  • ['trase-storage-raw', 'diet_trase_coffee_2020_trader_group_parents']

No called script or script source not found.

{{ 
    config(
        pre_hook='{{ excel_extension() }}',
        materialized='external'
    ) 
}}

WITH
excel_data AS (
    SELECT DISTINCT
        x."Old Company Name" AS trader_label,
        x."Cleaned Company Name" AS initial_proposed_name,
        x."Group Name" AS initial_proposed_group,
        x."Parent Name" AS group_proposed_parent,
        CAST(x."Start date" AS DATE) AS group_parent_start_date,
        CAST(x."End date" AS DATE) AS group_parent_end_date,
        x."Parent Source" AS group_parent_source,
        x."Notes" AS group_parent_notes
    FROM {{ source('trase-storage-raw', 'diet_trase_coffee_2020_trader_group_parents') }} AS x
    WHERE trader_label NOT IN ('UNKNOWN', 'UNKNOWN AFFILIATION')
    AND group_proposed_parent NOT IN ('M/S TATA COFFEE') -- filtering out ambigous parent for TATA COFFEE
),
db_traders AS (
    SELECT
        label AS db_label,
        name AS db_trader_name,
        json_extract_string(groups[1], '$.group') AS db_group_name,
        trader_node_id
    FROM  {{ ref('postgres_traders') }}
    CROSS JOIN UNNEST(labels) AS t(label)
),
trader_parent AS (
    SELECT DISTINCT
        db.db_trader_name,
        db.trader_node_id,
        db.db_group_name,
        e.group_proposed_parent AS group_parent,
        -- The following are shared values, sometimes specified only once for each trader
        ANY_VALUE(e.group_parent_start_date) AS group_parent_start_date,
        ANY_VALUE(e.group_parent_end_date) AS group_parent_end_date,
        ANY_VALUE(e.group_parent_source) AS group_parent_source,
        ANY_VALUE(e.group_parent_notes) AS group_parent_notes
    FROM excel_data AS e
    LEFT JOIN db_traders AS db
    ON e.trader_label = db.db_label
    GROUP BY ALL
),
-- Used for debugging but not for final output
duplicated AS (
    SELECT 
        db_trader_name,
        COUNT(*) AS cnt
    FROM trader_parent
    GROUP BY 1
    HAVING cnt > 1
)
SELECT *
FROM trader_parent
--WHERE db_trader_name IN (SELECT db_trader_name FROM duplicated)
ORDER BY db_trader_name