Hs2017
s3://trase-storage/world/metadata/codes/hs/HS2017.csv
Dbt path: trase_production.main.hs2017
Explore on Metabase: Full table; summary statistics
Containing yaml file link: trase/data_pipeline/models/world/metadata/codes/hs/_schema.yml
Model file link: trase/data_pipeline/models/world/metadata/codes/hs/hs2017.py
Calls script: trase/data/world/metadata/codes/hs/HS2017.py
Dbt test runs & lineage: Test results · Lineage
Full dbt_docs page: Open in dbt docs (includes the lineage graph at the bottom right, tests, and downstream dependencies)
Tags: mock_model, codes, hs, metadata, world
hs2017
Description
This model was auto-generated from the .yml 'lineage' files in S3. The dbt model itself only raises an error; the script that actually created the data lives elsewhere, at trase/data/world/metadata/codes/hs/HS2017.py [permalink]. It was last run by Harry Biddle.
Details
| Column | Type | Description |
|---|---|---|
Models / Seeds
source.trase_duckdb.trase-storage-raw.classificationh5
Sources
['trase-storage-raw', 'classificationh5']
import pandas as pd
from trase.tools.aws.aws_helpers import read_json
from trase.tools.aws.metadata import write_csv_for_upload
data = read_json("world/metadata/codes/hs/originals/classificationH5.json")
assert data["className"] == "HS2017"
df = pd.DataFrame(row for row in data["results"])
# exclude section headings
df = df[df["parent"] != "#"].copy()
df["type"] = df["id"].str.len().apply(lambda n: {2: "hs2", 4: "hs4", 6: "hs6"}[n])
# exclude "not specified" codes
df = df[~df["id"].isin(["99", "9999", "999999"])]
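# strip the "<code> - " prefix from the description
# (str.lstrip would remove a set of characters, not a literal prefix; removeprefix needs Python 3.9+)
df["text"] = df.apply(lambda row: row["text"].removeprefix(f"{row['id']} - "), axis=1)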
# rename columns
df = df.rename(columns={"id": "code", "text": "description"}, errors="raise")
# select and re-order columns
df = df[["type", "code", "description"]]
write_csv_for_upload(df, "world/metadata/codes/hs/HS2017.csv")
import pandas as pd
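def model(dbt, cursor):
    # declare the upstream source so that dbt records the lineage
    dbt.source("trase-storage-raw", "classificationh5")
    # mock model: the data is produced by the external script, not by dbt
    raise NotImplementedError()
    return pd.DataFrame({"hello": ["world"]})  # unreachable placeholder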
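Example
The transformation performed by the HS2017.py script can be tried without S3 access. The following is a minimal sketch, not the production script: the `sample` dictionary and the `build_hs_table` function are hypothetical names introduced for illustration, and the sample rows are illustrative rather than copied from classificationH5.json. It strips the code prefix with `str.removeprefix` (Python 3.9+), since `str.lstrip` treats its argument as a set of characters rather than a literal prefix.

```python
import pandas as pd

# Hypothetical stand-in for classificationH5.json (illustrative rows only).
sample = {
    "className": "HS2017",
    "results": [
        {"id": "TOTAL", "text": "Total of all HS commodities", "parent": "#"},
        {"id": "01", "text": "01 - Live animals", "parent": "TOTAL"},
        {"id": "0101", "text": "0101 - Horses, asses, mules and hinnies; live", "parent": "01"},
        {"id": "010121", "text": "010121 - Horses; live, pure-bred breeding animals", "parent": "0101"},
        {"id": "99", "text": "99 - Commodities not specified according to kind", "parent": "TOTAL"},
    ],
}


def build_hs_table(data: dict) -> pd.DataFrame:
    """Hypothetical helper: return a (type, code, description) table of HS codes."""
    assert data["className"] == "HS2017"
    df = pd.DataFrame(data["results"])

    # exclude top-level heading rows and the "not specified" codes
    df = df[df["parent"] != "#"]
    df = df[~df["id"].isin(["99", "9999", "999999"])].copy()

    # classify each code by its length; fail loudly on unexpected lengths
    df["type"] = df["id"].str.len().map({2: "hs2", 4: "hs4", 6: "hs6"})
    if df["type"].isna().any():
        raise ValueError("unexpected HS code length")

    # drop the literal "<code> - " prefix from each description
    df["description"] = [
        text.removeprefix(f"{code} - ") for code, text in zip(df["id"], df["text"])
    ]

    df = df.rename(columns={"id": "code"}, errors="raise")
    return df[["type", "code", "description"]].reset_index(drop=True)


table = build_hs_table(sample)
print(table)

# Why removeprefix rather than lstrip: lstrip can eat into the description.
assert "10 - 100% cotton".lstrip("10 - ") == "% cotton"
assert "10 - 100% cotton".removeprefix("10 - ") == "100% cotton"
```

On the sample above, the heading row and the "99" code are dropped, leaving one row each of type hs2, hs4 and hs6.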