Fish Production 2019
s3://trase-storage/brazil/flow_constraints/production/FISH_PRODUCTION_2019.csv
Dbt path: trase_production.main_brazil.fish_production_2019
Explore on Metabase: Full table; summary statistics
Schema YAML file: trase/data_pipeline/models/brazil/flow_constraints/production/_schema.yml
Model file: trase/data_pipeline/models/brazil/flow_constraints/production/fish_production_2019.py
Calls script: trase/data/brazil/flow_constraints/production/FISH_PRODUCTION_2019.py
Dbt test runs & lineage: Test results · Lineage
Full dbt docs page: Open in dbt docs (includes the lineage graph at the bottom right, tests, and downstream dependencies)
Tags: mock_model, brazil, flow_constraints, production
fish_production_2019
Description
This model was auto-generated from the .yml 'lineage' files in S3. The dbt model itself only raises an error; the script that actually created the data lives at trase/data/brazil/flow_constraints/production/FISH_PRODUCTION_2019.py [permalink]. It was last run by Harry Biddle.
Details
| Column | Type | Description |
|---|---|---|
| GEOCODMUN | string | Seven-digit IBGE municipality geocode |
| Kg | integer | Production in kg, summed over species that consume corn or soy |
| TYPE | string | Constant "AQUACULTURE" |
| YEAR | integer | Constant 2019 |
Models / Seeds
source.trase_duckdb.trase-storage-raw.aquaculture_production_2019
Sources
trase-storage-raw.aquaculture_production_2019
from trase.tools.aws.aws_helpers_cached import get_pandas_df_once
from trase.tools.aws.metadata import write_csv_for_upload
SPECIES_THAT_CONSUME_CORN_OR_SOY = [
    "CARPA",
    "CURIMATA, CURIMBATA",
    "DOURADO",
    "JATUARANA, PIABANHA E PIRACANJUBA",
    "LAMBARI",
    "MATRINXÃ",
    "PACU E PATINGA",
    "PIAU, PIAPARA, PIAUÇU, PIAVA",
    "PINTADO, CACHARA, CACHAPIRA E PINTACHARA, SURUBIM",
    "PIRAPITINGA",
    "PIRARUCU",
    "TAMBACU, TAMBATINGA",
    "TAMBAQUI",
    "TILAPIA",
    "TRAÍRA E TRAIRÃO",
    "TRUTA",
    "TUCUNARE",
    "OUTROS PEIXES",
    "CAMARAO",
]
SPECIES_THAT_CONSUME_NEITHER_CORN_NOR_SOY = [
    "ALEVINOS",
    "LARVAS E PÓS-LARVAS DE CAMARAO",
    "SEMENTES DE MOLUSCOS",
    "OSTRAS, VIEIRAS E MEXILHOES",
    "OUTROS PRODUTOS",
]
# read file using Pandas
# we explicitly read in all species to make sure that they are all present and we haven't
# got a string-naming issue
all_species = (
    SPECIES_THAT_CONSUME_CORN_OR_SOY + SPECIES_THAT_CONSUME_NEITHER_CORN_NOR_SOY
)
df = get_pandas_df_once(
    "brazil/production/statistics/ibge/aquaculture/AQUACULTURE_PRODUCTION_2019.csv",
    sep=";",
    converters={"GEOCODMUN": str, **{species: int for species in all_species}},
)
# drop species that don't consume corn or soy
df = df.drop(columns=SPECIES_THAT_CONSUME_NEITHER_CORN_NOR_SOY)
# filter and add a few handy columns
df["Kg"] = df[SPECIES_THAT_CONSUME_CORN_OR_SOY].sum(axis=1)
df = df[["GEOCODMUN", "Kg"]]
df["TYPE"] = "AQUACULTURE"
df["YEAR"] = 2019
# some quick QA
assert all(df["GEOCODMUN"].str.len() == 7)
assert df["GEOCODMUN"].is_unique
assert all(df["Kg"] >= 0)
# done! we let the user upload to S3
write_csv_for_upload(df, "brazil/flow_constraints/production/FISH_PRODUCTION_2019.csv")
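The core transformation above can be illustrated without access to S3. The following is a minimal sketch on a made-up toy frame, using only a subset of the real species columns; the helper names and input data here are illustrative, not the actual IBGE file:

```python
import pandas as pd

# illustrative subsets of the species lists in the real script
CORN_OR_SOY = ["TILAPIA", "TAMBAQUI"]
NEITHER = ["OSTRAS, VIEIRAS E MEXILHOES"]

# toy input: two municipalities, production per species in kg
df = pd.DataFrame(
    {
        "GEOCODMUN": ["1100015", "1100023"],  # 7-digit IBGE geocodes
        "TILAPIA": [100, 0],
        "TAMBAQUI": [50, 25],
        "OSTRAS, VIEIRAS E MEXILHOES": [5, 5],
    }
)

# drop species that don't consume corn or soy, then sum the rest per row
df = df.drop(columns=NEITHER)
df["Kg"] = df[CORN_OR_SOY].sum(axis=1)
df = df[["GEOCODMUN", "Kg"]]
df["TYPE"] = "AQUACULTURE"
df["YEAR"] = 2019

# the same QA checks the real script applies
assert all(df["GEOCODMUN"].str.len() == 7)
assert df["GEOCODMUN"].is_unique
assert all(df["Kg"] >= 0)
```

The per-row sum with `axis=1` is why the non-grain species must be dropped first: any column left in the selection would be counted into `Kg`.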
import pandas as pd
def model(dbt, cursor):
    # declare the source so dbt picks up the lineage dependency
    dbt.source("trase-storage-raw", "aquaculture_production_2019")
    # stub: the real data is produced by the standalone script above
    raise NotImplementedError()
    return pd.DataFrame({"hello": ["world"]})  # unreachable placeholder