Bigquery Gtas Farms And Slaughterhouses
s3://trase-storage/brazil/logistics/gta_farms_and_slaughterhouses//.parquet
Dbt path: trase_production.main_brazil_logistics.bigquery_gtas_farms_and_slaughterhouses
Explore on Metabase: Full table; summary statistics
Containing yaml file link: trase/data_pipeline/models/brazil/logistics/gta_farms_and_slaughterhouses/_schema_brazil_farms_and_slaughterhouses.yml
Model file link: trase/data_pipeline/models/brazil/logistics/gta_farms_and_slaughterhouses/bigquery_gtas_farms_and_slaughterhouses.py
Dbt test runs & lineage: Test results ยท Lineage
Full dbt_docs page: Open in dbt docs (includes lineage graph -at the bottom right-, tests, and downstream dependencies)
Tags: silver, brazil, logistics, gtas, farms, slaughterhouses
bigquery_gtas_farms_and_slaughterhouses
Description
Reads farm and slaughterhouse information from the GTAS in BigQuery and saves them in a partitioned parquet (by YEAR field) for convenience, as its usually the timeframe of analysis, and the whole dataset is ~450MB
Details
| Column | Type | Description |
|---|---|---|
STATE |
VARCHAR |
|
TAX_NUMBER |
VARCHAR |
|
CITY |
VARCHAR |
|
GEOCODE |
VARCHAR |
|
GEOCODE_DETAIL |
VARCHAR |
|
PROPERTY_OWNER |
VARCHAR |
|
PROPERTY_NAME |
VARCHAR |
|
IS_SLAUGHTERHOUSE |
BOOLEAN |
|
NUM_GTAS |
BIGINT |
|
ANIMALS_IN |
BIGINT |
|
ANIMALS_OUT |
BIGINT |
|
YEAR |
BIGINT |
No called script or script source not found.
# bigquery_gtas_farms_and_slaughterhouses
"""
Reads farm and slaughterhouse information from the GTAS in BigQuery
and saves them in a partitioned parquet (by YEAR field) for convenience
as its usually the timeframe of analysis, and the whole dataset is ~450MB.
The partition definition is on the corresponding schema yaml file.
"""
from google.cloud import bigquery
BQ_TABLE = "dopastoaoprato-278fd.gtas.gtas_consolidated_2025_02"
def model(dbt, session):
dbt.config(materialized="external")
client = bigquery.Client()
query = f"""
WITH origin_data AS (
SELECT
TRANSPORT_YEAR AS YEAR,
ORIGIN_STATE AS STATE,
ORIGIN_TAX_NUMBER AS TAX_NUMBER,
ORIGIN_CITY AS CITY,
ORIGIN_GEOCODE AS GEOCODE,
ORIGIN_CODE AS GEOCODE_DETAIL,
ORIGIN_FARMER AS PROPERTY_OWNER,
ORIGIN_NAME AS PROPERTY_NAME,
FALSE AS IS_SLAUGHTERHOUSE,
COUNT(ID) AS NUM_GTAS,
CAST(0 AS INT) AS ANIMALS_IN,
SUM(TOTAL_ANIMALS_SENT) AS ANIMALS_OUT
FROM `{BQ_TABLE}`
GROUP BY
YEAR,
STATE,
TAX_NUMBER,
CITY,
GEOCODE,
GEOCODE_DETAIL,
PROPERTY_OWNER,
PROPERTY_NAME
),
destination_data AS (
SELECT
TRANSPORT_YEAR AS YEAR,
DESTINATION_STATE AS STATE,
DESTINATION_TAX_NUMBER AS TAX_NUMBER,
DESTINATION_CITY AS CITY,
DESTINATION_GEOCODE AS GEOCODE,
DESTINATION_CODE AS GEOCODE_DETAIL,
DESTINATION_FARMER AS PROPERTY_OWNER,
DESTINATION_NAME AS PROPERTY_NAME,
CAST(
SUM(CAST(FINAL_MOVEMENT AS INT64)) > (COUNT(FINAL_MOVEMENT) / 2)
AS BOOL
) AS IS_SLAUGHTERHOUSE,
COUNT(ID) AS NUM_GTAS,
SUM(TOTAL_ANIMALS_SENT) AS ANIMALS_IN,
CAST(0 AS INT) AS ANIMALS_OUT
FROM `{BQ_TABLE}`
GROUP BY
YEAR,
STATE,
TAX_NUMBER,
CITY,
GEOCODE,
GEOCODE_DETAIL,
PROPERTY_OWNER,
PROPERTY_NAME
),
added AS (
SELECT * FROM origin_data
UNION ALL
SELECT * FROM destination_data
)
SELECT
YEAR,
STATE,
TAX_NUMBER,
CITY,
GEOCODE,
GEOCODE_DETAIL,
PROPERTY_OWNER,
PROPERTY_NAME,
IS_SLAUGHTERHOUSE,
SUM(NUM_GTAS) AS NUM_GTAS,
SUM(ANIMALS_IN) AS ANIMALS_IN,
SUM(ANIMALS_OUT) AS ANIMALS_OUT
FROM added
GROUP BY
YEAR,
STATE,
TAX_NUMBER,
CITY,
GEOCODE,
GEOCODE_DETAIL,
PROPERTY_OWNER,
PROPERTY_NAME,
IS_SLAUGHTERHOUSE
ORDER BY YEAR DESC, STATE ASC, ANIMALS_IN DESC
"""
query_job = client.query(query)
query_df = query_job.result().to_arrow()
return query_df