Skip to content

Br Bovine Slaughter 2015 2023

s3://trase-storage/brazil/production/statistics/sigsif/out/br_bovine_slaughter_2015_2023.parquet

Dbt path: trase_production.main_brazil.br_bovine_slaughter_2015_2023

Explore on Metabase: Full table; summary statistics

Containing yaml file link: trase/data_pipeline/models/brazil/production/statistics/sigsif/out/_schema.yml

Model file link: trase/data_pipeline/models/brazil/production/statistics/sigsif/out/br_bovine_slaughter_2015_2023.py

Calls script: trase/data/brazil/production/statistics/sigsif/out/br_bovine_slaughter_2015_2023.r

Dbt test runs & lineage: Test results · Lineage

Full dbt_docs page: Open in dbt docs (includes lineage graph -at the bottom right-, tests, and downstream dependencies)

Tags: mock_model, brazil, out, production, sigsif, statistics, slaughter


br_bovine_slaughter_2015_2023

Description

Clean municipal slaughter data (SIGSIF) for bovine in Brazil (2020-2023)

This data contains the proportion of cattle in each muncipality slaughter per state (SIGSIF data). The script is based on Erasmsses .rmd script: ' TRASE/trase/data/brazil/production/statistics/sigsif/clean_bovine_sigsif_municipal_slaughter_data.Rmd '
' TRASE/trase/data/brazil/production/statistics/sigsif/clean_bovine_sigsif_municipal_slaughter_data.html '

Note: next update we may need to consider changing the way we update the data. Currently we aggregate all years, which worked perfectly fine for the first release in 2022, but sourcng has become more variable, since then (see R2 plot in script).

What the dataset contains

The dataset contains the following columns:

  • STATE: State abbreviation (e.g. SP for São Paulo)
  • GEOCODE: IBGE code for the municipality
  • SUM_QUANTITY: Total quantity of cattle slaughtered in the municipality between 2015 and 2023
  • PROP_FLOWS: Percent of cattle slaughtered in the municipality relative to the total slaughter in the state
  • BIOME: Predominant biome of each municipality (e.g. Amazon, Cerrado), based in IBGE data from 2024: ' object = "s3://trase-storage/brazil/spatial/boundaries/ibge/br_municipality_biome/Bioma_Predominante_por_Municipio_2024.csv", '

How we use the dataset in Trase

Used in the SEI-PCS beef model

(For external datasets) How to fetch the data from the source

  • Update frequency: YEARLY
  • Next expected update: ?

We have a script which uses the API to fetch the data:

Rscript trase/data/brazil/production/statistics/sigsif/original/SIGSIF_SLAUGHTER_2023_2024.r

Alternatively you can also download it manually at "https://dados.agricultura.gov.br/dataset/062166e3-b515-4274-8e7d-68aadd64b820/resource/239eaa90-35cd-4b67-8902-d34eda3dca53/download/sigsifquantitativoanimaisabatidoscategoriauf.csv"

The script that is used to process/clean the dataset.

The file trase/data/brazil/production/statistics/sigsif/out/br_bovine_slaughter_2015_2023.r

History

  • 2025-06: Florian Gollnow, extend with new SIGSIF data
  • 2023-04: Update by Yan Prada
  • 2022-06: Initial data cleaning by Erasmus TRASE/trase/data/brazil/production/statistics/sigsif/clean_bovine_sigsif_municipal_slaughter_data.Rmd .

Details

Column Type Description

Models / Seeds

  • source.trase_duckdb.trase-storage-raw.br_municipalities_wgs84_2023
  • source.trase_duckdb.trase-storage-raw.br_municipality_biome_2024
  • source.trase_duckdb.trase-storage-raw.br_sigsif_slaughter_original_2020
  • source.trase_duckdb.trase-storage-raw.br_sigsif_slaughter_original_2021
  • source.trase_duckdb.trase-storage-raw.br_sigsif_slaughter_original_2022
  • source.trase_duckdb.trase-storage-raw.br_sigsif_slaughter_original_2023
  • model.trase_duckdb.sif_2015_2017
  • model.trase_duckdb.sif_2018
  • model.trase_duckdb.sif_2019

Sources

  • ['trase-storage-raw', 'br_municipalities_wgs84_2023']
  • ['trase-storage-raw', 'br_municipality_biome_2024']
  • ['trase-storage-raw', 'br_sigsif_slaughter_original_2020']
  • ['trase-storage-raw', 'br_sigsif_slaughter_original_2021']
  • ['trase-storage-raw', 'br_sigsif_slaughter_original_2022']
  • ['trase-storage-raw', 'br_sigsif_slaughter_original_2023']
# The output has the columns:
# `STATE_SLAUGHTER`
# `GEOCODE`
# `PROP_FLOWS`
# `QUANTITY`
# `BIOME`


library(tidyverse)
library(aws.s3)
library(sf)
library(arrow)
library(viridis)
aws.signature::use_credentials()
Sys.setenv("AWS_DEFAULT_REGION" = "eu-west-1")
ts <- "trase-storage"

str_trans <- function(x) {
    x %>%
        stringi::stri_trans_toupper() %>%
        stringi::stri_trans_general("Latin-ASCII")
}

# Load the municipal boundaries
munis <- s3read_using(
    object = "s3://trase-storage/brazil/spatial/boundaries/ibge/2023/br_municipalities_wgs84_2023.geojson",
    FUN = read_sf,
    bucket = "trase-storage",
    opts = c("check_region" = T)
)
munis <- munis %>%
    mutate(GEOCODE = as.character(ibge_municipality_id))
munis <- munis %>% st_simplify()
biomes <- s3read_using(
    object = "s3://trase-storage/brazil/spatial/boundaries/ibge/br_municipality_biome/Bioma_Predominante_por_Municipio_2024.csv",
    FUN = read_delim,
    bucket = "trase-storage",
    opts = c("check_region" = T)
)
biomes <- biomes %>%
    mutate(BIOME = str_trans(`Bioma predominante`))


# [A] Load raw SIGSIF for 2015-2019
# Note: these data are semi_colon delimted
path <- "brazil/production/statistics/sigsif/out/"
keys <- map_chr(get_bucket(ts, prefix = path), "Key")
keys <- keys[!grepl("old", keys)]
keys_semi_colon <- keys[grepl("SIF_2015_2017|SIF_2018|SIF_2019", keys)]
keys_semi_colon <- keys_semi_colon[grepl("csv", keys_semi_colon)]
sigsif_2015_2019 <-
    map_df(
        keys_semi_colon,
        ~ s3read_using(
            object = .x,
            FUN = read_delim,
            col_types = cols(GEOCODE = col_character()),
            delim = ";",
            bucket = ts,
            opts = c("check_region" = T)
        )
    )

# [B] Load raw SIGSIF for 2020, 2021, 2022
# Note: these data are not comma delimted
# keys_comma <- keys[grepl("SIF_2020|SIF_2021|SIF_2022", keys)]
# keys_comma <- keys_comma[grepl("csv", keys_comma)]
# sigsif_2020_2022 <-
#     map_df(
#         keys_comma,
#         ~ s3read_using(
#             object = .x,
#             FUN = read_delim,
#             col_types = cols(GEOCODE = col_character()),
#             delim = ";",
#             bucket = ts,
#             opts = c("check_region" = T)
#         )
#     )
# any(is.na(sigsif_2020_2022$GEOCODE))
sigsif_2020 <- s3read_using(
    object = "s3://trase-storage/brazil/production/statistics/sigsif/original/SIF_2020_extracted_2025-03-27.parquet",
    FUN = read_parquet,
    bucket = ts,
    opts = c("check_region" = T)
)
sigsif_2020 <- sigsif_2020 %>% rename(GEOCODE = ibge_municipality_id)
sigsif_2021 <- s3read_using(
    object = "s3://trase-storage/brazil/production/statistics/sigsif/original/SIF_2021_extracted_2025-03-27.parquet",
    FUN = read_parquet,
    bucket = ts,
    opts = c("check_region" = T)
)
sigsif_2021 <- sigsif_2021 %>% rename(GEOCODE = ibge_municipality_id)
sigsif_2022 <- s3read_using(
    object = "s3://trase-storage/brazil/production/statistics/sigsif/original/SIF_2022_extracted_2025-03-27.parquet",
    FUN = read_parquet,
    bucket = ts,
    opts = c("check_region" = T)
)
sigsif_2022 <- sigsif_2022 %>% rename(GEOCODE = ibge_municipality_id)
sigsif_2023 <- s3read_using(
    object = "s3://trase-storage/brazil/production/statistics/sigsif/original/SIF_2023_extracted_2025-03-27.parquet",
    FUN = read_parquet,
    bucket = ts,
    opts = c("check_region" = T)
)
sigsif_2023 <- sigsif_2023 %>% rename(GEOCODE = ibge_municipality_id)

# [C] Join 2015-2023 data together
sigsif <- bind_rows(sigsif_2015_2019, sigsif_2020, sigsif_2021, sigsif_2022, sigsif_2023)
# sigsif <- sigsif %>% filter(YEAR <= 2021)

(any(is.na(sigsif$GEOCODE)) == FALSE)
# Check for consistency in SIGSIF data across years
sigsif %>%
    group_by(YEAR, TYPE) %>%
    summarise(SUM_HEAD = sum(QUANTITY)) %>%
    ungroup() %>%
    drop_na(TYPE) %>%
    ggplot(aes(YEAR, SUM_HEAD)) +
    geom_bar(stat = "identity", position = "dodge") +
    facet_wrap(~TYPE, scales = "free_y") +
    theme_classic() +
    labs(
        title = "Check consistency",
        subtitle = "2015-2023"
    )

# Identify the proportion per GEOCODE, per STATE_SLAUGHTER
sigsif_bov_annual <-
    sigsif %>%
    filter(TYPE == "CATTLE") %>%
    group_by(YEAR, STATE_SLAUGHTER) %>%
    mutate(TOTAL_SLAUGHTERED_HEADS = sum(QUANTITY)) %>%
    group_by(YEAR, STATE_SLAUGHTER, GEOCODE) %>%
    summarise(PROP_FLOWS = sum(QUANTITY) / unique(TOTAL_SLAUGHTERED_HEADS)) %>%
    ungroup()

# Identify the proportion per GEOCODE, per STATE_SLAUGHTER
extract_sq_per_state <- function(state_name, data) {
    data <- data %>% filter(STATE_SLAUGHTER == state_name)
    if (length(unique(data$YEAR)) > 1 & length(unique(data$GEOCODE)) > 1) {
        data$GEOCODE <- as.factor(data$GEOCODE)
        if (nlevels(data$GEOCODE) > 1) {
            m1 <-
                lm(
                    data = data,
                    PROP_FLOWS ~ GEOCODE
                )
            model_output <- summary(m1)
            model_r_sq <- model_output$r.squared
            tibble(
                "STATE" = state_name,
                "R_SQ" = model_r_sq
            )
        } else {
            NULL
        }
    } else {
        NULL
    }
}


res <-
    map_df(
        .x = unique(sigsif_bov_annual$STATE_SLAUGHTER),
        .f = extract_sq_per_state,
        data = sigsif_bov_annual
    )
# Visualise the consistency across years
ggplot(
    data = res,
    aes(STATE, R_SQ)
) +
    geom_point() +
    scale_y_continuous(limits = c(0, 1)) +
    theme_bw() +
    geom_hline(yintercept = 0.9, linetype = "dashed") +
    labs(
        title = "R-sq of municipal-level sourcing of SIFs per state",
        subtitle = "R-sq = 0.9 shown as dashed line"
    )

# chk <- inner_join(munis, sigsif_bov_annual %>% filter(STATE_SLAUGHTER == "SP"), by = "GEOCODE")
# ggplot() +
#     # geom_sf(data = brazil, fill = "lightgrey") +
#     geom_sf(data = chk, aes(fill = PROP_FLOWS), col = NA) +
#     facet_wrap(~YEAR) +
#     theme_minimal() +
#     # coord_sf(datum = NA) +
#     scale_fill_viridis(
#         option = "magma",
#         name = "% SOURCING",
#         na.value = "white",
#         direction = -1
#     )
# Make cleaned output dataset

# Erasmus choose to aggregate these data across years (i.e. have one municipal-level supply shed per state), because:

# * [i] the above show that, for the most part, there's little variation between years;
# * [ii] the main use of these data are for SEI-PCS/the dPaP app, where they are paired with GTA data to identify sourcing per logistics hub. Here, the GTA data are also aggregated across years (in that case, because of patchy data), also aggregating SIGSIF therefore leads to more methodological consistency;
# * [iii] here we only load SIGSIF data from 2015-2023, though we intend to map exports for 2010-2023, aggregating across years avoids the need to go back and download SIGSIF data for earlier years (i.e. 2010-2014).

# Make cleaned output dataset
# Identify the proportion per GEOCODE, per STATE_SLAUGHTER
sigsif_bov_summary <-
    sigsif %>%
    filter(TYPE == "CATTLE") %>%
    group_by(STATE_SLAUGHTER) %>%
    mutate(TOTAL_SLAUGHTERED_HEADS = sum(QUANTITY)) %>%
    group_by(STATE_SLAUGHTER, GEOCODE) %>%
    summarise(
        SUM_QUANTITY = sum(QUANTITY),
        PROP_FLOWS = SUM_QUANTITY / unique(TOTAL_SLAUGHTERED_HEADS)
    ) %>%
    ungroup()


# plot_sourcing <- function(state_name, data) {
#     munis_to_plot <-
#         inner_join(munis %>% mutate(GEOCODE = ibge_municipality_id), sigsif_bov_summary, by = "GEOCODE")
#     p <- ggplot() +
#         # geom_sf(data = brazil, fill = 'grey') +
#         geom_sf(
#             data = munis_to_plot,
#             aes(fill = PROP_FLOWS * 100)
#         ) +
#         labs(title = paste("Municipal-level sourcing of SIFs in", state_name)) +
#         theme_minimal() +
#         coord_sf(datum = NA) +
#         scale_fill_viridis(
#             option = "magma",
#             name = "% SOURCING",
#             na.value = "white",
#             direction = -1
#         ) +
#         rm(munis_to_plot)
#     return(p)
# }
# map(
#     .x = unique(sigsif_bov_summary$STATE_SLAUGHTER),
#     .f = plot_sourcing,
#     data = sigsif_bov_summary
# )

# add biome
sigsif_bov_summary <- sigsif_bov_summary %>%
    left_join(
        biomes %>% select(GEOCODE = `Geocódigo`, BIOME),
        by = "GEOCODE"
    )
# export
sigsif_bov_summary <- sigsif_bov_summary %>%
    rename(QUANTITY = SUM_QUANTITY) %>%
    s3write_using(
        sigsif_bov_summary,
        object = "brazil/production/statistics/sigsif/out/br_bovine_slaughter_2015_2023.parquet",
        FUN = write_parquet,
        bucket = "trase-storage",
        opts = c("check_region" = T)
    )
import pandas as pd


def model(dbt, cursor):
    dbt.source("trase-storage-raw", "br_municipalities_wgs84_2023")
    dbt.source("trase-storage-raw", "br_municipality_biome_2024")
    dbt.ref("sif_2015_2017")
    dbt.ref("sif_2018")
    dbt.ref("sif_2019")
    dbt.source("trase-storage-raw", "br_sigsif_slaughter_original_2020")
    dbt.source("trase-storage-raw", "br_sigsif_slaughter_original_2021")
    dbt.source("trase-storage-raw", "br_sigsif_slaughter_original_2022")
    dbt.source("trase-storage-raw", "br_sigsif_slaughter_original_2023")

    raise NotImplementedError()
    return pd.DataFrame({"hello": ["world"]})