
Slaughterhouse Map Data Pipeline (Brazil)

This document outlines the process for creating the Brazilian slaughterhouse map, a product built from two main sources: SIF (Federal Inspection Service) and SISBI (Brazilian Integrated System of Animal Products Inspection). The pipeline is a series of R scripts that scrape, process, and consolidate the data to generate the final map. Intermediate files are stored on S3 for data management and reproducibility.

Data Sources

We leverage two primary data sources for this project:

  1. SIF (Federal Inspection Service): Federal sanitary inspection data in Brazil. Data is initially obtained by scraping CSV files provided by the Brazilian government.
  2. SISBI (Brazilian Integrated System of Animal Products Inspection): State and Municipal level sanitary inspection data, also containing some SIF data. Data is scraped from various APIs associated with the SISBI system.

Pipeline Scripts and Process

The data pipeline is structured into the following steps, with each step corresponding to an R script located within the repository.

1. SIF Data Collection and Processing

This section describes the steps to process data from SIF.

1.1. Scraping SIF Data (TRASE/trase/data/Brazil/logistics/sanitary_inspections/animal_products/SIF/in/scrape_SIF_all.r)

  • Input: CSV files provided by the Brazilian government (scraped by the script).
  • Functionality: This script scrapes two CSV files (see the sketch after this list):
    • List of SIF registered facilities.
    • List of SIF approved exporters (indicating authorization to export to specific countries).
  • Output: Two CSV files stored in the in subdirectory, named with the creation date: TRASE/trase/data/Brazil/logistics/sanitary_inspections/animal_products/SIF/in/.
    • One file containing the list of facilities.
    • One file containing the list of approved exporters.
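
A minimal sketch of this step, assuming hypothetical download URLs and file names; the real endpoints are defined in scrape_SIF_all.r:

```r
# Sketch only: the actual government URLs live in scrape_SIF_all.r.
base_dir <- "TRASE/trase/data/Brazil/logistics/sanitary_inspections/animal_products/SIF/in"

# Hypothetical endpoints, for illustration.
facilities_url <- "https://example.gov.br/sif/facilities.csv"
exporters_url  <- "https://example.gov.br/sif/approved_exporters.csv"

today <- format(Sys.Date(), "%Y-%m-%d")

# Download both CSVs into in/, prefixed with the creation date.
download.file(facilities_url, file.path(base_dir, paste0(today, "_SIF_facilities.csv")))
download.file(exporters_url,  file.path(base_dir, paste0(today, "_SIF_exporters.csv")))
```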

1.2. Harmonizing and Geocoding SIF Data (TRASE/trase/data/Brazil/logistics/sanitary_inspections/animal_products/SIF/out/20XX-XX-XX-SIF_new.r)

  • Input: The two CSV files generated in step 1.1, located in the in directory.
  • Functionality: This script takes the raw SIF data and performs the following operations (see the sketch after this list):
    • Harmonizes the data structure.
    • Adds metadata describing data type and resolution.
    • Collects geocodes (latitude and longitude) for each facility.
    • Identifies the commodity associated with each facility.
  • Output: A harmonized CSV file stored in the out subdirectory, named with the system date of data creation: TRASE/trase/data/Brazil/logistics/sanitary_inspections/animal_products/SIF/out/[SYSTEM_DATE]_SIF.csv.
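
A sketch of the harmonization and geocoding step. The raw column names and the tidygeocoder backend are assumptions for illustration; the actual script may use different column names and a different geocoding service:

```r
library(dplyr)
library(tidygeocoder)  # one possible geocoding backend; the real script may use another

base <- "TRASE/trase/data/Brazil/logistics/sanitary_inspections/animal_products/SIF"
sif_raw <- read.csv(file.path(base, "in", "2025-03-25_SIF_facilities.csv"))  # hypothetical file name

sif <- sif_raw %>%
  # Harmonize the structure (raw column names here are hypothetical) and
  # add metadata describing data type and resolution.
  rename(facility_name = razao_social, address = endereco) %>%
  mutate(data_type = "sanitary_inspection", resolution = "facility") %>%
  # Geocode each facility from its address (Nominatim/OSM backend).
  geocode(address = address, method = "osm", lat = latitude, long = longitude)

write.csv(sif, file.path(base, "out", paste0(Sys.Date(), "_SIF.csv")), row.names = FALSE)
```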

2. SISBI Data Collection and Processing

This section describes the steps to process data from the SISBI.

2.1. Scraping SISBI Data (TRASE/trase/data/Brazil/logistics/sanitary_inspections/animal_products/SISBI/in/scrape_SISBI_all.r)

  • Input: Various APIs associated with the SISBI data system.
  • Functionality: This script scrapes the SISBI APIs to retrieve comprehensive data on establishments (a minimal sketch follows this list).
  • Output: Three raw CSV files with information on establishments, products, and capacities, stored in the in subdirectory.
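
A minimal sketch of a paginated API scrape with httr and jsonlite. The endpoint URL, pagination parameters, and output file name are all hypothetical; the real endpoints are defined in scrape_SISBI_all.r:

```r
library(httr)
library(jsonlite)

in_dir <- "TRASE/trase/data/Brazil/logistics/sanitary_inspections/animal_products/SISBI/in"

# Fetch every page from a (hypothetical) paginated JSON endpoint.
fetch_all <- function(endpoint, page_size = 100) {
  out  <- list()
  page <- 1
  repeat {
    resp <- GET(endpoint, query = list(page = page, size = page_size))
    stop_for_status(resp)
    batch <- fromJSON(content(resp, as = "text", encoding = "UTF-8"))
    if (NROW(batch) == 0) break   # no more records
    out[[page]] <- as.data.frame(batch)
    page <- page + 1
  }
  do.call(rbind, out)
}

establishments <- fetch_all("https://example.gov.br/sisbi/establishments")
write.csv(establishments, file.path(in_dir, "sisbi_establishments_raw.csv"), row.names = FALSE)
```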

2.2. Cleaning SISBI Data (TRASE/trase/data/Brazil/logistics/sanitary_inspections/animal_products/SISBI/out/20XX-XX-XX_SISBI_ALL.R)

  • Input: The three CSV files generated in step 2.1.
  • Functionality: This script cleans and consolidates the raw SISBI data in the following steps (a sketch of Step 2 follows this list):
    • Step 1: Data Loading: Loads the raw CSV files.
    • Step 2: Data Harmonization: Combines data fragments from the different API endpoints into a unified dataset.
    • Step 3: Capacity Integration: Scrapes and integrates slaughterhouse capacity information.
    • Step 4: Address Scraping: Retrieves address information using IDs available from the APIs.
    • Step 4.1: Location and Zip Code Scraping: Extracts location details and zip codes.
    • Step 4.2: Latitude and Longitude Scraping: Obtains geographic coordinates for each facility.
    • Step 4.3 (and subsequent steps): Further data refinement and harmonization (see the script for details).
  • Output: A consolidated and harmonized CSV file containing SISBI data, stored in the out subdirectory, named with the system date of data creation and _all suffix: TRASE/trase/data/Brazil/logistics/sanitary_inspections/animal_products/SISBI/out/[SYSTEM_DATE]_SISBI_all.csv.
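
A sketch of the harmonization step (Step 2), assuming the three fragments share a common establishment ID; the key column name and file names are hypothetical:

```r
library(dplyr)

base <- "TRASE/trase/data/Brazil/logistics/sanitary_inspections/animal_products/SISBI"

establishments <- read.csv(file.path(base, "in", "sisbi_establishments_raw.csv"))
products       <- read.csv(file.path(base, "in", "sisbi_products_raw.csv"))
capacities     <- read.csv(file.path(base, "in", "sisbi_capacities_raw.csv"))

# Combine the fragments from the different API endpoints into one dataset,
# keyed on the establishment ID (column name is hypothetical).
sisbi <- establishments %>%
  left_join(products,   by = "establishment_id") %>%
  left_join(capacities, by = "establishment_id")

write.csv(sisbi,
          file.path(base, "out", paste0(Sys.Date(), "_SISBI_all.csv")),
          row.names = FALSE)
```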

3. Slaughterhouse Map Consolidation

This final step combines the processed SIF and SISBI data to generate the final slaughterhouse map.

3.1. Consolidating SIF and SISBI Data (TRASE/trase/data/Brazil/logistics/slaughterhouses/slaughterhouse_map_v5/slaughterhouse_map_v5.r)

  • Input:
    • Output file from SIF processing (step 1.2): TRASE/trase/data/Brazil/logistics/sanitary_inspections/animal_products/SIF/out/[SYSTEM_DATE]_SIF.csv
    • Output file from SISBI processing (step 2.2): TRASE/trase/data/Brazil/logistics/sanitary_inspections/animal_products/SISBI/out/[SYSTEM_DATE]_SISBI_all.csv
  • Functionality: This script performs the following (see the sketch after this list):
    • Reads the processed SIF and SISBI data files.
    • Harmonizes the datasets further to ensure consistency.
    • Merges the SIF (federal data) and SISBI (state/municipal and some federal data) datasets.
    • Cleans the data for ingestion into Trase.
  • Output: The final slaughterhouse map CSV file, named with the system date and stored in the slaughterhouse_map_v4 subdirectory: TRASE/trase/data/Brazil/logistics/slaughterhouses/slaughterhouse_map_v4/[SYSTEM_DATE]_BR_beef_logistics_map_v4.csv.
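
A sketch of the merge logic. The join key (sif_number) and file dates are assumptions; the actual de-duplication rules live in the script:

```r
library(dplyr)

base <- "TRASE/trase/data/Brazil/logistics/sanitary_inspections/animal_products"
sif   <- read.csv(file.path(base, "SIF/out/2025-03-25_SIF.csv"))        # replace date as needed
sisbi <- read.csv(file.path(base, "SISBI/out/2025-03-25_SISBI_all.csv"))

# SISBI also carries some federal records, so keep SIF as the authoritative
# federal source and drop SISBI rows that duplicate a federal inspection
# number (the key column name is hypothetical).
sisbi_only <- anti_join(sisbi, sif, by = "sif_number")

slaughterhouse_map <- bind_rows(
  mutate(sif,        source = "SIF"),
  mutate(sisbi_only, source = "SISBI")
)

write.csv(slaughterhouse_map,
          paste0(Sys.Date(), "_BR_beef_logistics_map_v4.csv"),
          row.names = FALSE)
```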

Directory Structure

TRASE/
└── trase/
    └── data/
        └── Brazil/
            └── logistics/
                ├── sanitary_inspections/
                │   └── animal_products/
                │       ├── SIF/
                │       │   ├── in/    # Raw SIF data; contains scrape_SIF_all.r
                │       │   └── out/   # Processed SIF data; contains 20XX-XX-XX-SIF_new.r
                │       └── SISBI/
                │           ├── in/    # Raw SISBI data; contains scrape_SISBI_all.r
                │           └── out/   # Processed SISBI data; contains 20XX-XX-XX_SISBI_ALL.R
                └── slaughterhouses/
                    ├── slaughterhouse_map_v4/
                    │   ├── slaughterhouse_map_v4.r
                    │   └── [SYSTEM_DATE]_BR_beef_logistics_map_v4.csv  # Example output file
                    └── slaughterhouse_map_v5/
                        ├── slaughterhouse_map_v5.r
                        └── 2025_03_25_br_beef_logistics_map_v5_new     # Additional cleaning implemented

Running the Pipeline

To execute the pipeline, run the R scripts in the following order (a driver sketch follows the list):

  1. TRASE/trase/data/Brazil/logistics/sanitary_inspections/animal_products/SIF/in/scrape_SIF_all.r
  2. TRASE/trase/data/Brazil/logistics/sanitary_inspections/animal_products/SIF/out/20XX-XX-XX-SIF_new.r
  3. TRASE/trase/data/Brazil/logistics/sanitary_inspections/animal_products/SISBI/in/scrape_SISBI_all.r
  4. TRASE/trase/data/Brazil/logistics/sanitary_inspections/animal_products/SISBI/out/20XX-XX-XX_SISBI_ALL.R
  5. TRASE/trase/data/Brazil/logistics/slaughterhouses/slaughterhouse_map_v5/2025_03_25_br_beef_logistics_map_v5_new (supersedes slaughterhouse_map_v5.r)
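
A minimal driver sketch, assuming each script can be run non-interactively via source(); replace the dated file names with those actually present in the repo:

```r
# Run the pipeline end to end by sourcing each script in order.
root <- "TRASE/trase/data/Brazil/logistics"

source(file.path(root, "sanitary_inspections/animal_products/SIF/in/scrape_SIF_all.r"))
source(file.path(root, "sanitary_inspections/animal_products/SIF/out/20XX-XX-XX-SIF_new.r"))
source(file.path(root, "sanitary_inspections/animal_products/SISBI/in/scrape_SISBI_all.r"))
source(file.path(root, "sanitary_inspections/animal_products/SISBI/out/20XX-XX-XX_SISBI_ALL.R"))
# File name as listed in the repository (no .r extension).
source(file.path(root, "slaughterhouses/slaughterhouse_map_v5/2025_03_25_br_beef_logistics_map_v5_new"))
```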

Prerequisites:

  • R and Python environments installed.
  • The R packages required for data manipulation, web scraping, and geocoding (check each script's dependencies).
  • Access to S3 for storing intermediate files (configuration details are within the scripts).

Release to production

Follow the instructions in this doc. In step 2, ensure that you delete the columns "level" and "resolution".


This README provides a high-level overview of the slaughterhouse map data pipeline. For detailed information on specific script functionality, see the comments in each R script. For questions or contributions, please contact vivian.ribeiro@diversa.earth.

IMPORTANT: This README was created with AI assistance and may contain minor file path distortions.