
Slaughterhouse Map Data Pipeline (Brazil)

This document outlines the process for creating the Brazilian slaughterhouse map, a product built from three main data sources: SIF (Federal Inspection Service), SIE (State Inspection Service), and SISBI (Brazilian Integrated System of Animal Products Inspection). The pipeline consists of a series of R scripts that scrape, process, and consolidate the data to generate the final map. Intermediate files are stored on S3 for data management and reproducibility.

Data Sources

We use three primary data sources for this project:

  1. SIF (Federal Inspection Service): Federal sanitary inspection data for Brazil, obtained by scraping CSV files provided by the Brazilian government.
  2. SISBI (Brazilian Integrated System of Animal Products Inspection): State- and municipal-level sanitary inspection data, which also contains some SIF data, scraped from the various APIs associated with the SISBI system.
  3. SIE (State Inspection Service): State-level sanitary inspection data, obtained by downloading CSV and PDF files from each state's website.

Pipeline Scripts and Process

The data pipeline is structured into the following steps, with each step corresponding to an R script located within the repository.

1. SIF Data Collection and Processing

This section describes the steps to process data from the SIF.

1.1. Scraping SIF Data (TRASE/trase/data/Brazil/logistics/sanitary_inspections/animal_products/SIF/in/scrape_SIF_all.r)

  • Input: CSV files provided by the Brazilian government (scraped by the script).
  • Functionality: This script scrapes two CSV files:
    • List of SIF registered facilities.
    • List of SIF approved exporters (indicating authorization to export to specific countries).
  • Output: Two CSV files stored in the in subdirectory, named with the creation date: TRASE/trase/data/Brazil/logistics/sanitary_inspections/animal_products/SIF/in/.
    • One file containing the list of facilities.
    • One file containing the list of approved exporters.
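
As a minimal sketch of this step in R, the logic looks roughly like the code below. The URLs and output file names are placeholders (assumptions), not the script's actual endpoints; those are defined in scrape_SIF_all.r.

    library(readr)

    # Placeholder endpoints; the real URLs are defined in scrape_SIF_all.r
    facilities_url <- "https://example.agricultura.gov.br/sif/estabelecimentos.csv"
    exporters_url  <- "https://example.agricultura.gov.br/sif/exportadores.csv"

    # Brazilian government CSVs are often semicolon-delimited, hence read_csv2()
    facilities <- read_csv2(facilities_url)
    exporters  <- read_csv2(exporters_url)

    # Store both files in the in/ subdirectory, named with the creation date
    in_dir <- "TRASE/trase/data/Brazil/logistics/sanitary_inspections/animal_products/SIF/in"
    write_csv(facilities, file.path(in_dir, paste0(Sys.Date(), "_SIF_facilities.csv")))
    write_csv(exporters,  file.path(in_dir, paste0(Sys.Date(), "_SIF_exporters.csv")))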

1.2. Harmonizing and Geocoding SIF Data (TRASE/trase/data/Brazil/logistics/sanitary_inspections/animal_products/SIF/out/20XX-XX-XX-SIF_new.r)

  • Input: The two CSV files generated in step 1.1, located in the in directory.
  • Functionality: This script takes the raw SIF data and performs the following operations:
    • Harmonizes the data structure.
    • Adds metadata describing data type and resolution.
    • Collects geocodes (latitude and longitude) for each facility.
    • Identifies the commodity associated with each facility.
  • Output: A harmonized CSV file stored in the out subdirectory, named with the system date of data creation: TRASE/trase/data/Brazil/logistics/sanitary_inspections/animal_products/SIF/out/[SYSTEM_DATE]_SIF.csv.
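
A sketch of what this step could look like, using tidygeocoder as one possible geocoding backend. The actual script may use a different service, and the raw column names and input file name below are assumptions:

    library(dplyr)
    library(tidygeocoder)  # one possible geocoding backend; the script may use another

    base <- "TRASE/trase/data/Brazil/logistics/sanitary_inspections/animal_products/SIF"
    raw  <- read.csv(file.path(base, "in", "2026-01-01_SIF_facilities.csv"))  # illustrative name

    sif <- raw |>
      # Harmonize the data structure (raw column names are illustrative)
      rename(company = RAZAO_SOCIAL, municipality = MUNICIPIO, address = ENDERECO) |>
      # Add metadata describing data type and resolution
      mutate(inspection_level = "SIF",
             resolution       = "facility",
             # Identify the commodity, e.g. from the facility classification
             commodity = if_else(grepl("bovino", CLASSIFICACAO, ignore.case = TRUE),
                                 "BEEF", NA_character_)) |>
      # Collect latitude and longitude for each facility
      geocode(address = address, method = "arcgis", lat = latitude, long = longitude)

    write.csv(sif, file.path(base, "out", paste0(Sys.Date(), "_SIF.csv")), row.names = FALSE)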

2. SISBI Data Collection and Processing

This section describes the steps to process data from the SISBI.

2.1. Scraping SISBI Data (TRASE/trase/data/Brazil/logistics/sanitary_inspections/animal_products/SISBI/in/scrape_SISBI_all.r)

  • Input: Various APIs associated with the SISBI data system.
  • Functionality: This script leverages multiple APIs to scrape comprehensive SISBI data. The process involves several steps within the script:
    • Step 1: API Scraping: Initial data retrieval from the SISBI APIs. Output: three raw CSV files with information on establishments, products, and capacities.
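
In outline, the API scraping could look like the sketch below; the base URL and endpoint names are placeholders for the real SISBI API URLs defined in scrape_SISBI_all.r:

    library(httr)
    library(jsonlite)

    # Placeholder base URL; the real SISBI endpoints are defined in scrape_SISBI_all.r
    base_url <- "https://example.gov.br/sisbi/api"

    fetch_table <- function(endpoint) {
      resp <- GET(paste0(base_url, "/", endpoint))
      stop_for_status(resp)
      fromJSON(content(resp, as = "text", encoding = "UTF-8"), flatten = TRUE)
    }

    # Three raw CSVs: establishments, products, and capacities
    write.csv(fetch_table("estabelecimentos"), "SISBI_establishments_raw.csv", row.names = FALSE)
    write.csv(fetch_table("produtos"),         "SISBI_products_raw.csv",       row.names = FALSE)
    write.csv(fetch_table("capacidades"),      "SISBI_capacities_raw.csv",     row.names = FALSE)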

2.2. Cleaning SISBI Data (TRASE/trase/data/Brazil/logistics/sanitary_inspections/animal_products/SISBI/out/20XX-XX-XX_SISBI_ALL.R)

  • Input: CSV files generated in step 2.1.
  • Functionality: This script performs the following steps:
    • Step 1: Loading Data: Loads the raw CSV files.
    • Step 2: Data Harmonization: Combines data fragments from the different API endpoints to create a unified dataset.
    • Step 3: Capacity Integration: Scrapes and integrates slaughterhouse capacity information.
    • Step 4: Address Scraping: Retrieves address information. In the most recent update (2026), the API previously used to retrieve addresses is no longer functioning, so a new API is being considered as an alternative. However, this API only provides address information for CNPJs (company registrations). As a result, registries associated with individuals (CPFs) either retain the address information from previous files or have no address information at all. Where no address is available, the process later assigns the centroid of the registered municipality (see the sketch after this list).
    • Step 4.1: Location and Zip Code Scraping: Extracts location details and zip codes.
    • Step 4.2: Latitude and Longitude Scraping: Obtains geographic coordinates for each facility.
    • Step 4.3 (and subsequent steps): Further data refinement and harmonization (details within the script).
  • Output: A consolidated and harmonized CSV file containing SISBI data, stored in the out subdirectory, named with the system date of data creation and _all suffix: TRASE/trase/data/Brazil/logistics/sanitary_inspections/animal_products/SISBI/out/[SYSTEM_DATE]_SISBI_all.csv.
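
The centroid fallback in step 4 could be expressed roughly as below; sisbi_all, muni_centroids, and the column names are assumptions rather than the script's actual objects:

    library(dplyr)

    # muni_centroids: assumed lookup table with one centroid per municipality
    # geocode (geocode, centroid_lat, centroid_long)
    sisbi_located <- sisbi_all |>
      left_join(muni_centroids, by = "geocode") |>
      mutate(
        latitude  = coalesce(latitude,  centroid_lat),   # keep scraped coordinates
        longitude = coalesce(longitude, centroid_long)   # else municipality centroid
      ) |>
      select(-centroid_lat, -centroid_long)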

3. SIE Data Collection and Processing

This section describes the steps to process data from SIE.

3.1 Scraping SIE Data

  • Process: The file SIE_states.xlsx contains the list of links used to download the SIE data for each state (file located at the S3 path: trase-storage/brazil/logistics/sanitary_inspections/animal_products/sie/).
    • Step 1: Downloading Data: Check each state's website and download the most recent version of the list of establishments registered in the SIE system. Some states provide a CSV file for download, but in other cases the data may be available as a PDF file or only as a list displayed on the website. In these situations, additional processing may be required to convert the PDF or webpage list into an Excel file so that it can later be merged into the unified SIE dataset.
    • Step 2: Uploading Data into S3: Once all data has been extracted, the Excel files should be uploaded to the S3 path: trase-storage/brazil/logistics/sanitary_inspections/animal_products/sie/in/. A folder corresponding to the current year must be created within this directory, and all files should be stored in that folder.
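
The upload can be done from R with, for example, the aws.s3 package (credentials are assumed to be configured in the environment; the local folder name is illustrative):

    library(aws.s3)

    year   <- format(Sys.Date(), "%Y")
    prefix <- paste0("brazil/logistics/sanitary_inspections/animal_products/sie/in/", year, "/")

    # Upload every Excel file extracted in Step 1 into the current-year folder
    files <- list.files("local_sie_downloads", pattern = "\\.xlsx$", full.names = TRUE)
    for (f in files) {
      put_object(file = f, object = paste0(prefix, basename(f)), bucket = "trase-storage")
    }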

3.2 Cleaning SIE Data (TRASE/trase/data/Brazil/logistics/sanitary_inspections/animal_products/SIE/clean_brazil_sie_2026.R)

  • Input: All Excel files saved as described in Step 3.1.
  • Step 1: Loading Data: The Excel file for each state is loaded.
  • Step 2: Standardizing Files: Each state provides different levels of information. Therefore, for each state it is necessary to rename columns and create any missing variables. When a state does not provide a specific variable, the variable should be created as a column filled with NA.

    • Step 2.1: Column Names: All files must contain the following columns: CNPJ, INSPECTION_NUM, COMPANY, ADDRESS, STATUS, MUNICIPALITY, TYPE, CEP, SISBI, ESTABLISHMENT, SOURCE, STATE, INSPECTION_LEVEL.

    In many cases the information already exists and only requires column renaming. When a variable is not available, a column filled with NA must be created (a sketch for a single state follows this list).
      • COMPANY = RAZAO SOCIAL
      • ADDRESS = ENDERECO
      • MUNICIPALITY = MUNICIPIO
      • TYPE = CLASSIFICACAO
      • ESTABLISHMENT = NOME FANTASIA. If only one name field is available, set ESTABLISHMENT = COMPANY.
      • CEP = POSTAL CODE
      • SISBI = Whether the company is also registered in the SISBI platform (provided by some states).
      • STATUS = SITUACAO (e.g., active, inactive, cancelled).
      • INSPECTION_NUM = SIE number.
      • SOURCE = Name of the state inspection agency (manually inserted).
      • STATE = Abbreviation of the state (manually inserted; corresponds to the state from which the data was downloaded).
      • INSPECTION_LEVEL = 'SIE' (manually inserted).
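
    A sketch of the standardization for a single state; the raw column names inside rename() vary from state to state and are shown here only as examples:

    library(dplyr)

    required_cols <- c("CNPJ", "INSPECTION_NUM", "COMPANY", "ADDRESS", "STATUS",
                       "MUNICIPALITY", "TYPE", "CEP", "SISBI", "ESTABLISHMENT",
                       "SOURCE", "STATE", "INSPECTION_LEVEL")

    standardize_state <- function(df, source_name, state_abbr) {
      df <- rename(df, COMPANY = `RAZAO SOCIAL`, ADDRESS = ENDERECO,
                   MUNICIPALITY = MUNICIPIO)   # example renames; they vary by state
      df$SOURCE           <- source_name       # manually inserted
      df$STATE            <- state_abbr        # manually inserted
      df$INSPECTION_LEVEL <- "SIE"             # manually inserted
      missing <- setdiff(required_cols, names(df))
      df[missing] <- NA                        # create missing variables filled with NA
      df[required_cols]                        # return columns in the standard order
    }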

  • Step 3: Merging Files: Once the data from each state has the same variables, all files are merged into a single data frame.

  • Step 4: Adding Geocode: Using a reference dataset containing all Brazilian municipalities with their official IBGE geocodes, the merged SIE data is updated to include the corresponding geocode for each municipality.
    • Step 4.1: Verification: SIE data can contain misspelt municipality names. To verify accuracy, check whether the first two digits of the geocode match the state of the company. For cases where they do not match, manually correct the municipality name and update the geocode information accordingly.
  • Step 5: Variable Creation: Create the variables LEVEL and COMMODITY.
  • Step 6: Latitude and Longitude: Using the Google geocoding API, we update the latitude and longitude information for the cases where address information is available.
  • Step 7: Finding Address Information: Using an API that contains address information for companies, the address information in the SIE data is updated, and the Google API is used once again to retrieve latitude and longitude. When no such information is available, we use the centroid of the registered municipality. (Steps 4-7 are sketched after this list.)
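
Steps 4 through 7 combined, as a rough sketch; ibge_munis, state_codes, and geocode_google() are assumed stand-ins for the script's actual reference data and Google API helper:

    library(dplyr)

    # Step 4: join official IBGE geocodes by municipality name and state
    # (ibge_munis: assumed reference table with MUNICIPALITY, STATE, geocode)
    sie <- left_join(sie_merged, ibge_munis, by = c("MUNICIPALITY", "STATE"))

    # Step 4.1: the first two digits of an IBGE geocode identify the state.
    # state_codes: assumed named vector, e.g. c(SP = "35", MG = "31").
    sie <- mutate(sie, mismatch = substr(geocode, 1, 2) != state_codes[STATE])
    # Rows with mismatch == TRUE get the municipality name corrected by hand.

    # Step 6: geocode_google() stands in for the script's Google API call and is
    # assumed to return lat/long for a vector of addresses
    sie$latitude  <- NA_real_
    sie$longitude <- NA_real_
    has_addr <- !is.na(sie$ADDRESS)
    coords <- geocode_google(sie$ADDRESS[has_addr])
    sie$latitude[has_addr]  <- coords$lat
    sie$longitude[has_addr] <- coords$long

    # Step 7: any rows still missing coordinates fall back to the centroid of
    # the registered municipality (as in the SISBI sketch above)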

  • Output: A consolidated and harmonized CSV file containing SIE data, stored in the out subdirectory, named with the system date of data creation and the _all suffix: TRASE/trase/data/Brazil/logistics/sanitary_inspections/animal_products/SIE/out/[SYSTEM_DATE]_SIE_all.csv.

4. Slaughterhouse Map Consolidation

This final step combines the processed SIF, SIE, and SISBI data to generate the final slaughterhouse map.

4.1 Consolidating SIF, SIE, and SISBI Data (TRASE/trase/data/Brazil/logistics/slaughterhouses/slaughterhouse_map_v6/20XX_XX_XX_br_beef_logistics_map_v6.R)

  • Input:
    • Output file from SIF processing (step 1.2): TRASE/trase/data/Brazil/logistics/sanitary_inspections/animal_products/SIF/out/[SYSTEM_DATE]_SIF.csv
    • Output file from SISBI processing (step 2.2): TRASE/trase/data/Brazil/logistics/sanitary_inspections/animal_products/SISBI/out/[SYSTEM_DATE]_SISBI_all.csv
    • Output file from SIE processing (step 3.2): TRASE/trase/data/Brazil/logistics/sanitary_inspections/animal_products/SIE/out/[SYSTEM_DATE]_SIE_all.csv
  • Functionality: This script performs the following:
    • Reads the processed SIF, SIE and SISBI data files.
    • Harmonizes the datasets further to ensure consistency.
    • Merges the SIF (federal data), SIE (state data) and SISBI (state/municipal and some federal data) datasets.
    • Cleans the data for ingestion into TRASE.
    • Adds the countries to which each facility is approved to export.
    • Produces separate English and Portuguese versions of the metadata.
  • Output: The final slaughterhouse map CSV file, stored in the slaughterhouse_map_v6 subdirectory and named with the system date: TRASE/trase/data/Brazil/logistics/slaughterhouses/slaughterhouse_map_v6/[SYSTEM_DATE]_BR_beef_logistics_map_v6.csv.
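
In outline, the consolidation does something like the following; shared_cols, the deduplication key, and the dated file names are simplifications, not the script's exact logic:

    library(dplyr)
    library(readr)

    base <- "TRASE/trase/data/Brazil/logistics/sanitary_inspections/animal_products"

    # Illustrative file names; in practice the latest dated output of each step is used
    sif   <- read_csv(file.path(base, "SIF/out",   "2026-01-01_SIF.csv"))
    sisbi <- read_csv(file.path(base, "SISBI/out", "2026-01-01_SISBI_all.csv"))
    sie   <- read_csv(file.path(base, "SIE/out",   "2026-01-01_SIE_all.csv"))

    # Harmonize to a shared column set, stack the three sources, and deduplicate
    # (shared_cols and the dedup key are assumptions)
    map <- bind_rows(sif[shared_cols], sisbi[shared_cols], sie[shared_cols]) |>
      distinct(CNPJ, INSPECTION_NUM, .keep_all = TRUE)

    write_csv(map, paste0(Sys.Date(), "_BR_beef_logistics_map_v6.csv"))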

Directory Structure

TRASE/
└── trase/
    └── data/
        └── Brazil/
            └── logistics/
                ├── sanitary_inspections/
                │   └── animal_products/
                │       ├── SIF/
                │       │   ├── in/       # Input directory for SIF raw data
                │       │   │   └── scrape_SIF_all.r
                │       │   └── out/      # Output directory for processed SIF data
                │       │       └── 20XX-XX-XX-SIF_new.r
                │       ├── SISBI/
                │       │   ├── in/       # Input directory for SISBI raw data
                │       │   │   └── scrape_SISBI_all.r
                │       │   └── out/      # Output directory for processed SISBI data
                │       │       └── 20XX-XX-XX_SISBI_ALL.R
                │       └── SIE/
                │           ├── in/       # Input directory for SIE raw data
                │           ├── out/      # Output directory for processed SIE data
                │           └── clean_brazil_sie_2026.R
                └── slaughterhouses/
                    └── slaughterhouse_map_v6/
                        └── 20XX_XX_XX_br_beef_logistics_map_v6.R

Running the Pipeline

To execute the pipeline, you need to run the R scripts in the order described above:

  1. TRASE/trase/data/Brazil/logistics/sanitary_inspections/animal_products/SIF/in/scrape_SIF_all.r
  2. TRASE/trase/data/Brazil/logistics/sanitary_inspections/animal_products/SIF/out/20XX-XX-XX-SIF_new.r
  3. TRASE/trase/data/Brazil/logistics/sanitary_inspections/animal_products/SISBI/in/scrape_SISBI_all.r
  4. TRASE/trase/data/Brazil/logistics/sanitary_inspections/animal_products/SISBI/out/20XX-XX-XX_SISBI_ALL.R
  5. Download each state's SIE database and upload the files to S3 (see step 3.1)
  6. TRASE/trase/data/Brazil/logistics/sanitary_inspections/animal_products/SIE/clean_brazil_sie_2026.R
  7. TRASE/trase/data/Brazil/logistics/slaughterhouses/slaughterhouse_map_v6/20XX_XX_XX_br_beef_logistics_map_v6.R

Prerequisites:

  • R installed.
  • The R packages needed for data manipulation, web scraping, and geocoding (check each script's dependencies).
  • Access to S3 to store intermediate files (configuration details are within the scripts).

Release to production

Follow the instructions in this doc. In step 2, ensure that you delete columns "level" and "resolution".


This README provides a high-level overview of the Slaughterhouse Map data pipeline. For detailed information on specific script functionality, see the comments in each R script. For any questions or contributions, please contact thais.pacheco@uclouvain.be.

IMPORTANT: This README was created by AI and may have minor file path distortions.