Rotadogado
Download of GTAs from 'Rota do Gado'
These scripts download the GTA information from Repórter Brasil's service 'Rota do Gado', using its REST API (see Rota_do_Gado_API.pdf in this directory).
The API allows for retrieving:
- A summary of a given slaughterhouse or farm, including the total animal movements and their purposes, the geographic polygons of associated physical establishments, fines, embargoes, among others.
- The GTAs where a given slaughterhouse or farm is an origin or destination.
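A minimal sketch of what these two request types could look like, assuming a hypothetical base URL and endpoint names (the real host, paths, and parameters are documented in Rota_do_Gado_API.pdf):

```python
import requests

BASE_URL = "https://example.org/rotadogado/api"  # placeholder; see the PDF for the real host and paths

def fetch_summary(org_id: str) -> dict:
    """Fetch the summary (movements, polygons, fines, embargoes) for one CPF/CNPJ."""
    resp = requests.get(f"{BASE_URL}/summary/{org_id}", timeout=30)
    resp.raise_for_status()
    return resp.json()

def fetch_gtas(org_id: str) -> dict:
    """Fetch all GTAs where the CPF/CNPJ appears as origin or destination."""
    resp = requests.get(f"{BASE_URL}/gtas/{org_id}", timeout=30)
    resp.raise_for_status()
    return resp.json()
```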
Main script (rotadogado_download.py)
Summary
This script saves to a specified S3 destination the summaries and GTAs of a list of CPFs/CNPJs of slaughterhouses and farms. It only retrieves and saves new or updated information, based on the contents of the summary data.
Keep in mind that each record takes on average 1.4 seconds to process, so running the script over hundreds of thousands of records can take several days to complete.
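For example, at that rate a run over roughly 200,000 records (a hypothetical volume) works out to a bit over three days:

```python
# Back-of-the-envelope runtime estimate at ~1.4 s per record.
records = 200_000                      # hypothetical volume; adjust to your input lists
seconds = records * 1.4
print(f"{seconds / 3600:.1f} hours (~{seconds / 86400:.1f} days)")
# 77.8 hours (~3.2 days)
```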
Main steps
- The script processes two data sources of slaughterhouses: the official mapping that includes all their information (e.g. brazil/logistics/gta/originals/rotadogado/2024-08-07-br_beef_logistics_map_v4.csv, generated by Erasmus), and a list with the tax numbers of slaughterhouses previously found in the GTAs (brazil/logistics/gta/farms_and_slaughterhouses/taxn_lists/silver_gtas_slaughterhouses_taxn.parquet, generated by the corresponding dbt-duckdb model, trase/database/dbt_duckdb/models/brazil/logistics/gta/farms_slaughterhouses/silver_gtas_slaughterhouses_taxn.py). It merges the tax numbers from both lists and downloads their summaries and GTAs from the Rota do Gado API (see the sketch after this list).
- It also processes a list of farms based on previous GTA records (taken from BigQuery)
- It keeps a local log reporting the progress of the script, including checkpoints every 20 minutes in case the process needs to be resumed
- It uploads results to S3 as they are generated, and uploads the log when the script has finished processing
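A minimal sketch of the tax-number merge described in the first step, assuming both inputs expose a hypothetical 'taxn' column (the actual column names come from the mapping CSV and the dbt-duckdb model output):

```python
import pandas as pd

# Hypothetical column name "taxn" in both sources; adjust to the real schemas.
mapping = pd.read_csv("2024-08-07-br_beef_logistics_map_v4.csv", dtype=str)
gta_list = pd.read_parquet("silver_gtas_slaughterhouses_taxn.parquet")

# Union of tax numbers from the official mapping and the GTA-derived list.
slaughterhouse_ids = sorted(
    set(mapping["taxn"].dropna()) | set(gta_list["taxn"].dropna().astype(str))
)
```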
Special remarks
The main script processes a CSV containing the identification of slaughterhouses (usually a 14-digit CNPJ or an 11-digit CPF). It then iterates through the IDs, doing the following:
- For CNPJs (14 digits), it replaces the last 6 digits with 'xxxxxx' (e.g. '01535759000131' becomes '01535759xxxxxx'), as the API expects CNPJs in this form. For IDs of other lengths, it looks for records associated with the ID as-is, as well as with the ID padded with leading zeros until it reaches 11 or 14 digits (e.g. '740185667' will also be looked up as '00740185667'; if both exist they are saved into different files, as they usually contain different information even when they refer to the same farm). See the sketch after this list.
- It saves to S3 the summaries and GTAs related to each slaughterhouse and farm. If no information was found in Rota do Gado, it saves the organization ID in a file called missing_records.txt.
- The log is currently configured at INFO level. If you run into connectivity, authentication, or other unidentified issues, consider switching it to DEBUG.
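A minimal sketch of the ID handling described above; the function name and the exact set of padded variants are illustrative, not taken from the script:

```python
def candidate_ids(org_id: str) -> list[str]:
    """Return the ID variants to query, following the rules above (illustrative)."""
    digits = org_id.strip()
    if len(digits) == 14:                 # CNPJ: the API expects the last 6 digits masked
        return [digits[:8] + "xxxxxx"]
    candidates = [digits]                 # always try the ID as-is
    for target in (11, 14):               # also try zero-padded CPF/CNPJ lengths
        if len(digits) < target:
            candidates.append(digits.zfill(target))
    return candidates

# candidate_ids("01535759000131") -> ['01535759xxxxxx']
# candidate_ids("740185667")      -> ['740185667', '00740185667', '00000740185667']
```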
S3 file structure
The associated information (summary and GTAs) of each slaughterhouse or farm is saved in JSON files; note that when processing tens of thousands of farms and slaughterhouses, each one gets its own JSON file. The data is saved within the following structure (configurable within the script), and is intended to be available in s3://trase-storage/brazil/logistics/gta/originals/rotadogado:
```
rotadogado                         # Base path for saving data
├── slaughterhouses.csv            # Existing csv with slaughterhouses id's
├── sending_farms.csv              # Generated csv with the farms to get info from
├── slaughterhouses
│   ├── missing_records.txt        # ids of missing records if any
│   ├── summaries
│   │   ├── #####.json             # json with all the summary information
│   │   └── ...                    # One file per slaughterhouse
│   └── gtas
│       ├── #####.json             # All associated GTAs of a given slaughterhouse
│       └── ...                    # One file per slaughterhouse
└── farms
    ├── missing_records.txt
    ├── summaries
    │   ├── #####.json
    │   └── ...
    └── gtas
        ├── #####.json
        └── ...
```
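As an illustration, an object key under this layout could be built like this (the prefix matches the base path above; the helper name is not from the script):

```python
BASE_PREFIX = "brazil/logistics/gta/originals/rotadogado"

def s3_key(org_type: str, kind: str, org_id: str) -> str:
    """org_type: 'slaughterhouses' or 'farms'; kind: 'summaries' or 'gtas'."""
    return f"{BASE_PREFIX}/{org_type}/{kind}/{org_id}.json"

# s3_key("farms", "gtas", "00740185667")
# -> 'brazil/logistics/gta/originals/rotadogado/farms/gtas/00740185667.json'
```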
Main control options
The script includes options to control the following through the process_orgs method:
- Whether it should 'resume' a previous run (skip processing slaughterhouses or farms that already have associated files saved locally).
- Whether it should first check if the already downloaded summaries differ from the online ones, and if so, download the updated GTA records.
- Whether it should retry missing records that were saved in a 'missing_records.txt' file (currently False).
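A hypothetical call showing how these three switches might be passed; the actual parameter names are defined in rotadogado_download.py and may differ:

```python
downloader.process_orgs(      # "downloader" and the keyword names are placeholders
    resume=True,              # skip orgs whose files already exist locally
    check_updates=True,       # re-download GTAs when the online summary has changed
    retry_missing=False,      # do not retry IDs listed in missing_records.txt
)
```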