Ibge Biome Municipality 2024 Gold
s3://trase-storage/brazil/spatial/boundaries/ibge/br_municipality_biome/ibge_biome_municipality_2024_gold.parquet
Dbt path: trase_production.main_brazil.ibge_biome_municipality_2024_gold
Explore on Metabase: Full table; summary statistics
Containing yaml file link: trase/data_pipeline/models/brazil/spatial/boundaries/ibge/br_municipality_biome/_schema.yml
Model file link: trase/data_pipeline/models/brazil/spatial/boundaries/ibge/br_municipality_biome/ibge_biome_municipality_2024_gold.py
Calls script: trase/data/brazil/spatial/boundaries/ibge/br_municipality_biome/ibge_biome_municipality_2024_gold.py
Dbt test runs & lineage: Test results · Lineage
Full dbt_docs page: Open in dbt docs (includes the lineage graph at the bottom right, tests, and downstream dependencies)
Tags: brazil
ibge_biome_municipality_2024_gold
Description
IBGE – Relationship Between Municipalities and Biomes (2024)
This dataset contains the relationship between Brazilian municipalities and their predominant biome, as published by the Brazilian Institute of Geography and Statistics (IBGE) in June 2024. It is part of IBGE’s Biomas project, which supports environmental analysis, policymaking, and statistical studies.
What the dataset contains
The source data — Bioma_Predominante_por_Municipio_2024 — assigns one biome to each municipality or special administrative area in Brazil. This biome is the one covering the largest territorial area in that municipality.
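To make the "largest territorial area" rule concrete, here is a minimal sketch of picking a predominant biome per municipality. The geocodes and areas below are invented for illustration and are not IBGE figures. IBGE's own procedure (for example, how ties are handled) is not described here, so treat this as an assumption-laden illustration only.

```python
import pandas as pd

# Hypothetical municipality-by-biome overlap areas (made-up values, not IBGE data).
overlaps = pd.DataFrame(
    {
        "geocode": ["0000001", "0000001", "0000002", "0000002", "0000002"],
        "biome": ["Cerrado", "Amazônia", "Caatinga", "Cerrado", "Mata Atlântica"],
        "area_km2": [120.0, 480.0, 75.0, 60.0, 15.0],
    }
)

# For each municipality, find the row index of the biome with the largest area.
largest_area_index = overlaps.groupby("geocode")["area_km2"].idxmax()

# Keep exactly one (geocode, biome) pair per municipality.
predominant = overlaps.loc[largest_area_index, ["geocode", "biome"]].reset_index(
    drop=True
)
print(predominant)
```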
The file includes:
- 5,568 municipalities
- The Federal District (Brasília – DF)
- The State District of Fernando de Noronha – PE
- Two State Operational Areas (Lagoa dos Patos and Lagoa Mirim – RS)
Total: 5,572 rows, one per location.
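The row accounting can be checked with a few lines of arithmetic. The sketch below uses only the counts listed above. The "after processing" figure assumes the processing script drops nothing other than the two lagoon areas (footer lines are not counted as rows here), so it is an expectation to verify rather than a guaranteed count.

```python
# Counts taken from the list above.
SOURCE_COUNTS = {
    "municipalities": 5568,
    "federal_district": 1,
    "fernando_de_noronha": 1,
    "state_operational_areas": 2,  # Lagoa dos Patos and Lagoa Mirim
}

total_source_rows = sum(SOURCE_COUNTS.values())

# Assumption: the processing script removes only the two lagoon areas.
expected_rows_after_processing = (
    total_source_rows - SOURCE_COUNTS["state_operational_areas"]
)

print(total_source_rows, expected_rows_after_processing)
```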
How we use the dataset in Trase
This dataset is:
- Used throughout our Brazil pipelines wherever biome information is needed.
- Ingested into the Trase database.
How to fetch the data from the source
- Update frequency: Irregular (based on IBGE releases)
- Latest release: June 2024
- Next expected update: Unknown; depends on IBGE biomes project updates
The original IBGE data can be downloaded manually from https://www.ibge.gov.br/.
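Once downloaded, the CSV can be read with the same parsing options the processing script passes to `get_pandas_df` (`sep=";"`, `dtype=str`, `na_filter=False`). The sketch below uses plain `pandas.read_csv` on a small in-memory sample. The sample rows are illustrative, and the file's encoding and exact header are assumptions to check against the real download.

```python
import io

import pandas as pd

# Tiny illustrative sample in the assumed layout of the IBGE file.
sample_csv = io.StringIO(
    "Geocódigo;Nome do município;Sigla da UF;Bioma predominante\n"
    "5300108;Brasília;DF;Cerrado\n"
    "1302603;Manaus;AM;Amazônia\n"
)

# dtype=str keeps geocodes as text; na_filter=False keeps empty cells as "".
df = pd.read_csv(sample_csv, sep=";", dtype=str, na_filter=False)
print(df)
```

Reading geocodes as strings matters because the script later concatenates them with the `BR-` prefix and checks their length.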
The script used to process/clean the dataset
The file trase/data/brazil/spatial/boundaries/ibge/br_municipality_biome/ibge_biome_municipality_2024_gold.py processes the IBGE list into a clean, analysis-ready dataset.
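The first cleaning steps of that script can be tried offline. The sketch below is a condensed, database-free version of the footer removal, lagoon removal, and Trase ID construction. The toy input frame and the `raw` and `clean` variable names are illustrative, and the node lookups against the Trase database are deliberately left out.

```python
import pandas as pd

# Toy stand-in for the raw CSV: two municipalities, one lagoon, three footer lines.
raw = pd.DataFrame(
    {
        "Geocódigo": [
            "5300108",
            "1302603",
            "4300001",
            "--------",
            "CAMPO",
            "Geocódigo",
        ]
    }
)

# Values that appear in the geocode column of the metadata footer.
footer_values = [
    "--------",
    "CAMPO",
    "Geocódigo",
    "Nome do município",
    "Sigla da UF",
    "Bioma predominante",
    "Referência",
]
lagoons = ["4300001", "4300002"]  # Lagoa Mirim, Lagoa dos Patos

# Drop footer lines and lagoons; copy so the new column is added safely.
clean = raw[~raw["Geocódigo"].isin(footer_values + lagoons)].copy()

# Every remaining geocode must be exactly seven digits.
geocodes = clean["Geocódigo"]
assert (geocodes.str.isdigit() & (geocodes.str.len() == 7)).all()

clean["trase_id"] = "BR-" + geocodes
print(clean["trase_id"].tolist())
```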
History
- 2025-08: Cleaned by Harry
Details
| Column | Type | Description |
|---|---|---|
| MUNICIPALITY_GEOCODE_IBGE | VARCHAR | 7-digit IBGE municipality geocode (or special area code) from the source file. |
| MUNICIPALITY_LABEL | VARCHAR | Municipality name (accent/spacing preserved as in IBGE, cleaned for consistency). |
| MUNICIPALITY_STATE_UF | VARCHAR | Two-letter state (UF) abbreviation. |
| BIOME_LABEL_ORIGINAL | VARCHAR | Biome label exactly as provided by IBGE (“Bioma predominante”). |
| MUNICIPALITY_TRASE_ID | VARCHAR | Trase node identifier for the municipality, formatted as BR- |
| MUNICIPALITY_NODE_ID | BIGINT | Integer node ID of the municipality in the Trase PostgreSQL database. |
| BIOME_LABEL_CLEANED | VARCHAR | Cleaned biome label (uppercased/diacritics-normalized via clean_string) used for node matching. |
| BIOME_NAME | VARCHAR | Canonical/default node name for the matched biome in the Trase PostgreSQL database. |
| BIOME_TRASE_ID | VARCHAR | Trase node identifier for the biome. |
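To illustrate what "uppercased/diacritics-normalized" means for BIOME_LABEL_CLEANED, here is a rough stand-in for `clean_string`. The function `approximate_clean_string` is hypothetical: it only reproduces the behaviour described in the table (uppercasing and stripping diacritics) plus an assumed whitespace collapse, and the real `trase.tools.utilities.helpers.clean_string` may do more or behave differently.

```python
import unicodedata


def approximate_clean_string(text: str) -> str:
    """Hypothetical approximation of clean_string: uppercase, no diacritics."""
    # Decompose accented characters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keeping only the base characters.
    without_marks = "".join(
        char for char in decomposed if not unicodedata.combining(char)
    )
    # Uppercase and collapse runs of whitespace (the collapse is an assumption).
    return " ".join(without_marks.upper().split())


for label in ["Amazônia", "Mata Atlântica", "Geocódigo"]:
    print(label, "->", approximate_clean_string(label))
```

The same normalisation is consistent with the column names the script expects after cleaning, such as `GEOCODIGO` and `NOME DO MUNICIPIO`.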
Models / Seeds
source.trase_duckdb.trase-storage-raw.br_municipality_biome_2024
Sources
['trase-storage-raw', 'br_municipality_biome_2024']
import pandas as pd
from psycopg2 import sql

from trase.tools import get_node_sub_type_id, get_country_id, CNX
from trase.tools.aws import get_pandas_df
from trase.tools.aws.metadata import write_parquet_for_upload
from trase.tools.pandasdb.find import find_nodes_by_trase_id, find_nodes_by_name
from trase.tools.utilities.helpers import clean_string


def process(df):
    # the bottom of the CSV contains a footer with metadata - let's skip this
    geocodes_in_footer = [
        "--------",
        "CAMPO",
        "Geocódigo",
        "Nome do município",
        "Sigla da UF",
        "Bioma predominante",
        "Referência",
    ]
    df = df[~df["Geocódigo"].isin(geocodes_in_footer)]

    # remove two lagoons
    lagoons = [
        "4300001",  # Lagoa Mirim
        "4300002",  # Lagoa dos Patos
    ]
    df = df[~df["Geocódigo"].isin(lagoons)]

    # add trase id
    ibge_geocodes = df["Geocódigo"]
    assert all(
        ibge_geocodes.str.isdigit() & (ibge_geocodes.str.len() == 7)
    ), "Geocódigos must be 7 digits long"
    df["trase_id"] = "BR-" + ibge_geocodes

    # add municipality node id
    df[["municipality_node_id"]] = find_nodes_by_trase_id(
        df[["trase_id"]],
        returning=["node_id"],
    )
    assert not any(df["municipality_node_id"].isna())
    df["municipality_node_id"] = df["municipality_node_id"].astype(int)
    assert all(df["municipality_node_id"] > 0)

    # identify the biomes
    biome_sub_type_id = get_node_sub_type_id("BIOME")
    brazil_country_node_id = get_country_id("BRAZIL")
    df["biome_label"] = df["Bioma predominante"].apply(clean_string)
    df[["biome_name", "biome_trase_id"]] = find_nodes_by_name(
        df,
        returning=["default_name", "trase_id"],
        name=sql.Identifier("biome_label"),
        on_extra_columns="ignore",
        sub_type_id=sql.Literal(biome_sub_type_id),
        parent_id=sql.Literal(brazil_country_node_id),
    )
    assert not any(df["biome_name"].isna())

    # are there any municipalities without a biome?
    # yes - but mostly -AGGREGATED, -XXXX etc
    municipality_sub_type_id = get_node_sub_type_id("MUNICIPALITY")
    df_all_municipalities = pd.read_sql(
        f"""
        select id as municipality_node_id, trase_id from main.nodes
        where sub_type_id = {municipality_sub_type_id}
        and trase_id like 'BR-%'
        """,
        CNX.cnx,
    )
    df_municipalities_without_biome = pd.merge(
        df_all_municipalities,
        df[["trase_id"]],
        on="trase_id",
        how="outer",
        indicator=True,
    )
    indicator = df_municipalities_without_biome.pop("_merge")
    assert not any(indicator == "right_only")
    df_municipalities_without_biome = df_municipalities_without_biome[
        indicator == "left_only"
    ]
    trase_ids_without_biome = df_municipalities_without_biome["trase_id"]
    assert all(
        trase_ids_without_biome.str.endswith("-AGGREGATED")
        | trase_ids_without_biome.str.endswith("XXXXX")
        | trase_ids_without_biome.str.endswith("IMPORTED-MUNICIPALITY")
    )

    # match conventional names for other indicators
    df.columns = [clean_string(column) for column in df.columns]
    df = df.rename(
        columns={
            "GEOCODIGO": "MUNICIPALITY_GEOCODE_IBGE",
            "NOME DO MUNICIPIO": "MUNICIPALITY_LABEL",
            "SIGLA DA UF": "MUNICIPALITY_STATE_UF",
            "BIOMA PREDOMINANTE": "BIOME_LABEL_ORIGINAL",
            "BIOME_LABEL": "BIOME_LABEL_CLEANED",
            "TRASE_ID": "MUNICIPALITY_TRASE_ID",
        },
        errors="raise",
    )
    return df


if __name__ == "__main__":
    df_original = get_pandas_df(
        "brazil/spatial/boundaries/ibge/br_municipality_biome/Bioma_Predominante_por_Municipio_2024.csv",
        sep=";",
        dtype=str,
        na_filter=False,
    )
    df = process(df_original)
    write_parquet_for_upload(
        df,
        "brazil/spatial/boundaries/ibge/br_municipality_biome/ibge_biome_municipality_2024_gold.parquet",
    )
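The coverage check in `process` uses an outer merge with `indicator=True` as an anti-join: rows marked `left_only` are database municipalities that have no biome in the IBGE file, and `right_only` rows would be IBGE municipalities missing from the database. The sketch below reproduces that pattern on toy data. The Trase IDs shown, including the placeholder-style ones, are invented for illustration and are not real node identifiers.

```python
import pandas as pd

# Hypothetical municipality nodes in the database (illustrative IDs only).
all_nodes = pd.DataFrame(
    {"trase_id": ["BR-5300108", "BR-1302603", "BR-13-AGGREGATED", "BR-XXXXXXX"]}
)
# Hypothetical municipalities that received a biome from the IBGE file.
with_biome = pd.DataFrame({"trase_id": ["BR-5300108", "BR-1302603"]})

merged = pd.merge(all_nodes, with_biome, on="trase_id", how="outer", indicator=True)
indicator = merged.pop("_merge")

# Nothing in the IBGE file may be absent from the database.
assert not (indicator == "right_only").any()

# Database nodes without a biome: only placeholder-style IDs are acceptable.
missing = merged.loc[indicator == "left_only", "trase_id"]
is_placeholder = missing.str.endswith("-AGGREGATED") | missing.str.endswith("XXXXX")
assert is_placeholder.all()

print(sorted(missing))
```

An outer merge is used rather than a left merge so that both failure directions can be detected from the same indicator column.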
from trase.data.brazil.spatial.boundaries.ibge.br_municipality_biome.ibge_biome_municipality_2024_gold import (
    process,
)


def model(dbt, cursor):
    dbt.config(materialized="external")
    df = dbt.source("trase-storage-raw", "br_municipality_biome_2024").df()
    return process(df)