Empresas Cadastro 2021

s3://trase-storage/brazil/auxiliary/secex/cleaned/EMPRESAS_CADASTRO_2021.csv

Dbt path: trase_production.main_brazil.empresas_cadastro_2021

Explore on Metabase: Full table; summary statistics

Containing yaml file link: trase/data_pipeline/models/brazil/auxiliary/secex/_schema.yml

Model file link: trase/data_pipeline/models/brazil/auxiliary/secex/cleaned/empresas_cadastro_2021.py

Calls script: trase\tools\aws\metadata.py

Dbt test runs & lineage: Test results · Lineage

Full dbt_docs page: Open in dbt docs (includes the lineage graph at the bottom right, tests, and downstream dependencies)

Tags: mock_model, auxiliary, brazil, cleaned, secex


empresas_cadastro_2021

Description

SECEX Empresas Cadastro

The Cadastro de Empresas Exportadoras e Importadoras (Registration of Exporting and Importing Companies) was a dataset produced by the Subsecretariat of Intelligence and Foreign Trade Statistics (SECEX), part of Brazil's Ministry of Development, Industry, Commerce, and Services (MDIC). It listed the establishments that engaged in export, import, or both during a given reference year, and recorded where each CNPJ was registered (zip code (CEP), municipality, etc.).

The dataset was discontinued due to privacy concerns. The reasoning is detailed in this document (also available on S3 under s3://trase-storage/brazil/auxiliary/secex/Nota-sobre-lista-de-exportadores-e-importadores.pdf). The document describes a confidentiality problem affecting the tax information of exporting and importing companies, which arises when two publicly available datasets are merged: (a) the list of companies exporting/importing and (b) the statistics on municipalities exporting/importing. For that reason one of the two was withdrawn, and the list of companies was chosen because the municipality statistics are more widely used and more useful for public policy.
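The merging risk can be illustrated with a small sketch. It assumes (our reading, not a quote from the notice) that the exposure arises where a municipality has exactly one trading company, so that the municipality-level total is effectively that company's total. All records, names, and values below are invented for illustration.

```python
from collections import Counter

# Hypothetical company list: (cnpj, municipality). Values are invented.
companies = [
    ("00000000000100", "Municipality A"),
    ("00000000000200", "Municipality A"),
    ("00000000000300", "Municipality B"),
]

# Hypothetical municipality-level export totals (arbitrary units).
municipality_exports = {"Municipality A": 500, "Municipality B": 120}


def exposed_companies(companies, municipality_exports):
    """Return {cnpj: value} for companies that are alone in their municipality."""
    per_municipality = Counter(municipality for _, municipality in companies)
    return {
        cnpj: municipality_exports[municipality]
        for cnpj, municipality in companies
        if per_municipality[municipality] == 1
        and municipality in municipality_exports
    }


exposed = exposed_companies(companies, municipality_exports)
print(exposed)
```

In this toy data only the company in "Municipality B" is exposed, because the two companies in "Municipality A" share a total that cannot be split between them.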

The latest data provided was for 2021, as shown by the last web archive snapshot of the download page. We downloaded and processed all years up to and including 2021 and stored them on S3.

How we use the dataset in Trase

This dataset was used in the "MDIC Disaggregation" model, which is an important input to the Brazil beef SEI-PCS model. It was also used as part of the Brazil soy SEI-PCS model.

How often the dataset is updated, and when the next update is likely to be.

This dataset is no longer updated.

How to re-fetch the dataset from the original source.

The dataset used to be available at https://www.gov.br/produtividade-e-comercio-exterior/pt-br/assuntos/comercio-exterior/estatisticas/outras-estatistica-de-comercio-exterior, but that link is no longer active. The data is, however, stored on our AWS S3 account.
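Since the S3 copy is now the only source, a minimal sketch of fetching the cleaned file from S3 may be useful. It assumes boto3 is installed and that AWS credentials with read access to the bucket are configured; the helper names `split_s3_uri` and `download` are hypothetical and not part of the Trase codebase.

```python
from urllib.parse import urlparse

# Path taken from the header of this page.
S3_URI = "s3://trase-storage/brazil/auxiliary/secex/cleaned/EMPRESAS_CADASTRO_2021.csv"


def split_s3_uri(uri):
    """Split an s3:// URI into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def download(uri, destination):
    """Download one object; requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so split_s3_uri works without boto3

    bucket, key = split_s3_uri(uri)
    boto3.client("s3").download_file(bucket, key, destination)


bucket, key = split_s3_uri(S3_URI)
print(bucket, key)
```

Calling `download(S3_URI, "EMPRESAS_CADASTRO_2021.csv")` would then save a local copy, provided the credentials allow it.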

The script that is used to process/clean the dataset

trase/data/brazil/auxiliary/secex/cleaned/EMPRESAS_CADASTRO_201X.py.

When the dataset was last updated, and by whom

The 2021 dataset was last updated by Nanxu Su on 2023-07-26. Previous years were updated by Harry Biddle.

A history of changes/notes of the dataset

  • 2023: the above PDF notice was posted and the website was taken down

Acceptance criteria for sufficient level of quality of the dataset

There are no acceptance criteria defined.


Details

Column Type Description
cnpj VARCHAR
label VARCHAR
street VARCHAR
house_number VARCHAR
neighbourhood VARCHAR
cep VARCHAR
municipality.label VARCHAR
state.uf VARCHAR
activity.description VARCHAR
legal_nature.description VARCHAR
activity.cnae VARCHAR
legal_nature.code BIGINT
state.code BIGINT
state.trase_id VARCHAR
municipality.name VARCHAR
municipality.trase_id VARCHAR
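
When reading the CSV outside dbt, it can help to keep the VARCHAR columns as text so that leading zeros in identifiers such as cnpj and cep are preserved, and to cast only the two BIGINT columns. The sketch below uses the standard library only. The `;` delimiter is an assumption (check the actual file), `read_rows` is a hypothetical helper, and the sample row is invented.

```python
import csv
import io

# Columns typed BIGINT in the table above; all others are VARCHAR.
BIGINT_COLUMNS = ("legal_nature.code", "state.code")


def read_rows(handle, delimiter=";"):
    """Yield rows as dicts, casting only the BIGINT columns to int."""
    for row in csv.DictReader(handle, delimiter=delimiter):
        for column in BIGINT_COLUMNS:
            if row.get(column):  # leave empty or missing values untouched
                row[column] = int(row[column])
        yield row


# Invented sample with a subset of the columns; real files contain all of them.
sample = io.StringIO("cnpj;cep;state.code\n00000000000191;01310100;35\n")
rows = list(read_rows(sample))
print(rows[0])
```

Because `csv.DictReader` returns every field as a string, no extra step is needed to protect the VARCHAR columns; only the integer columns are converted.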

Models / Seeds

  • source.trase_duckdb.trase-storage-raw.uf
  • source.trase_duckdb.trase-storage-raw.empresas_cadastro_2021_original

Sources

  • ['trase-storage-raw', 'uf']
  • ['trase-storage-raw', 'empresas_cadastro_2021_original']

No called script or script source not found.

from trase.data.brazil.auxiliary.secex.cleaned.EMPRESAS_CADASTRO_201X import process


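def model(dbt, cursor):
    # Materialize the result as an external file rather than a database table.
    dbt.config(materialized="external")

    # Declare the upstream sources; the return values are not used here.
    dbt.source("trase-storage-raw", "uf")
    dbt.source("trase-storage-raw", "empresas_cadastro_2021_original")

    # Run the cleaning script for reference year 2021.
    return process(2021)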