Skip to content

View or edit on GitHub

This page is synchronized from trase/models/brazil/soy/CNPJ_evaluation_2022.ipynb. Last modified on 2026-05-06 16:54 CEST by Trase Admin. Please view or edit the original file there; changes should be reflected here after a midnight build (CET time), or manually triggering it with a GitHub action (link).

import pandas as pd

pd.options.display.float_format = "{:,.2f}".format

Load Dataset

The datasets below were generated during the running of soy model for 2022.

flows = pd.read_csv("2022/prepared/flows.csv", sep=";")
cnpj = pd.read_csv("2022/prepared/cnpj.csv", sep=";")

Check CNPJ correspondency in BoL 2022

matched_cnpjs = flows.merge(
    cnpj, how="outer", left_on=["exporter_cnpj"], right_on=["cnpj"], indicator=True
)
d = matched_cnpjs.groupby(by=["_merge"])[["vol"]].sum()
d["perc"] = d["vol"].div(d["vol"].sum()).mul(100)

d
/var/folders/42/_z7t3f3n2b74bggwwctg77z80000gn/T/ipykernel_80648/1822364778.py:1: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
  d = matched_cnpjs.groupby(by=['_merge'])[['vol']].sum()
vol perc
_merge
left_only 0.00 0.00
right_only 0.00 0.00
both 107,675,943,132.86 100.00

Checking if there are known municipalities for EU economic bloc and UNKNOWN COUNTRY EUROPEAN UNION

check_mun = matched_cnpjs[
    (matched_cnpjs["economic_bloc"] == "EUROPEAN UNION")
    & (matched_cnpjs["country_of_destination"] == "UNKNOWN COUNTRY EUROPEAN UNION")
    & (matched_cnpjs["_merge"] == "both")
]

check_mun["municipality_check"] = "UNKNOWN"
check_mun.loc[check_mun["tax_municipality_y"] != "BR-XXXXXXX", "municipality_check"] = (
    "CNPJ_WITH_MUNICIPALITY"
)
/var/folders/42/_z7t3f3n2b74bggwwctg77z80000gn/T/ipykernel_80648/2083031557.py:9: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  check_mun['municipality_check'] = 'UNKNOWN'
a = check_mun.groupby(by=["municipality_check"])[["vol"]].sum()
a["perc"] = a["vol"].div(a["vol"].sum()).mul(100)

a
vol perc
municipality_check
CNPJ_WITH_MUNICIPALITY 4,318,777,004.87 99.31
UNKNOWN 29,899,000.00 0.69