View or edit on GitHub
This page is synchronized from trase/models/brazil/soy/CNPJ_evaluation_2022.ipynb. Last modified on 2026-05-06 16:54 CEST by Trase Admin.
Please view or edit the original file there; changes should be reflected here after a midnight build (CET time),
or manually triggering it with a GitHub action (link).
import pandas as pd
pd.options.display.float_format = "{:,.2f}".format
Load Dataset
The datasets below were generated during the running of soy model for 2022.
flows = pd.read_csv("2022/prepared/flows.csv", sep=";")
cnpj = pd.read_csv("2022/prepared/cnpj.csv", sep=";")
Check CNPJ correspondency in BoL 2022
matched_cnpjs = flows.merge(
cnpj, how="outer", left_on=["exporter_cnpj"], right_on=["cnpj"], indicator=True
)
d = matched_cnpjs.groupby(by=["_merge"])[["vol"]].sum()
d["perc"] = d["vol"].div(d["vol"].sum()).mul(100)
d
/var/folders/42/_z7t3f3n2b74bggwwctg77z80000gn/T/ipykernel_80648/1822364778.py:1: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
d = matched_cnpjs.groupby(by=['_merge'])[['vol']].sum()
| vol | perc | |
|---|---|---|
| _merge | ||
| left_only | 0.00 | 0.00 |
| right_only | 0.00 | 0.00 |
| both | 107,675,943,132.86 | 100.00 |
Checking if there are known municipalities for EU economic bloc and UNKNOWN COUNTRY EUROPEAN UNION
check_mun = matched_cnpjs[
(matched_cnpjs["economic_bloc"] == "EUROPEAN UNION")
& (matched_cnpjs["country_of_destination"] == "UNKNOWN COUNTRY EUROPEAN UNION")
& (matched_cnpjs["_merge"] == "both")
]
check_mun["municipality_check"] = "UNKNOWN"
check_mun.loc[check_mun["tax_municipality_y"] != "BR-XXXXXXX", "municipality_check"] = (
"CNPJ_WITH_MUNICIPALITY"
)
/var/folders/42/_z7t3f3n2b74bggwwctg77z80000gn/T/ipykernel_80648/2083031557.py:9: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
check_mun['municipality_check'] = 'UNKNOWN'
a = check_mun.groupby(by=["municipality_check"])[["vol"]].sum()
a["perc"] = a["vol"].div(a["vol"].sum()).mul(100)
a
| vol | perc | |
|---|---|---|
| municipality_check | ||
| CNPJ_WITH_MUNICIPALITY | 4,318,777,004.87 | 99.31 |
| UNKNOWN | 29,899,000.00 | 0.69 |