In [ ]:
%%HTML 
<script>
    function luc21893_refresh_cell(cell) {
        if( cell.luc21893 ) return;
        cell.luc21893 = true;
        console.debug('New code cell found...' );
        
        var div = document.createElement('DIV');            
        cell.parentNode.insertBefore( div, cell.nextSibling );
        div.style.textAlign = 'right';
        var a = document.createElement('A');
        div.appendChild(a);
        a.href='#'
        a.luc21893 = cell;
        a.setAttribute( 'onclick', "luc21893_toggle(this); return false;" );

        cell.style.visibility='hidden';
        cell.style.position='absolute';
        a.innerHTML = '[show code]';        
                
    }
    function luc21893_refresh() {                
        if( document.querySelector('.code_cell .input') == null ) {            
            // it apeears that I am in a exported html
            // hide this code
            var codeCells = document.querySelectorAll('.jp-InputArea')
            codeCells[0].style.visibility = 'hidden';
            codeCells[0].style.position = 'absolute';                        
            for( var i = 1; i < codeCells.length; i++ ) {
                if (i % 2 == 0){
                luc21893_refresh_cell(codeCells[i].parentNode)}
                else {}
                
            }
            window.onload = luc21893_refresh;
        }                 
        else {
            // it apperas that I am in a jupyter editor
            var codeCells = document.querySelectorAll('.code_cell .input')
            for( var i = 0; i < codeCells.length; i++ ) {
                if ([1,3,4,5,7].includes(i)){
                luc21893_refresh_cell(codeCells[i])}
                else {}
            }            
            window.setTimeout( luc21893_refresh, 1000 )
        }        
    }
    
    function luc21893_toggle(a) {
        if( a.luc21893.style.visibility=='hidden' ) {
            a.luc21893.style.visibility='visible';        
            a.luc21893.style.position='';
            a.innerHTML = '[hide code]';
        }
        else {
            a.luc21893.style.visibility='hidden';        
            a.luc21893.style.position='absolute';
            a.innerHTML = '[show code]';
        }
    }
    
    luc21893_refresh()
</script>

Brazilian beef SEI-PCS 2.2.0 QA¶

Yan Prada Moro 03/03/2023

Introduction¶

Trase is a research initiative that aims to improve the transparency of global supply chains, with a focus on commodities that have a significant impact on deforestation and other environmental issues. The main goal of Trase is to increase understanding of the trade flows, actors, and governance mechanisms that shape the sustainability of global commodity supply chains, and to use this information to support decision-making and policy development in the public and private sectors. The initiative uses a combination of data analysis, research, and stakeholder engagement to achieve this goal.

Beef is one of the commodities that Trase focuses on in its research. The beef supply chain is complex and often opaque, with many actors and intermediaries involved, making it difficult to trace the origin and movement of the product. Trase aims to increase transparency in the beef supply chain by mapping out the trade flows and actors that shape the sustainability of the industry. This includes identifying the countries and regions where beef is produced, the companies and traders that are involved in the supply chain, and the various regulations and policies that govern the industry.

For the Brazilian beef supply chain, a new method is being developed to derive the flows for the years of 2018 and beyond. The method is based on the use of bills of lading (BoL) instead of customs declarations (CD), that has been used from 2010 until middle of 2018, serving as an input to build SEI-PCS model.

The customs declaration is a document that is required by customs authorities when goods are imported or exported across international borders. The declaration provides information about the goods being shipped, such as the quantity, value, and type of goods, as well as information about the shipper and recipient. The purpose of a customs declaration is to ensure that the goods being imported or exported comply with the laws and regulations of the country they are entering or leaving, and to assess any applicable duties or taxes. The declaration may also be used to detect and prevent the smuggling of illegal goods. The document is usually filled out by the importer/exporter and it may be done electronically or on paper.

On the other hand, the bill of lading is a legal document that serves as a receipt for goods being shipped by a carrier, such as a trucking company or shipping line. It serves as proof of ownership and details the type, quantity, and destination of the goods being shipped. It also serves as a contract between the shipper and carrier, outlining the terms and conditions of the shipment. In international trade, a bill of lading is often used as a document of title, which can be used to take possession of the goods and transfer ownership to the buyer.

The difference between the two documents affects how SEI-PCS is developed. To test the possible differences between the two methods, and also to verify inconsistencies in the execution of the new model, we analyzed the SEI-PCS version 2.2.0 model and compared volume and fob with SEI-PCS version 2.0, MDIC-port and CD-BoL sets of data. For the analysis, we consider that the MDIC-port volume is the "source of truth", as it is not used to determine the SEI-PCS model.

The MDIC Port Database from COMEX STAT is a system provided by the Brazilian Ministry of Development, Industry and Foreign Trade (MDIC) that allows the registration and consultation of data on the movement of goods through the country's ports. This system was created to make the flow of goods more efficient and transparent by providing real-time data on the movement of goods through the ports, including import and export data, cargo volumes, transit times, and customs clearance times. The MDIC Port Database is integrated with the COMEX STAT system, which provides statistical data on international trade, allowing users to access a wide range of information on the movement of goods through Brazil's ports. The data is used by government agencies, businesses, and researchers to analyze trade trends and make informed decisions.

Sources:

  • For SEI-PCS model v2.2.0, we used two sources:
    • Amazon S3 bucket ("brazil/beef/sei_pcs/v2.2.0/SEIPCS_BRAZIL_BEEF_{year}.csv")
    • Splitgraph("trase-development/data-source:latest"."supply-chains-latest")
    each source is provided on the graph description.
  • For SEI-PCS previous version model, we used two sources*:
    • Amazon S3 bucket ("brazil/beef/sei_pcs/v2.1.0/resubmission/BRAZIL_SEIPCS_BEEF_AGGREGATED_{year}_part_{part}.csv")
    • Splitgraph("trase/supply-chains:latest"."supply-chains")
    *each source is provided on the graph description.
  • For CD we used amazon s3 bucket:
    • For years before 2018: "brazil/beef/trade/cd/combined/CD_COMBINED_BEEF_{year}_NEW.csv"
    • 2018 ownwards: "brazil/beef/trade/cd/disaggregated/CD_DISAGGREGATED_BEEF_{year}_NEW.csv"
  • For MDIC port we used amazon s3 bucket:
    • For 2018: "brazil/trade/mdic/port/brazil_mdic_port_{year}_redownload.csv"
    • For the other years: "brazil/trade/mdic/port/brazil_mdic_port_{year}.csv"

High-level QA findings¶

Some conclusions are:

  1. 2019 and 2020 looks in line with previous years

  2. We increase the number of flows categorized as branch 1.1 for 2010-2017 with the new match between MDIC and CD
In [ ]:
import plotly.io as pio
import numpy as np
pio.renderers.default = "plotly_mimetype+notebook"
from trase.models.brazil.beef.qa_beef.imports.functions import (check_number_low_volume_flows, 
                                                                import_sei_pcs_unknown, 
                                                                compare_versions_sei_pcs
                                                                )
from trase.models.brazil.beef.qa_beef.imports.reader import (load_downloaded_data_s3,
                                                             load_downloaded_data_db
                                                            )
from trase.models.brazil.beef.qa_beef.imports.plots_general import (plot_volume_per_year,
                                                                    plot_diff_vol_raw_product_per_hs6,
                                                                    plot_zero_values_per_year,
                                                                    plot_branch_percentage,
                                                                    plot_branch_percentage_exporter,
                                                                    plot_geosource_percentage,
                                                                    plot_geosource_percentage_per_hs_code,
                                                                    plot_bar_comparison,
                                                                    plot_diff_volume_economic_bloc_dfs,
                                                                    plot_number_low_volume_flows,
                                                                    plot_mdic_vias,
                                                                    plot_unknowns,
                                                                    plot_branch_percentage_state,
                                                                    plot_hs_code_comparision_years,
                                                                    plot_comparision_versions_sei_pcs
                                                                    )
dfs = load_downloaded_data_s3()
sei_pcs_s3 = dfs['sei_pcs_s3']
sei_old_s3 = dfs['sei_old_s3']
mdic_port = dfs['mdic_port']
cd = dfs['cd']
merged_df = dfs['merged_df']

dfs_DB = load_downloaded_data_db()
sei_pcs_db = dfs_DB['sei_pcs_db']
sei_old_db = dfs_DB['sei_old_db']
dfs_DB['sei_old_db'].loc[
                        dfs_DB['sei_old_db']['EXPORTER_GROUP']=='MARFRIG GLOBAL FOODS',
                         'EXPORTER_GROUP'
                         ] = 'MARFRIG'
dfs_DB['sei_old_db']['LVL6_TRASE_ID_LH']= np.where(dfs_DB['sei_old_db']['LVL6_TRASE_ID_LH'].str.split().str[0]== 'UNKNOWN', 
                                                   'UNKNOWN', 
                                                   dfs_DB['sei_old_db']['LVL6_TRASE_ID_LH'])

High Level QA¶

SEI PCS Volume Raw¶

Volume Raw is the volume (i.e. tons of beef or offal) converted into ‘carcass weight equivalents’, using standard conversion coefficients, so that all volumes are comparable. In this document, we will refer to volume product as the original exported volume, and volume raw as the carcass weight equivalent.

The plot below displays the total volume raw of beef commodity, measured in tonnes, that was exported each year. The horizontal axis represents the years and the vertical axis represents the volume raw. The plot is based on data from the SEI-PCS model, which is the model used to trace commodity trade flows.

In [ ]:
plot_volume_per_year(sei_pcs_s3)

Ratio between SEI-PCS volume raw and volume product per HS6 code per year¶

The chart shows the ratio between 'VOLUME_RAW' and 'VOLUME_PRODUCT' by year and HS6 code. The data used to create the chart is taken from SEI-PCS model from Amazon s3. HS codes that just include beef are in red, and the ones that include others cattle types are in gray.

HS codes (Harmonized System codes) are a standardized system of names and numbers that are used to classify goods in international trade. They are used to identify the products that are being traded and to calculate tariffs and taxes. Each hs code have a conversion coefficient to transform volume product into a carcass weight equivalent.

To calculate the ratio between the two volumes we grouped the volumes per year and per hs6 code, and then diveded the volume raw by the volume product for each hs6 for each year.

In [ ]:
plot_diff_vol_raw_product_per_hs6(sei_pcs_s3, "VOLUME_RAW")

Difference between version 2.1 and 2.2 of beef for years 2015-2017¶

In [ ]:
key = "LVL6_TRASE_ID_LH"
plot_df = compare_versions_sei_pcs(dfs_DB, key)
plot_comparision_versions_sei_pcs(plot_df, key)

Number of SEI-PCS flows with zero volume¶

This plot shows the number of flows (i.e. trade transactions) with zero volume (i.e. quantity of goods traded) per year. The x-axis represents the year and the y-axis represents the number of flows with zero volume in that year. It provides insight into the number of trade transactions with no goods being traded in each year. This information could be useful for identifying any potential data errors or missing information in the trade data. The source of data is from the SEI-PCS model from Splitgraph.

If no plot is provided, it means that there is no zero volume flows in the SEI-PCS model data from Splitgraph.

In [ ]:
plot_zero_values_per_year()
WE DO NOT HAVE ZERO FLOWS

SEI-PCS volume raw percentage per branch per year¶

This plot shows the percentage of SEI_PCS volume per branch per year using data from Splitgraph. It is a bar chart with the x-axis representing the years and y-axis representing the percentage of volume. The bars are color-coded according to the branch they represent. The plot is useful to see how the volume of different branches changes over time and how they compare to each other.

  • Branch 0: The product exported is “live cattle” ⇒ accept the CNPJ municipality of taxation as the LH
  • Branch 1: The product exported is “meat product”, and the state of production matches the CNPJ’s state
    • 1.1. The CNPJ is a SIF slaughterhouse ⇒ accept the CNPJ municipality of taxation as the LH
    • 1.2. The exporter operates a SIF slaughterhouse in the CNPJ municipality of taxation ⇒ accept CNPJ municipality of taxation as the LH
    • 1.3. A subsidiary of the exporter group operates a SIF slaughterhouse in the CNPJ municipality of taxation ⇒ accept the CNPJ municipality of taxation as the LH
    • 1.4. The exporter operates a non-SIF slaughterhouse in the CNPJ municipality of taxation ⇒ accept the CNPJ municipality of taxation as the LH
    • 1.5. The exporter operates SIF slaughterhouses in the state of production within 250 km of the CNPJ municipality of taxation
    • 1.5.1. There are SIF slaughterhouses within the state of production with export permit to the country of import ⇒ municipalities of slaughterhouses accepted as LH, export volume split between all slaughterhouses in inverse proportion to the distance
    • 1.5.2. The country of import does not appear in the SIF dataset ⇒ all exporter SIF slaughterhouses in state within range are accepted as LH, export volume split between slaughterhouses in inverse proportion to the distance.
    • 1.6. Subsidiaries of the exporter group operate SIF slaughterhouses in the state of production within 250 km of the CNPJ municipality of taxation with export permit to the country of import ⇒ municipalities of slaughterhouses accepted as LH, export volume split between all slaughterhouses in inverse proportion to the distance
    • 1.7. If all previous branches have failed, the export is linked to an unknown slaughterhouse in the state of production.
  • Branch 2: The product exported is “meat product” and the exporter operates a single SIF in the state of production, licensed to export to country
    • 2.1. If yes, accept municipality of slaughterhouse as LH
    • 2.2. If no, link to “Unknown slaughterhouse”
    Branch 3: State of production is unknown, link to “Unknown”
In [ ]:
plot_branch_percentage(sei_pcs_s3)

SEI-PCS volume raw percentage per branch per exporter per year¶

This plot shows the percentage of SEI_PCS volume per branch per exporter per year using data from Splitgraph. It is a bar chart with the x-axis representing the years and y-axis representing the percentage of volume. The bars are color-coded according to the branch they represent and text representing the exporter group. The plot is useful to see how the exporters changes for different branches over time.

In [ ]:
plot_branch_percentage_exporter(sei_pcs_s3)

Same plot, but per state.

In [ ]:
plot_branch_percentage_state(sei_pcs_s3)

Volume percentage per geocode surce per year¶

This plot shows the percentage of SEI-PCS volume raw per geocode source per year. The x-axis represents the year and the y-axis represents the volume percentage. The chart is also labeled with the name of the geocode source and is color-coded according to the input colors.

The Guia de Transporte Animal (GTA) geocode source is a document used in Brazil to track the movement of cattle and other animals being transported within the country. The GTA includes information about the origin and destination of the animals, as well as information about the animals themselves, such as their species, breed, and identification numbers. It is used to ensure compliance with regulations related to animal welfare and disease control, and to track the movement of animals for security and traceability reasons. The document is issued by the Ministry of Agriculture, Livestock and Food Supply, and must be presented to authorities during animal transportation.

SIGSIF stands for Sistema Integrado de Gestão de Sanidade de Animais de Produção e Fiscalização (Integrated System for the Management of Animal Health, Production, and Inspection) in Brazil. It is a database and information management system used by the Brazilian government to track and manage the health, production, and inspection of animals for agricultural and livestock purposes. The system is used by various government agencies and organizations, including the Ministry of Agriculture, Livestock and Supply, and the Brazilian Institute of Agriculture and Livestock Research. It is intended to improve the management of animal health and production, as well as to increase the efficiency of the inspection process.

In [ ]:
plot_geosource_percentage(sei_pcs_s3)

Volume percentage per geocode source per branch per year¶

This plot shows the percentage of volume per year, per branch (HS4 code), and per geocode source. The data used to create the chart is taken from SEI-PCS model from Amazon bucket and is grouped by year, geocode source, and HS4 code.

HS codes (Harmonized System codes) are a standardized system of names and numbers that are used to classify goods in international trade. They are used to identify the products that are being traded and to calculate tariffs and taxes.

  • HS code 1602: This code refers to products in the "Prepared or preserved meat, meat offal or blood; edible flours and meals of meat or meat offal" category. This can include items such as canned meats, sausages, and processed meats.
  • HS code 0202: This code refers to products in the "Meat and edible meat offal, fresh, chilled or frozen" category. This can include items such as fresh beef.
  • HS code 0201: This code refers to products in the "Meat of bovine animals, fresh or chilled" category. This can include items such as fresh beef and veal.
  • HS code 0102: This code refers to products in the "Live bovine animals" category. This can include items such as live cattle.
  • HS code 0206: This code refers to products in the "Meat of bovine animals, frozen" category. This can include items such as frozen beef and veal.
In [ ]:
plot_geosource_percentage_per_hs_code(sei_pcs_s3)

Volume product differences between MDIC-port, CD-BoL and SEI-PCS¶

This plot compares the volume product between three sources, SEI_PCS, MDIC, and CD, per year. The chart is grouped by year and source, and the color of the bars indicates the source. MDIC port volume should be the "source of truth" for the SEI-PCS model volume, so we expect that the volume between these two datasets to match.

As the cd (that represents the Custom Declaration for 2010-2018 and Bills of Lading for 2018-2020) is an input for SEI-PCS model, we also expect that the volume between these datasets to match.

In [ ]:
plot_bar_comparison(merged_df)

Diference in volume product between SEI-PCS, CD-BoL and MDIC per Economic Bloc¶

China¶

This plot compares the volume product between three sources, SEI_PCS, MDIC, and CD, per year for flows that are going to China. The chart is grouped by year and source, and the color of the bars indicates the source. The X-axis represents the year and Y-axis shows the volume product in tonnes.

China Economic Bloc is devided in CHINA (MAINLAND) and CHINA (HONG KONG).

In [ ]:
plot_diff_volume_economic_bloc_dfs(merged_df, "CHINA")

European Union¶

This plot compares the volume product between three sources, SEI_PCS, MDIC, and CD, per year for flows that are going to EU. The chart is grouped by year and source, and the color of the bars indicates the source. The X-axis represents the year and Y-axis shows the volume product in tonnes.

European Union bloc is considered the flows that are going to AUSTRIA , BELGIUM , BULGARIA , CROATIA , CYPRUS , CZECH REPUBLIC , DENMARK , ESTONIA , FINLAND , FRANCE , GERMANY , GREECE , HUNGARY , IRELAND , ITALY , LATVIA , LITHUANIA , LUXEMBOURG , MALTA , NETHERLANDS , POLAND , PORTUGAL , ROMANIA , SLOVAKIA , SLOVENIA , SPAIN , SWEDEN and UNITED KINGDOM .

In [ ]:
plot_diff_volume_economic_bloc_dfs(merged_df, "EUROPEAN UNION")

Number of flows with low volume (less than 0.25 tonnes) per branch and per state over the years¶

This plot shows the number of rows with low volumes (<0.25 tonnes) per year, grouped by a specific category (e.g. branch) provided by the user. The chart displays the number of rows on the y-axis and the years on the x-axis. The bars are color-coded based on the specific category and the chart includes annotations with the source of the data.

Low volumes are considered to be those flows whose volume is less than 0.25 tonnes because they correspond to less than half of a mature cow's weigh.

In [ ]:
sei_pcs_low_volume = check_number_low_volume_flows("BRANCH")
plot_number_low_volume_flows(sei_pcs_low_volume, "BRANCH")

Same plot, but per state.

In [ ]:
sei_pcs_low_volume = check_number_low_volume_flows("REGION_PRODUCTION_2")
plot_number_low_volume_flows(sei_pcs_low_volume, "REGION_PRODUCTION_2")

Volume raw of MDIC-port from non-maritime via per year¶

This plot shows the volume product of MDIC vias that are not equal to 1, ie, are non-maritime flows. The chart shows the product volume on the y-axis and the year on the x-axis.

In [ ]:
plot_mdic_vias(mdic_port)

SEI-PCS volume raw for unknown flows (municipality of production is unknown)¶

This chart displays the volume product of unknown flows for MDIC port per year. Unknown flows are considered those in which the municipality of production is unknown.

In [ ]:
group_by = "BRANCH"
unknowns = import_sei_pcs_unknown(group_by)
plot_unknowns(unknowns, group_by)

MDIC-port, CD-BoL and SEI-PCS volume product comparision per HS4 code per year¶

This plot compares the volume product from different sources of data (BoL, MDIC, SEI v2.2, and v2.1) based on the specific HS4 codes that are on the databases.

In [ ]:
plot_hs_code_comparision_years(merged_df, "HS4")

MDIC-port, CD-BoL and SEI-PCS volume raw comparision per HS6 FROM 0202 hs4 CODE code per year¶

This plot compares the volume product from different sources of data (BoL, MDIC, SEI v2.2, and v2.1) based on the specific HS6 codes that are on the databases.

In [ ]:
plot_hs_code_comparision_years(merged_df, "HS6")

For hs code 1602, we don't have hs code 160250 for our model in 2018, and we don't have hscodes 160232 and 160231 for all years, neither for CD nor sei-pcs.

In [ ]:
plot_hs_code_comparision_years(merged_df, "HS6", "1602")