Introduction

This document summarises and checks the results from Brazilian soy 2.6.1, which extends the time series to 2021 and 2022. The method for version 2.6 is based on bills of lading (BoL) rather than customs declarations (CD, used in 2004-2018).

We explore the following differences:

Then we carry out a more extensive QA check:

Additional note: In May 2024, Harry added more CNPJs, taken from SICARM, to the RFB dataset in an attempt to capture more CNPJs in branch 1. This .Rmd runs the scripts with these results.

## [1] "just downloaded CD for 2004"
## [1] "just downloaded CD for 2005"
## [1] "just downloaded CD for 2006"
## [1] "just downloaded CD for 2007"
## [1] "just downloaded CD for 2008"
## [1] "just downloaded CD for 2009"
## [1] "just downloaded CD for 2010"
## [1] "just downloaded CD for 2011"
## [1] "just downloaded CD for 2012"
## [1] "just downloaded CD for 2013"
## [1] "just downloaded CD for 2014"
## [1] "just downloaded CD for 2015"
## [1] "just downloaded CD for 2016"
## [1] "just downloaded CD for 2017"
## [1] "just downloaded MDIC for 2004"
## [1] "just downloaded MDIC for 2005"
## [1] "just downloaded MDIC for 2006"
## [1] "just downloaded MDIC for 2007"
## [1] "just downloaded MDIC for 2008"
## [1] "just downloaded MDIC for 2009"
## [1] "just downloaded MDIC for 2010"
## [1] "just downloaded MDIC for 2011"
## [1] "just downloaded MDIC for 2012"
## [1] "just downloaded MDIC for 2013"
## [1] "just downloaded MDIC for 2014"
## [1] "just downloaded MDIC for 2015"
## [1] "just downloaded MDIC for 2016"
## [1] "just downloaded MDIC for 2017"
## [1] "just downloaded MDIC for 2018"
## [1] "just downloaded MDIC for 2019"
## [1] "just downloaded MDIC for 2020"
## [1] "just downloaded MDIC for 2021"
## [1] "just downloaded MDIC for 2022"

Focus on v2.6.1 results

Differences in volumes of product(s) exported from Brazil

We compare total volume exported from Brazil from the input data (CD and BoL) and Trase results in 2004-2022:

Check volume differences between original trade data and the SEI-PCS results: total, per product, per country.

Note: the MDIC data uses the “VIA” field to designate the type of export that took place:

CO_VIA;“NO_VIA”
99;“VIA DESCONHECIDA”
13;“POR REBOQUE”
11;“COURIER”
15;“VICINAL FRONTEIRICO”
14;“DUTOS”
12;“EM MAOS”
00;“VIA NAO DECLARADA”
01;“MARITIMA”
02;“FLUVIAL”
03;“LACUSTRE”
04;“AEREA”
05;“POSTAL”
06;“FERROVIARIA”
07;“RODOVIARIA”
08;“CONDUTO/REDE DE TRANSMISSAO”
09;“MEIOS PROPRIOS”
10;“ENTRADA/SAIDA FICTA”

So maritime shipments are those with CO_VIA == “01”.
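As a minimal sketch of how this split could be done in R, the example below flags maritime rows and sums tonnes per year. The data frame and the `year` and `tonnes` column names are hypothetical; only `CO_VIA` and the code “01” come from the MDIC lookup above.

```r
# Hypothetical MDIC extract: only CO_VIA and the code "01" come from the
# lookup above; the other column names and all values are made up.
mdic <- data.frame(
  year   = c(2019, 2019, 2019, 2020),
  CO_VIA = c("01", "07", "02", "01"),
  tonnes = c(1000, 50, 25, 1200),
  stringsAsFactors = FALSE
)

# CO_VIA must stay a character column so the leading zero in "01" is kept.
mdic$is_maritime <- mdic$CO_VIA == "01"

# Total tonnes per year, split into maritime and non-maritime trade.
via_totals <- aggregate(tonnes ~ year + is_maritime, data = mdic, FUN = sum)
print(via_totals)
```

If the MDIC file is read with `read.csv2()`, passing `colClasses = c(CO_VIA = "character")` is one way to avoid the code being parsed as the integer 1.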

Total volumes

The above table shows that the input trade data matches the SEI-PCS output in all years. There are only minor differences with the MDIC-port data in 2019 and 2022.

We also look at FOB.

We also look at maritime and non-maritime trade separately, especially in 2019 and 2022, to see how closely we can match the BoL to MDIC port. The volume of non-maritime shipments is quite small between 2019 and 2022:

  • 2,568,389 tonnes in 2019
  • 802,321 tonnes in 2020
  • 534,625 tonnes in 2021
  • 440,048 tonnes in 2022

By removing the non-maritime trade, we can do a 1:1 comparison between the 2019-2022 SEI-PCS results and MDIC. As shown below, we are still off by roughly 3-8 Mtonnes per year.

##   year   tot_exp sei_pcs_tot_exp mdic_tot_exp_m ratio_sei_pcs_trade
## 1 2022  93733759        93733865      101951478                   1
## 2 2021 101621194       101621220      104958244                   1
## 3 2020  97835599        97927560      100778445                   1
## 4 2019  83962027        84226282       89777197                   1
##   ratio_sei_pcs_mdic_m
## 1                 0.92
## 2                 0.97
## 3                 0.97
## 4                 0.94
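The ratio column can be recomputed from the totals printed above, along with the absolute gap in Mtonnes. This is only a sketch that re-enters the printed figures by hand; `gap_mtonnes` is a name introduced here for illustration.

```r
# Totals copied from the table above (tonnes).
checks <- data.frame(
  year            = c(2022, 2021, 2020, 2019),
  sei_pcs_tot_exp = c(93733865, 101621220, 97927560, 84226282),
  mdic_tot_exp_m  = c(101951478, 104958244, 100778445, 89777197)
)

# Share of MDIC maritime exports covered by the SEI-PCS results.
checks$ratio_sei_pcs_mdic_m <- round(checks$sei_pcs_tot_exp / checks$mdic_tot_exp_m, 2)

# Absolute gap in million tonnes (MDIC maritime minus SEI-PCS).
checks$gap_mtonnes <- round((checks$mdic_tot_exp_m - checks$sei_pcs_tot_exp) / 1e6, 1)

print(checks)
```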

We have still not fully caught up with the MDIC maritime data, so we can say that there are missing volumes in the BoL, especially in 2019 and 2022.

We then look at specific products.

Total product volumes

The results show that the trade data and the SEI-PCS results are the same at the product level, with some cases where MDIC is up to 20% lower than the trade data that we have. Soybean oil in 2020 and 2022 shows the largest discrepancies when compared to MDIC.

Estimate of Domestic Consumption

Then we look at the size of the domestic consumption as estimated through the LP step.

## # A tibble: 19 × 2
##     year    tonnes
##    <dbl>     <dbl>
##  1  2004 13288080.
##  2  2005 11756972.
##  3  2006 13148043.
##  4  2007 19520685.
##  5  2008 20892949.
##  6  2009 14815715.
##  7  2010 24219764.
##  8  2011 25511773.
##  9  2012 17036062.
## 10  2013 23922516.
## 11  2014 25828543.
## 12  2015 26304507.
## 13  2016 28797294.
## 14  2017 30765213.
## 15  2018 18369454.
## 16  2019 30090547.
## 17  2020 23893389.
## 18  2021 33177959.
## 19  2022 26967166.

There are many small flows, some of which might need anonymising.

Total from countries of destination

We then look at the breakdown per country of destination to see any major differences across years.

The above table highlights major differences in expected destination countries with MDIC in 2019-2022 (which is expected). Some key countries to pay close attention to are:

  • Singapore (trade might be going to China)
  • South Africa (not sure why; potentially going to China?)
  • Malaysia
  • Philippines
  • Colombia

These are only a few examples. The volumes going to China (Mainland) are actually quite close when comparing the CD/BoL/SEI-PCS results with MDIC, and the remaining difference could be made up by the Singapore/Hong Kong volumes.

Let’s now compare the trajectory of total trade to both the EU and China, paying close attention to the “switch” in data sources from 2017 to 2020-2022.

## # A tibble: 8 × 5
## # Groups:   year [4]
##    year economic_bloc sei_pcs_tot mdic_tot    diff
##   <dbl> <chr>               <dbl>    <dbl>   <dbl>
## 1  2022 CHINA                46.6     53.9  7.27  
## 2  2022 EU                   16.0     17.0  1.00  
## 3  2021 CHINA                55.4     60.9  5.56  
## 4  2021 EU                   17.2     16.9 -0.351 
## 5  2020 CHINA                57.4     60.9  3.48  
## 6  2020 EU                   16.9     17.0  0.0970
## 7  2019 CHINA                52.7     58.2  5.58  
## 8  2019 EU                   13.0     14.6  1.53

We see a growing deviation between our results and those reported by MDIC, especially for China, even when allocating Singaporean and Malaysian imports to China. This means that our “blind spot” is increasing over time, with about 7 Mtonnes missing from our results for China in 2022.
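To put the size of this blind spot in relative terms, the sketch below recomputes the China gap from the rounded figures printed in the table and expresses it as a share of the MDIC total. Because the inputs are the rounded Mtonnes values, the differences deviate slightly from the printed `diff` column; `pct_missing` is a name introduced here for illustration.

```r
# China rows copied from the table above (Mtonnes, rounded as printed).
china <- data.frame(
  year        = c(2022, 2021, 2020, 2019),
  sei_pcs_tot = c(46.6, 55.4, 57.4, 52.7),
  mdic_tot    = c(53.9, 60.9, 60.9, 58.2)
)

# Volume reported by MDIC but absent from the SEI-PCS results.
china$diff <- china$mdic_tot - china$sei_pcs_tot

# The same gap as a percentage of the MDIC total.
china$pct_missing <- round(100 * china$diff / china$mdic_tot, 1)

print(china)
```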

Starting in 2020, the BoL assign a large volume to “UNKNOWN COUNTRY EUROPEAN UNION”, which needs to be communicated somehow:

  • 2,258,583 tonnes in 2020
  • 1,153,303 tonnes in 2021
  • 4,333,727 tonnes in 2022

Differences in volumes exported by traders (exporter)

Now we turn to traders to see the evolution of the results from 2004 to 2022 and any major changes in the top traders (those trading 80% of volume).

## # A tibble: 19 × 3
##     year exporter_group      tons
##    <dbl> <chr>              <dbl>
##  1  2004 UNKNOWN FLOWS  10382300.
##  2  2005 UNKNOWN FLOWS   8928665.
##  3  2006 UNKNOWN FLOWS   4987124.
##  4  2007 UNKNOWN FLOWS   6651757.
##  5  2008 UNKNOWN FLOWS   5235957.
##  6  2009 UNKNOWN FLOWS   3661466.
##  7  2010 UNKNOWN FLOWS   3720854.
##  8  2011 UNKNOWN FLOWS   4387314.
##  9  2012 UNKNOWN FLOWS   4767949.
## 10  2013 UNKNOWN FLOWS   7448990.
## 11  2014 UNKNOWN FLOWS   6758166.
## 12  2015 UNKNOWN FLOWS   6334513.
## 13  2016 UNKNOWN FLOWS   7090174.
## 14  2017 UNKNOWN FLOWS   8350842.
## 15  2018 UNKNOWN FLOWS  18024458.
## 16  2019 UNKNOWN FLOWS  14073292.
## 17  2020 UNKNOWN FLOWS  17491650.
## 18  2021 UNKNOWN FLOWS  21139709.
## 19  2022 UNKNOWN FLOWS  21525751.
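The 80% cutoff used to define top traders can be sketched as a cumulative-share calculation. The exact rule used in the scripts is not shown in this report, so the version below is an assumption: traders are ranked by volume and kept until 80% of the total is reached, including the trader that crosses the threshold. The exporter names and tonnages are made up.

```r
# Hypothetical exporter-group volumes for a single year.
flows <- data.frame(
  exporter_group = c("A", "B", "C", "D", "E"),
  tons           = c(50, 25, 15, 7, 3),
  stringsAsFactors = FALSE
)

# Rank traders by volume and compute their cumulative market share.
flows <- flows[order(-flows$tons), ]
flows$share     <- flows$tons / sum(flows$tons)
flows$cum_share <- cumsum(flows$share)

# A trader is "top" if the share accumulated before it is still below 80%.
flows$top_trader <- (flows$cum_share - flows$share) < 0.80

top_traders <- flows$exporter_group[flows$top_trader]
print(top_traders)
```

For a multi-year table the same steps would be applied within each year, for example by splitting the data frame on `year` first.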

Let’s now check the change in each company’s market share over time.

The size of “UNKNOWN CUSTOMER” (Exporter) and “UNKNOWN” (Exporter group) keeps increasing, and these volumes are essentially all assigned to either “Unknown” or Branch 3.1.

Differences in volumes exported by traders (exporter) per logistics hub

Our method assigns different municipalities as logistics hubs (LH) from year to year. We track this here and notice a large increase in the number of LHs in 2019 and 2020. This is likely due to the matching/method.
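One way to track this is to count the distinct LH municipalities per year and list those that appear for the first time. The sketch below uses made-up hub identifiers and a hypothetical `logistics_hub_id` column; it is not the code used to produce the results in this report.

```r
# Hypothetical flow-level results: one row per flow, with its assigned LH.
results <- data.frame(
  year             = c(2018, 2018, 2018, 2019, 2019, 2019, 2019),
  logistics_hub_id = c("hub_a", "hub_b", "hub_a", "hub_a", "hub_b", "hub_c", "hub_d"),
  stringsAsFactors = FALSE
)

# Number of distinct logistics hubs assigned in each year.
lh_per_year <- tapply(
  results$logistics_hub_id,
  results$year,
  function(x) length(unique(x))
)
print(lh_per_year)

# Hubs present in 2019 that were not assigned in 2018.
new_lh <- setdiff(
  results$logistics_hub_id[results$year == 2019],
  results$logistics_hub_id[results$year == 2018]
)
print(new_lh)
```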

The question now is which decision-tree branches are linked to these new LHs. We check that next.

The results are overwhelmingly in branch 3, which makes our method quite questionable (and perhaps makes the context more like Paraguay soy).

Let’s now compare to the overall results with no cutoff and a focus on large traders.

Focus on importers

We know that the BoL do not have as much information on importers as we hoped for 2019 and onwards. Let’s first compare the SEI-PCS results with the original BoL data.

## # A tibble: 1,828 × 5
## # Groups:   year [4]
##     year importer.label      tonnes   tot_exp   pct
##    <dbl> <chr>                <dbl>     <dbl> <dbl>
##  1  2020 UNKNOWN CUSTOMER 76716156.  97835599    78
##  2  2021 UNKNOWN CUSTOMER 78498778. 101621194    77
##  3  2022 UNKNOWN CUSTOMER 71850922.  93733759    77
##  4  2019 UNKNOWN CUSTOMER 59552624.  83962027    71
##  5  2020 BUNGE             5453884.  97835599     6
##  6  2019 BUNGE             3815371.  83962027     5
##  7  2021 BUNGE             5005230. 101621194     5
##  8  2022 BUNGE             5153794.  93733759     5
##  9  2019 CARGILL           1320313.  83962027     2
## 10  2019 COFCO             1566624.  83962027     2
## # ℹ 1,818 more rows

We note that the share of volume assigned to “UNKNOWN CUSTOMER” is:

  • 77% in 2022
  • 77% in 2021
  • 78% in 2020
  • 71% in 2019

This is unfortunately the nature of the data and we cannot do anything about it.
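The percentages above follow directly from the `tonnes` and `tot_exp` columns of the importer table. A short sketch that re-enters the “UNKNOWN CUSTOMER” rows by hand:

```r
# "UNKNOWN CUSTOMER" rows copied from the importer table above (tonnes).
unknown <- data.frame(
  year    = c(2020, 2021, 2022, 2019),
  tonnes  = c(76716156, 78498778, 71850922, 59552624),
  tot_exp = c(97835599, 101621194, 93733759, 83962027)
)

# Share of each year's exports with no identified importer, in percent.
unknown$pct <- round(100 * unknown$tonnes / unknown$tot_exp)

print(unknown[order(-unknown$year), ])
```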

Commodity ratios

For v2.6.1 we also need to check that the commodity ratios have been applied properly.

These are:

## # A tibble: 0 × 12
## # ℹ 12 variables: year <dbl>, product_type <chr>,
## #   municipality_of_production_trase_id <chr>, tons_prod <dbl>, tons <dbl>,
## #   tons_demand <dbl>, tons_demand_tot <dbl>, ratio <dbl>, balance <dbl>,
## #   test_balance <chr>, test_ratio_cake <chr>, test_ratio_oil <chr>

The check above returns zero rows, so the commodity ratios are respected, as is the production constraint (export + domestic consumption <= production).
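A check of this kind can be sketched as a filter that keeps only the rows violating the production constraint, so that an empty result means the constraint holds everywhere. The data, the `tons_export` and `tons_domestic` column names, and the tolerance are assumptions made for illustration; only `tons_prod`, `balance` and `test_balance` echo column names from the output above, and their exact definitions in the scripts are not shown here.

```r
# Hypothetical municipality-level balance for one product and year.
balance <- data.frame(
  municipality  = c("m1", "m2", "m3"),
  tons_prod     = c(1000, 500, 200),
  tons_export   = c(600, 300, 150),
  tons_domestic = c(400, 150, 50),
  stringsAsFactors = FALSE
)

# Small tolerance so floating-point noise is not reported as a violation.
tolerance <- 1e-6

# Production left over after exports and domestic consumption.
balance$balance <- balance$tons_prod - (balance$tons_export + balance$tons_domestic)
balance$test_balance <- ifelse(balance$balance >= -tolerance, "pass", "fail")

# Keep only failing rows; zero rows means the constraint is respected.
violations <- balance[balance$test_balance == "fail", ]
print(nrow(violations))
```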

Summary of key concerns