This document summarises and checks the results from Brazilian soy 2.6.1, which extends the time series to 2021 and 2022. The method for version 2.6 is based on bills of lading (BoL) rather than customs declarations (CD, used in 2004-2018).
We explore the following differences:
Then we carry out a more extensive QA check:
Additional note: in May 2024, Harry added more CNPJs to the RFB dataset, taken from SICARM, to try to capture more CNPJs in branch 1. This .Rmd runs the scripts with these results.
## [1] "just downloaded CD for 2004"
## [1] "just downloaded CD for 2005"
## [1] "just downloaded CD for 2006"
## [1] "just downloaded CD for 2007"
## [1] "just downloaded CD for 2008"
## [1] "just downloaded CD for 2009"
## [1] "just downloaded CD for 2010"
## [1] "just downloaded CD for 2011"
## [1] "just downloaded CD for 2012"
## [1] "just downloaded CD for 2013"
## [1] "just downloaded CD for 2014"
## [1] "just downloaded CD for 2015"
## [1] "just downloaded CD for 2016"
## [1] "just downloaded CD for 2017"
## [1] "just downloaded MDIC for 2004"
## [1] "just downloaded MDIC for 2005"
## [1] "just downloaded MDIC for 2006"
## [1] "just downloaded MDIC for 2007"
## [1] "just downloaded MDIC for 2008"
## [1] "just downloaded MDIC for 2009"
## [1] "just downloaded MDIC for 2010"
## [1] "just downloaded MDIC for 2011"
## [1] "just downloaded MDIC for 2012"
## [1] "just downloaded MDIC for 2013"
## [1] "just downloaded MDIC for 2014"
## [1] "just downloaded MDIC for 2015"
## [1] "just downloaded MDIC for 2016"
## [1] "just downloaded MDIC for 2017"
## [1] "just downloaded MDIC for 2018"
## [1] "just downloaded MDIC for 2019"
## [1] "just downloaded MDIC for 2020"
## [1] "just downloaded MDIC for 2021"
## [1] "just downloaded MDIC for 2022"
We compare total volume exported from Brazil from the input data (CD and BoL) and Trase results in 2004-2022:
Check volume differences between original trade data and the SEI-PCS results: total, per product, per country.
Note: the MDIC data uses the “VIA” field to designate the type of export that took place:

CO_VIA;“NO_VIA”
99;“VIA DESCONHECIDA”
13;“POR REBOQUE”
11;“COURIER”
15;“VICINAL FRONTEIRICO”
14;“DUTOS”
12;“EM MAOS”
00;“VIA NAO DECLARADA”
01;“MARITIMA”
02;“FLUVIAL”
03;“LACUSTRE”
04;“AEREA”
05;“POSTAL”
06;“FERROVIARIA”
07;“RODOVIARIA”
08;“CONDUTO/REDE DE TRANSMISSAO”
09;“MEIOS PROPRIOS”
10;“ENTRADA/SAIDA FICTA”

So maritime shipments are VIA == “01”.
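The maritime/non-maritime split implied by the VIA code can be sketched as follows. This is a minimal illustration, not the script behind this report: the record layout, field names (`year`, `via`, `tonnes`) and tonnages are hypothetical. The one detail taken from the lookup above is that maritime shipments carry the code “01”, which has to be compared as a string so that the leading zero is preserved.

```python
# Hypothetical MDIC-style records; field names and tonnages are illustrative only.
records = [
    {"year": 2019, "via": "01", "tonnes": 1000.0},  # MARITIMA
    {"year": 2019, "via": "07", "tonnes": 50.0},    # RODOVIARIA
    {"year": 2019, "via": "02", "tonnes": 25.0},    # FLUVIAL
    {"year": 2020, "via": "01", "tonnes": 1200.0},  # MARITIMA
]

MARITIME_VIA = "01"  # kept as a string: an integer 1 would lose the leading zero


def split_by_mode(rows):
    """Sum tonnes per year, separately for maritime and non-maritime shipments."""
    totals = {}
    for row in rows:
        mode = "maritime" if row["via"] == MARITIME_VIA else "non_maritime"
        year_totals = totals.setdefault(
            row["year"], {"maritime": 0.0, "non_maritime": 0.0}
        )
        year_totals[mode] += row["tonnes"]
    return totals


totals_by_mode = split_by_mode(records)
for year in sorted(totals_by_mode):
    print(year, totals_by_mode[year])
```

Note that under this rule river (“02”) and lake (“03”) shipments count as non-maritime.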
The above table shows that the input trade data is equal to the SEI-PCS output in all years. There are only minor differences in the MDIC-port data in 2019 and 2022.
We also look at FOB
We also look at maritime and non-maritime trade separately, especially in 2019 and 2022, to see how closely we can match the BoL to MDIC port. The number of non-maritime shipments is quite small between 2019 and 2022:
By removing the non-maritime trade, we can do a 1:1 comparison between the 2019-2022 SEI-PCS results and MDIC. As shown below, we are still short of MDIC by roughly 3-8 Mtonnes per year (ratios of 0.92-0.97).
##   year   tot_exp sei_pcs_tot_exp mdic_tot_exp_m ratio_sei_pcs_trade ratio_sei_pcs_mdic_m
## 1 2022  93733759        93733865      101951478                   1                 0.92
## 2 2021 101621194       101621220      104958244                   1                 0.97
## 3 2020  97835599        97927560      100778445                   1                 0.97
## 4 2019  83962027        84226282       89777197                   1                 0.94
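As a cross-check on the table above, the ratio and the absolute gap to MDIC maritime exports can be recomputed directly from the printed totals. The sketch below uses only the `sei_pcs_tot_exp` and `mdic_tot_exp_m` values shown in the table; the variable and function names are illustrative.

```python
# (year, sei_pcs_tot_exp, mdic_tot_exp_m) in tonnes, copied from the table above.
rows = [
    (2022, 93733865, 101951478),
    (2021, 101621220, 104958244),
    (2020, 97927560, 100778445),
    (2019, 84226282, 89777197),
]


def compare_to_mdic(table):
    """Return {year: (SEI-PCS/MDIC ratio, gap in Mtonnes)}."""
    out = {}
    for year, sei_pcs, mdic in table:
        ratio = round(sei_pcs / mdic, 2)
        gap_mt = round((mdic - sei_pcs) / 1e6, 1)
        out[year] = (ratio, gap_mt)
    return out


comparison = compare_to_mdic(rows)
for year in sorted(comparison):
    ratio, gap_mt = comparison[year]
    print(f"{year}: ratio={ratio:.2f}, missing={gap_mt:.1f} Mt")
```

The ratios reproduce the `ratio_sei_pcs_mdic_m` column, and the gaps range from about 2.9 Mtonnes (2020) to about 8.2 Mtonnes (2022).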
We are still not fully caught up to the MDIC maritime data, so we can say that there are missing volumes in the BoL, especially in 2019 and 2022.
We then look at specific products.
The results show that the SEI-PCS results match the trade data at the product level, with some cases where MDIC is up to 20% lower than the trade data that we have. Soybean oil in 2020 and 2022 shows the largest gaps when compared to MDIC.
Then we look at the size of the domestic consumption as estimated through the LP step.
## # A tibble: 19 × 2
## year tonnes
## <dbl> <dbl>
## 1 2004 13288080.
## 2 2005 11756972.
## 3 2006 13148043.
## 4 2007 19520685.
## 5 2008 20892949.
## 6 2009 14815715.
## 7 2010 24219764.
## 8 2011 25511773.
## 9 2012 17036062.
## 10 2013 23922516.
## 11 2014 25828543.
## 12 2015 26304507.
## 13 2016 28797294.
## 14 2017 30765213.
## 15 2018 18369454.
## 16 2019 30090547.
## 17 2020 23893389.
## 18 2021 33177959.
## 19 2022 26967166.
There are many small flows, some of which might need anonymising.
We then look at the breakdown per country of destination to see any major differences across years.
The above table highlights major differences from MDIC in destination countries in 2019-2022, which is expected. Some key countries to pay close attention to are:
To name a few. We notice that the volumes going to China (Mainland) are actually quite close when comparing CD/BoL/SEI-PCS results with MDIC, and the remaining difference could be explained by the Singapore/Hong Kong volumes.
Let’s now compare the trajectory of total trade to both the EU and China, paying close attention to the “switch” in data sources from 2017 to 2020-2022.
## # A tibble: 8 × 5
## # Groups: year [4]
## year economic_bloc sei_pcs_tot mdic_tot diff
## <dbl> <chr> <dbl> <dbl> <dbl>
## 1 2022 CHINA 46.6 53.9 7.27
## 2 2022 EU 16.0 17.0 1.00
## 3 2021 CHINA 55.4 60.9 5.56
## 4 2021 EU 17.2 16.9 -0.351
## 5 2020 CHINA 57.4 60.9 3.48
## 6 2020 EU 16.9 17.0 0.0970
## 7 2019 CHINA 52.7 58.2 5.58
## 8 2019 EU 13.0 14.6 1.53
We see a growing deviation between our results and those reported by MDIC, especially for China, even when allocating Singapore and Malaysian imports to China. This means that our “blind spot” is increasing over time, with about 7 Mtonnes missing from our results for China in 2022.
Starting in 2020, the BoL have a large amount of volume assigned to “UNKNOWN COUNTRY EUROPEAN UNION”, which needs to be communicated somehow:
Now we turn to traders to see the evolution of the results from 2004-2022 and any major changes in the top traders (those trading 80% of volume).
## # A tibble: 19 × 3
## year exporter_group tons
## <dbl> <chr> <dbl>
## 1 2004 UNKNOWN FLOWS 10382300.
## 2 2005 UNKNOWN FLOWS 8928665.
## 3 2006 UNKNOWN FLOWS 4987124.
## 4 2007 UNKNOWN FLOWS 6651757.
## 5 2008 UNKNOWN FLOWS 5235957.
## 6 2009 UNKNOWN FLOWS 3661466.
## 7 2010 UNKNOWN FLOWS 3720854.
## 8 2011 UNKNOWN FLOWS 4387314.
## 9 2012 UNKNOWN FLOWS 4767949.
## 10 2013 UNKNOWN FLOWS 7448990.
## 11 2014 UNKNOWN FLOWS 6758166.
## 12 2015 UNKNOWN FLOWS 6334513.
## 13 2016 UNKNOWN FLOWS 7090174.
## 14 2017 UNKNOWN FLOWS 8350842.
## 15 2018 UNKNOWN FLOWS 18024458.
## 16 2019 UNKNOWN FLOWS 14073292.
## 17 2020 UNKNOWN FLOWS 17491650.
## 18 2021 UNKNOWN FLOWS 21139709.
## 19 2022 UNKNOWN FLOWS 21525751.
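The “top traders” cutoff mentioned above (those trading 80% of volume) can be sketched as a cumulative-share selection. This is an assumption about how the cutoff is applied, since the exact rule is not stated here: traders are ranked by volume and kept until their cumulative share first reaches the cutoff. The trader names and volumes are hypothetical.

```python
# Hypothetical exporter-group volumes for one year (tonnes); not real results.
volumes = {
    "TRADER A": 40,
    "TRADER B": 25,
    "TRADER C": 20,
    "TRADER D": 10,
    "TRADER E": 5,
}


def top_traders(volume_by_trader, cutoff=0.8):
    """Rank traders by volume and keep them until the cumulative share reaches the cutoff."""
    total = sum(volume_by_trader.values())
    ranked = sorted(volume_by_trader.items(), key=lambda kv: kv[1], reverse=True)
    selected = []
    cumulative = 0
    for trader, tonnes in ranked:
        selected.append(trader)
        cumulative += tonnes
        if cumulative / total >= cutoff:
            break
    return selected


top = top_traders(volumes)
print(top)
```

With this rule the trader that crosses the threshold is included, so the selected group covers at least 80% of volume rather than at most 80%.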
Let’s now check the change in market share of each company over time.
The size of the “UNKNOWN CUSTOMER” (Exporter) and “UNKNOWN” (Exporter group) categories keeps increasing, and this volume is basically assigned either to “Unknown” or to Branch 3.1.
Our method will assign different municipalities as logistics hubs (LH) year-on-year. We track this here and notice a large increase in the number of LH in 2019 and 2020. This is likely due to the matching/method.
The question now is which decision tree branches are linked to these new LH. We check that next.
The results are overwhelmingly in branch 3, which makes our method quite questionable (and perhaps makes the context more like Paraguay soy).
Let’s now compare to the overall results with no cutoff and a focus on large traders.
We know that the BoL do not have as much information on importers as we hoped for 2019 and onwards. Let’s first compare the SEI-PCS results with the original BoL data.
## # A tibble: 1,828 × 5
## # Groups: year [4]
## year importer.label tonnes tot_exp pct
## <dbl> <chr> <dbl> <dbl> <dbl>
## 1 2020 UNKNOWN CUSTOMER 76716156. 97835599 78
## 2 2021 UNKNOWN CUSTOMER 78498778. 101621194 77
## 3 2022 UNKNOWN CUSTOMER 71850922. 93733759 77
## 4 2019 UNKNOWN CUSTOMER 59552624. 83962027 71
## 5 2020 BUNGE 5453884. 97835599 6
## 6 2019 BUNGE 3815371. 83962027 5
## 7 2021 BUNGE 5005230. 101621194 5
## 8 2022 BUNGE 5153794. 93733759 5
## 9 2019 CARGILL 1320313. 83962027 2
## 10 2019 COFCO 1566624. 83962027 2
## # ℹ 1,818 more rows
We note that the volume of “Unknown customers” is:
This is unfortunately the nature of the data and we cannot do anything about it.
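For reference, the `pct` column in the table above is the share of each year's total exports attributed to a given importer label. The sketch below recomputes it for “UNKNOWN CUSTOMER” from the printed `tonnes` and `tot_exp` values; rounding to whole percent is an assumption based on how the column is displayed.

```python
# (year, UNKNOWN CUSTOMER tonnes, tot_exp) copied from the table above.
unknown_rows = [
    (2019, 59552624, 83962027),
    (2020, 76716156, 97835599),
    (2021, 78498778, 101621194),
    (2022, 71850922, 93733759),
]


def unknown_share(table):
    """Return {year: percentage of total exports with an unknown importer}."""
    return {year: round(100 * tonnes / total) for year, tonnes, total in table}


shares = unknown_share(unknown_rows)
for year in sorted(shares):
    print(f"{year}: {shares[year]}% of exported volume has no identified importer")
```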
For v.2.6.1 we also need to check that commodity ratios have been properly applied.
These are:
## # A tibble: 0 × 12
## # ℹ 12 variables: year <dbl>, product_type <chr>,
## # municipality_of_production_trase_id <chr>, tons_prod <dbl>, tons <dbl>,
## # tons_demand <dbl>, tons_demand_tot <dbl>, ratio <dbl>, balance <dbl>,
## # test_balance <chr>, test_ratio_cake <chr>, test_ratio_oil <chr>
From the above checks (the filter for failing rows returns an empty table), both the commodity ratios and the production constraint (export + domestic consumption <= production) are respected.
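The production constraint can be illustrated with a minimal sketch that returns only the rows violating it, so that an empty result means the check passes. The municipality identifiers, field names and tonnages are hypothetical, and the small tolerance for floating-point error is an assumption rather than something taken from the scripts behind this report.

```python
# Hypothetical per-municipality balances (tonnes); identifiers and values are illustrative.
balances = [
    {"municipality": "MUN-A", "production": 1000.0, "export": 600.0, "domestic": 400.0},
    {"municipality": "MUN-B", "production": 500.0, "export": 200.0, "domestic": 250.0},
    {"municipality": "MUN-C", "production": 300.0, "export": 250.0, "domestic": 80.0},
]

TOLERANCE = 1e-6  # assumed slack for floating-point rounding


def constraint_violations(rows, tolerance=TOLERANCE):
    """Return rows where export + domestic consumption exceeds production."""
    return [
        row
        for row in rows
        if row["export"] + row["domestic"] > row["production"] + tolerance
    ]


violations = constraint_violations(balances)
print([row["municipality"] for row in violations])
```

In this made-up example MUN-C fails the constraint (250 + 80 > 300), whereas the check reported above returned no rows.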