A new method is being developed to derive the Brazilian soy supply chain for years 2019 and beyond. This new method is based on the use of bills of lading (BoL) rather than customs declarations (CD, used in 2004-2018). In order to test the potential differences between the two methods, we derived 2 sets of results for the year 2017 for which we have both the CD and BoL approaches. This report explores the differences in the results as a validation step for the method.
We explore the following differences:
Then we carry out a more extensive QA check:
(start with these and then add more analysis related to the model results)
## [1] "just downloaded CD for 2004"
## [1] "just downloaded CD for 2005"
## [1] "just downloaded CD for 2006"
## [1] "just downloaded CD for 2007"
## [1] "just downloaded CD for 2008"
## [1] "just downloaded CD for 2009"
## [1] "just downloaded CD for 2010"
## [1] "just downloaded CD for 2011"
## [1] "just downloaded CD for 2012"
## [1] "just downloaded CD for 2013"
## [1] "just downloaded CD for 2014"
## [1] "just downloaded CD for 2015"
## [1] "just downloaded CD for 2016"
## [1] "just downloaded CD for 2017"
## [1] "just downloaded MDIC for 2004"
## [1] "just downloaded MDIC for 2005"
## [1] "just downloaded MDIC for 2006"
## [1] "just downloaded MDIC for 2007"
## [1] "just downloaded MDIC for 2008"
## [1] "just downloaded MDIC for 2009"
## [1] "just downloaded MDIC for 2010"
## [1] "just downloaded MDIC for 2011"
## [1] "just downloaded MDIC for 2012"
## [1] "just downloaded MDIC for 2013"
## [1] "just downloaded MDIC for 2014"
## [1] "just downloaded MDIC for 2015"
## [1] "just downloaded MDIC for 2016"
## [1] "just downloaded MDIC for 2017"
## [1] "just downloaded MDIC for 2018"
## [1] "just downloaded MDIC for 2019"
## [1] "just downloaded MDIC for 2020"
We compare total volume exported from Brazil from the input data (CD and BoL) and Trase results in 2004-2020:
Check volume differences between original trade data and the SEI-PCS results: total, per product, per country.
Note: the MDIC data uses the “VIA” field to designate the type of export that took place:
CO_VIA;“NO_VIA” 99;“VIA DESCONHECIDA” 13;“POR REBOQUE” 11;“COURIER” 15;“VICINAL FRONTEIRICO” 14;“DUTOS” 12;“EM MAOS” 00;“VIA NAO DECLARADA” 01;“MARITIMA” 02;“FLUVIAL” 03;“LACUSTRE” 04;“AEREA” 05;“POSTAL” 06;“FERROVIARIA” 07;“RODOVIARIA” 08;“CONDUTO/REDE DE TRANSMISSAO” 09;“MEIOS PROPRIOS” 10;“ENTRADA/SAIDA FICTA”
So maritime shipments are VIA == “01”
The above table shows that the input trade data is equal to the SEI-PCS output in all years. There is only a minor differences in the MDIC-port data in 2019.
We also look at FOB
reactable(tot_exp_compare_fob, pagination = TRUE, sortable = TRUE, filterable = TRUE, outlined = TRUE)
We also look at maritime and non-maritime trade separately, especially in 2019 and 2020 to see how close we can match the BoL to MDIC port. The number of non maritime shipments are quite small in 2019 and 2020:
By removing the non-maritime trade, we can do a 1:1 comparison between the 2019/2020 SEI-PCS results and MDIC. As shown below we are still slightly off by 2-5 Mtonnes.
## year tot_exp sei_pcs_tot_exp mdic_tot_exp_m ratio_sei_pcs_trade
## 1 2019 83962027 83995510 89777197 1
## 2 2020 97835599 97838045 100778445 1
## ratio_sei_pcs_mdic_m
## 1 0.9
## 2 1.0
We then look at specific products.
The results show that all results are the same at the product level with some cases where MDIC is up to 20% lower than the trade data that we have.
Then we look at the size of the domestic consumption as estimated through the LP step.
## # A tibble: 17 × 2
## year tonnes
## <dbl> <dbl>
## 1 2004 13288324.
## 2 2005 11757259.
## 3 2006 13148152.
## 4 2007 19520921.
## 5 2008 20893072.
## 6 2009 14815789.
## 7 2010 24219811.
## 8 2011 25511858.
## 9 2012 17036172.
## 10 2013 23922586.
## 11 2014 25828618.
## 12 2015 26304514.
## 13 2016 28797296.
## 14 2017 30765239.
## 15 2018 18369539.
## 16 2019 30076054.
## 17 2020 23870489.
Sep 16 2022: The domestic consumption was run with
values of 28 Mtonnes (2019) and 24 Mtonnes (2020). There are 5 flows
with domestic consumption < 1 tonne and then 430 that are < 100
tonnes. We need to discuss internally and then perhaps choose to
anonymize flows that are less than a threshold (say one truck’s worth or
4 tonnes).
Oct 19 2022: domestic consumption is now 30 Mtonnes in 2019 and remains 24 Mtonnes in 2020.
Nov 7 2022: domestic consumption is 2 tonnes in 2016 and 7 tonnes in 2015 for Unknown biome
We then look at the breakdown per country of destination to see any major differences across years.
The above table highlights major differences in expected destination countries with MDIC mostly in 2019 and 2020 (which is expected). Some key countries to pay close attention to are:
To name a few. We notice that the volumes going to China are actually quite close for China (Mainland) when comparing CD/BoL/SEI-PCS results with MDIC, but the difference could be made up with the Singapore/Hong Kong volumes.
Let’s now compare the trajectory of total trade to both the EU and China, paying close attention to the “switch” in data sources from 2017 to 2020.
## # A tibble: 4 × 5
## # Groups: year [2]
## year economic_bloc sei_pcs_tot mdic_tot diff
## <chr> <chr> <dbl> <dbl> <dbl>
## 1 2019 CHINA 47.9 58.2 10.4
## 2 2019 EU 12.7 14.6 1.88
## 3 2020 CHINA 53.2 60.8 7.68
## 4 2020 EU 14.2 17.0 2.80
Now we turn to traders to see the evolution of the results from 2004-2020 and any major change in the top traders (those trading 80% of volume)
## Warning: The `x` argument of `as_tibble()` can't be missing as of tibble 3.0.0.
## # A tibble: 17 × 3
## year exporter_group tons
## <dbl> <chr> <dbl>
## 1 2004 UNKNOWN FLOWS 10382056.
## 2 2005 UNKNOWN FLOWS 8928379.
## 3 2006 UNKNOWN FLOWS 4987015.
## 4 2007 UNKNOWN FLOWS 6651521.
## 5 2008 UNKNOWN FLOWS 5235834.
## 6 2009 UNKNOWN FLOWS 3661392.
## 7 2010 UNKNOWN FLOWS 3720807.
## 8 2011 UNKNOWN FLOWS 4387229.
## 9 2012 UNKNOWN FLOWS 4767839.
## 10 2013 UNKNOWN FLOWS 7448920.
## 11 2014 UNKNOWN FLOWS 6758091.
## 12 2015 UNKNOWN FLOWS 6334506.
## 13 2016 UNKNOWN FLOWS 7090172.
## 14 2017 UNKNOWN FLOWS 8350816.
## 15 2018 UNKNOWN FLOWS 18024373.
## 16 2019 UNKNOWN FLOWS 13923781.
## 17 2020 UNKNOWN FLOWS 17554132.
Let’s how check the change in market share over time of each company
By looking at the bigger export companies, we see that some are clearly
missing some export volumes:
We will need to rerun this analysis once the trader names have been cleaned. After checking the original BoL in 2019/2020 these company names appear with a CNPJ, but somehow the output did not assign a trader id (likely due to synonyms).
Sep 16 2022: There are still some minor fixes to apply to a few company names Oct 19 2022: All fixed
Our method will assign different municipalities as logistics hub (LH) year-on-year. We track this here and notice a large increase in the number of LH in 2019 and 2020. This is likely due to the matching/method.
The question now is what are the decision tree branches that are linked to these new LH. We check that next
The above might be fixed with the new errors and bugs found in CNPJ. We
will need to revisit this.
We know that the BoL do not have as much information on importers as we hoped for 2019 and 2020. Let’s first compare the SEI-PCS results with the original BoL data.
## # A tibble: 835 × 5
## # Groups: year [2]
## year importer.label tonnes tot_exp pct
## <chr> <chr> <dbl> <dbl> <dbl>
## 1 2020 UNKNOWN CUSTOMER 76716156. 97835599 78
## 2 2019 UNKNOWN CUSTOMER 59552624. 83962027 71
## 3 2020 BUNGE 5453884. 97835599 6
## 4 2019 BUNGE 3815371. 83962027 5
## 5 2019 CARGILL 1320313. 83962027 2
## 6 2019 COFCO 1566624. 83962027 2
## 7 2019 ADM ARCHER DANIELS MIDLAND 703445. 83962027 1
## 8 2019 AMAGGI INTERNATIONAL LTD 551499. 83962027 1
## 9 2019 BANGKOK PRODUCE MERCHANDISING 551596. 83962027 1
## 10 2019 CEFETRA BV 489608. 83962027 1
## # … with 825 more rows
## # ℹ Use `print(n = ...)` to see more rows
We note that the volume of “Unknown customers” is 78% in 2020 and 71% in 2019 in the BoL but only 20% or so in the SEI-PCS results. This is unfortunately the nature of the data and we cannot do anything about it.
Other checks needed on 2.6.0:
We then compare the two sets of results for the 2004-2018 period to see the kinds of differences the users will run into. The analysis includes: + check the total exports and per HS code export in each year + check differences in logistics hub (there should be no difference) + check volume differences per branch (there should be no difference) + check total export by main trader and country (no differences expected) + check differences in municipalities of production (we expect differences here from the ) + look at effects on EU and China imports, look at French imports
There are no differences between 2.5.1 and 2.6.0 on total exports and product-specific exports (PASS check). We then check volume differences per SEI-PCS branch.
Report per branch:
So we check specific LH/branches where the volumes are different.
A few high level observations:
Oct 19 2022: Some changes were implemented to relax the restrictions of branch 3.
Let’s now check company specific sourcing to LH and then to municipalities of production.
From a first check of the results, we see an increasing deviation of volume being replicated for the Trader-LH in the later years.
We check the big traders to see how different each trader-LH is either the same or different between v.2.5 and 2.6:
We can then check how these volumes change per branch. We expect that branches 3 and above would be selecting different LH.
We check the big traders to see how different each trader-LH (per branch) is either the same or different between v.2.5 and 2.6:
The majorty of the volume is being sourced in different municipalities
from the companies.
For v.2.6.0 we also need to check that commodity ratios have properly been applied.
These are:
## # A tibble: 0 × 12
## # … with 12 variables: year <dbl>, product_type <chr>,
## # region_production_1_trase_id <chr>, tons_prod <dbl>, tons <dbl>,
## # tons_demand <dbl>, tons_demand_tot <dbl>, ratio <dbl>, balance <dbl>,
## # test_balance <chr>, test_ratio_cake <chr>, test_ratio_oil <chr>
## # ℹ Use `colnames()` to see all variable names
From the above checks, the commodity ratios are respected and the production constraint (export + domestic consumption <= production) as well.