Introduction

A new method is being developed to derive the Brazilian soy supply chain for years 2019 and beyond. This new method is based on the use of bills of lading (BoL) rather than customs declarations (CD, used in 2004-2018). In order to test the potential differences between the two methods, we derived 2 sets of results for the year 2017 for which we have both the CD and BoL approaches. This report explores the differences in the results as a validation step for the method.

We explore the following differences:

Then we carry out a more extensive QA check:

(start with these and then add more analysis related to the model results)

## [1] "just downloaded CD for 2004"
## [1] "just downloaded CD for 2005"
## [1] "just downloaded CD for 2006"
## [1] "just downloaded CD for 2007"
## [1] "just downloaded CD for 2008"
## [1] "just downloaded CD for 2009"
## [1] "just downloaded CD for 2010"
## [1] "just downloaded CD for 2011"
## [1] "just downloaded CD for 2012"
## [1] "just downloaded CD for 2013"
## [1] "just downloaded CD for 2014"
## [1] "just downloaded CD for 2015"
## [1] "just downloaded CD for 2016"
## [1] "just downloaded CD for 2017"
## [1] "just downloaded MDIC for 2004"
## [1] "just downloaded MDIC for 2005"
## [1] "just downloaded MDIC for 2006"
## [1] "just downloaded MDIC for 2007"
## [1] "just downloaded MDIC for 2008"
## [1] "just downloaded MDIC for 2009"
## [1] "just downloaded MDIC for 2010"
## [1] "just downloaded MDIC for 2011"
## [1] "just downloaded MDIC for 2012"
## [1] "just downloaded MDIC for 2013"
## [1] "just downloaded MDIC for 2014"
## [1] "just downloaded MDIC for 2015"
## [1] "just downloaded MDIC for 2016"
## [1] "just downloaded MDIC for 2017"
## [1] "just downloaded MDIC for 2018"
## [1] "just downloaded MDIC for 2019"
## [1] "just downloaded MDIC for 2020"

Focus on v.2.6 results

Differences in volumes of product(s) exported from Brazil

We compare total volume exported from Brazil from the input data (CD and BoL) and Trase results in 2004-2020:

Check volume differences between original trade data and the SEI-PCS results: total, per product, per country.

Note: the MDIC data uses the “VIA” field to designate the type of export that took place:

CO_VIA;“NO_VIA” 99;“VIA DESCONHECIDA” 13;“POR REBOQUE” 11;“COURIER” 15;“VICINAL FRONTEIRICO” 14;“DUTOS” 12;“EM MAOS” 00;“VIA NAO DECLARADA” 01;“MARITIMA” 02;“FLUVIAL” 03;“LACUSTRE” 04;“AEREA” 05;“POSTAL” 06;“FERROVIARIA” 07;“RODOVIARIA” 08;“CONDUTO/REDE DE TRANSMISSAO” 09;“MEIOS PROPRIOS” 10;“ENTRADA/SAIDA FICTA”

So maritime shipments are VIA == “01”

Total volumes

The above table shows that the input trade data is equal to the SEI-PCS output in all years. There is only a minor differences in the MDIC-port data in 2019.

We also look at FOB

reactable(tot_exp_compare_fob, pagination = TRUE, sortable = TRUE, filterable = TRUE, outlined = TRUE)

We also look at maritime and non-maritime trade separately, especially in 2019 and 2020 to see how close we can match the BoL to MDIC port. The number of non maritime shipments are quite small in 2019 and 2020:

  • 2.6 Mtonnes in 2019
  • 0.8 Mtonnes in 2020

By removing the non-maritime trade, we can do a 1:1 comparison between the 2019/2020 SEI-PCS results and MDIC. As shown below we are still slightly off by 2-5 Mtonnes.

##   year  tot_exp sei_pcs_tot_exp mdic_tot_exp_m ratio_sei_pcs_trade
## 1 2019 83962027        83995510       89777197                   1
## 2 2020 97835599        97838045      100778445                   1
##   ratio_sei_pcs_mdic_m
## 1                  0.9
## 2                  1.0

We then look at specific products.

Total product volumes

The results show that all results are the same at the product level with some cases where MDIC is up to 20% lower than the trade data that we have.

Estimated of Domestic Consumption

Then we look at the size of the domestic consumption as estimated through the LP step.

## # A tibble: 17 × 2
##     year    tonnes
##    <dbl>     <dbl>
##  1  2004 13288324.
##  2  2005 11757259.
##  3  2006 13148152.
##  4  2007 19520921.
##  5  2008 20893072.
##  6  2009 14815789.
##  7  2010 24219811.
##  8  2011 25511858.
##  9  2012 17036172.
## 10  2013 23922586.
## 11  2014 25828618.
## 12  2015 26304514.
## 13  2016 28797296.
## 14  2017 30765239.
## 15  2018 18369539.
## 16  2019 30076054.
## 17  2020 23870489.

Sep 16 2022: The domestic consumption was run with values of 28 Mtonnes (2019) and 24 Mtonnes (2020). There are 5 flows with domestic consumption < 1 tonne and then 430 that are < 100 tonnes. We need to discuss internally and then perhaps choose to anonymize flows that are less than a threshold (say one truck’s worth or 4 tonnes).

Oct 19 2022: domestic consumption is now 30 Mtonnes in 2019 and remains 24 Mtonnes in 2020.

Nov 7 2022: domestic consumption is 2 tonnes in 2016 and 7 tonnes in 2015 for Unknown biome

Total from countries of destination

We then look at the breakdown per country of destination to see any major differences across years.

The above table highlights major differences in expected destination countries with MDIC mostly in 2019 and 2020 (which is expected). Some key countries to pay close attention to are:

  • Singapore (trade might be going to China)
  • South Africa (not sure why; potentially going to China?)
  • Malaysia
  • Philippines
  • Colombia

To name a few. We notice that the volumes going to China are actually quite close for China (Mainland) when comparing CD/BoL/SEI-PCS results with MDIC, but the difference could be made up with the Singapore/Hong Kong volumes.

Let’s now compare the trajectory of total trade to both the EU and China, paying close attention to the “switch” in data sources from 2017 to 2020.

## # A tibble: 4 × 5
## # Groups:   year [2]
##   year  economic_bloc sei_pcs_tot mdic_tot  diff
##   <chr> <chr>               <dbl>    <dbl> <dbl>
## 1 2019  CHINA                47.9     58.2 10.4 
## 2 2019  EU                   12.7     14.6  1.88
## 3 2020  CHINA                53.2     60.8  7.68
## 4 2020  EU                   14.2     17.0  2.80

Differences in volumes exported by traders (exporter)

Now we turn to traders to see the evolution of the results from 2004-2020 and any major change in the top traders (those trading 80% of volume)

## Warning: The `x` argument of `as_tibble()` can't be missing as of tibble 3.0.0.

## # A tibble: 17 × 3
##     year exporter_group      tons
##    <dbl> <chr>              <dbl>
##  1  2004 UNKNOWN FLOWS  10382056.
##  2  2005 UNKNOWN FLOWS   8928379.
##  3  2006 UNKNOWN FLOWS   4987015.
##  4  2007 UNKNOWN FLOWS   6651521.
##  5  2008 UNKNOWN FLOWS   5235834.
##  6  2009 UNKNOWN FLOWS   3661392.
##  7  2010 UNKNOWN FLOWS   3720807.
##  8  2011 UNKNOWN FLOWS   4387229.
##  9  2012 UNKNOWN FLOWS   4767839.
## 10  2013 UNKNOWN FLOWS   7448920.
## 11  2014 UNKNOWN FLOWS   6758091.
## 12  2015 UNKNOWN FLOWS   6334506.
## 13  2016 UNKNOWN FLOWS   7090172.
## 14  2017 UNKNOWN FLOWS   8350816.
## 15  2018 UNKNOWN FLOWS  18024373.
## 16  2019 UNKNOWN FLOWS  13923781.
## 17  2020 UNKNOWN FLOWS  17554132.

Let’s how check the change in market share over time of each company

By looking at the bigger export companies, we see that some are clearly missing some export volumes:

  • Louis Dreyfus exports 9.8 Mtonnes in 2018, but then < 1 Mtonnes in 2019/2020 (likely due to naming issues in the db) - Ok on Sep 16 2022
  • Cargill has a slight drop in market share but difficult to claim that it is due to missing data in the BoL
  • Same comment for Bunge - Ok on Sep 16 2022
  • Same comment for Granol that exported < 500,000 tonnes/y starting 2017, but there is a really small amount in 2020 (hard to tell) - Ok on Sep 16 2022
  • Cervejaria Petropolis has had a small market share anyway, so difficult to claim missing data - Ok on Sep 16 2022
  • Cofco has a very small market share in 2020, but there are some issue with the company names here (to fix in db as well) - Ok on Sep 16 2022
  • Bianchini, Engelhart, CHS have subsidiaries in 2019/2020 that do no t appear in the db - Ok on Oct 19 2022
  • Cutrale has a small market share which dropped in 2020 (< 300,000 tonnes in 2020)
  • Granol’s market share has been small since 2018, but difficult to state whether there are exports missing there
  • Mitsui & CO might be missing some market share starting in 2019 (it was quite small, but then it disappeared completely)

We will need to rerun this analysis once the trader names have been cleaned. After checking the original BoL in 2019/2020 these company names appear with a CNPJ, but somehow the output did not assign a trader id (likely due to synonyms).

Sep 16 2022: There are still some minor fixes to apply to a few company names Oct 19 2022: All fixed

Differences in volumes exported by traders (exporter) per logistics hub

Our method will assign different municipalities as logistics hub (LH) year-on-year. We track this here and notice a large increase in the number of LH in 2019 and 2020. This is likely due to the matching/method.

The question now is what are the decision tree branches that are linked to these new LH. We check that next

The above might be fixed with the new errors and bugs found in CNPJ. We will need to revisit this.

Focus on importers

We know that the BoL do not have as much information on importers as we hoped for 2019 and 2020. Let’s first compare the SEI-PCS results with the original BoL data.

## # A tibble: 835 × 5
## # Groups:   year [2]
##    year  importer.label                   tonnes  tot_exp   pct
##    <chr> <chr>                             <dbl>    <dbl> <dbl>
##  1 2020  UNKNOWN CUSTOMER              76716156. 97835599    78
##  2 2019  UNKNOWN CUSTOMER              59552624. 83962027    71
##  3 2020  BUNGE                          5453884. 97835599     6
##  4 2019  BUNGE                          3815371. 83962027     5
##  5 2019  CARGILL                        1320313. 83962027     2
##  6 2019  COFCO                          1566624. 83962027     2
##  7 2019  ADM ARCHER DANIELS MIDLAND      703445. 83962027     1
##  8 2019  AMAGGI INTERNATIONAL LTD        551499. 83962027     1
##  9 2019  BANGKOK PRODUCE MERCHANDISING   551596. 83962027     1
## 10 2019  CEFETRA BV                      489608. 83962027     1
## # … with 825 more rows
## # ℹ Use `print(n = ...)` to see more rows

We note that the volume of “Unknown customers” is 78% in 2020 and 71% in 2019 in the BoL but only 20% or so in the SEI-PCS results. This is unfortunately the nature of the data and we cannot do anything about it.

Other checks needed on 2.6.0:

  • check commodity factors are well applied

Compare v.2.6 to vs.2.5 (current version on the front end) - 2004-2018

We then compare the two sets of results for the 2004-2018 period to see the kinds of differences the users will run into. The analysis includes: + check the total exports and per HS code export in each year + check differences in logistics hub (there should be no difference) + check volume differences per branch (there should be no difference) + check total export by main trader and country (no differences expected) + check differences in municipalities of production (we expect differences here from the ) + look at effects on EU and China imports, look at French imports

There are no differences between 2.5.1 and 2.6.0 on total exports and product-specific exports (PASS check). We then check volume differences per SEI-PCS branch.

Report per branch:

So we check specific LH/branches where the volumes are different.

A few high level observations:

Oct 19 2022: Some changes were implemented to relax the restrictions of branch 3.

Let’s now check company specific sourcing to LH and then to municipalities of production.

From a first check of the results, we see an increasing deviation of volume being replicated for the Trader-LH in the later years.

We check the big traders to see how different each trader-LH is either the same or different between v.2.5 and 2.6:

We can then check how these volumes change per branch. We expect that branches 3 and above would be selecting different LH.

We check the big traders to see how different each trader-LH (per branch) is either the same or different between v.2.5 and 2.6:

The majorty of the volume is being sourced in different municipalities from the companies.

For v.2.6.0 we also need to check that commodity ratios have properly been applied.

These are:

## # A tibble: 0 × 12
## # … with 12 variables: year <dbl>, product_type <chr>,
## #   region_production_1_trase_id <chr>, tons_prod <dbl>, tons <dbl>,
## #   tons_demand <dbl>, tons_demand_tot <dbl>, ratio <dbl>, balance <dbl>,
## #   test_balance <chr>, test_ratio_cake <chr>, test_ratio_oil <chr>
## # ℹ Use `colnames()` to see all variable names

From the above checks, the commodity ratios are respected and the production constraint (export + domestic consumption <= production) as well.