Introduction

This document explores different options for representing IBGE’s soybean production across Brazil for the purposes of calculating ET. IBGE data can be notoriously unreliable in some municipalities, with reports of 1 ha of planted soy or very low yields. It is unclear how some of these specific cases compare to reality, so we may need to either (1) remove them, or (2) flag them as problematic and keep them out of the analysis.

The analysis goes as follows:

  1. Review the spread of IBGE data at municipality level (area and yield) for 2010-2017
  2. Compare results to Song et al (2020)
  3. Check micro-region aggregation
  4. Make a final decision on how to use the data

Review the spread of IBGE data at municipality level

We first check whether there are cases where IBGE reports a soy area of 0 ha but non-zero production.

## # A tibble: 0 × 11
## # … with 11 variables: TRASE_ID <chr>, year <chr>, NAME <chr>, STATE <chr>,
## #   MICROCOD <chr>, MESOCOD <chr>, tonnes <chr>, area_ha <chr>, yield <chr>,
## #   glad_ha <chr>, geometry <GEOMETRY>
## # ℹ Use `colnames()` to see all variable names

This never happens (the query returns no rows), which is good.
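The check above amounts to a simple filter. This is a minimal Python sketch, not the original R code: it assumes each record is a dict keyed by the column names in the printed tibble, with `tonnes` and `area_ha` already parsed to numbers (the tibble stores them as text) and `None` for missing values. The toy records are made up.

```python
def zero_area_with_production(records: list[dict]) -> list[dict]:
    """Return records that report 0 ha of soy but positive production."""
    # A missing (None) area is not equal to 0, so missing values are not matched here.
    return [
        r for r in records
        if r["area_ha"] == 0 and (r["tonnes"] or 0) > 0
    ]


# Toy records (made up), keyed like the tibble columns above
sample = [
    {"TRASE_ID": "toy-1", "year": "2015", "area_ha": 0.0, "tonnes": 0.0},
    {"TRASE_ID": "toy-2", "year": "2015", "area_ha": 1200.0, "tonnes": 3600.0},
]
print(len(zero_area_with_production(sample)))  # 0
```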

We then look at the cases where IBGE does not report an area at all, and compare them with Song et al (2020).

In these cases we would potentially have an estimate of soy deforestation in the municipality, but no meaningful benchmark for trade.
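To list these cases, records with no IBGE area can be split by whether the satellite-based map shows soy there. A minimal Python sketch, assuming (this is not confirmed by the text) that the `glad_ha` column holds the soy area mapped by Song et al (2020) and that a missing IBGE area is stored as `None`; the records are toy data.

```python
def split_missing_area(records: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split records without an IBGE area by whether glad_ha (assumed satellite soy area) is positive."""
    no_area = [r for r in records if r["area_ha"] is None]
    mapped = [r for r in no_area if (r["glad_ha"] or 0) > 0]
    unmapped = [r for r in no_area if not (r["glad_ha"] or 0) > 0]
    return mapped, unmapped


sample = [
    {"TRASE_ID": "toy-1", "area_ha": None, "glad_ha": 35.0},
    {"TRASE_ID": "toy-2", "area_ha": None, "glad_ha": 0.0},
    {"TRASE_ID": "toy-3", "area_ha": 500.0, "glad_ha": 480.0},
]
mapped, unmapped = split_missing_area(sample)
print([r["TRASE_ID"] for r in mapped])    # ['toy-1']
print([r["TRASE_ID"] for r in unmapped])  # ['toy-2']
```

Records in `mapped` are the ones where a soy deforestation estimate could exist without an IBGE benchmark for trade.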

Next, we look at very small planted areas, grouped in 10 ha bins.
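One way to build the 10 ha groups is to key each planted area by the lower edge of its 10 ha bin. A minimal Python sketch; the 100 ha upper limit for a "very small" area is an assumption for illustration, as are the toy values.

```python
from collections import Counter


def small_area_bins(records: list[dict], width: int = 10, max_area: int = 100) -> dict[int, int]:
    """Count records with 0 < area_ha < max_area in bins of `width` ha.

    Bins are keyed by their lower edge: 0 means [0, 10) ha, 10 means [10, 20) ha, etc.
    """
    counts: Counter = Counter()
    for r in records:
        area = r["area_ha"]
        if area is not None and 0 < area < max_area:
            counts[int(area // width) * width] += 1
    return dict(sorted(counts.items()))


sample = [{"area_ha": a} for a in (1.0, 5.0, 15.0, 250.0, None, 0.0)]
print(small_area_bins(sample))  # {0: 2, 10: 1}
```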

Some municipalities report a small planted area of soybean but no production in a given year, so we need to consider both area and production when calculating ET.
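Such municipalities can be flagged separately before yield is computed. A minimal Python sketch, assuming numeric `area_ha` and `tonnes` values and treating a missing production figure (`None`) the same as zero, which is a choice made here for illustration; the records are toy data.

```python
def planted_without_production(records: list[dict]) -> list[dict]:
    """Return records with a positive planted area but zero or missing production."""
    return [
        r for r in records
        if (r["area_ha"] or 0) > 0 and not (r["tonnes"] or 0) > 0
    ]


sample = [
    {"TRASE_ID": "toy-1", "area_ha": 8.0, "tonnes": 0.0},
    {"TRASE_ID": "toy-2", "area_ha": 8.0, "tonnes": None},
    {"TRASE_ID": "toy-3", "area_ha": 800.0, "tonnes": 2400.0},
]
print([r["TRASE_ID"] for r in planted_without_production(sample)])  # ['toy-1', 'toy-2']
```

Flagging these separately matters because computing `tonnes / area_ha` for them gives a yield of 0 (or fails on a missing value), so they would otherwise be mixed in with genuinely low-yield municipalities.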

We then check the municipalities with a yield below 1 tonne/ha and see whether they fall into any specific category. Most of the production with a yield < 1 tonne/ha comes from planted areas larger than 50 ha. Let’s see in which states these might be taking place.
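The share of low-yield production that comes from planted areas above 50 ha can be computed as below. This is a minimal Python sketch with toy data; it derives yield as `tonnes / area_ha` rather than reading the `yield` column, so that it does not depend on that column's units.

```python
def low_yield_share_from_large_areas(
    records: list[dict], max_yield: float = 1.0, min_area: float = 50.0
) -> float:
    """Fraction of low-yield production (yield < max_yield t/ha) from records with area_ha > min_area."""
    low = [
        r for r in records
        if (r["area_ha"] or 0) > 0
        and r["tonnes"] is not None
        and r["tonnes"] / r["area_ha"] < max_yield
    ]
    total = sum(r["tonnes"] for r in low)
    if total == 0:
        return 0.0
    large = sum(r["tonnes"] for r in low if r["area_ha"] > min_area)
    return large / total


sample = [
    {"area_ha": 100.0, "tonnes": 50.0},   # 0.5 t/ha, large area
    {"area_ha": 10.0, "tonnes": 5.0},     # 0.5 t/ha, small area
    {"area_ha": 200.0, "tonnes": 600.0},  # 3 t/ha, not low yield
]
print(round(low_yield_share_from_large_areas(sample), 3))  # 0.909
```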

So there is clearly an issue with yield in Southern Brazil. Some artefacts may arise from low yields where planted areas are small, but these do not represent the majority of the harvested area. In problematic years, low-yield cases can account for quite a few municipalities in a state.
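To see how many municipalities in each state are affected per year, the low-yield cases can be counted against all municipalities that have both an area and a production figure. A minimal Python sketch using the same 1 tonne/ha threshold; the state codes and values are toy data.

```python
from collections import defaultdict


def low_yield_by_state_year(records: list[dict], max_yield: float = 1.0) -> dict:
    """Map (STATE, year) to (n_low_yield, n_municipalities, share_low_yield)."""
    n_total: defaultdict = defaultdict(int)
    n_low: defaultdict = defaultdict(int)
    for r in records:
        area, tonnes = r["area_ha"], r["tonnes"]
        if not area or tonnes is None:
            continue  # yield is undefined without a positive area and a production figure
        key = (r["STATE"], r["year"])
        n_total[key] += 1
        if tonnes / area < max_yield:
            n_low[key] += 1
    return {k: (n_low[k], n, n_low[k] / n) for k, n in sorted(n_total.items())}


sample = [
    {"STATE": "RS", "year": "2005", "area_ha": 100.0, "tonnes": 50.0},
    {"STATE": "RS", "year": "2005", "area_ha": 100.0, "tonnes": 300.0},
    {"STATE": "PR", "year": "2005", "area_ha": 100.0, "tonnes": 300.0},
]
print(low_yield_by_state_year(sample))
# {('PR', '2005'): (0, 1, 0.0), ('RS', '2005'): (1, 2, 0.5)}
```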

Review the spread of IBGE data at micro-region level

There are likely some artefacts in the data at municipality level; aggregating at the micro-region level might remove these artefacts and reveal areas that truly have a yield issue.

First, we aggregate production and area to the micro-region level and calculate a new yield as total production divided by total area, so that this yield is weighted by area. We then repeat the above analysis.
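The aggregation can be sketched as follows. Because municipal production equals area times yield, summing production and area and then dividing gives the area-weighted mean of the municipal yields. A minimal Python sketch; the micro-region code and values are toy data, and skipping municipalities with a missing area or production figure is an assumption.

```python
from collections import defaultdict


def aggregate_to_microregion(records: list[dict]) -> dict:
    """Sum production and area per (MICROCOD, year) and recompute yield.

    yield = sum(tonnes) / sum(area_ha), which equals the area-weighted mean of
    the municipal yields because tonnes_i = area_ha_i * yield_i.
    """
    tonnes: defaultdict = defaultdict(float)
    area: defaultdict = defaultdict(float)
    for r in records:
        if r["area_ha"] is None or r["tonnes"] is None:
            continue  # skip municipalities without IBGE figures
        key = (r["MICROCOD"], r["year"])
        tonnes[key] += r["tonnes"]
        area[key] += r["area_ha"]
    return {
        k: {
            "tonnes": tonnes[k],
            "area_ha": area[k],
            "yield": tonnes[k] / area[k] if area[k] > 0 else None,
        }
        for k in sorted(area)
    }


sample = [
    {"MICROCOD": "43001", "year": "2005", "area_ha": 10.0, "tonnes": 2.0},    # 0.2 t/ha
    {"MICROCOD": "43001", "year": "2005", "area_ha": 90.0, "tonnes": 270.0},  # 3.0 t/ha
]
result = aggregate_to_microregion(sample)
print(result[("43001", "2005")]["yield"])  # 2.72
```

In this toy example the 10 ha municipality on its own has a yield of 0.2 t/ha and would be flagged, while the micro-region yield is 2.72 t/ha; an unweighted mean of the two municipal yields would instead give 1.6 t/ha.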

This is the same graph as above, but with the data aggregated at the micro-region level. It is worth checking whether our soy deforestation figures need to be corrected where IBGE reports zero production, especially in Maranhão, Pará, and São Paulo (see below).

Aggregating this way reduces the number of individual regions flagged as “problematic”. For instance, in Rio Grande do Sul (RS), 29 micro-regions had a low yield in 2005, compared with only 1 to 4 in later years. This means the problem is concentrated in a few areas.
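The reduction in flagged regions can be checked by counting low-yield units at both levels for each state and year. A minimal Python sketch using the same 1 tonne/ha threshold and toy data; the micro-region yield is summed production over summed area.

```python
from collections import defaultdict


def low_yield_counts(records: list[dict], max_yield: float = 1.0) -> dict:
    """Map (STATE, year) to (low-yield municipalities, low-yield micro-regions)."""
    muni_low: defaultdict = defaultdict(int)
    tonnes: defaultdict = defaultdict(float)
    area: defaultdict = defaultdict(float)
    for r in records:
        if not r["area_ha"] or r["tonnes"] is None:
            continue  # skip records without a positive area and a production figure
        if r["tonnes"] / r["area_ha"] < max_yield:
            muni_low[(r["STATE"], r["year"])] += 1
        micro_key = (r["STATE"], r["year"], r["MICROCOD"])
        tonnes[micro_key] += r["tonnes"]
        area[micro_key] += r["area_ha"]

    micro_low: defaultdict = defaultdict(int)
    for micro_key, total_area in area.items():
        if tonnes[micro_key] / total_area < max_yield:
            micro_low[micro_key[:2]] += 1  # micro_key[:2] is (STATE, year)

    # State-years with no low-yield unit at either level are omitted.
    keys = sorted(set(muni_low) | set(micro_low))
    return {k: (muni_low[k], micro_low[k]) for k in keys}


sample = [
    {"STATE": "RS", "year": "2005", "MICROCOD": "43001", "area_ha": 10.0, "tonnes": 2.0},
    {"STATE": "RS", "year": "2005", "MICROCOD": "43001", "area_ha": 90.0, "tonnes": 270.0},
    {"STATE": "RS", "year": "2005", "MICROCOD": "43002", "area_ha": 50.0, "tonnes": 25.0},
    {"STATE": "RS", "year": "2005", "MICROCOD": "43002", "area_ha": 50.0, "tonnes": 30.0},
]
print(low_yield_counts(sample))  # {('RS', '2005'): (3, 1)}
```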

Let’s now check how the soy deforestation estimated using Song et al (2020) compares in municipalities where IBGE reports no soy production.

The results are as expected: no soy deforestation is predicted in municipalities for which IBGE reports no production.
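This consistency check reduces to a filter that should return nothing. A minimal Python sketch; `soy_defor_ha` is a hypothetical column name for the soy deforestation estimated from Song et al (2020), missing production is treated as zero, and the records are toy data.

```python
def deforestation_without_production(records: list[dict]) -> list[dict]:
    """Return records with positive soy deforestation (hypothetical soy_defor_ha) but no IBGE production."""
    return [
        r for r in records
        if not (r["tonnes"] or 0) > 0 and (r["soy_defor_ha"] or 0) > 0
    ]


sample = [
    {"TRASE_ID": "toy-1", "tonnes": 0.0, "soy_defor_ha": 0.0},
    {"TRASE_ID": "toy-2", "tonnes": 5400.0, "soy_defor_ha": 120.0},
]
# An empty list corresponds to the outcome described above.
print(deforestation_without_production(sample))  # []
```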

Final decision

The preferred scale is the micro-region, as it avoids many of the yield issues and helps ensure that areas with low yield genuinely reflect low yields rather than artefacts in the data collected by IBGE. The micro-region level is what we should use to estimate ET for crops before linking back to trade, considering the following steps:

References

Flach et al (2020). The effects of cropping intensity and cropland expansion of Brazilian soybean production on green water flows. Environmental Research Communications 2(7): 071001. doi: 10.1088/2515-7620/ab9d04

Song et al (2020). Massive soybean expansion in South America since 2000 and implications for conservation. Nature Sustainability 4(9): 784-792. doi: 10.1038/s41893-021-00729-z