Checking the coverage of ET from Flach et al (2020)

Initial exploration of the data from Flach et al (2020) showed that there are some municipalities with reported production of soybean, but for which there is no ET estimate. According to Rafaela, this is due to missing factors from Abrahão and Costa (2018) who predicted no production in those areas following other factors (precipitation, photo-period), but for which IBGE reports production.

We also note that there are some municipalities for which IBGE reports a soybean area (AREA_HA) but no production and no yield. At this stage we remove any municipality that does not have production.

We note some discrepancies with detection of soy crop area with Song et al (2021) (for which there is a cutoff at 20 ha, meaning that anything < 20 ha would be 0).

Ultimately for every area reported by IBGE, we need some estimate of ET, and perhaps use an assumption when there is no ET estimate, but soybean production reported (e.g. small area).

We check specific discrepancies at the municipality level and then at the micro-region level.

Municipality level

Table showing the discrepancies in each year:

Microregion level

Table showing the discrepancies in each year:

## Simple feature collection with 11 features and 2 fields
## Geometry type: GEOMETRY
## Dimension:     XY
## Bounding box:  xmin: -64.25515 ymin: -30.25236 xmax: -36.1785 ymax: 3.742139
## CRS:           NA
## # A tibble: 11 × 3
##    YEAR  IBGE_MICRO                                                     geometry
##    <chr>      <int>                                                   <GEOMETRY>
##  1 2003          11 MULTIPOLYGON (((-49.9487 -27.383, -49.9554 -27.40329, -49.9…
##  2 2004           9 MULTIPOLYGON (((-43.44032 -13.25589, -43.43228 -13.25862, -…
##  3 2005           5 MULTIPOLYGON (((-50.35759 -30.19734, -50.36295 -30.20553, -…
##  4 2006           7 MULTIPOLYGON (((-50.36295 -30.20553, -50.40496 -30.21607, -…
##  5 2007           9 MULTIPOLYGON (((-50.36295 -30.20553, -50.40496 -30.21607, -…
##  6 2008           4 MULTIPOLYGON (((-37.78683 -5.297149, -37.92536 -5.297149, -…
##  7 2009           5 MULTIPOLYGON (((-36.5664 -9.682164, -36.53601 -9.663043, -3…
##  8 2010           2 POLYGON ((-60.29533 3.38703, -60.2426 3.393664, -60.19702 3…
##  9 2011           5 MULTIPOLYGON (((-43.62935 -13.26838, -43.44032 -13.25589, -…
## 10 2012           4 MULTIPOLYGON (((-49.41646 -26.66107, -49.44998 -26.64741, -…
## 11 2013           5 MULTIPOLYGON (((-49.41646 -26.66107, -49.44998 -26.64741, -…

There is less discrepancy when selecting the micro-region level, but we lose the inter-municipal resolution.

Note that there are some municipalities with ET simulations but for which Flach et al (2020) do not report ET data. The results are also filtered to remove municipalities where there was not soybean area detected by Song et al (2021).

The mean number of municipalities with missing ET is 53 per year (between 2000 and 2013).

There seems to be missing ET mostly in the Atlantic Forest (currently missing Brazilian boundary).

We then check which of these municipalities are linked to the trase data filtered to the 3 main countries of interest: Brazil (domestic consumption), China (Mainland) and the EU which are the focus of the project.

The number of municipalities linked to exports to Brazil, China and the EU that do not have a soybean ET estimate is almost as high as production, which means that we need to think of a way to provide an estimate before carrying on.

To resolve these missing estimates, we can try to use a different boundary setting (e.g. larger boundaries such as micro regions). These per-municipality results were provided by weighted average of simulation units obtained in ET.

to do: check the spread of soybean ET per micro-region code, considering the sensitivity analysis results in each of the years so that we can provide an estimate in the missing municipalities between 2000 and 2013. Then we can use those values to check. There also needs to be a deeper check of the differences observed between Song et al (2021) and the IBGE results.

Checking yields from Flach et al (2020)

Rafaela provided simulated yields along with differences in ET estimates according to early or late planting, and short and long crop development cycles. The yields obtained from the simulations do not correspond to the IBGE yields in all cases, as the best simulated yields (and associated ET) were selected for the Flach et al (2020) paper.

However, the range of yields and ET provided in Flach et al (2020) allows us to apply the Doorenbos (1979) method to derive a climatic relationship between yield and ET that could help us derive ET for a given IBGE yield.

We first look at these differences in yield, considering different calendars:

The graphs below provide the average soybean yields over all years considering:

The main observation when comparing simulated with IBGE yields is that the IBGE yields are roughly ~3 tonnes/ha, so do not disply a clear range (naturally) compared to the simulated range of yields from Flach et al (2020) (~1.5-5 tonnes/ha).

Deriving the Ky curve (ET to yield relationship)

These first results consider a unique Ky value per municipality, considering a maximum ET and maximum yield across all years (simluated from Flach et al (2020)), and all calendar (early, mid- and late plating), and management (short, mid-, long development cycle) options.

As a result, in cases where IBGE yields were lower than expected, these Ky curve will provide a corrected ET so that the WF values are more realistic (i.e. linked to the actual yield).

This test also assumes that the changes in yield are primarily due to changes in precipitation (not other factors such as pests, fertilizer application, etc.).

To do so, we treat the individual yield simulations as possible yields that vary with meteorological conditions (from planting, meteorological conditions and development cycle) to derive values of Ky (yield response factor) from the relationship below (Doorenbos et al 1979):

\[1 - Y/Ymax = Ky*(1 - ET/ETmax)\] Where Y (ha ton-1) is the simulated yield from Flach et al (2020) for a given municipality and year, Ymax is the maximum yield across all simulations in multiple years for a given municipality. ET and ETmax are the corresponding simulated ET. The approach resembles the approach from Tuninetti et al (2017) but rather than using the approach to estimate yield, we have yield (from IBGE) and the simulated yields (from Flach et al (2020) and simulated ETmax).

We then calculate the Ky values per municipality to correct ET to IBGE yields.

Ky curves at the municipality level

Simple statistics of the Ky curve coefficients (municipality level):

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.650   0.960   1.010   1.018   1.070   1.310



The map below shows how the Ky curve coefficients are spread across the country.





Here is an example of a Ky curve for different municipalities:

Recall that the above curves were derived using maximum ET and yields (simulations from Flach et al (2020)) considering all calendars (planting dates, crop development cycles) as well as years. The argument here is that changes in precipitation will affect yields and so we can use multiple years to get the wider spread of yields and corresponding ET values, even when simulated.



Showing how Ky varies per calendar option:





Here we make a test estimating how Ky varies per calendar option, considering ET maximum and yield maximum per calendar option, across all years. Here are the results, using the example of the specific municipalities:

So clearly there is a spread of Ky values based on the planting dates and crop development cycles, but considering all years in the simulations.

##         calendar coef.TRASE_ID coef.Ky coef.R coef.pvalue coef.n
## 1    Early Short    BR-4118808    0.83   0.93           0     99
## 2  Eearly Medium    BR-4118808    0.80   0.92           0     99
## 3     Early Long    BR-4118808    0.79   0.94           0     99
## 4   Medium Short    BR-4118808    0.83   0.94           0     99
## 5 Medium Mmedium    BR-4118808    0.75   0.91           0     99
## 6    Medium Long    BR-4118808    0.80   0.91           0     99
## 7     Late Short    BR-4118808    1.01   0.97           0     99
## 8    Late Medium    BR-4118808    1.02   0.97           0     99
## 9      Late Long    BR-4118808    0.98   0.97           0     99

We can then use the individual values of Ky to calculate the ET following the IBGE yields.

Note that some municipalities that did not have an ET estimate from the public data, do have simulated yield and ET estimates.

Ky curves at microregion level


We follow the same exercise at the micro-region level by estimating a Ky curve for the micro-region. below is a summary:

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.74    0.96    1.02    1.02    1.07    1.26


Here is an example of a Ky curve for micro-region 17003:

##   MICROCOD   Ky    R pvalue   n
## 1    17004 1.15 0.98      0 603

The microregion level might be better to correct the ET in cases of problematic IBGE yields at the municipality level. To apply the above corrections, we need to consider the following:

  • A revised IBGE yield at the microregionlevel, based on total area planted and total production of soybean within the microregion
  • A revised simulated ET from Flach et al (2020) taken as the average ET of the municipalities within the microregion
  • A revised Ymax and ETmax from Flach et al (2020) taken as the average of the municipalities within the microregion

Some further investigation on the relationships between yield and water use

## `geom_smooth()` using formula = 'y ~ x'

So there is a linear relationship in the data from Flach et al (2020) linking water use (in mm) with yield (ton/ha).

Applying the Fast-Track method

The Fast-track method is described by the equation below (Tuninetti et al 2017):

\[ CWF_{c,t}(Y) = \frac{\overline{CWF_{m,T}} * \overline{Y_{m,T}}}{Y_{m,t}} [m^3/ton] \] where m denotes municipality, \(\overline{Y_{m,T}}\) denotes the average yield during period T, and \(Y_{m,t}\) the yield in year t.

Yields can be described by the following equation:

\[ Y_{i,t} = \alpha^{cl}_{i,t} * \alpha^{man}_{i,t} * Y^{Mo}_{i,t=x} \left[ \frac{ton}{ha} \right] \]

The factor \(\alpha^{cl}_{i,t}\) accounts for climate-driven yield changes, while the \(\alpha^{man}_{i,t}\) accounts for the yield changes induced by technological advances and agricultural improvements, ascribable to the anthropic(human) role in agriculture.

When only climatic variations are taken into account,the yield value reads

\[ Y^{cl}_{i,t} = \alpha^{cl}_{i,t} * Y^{Mo}_{i,t} \]

where

\[ \alpha^{cl}_{i,t} = 1 - ky * \left( 1 - \frac{ET_{i,t}}{ET_{i,t = x}} \right) \] The \(\alpha^{man}_{c,t}\) factor expresses the yield variability due technological and mechanical advances in the agricultural management (e.g. use of pesticides, application of fertilizers, extensive irrigation) and it is thought as a correction factor to the \(Y^{cl}_{c,t}\) values in order to account for all other aspects beyond climate. It is defined as the ratio between the FAO country scale yield, \(Y^{FAO}_{c,t}\), and the national \(Y^{cl}_{c,t}\) values calculated with equation

\[ \alpha^{man}_{c,t} = \frac{Y^{FAO}_{c,t}}{Y^{cl}_{c,t}} \]

Fast-Track method with corrections to ET for depressed IBGE yields (using Ky curves)

First we correct the ET for the reported IBGE yields and then apply the Fast-Track method.

The method seems to provide a decent correction, but there needs to be a special selection now based on the stats (n, R2, and p-value). We also need to keep in mind that the correction is only meant to provide a dataset that can be used for the Fast-Track Method (see below) that will help provide predictions to soybean and maize ET beyond 2013.

Another option is to simply use the Ky curves + simulated ET/Yield from Flach et al (2020) to derive the ET of the crop beyond 2013. This is perhaps better than trying to correct the ET values.

Some of the data that should clearly be removed are:

  • estimates where corrected ET < 0
  • estimates where the p-value for Ky is > 0.05
  • estimates must have at least 3 years of data (with 9 simulations each year, so n > 27)
  • other factors affecting yield not related to climate or management

On this last point, other factors affecting the yield/ET relationship may have to do with climate shocks or disease that may have affected the yield, making it unreliable for the estimate (e.g. as described in Assad et al 2007).

Sentelhas et al (2015) also provide a detailed analysis of the yield gap and estimate that the yield gap due to crop management was 464 kg ha-1, while the yield gap due to water deficit was 1378 kg ha-1 (1980-2011) confirming that the largest gaps are more related to climatic differences. Average yield gap was 29% caused by water deficits and 13% by crop management. They also noted that in 2010/11 yields tended to be much lower in Southern Brazil (due to large water deficits). It is important to note that sowing date is considered a management factors (as well as fertilizer, pest control, etc.).

Additional notes from Sentelhas et al (2015) on yield gaps:

  • Breeding can reduce these gaps, especially new breeds that are more drought resistant. This is unlikely covered in the modeling steps of Flach et al (2020)
  • Sowing dates, which are taken care of in the simulations of Flach et al (2020)
  • Soil improvements
  • Crop rotations

In order to select the appropriate ET and yield combination we could consider a threshold of < 2 mm d-1 as this would mean that there is something serious going on with the crop (using an average crop development cycle of 125 days as per Flach et al (2020))

We look at the results in turn:

Year, municipality combination where corrected ET < 0:

## # A tibble: 36 × 10
##    YEAR  TRASE_ID   NAME       STATE YIELD…¹ AREA_HA GLAD_HA  TONS    ET ET_corr
##    <chr> <chr>      <chr>      <chr>   <dbl>   <dbl>   <dbl> <dbl> <dbl>   <dbl>
##  1 2009  BR-4101101 ANDIRA     Para…    0.27   10700    3522  2900  454.     -40
##  2 2009  BR-4101101 ANDIRA     Para…    0.27   10700    3522  2900  454.     -40
##  3 2009  BR-4101101 ANDIRA     Para…    0.27   10700    3522  2900  454.     -40
##  4 2009  BR-4101101 ANDIRA     Para…    0.27   10700    3522  2900  454.     -40
##  5 2009  BR-4101101 ANDIRA     Para…    0.27   10700    3522  2900  454.     -40
##  6 2009  BR-4101101 ANDIRA     Para…    0.27   10700    3522  2900  454.     -40
##  7 2009  BR-4101101 ANDIRA     Para…    0.27   10700    3522  2900  454.     -40
##  8 2009  BR-4101101 ANDIRA     Para…    0.27   10700    3522  2900  454.     -40
##  9 2009  BR-4101101 ANDIRA     Para…    0.27   10700    3522  2900  454.     -40
## 10 2009  BR-4102406 BANDEIRAN… Para…    0.15   13550    3673  1980  442.     -59
## # … with 26 more rows, and abbreviated variable name ¹​YIELD_IBGE

Year, municipality combination where correct ET > 0, and p-value > 0:

## # A tibble: 0 × 11
## # … with 11 variables: YEAR <chr>, TRASE_ID <chr>, NAME <chr>, STATE <chr>,
## #   YIELD_IBGE <dbl>, AREA_HA <dbl>, GLAD_HA <dbl>, TONS <dbl>, ET <dbl>,
## #   ET_corr <dbl>, p <dbl>

In the above case, we note that the remote sensing information from Song et al (2021) does not detect a soybean planting area.

Year, municipality combination where correct ET > 0, and p-value < 0, but there are < 3 years of data (n < 27):

## # A tibble: 0 × 2
## # … with 2 variables: YEAR <chr>, municipalities <int>

A quick analysis of the values of n provides the following picture:

  • n = 8 typically means that there is only 1 year of production in the municipality (in roughly 100 municipalities)
  • n = 24 seems to be linked to municipalities that started planting soy later in the time series (2010 onward) and so there are fewer simulations available. From the table above, there are about 18-24 municipalities between 2011 and 2013 that should be included in the selection

We then check the simulated ET to see if there might be an issue with production as reported by IBGE. The effects of drought, fertilizer and pest, disease may affect the yield such that:

  • ET cannot be predicted using the relationship to water deficit
  • A larger portion of the ET represents mostly soil evaporation and isn’t fully linked to the production of soybean

Given previous research, ET0 over the course of the soybean development cycle in roughly 500 mm, or about 4 mm d-1 on average over the 125 days of soybean development. Any predicted ET that would generate < 1 mm d-1 of average ET could be considered to be problematic for our purposes.

We look at these results at the municipality level first:

## `summarise()` has grouped output by 'YEAR'. You can override using the
## `.groups` argument.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

We notice that the majority of municipalities that have ET < 1 mm d-1 are located in Rio Grande do Sul which is known to have had climatic and disease that have affected yields (Sentelhas et al 2015) with over 50 municipalities with ET < 1 mm d-1 in the years 2000 (negative ENSO), 2002 (neutral), 2004 (neutral), 2005 (positive), 2008 (negative), 2009 (negative), 2012 (negative) where neutral and negative ENSO area associated with less precipitation and yield in Southern Brazil (Sentelhas et al 2015)

For now we can remove these municipalities for the purposes of testing the Fast-Track method.The summary of the data is shown below.

Note: If the above methods were to be used, we will need to address these municipalities in the application of the Fast-Track method.

We then look at microregions:

## Adding missing grouping variables: `MESOCOD`
## `summarise()` has grouped output by 'YEAR'. You can override using the
## `.groups` argument.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Rio Grande do Sul is highlighted here as well with microregions showing ET < 2 mm/d. This requires further investigation to understand crop failures in the South when calculating the results.

Summary of corrected ET at the municipality level:

  • Minimum = 119 mm
  • Maximum = 309 mm
  • Mean = 373 mm
  • Median = 361.6828905 mm

Summary of corrected WF at the municipality level:

  • Minimum = 840 m^3 tonne-1
  • Maximum = 1340 m^3 tonne-1
  • Mean = 1420 m^3 tonne-1
  • Median = 1423 m^3 tonne-1

Validate simulations with published ET results

We use published data/simulations to check the corrected ET values. First, we use the data from Figueiredo et al (2021) who simulate water productivity for soybean in several agroclimatic zones.

So far this validation step isn’t very promising.

## 
## Call:
## lm(formula = validation_1$WP ~ validation_1$WP_YIELD_KG_HA_MM_MEAN)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -3.02930 -0.23784 -0.02121  0.33306  0.89126 
## 
## Coefficients:
##                                      Estimate Std. Error t value
## (Intercept)                          7.078513   0.062656 112.974
## validation_1$WP_YIELD_KG_HA_MM_MEAN -0.041573   0.006269  -6.631
##                                                 Pr(>|t|)    
## (Intercept)                         < 0.0000000000000002 ***
## validation_1$WP_YIELD_KG_HA_MM_MEAN      0.0000000000459 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4546 on 1537 degrees of freedom
## Multiple R-squared:  0.02782,    Adjusted R-squared:  0.02718 
## F-statistic: 43.98 on 1 and 1537 DF,  p-value: 0.00000000004589

We then check the WF results for 2003-2005 with the state averages published by the WFN.

## 
## Call:
## lm(formula = validation_2$WF_corr ~ validation_2$WF_1996_2005)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -209.31  -61.47  -10.38   86.20  310.01 
## 
## Coefficients:
##                             Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)               1638.16151  124.71916   13.13 <0.0000000000000002 ***
## validation_2$WF_1996_2005   -0.08118    0.06057   -1.34               0.186    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 112.9 on 49 degrees of freedom
##   (4 observations deleted due to missingness)
## Multiple R-squared:  0.03537,    Adjusted R-squared:  0.01568 
## F-statistic: 1.797 on 1 and 49 DF,  p-value: 0.1863

It looks like out state-wide weighted mean values slightly overestimate WF when compared to the WF Network results.

Validate the Fast-Track Method

Tuninetti et al (2017) carry out a validation of the Fast-Track Method by comparing two different estimates:

Using the corrected ET and WF values for soybean determined above we take a reference time period (e.g. 2000-2005) and compare it to another time period (2005-2013) both of which are containing within the data available from Flach et al (2020). The Fast-Track method then uses the following relationship:

\[ WFm,y = WFm,T(ave) Ym,T(ave) / Ym,y \] Where in our example:

We perform a first prediction based on selected municipality ET (selected above) to try and predict the WF of soybean in 2013 from the reference time period 2003-2012 (using a yield average that is weighted by area across years):

## 
## Call:
## lm(formula = soybean_wf_2$WF_FT ~ 0 + soybean_wf_2$WF_corr)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -583.3 -162.4  -35.6   97.3 4246.7 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## soybean_wf_2$WF_corr 0.901140   0.001556   579.1 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 281.6 on 16082 degrees of freedom
## Multiple R-squared:  0.9542, Adjusted R-squared:  0.9542 
## F-statistic: 3.354e+05 on 1 and 16082 DF,  p-value: < 0.00000000000000022

There are many WF that are either overestimated or underestimated using the current FT method, using a wide reference period (2003-2012) to infer the year 2013. The relationship between the two is shown above.

We repeat the Fast-Track method by applying the method to aggregated microregions instead of municipalities:

## 
## Call:
## lm(formula = soybean_wf_2_micro$WF_FT ~ 0 + soybean_wf_2_micro$WF_corr)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -541.64 -152.43  -46.73   68.53 2410.72 
## 
## Coefficients:
##                            Estimate Std. Error t value            Pr(>|t|)    
## soybean_wf_2_micro$WF_corr  0.93457    0.01218   76.74 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 276.4 on 251 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.9591, Adjusted R-squared:  0.959 
## F-statistic:  5889 on 1 and 251 DF,  p-value: < 0.00000000000000022

We need to find other simulations to be able to compare our estimates and make a call on whether they are acceptable.

Conclusions when correcting ET for depressed IBGE yields

We are facing a tough decision here to defensibly use the data from Flach et al (2020) to predict soybean ET beyond 2013:

  • ET and Yield provided by the Flach et al (2020) dataset are simulations and represent perfect conditions. This means that the simulated yield won’t match the yield reported by IBGE and that there are simulations that are missing in some municipalities
  • A correction to ET for the IBGE reported yield is necessary
  • The Fast-Track Method using the corrected ET datasets does show some broad discrepancies of up to ~25% between corrected and predicted ET values. Using microregions only marginally improves the predictions.

We may consider the following steps forward:

  1. Consider using the Ky curves exclusively for predicting the yields in other years (pseudo Fast-Track Method)
  2. We need to understand the role of fertilizer in the changes in yield for both soybean and corn
  3. We are still without any proper validation points to assess what the actual ET is in Brazil. Single points here and there are not ideal, which means that the values remain quite theoretical. Validation using Figueiredo et al (2021) did not show any promising result so far.

Fast-Track method without corrections to ET

Then we apply the Fast-Track method without previously correcting the ET. This means that we use the average ET from Flach et al (2020) and relate to the IBGE yields (that together can provide WF values > 10,000 m3ton-1).

## 
## Call:
## lm(formula = soybean_wf_2$WF_FT ~ 0 + soybean_wf_2$WF)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1006.40  -149.84   -59.06    61.93  1903.09 
## 
## Coefficients:
##                 Estimate Std. Error t value            Pr(>|t|)    
## soybean_wf_2$WF 1.047302   0.003001     349 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 259.5 on 1821 degrees of freedom
##   (134 observations deleted due to missingness)
## Multiple R-squared:  0.9853, Adjusted R-squared:  0.9853 
## F-statistic: 1.218e+05 on 1 and 1821 DF,  p-value: < 0.00000000000000022

#### Validation of results against known data

Starting with the data from Figueiredo et al (2021). Results do not correlate, simply since the ET values from Flach et al (2020) do not match the IBGE yields.

## 
## Call:
## lm(formula = validation_1$WP ~ validation_1$WP_YIELD_KG_HA_MM_MEAN)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.2025 -0.2919  0.0382  0.4753  2.0155 
## 
## Coefficients:
##                                     Estimate Std. Error t value
## (Intercept)                          3.12588    0.33744   9.263
## validation_1$WP_YIELD_KG_HA_MM_MEAN  0.16406    0.03403   4.821
##                                                 Pr(>|t|)    
## (Intercept)                         < 0.0000000000000002 ***
## validation_1$WP_YIELD_KG_HA_MM_MEAN            0.0000031 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.872 on 174 degrees of freedom
## Multiple R-squared:  0.1178, Adjusted R-squared:  0.1128 
## F-statistic: 23.24 on 1 and 174 DF,  p-value: 0.000003097

Here is an attempt to see if changing the ET and Yield to the model values changes the validation results.

## 
## Call:
## lm(formula = validation_1$WP ~ validation_1$WP_YIELD_KG_HA_MM_MEAN)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.89205 -0.42856  0.05849  0.47223  1.55505 
## 
## Coefficients:
##                                     Estimate Std. Error t value
## (Intercept)                          6.77116    0.26866  25.203
## validation_1$WP_YIELD_KG_HA_MM_MEAN  0.01570    0.02709   0.579
##                                                Pr(>|t|)    
## (Intercept)                         <0.0000000000000002 ***
## validation_1$WP_YIELD_KG_HA_MM_MEAN               0.563    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6943 on 174 degrees of freedom
## Multiple R-squared:  0.001926,   Adjusted R-squared:  -0.00381 
## F-statistic: 0.3358 on 1 and 174 DF,  p-value: 0.563

Validation against WFN data

We then compare against state values from the Water Footprint Network.

## 
## Call:
## lm(formula = validation_2$WF ~ validation_2$WF_1996_2005)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1127.1  -332.9   -82.0   174.3  4507.5 
## 
## Coefficients:
##                           Estimate Std. Error t value Pr(>|t|)  
## (Intercept)               801.6067   871.9417   0.919   0.3626  
## validation_2$WF_1996_2005   0.7844     0.4236   1.852   0.0704 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 789 on 47 degrees of freedom
##   (10 observations deleted due to missingness)
## Multiple R-squared:  0.06799,    Adjusted R-squared:  0.04816 
## F-statistic: 3.428 on 1 and 47 DF,  p-value: 0.07037

## Analysis of ANA data

We received ET data from ANA in April 2022 for all crops and aggregated at the municipality level. We look at this data more closely here.

First we take a look at the distribution of ET and WF values in the ANA dataset.

Now we compare the geographical distribution of ET and WF values, for the Flach et al (2021) and the ANA dataset for the overlapping year of 2013.

For the year of 2013, the relationship between ET and WF between the Flach et al (2021) and ANA datasets is shown below.

So the estimates from Flach et al (2020) are generally greater than the estimates from ANA. We can think of various reasons:

  • The estimates from Flach et al (2020) are meant to represent the perfect case
  • The estimates from ANA are in m3/s and require a proper area to derive mm/y that make sense (and we know the issues with IBGE data)

General conclusions

Some key general conclusions following the above exploration:

Next steps:

References

Assad et al (2007) Sistema de previsão da safra de soja para o Brasil Pesquisa Agropecuária Brasileira 42(5): 615-625, doi: 10.1590/S0100-204X2007000500002

Doorenbos et al (1979) Yield response to water FAO Irrigation and Drainage Paper Food and Agriculture Organization of the United Nations: Rome, Italy.

Figueiredo et al (2021) Impact assessment of soybean yield and water productivity in Brazil due to climate change European Journal of Agronomy 129: 126329, doi: 10.1016/j.eja.2021.126329

Tuninetti et al (2017) A Fast Track approach to deal with the temporal dimension of crop water footprint. Environ. Res. Lett. 12, 074010. https://doi.org/10.1088/1748-9326/aa6b09

Flach et al (2020) The effects of cropping intensity and cropland expansion of Brazilian soybean production on green water flows Environmental Research Communications 2(7): 071001, doi: 10.1088/2515-7620/ab9d04

Flach et al (2020) Water productivity and footprint of major Brazilian rainfed crops – A spatially explicit analysis of crop management scenarios Agricultural Water Management 233: 105996, doi: 10.1016/j.agwat.2019.105996.