Introduction

This script does a validation check of the soy ET, water footprint (WF) or similar calculated for TraseH2O. Values of soy ET were obtained following these steps:

Here we validate the results with the following datasets:

Comparison against Water Footprint Network

The WFN data provides soybean WF per state for the 1996-2005 period. ET is modeled using FAO56 and spatialized with Aquacrop. We can compare the two but the time period of the dataset is different to our study (2014-2017).

Here we compare the total water footprints.

## 
## Call:
## lm(formula = wfn_validation$soy_wf_state ~ wfn_validation$wf_1996_2005)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -165.747  -52.904    2.624   44.678  159.549 
## 
## Coefficients:
##                               Estimate Std. Error t value       Pr(>|t|)    
## (Intercept)                 3014.29787  188.21569  16.015 0.000000000614 ***
## wfn_validation$wf_1996_2005   -0.51469    0.09098  -5.658 0.000078304900 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 96.65 on 13 degrees of freedom
## Multiple R-squared:  0.7112, Adjusted R-squared:  0.6889 
## F-statistic: 32.01 on 1 and 13 DF,  p-value: 0.0000783

## # A tibble: 15 × 6
##    state_id       sum_m3 sum_tonnes soy_wf_state state              wf_1996_2005
##    <chr>           <dbl>      <dbl>        <dbl> <chr>                     <dbl>
##  1 11         5553686600    3036489         1829 RONDONIA                   2284
##  2 15         8753509379    4696337         1864 PARA                       1941
##  3 17        19591764695    8845182         2215 TOCANTINS                  1695
##  4 21        16453383121    7549759         2179 MARANHAO                   1628
##  5 22        13543888092    5925826         2286 PIAUI                      1725
##  6 29        32521232457   16121015         2017 BAHIA                      1856
##  7 31        34223119307   16935652         2021 MINAS GERAIS               2018
##  8 35        19269255770   10200110         1889 SAO PAULO                  2114
##  9 41       117785931491   68446698         1721 PARANA                     2368
## 10 42        14519557421    8255519         1759 SANTA CATARINA             2315
## 11 43       114209368196   63696062         1793 RIO GRANDE DO SUL          2444
## 12 50        55704109049   30136874         1848 MATO GROSSO DO SUL         2544
## 13 51       229337383560  111104011         2064 MATO GROSSO                1924
## 14 52        79572277831   39156782         2032 GOIAS                      1994
## 15 53         1627451348     872220         1866 DISTRITO FEDERAL           1909

The data seems to fall within the same range, but the time periods are not the same and so the differences could be due to meteorological differences across the years.

Comparison against Figueiredo da Silva et al (2021)

We compare a limited sample of water productivity data available from Figueiredo da Silva et al (2021). The authors modeled water productivity for soy in Southern and Central Brazil (mostly focused on Pampas and Cerrado biomes). Soybean development was modeled with the CROPGROW-SOYBEAN model and included field data to calibrate the model to specific cultivars. Soil water was modeled with DSSAT-SBUILD tool. This allowed for the simulation of crop growth and key variables such as LAI, above ground dry matter, grain weight which then allowed then to re-run the models with climate change scenarios.

Here we compare the total water footprints.

## 
## Call:
## lm(formula = fig_validation$wp ~ fig_validation$wp_yield_kg_ha_mm_mean)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.66247 -0.18555 -0.00085  0.40238  1.34254 
## 
## Coefficients:
##                                       Estimate Std. Error t value
## (Intercept)                            5.38068    0.51743  10.399
## fig_validation$wp_yield_kg_ha_mm_mean -0.04750    0.05116  -0.928
##                                                  Pr(>|t|)    
## (Intercept)                           0.00000000000000322 ***
## fig_validation$wp_yield_kg_ha_mm_mean               0.357    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7277 on 62 degrees of freedom
## Multiple R-squared:  0.01371,    Adjusted R-squared:  -0.002198 
## F-statistic: 0.8618 on 1 and 62 DF,  p-value: 0.3568

Most points are out of the range of water productivity and those that were provided by the publication are generally higher than the ones we calculate, which suggests that our WF are greater overall. This seems to be a recurring problem with the modeling that we are applying and when comparing to field measurements.

Comparison against Mapbiomas Agua + PML

Here we use the data that Rafaela generated for the moisture recycling work overlaying the Mapbiomas land use classes with the PML ET product (Zhang 2020) to get the average land use class ET. The results provide average land use class ET for each month of the year which is calculated for the “Soybean” and “Irrigated soybean” classes for the months of Oct-Jan and Nov-Feb (as potential soybean growing seasons).

Note that we need to combine 2 different years to get the correct ET; i.e. soybean planted in October 2008 can be harvested in Jan 2009 and then assigned to the 2009 harvest.

Here we compare the total water footprints.

## 
## Call:
## lm(formula = mb_validation$et ~ mb_validation$et_rock)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -372.62 -146.67  -77.32   42.64  623.72 
## 
## Coefficients:
##                       Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)            0.57898   20.99223   0.028               0.978    
## mb_validation$et_rock  0.93424    0.03659  25.530 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 212 on 14758 degrees of freedom
## Multiple R-squared:  0.0423, Adjusted R-squared:  0.04223 
## F-statistic: 651.8 on 1 and 14758 DF,  p-value: < 0.00000000000000022

Some of the Mapbiomas data have low ET over its season, like ~200-300 mm per cycle which is more in line with the field work. There are non-sensical values as well with ET < 200 mm which might have something to do with the land classification. The bottom line is that our estimates are greater than those provided by Mapbiomas/PML.

Comparison against ANA

ANA (2020) provided ET results for all crops in all municipalities for the 2014-2017 period. The method provides indirect estimates of crop ET following FAO56 guidelines together with meteorological data available in Brazil. The crop water requirement adjusted for water stress relies on a modification of crop coefficients by a KS factor to represent the effect of water stress on crop transpiration. This assumption presupposes a linear relationship between water use and yield and therefore differs from our previous assumption in the case of the curve from (Rockström, 2003) which does not assume a linear relationship between transpiration and yield.

Here we compare the green WF obtained from our method with the green WF derived from the ANA data (purely from effective precipitation).

## 
## Call:
## lm(formula = ana_validation$soy_wf_ana ~ ana_validation$soy_gwf_rock)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -8159   -255     13    192  85130 
## 
## Coefficients:
##                              Estimate Std. Error t value            Pr(>|t|)
## (Intercept)                 -73.05877   53.06275  -1.377               0.169
## ana_validation$soy_gwf_rock   0.66705    0.02577  25.888 <0.0000000000000002
##                                
## (Intercept)                    
## ana_validation$soy_gwf_rock ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1289 on 8109 degrees of freedom
## Multiple R-squared:  0.07634,    Adjusted R-squared:  0.07622 
## F-statistic: 670.2 on 1 and 8109 DF,  p-value: < 0.00000000000000022

Again, our values for green water footprint are greater than those of ANA, but ANA values can be non-sensical in some cases (e.g. WF = 3 m3 tonne-1) which likely has to do with the linear relationship assumption.

Comparison against de Petrillo et al (2023)

Here, the WF were calculated for both green and blue WF for soy using a 5x5 arcmin gridded dataset. Results are available for the year 2004 and 2018 (we use 2018 as the closest to 2017).

## 
## Call:
## lm(formula = petrillo_validation$`uwf_green (m3/ton)` ~ petrillo_validation$soy_gwf_rock)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1378.1  -192.0   -44.3   111.1  3782.3 
## 
## Coefficients:
##                                   Estimate Std. Error t value
## (Intercept)                      725.89418   78.72692   9.220
## petrillo_validation$soy_gwf_rock   0.41942    0.04278   9.804
##                                             Pr(>|t|)    
## (Intercept)                      <0.0000000000000002 ***
## petrillo_validation$soy_gwf_rock <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 383.6 on 1017 degrees of freedom
##   (1256 observations deleted due to missingness)
## Multiple R-squared:  0.08635,    Adjusted R-squared:  0.08545 
## F-statistic: 96.12 on 1 and 1017 DF,  p-value: < 0.00000000000000022

Although we are comparing different years (2017 vs 2018), our results are overall greater than those of de Petrillo et al (2023).

Comparison against Mialyk et al (2024)

## 
## Call:
## lm(formula = mialyk_validation$WF_2010_2019 ~ mialyk_validation$soy_wf_state)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1075.5  -316.6    73.6   209.0  1352.7 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)  
## (Intercept)                    445.4269   185.5576   2.400   0.0241 *
## mialyk_validation$soy_wf_state   0.2829     0.1071   2.643   0.0140 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 496 on 25 degrees of freedom
## Multiple R-squared:  0.2183, Adjusted R-squared:  0.1871 
## F-statistic: 6.983 on 1 and 25 DF,  p-value: 0.01399

Diving into the details of the data

## # A tibble: 7 × 16
##   year  trase_id   cod_uf sigla_uf mun_name_c tonnes planted_ha harvest_ha yield
##   <chr> <chr>      <chr>  <chr>    <chr>       <dbl>      <dbl>      <dbl> <dbl>
## 1 2016  BR-2210623 22     PI       SEBASTIAO…     55        315        314 0.175
## 2 2017  BR-2922250 29     BA       MUQUEM DO…    110        433        433 0.254
## 3 2017  BR-5220686 52     GO       SIMOLANDIA    200        665        664 0.301
## 4 2014  BR-2209757 22     PI       SAO GONCA…     69        230        230 0.3  
## 5 2015  BR-3515350 35     SP       EUCLIDES …    135        450        450 0.3  
## 6 2016  BR-2209757 22     PI       SAO GONCA…    532       1430       1430 0.372
## 7 2014  BR-3129202 31     MG       HELIODORA      18         45         45 0.4  
## # ℹ 7 more variables: area_soy_pivot_mapb_ha <dbl>, ana_pivot_ha <dbl>,
## #   soy_wf_rock <dbl>, soy_gwf_rock <dbl>, soy_bwf_rock <dbl>,
## #   soy_gwf_ana <dbl>, soy_bwf_ana <dbl>

There are 20 estimates that exceed 5000 m3 tonne-1, and 7 entries with a WF > 10,000 m3 tonne-1

In all cases, yield < 1, which explains why the WF are so high, the highest WF was 23,478 m3 tonne-1 for a yield of 0.175 for 55 tonnes produced.

If we remove the WF > 10000 m3 tonne-1, then we remove 1119 tonnes of soy produced on 3568 ha in PI, BA, GO, SP, MG.

We can probably remove these outliers for graphs, making a note of it in the captions, but keeping the values in the calculations.

Conclusions

What most of these comparisons have in common:

The big issue with the ANA data is that there are key municipalities missing water use estimates and so the data is incomplete when comparing to IBGE (e.g. Formosa do Rio Preto in 2017).

Proposal: use the WF data obtained from the Rockström (2003) curve and remove outliers (WF > 10,000 m3 tonne-1 in the paper figures).

References

ANA. (2020). Uso da água na agricultural de sequeiro no Brasil (2013-2017).

De Petrillo et al (2023) International corporations trading Brazilian soy are keystone actors for water stewardship Communications Earth and Environment 87, 4(1), doi: 10.1038/s43247-023-00742-4.

Figueiredo et al (2021) Impact assessment of soybean yield and water productivity in Brazil due to climate change European Journal of Agronomy 129: 126329, doi: 10.1016/j.eja.2021.126329.

Mialyk et al (2024) Water footprints and crop water use of 175 individual crops for 1990-2019 simulated with a global crop model Scientific data, doi: 10.1038/s41597-024-03051-3.

Rockström (2003) Water for food and nature in drought–prone tropics: vapour shift in rain–fed agriculture Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences 358(1440): 1997-2009, doi: 10.1098/rstb.2003.1400

Zhang, Y. (2020). PML_V2 global evapotranspiration and gross primary production (2002.07-2019.08). National Tibetan Plateau Data Center, 10.11888/Geogra.tpdc.270251