This script calculates cattle production (tons of carcass weight) per municipality, per year.

This is calculated as the herd size (per muni/yr) * slaughter rate (per state/yr) * carcass weight (per state/yr)
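
As a toy illustration of the units involved (illustrative numbers only, not taken from the data; the offal component added later in the script is ignored here):

# Toy example of the production formula (illustrative numbers only)
herd_size      <- 100000  # head per municipality/year (PPM)
slaughter_rate <- 0.20    # head slaughtered / head in herd, per state/year
carcass_weight <- 250     # kg carcass per animal, per state/year
herd_size * slaughter_rate * carcass_weight / 1000  # = 5000 tons of carcass per year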

Herd size data

From the PPM (IBGE's Pesquisa da Pecuária Municipal).

# Load PPM data
ppm <- s3read_using(
  object = "brazil/production/statistics/ibge/cattle/ppm/out/PPM_BOV_HERD_1974_2019.csv",
  FUN = read_delim,
  delim = ";",
  bucket = "trase-storage",
  col_types = cols(GEOCODE = col_character()),
  opts = c("check_region" = T)
) %>%
  mutate(TWO_DIGIT_CODE = str_sub(GEOCODE, start = 1, end = 2)) %>%
  left_join(state_dic, by = "TWO_DIGIT_CODE") %>%
  rename(STATE = STATE_NAME)

The slaughter rate

The slaughter (or offtake) rate is calculated as the number of cattle slaughtered per state divided by the number of cattle (herd size) per state.
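
For example (illustrative numbers only):

# Toy example of the offtake rate (illustrative numbers only)
heads_slaughtered <- 6000000   # head slaughtered, per state/year
herd_size         <- 30000000  # head in the herd, per state/year
heads_slaughtered / herd_size  # slaughter rate = 0.2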

The number of cattle slaughtered per state is calculated in the following way:

Slaughter data

# Load anualpec abate (slaughter) data
aa <- 
  s3read_using(
    object = "brazil/production/statistics/anualpec/out/ANUALPEC_BOV_ABATE_2019.csv",
    FUN = read_delim,
    delim = ";",
    bucket = "trase-storage",
    opts = c("check_region" = T)
  )

NB I chose the Anualpec data because it is more complete than the IBGE quarterly (trimestral) slaughter survey, e.g. it includes estimates for informal slaughterhouses. This is shown below.

# Compare Anualpec & IBGE survey


# Load IBGE survey
ibge_abate <- 
  s3read_using(
    object = "brazil/production/statistics/ibge/cattle/carcass_weights/IN/tabela1092.csv",
    delim = ";",
    FUN = read_delim, 
    skip = 2,
    bucket = "trase-storage",
    opts = c("check_region" = T)
  ) %>%
  slice(-1) %>%
  janitor::clean_names("screaming_snake") %>%
  mutate(
    YEAR = as.numeric(str_sub(TRIMESTRE, start = 14, end = 17)),
    ANIMAIS_ABATIDOS_CABECAS = as.numeric(ANIMAIS_ABATIDOS_CABECAS)
  ) %>%
  mutate(STATE = str_trans(UNIDADE_DA_FEDERACAO)) %>%
  group_by(STATE, YEAR) %>%
  summarise(NUM_HEAD = sum(ANIMAIS_ABATIDOS_CABECAS, na.rm = T)) %>%
  ungroup() %>%
  filter(!is.na(YEAR))


# Plot comparison of the datasets
years_in_both <- intersect(aa$YEAR, ibge_abate$YEAR)
comp_ibge_aa <- bind_rows(aa, ibge_abate, .id = "DATA_SOURCE") %>%
  mutate(DATA_SOURCE = ifelse(DATA_SOURCE == "1", "ANUALPEC","IBGE")) %>%
  filter(YEAR %in% years_in_both) 
comp_ibge_aa %>%
  ggplot(aes(YEAR, NUM_HEAD/1e06, fill = DATA_SOURCE)) + 
  geom_bar(stat = "identity", position = "dodge") + 
  theme_bw() + 
  facet_wrap(~STATE, scales = "free_y", ncol = 7) + 
  theme(legend.position = c(0.93,0.08), 
        legend.title = element_blank(), 
        axis.title.x = element_blank()) + 
  scale_y_continuous(label = unit_format(unit = "M")) + 
  ylab("Number of cattle slaughtered") + 
  geom_hline(yintercept = 0) +
  scale_fill_manual(values = c("#0072B2", "#D55E00"))

We however want a longer slaughter time series (covering 2006-2019, for SEI-PCS 2010-2019) than is available from Anualpec (2011-2020 only).

We therefore use the typical difference between the Anualpec and IBGE data to estimate Anualpec-equivalent slaughter for the earlier years (pre-2011) where we don't have Anualpec slaughter statistics.

First we plot the consistency of the difference (Anualpec vs IBGE) over time.

# Calculate the % difference between IBGE and Anualpec per state
perc_diff <- 
  comp_ibge_aa %>%
  select(-STATE_ABBR) %>%
  spread(DATA_SOURCE, NUM_HEAD) %>%
  mutate(PERC_DIFF = (ANUALPEC - IBGE) / IBGE * 100) 


# Visualise the difference
perc_diff %>% 
  ggplot(aes(STATE, PERC_DIFF, col = YEAR)) +
  geom_boxplot() +
  geom_point() + 
  theme_minimal() + 
  coord_flip()

The difference is fairly consistent within states, except for PE, PB, DF, CE, and AP. These states made up a cumulative 4% of slaughter (2011-2020), so this variability does not affect the overall calculation much.
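
A minimal sketch of how that cumulative share can be checked (assuming the state names in aa are stored as accent-free upper-case full names, as in the AMAPA filter further below):

# Sketch: share of national slaughter in the noisier states
noisy_states <- c("PERNAMBUCO", "PARAIBA", "DISTRITO FEDERAL", "CEARA", "AMAPA")  # PE, PB, DF, CE, AP
aa %>%
  summarise(SHARE_PERC = sum(NUM_HEAD[STATE %in% noisy_states]) / sum(NUM_HEAD) * 100)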

We therefore ‘correct’ the IBGE statistics using the mean difference between IBGE and Anualpec (2011-2019) to estimate the slaughter in earlier years where Anualpec is not available, but IBGE is.

For some state/years, the IBGE data were missing. In these cases we used the per-state mean of the IBGE slaughter numbers from the years where data were available. Finally, for Amapa, no IBGE data were available at all; we therefore used the 2011 Anualpec slaughter numbers to approximate slaughter in previous years.

i.e. slaughter (head) = IBGE slaughter * correction factor, where the correction factor = 1 + the mean % difference / 100.
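
For example (illustrative numbers only):

# Toy example of the correction (illustrative numbers only)
mean_perc_diff    <- 12                        # mean of (ANUALPEC - IBGE) / IBGE * 100
correction_factor <- 1 + mean_perc_diff / 100  # = 1.12
ibge_heads        <- 500000                    # IBGE slaughter in a pre-2011 year
ibge_heads * correction_factor                 # estimated Anualpec-equivalent slaughter = 560000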

# Interpolate the 'anualpec' data for previous years
# ... based on the relationship in years where both datasets exist.

# Identify the correction factor
corr_factor <- 
  perc_diff %>%
  filter(PERC_DIFF != Inf) %>%
  group_by(STATE) %>%
  summarise(CORRECTION_FACTOR = 1 + mean(PERC_DIFF, na.rm = T) / 100) %>%
  ungroup() 


# Identify the values to 'gap fill' 
# ... i.e. the per-state mean of the IBGE data, used where a year's IBGE value is missing
# Also identify the 2011 Anualpec value for Amapa (where there is no IBGE data at all)
fill_ibge_gap <- 
  ibge_abate %>% 
  filter(NUM_HEAD != 0) %>%
  group_by(STATE) %>%
  summarise(NUM_HEAD_GAP_FILL = mean(NUM_HEAD, na.rm = T))
ap_2011 <- 
  aa %>% filter(STATE == "AMAPA", YEAR == 2011) %>%
  pull(NUM_HEAD)
aa_corrected <- full_join(aa, ibge_abate, by = c("STATE", "YEAR"), suffix = c("_ANUALPEC", "_IBGE")) %>%
  left_join(fill_ibge_gap, by = "STATE") %>%
  mutate(NUM_HEAD_IBGE = case_when(NUM_HEAD_IBGE == 0 ~ NUM_HEAD_GAP_FILL, 
                                       TRUE ~ NUM_HEAD_IBGE)) %>%
  left_join(corr_factor, by = "STATE") %>%
  mutate(NUM_HEAD_ANUALPEC = ifelse(is.na(NUM_HEAD_ANUALPEC), NUM_HEAD_IBGE * CORRECTION_FACTOR, NUM_HEAD_ANUALPEC),
         NUM_HEAD_ANUALPEC = case_when(STATE == "AMAPA" & YEAR < 2011 ~ ap_2011, 
                                       TRUE ~ NUM_HEAD_ANUALPEC)) %>%
  select(
    STATE, 
    YEAR, 
    NUM_HEAD = NUM_HEAD_ANUALPEC
  )
stopifnot(all(aa_corrected$NUM_HEAD != 0))

Below we check that these ‘corrected’ time series look reasonable:

aa_corrected %>% 
  ggplot(aes(YEAR, NUM_HEAD)) + 
  facet_wrap(~STATE) + 
  geom_line() +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        axis.title.x = element_blank()) +
  ylab("Number of cattle slaughtered/year") +
  labs(title = "Slaughter time series, correcting for IBGE:Anualpec gap")

Next we correct for inter-state movements of cattle to slaughter (these movements are recorded in SIGSIF, and they mean the slaughter data count cattle in the state of slaughter rather than the state of production), using the inter-state slaughter numbers reported in SIGSIF.
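
The adjustment reallocates slaughter from the state where it takes place to the state where the cattle were raised: slaughter per state of production = slaughter in the state - head arriving from other states + head sent to other states. For example (illustrative numbers only):

# Toy example of the inter-state adjustment (illustrative numbers only)
slaughtered_in_state <- 1000000  # head slaughtered within the state (corrected Anualpec)
arriving             <- 200000   # head raised elsewhere, slaughtered in this state (SIGSIF)
sent                 <- 50000    # head raised here, slaughtered in other states (SIGSIF)
slaughtered_in_state - arriving + sent  # = 850000 head attributed to this state of production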

# Calculate the proportion of SIF slaughter per state that is inter-state


# Load SIGSIF
sigsif <- s3read_using(
  object = "brazil/production/statistics/sigsif/out/SIGSIF.csv",
  FUN = read_delim,
  delim = ";",
  opts = c("check_region" = T), 
  bucket = "trase-storage"
)


# (A) slaughter per DESTINATION state where cattle originate from inter-state movements
sigsif_dest <- 
  sigsif %>%
  filter(TYPE == "CATTLE") %>%
  mutate(TWO_DIGIT_CODE = str_sub(GEOCODE, start = 1, end = 2)) %>%
  left_join(state_dic, by = "TWO_DIGIT_CODE") %>%
  rename(STATE_ORIGIN = STATE_ABBR) %>%
  select(-STATE_NAME) %>%
  mutate(INTER_STATE_MOVEMENT = ifelse(STATE_SLAUGHTER == STATE_ORIGIN, F, T)) %>%
  filter(INTER_STATE_MOVEMENT == T) %>%
  group_by(YEAR, STATE_SLAUGHTER) %>%
  summarise(NUM_HEAD_ARRIVING_FOR_SLAUGHTER = sum(QUANTITY)) %>%
  ungroup()


# (B) Number of SIF-slaughtered cattle per ORIGIN state, where cattle slaughtered inter-state
sigsif_orgn <- 
  sigsif %>%
  filter(TYPE == "CATTLE") %>%
  mutate(TWO_DIGIT_CODE = str_sub(GEOCODE, start = 1, end = 2)) %>%
  left_join(state_dic, by = "TWO_DIGIT_CODE") %>%
  rename(STATE_ORIGIN = STATE_ABBR) %>%
  select(-STATE_NAME) %>%
  mutate(INTER_STATE_MOVEMENT = ifelse(STATE_SLAUGHTER == STATE_ORIGIN, F, T)) %>%
  filter(INTER_STATE_MOVEMENT == T) %>%
  group_by(YEAR, STATE_ORIGIN) %>%
  summarise(NUM_HEAD_SENT_FOR_SLAUGHTER = sum(QUANTITY)) %>%
  ungroup()


# (C) Join these together
slaughter <-
  left_join(aa_corrected, state_dic, by = c("STATE" = "STATE_NAME")) %>%
  select(-TWO_DIGIT_CODE) %>%
  full_join(sigsif_dest, by = c("STATE_ABBR" = "STATE_SLAUGHTER", "YEAR")) %>%
  full_join(sigsif_orgn, by = c("STATE_ABBR" = "STATE_ORIGIN", "YEAR")) %>%
  mutate(
    NUM_HEAD = replace_na(NUM_HEAD, 0),
    NUM_HEAD_ARRIVING_FOR_SLAUGHTER = replace_na(NUM_HEAD_ARRIVING_FOR_SLAUGHTER, 0),
    NUM_HEAD_SENT_FOR_SLAUGHTER = replace_na(NUM_HEAD_SENT_FOR_SLAUGHTER, 0),
    NUM_HEAD_PER_STATE_OF_PRODUCTION = NUM_HEAD - NUM_HEAD_ARRIVING_FOR_SLAUGHTER + NUM_HEAD_SENT_FOR_SLAUGHTER
    )

We don’t however have SIGSIF data for all years (2015 onwards only), so below we calculate the average per-state ratio between our inter-state-movement-corrected data and the raw Anualpec slaughter data, and apply it to the years without SIGSIF coverage:

# (D) Calculate the average difference of the anualpec abate and our corrected-estimate, per state
# ... this will be used for years where we don't have all data 
# ... i.e. pre-2015 for SIGSIF
slaughter %>%
  filter(YEAR %in% unique(sigsif_dest$YEAR)) %>%
  mutate(PERC_DIFF = NUM_HEAD_PER_STATE_OF_PRODUCTION / NUM_HEAD * 100) %>%
  ggplot(aes(STATE_ABBR, PERC_DIFF, col = YEAR)) + 
  geom_point() + 
  theme_bw() +
  labs(title = "Percentage difference between Anualpec & data corrected for inter-state movements",
       subtitle = ">100 = net sender of cattle, <100 = net recipient of cattle")

perc_diff <- slaughter %>%
  filter(YEAR %in% unique(sigsif_dest$YEAR)) %>%
  mutate(PERC_DIFF = NUM_HEAD_PER_STATE_OF_PRODUCTION / NUM_HEAD * 100) %>%
  group_by(STATE_ABBR) %>% 
  summarise(PERC_DIFF = mean(PERC_DIFF)) %>%
  ungroup()


# (E) Add the missing years of 'corrected data' back in.
# ... i.e. estimate inter-state movements for the years without SIGSIF data (pre-2015)
# ... based on the average per-state ratio from the years where SIGSIF is available
slaughter <- slaughter %>%
  left_join(perc_diff, by = c("STATE_ABBR")) %>%
  mutate(NUM_HEAD_PER_STATE_OF_PRODUCTION = ifelse(
    !(YEAR %in% unique(sigsif_dest$YEAR)), NUM_HEAD * (PERC_DIFF / 100), NUM_HEAD_PER_STATE_OF_PRODUCTION)) %>%
  select(
    YEAR, 
    STATE,
    NUM_HEAD_PER_STATE_OF_PRODUCTION
    )

Below we check that the final ‘slaughter per state of production’ time series looks reasonable. I think it does (i.e. no major spikes or outliers).

slaughter %>% 
  ggplot(aes(YEAR, NUM_HEAD_PER_STATE_OF_PRODUCTION)) + 
  facet_wrap(~STATE) + 
  geom_line() +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        axis.title.x = element_blank()) +
  scale_y_continuous(label = unit_format(unit = "M", scale = 1e-6, sep = "")) +
  ylab("Number of cattle slaughtered\nper state of production") 

Calculate offtake rate

This is the number of cattle slaughtered (per state of production) divided by the herd size, per state and year - plotted below.

# Define SR
# ... note I limit the calculation to 2006 onwards
# ... since these are the production years used in SEI-PCS
sr_df <- 
  ppm %>%
  filter(YEAR >= 2006) %>%
  group_by(STATE, YEAR) %>%
  summarise(
    HERD_SIZE = sum(NUM_HEAD)
    ) %>%
  ungroup() %>%
  inner_join(slaughter, by = c("YEAR", "STATE")) %>%
  mutate(SR = NUM_HEAD_PER_STATE_OF_PRODUCTION / HERD_SIZE)


# Visualise data
sr_df %>% 
  group_by(STATE) %>%
  summarise(MEAN = mean(SR), 
            MEDIAN = median(SR), 
            MIN = min(SR),
            MAX = max(SR)) %>%
  ungroup() %>%
  mutate(OUTLIER = ifelse(MEDIAN > 0.251 & STATE != "SANTA CATARINA", T, F)) %>%
  ggplot(aes(STATE, MEDIAN, col = OUTLIER)) + 
  geom_point() + 
  geom_errorbar(aes(ymin = MIN, ymax = MAX), width = 0, size = 0.5) +
  scale_y_continuous(limits = c(0, 1) ) + 
  coord_flip() +
  ylab("Slaughter rate")

outlier_states <- 
  sr_df %>%
  group_by(STATE) %>%
  summarise(MEDIAN = median(SR)) %>%
  ungroup() %>%
  filter(MEDIAN > 0.251) %>%
  pull(STATE)
outlier_states <- outlier_states[outlier_states != "SANTA CATARINA"]
national_median <- median(sr_df$SR)

I don’t believe the values in blue - they are too high (i.e. > 0.25 and outside of the south: PE, AP, AM, BA) - nor do I believe the SP value, which is too low.

Below we correct for these outliers: the high outliers are set to the national median (0.1991434), and SP is set to 0.4072 based on IEA data (see the code comments below).

Note: I also considered using a time-invariant slaughter rate (i.e. one per state, not per state/year), but decided against it. Comparing the two methods (the alternative is the commented-out code further below), there is only a small difference, plus this way we keep more information and it allows SRs to increase over time, as they do in some states (e.g. RS?), see below:

sr_df %>% 
  filter(YEAR > 2005, 
         !STATE %in% outlier_states ) %>% 
  ggplot(aes(YEAR, SR)) + 
  # facet_wrap(~STATE, scale = "free_y") +
  facet_wrap(~STATE) + 
  geom_line() + 
  theme_minimal() + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Outlier states not shown.

# I don't believe the outlier values > 0.25 
# ... set these to the nationwide median 
# I also increase the SP slaughter rate
# ... based on IEA data: http://www.iea.sp.gov.br/out/TerTexto.php?codTexto=14514
# ... otherwise SP production is too low.
sr_df <- sr_df %>%
  mutate(SR = case_when(STATE == "SAO PAULO" ~ 0.4072,
                        STATE %in% outlier_states ~ national_median,
                        TRUE ~ SR))  %>%
  select(STATE, YEAR, SR)

# The commented out code would be to produce a time-invariant estimate of the SR
# ... i.e. one value per state, not per state/yr
# sr_df <- sr_df %>%
#   mutate(SR = case_when(STATE == "SAO PAULO" ~ 0.4072,
#                         STATE %in% outlier_states ~ national_median,
#                         TRUE ~ SR))  %>%
#   group_by(STATE) %>%
#   summarise(SR = median(SR)) %>% 
#   ungroup() 

Carcass weights

# Carcass weights
# ... get the mean per state/year (from data which are originally quarterly/trimestral)
cw <- 
  s3read_using(
    object = "brazil/production/statistics/ibge/cattle/carcass_weights/2020-10-27-STATE_CARCASS_WEIGHTS.csv",
    bucket = "trase-storage",
    FUN = read_delim,
    opts = c("check_region" = T),
    delim = ";"
    )

Calculate cattle production

Per municipality per year, including offal - the latter calculated as 6.3% of the live cattle weight (450 kg), based on FAO conversion coefficients.
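
Per head, the arithmetic works out as follows (the carcass weight here is an illustrative value; 0.45 is the assumed 450 kg live weight expressed in tonnes):

# Per-head worked example (carcass weight is illustrative)
carcass_weight_kg <- 250
carcass_tons      <- carcass_weight_kg / 1000  # 0.250 t carcass
offal_tons        <- 0.45 * 0.063              # 450 kg live weight * 6.3% = 0.02835 t offal
carcass_tons + offal_tons                      # ~0.278 t per animal slaughtered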

# Calculate production
# ... CW_PRODUCTION_TONS = carcass + offal
# ... the offal figure (6.3% of a 450 kg animal) comes from Brazil-specific FAO coefficients.
cp <-
  inner_join(ppm, sr_df, by = c("STATE", "YEAR")) %>%
  left_join(cw, by = c("STATE" = "STATE_NAME", "YEAR")) %>%
  filter(YEAR %in% unique(cw$YEAR)) %>%
  mutate(
    NUM_HEAD_SLAUGHTERED = NUM_HEAD * SR,
    CW_PRODUCTION_TONS = NUM_HEAD_SLAUGHTERED * CARCASS_WEIGHT_PER_ANIMAL_KG / 1000 + (NUM_HEAD_SLAUGHTERED * 0.45 * 0.063)
    ) %>%
  select(GEOCODE, STATE, YEAR, CW_PRODUCTION_TONS)

# The commented out code would be to produce a time-invariant estimate of the SR
# ... i.e. one value per state, not per state/yr
# cp <-
#   full_join(ppm, sr_df, by = "STATE") %>%
#   left_join(cw, by = c("STATE" = "STATE_NAME", "YEAR")) %>%
#   filter(YEAR %in% unique(cw$YEAR)) %>%
#   mutate(
#     NUM_HEAD_SLAUGHTERED = NUM_HEAD * SR,
#     CW_PRODUCTION_TONS = NUM_HEAD_SLAUGHTERED * CARCASS_WEIGHT_PER_ANIMAL_KG / 1000 + (NUM_HEAD_SLAUGHTERED * 0.45 * 0.063)
#     ) %>%
#   select(GEOCODE, STATE, YEAR, CW_PRODUCTION_TONS)
stopifnot(!any(is.na(cp$CW_PRODUCTION_TONS)))

For the year 2019, visualise the production per municipality. Note: the scale is obviously skewed by large municipalities.

# Load spatial data
munis <- 
  s3read_using(
    object = "brazil/spatial/BOUNDARIES/ibge/old/boundaries municipalities/MUNICIPALITIES_BRA.geojson",
    FUN = read_sf,
    bucket = "trase-storage",
    opts = c("check_region" = T)
  )
brazil <- st_union(munis)
munis_to_plot <- inner_join(munis,  cp %>% filter(YEAR == 2019), by = "GEOCODE")


# Visualise
ggplot() + 
  geom_sf(data = brazil, fill = 'lightgrey') + 
  geom_sf(data = munis_to_plot, aes(fill = CW_PRODUCTION_TONS), col = NA) + 
  theme(axis.text = element_blank()) +
  coord_sf(datum = NA) +
  scale_fill_viridis(option="magma",
                     name = bquote('Tons/yr'),
                     na.value="white",
                     direction = -1) +
  labs(title = "Municipal cattle production (carcass tons/year)",
        subtitle = "For 2019")
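
Given the skew noted above, a log10 colour scale would be one option (not applied here); a minimal variant of the same plot, for illustration only (municipalities with zero production, if any, would plot as NA on a log scale):

# Optional variant: log10 colour scale to reduce the visual skew (not applied)
ggplot() + 
  geom_sf(data = brazil, fill = 'lightgrey') + 
  geom_sf(data = munis_to_plot, aes(fill = CW_PRODUCTION_TONS), col = NA) + 
  theme(axis.text = element_blank()) +
  coord_sf(datum = NA) +
  scale_fill_viridis(option = "magma",
                     trans = "log10",
                     name = "Tons/yr (log scale)",
                     na.value = "white",
                     direction = -1) +
  labs(title = "Municipal cattle production (carcass tons/year)",
       subtitle = "For 2019, log colour scale")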

Cattle lifecycle calculation

Sum production across the 5-yr cattle lifecycle.

# For each year, calculate the carcass-equivalent production (CW_PRODUCTION_TONS)
# ... over the 5-year window: the current year plus the 4 preceding years
cp_5yr <- 
  cp %>%
  # select(GEOCODE, YEAR, CW_PRODUCTION_TONS)  %>%
  group_by(GEOCODE) %>%
  arrange(YEAR) %>%
  mutate(CW_PRODUCTION_TONS_5_YR = 
           CW_PRODUCTION_TONS + 
           lag(CW_PRODUCTION_TONS, n = 1L) +
           lag(CW_PRODUCTION_TONS, n = 2L) + 
           lag(CW_PRODUCTION_TONS, n = 3L) + 
           lag(CW_PRODUCTION_TONS, n = 4L)) %>%
  ungroup()


# Check the lag worked OK
check_5_yr <- 
  cp_5yr %>%
  filter(GEOCODE == "1507300", 
         YEAR == 2017) %>%
  pull(CW_PRODUCTION_TONS_5_YR)
check_5_yr_manual <-
  cp_5yr %>%
  filter(GEOCODE == "1507300", 
         YEAR %in% c(2013:2017)) %>%
  pull(CW_PRODUCTION_TONS) %>%
  sum()
if(
  !isTRUE(
    all.equal(
      check_5_yr, 
      check_5_yr_manual,
      tolerance = 0.0001
    )
  )
){stop("Error in lag - go back and check")}
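
For reference, the same rolling 5-year sum can be written more compactly (a sketch, assuming the zoo package is available; it computes the same sum of the current row plus the four preceding rows within each GEOCODE):

# Equivalent rolling 5-year sum using zoo (sketch, not used downstream)
cp_5yr_alt <- 
  cp %>%
  group_by(GEOCODE) %>%
  arrange(YEAR, .by_group = TRUE) %>%
  mutate(CW_PRODUCTION_TONS_5_YR = zoo::rollsumr(CW_PRODUCTION_TONS, k = 5, fill = NA)) %>%
  ungroup()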

Visualise the 5-yr sum production data

# Visualise
munis_to_plot <- inner_join(munis, cp_5yr %>% filter(YEAR == 2019), by = "GEOCODE")
ggplot() + 
  geom_sf(data = brazil, fill = 'lightgrey') + 
  geom_sf(data = munis_to_plot, aes(fill = CW_PRODUCTION_TONS_5_YR), col = NA) + 
  theme_minimal() + 
  theme(axis.text = element_blank()) +
  coord_sf(datum = NA) +
  scale_fill_viridis(option="magma",
                     name = bquote('Tons/5yr lifecycle'),
                     na.value="white",
                     direction = -1) +
  labs(title = "Life-cycle cattle production (carcass tons/5-years)",
        subtitle = "For 2019")

Export data

# Export the 5-yr data
s3write_using(
  x = cp_5yr, 
  object = "brazil/production/statistics/ibge/beef/out/2020-10-27-CATTLE_PRODUCTION_5_YR.csv",
  bucket = "trase-storage",
  FUN = write_delim,
  opts = c("check_region" = T),
  delim = ";"
)

Comparison with previous data

Besides the fact that our time series is now longer (not shown), the values are pretty similar - though the new estimates are notably higher in BA and lower in MT and MS.
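
A minimal sketch of how such a comparison could be reproduced, assuming the previous estimates are loaded into a data frame cp_previous with the same GEOCODE/STATE/YEAR/CW_PRODUCTION_TONS columns (the object name is hypothetical):

# Sketch of a state-level comparison with previous estimates (cp_previous is hypothetical)
bind_rows(NEW = cp, PREVIOUS = cp_previous, .id = "VERSION") %>%
  group_by(VERSION, STATE, YEAR) %>%
  summarise(CW_PRODUCTION_TONS = sum(CW_PRODUCTION_TONS), .groups = "drop") %>%
  ggplot(aes(YEAR, CW_PRODUCTION_TONS / 1e06, col = VERSION)) +
  geom_line() +
  facet_wrap(~STATE, scales = "free_y") +
  theme_minimal() +
  ylab("Carcass production (million tons/yr)")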