
This page is synchronized from trase/data/brazil/beef/indicators/in/README.md. Last modified on 2025-12-15 23:00 CET by Trase Admin. Please view or edit the original file there; changes are reflected here after the midnight build (CET time), or by manually triggering the GitHub Action.

Search Deforestation Commitment Agent

This Python script implements an agent-based pipeline to automatically extract deforestation commitments from company reports and online sources. It combines Grounded Google Search and PDF RAG (Retrieval-Augmented Generation) with LangChain to gather, parse, and structure data about deforestation commitments for major beef companies in Brazil.
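The tool-selection idea behind the pipeline can be illustrated with a minimal, pure-Python sketch. The real script delegates this decision to an LLM via LangChain's ReAct agent; the tool names and the keyword heuristic below are invented for illustration only:

```python
# Minimal illustration of ReAct-style tool routing: the agent picks one tool
# per query. In the real pipeline an LLM makes this choice; here a keyword
# heuristic stands in for the model's reasoning step.

def search_web(query: str) -> str:
    # Placeholder for a Grounded Google Search tool.
    return f"web results for: {query}"

def query_pdfs(query: str) -> str:
    # Placeholder for a FAISS-backed PDF retrieval tool.
    return f"pdf passages for: {query}"

TOOLS = {"search_web": search_web, "query_pdfs": query_pdfs}

def route(query: str) -> str:
    # Stand-in for the agent's decision: prefer the PDF index when the query
    # mentions a report, otherwise fall back to web search.
    name = "query_pdfs" if "report" in query.lower() else "search_web"
    return TOOLS[name](query)
```

For example, `route("JBS sustainability report commitments")` dispatches to the PDF tool, while a query with no report reference goes to web search.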


Features

  • Searches for PDF sustainability reports and relevant web sources.
  • Downloads and parses PDFs to build a FAISS vector index.
  • Queries PDFs and online sources to extract structured deforestation commitments.
  • Supports multiple companies in a single run.
  • Outputs results in both JSON and Parquet formats for downstream analysis.
  • Uses a ReAct agent to decide dynamically which tools to use for each query.
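A structured commitment record, as produced by the extraction step, might be modelled as follows. The field names here are hypothetical; the actual schema is defined in the script:

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional

# Hypothetical schema for one extracted commitment; the real script defines
# its own fields. A dataclass keeps the JSON and Parquet outputs consistent.
@dataclass
class DeforestationCommitment:
    company: str
    has_commitment: bool
    target_year: Optional[int]
    source_url: str
    evidence: str

record = DeforestationCommitment(
    company="Example Beef Co.",
    has_commitment=True,
    target_year=2030,
    source_url="https://example.com/sustainability.pdf",
    evidence="Commits to zero deforestation across its supply chain by 2030.",
)

# Serialize to JSON for downstream analysis.
print(json.dumps(asdict(record)))
```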

Requirements

  • In addition to the project's base dependencies, the script requires the following libraries:
  • langchain
  • langchain_google_genai
  • google-genai
  • pdfplumber
  • polars
  • httpx
  • python-dotenv

Environment Setup

  1. Install dependencies:
pip install -r extra-requirements.txt
  2. Create a .env file in the project root:
    GOOGLE_API_KEY=your_google_api_key_here
    
  3. Optional: set KMP_DUPLICATE_LIB_OK=TRUE to prevent OpenMP runtime conflicts, either in the environment or in Python before importing FAISS:
    os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"
    

Usage

Run the script from the project root:

python trase/data/brazil/beef/indicators/in/br_beef_search_zdc.py
The script will:

  1. Initialize the SearchDeforestationCommitmentAgent.
  2. Iterate through a predefined list of company names.
  3. Search for deforestation commitments using both PDFs and grounded Google search.
  4. Clean and parse the agent’s output into structured JSON.
  5. Save the results to:
    brazil/beef/indicators/in/br_beef_search_zdc_output_v1
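Step 4 above, cleaning the agent's output into structured JSON, typically amounts to stripping the markdown code fences an LLM wraps around its answer before parsing. A minimal sketch (the real script's cleaning logic may handle more cases):

```python
import json

def parse_agent_output(raw: str) -> dict:
    """Strip markdown code fences an LLM may wrap around JSON, then parse.

    Illustrative only; assumes a single fenced block or bare JSON.
    """
    text = raw.strip()
    if text.startswith("```"):
        lines = text.splitlines()
        # Drop the closing fence, then the opening fence (with its optional
        # "json" language tag).
        if lines[-1].strip() == "```":
            lines = lines[:-1]
        text = "\n".join(lines[1:])
    return json.loads(text)

raw = '```json\n{"company": "Example Beef Co.", "has_commitment": true}\n```'
print(parse_agent_output(raw))
# → {'company': 'Example Beef Co.', 'has_commitment': True}
```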