This page is synchronized from trase/data/brazil/beef/indicators/in/README.md. Last modified on 2025-12-15 23:00 CET by Trase Admin.
# Search Deforestation Commitment Agent
This Python script implements an agent-based pipeline to automatically extract deforestation commitments from company reports and online sources. It combines Grounded Google Search and PDF RAG (Retrieval-Augmented Generation) with LangChain to gather, parse, and structure data about deforestation commitments for major beef companies in Brazil.
## Features
- Searches for PDF sustainability reports and relevant web sources.
- Downloads and parses PDFs to build a FAISS vector index.
- Queries PDFs and online sources to extract structured deforestation commitments.
- Supports multiple companies in a single run.
- Outputs results in both JSON and Parquet formats for downstream analysis.
- Uses a ReAct agent to decide dynamically which tools to use for each query.
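The structured records mentioned above could look like the sketch below. The field names here are illustrative assumptions, not the script's actual schema; the script emits such records as JSON and as Parquet (via polars, e.g. `pl.DataFrame(records).write_parquet(...)`).

```python
import json

# Hypothetical shape of one extracted commitment record; the actual
# field names produced by the script may differ.
record = {
    "company": "ExampleCo",
    "has_deforestation_commitment": True,
    "commitment_text": "Zero deforestation across the beef supply chain by 2030.",
    "target_year": 2030,
    "source_url": "https://example.com/sustainability-report.pdf",
}

# Serialize a record to JSON for downstream analysis.
print(json.dumps(record, indent=2))
```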
## Requirements
In addition to the base project dependencies, the script requires:

```
langchain
langchain_google_genai
google-genai
pdfplumber
polars
httpx
python-dotenv
```
## Environment Setup
- Install dependencies:

```bash
pip install -r extra-requirements.txt
```
- Create a `.env` file in the project root:

```
GOOGLE_API_KEY=your_google_api_key_here
```

- Optional (to prevent OpenMP runtime conflicts), set the following before importing FAISS:

```python
import os

os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"
```
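In the script itself, python-dotenv's `load_dotenv()` reads the `.env` file into the environment. The sketch below mimics that step with the standard library only, to show exactly what the `.env` file provides; `load_env` is a stand-in helper, not part of the script.

```python
import os
import tempfile
from pathlib import Path

def load_env(path: str) -> None:
    """Stand-in for python-dotenv's load_dotenv(): copy KEY=VALUE pairs
    from a .env file into os.environ, skipping blanks and comments."""
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())

# Demo with a throwaway .env file.
with tempfile.TemporaryDirectory() as tmp:
    env_file = Path(tmp) / ".env"
    env_file.write_text("GOOGLE_API_KEY=your_google_api_key_here\n")
    load_env(str(env_file))

print(os.environ["GOOGLE_API_KEY"])
```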
## Usage
Run `trase/data/brazil/beef/indicators/in/br_beef_search_zdc.py`. The script will:

- Initialize the `SearchDeforestationCommitmentAgent`.
- Iterate through a predefined list of company names.
- Search for deforestation commitments using both PDFs and grounded Google search.
- Clean and parse the agent's output into structured JSON.
- Save the results to `brazil/beef/indicators/in/br_beef_search_zdc_output_v1`.
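LLM agents often wrap their JSON answer in markdown code fences, so a cleaning step is needed before parsing. The helper below is an illustrative sketch of that step, not the script's exact code:

```python
import json
import re

def clean_agent_output(raw: str) -> dict:
    """Strip the markdown code fences an LLM may wrap around its JSON
    answer, then parse the remainder. Illustrative; the script's own
    cleaning logic may differ."""
    text = raw.strip()
    # Remove a leading ```json (or bare ```) fence and a trailing ``` fence.
    text = re.sub(r"^```(?:json)?\s*", "", text)
    text = re.sub(r"\s*```$", "", text)
    return json.loads(text)

raw = '```json\n{"company": "ExampleCo", "commitment": "zero deforestation by 2030"}\n```'
parsed = clean_agent_output(raw)
print(parsed["company"])  # → ExampleCo
```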