Silos Map V3
This page is synchronized from trase/data/brazil/logistics/silos/silos_map_v3/readme.md. Last modified on 2026-02-03 10:30 CET by Jason J. Benedict.
Please view or edit the original file there; changes are reflected here after the midnight build (CET) or after manually triggering the build with a GitHub Action.
Agricultural Silo Mapping Pipeline (v3)
This repository contains the end-to-end workflow for detecting, vectorizing, and validating agricultural silos across Latin America using Google Earth Engine (GEE), the Clay Foundation Model, AlphaEarth, and official country data. A detailed description may be found here.
Project Structure
The pipeline is organized in the following sequential stages:
- `1_classify_image_alphaearth.py`
  - Purpose: Generates probability maps using a Random Forest classifier within Google Earth Engine.
  - Data Source: Uses `GOOGLE/SATELLITE_EMBEDDING/V1/ANNUAL` as input features, ~300 silo samples, and a Random Forest classifier.
  - Output: Raster assets of classification and probability stored in GEE.
    - Probabilities: `projects/trase-396112/assets/brazil/logistics/silos/probabilities`
    - Binary image: `projects/trase-396112/assets/brazil/logistics/silos/classification`
- `2_convert_to_centroid.py`
  - Purpose: Converts high-probability pixels into discrete geospatial points.
  - Process: Applies a probability threshold (default 70%), converts connected pixels into vectors, and calculates the geometric centroid of each candidate facility.
  - Output: FeatureCollections of candidate points.
- `3_deduplicate_centroids.py`
  - Purpose: Cleans the dataset by removing spatially overlapping and closely spaced points.
  - Process: Ensures that a single physical facility isn't counted multiple times due to overlapping satellite tiles.
- `4_validation_via_clay_embeddings.py`
  - Purpose: High-precision filtering using deep learning to reduce false positives (e.g., distinguishing silos from other facility types).
  - Model: Uses a Multi-Layer Perceptron (MLP) built in PyTorch (`SimpleNN`).
  - Input: 1024-dimensional embeddings generated by the Clay Foundation Model.
  - Threshold: A strict sigmoid threshold of 0.9 is applied to ensure high-confidence results.
- `5_enrich_with_sicarm_silos.py`
  - Purpose: Adds metadata to the detected facilities, initially drawing on SICARM (Brazil's official system for warehouse registration).
- `6_enrich_ownership.py`
  - Purpose: Enriches detected facilities with legal identity and ownership metadata.
  - Process:
    - Retrieves the Company Name and Address via the Google Places API.
    - Identifies the Tax Number (CNPJ) using LLM-based research.
    - Appends CNAE (Industry Classification) information from the Cadastro database and standardizes the final column schema.
  - Model: `gemini-2.5-flash`
- `7_consolidate.py`
  - Purpose: Consolidates the final dataset of detected silos by integrating it with the official silo database (SICARM-CONAB).
  - Output: S3 path `brazil/logistics/silos/silo_map_v3/silos_consolidated_{country}_{year}_{version}.geojson`
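
The thresholding, vectorizing, and deduplication described for stages 2 and 3 can be illustrated outside Earth Engine. The following is a minimal plain-Python sketch, not the pipeline's GEE code: it thresholds a small probability grid at the documented default of 0.7, groups 4-connected pixels, takes each group's mean pixel position as its centroid, and then drops points that lie closer than a chosen distance to one already kept. The function names, the 4-connectivity rule, the greedy deduplication strategy, and the `min_distance` value are all assumptions made for illustration.

```python
from collections import deque
from math import hypot


def candidate_centroids(prob, threshold=0.7):
    """Return (row, col) centroids of 4-connected pixel groups with prob >= threshold."""
    rows, cols = len(prob), len(prob[0])
    seen = [[False] * cols for _ in range(rows)]
    centroids = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or prob[r][c] < threshold:
                continue
            # Breadth-first search collects one connected group of pixels.
            queue, group = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                group.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and not seen[ny][nx] and prob[ny][nx] >= threshold:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # The centroid is the mean position of the group's pixels.
            centroids.append((
                sum(y for y, _ in group) / len(group),
                sum(x for _, x in group) / len(group),
            ))
    return centroids


def deduplicate(points, min_distance):
    """Greedy dedup: keep a point only if it is at least min_distance from every kept point."""
    kept = []
    for p in points:
        if all(hypot(p[0] - q[0], p[1] - q[1]) >= min_distance for q in kept):
            kept.append(p)
    return kept


# Toy probability grid (hypothetical values, not pipeline output).
prob = [
    [0.9, 0.8, 0.1, 0.0],
    [0.1, 0.2, 0.1, 0.0],
    [0.0, 0.1, 0.1, 0.95],
]
centroids = candidate_centroids(prob)

# A near-duplicate point is added to mimic overlapping tiles.
unique = deduplicate(centroids + [(0.2, 0.5)], min_distance=1.0)
```

With this toy grid, the two adjacent high-probability pixels collapse into one candidate and the isolated pixel forms a second one; the artificially added near-duplicate is then discarded. Distances here are in pixel indices, whereas a real run would presumably measure them in map units.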
Prerequisites
- Google Earth Engine: Account and project authentication required.
- Clay Foundation Model:
  - Install `extra-requirements.txt`.
  - Download Clay model weights.
  - Place weights at: `trase/data/brazil/logistics/silos/silos_map_v3/clay_finetuning/clay_foundation_model`
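
Before running the validation stage, it may help to confirm that the Clay weights are where the pipeline expects them. The sketch below is a hypothetical pre-flight check, not part of the repository: the helper name `clay_weights_present` is invented here, and because the README does not say whether the location is a single file or a directory, the check accepts either.

```python
from pathlib import Path

# Location taken from the prerequisites above, relative to the repository root.
WEIGHTS_PATH = Path(
    "trase/data/brazil/logistics/silos/silos_map_v3/clay_finetuning/clay_foundation_model"
)


def clay_weights_present(path):
    """Return True if path is a file, or a directory containing at least one file.

    Assumption: the weights may be stored either way; adjust once the layout is known.
    """
    path = Path(path)
    if path.is_file():
        return True
    return path.is_dir() and any(p.is_file() for p in path.iterdir())


if not clay_weights_present(WEIGHTS_PATH):
    print(f"Clay weights not found at {WEIGHTS_PATH}; download them before running stage 4.")
```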
Usage
To run the inference pipeline for a specific year and country:
```bash
python 4_validation_via_clay_embeddings.py --year 2024 --country brazil --version 1
```
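
The three flags in the command above could be parsed as in the following minimal sketch, which uses Python's standard `argparse` module. This is an illustration rather than the script's actual code: the argument types, the help strings, and the `output_key` helper are assumptions, as is the idea that the same three values fill the placeholders of the consolidated output path listed under `7_consolidate.py`.

```python
import argparse


def build_parser():
    """Build a parser for the three flags shown in the usage command."""
    parser = argparse.ArgumentParser(
        description="Illustrative CLI sketch for the silo validation stage."
    )
    parser.add_argument("--year", type=int, required=True, help="Year to process, e.g. 2024")
    parser.add_argument("--country", type=str, required=True, help="Country name, e.g. brazil")
    # Assumption: the version is treated as a string label.
    parser.add_argument("--version", type=str, required=True, help="Dataset version, e.g. 1")
    return parser


def output_key(country, year, version):
    """Fill the output path template from the README (hypothetical helper)."""
    return (
        "brazil/logistics/silos/silo_map_v3/"
        f"silos_consolidated_{country}_{year}_{version}.geojson"
    )


# Parse the same values as the example command, passed as a list for demonstration.
args = build_parser().parse_args(["--year", "2024", "--country", "brazil", "--version", "1"])
key = output_key(args.country, args.year, args.version)
```

Marking all three flags as required makes the script fail early with a usage message when one is missing, which may or may not match the real script's defaults.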