This page is synchronized from doc/Traders-User-Guide.md. Last modified on 2025-12-09 00:30 CET by Trase Admin. Please view or edit the original file there; changes are reflected here after the midnight (CET) build, or after manually triggering the build with a GitHub action.

Traders - User guide

[!NOTE] There is a recorded presentation about the way that traders are stored in Trase under Google Drive > 2. Operations > Internal Webinars and Training > Internal Webinar Series > Trase speakers > Traders in Trase - Clement.mp4

1. Traders data model

1.1 Types of traders

The database contains a simplified version of trader legal hierarchies. Instead of the multi-level ownership structures of the real world, only two levels are included: "trader" and "group". A group is made up of one or more traders and represents all the subsidiaries that fall under the same ultimate parent.

So for instance, the following legal ownership structure

 GROUP
  |  
  |--  SUBSIDIARY 1
        |
        |--  SUBSIDIARY 2

will be represented in the database as

 GROUP
  |  
  |--  SUBSIDIARY 1
  |--  SUBSIDIARY 2

That is, the fact that subsidiary 1 is the direct parent of subsidiary 2 is not captured; instead, both subsidiaries are recorded as direct children of the ultimate parent.
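
The flattening described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code used by the database: the function name `ultimate_parent` and the `direct_parent` mapping are hypothetical.

```python
def ultimate_parent(entity, direct_parent):
    """Follow direct-parent links until reaching an entity with no parent."""
    seen = set()
    while entity in direct_parent:
        if entity in seen:
            raise ValueError(f"ownership cycle detected at {entity!r}")
        seen.add(entity)
        entity = direct_parent[entity]
    return entity


# Hypothetical real-world ownership: SUBSIDIARY 2 -> SUBSIDIARY 1 -> GROUP
direct_parent = {
    "SUBSIDIARY 1": "GROUP",
    "SUBSIDIARY 2": "SUBSIDIARY 1",
}

# Two-level representation: every subsidiary points at the ultimate parent
flattened = {name: ultimate_parent(name, direct_parent) for name in direct_parent}
```

Here `flattened` maps both subsidiaries to "GROUP", which matches the second diagram.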

Note that the group name is not the official legal entity name, but a simplified name. For instance, "BUNGE" was picked as the group name for "BUNGE LIMITED".

1.2 Synonyms

Traders often have multiple names, due to alternate spellings or even typos in the datasets. These names, or "labels", are attached to the trader they refer to. Each trader must have one of its labels as its default name, which is the name under which the trader appears in the data products.

 GROUP
  |  
  |-- SUBSIDIARY 1 DEFAULT NAME
  |     |
  |     |--     SUBSIDIARY 1 DEFAULT NAME
  |     |--     SUBSIDIARY 1 SYNONYM
  |
  |-- SUBSIDIARY 2 DEFAULT NAME
  |     |
  |     |--     SUBSIDIARY 2 DEFAULT NAME
  |     |--     SUBSIDIARY 2 SYNONYM
  |
  |-- GROUP
  |     |
  |     |--     GROUP

Note that the default name for the group must also exist as a trader and as a label in the trader tree.
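
As a rough illustration, the tree above and its two constraints can be modelled as follows. The classes `Trader` and `Group` and the function `validate` are hypothetical names chosen for this sketch; they do not reflect the actual database schema.

```python
from dataclasses import dataclass, field


@dataclass
class Trader:
    default_name: str
    labels: set = field(default_factory=set)  # all known names, default included


@dataclass
class Group:
    name: str
    traders: dict = field(default_factory=dict)  # default name -> Trader


def validate(group):
    """Return a list of problems found in a group's trader tree."""
    problems = []
    for trader in group.traders.values():
        # Each trader's default name must be one of its own labels
        if trader.default_name not in trader.labels:
            problems.append(f"{trader.default_name}: default name is not one of its labels")
    # The group name must itself exist as a trader in the tree
    if group.name not in group.traders:
        problems.append(f"{group.name}: group name is missing as a trader")
    return problems


group = Group("GROUP", {
    "SUBSIDIARY 1 DEFAULT NAME": Trader(
        "SUBSIDIARY 1 DEFAULT NAME",
        {"SUBSIDIARY 1 DEFAULT NAME", "SUBSIDIARY 1 SYNONYM"},
    ),
    "GROUP": Trader("GROUP", {"GROUP"}),
})
problems = validate(group)
```

For the tree above, `problems` is empty; dropping the "GROUP" trader or removing a default name from its labels would produce one message each.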

2. Cleaning trader names

2.1 Finding synonyms

Multiple versions of a trader name can exist within the same dataset. Finding matches between rows of a dataset can be done with any string comparison tool (Python difflib.get_close_matches, OpenRefine, etc.).
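
A minimal sketch of the difflib approach is given below. The helper `candidate_synonyms`, the sample names and the 0.8 cutoff are assumptions made for illustration; a suitable cutoff depends on the dataset, and every suggested match should still be reviewed by hand.

```python
import difflib

# Hypothetical exporter names taken from a single dataset
names = ["PAPAYAS INC.", "PAPAYAS INC", "PAPYAS INC", "PAPAYASINC", "MANGO TRADING SA"]


def candidate_synonyms(name, others, cutoff=0.8):
    """Return names from `others` that are similar to `name` (excluding itself)."""
    pool = [other for other in others if other != name]
    return difflib.get_close_matches(name, pool, n=5, cutoff=cutoff)


matches = candidate_synonyms("PAPAYAS INC", names)
```

With these inputs, `matches` contains the three papaya variants and leaves out "MANGO TRADING SA".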

However, some synonyms may already be known to the database, especially when contexts for the same country have been studied previously. It is therefore usually worth using the trader names matching tool, which is connected to the database and hosted on deforestationfree.com in the notebook shared/trader-names/match-trader-names.ipynb.

2.2 Deciding on the default trader name

Follow these guidelines:

  • Remove the business structure identifiers ("LLC", "SA", "LTDA", "PT", etc.).
  • Keep the subsidiary; for instance, "CARGILL AMERICAS PERU S R L" should not be cleaned into "CARGILL", but into "CARGILL AMERICAS PERU".
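
The guidelines above could be approximated with a small regular-expression helper such as the one below. It is a sketch under stated assumptions: the function `clean_trader_name` is hypothetical, the identifier list is illustrative and incomplete ("SRL", "INC" and "LTD" were added for the example), and upper-casing the result is assumed from the examples in this guide.

```python
import re

# Illustrative, incomplete list of business structure identifiers (an assumption)
SUFFIXES = ["LLC", "SA", "LTDA", "PT", "SRL", "INC", "LTD"]


def _suffix_pattern(suffix):
    # Allow optional dots or spaces between letters, so that "SRL" also
    # matches "S R L" and "S.R.L."
    return r"[.\s]*".join(re.escape(ch) for ch in suffix) + r"\.?"


# Match an identifier only when it is not glued to other letters or digits
_PATTERN = re.compile(
    r"(?<![A-Z0-9])(?:"
    + "|".join(_suffix_pattern(s) for s in SUFFIXES)
    + r")(?![A-Z0-9])"
)


def clean_trader_name(raw):
    """Upper-case a name and strip business structure identifiers."""
    name = raw.upper().strip()
    name = _PATTERN.sub(" ", name)
    # Collapse repeated whitespace and trim leftover punctuation
    return re.sub(r"\s+", " ", name).strip(" .,")


cleaned = clean_trader_name("CARGILL AMERICAS PERU S R L")
```

Because only the identifiers are removed, the subsidiary part of the name ("AMERICAS PERU") is kept. A pattern this permissive can over-match on unusual names, so the output should be checked rather than trusted blindly.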

2.2 Deciding on the default group name

For groups that already exist in other datasets, the existing group name should be used. Otherwise, a simplified name can be chosen.

2.3 Submitting trader names to the database

Synonyms of trader names found in a dataset may not yet be known to the database, in which case they need to be submitted for addition. This is done automatically when using the notebook on deforestationfree.com, but can also be done manually by putting a CSV file in s3://trase-storage/{country}/{commodity}/.

The files should follow the general guidelines for CSV files on s3 (UTF-8 encoding, semicolon separator, double quotes as the quote character). Each file should contain one column with the default trader name and one or more of the following columns:

  • trader label (i.e. synonym);
  • trader group.

Example 1: submitting synonyms

LABEL;TRADER
FRESH PAPAYAS LI INC.;FRESH PAPAYAS LIECHTENSTEIN
FRESH PAPAYAS LIECHTENSTEIN INC.;FRESH PAPAYAS LIECHTENSTEIN

Example 2: submitting trader group membership

TRADER;GROUP
FRESH PAPAYAS LIECHTENSTEIN;FRESH PAPAYAS INTERNATIONAL
FRESH PAPAYAS ANDORRA;FRESH PAPAYAS INTERNATIONAL

Example 3: submitting synonyms and groups

LABEL;TRADER;GROUP
FRESH PYAPAS LIECHTENSTEIN;FRESH PAPAYAS LIECHTENSTEIN;FRESH PAPAYAS INTERNATIONAL
FRSH PAPAYAS ANDORRA;FRESH PAPAYAS ANDORRA;FRESH PAPAYAS INTERNATIONAL
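
Files like the examples above can be produced with Python's standard csv module. The sketch below writes Example 1 with a semicolon separator and double quotes as the quote character; the helper name `write_submission` is hypothetical, and uploading the result to s3 is not covered here.

```python
import csv
import io

rows = [
    {"LABEL": "FRESH PAPAYAS LI INC.", "TRADER": "FRESH PAPAYAS LIECHTENSTEIN"},
    {"LABEL": "FRESH PAPAYAS LIECHTENSTEIN INC.", "TRADER": "FRESH PAPAYAS LIECHTENSTEIN"},
]


def write_submission(rows, fieldnames, stream):
    """Write rows as a semicolon-separated CSV with double-quote quoting."""
    writer = csv.DictWriter(
        stream,
        fieldnames=fieldnames,
        delimiter=";",
        quotechar='"',
        quoting=csv.QUOTE_MINIMAL,  # quote only the fields that need it
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)


buffer = io.StringIO()
write_submission(rows, ["LABEL", "TRADER"], buffer)
text = buffer.getvalue()
```

To write to disk instead of an in-memory buffer, pass a file opened with `open(path, "w", encoding="utf-8", newline="")` so that the output is UTF-8 encoded.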

2.4 Traders in datasets

Do not overwrite the original trader names in a dataset with your matches; instead, add a column for the clean trader name.

For instance, if an input file has the following data:

EXPORTER;VOLUME;...
PAPAYAS INC.;21;...
PAPAYAS INC;32;...
PAPYAS INC;12000;...
PAPAYASINC;0.1;...

do not overwrite the exporter, but add a column for cleaned exporter names:

EXPORTER;EXPORTER_CLEAN;VOLUME;...
PAPAYAS INC.;PAPAYAS INC;21;...
PAPAYAS INC;PAPAYAS INC;32;...
PAPYAS INC;PAPAYAS INC;12000;...
PAPAYASINC;PAPAYAS INC;0.1;...
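
One possible way to add the clean column without touching the original one is sketched below, using only the standard library. The function `add_clean_column` and the `clean_names` mapping are hypothetical; names missing from the mapping are copied over unchanged, which is a design choice of this sketch rather than a rule from this guide.

```python
import csv
import io

raw = (
    "EXPORTER;VOLUME\n"
    "PAPAYAS INC.;21\n"
    "PAPAYAS INC;32\n"
    "PAPYAS INC;12000\n"
    "PAPAYASINC;0.1\n"
)

# Hypothetical mapping from raw label to clean trader name
clean_names = {
    "PAPAYAS INC.": "PAPAYAS INC",
    "PAPYAS INC": "PAPAYAS INC",
    "PAPAYASINC": "PAPAYAS INC",
}


def add_clean_column(text, mapping, source="EXPORTER", target="EXPORTER_CLEAN"):
    """Insert `target` right after `source`, leaving `source` untouched."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    fieldnames = []
    for column in reader.fieldnames:
        fieldnames.append(column)
        if column == source:
            fieldnames.append(target)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, delimiter=";", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Fall back to the raw name when no match is known
        row[target] = mapping.get(row[source], row[source])
        writer.writerow(row)
    return out.getvalue()


result = add_clean_column(raw, clean_names)
```

The original EXPORTER values remain available in `result`, so the matching can be audited or redone later.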

Adding Traders for New Contexts - summary

  • If you only want to make sure the new traders are in the database: there is nothing to do; they will be added at first ingest.
  • If you already know some synonyms: add a file to s3 with columns LABEL;TRADER or LABEL;TRADER;GROUP and send the file key to a data team member for ingestion.
  • If you want to search for synonyms within the list: upload a simple CSV with one label per line, open the match-trader-names.ipynb notebook on deforestationfree.com, and run run_trader_names_matching_from_file(filename). Then let a data team member know that some new matches need to be ingested.
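
For the last option, the one-label-per-line file can be derived from a dataset by collecting its distinct trader names. The sketch below is an assumption about how that might be done: `unique_labels` is a hypothetical helper, and whether the notebook expects a header row is not specified here, so none is written.

```python
import csv
import io

raw = (
    "EXPORTER;VOLUME\n"
    "PAPAYAS INC.;21\n"
    "PAPAYAS INC;32\n"
    "PAPAYAS INC;5\n"
    "PAPYAS INC;12000\n"
)


def unique_labels(text, column="EXPORTER"):
    """Return the distinct, non-empty values of `column` in first-seen order."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    seen = {}
    for row in reader:
        label = row[column].strip()
        if label:
            seen.setdefault(label, None)  # dicts preserve insertion order
    return list(seen)


labels = unique_labels(raw)
one_per_line = "\n".join(labels) + "\n"
```

The resulting text can be saved as a UTF-8 file and uploaded before running the matching function in the notebook.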