Match trader names importers
This page is synchronized from trase/models/colombia/coffee/trader_names/match-trader-names-importers.ipynb. Last modified on 2025-12-13 00:30 CET by Trase Admin.
Please view or edit the original file there; changes should be reflected here after the midnight build (CET), or after manually triggering a build with a GitHub action.
from trase.tools.pcs import (
    run_trader_names_matching_from_file,
    run_trader_names_matching_from_db,
    get_trader_names_dictionary_from_file,
)
import psycopg2
# psycopg2.__version__
Trader names matching tool
How to run through a batch of trader names from a file?
- Prepare a text file with one trader name per line; duplicates will be ignored, and the order matters, so priority names should be at the beginning of the file;
- Upload it to the folder shared/trader-names-matching;
- Replace the value of the filename argument in the run_trader_names_matching_from_file function with the name of the file;
- Run the cell;
- For each suggested match, select a number, press 'r' to reject all suggestions, or press RETURN to skip.
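The file preparation step can be sketched as follows. This is a minimal illustration, not part of the Trase tooling: write_trader_names_file is a hypothetical helper, and the one-name-per-line UTF-8 layout is an assumption based on the description above. The tool ignores duplicates anyway, so dropping them here only keeps the file tidy while preserving the priority order.

```python
from pathlib import Path


def write_trader_names_file(names, path):
    """Write one trader name per line, dropping blanks and repeated names.

    Hypothetical helper: it is not part of trase.tools.pcs.
    """
    seen = set()
    unique_names = []
    for name in names:
        cleaned = name.strip()
        # Keep only the first occurrence so priority names stay at the top.
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            unique_names.append(cleaned)
    Path(path).write_text("\n".join(unique_names) + "\n", encoding="utf-8")
    return unique_names
```

The resulting file would then be uploaded to shared/trader-names-matching and its name passed to run_trader_names_matching_from_file.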
How to run through a batch of trader names from a dataset already in the database?
- Run the function run_trader_names_matching_from_db.
How does this work?
The function loops through trader names one by one, and looks for other names in the list that are similar. When a synonym is already known in the database or has already been identified by another user of this notebook, it is skipped. So although human input is required at every step, this ensures that no decision has to be made more than once. When no selection is made, the suggested matches are marked as "rejected", and will not be suggested again within the current session. However, these rejections are not persistent, so they will come up again the next time the notebook is run.
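To make the loop concrete, here is a rough sketch of the idea using Python's standard difflib module. It is an illustration only: the similarity measure the Trase function actually uses is not described on this page, and suggest_matches, known_synonyms, cutoff, and max_suggestions are hypothetical names chosen for the example.

```python
from difflib import get_close_matches


def suggest_matches(names, known_synonyms, cutoff=0.8, max_suggestions=3):
    """Yield (name, suggestions) for names that still need a human decision.

    Assumption: difflib's similarity ratio stands in for whatever measure
    the real tool uses.
    """
    for name in names:
        # Names already resolved (in the database or by another user) are skipped.
        if name in known_synonyms:
            continue
        candidates = [other for other in names if other != name]
        suggestions = get_close_matches(
            name, candidates, n=max_suggestions, cutoff=cutoff
        )
        if suggestions:
            yield name, suggestions
```

Each yielded pair corresponds to one prompt shown to the user; names with no sufficiently similar candidate produce no prompt at all.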
What happens to the matches I select?
Matches are written to the S3 bucket s3://trase-traders/.
A member of the data team then runs the function pcs.traders.digest.digest_trader_matches in Python, and accepts, rejects, or defers the decision on each match suggestion in the bucket. This ensures that each match has been seen by at least two people before it is recorded in the database.
For accepted matches, the trader hierarchy in the database is modified by adding a new synonym to a trader.
The accepted matches will be visible in the file s3://trase-results/DB_FILES/trader_labels.csv the next time the file is dumped from the database, which happens within one day.
Choosing a source file to select name matches from
You can match trader names from two kinds of sources:
- Datasets already in the Trase database (useful for cleaning existing Trase data).
  To do this, run the run_trader_names_matching_from_db function and follow the prompts to select the dataset you would like to pull raw trader names from.
- Datasets you provide via a .csv file (useful for new contexts or data that may not be in the Trase database).
  To do this, run the run_trader_names_matching_from_file function. It requires the path to a csv file containing raw trader names as its first parameter.
You have three ways to respond to each match suggestion:
- There is a match among the options. Example:
  `OLAM INTERNATIONAL LIMITED 7`
  `1: OLAM INTERNATIONAL LIMITED`
  `2: OLAM INTERNATIONNAL LIMITED`
  `Select number, press 'r' to reject all suggestions, or press RETURN to skip...`
  Press the option number. In the example, 'OLAM INTERNATIONAL LIMITED 7' and 'OLAM INTERNATIONAL LIMITED' are a match, so press 1.
- None of the options is a match. Example:
  `OLAM INTERNATIONAL LIMITED 7`
  `1: OLMEDO PRINTING CORP`
  `2: OLMEDO PRINTING`
  `Select number, press 'r' to reject all suggestions, or press RETURN to skip...`
  Press r when none of the options matches the target trader name. In the example, neither 'OLMEDO PRINTING CORP' nor 'OLMEDO PRINTING' is a match for 'OLAM INTERNATIONAL LIMITED 7'.
- You are in doubt. Example:
  `OLAM INTERNATIONAL LIMITED 7`
  `1: OL. INTERNATIONAL LIMITED`
  `Select number, press 'r' to reject all suggestions, or press RETURN to skip...`
  Press RETURN to skip. In the example, you might not have, or could not find, enough information to accept or reject the suggestion.
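The three responses can be summarised as a small decision function. This is only a sketch of the behaviour described by the prompt text; interpret_response is a hypothetical name, and how the real tool parses input (for example, whether 'R' is accepted as well as 'r') is an assumption.

```python
def interpret_response(response, suggestions):
    """Map a raw keyboard response to an (action, matched_name) pair.

    Hypothetical helper mirroring the prompt:
    a number selects a match, 'r' rejects all, RETURN skips.
    """
    answer = response.strip().lower()
    if answer == "":
        # RETURN with no input: leave the name undecided.
        return ("skip", None)
    if answer == "r":
        # Reject every suggestion shown for this name.
        return ("reject", None)
    if answer.isdecimal() and 1 <= int(answer) <= len(suggestions):
        # Option numbers in the prompt start at 1.
        return ("match", suggestions[int(answer) - 1])
    raise ValueError(f"Unrecognised response: {response!r}")
```

For the first example above, interpret_response("1", ["OLAM INTERNATIONAL LIMITED", "OLAM INTERNATIONNAL LIMITED"]) selects 'OLAM INTERNATIONAL LIMITED' as the match.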
Getting a dictionary of clean trader names
Once the suggested synonyms have been digested, you can get a dictionary of clean trader names from a file containing a list of trader names, using the function get_trader_names_dictionary_from_file(filename). The dictionary is written to a file whose name is the input file name prefixed with "results_".
Sample Code
run_trader_names_matching_from_file("colombia_importers.txt")
run_trader_names_matching_from_db(ref_id=2816)