This page is synchronized from doc/Amazon-Web-Services.md. Last modified on 2025-12-09 00:30 CET by Trase Admin.
Please view or edit the original file there; changes should be reflected here after the midnight build (CET),
or after manually triggering a build with a GitHub action (link).
Amazon Web Services (AWS)
We make heavy use of AWS services, in particular:
- Lambda (for trase.earth release)
- Lightsail (for Metabase)
- SageMaker
- RDS
- S3
- IAM
- Identity Center authentication (through Google Workspace Authentication)
Table of contents:
- Billing
- AWS Authentication through Google
- Creating an EC2 Worker
- Running commands with screen:
- How do I...?
Billing
We track billing in the Cost Explorer and record it in Technology Service Costs.
We categorise resources using the tag Trase resource group, which takes one of the following values:
| Value | Meaning |
|---|---|
| DeforestationFree | JupyterHub instance at https://deforestationfree.com |
| Metabase | Metabase instance at https://metabase.deforestationfree.com |
| RDS personal instances | |
| RDS production instance | |
| Trase Worker | EC2 instance for SSH'ing into |
| s3 | Anything relating to AWS S3 |
| trase-earth | All resources for https://supplychains.trase.earth and its subdomains |
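Resources carrying a given tag value can also be listed from the command line, which may help when reconciling the Cost Explorer against the values above. The following is a minimal sketch, assuming the tag key is spelled exactly "Trase resource group" and that the resources live in eu-west-1; the function names are hypothetical.

```shell
# Build the JSON tag filter for one "Trase resource group" value.
# Rejects values that are not listed in the table above.
trase_tag_filter() {
  case "$1" in
    DeforestationFree|Metabase|"RDS personal instances"|"RDS production instance"|"Trase Worker"|s3|trase-earth) ;;
    *) echo "Unknown Trase resource group: $1" >&2; return 1 ;;
  esac
  printf '[{"Key":"Trase resource group","Values":["%s"]}]' "$1"
}

# List the ARNs of all resources tagged with the given value.
# Assumption: region eu-west-1; pass another region as the second argument.
list_trase_resources() {
  local filter
  filter="$(trase_tag_filter "$1")" || return 1
  aws resourcegroupstaggingapi get-resources \
    --region "${2:-eu-west-1}" \
    --tag-filters "$filter" \
    --query 'ResourceTagMappingList[].ResourceARN' \
    --output text
}
```

For example, `list_trase_resources Metabase` would print the ARNs tagged as Metabase, provided your credentials allow the tagging API call.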
AWS Authentication through Google
We have set things up so that users with a @trase.earth account can use AWS resources. This is especially useful for:
* Logging in to the AWS Starter Portal and, through it, accessing SageMaker or the AWS Console
* Getting an AWS identity access token without having to save AWS keys, in order to:
  * run aws cli commands
  * run code and systems (Python boto3, R paws, duckdb, etc.) that require AWS credentials
The setup is explained in https://docs.aws.amazon.com/singlesignon/latest/userguide/gs-gwp.html.
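On the command-line side, this kind of setup is typically used through a named AWS CLI profile. The sketch below assumes such a profile has already been created interactively with `aws configure sso`; the profile name `trase` and the function name are hypothetical.

```shell
# One-off, interactive: create a named profile for Identity Center (SSO).
# It prompts for the start URL, SSO region, account and role.
#   aws configure sso --profile trase

# Log in through the browser, then print the identity the CLI is using.
# Assumption: "trase" is a hypothetical profile name; use the one you chose.
trase_sso_login() {
  local profile="${1:-trase}"
  aws sso login --profile "$profile" &&
    aws sts get-caller-identity --profile "$profile"
}
```

Tools that follow the standard AWS credential chain, such as boto3, can usually be pointed at the same profile with `export AWS_PROFILE=trase`.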
Adding Identity Center users to a group so they can use Sagemaker
Although creating a @trase.earth user automatically creates an AWS Identity Center user, it doesn't automatically assign that user to an Identity Center group. SageMaker requires AWS Identity Center users to be members of the General group, so this assignment has to be done manually for each new @trase.earth user. At the time of writing, this can only be done through the command line, as follows:
# List AWS Identity Center users
aws identitystore list-users --identity-store-id d-936744f563
# List groups (General group should be `6255b4e4-b001-700e-a109-bfeb69622705`)
aws identitystore list-groups --identity-store-id d-936744f563
# Add the user-id from the first step to the General group. For example:
aws identitystore create-group-membership \
--identity-store-id "d-936744f563" \
--group-id "6255b4e4-b001-700e-a109-bfeb69622705" \
--member-id '{"UserId":"12d59464-a0b1-7066-a7be-630775e40002"}'
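The three steps above can be combined into one helper. This is a sketch only: the function names are hypothetical, and what a UserName looks like in the identity store (for example, whether it is the full @trase.earth address) is an assumption to check against the `list-users` output.

```shell
# IDs taken from the commands above.
IDENTITY_STORE_ID="d-936744f563"
GENERAL_GROUP_ID="6255b4e4-b001-700e-a109-bfeb69622705"

# Print the Identity Center UserId for a given UserName (empty if not found).
identity_user_id() {
  aws identitystore list-users \
    --identity-store-id "$IDENTITY_STORE_ID" \
    --query "Users[?UserName=='$1'].UserId" \
    --output text | tr -d '[:space:]'
}

# Add the user with the given UserName to the General group.
add_to_general_group() {
  local user_id
  user_id="$(identity_user_id "$1")"
  if [ -z "$user_id" ]; then
    echo "No Identity Center user named $1" >&2
    return 1
  fi
  aws identitystore create-group-membership \
    --identity-store-id "$IDENTITY_STORE_ID" \
    --group-id "$GENERAL_GROUP_ID" \
    --member-id "{\"UserId\":\"$user_id\"}"
}
```

Usage would then be `add_to_general_group <username>`, which fails with a message instead of calling `create-group-membership` when no matching user is found.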
Rotating AWS access token used by Google (1 year validity)
A token allows a corresponding AWS Identity Center user to be created automatically whenever a @trase.earth user is created. This token is valid for 1 year. AWS sends an email 90
days before expiration as a reminder to rotate the token (create a new one and set it in Google Workspace).
The email contains instructions for the update, which at the time of writing (08/05/2025) are roughly the following:
- In the AWS Identity Center automatic provisioning settings, click Generate token: https://eu-west-1.console.aws.amazon.com/singlesignon/identity/home?region=eu-west-1#/instances/680459c52db53115/settings/automatic-provisioning
- Copy the token, log in to Google Admin console → Apps → Web and Mobile Apps → Amazon Web Services (SAML) → Autoprovisioning → Reauthorize, and paste the new token there.
After a couple of minutes, try to log in to the AWS portal (https://trase-earth.awsapps.com/start#/) to confirm it is working.
The current token will expire on 8/05/2026.
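Since the reminder depends on someone noticing an email, it may be useful to check the remaining validity yourself. The helper below is a small sketch that counts days between two dates; it assumes GNU date (as found on Linux), and the function names are hypothetical. The expiry date has to be supplied in YYYY-MM-DD form.

```shell
# Print the whole number of days from the first date to the second.
# Both dates are given as YYYY-MM-DD. Assumption: GNU date, for the -d option.
days_between() {
  local from to
  from="$(date -u -d "$1" +%s)" || return 1
  to="$(date -u -d "$2" +%s)" || return 1
  echo $(( (to - from) / 86400 ))
}

# Print the number of days from today (UTC) until the given date.
days_until() {
  days_between "$(date -u +%F)" "$1"
}
```

Calling `days_until` with the token's expiry date gives the number of days left; a value below 90 means the AWS reminder should already have been sent.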
Creating an EC2 Worker
It is handy to have an EC2 worker that you can SSH into. Here we document the process of creating one from scratch.
Firstly, go to the AWS Web Console and launch an EC2 worker. Here are the recommended steps:
- Use the eu-west-1 region.
- Use the latest Ubuntu server image. You will be thankful for this when you install things later!
- Use one of the r5 generation, e.g. r5.xlarge. These have lots of memory.
- Use the Trase VPC and a public subnet.
- Keep the root storage low (~8 GB) and attach an EBS volume for actual disk space. This makes it easier to swap out the EBS volume later. Tick the option for the EBS volume to be destroyed on termination.
- Create a security group where only your IP address is allowed to SSH into it. You will have to keep updating this security group whenever your IP address changes, but it's worth the security.
- Upload your own SSH key when you create the instance.
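Because the security group has to follow your IP address, updating it from the command line can save some clicking. This is a sketch under assumptions: the function names are hypothetical, the security group ID is one you look up yourself, and https://checkip.amazonaws.com is used to discover your public IP.

```shell
# Print the caller's public IP as a /32 CIDR block.
# Assumption: https://checkip.amazonaws.com is reachable from your machine.
my_ip_cidr() {
  printf '%s/32\n' "$(curl -s https://checkip.amazonaws.com)"
}

# Replace the SSH (port 22) rule of a security group:
# remove the old CIDR, then allow the new one.
# Usage: update_ssh_ingress <security-group-id> <old-cidr> <new-cidr>
update_ssh_ingress() {
  aws ec2 revoke-security-group-ingress \
    --region eu-west-1 --group-id "$1" \
    --protocol tcp --port 22 --cidr "$2"
  aws ec2 authorize-security-group-ingress \
    --region eu-west-1 --group-id "$1" \
    --protocol tcp --port 22 --cidr "$3"
}
```

A call might look like `update_ssh_ingress sg-0123456789abcdef0 203.0.113.7/32 "$(my_ip_cidr)"`, where both the group ID and the old address are placeholders.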
Now SSH in and install some things. First you might want to mount your drive:
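lsblk # see that device is at e.g. /dev/nvme1n1
sudo file -s /dev/nvme1n1 # prints just "data" if the volume has no filesystem yet
sudo mkfs -t ext4 /dev/nvme1n1 # only for a new, empty volume: this erases existing data
sudo mkdir /data
sudo mount /dev/nvme1n1 /data/
sudo chown -R ubuntu /data/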
Set it up to auto-mount on boot:
sudo lsblk -o +UUID # note down UUID
sudo cp /etc/fstab /etc/fstab.orig
echo 'UUID=the-uuid-from-previous-command /data ext4 defaults,nofail 0 2' | sudo tee -a /etc/fstab
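To avoid copying the UUID by hand, the entry can be generated from the device itself. The sketch below uses hypothetical function names; the filesystem type passed in must match what mkfs created (ext4 in the commands above).

```shell
# Print an /etc/fstab entry for a volume, given its UUID, mount point
# and filesystem type.
fstab_line() {
  printf 'UUID=%s %s %s defaults,nofail 0 2\n' "$1" "$2" "$3"
}

# Look up the device's UUID, append the entry to /etc/fstab, then ask
# findmnt to check the resulting file.
# Usage: add_fstab_entry /dev/nvme1n1 /data ext4
add_fstab_entry() {
  local uuid
  uuid="$(sudo blkid -s UUID -o value "$1")" || return 1
  fstab_line "$uuid" "$2" "$3" | sudo tee -a /etc/fstab
  sudo findmnt --verify
}
```

Running the check right away is worthwhile because a bad fstab entry is much harder to fix once the instance has been rebooted.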
The ~/.cache/ folder can get pretty big, so symlink that:
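mkdir -p /data/cache
# if ~/.cache is already a real directory, move it aside first; otherwise ln would create the link inside it
[ -d ~/.cache ] && [ ! -L ~/.cache ] && mv ~/.cache ~/.cache.orig
ln -sfn /data/cache ~/.cache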
Now install a bunch of dependencies:
sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
sudo apt-get update
sudo apt-get --yes install postgresql-16 graphviz git build-essential software-properties-common
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt install python3.10 python3.10-dev
curl -sS https://bootstrap.pypa.io/get-pip.py | python3.10
python3.10 -m pip install poetry
echo 'export PATH=$HOME/.poetry/bin:/home/ubuntu/.local/bin:/usr/lib/postgresql/16/bin/:$PATH' >> ~/.bashrc
source ~/.bashrc
hash -r
Set up git author and store credentials locally:
git config --global credential.helper store
git config --global user.name "John Smith"
git config --global user.email "johnsmith@users.noreply.github.com"
git config --global core.editor "vim"
Make a GitHub token (https://github.com/settings/tokens/new) with repo scope and take note of it.
Clone the repo:
git clone https://github.com/sei-international/TRASE.git
Cloning into 'TRASE'...
Username for 'https://github.com': my_token
Password for 'https://my_token@github.com': x-oauth-basic
Install the project's dependencies with Poetry:
cd ~/TRASE
poetry install
Set the virtual environment to be activated on login:
echo 'cd ~/TRASE && source "$(poetry env info --path)/bin/activate"' >> ~/.bashrc
Running commands with screen:
- Start a screen session: screen, screen -L, or screen -L -Logfile log_filename
- Start a screen session running a command: screen my_command, screen -L my_command, or screen -L -Logfile log_filename my_command
- Start logging while in session: "Ctrl + a" then "Shift + h"
- Rename a logging file while in session: "Ctrl + a" then ":" then "logfile filename"
- Detach while in session: "Ctrl + a" followed by "Ctrl + d".
- Reattach: screen -r or screen -r screen_name_substring
- List: screen -ls
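For long-running jobs it can be convenient to start the session already detached, with a name and a log file. The wrapper below is a sketch combining the options listed above with screen's -dmS (start detached, named session); the function name is hypothetical.

```shell
# Start a command in a detached, named screen session and log its output.
# Usage: run_in_screen <session-name> <log-file> <command> [args...]
run_in_screen() {
  local name="$1" logfile="$2"
  shift 2
  screen -dmS "$name" -L -Logfile "$logfile" "$@"
}
```

For instance, `run_in_screen myjob myjob.log my_command` starts my_command in the background, and `screen -r myjob` reattaches to it later.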
How do I...?
Add a new IAM user
- Decide on a username. There isn't a strict naming convention but it is usually first name, last name, and the institution, e.g. "AndersonImaflora", "IanCaldwell", "ValentinUCLouvain", or just "MarcosBarrozo_External".
- Create the user:
  aws iam create-user --user-name JoeBloggs
- If the user needs access to objects in S3, see Grant read/write access to objects in S3 below.
- Generate an access key:
  aws iam create-access-key --user-name JoeBloggs
- Send the user the SecretAccessKey and AccessKeyId over Slack using the /secret command or DopplerShare online.
Grant read/write access to objects in S3
- Check if an IAM user already exists:
  aws iam list-users | grep -i joe
  If not, see Add a new IAM user.
- Check if a policy already exists for the prefix (directory) you want to give access to:
  aws iam list-policies | grep TraseStorageIndonesia | grep Arn
  # "Arn": "arn:aws:iam::614804060947:policy/S3TraseStorageIndonesiaReadOnly",
  # "Arn": "arn:aws:iam::614804060947:policy/S3TraseStorageIndonesiaReadWrite",
  # ...etc...
  If the policy you need does not exist, you can use this script to create it:
  $ python trase/admin/scripts/create_aws_s3_policy.py indonesia/shrimp [--write]
  Created policy TraseStorageIndonesiaShrimpReadOnly that grants read-only access to s3://trase-storage/indonesia/shrimp
- Assign the policy to the user:
  aws iam attach-user-policy \
    --user-name JoeBloggs \
    --policy-arn arn:aws:iam::614804060947:policy/TraseStorageIndonesiaShrimpReadOnly
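To give a feel for what a prefix-scoped policy contains, here is a generic sketch of a read-only policy document for one prefix of a bucket. It is not the output of create_aws_s3_policy.py, whose contents are not shown here, and the function name is hypothetical.

```shell
# Print a read-only IAM policy document for one prefix of a bucket:
# listing is limited to keys under the prefix, and only GetObject is allowed.
# Usage: s3_readonly_policy <bucket> <prefix>
s3_readonly_policy() {
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::$1",
      "Condition": {"StringLike": {"s3:prefix": ["$2/*"]}}
    },
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::$1/$2/*"
    }
  ]
}
EOF
}
```

Running `s3_readonly_policy trase-storage indonesia/shrimp` prints a document that could be compared with the policy the script actually creates.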
Back up the S3 bucket
Run the following command:
aws s3 sync \
--source-region eu-west-1 \
--region eu-west-1 \
s3://trase-storage \
s3://trase-storage-backup
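Before a large sync it may be worth previewing what would be copied. The wrapper below is a sketch that forwards extra options, such as the `--dryrun` flag of `aws s3 sync`, to the command above; the function name is hypothetical.

```shell
# Sync trase-storage to trase-storage-backup. Any arguments are passed on
# to "aws s3 sync", e.g. --dryrun to only print what would be copied.
backup_trase_storage() {
  aws s3 sync \
    --source-region eu-west-1 \
    --region eu-west-1 \
    "$@" \
    s3://trase-storage \
    s3://trase-storage-backup
}
```

A cautious run would be `backup_trase_storage --dryrun` first, followed by `backup_trase_storage` once the listed operations look right.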