
This page is synchronized from trase/models/indonesia/shrimp/J - Visualising and Comparing Supply Chains.ipynb. Last modified on 2026-03-21 22:30 CET by Trase Admin. Please view or edit the original file there; changes should be reflected here after the midnight build (CET), or after a build is triggered manually with a GitHub action (link).

Visualising and Comparing Supply Chains

The SEI-PCS toolkit provides a few useful functions for visualising and comparing dataframes containing supply chain data. Either use this notebook as a tutorial to help you make your own notebook, or just replace the S3 paths with your own and use this notebook for your own purposes.

At the end of each day this notebook will be reset and your changes cleared: if you want your changes to persist, then make a copy of this notebook outside of the "= tutorials" folder.

Sankey

This wouldn't be a Trase toolkit if there wasn't support for a Sankey! For all of the options available for the Sankey chart run:

from trase.tools import sps

help(sps.sankey)

Alternatively you can visit http://docs.deforestationfree.com and navigate to the section on trase.tools.sps.sankey.

from trase.tools import sps

df = sps.get_pandas_df(
    "candyland/candy/sei_pcs/v1.0.0/SEIPCS_CANDYLAND_CANDY_2017.csv",
)

sps.sankey(
    df,
    "VOLUME_RAW",
    ["MUNICIPALITY_TRASE_ID_PROD", "BRANCH", "COUNTRY_OF_DESTINATION"],
)
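To get a feel for the kind of input a Sankey over several columns needs, the sketch below aggregates a toy dataframe into source/target/value links between each consecutive pair of columns. It is plain pandas and only an illustration: it is not how sps.sankey is implemented, and the function name sankey_links, the column names, and the toy data are all hypothetical.

```python
import pandas as pd


def sankey_links(df, value_column, columns):
    """Aggregate flows into (source, target, value) links.

    Hypothetical helper: one set of links is built for each
    consecutive pair of columns, summing the value column.
    """
    links = []
    for source_col, target_col in zip(columns, columns[1:]):
        grouped = (
            df.groupby([source_col, target_col], as_index=False)[value_column]
            .sum()
            .rename(
                columns={
                    source_col: "source",
                    target_col: "target",
                    value_column: "value",
                }
            )
        )
        links.append(grouped)
    return pd.concat(links, ignore_index=True)


# Toy data (made up for illustration)
toy = pd.DataFrame(
    {
        "MUNICIPALITY": ["A", "A", "B"],
        "EXPORTER": ["X", "Y", "X"],
        "COUNTRY": ["FR", "FR", "DE"],
        "VOLUME_RAW": [10.0, 5.0, 7.0],
    }
)

links = sankey_links(toy, "VOLUME_RAW", ["MUNICIPALITY", "EXPORTER", "COUNTRY"])
print(links)
```

Each stage of the chart carries the same total volume, which is a quick sanity check to run on your own data before plotting.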

Comparing two dataframes in a table

To produce a table of comparison values, use compare_dataframes_single. In this example we compare absolute error (that is, df_b - df_a). However, there are many more functions available. To see all of the pre-built functions that are available, run the following Python code:

from trase.tools import sps

help(sps.Compare)

Alternatively you can visit http://docs.deforestationfree.com and navigate to the section on trase.tools.sps.Compare.
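As a rough illustration of what an absolute-error comparison computes, the sketch below sums a value column per group in each dataframe and subtracts the totals (b minus a). It is written in plain pandas and does not use the toolkit; the helper name absolute_error_table and the toy data are hypothetical, and treating a group that is missing from one dataframe as zero is an assumption that may differ from what sps does.

```python
import pandas as pd


def absolute_error_table(df_a, df_b, value_column, group_by):
    """Hypothetical helper: per-group totals and their difference (b - a)."""
    total_a = df_a.groupby(group_by)[value_column].sum().rename("a")
    total_b = df_b.groupby(group_by)[value_column].sum().rename("b")
    # Outer-align the two sets of totals; groups missing on one side
    # are assumed to have zero volume.
    table = pd.concat([total_a, total_b], axis=1).fillna(0.0)
    table["absolute_error"] = table["b"] - table["a"]
    return table


# Toy data (made up for illustration)
df_a = pd.DataFrame(
    {"COUNTRY": ["FR", "DE", "FR"], "VOLUME_RAW": [10.0, 4.0, 2.0]}
)
df_b = pd.DataFrame({"COUNTRY": ["FR", "ES"], "VOLUME_RAW": [15.0, 3.0]})

table = absolute_error_table(df_a, df_b, "VOLUME_RAW", ["COUNTRY"])
print(table)
```

A positive value means the second dataframe has more volume for that group, and a negative value means it has less.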


from trase.tools import sps

df1 = sps.get_pandas_df(
    "indonesia/shrimp/sei_pcs/SEIPCS_INDONESIA_SHRIMP_2018.csv", dtype=str
).astype({"VOLUME_RAW": float})
df2 = sps.get_pandas_df(
    "indonesia/shrimp/sei_pcs/v1.0.0/SEIPCS_INDONESIA_SHRIMP_2018.csv", dtype=str
).astype({"VOLUME_RAW": float})

print(df1.columns)
print("Missing columns:", set(df1.columns) - set(df2.columns))
print("Extra columns:", set(df2.columns) - set(df1.columns))

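# Geocodes that passed through a float column carry a trailing ".0".
# Escape the dot and anchor the pattern so only that suffix is removed.
for df in (df1, df2):
    for column in ("LVL1_GEOCODE_PROD", "LVL1_GEOCODE_LH"):
        df[column] = df[column].str.replace(r"\.0$", "", regex=True)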
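The clean-up step above is sensitive to how the ".0" pattern is interpreted, and the default of the regex argument of Series.str.replace differs between pandas versions (it became False in pandas 2.0). The sketch below, on made-up codes, shows the three behaviours side by side so you can check which one your data needs.

```python
import pandas as pd

# Made-up codes for illustration only
codes = pd.Series(["11.0", "10.0", "1.05"])

# Literal replacement removes ".0" wherever it occurs, so "1.05" is altered too.
literal = codes.str.replace(".0", "", regex=False)

# As a regex, "." matches any character, so "10" inside "10.0" is removed as well.
as_regex = codes.str.replace(".0", "", regex=True)

# Escaping the dot and anchoring at the end strips only a trailing ".0".
anchored = codes.str.replace(r"\.0$", "", regex=True)

print(literal.tolist())
print(as_regex.tolist())
print(anchored.tolist())
```

Only the anchored pattern leaves values without a trailing ".0" untouched.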

sps.dumbbell_compare(
    df1,
    df2,
    "VOLUME_RAW",
    ["LVL1_GEOCODE_PROD"],
    "absolute_error",
    labels=["old", "new"],
    max_rows=None,
    yaxis_title="Province of production",
    xaxis_title="Volume",
)

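# In IPython/Jupyter, `_` holds the output of the most recently executed cell,
# here the figure returned by sps.dumbbell_compare above.
fig = _

# List the attributes of the first y-axis to see which properties can be customised.
dir(list(fig.select_yaxes())[0])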

Comparing two dataframes in a dumbbell plot

A dumbbell plot is a good way of visualising both magnitude and difference. For all of the options available for the dumbbell comparison plot run:

from trase.tools import sps

help(sps.dumbbell_compare)

Alternatively you can visit http://docs.deforestationfree.com and navigate to the section on trase.tools.sps.dumbbell_compare.

from trase.tools import sps

df1 = sps.get_pandas_df("argentina/soy/sei_pcs/v1.1.0/SEIPCS_ARGENTINA_SOY_2015.csv")
df2 = sps.get_pandas_df("argentina/soy/sei_pcs/v1.1.0/SEIPCS_ARGENTINA_SOY_2016.csv")

sps.dumbbell_compare(
    df1,
    df2,
    "vol_bean",
    ["country_of_destination"],
    "absolute_error",
    labels=["2015", "2016"],
)

Comparing two dataframes in a histogram plot

All of the visualisations above are great when you have only a few flows or nodes to compare, but how do you get an overview of the differences if you have hundreds or thousands?

The chart below compares the flows in two dataframes by subtracting their volumes (vol_b - vol_a) and plotting the resulting differences as a histogram. Positive differences (i.e. the flows where vol_b > vol_a) are green, negative differences are purple, and the bin that crosses zero is grey.

In the example below you can see that for the vast majority of destination countries, the 2016 Argentinian soy trade data has between 84k less and 220k more volume than 2015.
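The binning and colouring described above can be sketched with NumPy alone. The differences below are made up, the number of bins is arbitrary, and the helper bin_colour is hypothetical; sps.histogram_compare may choose its bins differently.

```python
import numpy as np

# Hypothetical volume differences (vol_b - vol_a), one per flow
differences = np.array([-84.0, -10.0, 5.0, 40.0, 220.0])

# Four equal-width bins spanning the smallest to the largest difference
counts, edges = np.histogram(differences, bins=4)


def bin_colour(left, right):
    """Colour a bin by where it sits relative to zero."""
    if right <= 0:
        return "purple"  # every difference in the bin is negative
    if left >= 0:
        return "green"  # every difference in the bin is positive
    return "grey"  # the bin crosses zero


colours = [bin_colour(left, right) for left, right in zip(edges[:-1], edges[1:])]

for left, right, count, colour in zip(edges[:-1], edges[1:], counts, colours):
    print(f"[{left:g}, {right:g}): {count} flows, {colour}")
```

The grey bin mixes small increases and small decreases, so a tall grey bar means most flows barely changed between the two dataframes.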

For all of the options available for the histogram comparison plot run:

from trase.tools import sps

help(sps.histogram_compare)

Alternatively you can visit http://docs.deforestationfree.com and navigate to the section on trase.tools.sps.histogram_compare.

from trase.tools import sps

df1 = sps.get_pandas_df("argentina/soy/sei_pcs/v1.1.0/SEIPCS_ARGENTINA_SOY_2015.csv")
df2 = sps.get_pandas_df("argentina/soy/sei_pcs/v1.1.0/SEIPCS_ARGENTINA_SOY_2016.csv")

sps.histogram_compare(
    df1,
    df2,
    "vol_bean",
    ["country_of_destination"],
    "absolute_error",
)