# QA code

This folder contains the QA code for phases 2 and 3 of the beef model. The Jupyter notebooks produce the HTML files used to QA the model. The HTML files are stored on GitHub at:

```bash
trase/data/brazil/beef/sei_pcs/qa_phase2_%%_plots.html
```

Amazon S3 data is used to produce the general and specific plots, and Postgres data is used to produce the version comparison plots. Keep this in mind when choosing which data to download in the Workflow section.

## Structure and files

```
├── imports
│   ├── constants.py                   # Constants used in the QA of the beef model, e.g. HS6 codes, dataframe column names, etc.
│   ├── functions.py                   # Auxiliary functions used to read the files, process the dataframes for the plots, etc.
│   ├── layout.py                      # Colors used for the layout of the plots.
│   ├── plots_general.py               # All the general plots used to QA the beef model.
│   ├── plots_specific.py              # All the specific plots used to QA the beef model.
│   ├── preprocess.py                  # Functions to preprocess the dataframes used in the QA.
│   └── reader.py                      # Functions to read the files in S3, the Trase database and local storage.
├── loader.py                          # Python script that downloads the data used in the QA.
├── qa_phase2_general_plots.ipynb      # Jupyter notebook that produces the general plots HTML file.
├── qa_phase2_specific_plots.ipynb     # Jupyter notebook that produces the specific plots HTML file.
├── qa_phase2_compare_versions.ipynb   # Jupyter notebook that produces the version comparison HTML file.
└── qa_phase3_plots.ipynb              # Jupyter notebook that produces the plots for the phase 3 QA.
```

## Workflow

The workflow consists of the following steps:

Go to `imports/constants.py` and set the `PATH` variable to the directory where the data will be downloaded and read from. By default, this creates a `downloads` folder inside `qa_beef`.
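
As an illustration, the `PATH` setting could be defined with `pathlib` roughly as follows. This is a minimal sketch, not the contents of the real `imports/constants.py`: the `QA_BEEF_DIR` name, the assumption that `qa_beef` is resolved relative to the working directory, and the `ensure_download_dir` helper are all hypothetical.

```python
from pathlib import Path

# Hypothetical: location of the qa_beef folder, relative to the working directory.
QA_BEEF_DIR = Path("qa_beef")

# Directory where loader.py would write the downloaded data and the notebooks read it.
# Point this somewhere else if you want the data stored outside the repository.
PATH = QA_BEEF_DIR / "downloads"


def ensure_download_dir(path: Path = PATH) -> Path:
    """Create the download directory (and any missing parents) if it does not exist."""
    path.mkdir(parents=True, exist_ok=True)
    return path
```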

Run `loader.py` with arguments to download data from the S3 and Postgres databases. Run `python loader.py --help` to list the available arguments. As mentioned at the top of this file, Amazon S3 data is used to produce the general and specific plots, and Postgres data is used to produce the version comparison plots. Some examples, depending on your goals:

1. To download all the data required to generate every QA file: `python loader.py --all True`
2. To download only the S3 data: `python loader.py --all_s3 True`
3. To download only the Postgres data: `python loader.py --all_db True`
4. To download specific dataframes, for example SEI_PCS version 2.2 and MDIC port: `python loader.py --sei_pcs True --mdic_port True`

Downloading all the data usually takes 35 to 45 minutes and requires 10 GB of memory.
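
The examples above pass an explicit value to each flag (for example `--all True`). A minimal sketch of how a command-line interface with these flags can be parsed using the standard `argparse` module; only the flag names come from this README, while the `str_to_bool` helper, the defaults and the help texts are assumptions and may differ from the actual `loader.py`. `python loader.py --help` remains the authoritative list of arguments.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Convert strings such as 'True', 'false', '1' or '0' into booleans."""
    lowered = value.lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean value, got {value!r}")


# Flag names as used in the README examples; the help texts are assumptions.
FLAGS = {
    "--all": "Download all data (S3 and Postgres).",
    "--all_s3": "Download all S3 data (general and specific plots).",
    "--all_db": "Download all Postgres data (version comparison plots).",
    "--sei_pcs": "Download the SEI_PCS dataframe.",
    "--mdic_port": "Download the MDIC port dataframe.",
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Download the data used in the beef QA.")
    for flag, help_text in FLAGS.items():
        # nargs="?" with const=True accepts both "--all True" and a bare "--all".
        parser.add_argument(flag, type=str_to_bool, nargs="?", const=True,
                            default=False, help=help_text)
    return parser


if __name__ == "__main__":
    # Example: parse the "specific dataframes" command from the list above.
    args = build_parser().parse_args(["--sei_pcs", "True", "--mdic_port", "True"])
    print(vars(args))
```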

Important: if an error occurs while downloading the data, it most likely comes from a change to a file's path or name, or to the column names in a dataframe. To diagnose this, check the file's last-modified date on Amazon S3: if the file was modified recently, its name or columns have probably changed. If the file name or path on Amazon S3 has changed, update the path in `imports/reader.py`. If the columns in the file have changed, update the column names in `imports/constants.py`.
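
Before editing the constants, it can help to compare the header of the downloaded file with the column names the QA code expects. A minimal sketch using only the standard library, assuming a delimited text file with a header row; `EXPECTED_COLUMNS` and its values are hypothetical placeholders, not the names defined in `imports/constants.py`. A column reported as missing alongside a new, unexpected one usually points to a rename.

```python
import csv
import io
from typing import Iterable, TextIO

# Hypothetical placeholder names: replace with the columns defined in imports/constants.py.
EXPECTED_COLUMNS = ["year", "exporter", "port", "volume"]


def check_columns(handle: TextIO, expected: Iterable[str], delimiter: str = ",") -> dict:
    """Compare the header row of a delimited file with the expected column names."""
    header = next(csv.reader(handle, delimiter=delimiter))
    expected_set, header_set = set(expected), set(header)
    return {
        # Expected by the QA code but absent from the file: possibly renamed or dropped.
        "missing": sorted(expected_set - header_set),
        # Present in the file but unknown to the QA code: possibly the new names.
        "unexpected": sorted(header_set - expected_set),
    }


if __name__ == "__main__":
    # In-memory example where "port" was renamed to "port_of_export".
    sample = io.StringIO("year,exporter,port_of_export,volume\n2020,A,SANTOS,10\n")
    print(check_columns(sample, EXPECTED_COLUMNS))
    # prints {'missing': ['port'], 'unexpected': ['port_of_export']}
```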

After downloading the S3 data, run the Jupyter notebook `qa_phase2_general_plots.ipynb` to view the general plots. The notebook imports the function in `reader.py` that loads the downloaded data and the functions in `plots_general.py` that produce and display the plots. After running the notebook, you can export it as an HTML file, which is saved locally on your computer.

Run the Jupyter notebook `qa_phase2_specific_plots.ipynb` to view the specific plots. The notebook imports the function in `reader.py` that loads the downloaded data and the functions in `plots_specific.py` that produce and display the plots. After running the notebook, you can export it as an HTML file, which is saved locally on your computer.

After downloading the Postgres data, run the Jupyter notebook `qa_phase2_compare_versions.ipynb` to view the version comparison plots. The notebook imports the function in `reader.py` that loads the downloaded data and the functions in `plots_general.py` and `plots_specific.py` that produce and display the plots. After running the notebook, you can export it as an HTML file, which is saved locally on your computer.
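
As an alternative to exporting from the Jupyter interface, `jupyter nbconvert --to html --execute` runs a notebook and, by default, writes an HTML file with the same base name in the notebook's directory. A minimal sketch that builds these commands for the three phase 2 notebooks and, optionally, runs them with `subprocess`; it assumes Jupyter and nbconvert are installed and the data has already been downloaded. The `export_command` and `export_all` helpers are hypothetical, not part of this folder.

```python
import shlex
import subprocess

PHASE2_NOTEBOOKS = [
    "qa_phase2_general_plots.ipynb",
    "qa_phase2_specific_plots.ipynb",
    "qa_phase2_compare_versions.ipynb",
]


def export_command(notebook: str) -> list[str]:
    """Command that executes a notebook and exports the result as HTML."""
    return ["jupyter", "nbconvert", "--to", "html", "--execute", notebook]


def export_all(notebooks: list[str], run: bool = False) -> None:
    """Print each export command and, if run is True, execute it."""
    for notebook in notebooks:
        cmd = export_command(notebook)
        print(shlex.join(cmd))
        if run:
            # check=True raises CalledProcessError at the first notebook that fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    export_all(PHASE2_NOTEBOOKS, run=False)  # set run=True to actually export
```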

Save the HTML files in the Trase GitHub path:

```bash
trase/data/brazil/beef/sei_pcs/
```
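
Copying the exported files into place can also be scripted. A minimal sketch, assuming `trase/` in the path above is the root of a local clone of the Trase repository and that the exported files match the hypothetical pattern `qa_phase*.html`; `copy_reports` is an illustrative helper, not part of this folder.

```python
import shutil
from pathlib import Path

# Destination inside the Trase repository, relative to the root of a local clone.
SEI_PCS_DIR = Path("data") / "brazil" / "beef" / "sei_pcs"


def copy_reports(source_dir: Path, trase_repo: Path, pattern: str = "qa_phase*.html") -> list[Path]:
    """Copy exported QA HTML files from source_dir into the sei_pcs folder of trase_repo."""
    target = trase_repo / SEI_PCS_DIR
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for html_file in sorted(source_dir.glob(pattern)):
        destination = target / html_file.name
        shutil.copy2(html_file, destination)  # copy2 also preserves file timestamps
        copied.append(destination)
    return copied
```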
