jpl-vipre/vipre-data

VIPRE Data

This project will handle data management for the VIPRE Application, including data models, ingestion scripts, and any ad-hoc processing/exploration that needs to be performed.

Contents

.
├── README.md                       This file
├── data/                           Collection of seed data for development and testing
├── docs/                           Documentation for the design and development of this application
│     ├── ERD.drawio                Entity Relationship Diagram capturing the data model's relationships
│     ├── ERD.png
│     ├── backend_service.md        Notes on the design of the backend data service
│     ├── draft_er.png              Archive
│     ├── example_tables.png        Example of some data tables in spreadsheet format
│     ├── old_data_dictionary.md    Archive
│     └── sample_table_data.xlsx    Example of some data tables in spreadsheet format
├── notebooks/                      Notebooks for ad-hoc processing or data exploration
├── pyproject.toml                  Configuration for the project - managed by `poetry`
├── schemas/                        Data schemas in static text files
│     ├── vipre_schema-*.csv        Exported from a shared google sheet where the data model was developed
│     └── vipre_schema-*.json       Converted from csvs to provide easily parsable schema to various consumers
├── scripts/
│     └── make_schemas.py           Converts the csv files found in schemas/ to the json format with metadata
├── src/                            Source code for the application
│     ├── alembic/                  Configuration and versions for database migrations - managed by `alembic` 
│     ├── alembic.ini               Configuration for the alembic tool
│     ├── app/                      FastAPI Python application
│     │     ├── dependencies.py     Global dependencies used across the API
│     │     ├── main.py             Main FastAPI application and top-level utility routes
│     │     ├── routers/            Collection of routers that service the primary application routes 
│     │     └── schemas.py          Schema objects that define request/response models
│     ├── database.db               SQLite database for development and testing
│     ├── init-db.sql               SQL schema export from the sqlalchemy/alembic managed database; kept for portability
│     └── sql/
│         ├── database.py           Manages database connections and utilities
│         └── models.py             Defines the data models using the sqlalchemy ORM 
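
As a rough illustration of the csv-to-json conversion that scripts/make_schemas.py is described as performing, the sketch below wraps the rows of one csv file in a JSON document with a small metadata header. It is not the actual script: the function name, the "metadata"/"fields" keys, and the metadata contents are all hypothetical, and the real column layout of the vipre_schema-*.csv files is not assumed.

```python
import csv
import json
from pathlib import Path


def csv_to_schema_json(csv_path):
    """Write <name>.json next to <name>.csv and return the new path.

    Hypothetical layout: {"metadata": {...}, "fields": [one dict per csv row]}.
    """
    csv_path = Path(csv_path)
    with csv_path.open(newline="", encoding="utf-8") as fh:
        rows = list(csv.DictReader(fh))  # the header row becomes the dict keys

    document = {
        # Metadata keys are illustrative assumptions, not the project's format.
        "metadata": {"source": csv_path.name, "field_count": len(rows)},
        "fields": rows,
    }
    out_path = csv_path.with_suffix(".json")
    out_path.write_text(json.dumps(document, indent=2), encoding="utf-8")
    return out_path
```

Calling csv_to_schema_json on each file matched by Path("schemas").glob("vipre_schema-*.csv") would convert the whole folder in one pass.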

Running the Application

This project uses poetry for dependency and environment management. See the Poetry Introduction docs to learn more.

To get started, run:

poetry install

Once your environment has been initialized with the proper packages, you can bring up the REST API server with:

cd src/
poetry run uvicorn --reload --log-level debug app.main:app
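
To confirm the server came up, a small check like the one below can be run from another terminal. It is a minimal sketch that assumes uvicorn's default address (http://127.0.0.1:8000) and that the app still serves FastAPI's default /openapi.json route; the function name is hypothetical, and if the app overrides either default the URL needs adjusting.

```python
import json
import urllib.error
import urllib.request


def fetch_api_title(base_url="http://127.0.0.1:8000", timeout=2.0):
    """Return the API title from /openapi.json, or None if it cannot be fetched."""
    url = base_url.rstrip("/") + "/openapi.json"
    # An empty ProxyHandler bypasses any configured HTTP proxy for this local check.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return json.load(resp).get("info", {}).get("title")
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON response.
        return None
```

A return value of None means the server is not reachable at that address or is not serving the OpenAPI document there.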

Building for Distribution

This project uses two separate build tools to generate distribution files for Unix and Windows systems. pex was explored first for generating a standalone executable: a single file that can be passed around and run from any location on the system. Because pex cannot build Windows exe files, a second tool, pyinstaller, was added. Even so, the Windows executable must be built on a Windows machine, which is currently done manually via VirtualBox.

Building with PyInstaller

See the PyInstaller docs.

PyInstaller must be installed in the active environment (it is already included in pyproject.toml). The build command relies on a spec file that was first generated by PyInstaller and then modified. Once this spec file is present (it is version controlled with this repo), new builds are simple to produce.

To generate the server.spec file that controls subsequent builds, run:

pyinstaller server.py --collect-all vipre_data

To create a new build, run:

pyinstaller server.spec -y

This will by default create a dist/server/ folder with all the libraries and dependencies needed to run dist/server/server.exe. This entire folder can be zipped and distributed for local execution.
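
The zipping step can be scripted with the standard library. The sketch below is one possible approach, not part of the project's tooling; the function name and the archive name "vipre-server" are hypothetical.

```python
import shutil
from pathlib import Path


def zip_build(dist_dir="dist/server", archive_base="vipre-server"):
    """Zip the PyInstaller output folder and return the path of the archive."""
    dist = Path(dist_dir)
    if not dist.is_dir():
        raise FileNotFoundError(f"build folder not found: {dist}")
    # root_dir/base_dir keep the top-level folder name inside the archive,
    # so the zip unpacks into server/ rather than spilling files loosely.
    archive = shutil.make_archive(
        str(archive_base), "zip", root_dir=str(dist.parent), base_dir=dist.name
    )
    return Path(archive)
```

Running zip_build() from the repository root after a build would produce vipre-server.zip containing the whole server/ folder.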

Note: the above commands assume an active environment. They are typically run on the Windows builder, where poetry has been swapped out for a simpler venv-based environment (python -m venv venv).

PyInstaller requires a framework build of Python; if you do not have this (PyInstaller will emit errors), you can build it with pyenv. Make sure you are running the latest version of pyenv or you may run into build errors.

Detailed Windows build instructions

Execute the following from Git-Bash on VirtualBox:

cd Documents/vipre-data
git pull
python -m venv ./venv
. venv/Scripts/activate
# make sure that uvloop is not included in the requirements.txt file
pip install -r requirements.txt
pip install pyinstaller
pip install -e .
pyinstaller -y vipre-data.spec
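
The uvloop check mentioned in the comment above can be automated before running pip install. This is a minimal sketch under the assumption that requirements.txt lists one requirement per line; the helper name is hypothetical.

```python
import re


def drop_requirement(lines, package="uvloop"):
    """Return the requirement lines with any entry for `package` removed."""
    # Match the package name only when it is followed by the end of the line,
    # a version specifier, an extras bracket, an environment marker, or a space,
    # so that e.g. "uvloop-extra" is left alone.
    pattern = re.compile(
        rf"^\s*{re.escape(package)}\s*($|[=<>!~;\[@ ])", re.IGNORECASE
    )
    return [line for line in lines if not pattern.match(line)]
```

For example, reading requirements.txt with splitlines(), passing the result through drop_requirement, and writing the joined lines back removes the uvloop pin while leaving every other entry untouched.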

Interacting with the Database

The database is managed by a collection of tools. The schemas defined in vipre-schemas are used to write the sqlalchemy ORM models. The alembic migration tool reads these models and creates migrations in alembic/versions/ that reflect changes to them. alembic is also used to connect to the database and execute those migrations when appropriate.

Interaction with the database relies on an initialized and active python environment:

poetry install
poetry shell

For more information on these tools, see the sqlalchemy and alembic docs.

Currently, the MATLAB process (vipre-gen) performs all writes to the database and thus frequently needs to create the database as well. This is easily done by exporting a SQL init script, either directly with alembic by running alembic upgrade head --sql for an initial migration, or by connecting to the database with sqlite and running .schema. Examples are shown below.

NOTE: be sure to check the autogenerated revision file before executing the upgrade command

Dump the current database schema (whatever is in database.db)

sqlite3 database.db ".schema" > init-db.sql

Use alembic to generate an initial schema and a corresponding init-db.sql file

alembic revision --autogenerate -m "Generate schema"
alembic upgrade head --sql > init-db.sql

Use alembic to generate a migration, upgrade the current database, and output the schema to an init-db.sql file

alembic revision --autogenerate -m "[revision message]"
alembic upgrade head
sqlite3 database.db ".schema" > init-db.sql
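
Where the sqlite3 command-line shell is not available (for example on a minimal Windows builder), the schema can also be read with Python's built-in sqlite3 module. The sketch below approximates the output of .schema by reading the sqlite_master table; the function name is hypothetical, and the formatting may differ slightly from what the shell prints.

```python
import sqlite3


def dump_schema(db_path="database.db"):
    """Return the CREATE statements stored in the database as one SQL string."""
    con = sqlite3.connect(db_path)
    try:
        rows = con.execute(
            # Auto-created indexes have a NULL sql column, so skip them.
            "SELECT sql FROM sqlite_master WHERE sql IS NOT NULL ORDER BY rowid"
        ).fetchall()
    finally:
        con.close()
    return "\n".join(sql + ";" for (sql,) in rows) + "\n"
```

Writing the returned string to init-db.sql gives a file that serves the same purpose as the one produced by the sqlite3 command above.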

NOTE: Since sqlite is being used at this phase, migrations are not all that important, and it is sometimes easier to wipe the database and revision history and start from scratch. This can be done with the following:

cd src
rm -rf alembic/versions/*
rm -f database.db
alembic revision --autogenerate -m "Generate schema"
alembic upgrade head --sql > init-db.sql
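
On the Windows builder, where rm may not be available outside Git-Bash, the cleanup part of this reset can be done from Python instead. This is a sketch of the two rm steps only, assuming the src/ layout shown in Contents; the function name is hypothetical, and the two alembic commands still need to be run afterwards.

```python
import shutil
from pathlib import Path


def reset_migrations(src_dir="src"):
    """Delete alembic revisions and the sqlite database; return what was removed."""
    src = Path(src_dir)
    removed = []

    versions = src / "alembic" / "versions"
    if versions.is_dir():
        for entry in sorted(versions.iterdir()):
            # Like the shell glob versions/*, leave dotfiles in place.
            if entry.name.startswith("."):
                continue
            if entry.is_dir():
                shutil.rmtree(entry)
            else:
                entry.unlink()
            removed.append(entry)

    database = src / "database.db"
    if database.exists():
        database.unlink()
        removed.append(database)
    return removed
```

The function is destructive by design, so it is worth printing the returned list to confirm that only revision files and database.db were deleted.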
