Best practices for managing a large set of alembic files #1259
Replies: 2 comments
-
I would do this:
That's how to get a squashed file. It's not any kind of "industry standard", just something I came up with. For data migrations it sort of depends on what you are doing. A lot of data migrations I would have as scripts that are outside of the revision tree entirely, like the kind that fully read all the data from an old schema structure and just persist it into a new schema structure. Other migrations are like INSERTs of some lookup records, etc. I don't think there's one way to approach that. If you squash your basemost migrations, then you would also design some new data migration scripts that do the upfront work more efficiently.
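(For illustration, here is a minimal sketch of the kind of standalone data-migration script described above, kept outside the revision tree and run by hand. The table names, columns, and connection URL are made-up placeholders, not anything from this thread.)

```python
# backfill_customers.py -- hypothetical one-off data-migration script,
# run by hand and not registered in the alembic revision tree
from sqlalchemy import create_engine, MetaData, Table, insert, select

# assumption: a PostgreSQL URL for the target database; adjust for your setup
engine = create_engine("postgresql+psycopg://user:pass@localhost/example")

metadata = MetaData()

with engine.begin() as conn:
    # reflect the old and new table structures (names are made up)
    old_customer = Table("old_customer", metadata, autoload_with=conn)
    customer = Table("customer", metadata, autoload_with=conn)

    # copy everything from the old structure into the new one in a single
    # INSERT ... SELECT, rather than row by row inside a migration step
    conn.execute(
        insert(customer).from_select(
            ["name", "email"],
            select(old_customer.c.name, old_customer.c.email),
        )
    )
```

Keeping large one-off backfills like this out of the revision tree means they never run as part of `alembic upgrade`, which is part of what shrinks the from-scratch migration time.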
-
I know this is old, but I'll document the commands I ran for my approach, based on @zzzeek 's comment:

# first verify that the current git state generates no alembic revision
alembic revision --autogenerate -m "test_revision"
# ^ verify the above is empty, then delete it
# write down the current revision ID
alembic current
# delete all previous migrations
git rm -f migrations/versions/*.py
# NOW I renamed my dev database so it didn't "exist anymore"
# alter database example rename to example_bak;  # psql command
# then created a new empty database for the project
# create the new "squashed" migration
alembic revision --autogenerate -m "init db"
# now edit the revision, renaming its ID to the previously saved one,
# and rename the filename as well to reflect the old revision ID
# (a sketch of the edited file header follows below)
# final test:
alembic upgrade head  # should do nothing

Lastly, I deleted my temporary database, renamed my old database back, and ran the final command again to be sure. I ended up squashing because I had updated some dependencies (notably SQLAlchemy, and psycopg2 -> psycopg 3), and my old migrations could no longer be stepped through on an empty database from scratch without refactoring.
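(To make the "edit the revision ID" step concrete, here is a rough sketch of what the squashed revision file might look like after that edit, assuming the previously saved head revision ID was a1b2c3d4e5f6 (a made-up value) and the file was renamed to a1b2c3d4e5f6_init_db.py to match; the table in upgrade() is likewise hypothetical.)

```python
# migrations/versions/a1b2c3d4e5f6_init_db.py  (renamed to match the old head ID)
"""init db

Revision ID: a1b2c3d4e5f6
Revises:
Create Date: ...
"""
from alembic import op
import sqlalchemy as sa

# the autogenerated ID is replaced with the previously saved head revision,
# so databases already stamped with that ID see nothing new to run
revision = "a1b2c3d4e5f6"
down_revision = None        # this is now the base of the tree
branch_labels = None
depends_on = None


def upgrade():
    # the autogenerated schema for the whole application goes here, e.g.:
    op.create_table(
        "customer",  # hypothetical table
        sa.Column("id", sa.Integer, primary_key=True),
        sa.Column("name", sa.String(100), nullable=False),
    )


def downgrade():
    op.drop_table("customer")
```

Because the squashed file reuses the old head ID, databases already stamped at that revision treat it as applied, which is why the final `alembic upgrade head` test above should do nothing.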
-
Hi,
I had a sort of best-practices question on how to manage the Alembic files generated for an application over a long time. The application I am working on now has ~400+ migration files, and it takes some time (7-20 minutes, DB to DB) to complete migrations, where schema changes and data migrations are both handled. I wanted to know whether there is some industry-standard way to squash these migrations, or to elegantly handle data migration and schema migration separately. Basically, I want to reduce the migration time it takes to upgrade a DB from scratch.
Thanks!