Internals for Developers

Transactions in LSD

Since version 0.4 LSD implements transactions. Transactions are used to:

prevent database corruption in case of failed updates
provide a consistent view of the database to readers while updates are ongoing

Any operation that modifies table data or metadata must be performed within a transaction. This is done with the DB.transaction() context manager, which automatically commits on exit. For example:

import lsd, lsd.smf

db = lsd.DB("test")
with db.transaction():
    # The transaction is committed automatically when this block exits.
    db.create_table('ps1_det', lsd.smf.det_table_def)

As part of the commit, the table's neighbor cache and its catalog (a list of the datafiles that make up the table data) are automatically updated to keep them in a consistent state.

Implementation

LSD implements transactions using a variant of the [http://en.wikipedia.org/wiki/Snapshot_isolation snapshot isolation] technique. Each LSD table has a 'snapshots' directory, with subdirectories storing snapshot data. A snapshot is either open or committed; a committed snapshot contains a special file, '.committed', as a marker of its state.
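The commit-marker idea can be illustrated with a toy context manager. This is a minimal sketch and not LSD's actual code: the name toy_transaction and its arguments are hypothetical, and the only behavior taken from the description above is that a snapshot directory counts as committed once it contains a '.committed' file.

```python
import contextlib
import os


@contextlib.contextmanager
def toy_transaction(table_dir, snap_id):
    """Hypothetical helper: open a snapshot directory, mark it committed on success."""
    snap_path = os.path.join(table_dir, "snapshots", snap_id)
    os.makedirs(snap_path)
    # The body of the `with` block runs here and writes its files into snap_path.
    yield snap_path
    # Reached only if the body raised no exception: write the commit marker.
    open(os.path.join(snap_path, ".committed"), "w").close()
```

If the body of the with block raises, the exception propagates out of the yield, the marker is never written, and the directory is left behind as an open (uncommitted) snapshot.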

The data logically contained in the table is the union of the contents of all committed snapshot directories, merged from the oldest to the newest committed snapshot, with files from newer snapshots overriding files of the same name from older ones. For example, imagine a table 'table1', with two snapshots, '0001' and '0002', containing the following:

table1/snapshots/0001/tablets/+0.5+0.5/T55555/main.h5  
table1/snapshots/0001/tablets/+0.5+0.5/T55556/main.h5
 
table1/snapshots/0002/tablets/+0.5+0.5/T55556/main.h5
table1/snapshots/0002/tablets/+0.5+0.5/T55557/main.h5

Logically, this table is equivalent to the one having:

table1/tablets/+0.5+0.5/T55555/main.h5  # file from 0001
table1/tablets/+0.5+0.5/T55556/main.h5  # file from 0002
table1/tablets/+0.5+0.5/T55557/main.h5  # file from 0002

LSD does this "directory merging" automatically, and caches the results for fast lookup in {{{catalog.pkl}}} files stored in each snapshot's directory. Also, actual "snapshot IDs" (the 0001 and 0002 in the example above) are times when the transaction was created, formatted as "YYYYMMDDHHmmss.ssssss".
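The merge rule can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than LSD's implementation: the function names snapshot_id and logical_view are hypothetical, files at the top level of a snapshot directory (such as the marker and catalog files) are assumed not to be table data, and the sketch rebuilds the view on every call instead of caching it. Whether LSD uses local time or UTC for snapshot IDs is not stated here, so the caller supplies the datetime.

```python
import os
from datetime import datetime


def snapshot_id(when):
    """Format a datetime as YYYYMMDDHHmmss.ssssss (hypothetical helper)."""
    return when.strftime("%Y%m%d%H%M%S.%f")


def logical_view(table_dir):
    """Map each logical file path to the snapshot file that provides it."""
    snapshots_dir = os.path.join(table_dir, "snapshots")
    merged = {}
    # Fixed-width timestamp IDs sort lexicographically from oldest to newest.
    for snap_id in sorted(os.listdir(snapshots_dir)):
        snap_path = os.path.join(snapshots_dir, snap_id)
        # Ignore anything that is not a committed snapshot directory.
        if not os.path.isfile(os.path.join(snap_path, ".committed")):
            continue
        for root, _dirs, files in os.walk(snap_path):
            if root == snap_path:
                continue  # assumed: top-level files are bookkeeping, not data
            for name in files:
                full = os.path.join(root, name)
                rel = os.path.relpath(full, snap_path)
                merged[rel] = full  # a newer snapshot replaces an older entry
    return merged
```

Applied to the 'table1' example above, the returned mapping would have three keys, with the T55556 entry pointing into snapshot 0002.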

As a consequence of this implementation:

Rolling back to an older snapshot can be achieved by removing directories containing newer snapshots. Actually, in principle the directories don't even have to be removed -- LSD just needs to be told to look for a specific snapshot -- but this is not implemented yet.
To read a given snapshot, all older snapshots must be present. You can view each snapshot as a "diff" between the current and previous state of the database, going back to the beginning; all diffs have to be present to construct the current state.
If anything goes wrong in a transaction, the snapshot directory created by the transaction will be left in the snapshots/ subdirectory, but it won't have a '.committed' file and will therefore be ignored by LSD. Such directories can be safely removed, either manually ({{{'rm -rf'}}}), or using {{{lsd-vacuum}}}.
Queries to the database don't see the data added by the current transaction; they see the database state as it was when the transaction was started. For example, if you have a table with 10 rows, begin a transaction, add or modify some rows and, without committing, query that table again, you will get the original 10 rows as a result. Only after you've called db.commit() will your queries begin returning the new data.
Upon commit, LSD will do the necessary housekeeping, including updating the table catalogs ({{{catalog.pkl}}}) and intelligently updating the neighbor caches for the cells that were modified by the transaction.
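As an illustration of the leftover-snapshot point in the list above, the following sketch lists snapshot directories that lack a '.committed' marker. It is not the implementation of {{{lsd-vacuum}}}; the function name uncommitted_snapshots is hypothetical, and the sketch only reports candidates without deleting anything.

```python
import os


def uncommitted_snapshots(table_dir):
    """Return paths of snapshot directories that have no '.committed' marker."""
    snapshots_dir = os.path.join(table_dir, "snapshots")
    stale = []
    for snap_id in sorted(os.listdir(snapshots_dir)):
        snap_path = os.path.join(snapshots_dir, snap_id)
        marker = os.path.join(snap_path, ".committed")
        # Only directories can be snapshots; a missing marker means "not committed".
        if os.path.isdir(snap_path) and not os.path.exists(marker):
            stale.append(snap_path)
    return stale
```

A missing marker cannot distinguish a failed transaction from one that is still in progress, so it seems prudent to run a check like this only when no transaction is open on the table.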
