Switch from pickled blobs to JSON data #1786

Open: wants to merge 86 commits into base: master
Changes from 75 commits

Commits
a05fe5a
Database format 21: add JSON, remove pickle
dsblank Oct 10, 2024
a8ef265
Rename new column to json_data
dsblank Oct 10, 2024
97d3388
Read prev version
dsblank Oct 10, 2024
7497abd
Load old version
dsblank Oct 10, 2024
76d622e
Added to_dict, from_dict
dsblank Oct 11, 2024
c014c15
Refactor for upgrade uses
dsblank Oct 12, 2024
43ea2b2
Peoplemodel mostly working
dsblank Oct 12, 2024
1350f5c
Save new db 21 with JSON data field
dsblank Oct 13, 2024
e677833
Docstrings
dsblank Oct 13, 2024
7ac4f7b
Generic needs to handle both blob and json during upgrades
dsblank Oct 13, 2024
08869eb
name fixes
dsblank Oct 13, 2024
99b3d2b
black linting
dsblank Oct 13, 2024
a9da731
Removed unneeded properties on primary objects
dsblank Oct 13, 2024
bc5ac5b
Use a version of Nick's to/from json funcs
dsblank Oct 13, 2024
33928a9
Revert "Removed unneeded properties on primary objects"
dsblank Oct 14, 2024
3785e46
linting
dsblank Oct 14, 2024
b211640
WIP: eventmodel, and familymodel
dsblank Oct 14, 2024
21caaa2
Use column position in model
dsblank Oct 14, 2024
bcd7018
Refactor serializers to classes
dsblank Oct 27, 2024
b3c0540
People model converted
dsblank Oct 27, 2024
5a01af2
Family model converted
dsblank Oct 27, 2024
22fc4b8
Merge remote-tracking branch 'origin/master' into dsb/depickle
dsblank Oct 28, 2024
125d46d
Typo: return -> yield
dsblank Oct 28, 2024
f16a268
Protection for null dates in events
dsblank Oct 28, 2024
b59fb64
All remaining views converted
dsblank Oct 28, 2024
0489b5b
All remaining views converted
dsblank Oct 28, 2024
1085ced
No need to make treemodels serialize agnostic
dsblank Oct 29, 2024
65d84c8
Date positions are in dateval; ok
dsblank Oct 29, 2024
ae204b8
Import from XML working
dsblank Oct 29, 2024
fe2b561
Remove old commented code
dsblank Oct 29, 2024
c1c1804
Bug fixing
dsblank Oct 29, 2024
e4c0f2c
Linting
dsblank Oct 29, 2024
adda6d5
Linting
dsblank Oct 29, 2024
136d907
Bug fixes
dsblank Oct 29, 2024
ef5fc30
Updated libgedcom
dsblank Oct 29, 2024
73d2054
Got to be careful adding random values to objects
dsblank Oct 29, 2024
d3dfd4f
Fixing undo/redo
dsblank Oct 29, 2024
69f9bbb
Fixing undo/redo
dsblank Oct 29, 2024
dbeb864
Suss bug?
dsblank Oct 29, 2024
368e2ac
Temp fix before Nick's fix
dsblank Oct 29, 2024
d2c301a
Date JSON tweak
dsblank Oct 30, 2024
20c4781
Fix PAT/MAT ronymics
dsblank Oct 30, 2024
1d24d85
Merge branch 'master' into dsb/depickle
dsblank Oct 31, 2024
f14d776
Updated copyright statements for this PR
dsblank Oct 31, 2024
08d76dc
Revert "Updated copyright statements for this PR"
dsblank Oct 31, 2024
54fe9c1
Updated copyright statements for this PR
dsblank Oct 31, 2024
6f5bcda
Convert serializer methods to static methods; can be used as singleton
dsblank Nov 1, 2024
ef8297b
Don't add properties/methods to Gramps objects, unless they start with _
dsblank Nov 1, 2024
cb7ffbc
Simplify access
dsblank Nov 1, 2024
357584a
Update gramps/gui/views/treemodels/placemodel.py
dsblank Nov 1, 2024
641a05b
Leaving metadata as pickle for now
dsblank Nov 1, 2024
ac5db94
Renamed struct to dict
dsblank Nov 1, 2024
f834b50
Fix typo in from_dict docstring
dsblank Nov 1, 2024
5c8c643
Update gramps/gen/lib/serialize.py
dsblank Nov 2, 2024
7381347
Apply suggestions from code review
dsblank Nov 2, 2024
0e55d49
Removed COLUMN_ constants
dsblank Nov 4, 2024
9e6ea7e
Removed COLUMN_ constants
dsblank Nov 4, 2024
c3c3a03
Fixed broken test: couldn't replicate, so went with new results
dsblank Nov 4, 2024
e1a65fa
Fixed broken test: couldn't replicate, so went with new results
dsblank Nov 4, 2024
707a74b
Migrated metadata to JSON
dsblank Nov 12, 2024
be15f1e
Needed to move the JSON access test around
dsblank Nov 12, 2024
be97ff4
Linting
dsblank Nov 12, 2024
cea7468
Refine BSDDB
dsblank Nov 12, 2024
6221aed
Just use one query to test for json_data
dsblank Nov 13, 2024
ab75096
Typo fix: suffic -> suffix
dsblank Nov 13, 2024
1512e07
Refactor supports_json_access
dsblank Nov 14, 2024
f95d3c0
Refactor supports_json_access; linting
dsblank Nov 14, 2024
28b18a1
Update gramps/plugins/db/dbapi/sqlite.py
dsblank Nov 14, 2024
c3d6798
Update gramps/gen/db/generic.py
dsblank Nov 14, 2024
45cf406
Update gramps/plugins/db/dbapi/sqlite.py
dsblank Nov 14, 2024
3176463
Update gramps/plugins/db/dbapi/sqlite.py
dsblank Nov 14, 2024
91a242d
Change naming for json data functions
dsblank Nov 14, 2024
02c6115
Regular bug fix: citation date error
dsblank Nov 15, 2024
4e3fb7e
Added logging to serialize
dsblank Nov 15, 2024
6ab1985
Add protections on logging
dsblank Nov 15, 2024
995d698
Update gramps/gen/db/generic.py
dsblank Nov 16, 2024
3cfa92e
First pass at new conversion
dsblank Nov 18, 2024
6d5f058
Fixes after mass conversion
dsblank Nov 18, 2024
647fd1e
Don't eat exception
dsblank Nov 18, 2024
915240b
Put entire set of changes in one transaction
dsblank Nov 18, 2024
2d2ab43
Added a couple of missing defaults: family.complete, media.thumb
dsblank Nov 18, 2024
ebbe6cf
A manual test script for validating conversion
dsblank Nov 18, 2024
abf6608
Rename
dsblank Nov 18, 2024
4915d83
table -> database
dsblank Nov 18, 2024
ba5fb5d
Moved conversion tools around a bit
dsblank Nov 19, 2024
a55b0ef
Moved conversion tools around a bit
dsblank Nov 19, 2024
73 changes: 58 additions & 15 deletions gramps/gen/db/generic.py
@@ -3,6 +3,7 @@
 #
 # Copyright (C) 2015-2016 Gramps Development Team
 # Copyright (C) 2016 Nick Hall
+# Copyright (C) 2024 Doug Blank
 #
 # This program is free software; you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
@@ -57,6 +58,7 @@
     Source,
     Tag,
 )
+from ..lib.serialize import from_dict, BlobSerializer, JSONSerializer
 from ..lib.genderstats import GenderStats
 from ..lib.researcher import Researcher
 from ..updatecallback import UpdateCallback
@@ -228,7 +230,7 @@ def _undo(self, update_history):
         try:
             self.db._txn_begin()
             for record_id in subitems:
-                (key, trans_type, handle, old_data, _) = pickle.loads(
+                (key, trans_type, handle, old_data, x) = pickle.loads(
                     self.undodb[record_id]
                 )

@@ -389,7 +391,7 @@ class DbGeneric(DbWriteBase, DbReadBase, UpdateCallback, Callback):

     __callback_map = {}

-    VERSION = (20, 0, 0)
+    VERSION = (21, 0, 0)

     def __init__(self, directory=None):
         DbReadBase.__init__(self)
@@ -621,6 +623,18 @@ def _initialize(self, directory, username, password):
         """
         raise NotImplementedError

+    def upgrade_table_for_json_data(self, table_name):
+        """
+        Overload this method to add JSON access
+        """
+        raise NotImplementedError
+
+    def use_json_data(self):
+        """
+        Overload this method to check if the database stores objects in JSON format
+        """
+        raise NotImplementedError
+
     def __check_readonly(self, name):
         """
         Return True if we don't have read/write access to the database,
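Both hooks are abstract here, so each backend has to supply its own version. The following is a rough sketch of how a SQLite backend might answer them; it is not the code from gramps/plugins/db/dbapi/sqlite.py. The free functions, the `conn` and `table` parameters, the `person` table and the `json_data TEXT` column type are all assumptions made for illustration; only the two function names mirror the diff.

```python
import sqlite3


def use_json_data(conn: sqlite3.Connection, table: str = "person") -> bool:
    """Hypothetical check: does `table` already have a json_data column?"""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    return "json_data" in columns


def upgrade_table_for_json_data(conn: sqlite3.Connection, table: str) -> None:
    """Hypothetical upgrade: add a json_data TEXT column only if it is missing."""
    if not use_json_data(conn, table):
        # Table names cannot be bound as SQL parameters, so `table` must be trusted.
        conn.execute(f"ALTER TABLE {table} ADD COLUMN json_data TEXT")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (handle TEXT PRIMARY KEY, blob_data BLOB)")
    print(use_json_data(conn, "person"))  # False
    upgrade_table_for_json_data(conn, "person")
    print(use_json_data(conn, "person"))  # True
```

Checking the column list first keeps the upgrade idempotent: calling it a second time does not try to add the column again.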
@@ -669,8 +683,17 @@ def load(
         # run backend-specific code:
         self._initialize(directory, username, password)

+        need_to_set_version = False
         if not self._schema_exists():
             self._create_schema()
+            need_to_set_version = True
+
+        if self.use_json_data():
+            self.set_serializer("json")
+        else:
+            self.set_serializer("blob")
+
+        if need_to_set_version:
             self._set_metadata("version", str(self.VERSION[0]))

         # Load metadata
@@ -900,6 +923,13 @@ def transaction_begin(self, transaction):
         self.transaction = transaction
         return transaction

+    def _get_metadata_keys(self):
+        """
+        Get all of the metadata setting names from the
+        database.
+        """
+        raise NotImplementedError
+
     def _get_metadata(self, key, default=[]):
         """
         Get an item from the database.
@@ -1355,7 +1385,8 @@ def _get_from_handle(self, obj_key, obj_class, handle):
             raise HandleError("Handle is empty")
         data = self._get_raw_data(obj_key, handle)
         if data:
-            return obj_class.create(data)
+            return self.serializer.data_to_object(obj_class, data)
+
         raise HandleError(f"Handle {handle} not found")

     def get_event_from_handle(self, handle):
@@ -1396,39 +1427,39 @@ def get_tag_from_handle(self, handle):

     def get_person_from_gramps_id(self, gramps_id):
         data = self._get_raw_person_from_id_data(gramps_id)
-        return Person.create(data)
+        return self.serializer.data_to_object(Person, data)

     def get_family_from_gramps_id(self, gramps_id):
         data = self._get_raw_family_from_id_data(gramps_id)
-        return Family.create(data)
+        return self.serializer.data_to_object(Family, data)

     def get_citation_from_gramps_id(self, gramps_id):
         data = self._get_raw_citation_from_id_data(gramps_id)
-        return Citation.create(data)
+        return self.serializer.data_to_object(Citation, data)

     def get_source_from_gramps_id(self, gramps_id):
         data = self._get_raw_source_from_id_data(gramps_id)
-        return Source.create(data)
+        return self.serializer.data_to_object(Source, data)

     def get_event_from_gramps_id(self, gramps_id):
         data = self._get_raw_event_from_id_data(gramps_id)
-        return Event.create(data)
+        return self.serializer.data_to_object(Event, data)

     def get_media_from_gramps_id(self, gramps_id):
         data = self._get_raw_media_from_id_data(gramps_id)
-        return Media.create(data)
+        return self.serializer.data_to_object(Media, data)

     def get_place_from_gramps_id(self, gramps_id):
         data = self._get_raw_place_from_id_data(gramps_id)
-        return Place.create(data)
+        return self.serializer.data_to_object(Place, data)

     def get_repository_from_gramps_id(self, gramps_id):
         data = self._get_raw_repository_from_id_data(gramps_id)
-        return Repository.create(data)
+        return self.serializer.data_to_object(Repository, data)

     def get_note_from_gramps_id(self, gramps_id):
         data = self._get_raw_note_from_id_data(gramps_id)
-        return Note.create(data)
+        return self.serializer.data_to_object(Note, data)

     ################################################################
     #
@@ -1629,7 +1660,7 @@ def _iter_objects(self, class_):
         """
         cursor = self._get_table_func(class_.__name__, "cursor_func")
         for data in cursor():
-            yield class_.create(data[1])
+            yield self.serializer.data_to_object(class_, data[1])

     def iter_people(self):
         return self._iter_objects(Person)
@@ -1744,7 +1775,7 @@ def _iter_raw_place_tree_data(self):

     def _get_raw_data(self, obj_key, handle):
         """
-        Return raw (serialized and pickled) object from handle.
+        Return raw (serialized) object from handle.
         """
         raise NotImplementedError

@@ -1935,7 +1966,7 @@ def commit_person(self, person, transaction, change_time=None):
         old_data = self._commit_base(person, PERSON_KEY, transaction, change_time)

         if old_data:
-            old_person = Person(old_data)
+            old_person = from_dict(old_data)
             # Update gender statistics if necessary
             if old_person.gender != person.gender or (
                 old_person.primary_name.first_name != person.primary_name.first_name
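`commit_person` now rebuilds the previous version with `from_dict(old_data)` instead of `Person(old_data)`, which implies the stored dict carries enough information to choose its own class. A minimal sketch of that idea, assuming each dict records its class name under a `"_class"` key; the key name, the `CLASSES` registry, the `register` decorator and the toy `Person`/`Name` dataclasses are illustrative assumptions, not the contents of gramps/gen/lib/serialize.py.

```python
import json
from dataclasses import dataclass, field, fields

# Hypothetical registry: class name -> class, consulted by from_dict().
CLASSES = {}


def register(cls):
    """Class decorator that makes `cls` reconstructable by from_dict()."""
    CLASSES[cls.__name__] = cls
    return cls


@register
@dataclass
class Name:
    first_name: str = ""
    surname: str = ""


@register
@dataclass
class Person:
    handle: str = ""
    gender: str = "unknown"
    primary_name: Name = field(default_factory=Name)


def to_dict(obj):
    """Recursively turn registered objects into JSON-compatible dicts."""
    if type(obj).__name__ in CLASSES:
        data = {"_class": type(obj).__name__}
        for f in fields(obj):
            data[f.name] = to_dict(getattr(obj, f.name))
        return data
    if isinstance(obj, list):
        return [to_dict(item) for item in obj]
    return obj  # str, int, float, bool and None pass through unchanged


def from_dict(data):
    """Rebuild objects from to_dict() output using the "_class" tag."""
    if isinstance(data, dict) and "_class" in data:
        cls = CLASSES[data["_class"]]
        kwargs = {k: from_dict(v) for k, v in data.items() if k != "_class"}
        return cls(**kwargs)
    if isinstance(data, list):
        return [from_dict(item) for item in data]
    return data


if __name__ == "__main__":
    old = Person("H1", "female", Name("Ada", "Lovelace"))
    stored = json.dumps(to_dict(old))  # roughly what a json_data column could hold
    print(from_dict(json.loads(stored)) == old)  # True
```

Because the class tag travels with the data, the caller no longer needs to know which class to instantiate.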
@@ -2663,6 +2694,7 @@ def _gramps_upgrade(self, version, directory, callback=None):
             gramps_upgrade_18,
             gramps_upgrade_19,
             gramps_upgrade_20,
+            gramps_upgrade_21,
         )

         if version < 14:
@@ -2679,6 +2711,8 @@ def _gramps_upgrade(self, version, directory, callback=None):
             gramps_upgrade_19(self)
         if version < 20:
             gramps_upgrade_20(self)
+        if version < 21:
+            gramps_upgrade_21(self)

         self.rebuild_secondary(callback)
         self.reindex_reference_map(callback)
@@ -2694,3 +2728,12 @@ def get_schema_version(self):
     def set_schema_version(self, value):
         """set the current schema version"""
         self._set_metadata("version", str(value))
+
+    def set_serializer(self, serializer_name):
+        """
+        Set the serializer to 'blob' or 'json'
+        """
+        if serializer_name == "blob":
+            self.serializer = BlobSerializer
+        elif serializer_name == "json":
+            self.serializer = JSONSerializer
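`set_serializer` swaps the object that every `get_*_from_handle` and `get_*_from_gramps_id` lookup now goes through via `self.serializer.data_to_object(...)`. A minimal sketch of that strategy pattern, assuming a toy `ToyTag` class and a hypothetical `object_to_data` counterpart (only `data_to_object` appears in this diff). Here the serializer also does the pickling or JSON encoding itself, which may not match where Gramps does it; the real `BlobSerializer` and `JSONSerializer` may differ.

```python
import json
import pickle


class ToyTag:
    """Hypothetical stand-in for a Gramps object such as Tag."""

    def __init__(self, handle="", name=""):
        self.handle = handle
        self.name = name

    # Legacy style: a positional tuple that gets pickled into a blob.
    def serialize(self):
        return (self.handle, self.name)

    @classmethod
    def create(cls, data):
        return cls(*data)

    # JSON style: a self-describing dict.
    def to_dict(self):
        return {"handle": self.handle, "name": self.name}

    @classmethod
    def from_dict(cls, data):
        return cls(data["handle"], data["name"])


class BlobSerializer:
    """Stateless, so static methods let the class itself act as a singleton."""

    @staticmethod
    def object_to_data(obj):
        return pickle.dumps(obj.serialize())

    @staticmethod
    def data_to_object(obj_class, data):
        return obj_class.create(pickle.loads(data))


class JSONSerializer:
    @staticmethod
    def object_to_data(obj):
        return json.dumps(obj.to_dict())

    @staticmethod
    def data_to_object(obj_class, data):
        return obj_class.from_dict(json.loads(data))


SERIALIZERS = {"blob": BlobSerializer, "json": JSONSerializer}


def get_serializer(name):
    """Look up a serializer by name, failing loudly on unknown names."""
    try:
        return SERIALIZERS[name]
    except KeyError:
        raise ValueError(f"unknown serializer: {name!r}") from None
```

One design choice worth weighing: `set_serializer` as written leaves `self.serializer` unchanged when given an unrecognized name, whereas a lookup table that raises `ValueError` turns a typo into an immediate error.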
48 changes: 48 additions & 0 deletions gramps/gen/db/upgrade.py
@@ -3,6 +3,7 @@
 #
 # Copyright (C) 2020-2016 Gramps Development Team
 # Copyright (C) 2020 Paul Culley
+# Copyright (C) 2024 Doug Blank
 #
 # This program is free software; you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
@@ -35,6 +36,8 @@
 #
 # ------------------------------------------------------------------------
 from gramps.cli.clidbman import NAME_FILE
+from gramps.gen.db.dbconst import CLASS_TO_KEY_MAP
+from gramps.gen.lib.serialize import to_dict
 from gramps.gen.lib import EventType, NameOriginType, Tag, MarkerType
 from gramps.gen.utils.file import create_checksum
 from gramps.gen.utils.id import create_id
@@ -58,6 +61,51 @@
 LOG = logging.getLogger(".upgrade")


+def gramps_upgrade_21(self):
+    """
+    Add json_data field to tables.
+    """
+    length = 0
+    for key in self._get_table_func():
+        count_func = self._get_table_func(key, "count_func")
+        length += count_func()
+
+    self.set_total(length)
+
+    # First, do metadata:
+    self.upgrade_table_for_json_data("metadata")
+    keys = self._get_metadata_keys()
+    for key in keys:
+        self.set_serializer("blob")
+        value = self._get_metadata(key, "not-found")
+        if value != "not-found":
+            self.set_serializer("json")
+            self._set_metadata(key, value)
+
+    self._txn_begin()
+    for table_name in self._get_table_func():
+        # For each table, alter the database in an appropriate way:
+        self.upgrade_table_for_json_data(table_name.lower())
+
+        get_obj_from_handle = self._get_table_func(table_name, "handle_func")
+        get_handles = self._get_table_func(table_name, "handles_func")
+        commit_func = self._get_table_func(table_name, "commit_func")
+        key = CLASS_TO_KEY_MAP[table_name]
+        for handle in get_handles():
+            # Initially, serializer must be set to blob:
+            self.set_serializer("blob")
+            obj = get_obj_from_handle(handle)
+            # Force the save in json_data:
+            raw = to_dict(obj)
+            self.set_serializer("json")
+            self._commit_raw(raw, key)
+            self.update()
+
+    self._txn_commit()
+    # Bump up database version. Separate transaction to save metadata.
+    self._set_metadata("version", 21)
+
+
 def gramps_upgrade_20(self):
     """
     Placeholder update.
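`gramps_upgrade_21` re-reads every object through the blob serializer and writes it back through the JSON serializer inside one transaction. The same idea at the SQL level, as a hedged sketch: the table layout (`handle`, `blob_data` and `json_data` columns), the `migrate_table_to_json` helper and its `to_dict` callback are hypothetical, and the real upgrade works through the `DbGeneric` API rather than raw SQL.

```python
import json
import pickle
import sqlite3


def migrate_table_to_json(conn, table, to_dict):
    """
    Hypothetical sketch: fill json_data from the pickled blob_data column.

    `to_dict` converts one unpickled legacy value into a JSON-compatible dict.
    Assumes columns `handle` and `blob_data` plus an existing `json_data`.
    """
    # Read everything first so we never update rows while iterating a cursor.
    rows = conn.execute(f"SELECT handle, blob_data FROM {table}").fetchall()
    # `with conn:` commits on success and rolls back if any row fails,
    # so the table is never left half-converted.
    with conn:
        for handle, blob in rows:
            json_text = json.dumps(to_dict(pickle.loads(blob)))
            conn.execute(
                f"UPDATE {table} SET json_data = ? WHERE handle = ?",
                (json_text, handle),
            )
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE person (handle TEXT PRIMARY KEY, blob_data BLOB, json_data TEXT)"
    )
    conn.execute(
        "INSERT INTO person (handle, blob_data) VALUES (?, ?)",
        ("H1", pickle.dumps(("H1", "Ada"))),
    )
    conn.commit()
    migrate_table_to_json(conn, "person", lambda t: {"handle": t[0], "name": t[1]})
    print(conn.execute("SELECT json_data FROM person").fetchone()[0])
```

Wrapping the loop in a single transaction mirrors the intent of the "Put entire set of changes in one transaction" commit: a failure on any row leaves the table as it was.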