Creating, deleting and updating datasets #31
Comments
This is something I would love to have! Manually writing and updating TOML feels hackish and unreproducible at the moment. The |
Love the questions being asked here, but I would add another related to
Should data projects be made more transparent as well? While I know the functions
|
We have this — I guess it's just badly named:
Alternatively, we could make Current data REPL docs do mention this, and the
|
This works perfectly. My tired eyes / brain just looked right over it. Thanks for clarifying! |
We need some programmatic way to create datasets, to update their metadata and to delete them. Currently people need to manage this manually by writing TOML but clearly this isn't great.
API musings
One possibility is to overload the dataset() function itself with the ability to create a dataset, for example by adding a create=true flag:
Another idea would be to pass a verb along as a positional argument, such as
With :read being the default verb. This allows us to reuse the exported dataset() function for all dataset-related CRUD operations.
But let's be honest, this is a little weird other than being economical with exported names. Perhaps I've been doing too much REST recently :-) Probably a better alternative would be to just have a function per operation:
update() is a bit of an odd one out among these operations: what if you wanted to delete some metadata? I guess we could pass something like description=nothing for deleting metadata items.
Which data project?
When creating a dataset it needs to be created within "some" data project. Presumably this would be the topmost project in the data project stack, or within a provided project if the project is supplied as the first argument.
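To make the function-per-operation idea and the project-selection rule concrete, here is a minimal sketch against a toy in-memory model. Everything in it is hypothetical: ToyDataSets, ToyProject, PROJECT_STACK and the create/update/delete signatures are illustrative stand-ins, not existing DataSets.jl API. It assumes the first entry of the stack is the "topmost" project, and it leaves out writing the TOML file back to disk.

```julia
module ToyDataSets

# Toy stand-in for a data project: dataset name => metadata table.
# Hypothetical type; not part of DataSets.jl.
struct ToyProject
    datasets::Dict{String,Dict{String,Any}}
end
ToyProject() = ToyProject(Dict{String,Dict{String,Any}}())

# Toy stand-in for the data project stack (assumption: first entry is topmost).
const PROJECT_STACK = ToyProject[]

function default_project()
    isempty(PROJECT_STACK) && error("no data project is loaded")
    return first(PROJECT_STACK)
end

# Create a dataset entry; keyword arguments become metadata items.
function create(project::ToyProject, name::AbstractString; metadata...)
    haskey(project.datasets, name) && error("dataset \"$name\" already exists")
    project.datasets[name] = Dict{String,Any}(String(k) => v for (k, v) in metadata)
    return project.datasets[name]
end

# Update metadata; passing `key=nothing` removes that metadata item.
function update(project::ToyProject, name::AbstractString; metadata...)
    entry = project.datasets[name]
    for (k, v) in metadata
        if v === nothing
            delete!(entry, String(k))
        else
            entry[String(k)] = v
        end
    end
    return entry
end

# Remove the dataset entry from the project.
function delete(project::ToyProject, name::AbstractString)
    delete!(project.datasets, name)
    return nothing
end

# Without an explicit project, fall back to the topmost project in the stack.
create(name::AbstractString; metadata...) = create(default_project(), name; metadata...)
update(name::AbstractString; metadata...) = update(default_project(), name; metadata...)
delete(name::AbstractString) = delete(default_project(), name)

end # module

project = ToyDataSets.ToyProject()
push!(ToyDataSets.PROJECT_STACK, project)

ToyDataSets.create("iris"; description="Flower measurements")
ToyDataSets.update("iris"; description=nothing, tags=["example"])  # drops description
ToyDataSets.delete(project, "iris")                                # explicit project
```

One consequence of using nothing as the deletion marker is that nothing can no longer be stored as a metadata value, which may or may not matter for TOML-backed metadata.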
Data ownership
Creation — and especially deletion — brings up an additional problem: How do we distinguish between data which is "owned" by a data project (so that the data itself should be deleted when the dataset is removed from the project), vs data which is merely linked to?
For existing data referenced on the filesystem this is particularly relevant. We don't want datasets() to delete somebody's existing data which they're referring to. But neither do we want DataSets.delete() to leave unwanted data lying around.
I think we should have an extra metadata key to distinguish between data which is managed by DataSets and data which is merely linked to. Perhaps under the keys linked, or managed, or some such. (Should this go within the storage section or not?)
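As a rough illustration of the managed-vs-linked distinction, the sketch below deletes the underlying data only when the entry is explicitly marked as managed. It is an assumption-laden toy: remove_dataset! is a hypothetical helper, the managed key is placed inside the storage table purely for the sake of the example (the question of where it belongs is still open), and treating unmarked data as linked is just one possible default.

```julia
# Hypothetical helper; `managed` and `path` are illustrative key names only.
function remove_dataset!(datasets::Dict{String,Dict{String,Any}}, name::AbstractString)
    entry = pop!(datasets, name)          # always drop the entry from the project
    storage = entry["storage"]
    # Assumption: data is treated as linked unless explicitly marked managed.
    if get(storage, "managed", false)
        rm(storage["path"]; force=true, recursive=true)
    end
    return entry
end

dir = mktempdir()
owned = joinpath(dir, "owned.csv")
linked = joinpath(dir, "linked.csv")
write(owned, "a,b\n")
write(linked, "a,b\n")

datasets = Dict{String,Dict{String,Any}}(
    "owned" => Dict{String,Any}(
        "storage" => Dict{String,Any}("path" => owned, "managed" => true)),
    "linked" => Dict{String,Any}(
        "storage" => Dict{String,Any}("path" => linked)),
)

remove_dataset!(datasets, "owned")    # entry removed and file deleted
remove_dataset!(datasets, "linked")   # entry removed, file left in place
```

Defaulting to "linked" errs on the side of never deleting somebody's existing data, at the cost of possibly leaving managed data behind if the key is ever lost.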