Data access description example #170

Merged: 6 commits, Apr 16, 2024

Conversation

mih (Contributor) commented Apr 15, 2024

Major changes are:

  • more useful and accurate `distribution` schema description
  • more complete `DataService` class
  • new `Parameter` concept
  • examples that document the capabilities

While most changes more or less reflect the continued adoption of `DCAT`
concepts and properties, the introduction of `Parameter` is noteworthy.

`Parameter` is a variant of `Property` and serves a similar purpose
(declaring arbitrary additional aspects without prescribing a vocabulary
to do so); only the semantics of the class itself change. In
contrast to a `Property` (observed or measured, fixed), a `Parameter` is a
variable that has an impact on a system or function.

Closes #171

Via the property `has_parameter`, particular parameters can be declared
as supported/needed (e.g., by a `DataService`) or as provided
(by a `QualifiedAccess`). Examples are included.

`QualifiedAccess` is no longer derived from `EntityInfluence` -- that
was too much of a stretch. It is now focused on access and no longer
requires a role specification.
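
To illustrate the declared-versus-provided distinction, here is a minimal sketch that checks whether the parameters provided by a `QualifiedAccess` cover those declared by a `DataService`. The Python classes, the `name`/`value` fields, the parameter names, and `missing_parameters` are hypothetical stand-ins for illustration; they are not the schema's actual definitions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Parameter:
    # Hypothetical minimal shape: a name and, optionally, a value.
    name: str
    value: Optional[str] = None


@dataclass
class DataService:
    # Parameters the service declares as supported/needed (no values).
    has_parameter: List[Parameter] = field(default_factory=list)


@dataclass
class QualifiedAccess:
    # Parameters provided for one concrete access (with values).
    has_parameter: List[Parameter] = field(default_factory=list)


def missing_parameters(service: DataService, access: QualifiedAccess) -> List[str]:
    """Return names the service declares but the access does not provide."""
    provided = {p.name for p in access.has_parameter if p.value is not None}
    return [p.name for p in service.has_parameter if p.name not in provided]


# Illustrative parameter names only.
service = DataService([Parameter("project"), Parameter("resource"), Parameter("path")])
access = QualifiedAccess([Parameter("project", "p1"), Parameter("path", "a.txt")])
print(missing_parameters(service, access))
```

A consumer could run such a check before attempting a download, so that an incomplete access record is reported early rather than as a failed request.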

Closes #156

This is using a fictitious installation of the RWTH RDM system Coscine
as an example. Most needed aspects can be mapped onto properties that
are provided by DCAT. Missing pieces are:

1. Specification of full details of how to perform an actual download request
2. Required authentication

Specifically, `DCAT:endpointURL` is just the "root location or primary
endpoint of the service", but most APIs offer more than a download API
and hence need some further specification. Coscine uses a particular URL
path for its "get blob" API, which needs to be parameterized with three
parameters. These parameters need to be put into a qualified relation
(Distribution-DataService).
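
A minimal sketch of how a consumer might combine these pieces into a request URL. It assumes that the path carries `str.format`-style placeholders, which is only one possible convention. The host, the path layout, and the three parameter names are made up for illustration and are not Coscine's real API.

```python
from urllib.parse import quote


def build_download_url(endpoint_url, endpoint_path, access_parameters):
    """Join a service root, a parameterized path, and per-file parameters."""
    # Percent-encode each value so it is safe inside a single path segment.
    encoded = {k: quote(str(v), safe="") for k, v in access_parameters.items()}
    path = endpoint_path.format(**encoded)
    # Join root and path with exactly one slash.
    return endpoint_url.rstrip("/") + "/" + path.lstrip("/")


url = build_download_url(
    # From the DataService (hypothetical values).
    "https://coscine.example.org/api/",
    "/projects/{project}/resources/{resource}/blobs/{path}",
    # From the qualified relation of one Distribution (hypothetical values).
    {"project": "p1", "resource": "r1", "path": "data/file 1.txt"},
)
print(url)
```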

The approach via `endpoint_path` and `access_parameter` is arbitrary. I
could not find a common pattern or vocabulary to do this. An equally
good (from my POV) alternative would be:

- `DataService.download_url_template` -- with some convention to put
  placeholders into a download URL
- `QualifiedAccess.download_url_parameters` with a list of name/value
  pairs
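
A sketch of how this alternative could work, assuming `str.format`-style placeholders as the template convention (the convention itself is left open above). With that assumption, the parameters a service needs can be read off the template, and a list of name/value pairs can be validated before filling it in. The example URL and names are hypothetical.

```python
from string import Formatter


def placeholders(template):
    """Names of the placeholders used in a str.format-style template."""
    return [name for _, name, _, _ in Formatter().parse(template) if name]


def fill_template(template, pairs):
    """Fill a template from a list of {'name': ..., 'value': ...} pairs."""
    values = {p["name"]: p["value"] for p in pairs}
    missing = [n for n in placeholders(template) if n not in values]
    if missing:
        raise KeyError(f"missing parameters: {missing}")
    return template.format(**values)


# Hypothetical DataService.download_url_template
template = "https://rdm.example.org/blob/{project}/{resource}?path={path}"
# Hypothetical QualifiedAccess.download_url_parameters
pairs = [
    {"name": "project", "value": "p1"},
    {"name": "resource", "value": "r1"},
    {"name": "path", "value": "a.txt"},
]
print(placeholders(template))
print(fill_template(template, pairs))
```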

Yet another alternative would be to encode all three shown parameters
into a single "identifier", and call that `access_id` (matching what
the annex access example is doing). The disadvantage of that would be
that the knowledge of the individual parameters that make up the
compound identifier is lost. If preserved, however, a single-point fix
of `endpoint_path` could fix up data access in case of an API version
update that does not change the meaning of the identifiers.
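
The trade-off can be sketched as follows. The parameter names, both path layouts, and the way the compound identifier is joined are hypothetical; the point is only that separately stored parameters survive a change of `endpoint_path`, while a compound identifier keeps its structure implicit.

```python
def urls_for(endpoint_path, records):
    """Build one URL path per record from a str.format-style path template."""
    return [endpoint_path.format(**r) for r in records]


# Per-file parameters, stored individually (illustrative values).
records = [
    {"project": "p1", "resource": "r1", "path": "a.txt"},
    {"project": "p1", "resource": "r2", "path": "b.txt"},
]

# An API version update only requires changing the single path template.
old = urls_for("v1/blob/{project}/{resource}/{path}", records)
new = urls_for("v2/projects/{project}/resources/{resource}/files/{path}", records)

# A compound access_id carries the same values, but which part means what
# is no longer recorded anywhere in the data.
compound_ids = ["/".join(r.values()) for r in records]
print(old, new, compound_ids)
```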

Theoretically, we would also need to add information on authentication,
but this is also a vast space of possibilities. A system implementing a
download based on this schema is at least given a `contact_point` and a
data service description, so it can present a user with a meaningful
request to enter a credential.
mslw commented Apr 15, 2024

My initial instinct is that DCAT:endpointURL is a (general) property of a Data Service which provides the entire dataset, and so would be endpoint_path / download_url_template. On the other hand, access_parameter / download_url_parameters is a Distribution property which we would assign to a file (blob). Put together, they would form a DCAT:downloadURL.

I agree with your description of the pros/cons of having a single identifier (access_id) vs. a list of parameters. Making the template & parameters explicit sounds appealing in the context of URL updates (as long as the service operates according to such a template), and reminds me of the uncurl remote. I do wonder, though, whether to also include DCAT:downloadURL (for straightforward access), or to leave construction of such a URL to any implementation that wants to use the data model (easier to account for template updates).

Whether it's named endpoint_path & access_parameter or download_url_template & download_url_parameters makes little difference to me. I'd lean slightly towards the latter, because it suggests that template formatting is involved, and it relates to the DCAT:downloadURL.

@jsheunis (Contributor)

Have you seen https://www.hydra-cg.com/spec/latest/core/ ? I came across it several times in my search for schema-driven UI generation.

It has several terms related to some aspects of this PR, such as templated links and discovering an API endpoint. I haven't inspected it in detail, though, so not sure if it is actually practically useful. Perhaps worth a quick scan though.

mih added 2 commits April 16, 2024 10:35
The (more or less arbitrary) definition is

- a property is something that is observed or measured, something that
  is a fact, given a concrete thing
- a characteristic is the underlying, more generic concept

This is done to be able to distinguish a `Parameter` from a `Property`
later on (or in another schema). A `Parameter` being a more
causal/functional concept, something that is variable and impacts
a system or function.

The reason to base both concepts on the same foundation is the
similarity with respect to the schema design principles. We need to be
able to express parameters without prescribing a particular vocabulary.
Employing the same approach twice in identical fashion (only changing
the semantics) avoids needless complexity.
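
A minimal sketch of this design principle, using Python classes as a stand-in for the schema. The field names are hypothetical; the point is that `Property` and `Parameter` share one structure and differ only in what the class means.

```python
from dataclasses import dataclass, fields


@dataclass
class Characteristic:
    # The underlying, more generic concept (hypothetical fields).
    name: str
    value: str


class Property(Characteristic):
    """Something observed or measured; a fact about a concrete thing."""


class Parameter(Characteristic):
    """Something variable that impacts a system or function."""


# Identical structure...
same_fields = [f.name for f in fields(Property)] == [f.name for f in fields(Parameter)]
# ...but the class carries the semantics, so equal content is not the same thing.
differ = Property("size", "10") != Parameter("size", "10")
print(same_fields, differ)
```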
So far it was only in `Distribution`, but we also need it for services
and processes, etc.
@mih mih force-pushed the ddist branch 4 times, most recently from da0483a to 455d2f3 Compare April 16, 2024 12:30
Also prune some prefixes that are not strictly needed.
@mih mih merged commit f2a6334 into main Apr 16, 2024
3 checks passed
@mih mih deleted the ddist branch April 16, 2024 13:48
Successfully merging this pull request may close these issues.

  • Add the concept of a Parameter
  • Decide on support of qualified_access property
3 participants