% -----------------------------------------------------------------------------
\section{SBOL RDF Serialization}
\label{sec:serialization}
% -----------------------------------------------------------------------------
For SBOL objects to be readily stored and exchanged, they must be {\em serializable}, i.e., convertible to a sequence of bytes that can be stored in a file or exchanged over a network. The serialization format for SBOL is designed to meet several competing requirements.
First, SBOL needs to support ad-hoc annotations and extensions.
Second, SBOL needs to support processing by general database and semantic web software tools that have little or no knowledge of the SBOL data model.
Finally, it ought to be relatively simple to write a new software implementation, so that SBOL can be readily used even in software environments where community-maintained implementations are not available.
To meet these goals, the canonical serialization of SBOL is a strict dialect of RDF/XML~\cite{rdfxml}, a syntax standard defined for Semantic Web data exchange.
This serialization provides a standard base from which to meet further requirements.
Moreover, it allows any RDF/XML-aware software tool to consume and operate on an SBOL file without needing any customization to support SBOL. Where possible, we have re-used predicates from widely-used terminologies (such as Dublin Core~\cite{dcmi2012}) to expose as much of the data as practical to such standard RDF tooling.
Arbitrary RDF/XML, however, allows a great deal of flexibility in how equivalent data can be serialized, which is sometimes problematic. When RDF/XML files are processed with standard off-the-shelf XML tools, such as DOM-OO mappings, this flexibility means that the same data can be encountered in structurally different serializations.
To address this issue, we define a canonical association between the nesting of data structures within the SBOL UML data model and the RDF/XML file.
For all ownership associations (filled diamonds), the RDF/XML for the owned entity is embedded within the owner's RDF/XML (note, however, that the property values MAY be listed in any order).
For all associations that are by reference (open diamonds), the RDF/XML for the referenced property is linked via a resource \sbol{URI}.
For example, the serialization of a \sbol{ComponentDefinition} embeds the serializations of the \sbol{SequenceConstraint} and \sbol{Component} objects associated with it. Those \sbol{SequenceConstraint} objects, however, link to the \sbol{Component} objects with a \sbol{URI} rather than embedding another copy.
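
As a non-normative sketch of this nesting rule, the fragment below shows a \sbol{ComponentDefinition} that embeds two \sbol{Component} objects and one \sbol{SequenceConstraint}, while the \sbol{SequenceConstraint} points at the \sbol{Component} objects by \sbol{URI}. All \url{http://example.org/} URIs are hypothetical, the enclosing \external{rdf:RDF} element with its namespace declarations is assumed, and properties that do not bear on nesting are omitted.
\lstsetsbol
\begin{lstlisting}
<sbol:ComponentDefinition rdf:about="http://example.org/device">
  <sbol:type rdf:resource="http://www.biopax.org/release/biopax-level3.owl#DnaRegion"/>
  <!-- Owned objects (filled diamonds) are embedded inside their owner -->
  <sbol:component>
    <sbol:Component rdf:about="http://example.org/device/promoter">
      <sbol:access rdf:resource="http://sbols.org/v2#public"/>
      <sbol:definition rdf:resource="http://example.org/promoter_def"/>
    </sbol:Component>
  </sbol:component>
  <sbol:component>
    <sbol:Component rdf:about="http://example.org/device/cds">
      <sbol:access rdf:resource="http://sbols.org/v2#public"/>
      <sbol:definition rdf:resource="http://example.org/cds_def"/>
    </sbol:Component>
  </sbol:component>
  <sbol:sequenceConstraint>
    <sbol:SequenceConstraint rdf:about="http://example.org/device/promoter_precedes_cds">
      <sbol:restriction rdf:resource="http://sbols.org/v2#precedes"/>
      <!-- Referenced objects (open diamonds) are linked by URI, not copied -->
      <sbol:subject rdf:resource="http://example.org/device/promoter"/>
      <sbol:object rdf:resource="http://example.org/device/cds"/>
    </sbol:SequenceConstraint>
  </sbol:sequenceConstraint>
</sbol:ComponentDefinition>
\end{lstlisting}
Because each \sbol{Component} appears exactly once, a DOM-based reader can locate it by a fixed path under its owner and resolve the \external{rdf:resource} links against the \external{rdf:about} values.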
Every SBOL document MUST be a valid RDF/XML document.
Accordingly, each SBOL document starts with an XML declaration that has its XML version set to ``1.0''. As shown in the example below, this declaration is then followed by an \external{rdf:RDF} XML element that includes the namespace declarations for RDF, Dublin Core, PROV, and SBOL. The SBOL namespace, which is \url{http://sbols.org/v2#}, is used to indicate which entities and properties in the SBOL document are defined by SBOL, and MUST NOT be used for any entities or properties not defined in this specification.
\lstsetsbol
\begin{lstlisting}
<?xml version="1.0" ?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:prov="http://www.w3.org/ns/prov#" xmlns:sbol="http://sbols.org/v2#">
...
</rdf:RDF>
\end{lstlisting}
All first-class SBOL data types (i.e., those enumerated in \ref{sec:model}) have an associated identifying \sbol{URI}. In the RDF, this is the resource \sbol{URI} used by instances of that type. For example, \sbol{ComponentDefinition} has the
type \sbol{URI} \external{sbol:ComponentDefinition}.
Properties and associations are then asserted as nested RDF/XML assertions.
\ref{sec:model} provides the serialization template and an example at the end of its description of each data type.
All of the data types that are \sbol{TopLevel} are so named because they always appear at the top-most level of the RDF/XML serialization. All other data types will always appear nested within their parent container and, ultimately, some \sbol{TopLevel} object.
For example, a \sbol{ComponentDefinition} is a \sbol{TopLevel} and is therefore listed at the top-most level of the RDF/XML serialization, where it contains its \sbol{SequenceConstraint} objects, since they are not \sbol{TopLevel}. Its \sbol{Sequence}, however, is also \sbol{TopLevel} and is therefore not nested within the \sbol{ComponentDefinition} but is instead linked via a \sbol{URI}.
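
A minimal, non-normative sketch of this top-level layout follows. The \sbol{ComponentDefinition} and its \sbol{Sequence} appear as siblings directly under \external{rdf:RDF}, and the former links to the latter by \sbol{URI}. The \url{http://example.org/} URIs and the four-base sequence are placeholders chosen for illustration, and unrelated properties and namespace declarations are omitted.
\lstsetsbol
\begin{lstlisting}
<?xml version="1.0" ?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:sbol="http://sbols.org/v2#">
  <!-- TopLevel object: a direct child of rdf:RDF -->
  <sbol:ComponentDefinition rdf:about="http://example.org/promoter_def">
    <sbol:type rdf:resource="http://www.biopax.org/release/biopax-level3.owl#DnaRegion"/>
    <!-- The Sequence is itself TopLevel, so it is linked rather than nested -->
    <sbol:sequence rdf:resource="http://example.org/promoter_seq"/>
  </sbol:ComponentDefinition>
  <!-- Second TopLevel object: a sibling of the ComponentDefinition -->
  <sbol:Sequence rdf:about="http://example.org/promoter_seq">
    <sbol:elements>atgc</sbol:elements>
    <sbol:encoding rdf:resource="http://www.chem.qmul.ac.uk/iubmb/misc/naseq.html"/>
  </sbol:Sequence>
</rdf:RDF>
\end{lstlisting}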
Each instance of a first-class SBOL data type MAY have annotations attached, as described in \ref{sec:Annotations}. These annotations are composed of a name and a value. They are serialized to RDF as a conceptual triple with the subject being the identity of the instance they annotate, the predicate being the name of the annotation, and the object being the value of that annotation. Annotation values are always nested within the RDF/XML serialization of the instance that they annotate.
For example, a \sbol{ModuleDefinition} might add a DOI annotation that links to the scientific article that first described the system that it represents.
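
One way such an annotation could look is sketched below. The \external{ex} prefix, its namespace, the predicate \external{ex:describedIn}, and the DOI are all hypothetical placeholders rather than terms defined by SBOL, and the required properties of the \sbol{ModuleDefinition} are omitted for brevity.
\lstsetsbol
\begin{lstlisting}
<sbol:ModuleDefinition rdf:about="http://example.org/toggle_switch"
                       xmlns:ex="http://example.org/annotations#">
  <!-- Conceptual triple:
       subject   = http://example.org/toggle_switch (the annotated instance)
       predicate = ex:describedIn                   (the annotation name)
       object    = the article URI                  (the annotation value) -->
  <ex:describedIn rdf:resource="https://doi.org/10.0000/placeholder"/>
</sbol:ModuleDefinition>
\end{lstlisting}
Note that the annotation lives in its own namespace, since the SBOL namespace is reserved for entities and properties defined in this specification.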
SBOL also supports top-level, user-defined data, again as described in \ref{sec:Annotations}. This is to allow non-standardized but necessary information to be carried around as part of a design. For example, a particular sub-community might have an internal standard for genetic device characterization data sheets.
Such data can be represented as a \sbol{GenericTopLevel} object with internal structured annotations.
For example, each individual data sheet might be contained in its own \sbol{GenericTopLevel} instance.
This custom data is serialized into the RDF/XML in the usual way, as an RDF/XML block at the top level of the file.
Other objects can refer to this entity through their annotations by reference, and this generic top-level entity can refer to other entities via references.
For example, a \sbol{ModuleDefinition} might use an annotation to refer to the data sheet \sbol{GenericTopLevel} that documents its properties.
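
The data sheet scenario might be serialized roughly as follows. This is a sketch under stated assumptions: we assume that the XML element name of a \sbol{GenericTopLevel} is taken from its user-defined RDF type, and the \external{ex} namespace, the names \external{ex:DataSheet}, \external{ex:transcriptionRate}, and \external{ex:dataSheet}, the value shown, and all \url{http://example.org/} URIs are invented for illustration. \ref{sec:Annotations} remains the authoritative description.
\lstsetsbol
\begin{lstlisting}
<?xml version="1.0" ?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:sbol="http://sbols.org/v2#" xmlns:ex="http://example.org/annotations#">
  <!-- User-defined top-level data: one data sheet per top-level block -->
  <ex:DataSheet rdf:about="http://example.org/datasheet1">
    <ex:transcriptionRate>0.5</ex:transcriptionRate>
    <!-- The generic top-level entity can itself refer to other entities -->
    <ex:characterizes rdf:resource="http://example.org/toggle_switch"/>
  </ex:DataSheet>
  <!-- An SBOL object refers to the data sheet through an annotation by reference -->
  <sbol:ModuleDefinition rdf:about="http://example.org/toggle_switch">
    <ex:dataSheet rdf:resource="http://example.org/datasheet1"/>
  </sbol:ModuleDefinition>
</rdf:RDF>
\end{lstlisting}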
\twozeroone{As previously stated, XML is the preferred RDF serialization for SBOL files, and these files should use the ``.xml'' file extension. While any RDF serialization is valid, since the contents of the triples and not their format determine validity, restricting files to XML makes it easier for humans to recognize a document as SBOL. Furthermore, documents can be examined by humans using XML formatting tools and syntax highlighters. It should be noted that XML was chosen for no reason other than its widespread use in the community and by existing libraries. The proposers are aware that many of these benefits apply to several serializations of SBOL. The choice of XML is simply a codification of the existing standard and practice.}
By adopting this paradigm of RDF/XML serialization, SBOL is able to adapt to future changes in the standard without requiring large-scale alterations to the RDF files. Since exactly the same scheme is used to serialize annotations as is used to serialize specification-defined properties and associations, it is possible to update the SBOL standard to recognize a different range of properties and associations. Those properties not recognized by the specification will always be available through the API as annotations. Similarly, by allowing arbitrary top-level entities in an SBOL file, we enable future specifications or extensions to ratify the structure of other top-level objects. These entities would then become part of the explicit data model, but the identical RDF serialization would be used. Applications lacking support for a given extension can safely read in, manipulate, and write out top-level data that they do not understand, treating it as a top-level structured annotation, without data loss or corruption. Finally, the strictly regimented control of nesting versus referencing makes the XML structure highly predictable, enabling XML/DOM-based tooling to work with SBOL RDF/XML files safely.