About • Key Features • How it works • Usage • Installation • Demo • Reproducibility
NextiaDI, from nextia in the Nahuatl language (the old Aztec language), is an incremental and source-agnostic Data Integration (DI) tool that facilitates generating schemata for heterogeneous data sources and integrating them. NextiaDI currently generates a graph-based schema for JSON and CSV sources, and we are working to support more. We also provide incremental schema integration, which annotates how the integration is performed. We aim to automate the integration of heterogeneous data sources as much as possible.
Here, we provide detailed information on how to run and evaluate NextiaDI. To learn more about the project, visit our website.
- Extraction of schemata leveraging the structure of schemaless data sources
- Standardization of the extracted schemata into the RDFS graph data model
- Annotation-based schema integration for RDF graphs describing unions and joins
- Automated derivation of DI constructs for specific querying systems (i.e., source schemata, schema mappings, and target schema)
We encourage you to read our paper to better understand what NextiaDI is and how it can fit your scenarios.
- Java 11
- org.glassfish.javax.json 1.1.4
The easiest way to use NextiaDI is through a build tool such as Gradle or Maven.
For Gradle, add the following dependency to your build.gradle:
implementation 'edu.upc.essi.dtim:nextiadi:0.1.0'
For Apache Maven, add the following dependency to your pom.xml:
<dependency>
<groupId>edu.upc.essi.dtim</groupId>
<artifactId>nextiadi</artifactId>
<version>0.1.0</version>
</dependency>
For more ways to import it, please go here
Depending on the intent, we import different classes. NextiaDI has two main features: schema extraction and standardization (bootstrapping), and schema integration. The following sections show which class to import for each feature.
We provide two bootstrapping methods: JSON and CSV. We plan to add more in the future.
To bootstrap a JSON file, we need to import the class:
import edu.upc.essi.dtim.nextiadi.bootstraping.JSONBootstrap;
Then, to start the bootstrapping, we create an instance of the class JSONBootstrap as follows:
JSONBootstrap b = new JSONBootstrap();
Using this instance, namely b, we call the method bootstrapSchema(<data source name>, <path to the data source>). This method returns a Jena model containing the schema represented as triples. An example of how to use this method is:
String path = "/home/datasources/sales.json";
Model schema_graph_based = b.bootstrapSchema("data source name", path);
To bootstrap a CSV file, we need to import the class:
import edu.upc.essi.dtim.nextiadi.bootstraping.CSVBootstrap;
Then, to start the bootstrapping, we create an instance of the class CSVBootstrap as follows:
CSVBootstrap b = new CSVBootstrap();
Using this instance, namely b, we call the method bootstrapSchema(<data source name>, <path to the data source>). This method returns a Jena model containing the schema represented as triples. An example of how to use this method is:
String path = "/home/datasources/sales.csv";
Model schema_graph_based = b.bootstrapSchema("data source name", path);
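To give an intuition of what a graph-based schema for a CSV file could look like, the following is a minimal, standard-library-only sketch. It is not NextiaDI's implementation: the class CsvSchemaSketch, the method headerToTriples, and the http://example.org/ namespace are assumptions made up for illustration, and the actual triples produced by bootstrapSchema may differ. The sketch maps the data source to an RDFS class and each column of the header to a property whose domain is that class.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: NOT NextiaDI's implementation. */
public class CsvSchemaSketch {

    // Hypothetical namespace used only for this example.
    static final String BASE = "http://example.org/";

    /**
     * Turns a CSV header line into RDFS-style triples, written as plain strings.
     * Assumption: columns are separated by commas and contain no quoted commas.
     */
    static List<String> headerToTriples(String sourceName, String headerLine) {
        String classIri = "<" + BASE + sourceName + ">";
        List<String> triples = new ArrayList<>();
        // The data source itself becomes a class.
        triples.add(classIri + " rdf:type rdfs:Class .");
        for (String column : headerLine.split(",")) {
            String name = column.trim();
            if (name.isEmpty()) {
                continue; // skip empty header cells
            }
            // Each column becomes a property attached to the source class.
            String propIri = "<" + BASE + sourceName + "/" + name + ">";
            triples.add(propIri + " rdf:type rdf:Property .");
            triples.add(propIri + " rdfs:domain " + classIri + " .");
        }
        return triples;
    }

    public static void main(String[] args) {
        headerToTriples("sales", "id, product, amount").forEach(System.out::println);
    }
}
```

In practice the Jena model returned by bootstrapSchema should be used instead; the sketch only shows the kind of structure that is extracted from the header.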
To perform schema integration, we import the class:
import edu.upc.essi.dtim.NextiaDI;
Before integrating schemas, we need to read the RDF graph using Jena. We can do it as follows:
String pathA = "/somepath";
Model schemaA = RDFDataMgr.loadModel( pathA );
String pathB = "/somepath";
Model schemaB = RDFDataMgr.loadModel( pathB );
We also need a list of alignments, which will be used in the integration:
// requires: import edu.upc.essi.dtim.nextiadi.models.Alignment;
List<Alignment> alignments = new ArrayList<>();
Alignment a = new Alignment();
a.setIriA("some resource of A");
a.setIriB("some resource of B");
a.setL("some label for the integrated resource");
a.setType("class"); // one of "class", "datatype", or "object"
alignments.add(a);
Given the models schemaA and schemaB and the list of alignments alignments, we integrate them by creating an instance of NextiaDI and calling the method Integrate, which returns a model with the integrated annotations.
NextiaDI n = new NextiaDI();
Model integrated = n.Integrate(schemaA, schemaB, alignments);
If we wish to get a fully merged schema, we need to call the method getMinimalGraph() after the integration, as follows:
Model minimal = n.getMinimalGraph();
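The difference between the annotated result and the fully merged schema can be illustrated with a small, standard-library-only sketch. This is a conceptual illustration under stated assumptions, not NextiaDI's algorithm: the class IntegrationSketch, its methods, the http://example.org/integrated/ namespace, and the use of rdfs:subClassOf as the linking annotation are all hypothetical choices for this example. Triples are plain strings whose tokens are separated by single spaces.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Conceptual illustration only: NOT NextiaDI's algorithm. */
public class IntegrationSketch {

    // Hypothetical IRI for the integrated resource; not a NextiaDI naming rule.
    static String integratedIri(String label) {
        return "<http://example.org/integrated/" + label + ">";
    }

    /** Annotated view: keep both schemas and link the aligned classes to a new integrated class. */
    static Set<String> annotate(Set<String> a, Set<String> b,
                                String iriA, String iriB, String label) {
        Set<String> out = new LinkedHashSet<>(a);
        out.addAll(b);
        String integrated = integratedIri(label);
        // Assumption: the annotation is expressed with rdfs:subClassOf.
        out.add(iriA + " rdfs:subClassOf " + integrated + " .");
        out.add(iriB + " rdfs:subClassOf " + integrated + " .");
        return out;
    }

    /** Merged view: replace both aligned IRIs by the integrated one, so duplicate triples collapse. */
    static Set<String> merge(Set<String> a, Set<String> b,
                             String iriA, String iriB, String label) {
        String integrated = integratedIri(label);
        Set<String> out = new LinkedHashSet<>();
        for (Set<String> graph : List.of(a, b)) {
            for (String triple : graph) {
                String[] tokens = triple.split(" ");
                for (int i = 0; i < tokens.length; i++) {
                    if (tokens[i].equals(iriA) || tokens[i].equals(iriB)) {
                        tokens[i] = integrated;
                    }
                }
                out.add(String.join(" ", tokens));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> a = Set.of("<a:Sale> rdf:type rdfs:Class .");
        Set<String> b = Set.of("<b:Order> rdf:type rdfs:Class .");
        annotate(a, b, "<a:Sale>", "<b:Order>", "Sale").forEach(System.out::println);
        merge(a, b, "<a:Sale>", "<b:Order>", "Sale").forEach(System.out::println);
    }
}
```

The annotated view keeps the source schemata traceable, while the merged view is smaller and hides where each resource came from. This mirrors, loosely, the distinction between the model returned by Integrate and the one returned by getMinimalGraph().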
We performed different experiments to evaluate the predictive performance and efficiency of NextiaDI. In the spirit of open research and experimental reproducibility, we provide detailed information on how to reproduce them. More information about it can be found here.