Skip to content

Latest commit

 

History

History
169 lines (136 loc) · 7.51 KB

data-connection.md

File metadata and controls

169 lines (136 loc) · 7.51 KB

Data connection

Data connection is used for accessing data in a particular data source. Connection parameters depend on the data source. Typically, you would need to provide server name and login credentials.

Connectors

A connector could work with a database, an Excel file, a CSV file, a web service, or basically anything that is capable of providing the data. We currently support over 20 different connectors, and the list is quickly growing. Most of our data connectors are open-sourced and could be found on GitHub (MIT license).

Name Type
Access JDBC
Athena JDBC
BigQuery JDBC
Cassandra JDBC
DB2 JDBC
Denodo JDBC
DropBox Files
Files Files
Firebird JDBC
Git Files
GoogleCloud Files
HBase JDBC
Hive JDBC
Hive2 JDBC
Impala JDBC
MariaDB JDBC
MongoDB JDBC
MS SQL JDBC
MySql JDBC
Neo4j JDBC
OData
Oracle JDBC
Postgres JDBC
Redshift JDBC
S3 Files
Snowflake JDBC
Socrata
Sparql
SQLite JDBC
Teradata JDBC
Twitter
Vertica JDBC
Virtuoso JDBC
Web

Creating a connection

To create a new data connection, open the "Databases" pane (Open | Databases), right-click on the appropriate connector in the tree, and choose "Add connection...". Alternatively, click on "New Connection" under the "Actions" tab, and select the appropriate connector.

Editing properties

Then, edit the connection attributes, and click on TEST to confirm that you've entered everything correctly. The set of attributes you can edit depends on the connector. Typically, for JDBC-based connectors you can provide a custom connection string (but do not enter login and password there, they will still be picked up from the corresponding fields).

Once a connection is set up, you are ready to start creating queries. There are multiple ways to do so: manually or programmatically.

Note that the platform supports caching of results.

Access control

Connections are first-class entities in the Datagrok platform, and as such are subjects to the standard checks and routines performed against them whenever they are used in the specific context. Some of the most popular privileges are: view, edit, delete, and share. Those privileges can be given to individual users, or to groups. For more information on the access privilege model, check out privileges.

Another “out of the box” feature that comes with connections being first-class entity is the audit trail for every action performed against the connection. For details on that, check out Audit page.

Caching

You can force the platform to cache results of executing queries (taking into account parameters as well). This is useful when a query executes slowly and returns relatively small results. A popular use case is caching the values that are used for building the UI automatically (typically this is some form of select distinct <name> from <table>) is a good idea.

You can turn caching on for either the whole connection, or for a particular query. For a connection, open its properties, and check the "Cache Results" checkbox. When you check it, the "Invalidate On" input becomes visible. Enter the cron expression there to define cache invalidation timepoints (for instance, if a database refreshes overnight and you want to invalidate it at 1am each night, enter 0 0 1 * * ?). Leaving the field blank will make the cache stay forever.

If a connection gets created automatically as part of the package, you can specify the cacheResults parameter in the connection json definition:

{
  "name": "Northwind",
  "parameters": {
    "server": "dev.datagrok.ai",
    "port": 23306,
    "db": "Northwind",
    "cacheSchema": false,
    "cacheResults": true,
    "ssl": false,
    "connString": ""
  }
}

To cache results of individual queries, edit the query (either via Datagrok UI if a query already exists, or by editing the corresponding .sql file of the package queries) and specify the meta.cache and meta.invalidate fields:

--name: getProductNames
--input: string department
--meta.cache: true
--meta.invalidate: 0 0 1 * * ?
select distinct name from products p 
where p.department = @department

Filtering

You can use these fields to filter connections with smart search:

Field Description
id
name
server
port
db
login
dataSource
description
createdOn
updatedOn
author User object
starredBy User object
commentedBy User object
usedBy User object

JDBC connection

For some cases connection may require custom JDBC connection string. For this case, JDBC-based data connection has parameter "Conn. string". If filled, it will be used for connection, and all other parameters will be ignored except for "Login" and "Password".

Videos

Data connection

See also: