# Azure Databricks - DQ Pipeline

Run a DQ check on any file in Azure Blob Storage.

## Read the file

Set the Azure storage account key in the Spark config, then read the Parquet file from Blob Storage. The `wasbs://` URL takes the form `container@account.blob.core.windows.net/path`; `CONTAINER_NAME` below is a placeholder for your container.

```scala
spark.conf.set(
  "fs.azure.account.key.abcCompany.blob.core.windows.net",
  "GBB6Upzj4AxQld7cFv7wBYNoJzIp/WEv/5NslqszY3nAAlsalBNQ==")

val df = spark.read.parquet(
  "wasbs://CONTAINER_NAME@abcCompany.blob.core.windows.net/FILE_NAME/20190201_FILE_NAME.parquet")
```
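
Hardcoding the account key in a notebook is fine for a demo, but in practice you would typically pull it from a Databricks secret scope instead. A minimal sketch; the scope and key names (`dq-secrets`, `abc-blob-key`) are hypothetical:

```scala
// Hypothetical secret scope and key names; dbutils is available in Databricks notebooks.
val accountKey = dbutils.secrets.get(scope = "dq-secrets", key = "abc-blob-key")
spark.conf.set("fs.azure.account.key.abcCompany.blob.core.windows.net", accountKey)
```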

## Process the file using Owl

```scala
// register the dataset in the Owl Catalog (optional)
val owl = new Owl(df).register

// run a full DQ check
owl.owlCheck()
```
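
A DQ scan over an empty or mis-read file can produce misleading results, so a quick Spark-only sanity check on `df` before calling `owlCheck()` can save a run (this is plain Spark, not Owl API):

```scala
// Confirm the Parquet file loaded as expected before scanning it.
df.printSchema()
println(s"rows read: ${df.count()}")
```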

## Additional imports and input options

```scala
import com.owl.core._
import com.owl.common._

val props = new Props()
props.dataset = "FILE_NAME"    // dataset name as registered in the Owl Catalog
props.runId = "2019-03-02"     // runId is a string, e.g. the run date
// props.<option> = ...        // see the many other input options
```
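
Putting the pieces together, here is a sketch of a daily pipeline that reads each day's file and runs one DQ check per `runId`. It reuses only the calls shown above; how `Props` is handed to `Owl` varies by Owl version, so treat that wiring as an assumption and consult the input-options documentation linked below. `CONTAINER_NAME` and the `"FILE_NAME"` dataset name are placeholders.

```scala
import com.owl.core._
import com.owl.common._

// Sketch only: runs one DQ check per date. The Props-to-Owl wiring is an
// assumption; verify it against your Owl version's input options.
def runDailyCheck(runDate: String): Unit = {
  val path = s"wasbs://CONTAINER_NAME@abcCompany.blob.core.windows.net/" +
    s"FILE_NAME/${runDate.replace("-", "")}_FILE_NAME.parquet"
  val df = spark.read.parquet(path)

  val props = new Props()
  props.dataset = "FILE_NAME"   // placeholder dataset name
  props.runId = runDate         // e.g. "2019-03-02"

  val owl = new Owl(df).register  // register in the Owl Catalog
  owl.owlCheck()                  // full DQ scan for this run
}

Seq("2019-03-01", "2019-03-02").foreach(runDailyCheck)
```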

{% embed url="https://owl-analytics.com/Create-a-Data-Quality-Pipeline-using-Owl.html" %}