
Releases: datahub-project/datahub

DataHub v0.8.23

14 Jan 23:06
a44b48a

Release Highlights

  • Fix critical Dashboard / Charts bug from 0.8.22, where Chart inputs were not being ingested successfully.
  • The currently deployed version is now shown in the UI (under the top-right dropdown menu). It is also available via the GMS /config endpoint.
  • Robustness improvements to DataHub Java Client Package
  • Introducing a new Elasticsearch ingestion connector!
  • Misc bug fixes & improvements.
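
The version lookup via the GMS /config endpoint mentioned above could be scripted roughly as follows. This is a minimal sketch, not an official client: the base address `http://localhost:8080` and the helper names `config_url` and `fetch_config` are assumptions for illustration, and the code assumes the endpoint returns a JSON body. Since the exact field that carries the version is not stated in these notes, the sketch returns the whole payload.

```python
import json
from urllib.request import urlopen


def config_url(gms_base: str) -> str:
    """Build the /config URL for a GMS base address (hypothetical helper)."""
    return gms_base.rstrip("/") + "/config"


def fetch_config(gms_base: str, timeout: float = 10.0) -> dict:
    """GET the /config endpoint and parse the body, assuming it is JSON."""
    with urlopen(config_url(gms_base), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_config("http://localhost:8080")` against a running deployment and printing the result should make it easy to spot the version entry.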

What's Changed

Full Changelog: v0.8.22...v0.8.23

DataHub v0.8.22

09 Jan 00:59
bb0943f

Disclaimers!

  • Ingestion of Chart inputs was broken by a PR that made it into this release. This will be fixed in v0.8.23. If you plan to ingest Charts / Dashboards, we recommend skipping this version and upgrading directly to v0.8.23.

Release Highlights:

  • Support for mapping dbt meta properties of a dataset to metadata operations, such as add_owner, add_term, add_tag, etc.
  • Java REST emitter library to programmatically generate metadata events from Java-based clients such as from Spark jobs.
  • Data freshness indication via Last Updated Timestamp.
  • Improvements to data profiling performance and lineage extraction
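
To illustrate the idea behind mapping dbt meta properties to metadata operations, here is a minimal sketch. It is not the connector's implementation: the `MAPPING` table, the property names (`business_owner`, `has_pii`, `glossary`), and the `plan_operations` helper are all hypothetical. Only the operation names (add_owner, add_term, add_tag) come from the release note.

```python
import re
from typing import Any, Dict, List, Tuple

# Hypothetical rule table: meta property name -> (regex, operation).
# The structure is illustrative, not the connector's real config schema.
MAPPING: Dict[str, Tuple[str, str]] = {
    "business_owner": (r".+", "add_owner"),
    "has_pii": (r"(?i)true", "add_tag"),
    "glossary": (r".+", "add_term"),
}


def plan_operations(
    meta: Dict[str, Any],
    mapping: Dict[str, Tuple[str, str]] = MAPPING,
) -> List[Tuple[str, str]]:
    """Return (operation, value) pairs for meta properties matching their rule."""
    ops = []
    for key, (pattern, operation) in mapping.items():
        value = meta.get(key)
        # A rule fires only when the property exists and fully matches the regex.
        if value is not None and re.fullmatch(pattern, str(value)):
            ops.append((operation, str(value)))
    return ops


planned = plan_operations({"business_owner": "alice", "has_pii": "True", "other": "x"})
```

Properties without a rule (`other` here) are ignored, and rules whose property is absent produce no operation.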

What's Changed

New Contributors

Full Changelog: v0.8.21...v0.8.22

v0.8.21

28 Dec 19:37
895af09

This release includes a fix for timeouts when reindexing large indices, which occur when new fields are added to an index.

Release Highlights

  • Getting Started Modal + Empty State: Improves the empty-state experience by showing a "Getting Started" guide when no data has been ingested into DataHub yet.
  • Provide BigQuery credentials via recipe config: Previously, BigQuery credentials were provided via an environment variable. Going forward, they can be provided directly inside the Recipe config.
  • Increase re-indexing 30s timeout: Previously, Elasticsearch reindexing was capped at a 30-second synchronous timeout, which caused some GMS upgrades to fail. The timeout is now one hour.

What's Changed

  • fix(lkml): bump lkml version up to 1.1.2 to support sql_preamble expression by @hyunminch in #3757
  • fix(react-ui): fix header min height by @gabe-lyons in #3784
  • docs(auth): add Microsoft Azure as an SSO provider (#3779) by @cccs-eric in #3780
  • Add azure OIDC doc to sidebar by @jjoyce0510 in #3785
  • feat(UI): Add "Getting Started" Modal on fresh deployment by @jjoyce0510 in #3773
  • feat(transform): adds simple add dataset properties transform by @sgomezvillamor in #3778
  • Update troubleshooting steps for local development with docker by @RyanHolstien in #3788
  • docs(redshift): Updating Redshift permission prerequisites in doc by @treff7es in #3777
  • fix(superset): fix Superset chart ingestion with an empty metric label by @cccs-eric in #3793
  • doc(transforms): adds doc for simple_add_dataset_properties transformer by @sgomezvillamor in #3790
  • feat(ingest): Add config option to set Bigquery credential in source config by @treff7es in #3786
  • fix(elastic): allow more time for re-indexing tasks by @gabe-lyons in #3794
  • docs(kafka): add example for ingestion from confluent cloud by @anshbansal in #3789

New Contributors

Full Changelog: v0.8.20...v0.8.21

v0.8.20

20 Dec 22:35
77e3641

This release includes the patch for CVE-2021-44228, pinning log4j to 2.17.0. Otherwise, it contains small bug fixes and improvements.

Release Highlights

  • Configurable aspect retention in application.yml (disabled by default)
  • Metabase Ingestion Source connector
  • Constrain log4j to version 2.17.0
  • Upgrade logback to 1.2.9

What's Changed

  • feat(spark-lineage): add ability to push data lineage from spark to d… by @MugdhaHardikar-GSLab in #3664
  • feat(cli): allow to nuke without deleting data in quickstart by @anshbansal in #3655
  • feat(Dgraph): Make Dgraph a proper Neo4j alternative by @EnricoMi in #3578
  • feat(retention): Add retention to Local DB by @dexter-mh-lee in #3715
  • feat(ingest): cleanup deprecated datahub.integrations.airflow.* imports by @hsheth2 in #3732
  • feat(ingestion) : Add Metabase Source Connector by @jawadqu in #3602
  • fix(ingest): count profiled tables separately in report by @hsheth2 in #3731
  • feat(perf-test): changes for perf testing by @anshbansal in #3728
  • ci(cypress): adding the foundation for cypress integration tests & some starter coverage for login, search & updates by @gabe-lyons in #3672
  • (fix) Elastic search container log4j CVE-2021-44228 vulnerability by @nsbala-tw in #3733
  • Revert "feat(Dgraph): Make Dgraph a proper Neo4j alternative" by @gabe-lyons in #3740
  • fix(CI): Regenerate Docker Quickstart by @jjoyce0510 in #3741
  • fix(DataHubGraph): changing datahub-graph to use underlying session connection. by @varunbharill in #3743
  • fix(ingest): Remove unecessary isalpha check for data platforms + warnings by @jjoyce0510 in #3742
  • feat(snowflake-usage): add knob for direct objects accesssed vs base objects accessed by @gabe-lyons in #3744
  • fix(snowflake): support snowflake allow/deny pattern for lineage and usage by @varunbharill in #3748
  • refactor(gms auth): Remove base64 decoding of token service signing key by @jjoyce0510 in #3747
  • test(ingest): fix pytest warning for class starting with Test by @hsheth2 in #3745
  • feat: enables dbt metadata files to be loaded from URIs by @sgomezvillamor in #3739
  • fix(ingestion): Skipping duplicate tables from ingestion by @treff7es in #3753
  • feat(Stateful Ingestion): 1/3 Stateful ingestion server changes by @rslanka in #3749
  • Fix CVE-2021-44228 continued: log4j constraints to version 2.16.0 by @jjoyce0510 in #3755
  • build(ingest): restrict latest mypy version by @hsheth2 in #3756
  • doc: Add IOMED as a DataHub adopter by @merqurio in #3758
  • docs(spark-lineage): update artifact name and version by @MugdhaHardikar-GSLab in #3760
  • feat(profiler): add upper bound on combined query size by @hsheth2 in #3762
  • feat(ingestion): Mode retry wait logic to avoid hitting Mode API rate limit by @jawadqu in #3761
  • feat(Stateful Ingestion-2/3): Client side changes for checkpointing a source job state. by @rslanka in #3763
  • refactor(test): replace CliRunner with run_datahub_cmd method by @hsheth2 in #3746
  • feat(bigquery): add support for parsing exported bigquery audit logs by @hyunminch in #3680
  • feat(ingest): Adding support for Elasticsearch and Clickhouse by @sudotty in #3227
  • Upgrade to logback 1.2.9 to address CVE-2021-42550 by @jjoyce0510 in #3771
  • fix(profiling): Disabling expensive profilers by default by @treff7es in #3759
  • docs(ingestion): Add details of sensitive info handling by @anshbansal in #3767
  • docs(snowflake): Adding documentation about required Snowflake Privileges by @jjoyce0510 in #3770
  • Upgrade to 3rd Apache patch for log4j by @xiphl in #3772
  • fix(ingestion): Fix for same schema foreign key reference by @treff7es in #3769
  • fix(ingest): fix compatibility with google composer by @anshbansal in #3774

Known Issues

We've been made aware that in large deployments the re-indexing step required at boot-up time can exceed the 30-second timeout. We have since loosened this timeout limit; the change ships in 0.8.21.

New Contributors

Full Changelog: v0.8.19...v0.8.20

v0.8.19

13 Dec 19:13
83207b3

This release is a fast follow-up to the more substantial 0.8.18 release, addressing bugs that a few folks in the Community are facing.

Release Highlights

  • Fix an issue on systems where the base64 CLI command is not available.
  • Fix usage user extraction, where the email domain was repeated twice.

What's Changed

  • fix(recommendations): don't show a 0 character when there are no suggestions by @gabe-lyons in #3720
  • fix(mode): support definitions in mode query by @gabe-lyons in #3721
  • fix(doc): fixing doc in datahub cli for corpuser urn. by @varunbharill in #3717
  • docs(redshift): Adding svv_table privilege requirement to redshift source doc by @treff7es in #3708
  • fix(profiler): Fixing division by zero in pct_unique calculation by @treff7es in #3727
  • fix(ingest): get mysql geotypes properly by @treff7es in #3726
  • fix(ingest): update trino source error handling in get_table_comment by @mayurinehate in #3712
  • feat(ingest) Trim long sql queries in usage by @treff7es in #3725
  • fix(ingestion): adds missing port to the connection bootstrap by @sgomezvillamor in #3706
  • fix(ingest): add source.config.connection.schema_registry_config to SchemaRegistryClient creation by @lvicentesanchez in #3702
  • fix(docker): Fix issues with base64 not working on some platforms by @dexter-mh-lee in #3723
  • feat(DataHubGraph): Adding utilities methods to DataHubGraph class. by @varunbharill in #3729
  • fix(superset): handle dashboards without charts (#3713) by @grumbler in #3714

New Contributors

Full Changelog: v0.8.18...v0.8.19

v0.8.18

10 Dec 19:45
d651040

DataHub Release 0.8.18 is here!

Release Highlights

  1. Metadata Service Authentication: Make authenticated requests to the Metadata Service APIs (GraphQL + Rest.li)

    1. Video Demo
    2. Technical Deep Dive
  2. Redshift Lineage: Out-of-the-box support for ingesting Dataset->Dataset lineage from Redshift system tables. Includes Tables, Views, and COPY from S3

    1. Video Demo
  3. Apache NiFi Connector (Beta): Integration with Apache NiFi to extract DataJobs and DataFlows! Read the source docs here. This source is currently incubating in beta.

  4. Mode Connector (Beta): Integration with Mode Analytics to extract reports, charts, and more! Read the source docs here. This source is currently incubating in beta.

  5. Add Aspects without a fork: This is a major milestone towards No-Code UI

    1. Watch the No Code UI Sneak Peek
  6. Glossary Term Transformer: Allows users to add tags or glossary terms to entities based on a regex match filter (Shoutout to Community Member ecooklin!)

  7. Bug Fixes:

    1. [metadata service] Empty search query fails to resolve
    2. [metadata service] Log4j vulnerability addressed!! We highly recommend upgrading to the latest version.
    3. [metadata ingestion] [bigquery] Fix handling of partitioned & snapshotted tables for lineage usage, and basic table indexing.
    4. [metadata-service] [recommendations] Fix issue where recently viewed and most popular recommendations were not showing up when user urn contains special chars.
    5. [metadata ingestion] Add config to specify ca certificate path for datahub-rest sink
    6. [metadata ingestion][snowflake] Handling for special characters in snowflake databases and schemas.
    7. [ui] Fix Groups page not showing asset ownership correctly
    8. [ui] Fix issue where markdown links were not clickable.
    9. [metadata service] Improve search & recommendations performance by ~50%, homepage load by ~50%.
    10. [cli] Fix issue where deletes by search could not accept an auth token
    11. [metadata service][policies] Fix invalid Tag creation policy
    12. [metadata service][upgrade] Fix Spring injection of Entity Client inside datahub-upgrade

Backwards Incompatible Changes

  • The standalone Spring GraphQL Service has been removed. (Replaced in full by Metadata Service GraphQL API)

New Contributors

What's Changed


v0.8.17

19 Nov 07:58
f1045f8

Notable Changes

  • Added Recommendations and redesigned the home page!
    • Modular way to add recommendations throughout the application
    • Recommendation modules for top platforms, recently viewed, popular entities, top tags/terms were added to home page
    • Search page also has top tags/terms module on the bottom
  • Ingestion Sources
    • DBT enhancements
      • Creating dbt platform entities to capture dbt node types such as models, tests, source, seed, etc. linking dbt entities with other dbt or underlying platform entities.
    • OpenAPI specs
    • Kafka Connect (Regex based transformers, BigQuery sink)
    • Trino Usage (Starburst)
  • Improved lineage viz performance and lineage viz UX
    • Improved layout logic
    • Nodes can be dragged and dropped
  • Fixes for the delete API not always deleting all of an entity's data
  • Improved documentation for adding a custom Metadata Ingestion Source
    • Fixes description rendering for Charts, Dashboards, Flows, Jobs
  • Add YAML configuration file for Metadata Service
  • Filter search results by Sub-Type (Looker Explore, View, etc)
  • Support proxying DataHub Frontend requests to Metadata Service at /api/gms
  • Multi-platform (x86, arm64) support for Docker images (Apple M1 support)
  • Graph Service: DGraph support (phase 1)

What's Changed


DataHub v0.8.16

21 Oct 21:00
dd8c592

Release Highlights

  • Important bug-fixes: properties for DataJob and DataFlow, descriptions for Datasets should now correctly show in the UI
  • Search redesign! Single search experience across all entity types with left filter bar
  • Added searchAcrossEntities endpoint on both GraphQL and Rest.li that pulls search results for all entity types and mixes them together
  • Dataset-level lineage: Added support for ingesting dataset-level lineage for BigQuery. Added support for linking external tables in Redshift to the corresponding table in the external data catalog.
  • Performance optimization: GraphQL now calls the entity service directly, instead of calling the entity resource over HTTP, to hydrate GraphQL models.
  • The “filter” input model used for the “search” API now supports disjunctive normal form (an OR of ANDs). The previous filter model (a criteria array) should continue to work as expected.
  • Adding foundations (models) for search insights, or highlights shown in the search result previews.
  • Add owner experience improvements: using full text search to find users and groups.
  • User & Group Management Screens!
    • View all users (and those who have logged in)
    • View all groups
    • Create new groups
    • Add and remove group members
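
For readers unfamiliar with the disjunctive normal form mentioned in the filter highlight above, the following sketch shows how an "OR of ANDs" filter can be evaluated against a flat document. It is a conceptual illustration only: the `matches` helper, the criterion shape (`field` / `value`), and the example field names are assumptions and do not reproduce the actual DataHub filter model.

```python
from typing import Dict, List

# A criterion is an equality check on one field; an AND-clause is a list of
# criteria; a DNF filter is a list of AND-clauses joined by OR.
Criterion = Dict[str, str]  # {"field": ..., "value": ...}
AndClause = List[Criterion]
DnfFilter = List[AndClause]


def matches(doc: Dict[str, str], dnf: DnfFilter) -> bool:
    """True if at least one AND-clause has all of its criteria satisfied."""
    return any(
        all(doc.get(c["field"]) == c["value"] for c in clause)
        for clause in dnf
    )


# (platform == snowflake AND origin == PROD) OR (platform == bigquery)
example_filter: DnfFilter = [
    [
        {"field": "platform", "value": "snowflake"},
        {"field": "origin", "value": "PROD"},
    ],
    [{"field": "platform", "value": "bigquery"}],
]
```

In this sketch an empty filter matches nothing, because there is no AND-clause to satisfy; a real API may well treat an absent filter as "match everything".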

Breaking Changes

None

What's Changed


DataHub v0.8.15

29 Sep 19:35
268d112
Pre-release

Notable Changes

  • Support the “NONE” Client Authentication Method for OIDC login.
  • Migrated to the new UI for Charts, Dashboards, Data Flows (Pipelines), Data Jobs (Tasks) profile pages
  • Primary and Foreign Keys rendered in the UI
  • Ingestion
    • Support for redshift-usage source
    • Fixes for looker ingestion
    • datahub cli supports -f/--force option to skip confirmations

Changelog

DataHub v0.8.14

17 Sep 17:51
97bed71

Release Highlights

  • Small bug fixes over 0.8.13

Notable Changes

  • Fix bug in OIDC config for setting response type
  • Add WAU chart in the analytics page
  • Starting with acryl_datahub==0.8.13.1 (PyPI), Looker and LookML ingestion names views differently than before. You will need to either delete old LookML metadata to start with a clean slate, or specify view_naming_pattern = "{name}" in both your Looker and LookML ingestion recipes to keep the old behavior.
  • Populate the user email field in usage statistics to correctly show top users on the entity page
  • Full changelog below
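
To make the effect of a view naming pattern concrete, the sketch below renders a view name from a placeholder pattern. Only the `"{name}"` pattern comes from the note above; the `render_view_name` helper, the `"{project}.view.{name}"` pattern, and the placeholder names other than `{name}` are hypothetical and are not confirmed to match the connector's defaults.

```python
def render_view_name(pattern: str, **parts: str) -> str:
    """Fill a naming pattern's {placeholders} from the supplied parts."""
    return pattern.format(**parts)


# Illustrative values; a real recipe would derive these from the LookML project.
parts = {"project": "ecommerce", "model": "sales", "name": "orders"}

legacy = render_view_name("{name}", **parts)
qualified = render_view_name("{project}.view.{name}", **parts)
```

Because the rendered name feeds into the entity's identity, changing the pattern produces new entities rather than updating the old ones, which is why the note suggests either deleting old metadata or pinning the pattern.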

Changelog