Releases: datahub-project/datahub
DataHub v0.8.23
Release Highlights
- Fix critical Dashboard / Charts bug from 0.8.22, where Chart inputs were not being ingested successfully.
- Adding currently deployed version to the UI (under top-right dropdown menu). Also available via the GMS /config endpoint.
- Robustness improvements to DataHub Java Client Package
- Introducing a new Elasticsearch ingestion connector!
- Misc bug fixes & improvements.
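The deployed version exposed through the GMS /config endpoint can be read with a few lines of scripting. The following is a minimal sketch rather than an official client: `config_url` and `fetch_gms_config` are hypothetical helpers, the base URL is whatever address your GMS is reachable at, and because these notes do not describe the layout of the returned JSON, the sketch returns the whole document instead of picking out a field.

```python
import json
import urllib.request


def config_url(gms_base: str) -> str:
    """Build the URL of the GMS /config endpoint from a base URL."""
    return gms_base.rstrip("/") + "/config"


def fetch_gms_config(gms_base: str, opener=urllib.request.urlopen) -> dict:
    """Fetch and decode the JSON document served at /config.

    `opener` is injectable so the function can be exercised without a
    running server; by default it performs a real HTTP GET.
    """
    with opener(config_url(gms_base)) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_gms_config("http://localhost:8080")` against an assumed local deployment would return the parsed config document, which can then be inspected for the version information.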
What's Changed
- build: include correct version in metadata-ingestion docker image by @hsheth2 in #3857
- fix(metabase): fix crashes on missing values by @iasoon in #3859
- fix(datahub-client): fix shadow jar build, correct spark-lineage url … by @swaroopjagadish in #3871
- feat(git-version): Add version to the UI and config endpoint by @dexter-mh-lee in #3866
- fix(build): fix shadow jar checker to allow new git.properties by @swaroopjagadish in #3875
- feat(metadata-ingestion): Make datahub-rest client more robust by configurable retries. (#3826) by @RickardCardell in #3860
- fix(github-workflow): Remove duplicate context in kafka setup workflow by @dexter-mh-lee in #3876
- docs(azure-ad): correct default value for username attr by @iasoon in #3861
- docs: fix endpoint URL by @anshbansal in #3852
- fix(cli): disable telemetry in CLI tests by @kevinhu in #3877
- feat(metabase): allow configuring how database engines get mapped to platforms by @iasoon in #3869
- doc(graphql): add some examples by @anshbansal in #3867
- fix(search): Fix issue with filters and autocomplete by @dexter-mh-lee in #3868
- fix(build): remove jcenter from gradle build by @aditya-radhakrishnan in #3882
- (docs)Roadmap, Townhall, & Feature Request link updates by @maggiehays in #3873
- doc(kafka): add permissions required for confluent cloud by @anshbansal in #3850
- feat(ingest): ingestion-specific telemetry by @kevinhu in #3881
- Add AWS MSK Iam Auth Jar to GMS by @arunvasudevan in #3872
- docs(ingestion) azure: specify required permission type by @iasoon in #3886
- feat(ingestion) dbt: support spark sql types by @iasoon in #3880
- update dependency for bigquery. by @varunbharill in #3874
- fix(field-extraction): Fix extraction for unions by @dexter-mh-lee in #3892
- fix(ingest): sqlparser - Not lowercasing looker source's special table name by @treff7es in #3891
- feat(ingest): Support for spectrum external array types by @treff7es in #3890
- feat(Ingestion): Add Elasticsearch Source by @rslanka in #3893
Full Changelog: v0.8.22...v0.8.23
DataHub v0.8.22
Disclaimers!
- Ingesting Chart Inputs was broken in a PR that got into this release. This will be fixed in v0.8.23. If you plan to ingest Charts / Dashboards, we recommend skipping this version and upgrading to v0.8.23 directly.
Release Highlights:
- Support for mapping DBT meta properties of a dataset to metadata operations, such as add_owner, add_term, add_tag etc.
- Java REST emitter library to programmatically generate metadata events from Java-based clients such as from Spark jobs.
- Data freshness indication via Last Updated Timestamp.
- Improvements to data profiling performance and lineage extraction
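To illustrate the idea behind mapping DBT meta properties to metadata operations, here is a minimal sketch. It is not DataHub's implementation or configuration schema: the rule table `META_RULES`, its layout, the example property names, and the `map_meta` helper are all hypothetical. Only the operation names (add_owner, add_term, add_tag) come from the highlight above.

```python
import re
from typing import Any, Dict, List, Tuple

# Hypothetical rule table: meta property name -> (regex the value must match,
# operation to emit). The rule layout is invented for this sketch.
META_RULES: Dict[str, Tuple[str, str]] = {
    "owner": (r".+@example\.com", "add_owner"),
    "term": (r"[A-Za-z]+", "add_term"),
    "pii": (r"true", "add_tag"),
}


def map_meta(meta: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (operation, value) pairs for every meta property whose value
    fully matches the regex of its rule. Unknown properties are ignored."""
    operations = []
    for key, value in meta.items():
        rule = META_RULES.get(key)
        if rule is None:
            continue
        pattern, operation = rule
        if re.fullmatch(pattern, str(value)):
            operations.append((operation, str(value)))
    return operations
```

Under these assumptions, a dataset whose meta block contains an owner and a pii flag would yield one add_owner and one add_tag operation, while properties without a rule are skipped.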
What's Changed
- feat(snowflake-usage): Generate email address if not exists by @treff7es in #3791
- feat(java datahub-client): add Java REST emitter by @MugdhaHardikar-GSLab in #3781
- fix(docker): Fix path to elastic definition in dev docker compose by @MikeSchlosser16 in #3808
- feat(nocode): Add get entities v2 endpoint that can get without snapshot by @dexter-mh-lee in #3738
- docs(modeling): Add a link to MXE page inside the Metadata Modeling page by @pramodbiligiri in #3765
- docs(fix): fix broken reference by @RyanHolstien in #3814
- feat(java-emitter): improvements to builder API-s, moving spark-linea… by @swaroopjagadish in #3819
- fix(ingestion): Make url an optional field of the DefaultConfig for business glossary by @rslanka in #3817
- fix(ingest): Handle string redshift type by @treff7es in #3811
- feat(gms): add schema registry support for tls in gms by @MikeSchlosser16 in #3804
- Add table, changed formatting and wording by @dannylee8 in #3802
- feat(mae/mcl): Make ingestAspect produce both MCLs and MAEs by @dexter-mh-lee in #3737
- docs(confluent): Add new topic names by @anshbansal in #3825
- (feat)(glossary): Increase number of autocomplete results shown to 25 by @aditya-radhakrishnan in #3821
- feat(sql-parser): Replacing sqlmetadata sql parser lib with sqlineage parser lib by @treff7es in #3806
- feat(profiler): using approximate queries for profiling by @treff7es in #3752
- docs: improve docs for kafka configuration by @abiwill in #3828
- test(fixEbeanEntityServiceTest): fix bug on verification for EbeanEntityService by @RyanHolstien in #3829
- fix(ingest): ignore custom connectors for Glue ingestion by @kevinhu in #3805
- fix(java-emitter): check for null callback by @swaroopjagadish in #3830
- feat(dbt-meta): add support for dbt meta mapping by @swaroopjagadish in #3832
- fix(ingestion): Fix the datetime parsing issue in the metabase source. by @rslanka in #3831
- feat(removeGMA): remove all dependencies on gma libraries by @RyanHolstien in #3835
- perf(ingest): changes to improve ingest performance a bit by @anshbansal in #3837
- fix(azure AD): fix problem with missing key causing failures in ingestion by @anshbansal in #3824
- docs: fix typo by @anshbansal in #3848
- docs(cli): fix wrong heading, add link to release notes by @anshbansal in #3700
- feat(ci): split metadata-ingestion ci to streamline build by @swaroopjagadish in #3854
- fix(dbt): fix warning due to struct type not being mapped by @anshbansal in #3846
- fix(ingest): bigquery-usage - fix remove_extras to remove all partitions by @gfalcone in #3842
- fix(ingestion): handle database=None for dbt ingestion by @iasoon in #3851
- feat(ingest): last updated - show last updated for sql usage sources by @aditya-radhakrishnan in #3845
- feat(lineage): allow for expanding of lineage node titles in the lineage explorer by @gabe-lyons in #3856
New Contributors
- @MikeSchlosser16 made their first contribution in #3808
- @pramodbiligiri made their first contribution in #3765
- @aditya-radhakrishnan made their first contribution in #3821
- @abiwill made their first contribution in #3828
- @gfalcone made their first contribution in #3842
- @iasoon made their first contribution in #3851
Full Changelog: v0.8.21...v0.8.22
v0.8.21
This release includes a fix for timeouts in reindexing of large indices that occurs when new fields are added to an index.
Release Highlights
- Getting Started Modal + Empty State: Improve the empty-state experience by showing a "Getting Started" guide when no data has been ingested into DataHub yet.
- Provide BigQuery credentials via recipe config: Previously BigQuery credentials were provided via environment variable. Going forward they can be provided directly inside the Recipe config.
- Increase re-indexing 30s timeout: Previously, Elasticsearch reindexing was capped at a 30-second synchronous timeout, which caused some GMS upgrades to fail. This release increases that timeout to one hour.
What's Changed
- fix(lkml): bump lkml version up to 1.1.2 to support sql_preamble expression by @hyunminch in #3757
- fix(react-ui): fix header min height by @gabe-lyons in #3784
- docs(auth): add Microsoft Azure as an SSO provider (#3779) by @cccs-eric in #3780
- Add azure OIDC doc to sidebar by @jjoyce0510 in #3785
- feat(UI): Add "Getting Started" Modal on fresh deployment by @jjoyce0510 in #3773
- feat(transform): adds simple add dataset properties transform by @sgomezvillamor in #3778
- Update troubleshooting steps for local development with docker by @RyanHolstien in #3788
- docs(redshift): Updating Redshift permission prerequisites in doc by @treff7es in #3777
- fix(superset): fix Superset chart ingestion with an empty metric label by @cccs-eric in #3793
- doc(transforms): adds doc for simple_add_dataset_properties transformer by @sgomezvillamor in #3790
- feat(ingest): Add config option to set Bigquery credential in source config by @treff7es in #3786
- fix(elastic): allow more time for re-indexing tasks by @gabe-lyons in #3794
- docs(kafka): add example for ingestion from confluent cloud by @anshbansal in #3789
New Contributors
- @cccs-eric made their first contribution in #3780
Full Changelog: v0.8.20...v0.8.21
v0.8.20
This release includes the patch for CVE-2021-44228, pinning log4j to 2.17.0. Otherwise, it contains small bug fixes & improvements.
Release Highlights
- Configurable aspect retention in application.yml (disabled by default)
- Metabase Ingestion Source connector
- Constrain log4j to version 2.17.0
- Upgrade logback to 1.2.9
What's Changed
- feat(spark-lineage): add ability to push data lineage from spark to d… by @MugdhaHardikar-GSLab in #3664
- feat(cli): allow to nuke without deleting data in quickstart by @anshbansal in #3655
- feat(Dgraph): Make Dgraph a proper Neo4j alternative by @EnricoMi in #3578
- feat(retention): Add retention to Local DB by @dexter-mh-lee in #3715
- feat(ingest): cleanup deprecated datahub.integrations.airflow.* imports by @hsheth2 in #3732
- feat(ingestion) : Add Metabase Source Connector by @jawadqu in #3602
- fix(ingest): count profiled tables separately in report by @hsheth2 in #3731
- feat(perf-test): changes for perf testing by @anshbansal in #3728
- ci(cypress): adding the foundation for cypress integration tests & some starter coverage for login, search & updates by @gabe-lyons in #3672
- (fix) Elastic search container log4j CVE-2021-44228 vulnerability by @nsbala-tw in #3733
- Revert "feat(Dgraph): Make Dgraph a proper Neo4j alternative" by @gabe-lyons in #3740
- fix(CI): Regenerate Docker Quickstart by @jjoyce0510 in #3741
- fix(DataHubGraph): changing datahub-graph to use underlying session connection. by @varunbharill in #3743
- fix(ingest): Remove unnecessary isalpha check for data platforms + warnings by @jjoyce0510 in #3742
- feat(snowflake-usage): add knob for direct objects accessed vs base objects accessed by @gabe-lyons in #3744
- fix(snowflake): support snowflake allow/deny pattern for lineage and usage by @varunbharill in #3748
- refactor(gms auth): Remove base64 decoding of token service signing key by @jjoyce0510 in #3747
- test(ingest): fix pytest warning for class starting with Test by @hsheth2 in #3745
- feat: enables dbt metadata files to be loaded from URIs by @sgomezvillamor in #3739
- fix(ingestion): Skipping duplicate tables from ingestion by @treff7es in #3753
- feat(Stateful Ingestion): 1/3 Stateful ingestion server changes by @rslanka in #3749
- Fix CVE-2021-44228 continued: log4j constraints to version 2.16.0 by @jjoyce0510 in #3755
- build(ingest): restrict latest mypy version by @hsheth2 in #3756
- doc: Add IOMED as a DataHub adopter by @merqurio in #3758
- docs(spark-lineage): update artifact name and version by @MugdhaHardikar-GSLab in #3760
- feat(profiler): add upper bound on combined query size by @hsheth2 in #3762
- feat(ingestion): Mode retry wait logic to avoid hitting Mode API rate limit by @jawadqu in #3761
- feat(Stateful Ingestion-2/3): Client side changes for checkpointing a source job state. by @rslanka in #3763
- refactor(test): replace CliRunner with run_datahub_cmd method by @hsheth2 in #3746
- feat(bigquery): add support for parsing exported bigquery audit logs by @hyunminch in #3680
- feat(ingest): Adding support for Elasticsearch and Clickhouse by @sudotty in #3227
- Upgrade to logback 1.2.9 to address CVE-2021-42550 by @jjoyce0510 in #3771
- fix(profiling): Disabling expensive profilers by default by @treff7es in #3759
- docs(ingestion): Add details of sensitive info handling by @anshbansal in #3767
- docs(snowflake): Adding documentation about required Snowflake Privileges by @jjoyce0510 in #3770
- Upgrade to 3rd Apache patch for log4j by @xiphl in #3772
- fix(ingestion): Fix for same schema foreign key reference by @treff7es in #3769
- fix(ingest): fix compatibility with google composer by @anshbansal in #3774
Known Issues
We've been made aware that in large deployments the re-indexing step required at boot-up time exceeds the 30 second timeout. We've since made changes to loosen this timeout limit, with these changes coming in 0.8.21.
New Contributors
- @MugdhaHardikar-GSLab made their first contribution in #3664
- @jawadqu made their first contribution in #3602
- @nsbala-tw made their first contribution in #3733
- @merqurio made their first contribution in #3758
- @hyunminch made their first contribution in #3680
- @sudotty made their first contribution in #3227
- @xiphl made their first contribution in #3772
Full Changelog: v0.8.19...v0.8.20
v0.8.19
This release is a fast follow-up to the more substantial 0.8.18 release, addressing bugs that a few folks in the Community have been facing.
Release Highlights
- Fix base64 cli command issue where some systems do not have it.
- Fix usage user extraction where the email domain was repeated twice.
What's Changed
- fix(recommendations): don't show a 0 character when there are no suggestions by @gabe-lyons in #3720
- fix(mode): support definitions in mode query by @gabe-lyons in #3721
- fix(doc): fixing doc in datahub cli for corpuser urn. by @varunbharill in #3717
- docs(redshift): Adding svv_table privilege requirement to redshift source doc by @treff7es in #3708
- fix(profiler): Fixing division by zero in pct_unique calculation by @treff7es in #3727
- fix(ingest): get mysql geotypes properly by @treff7es in #3726
- fix(ingest): update trino source error handling in get_table_comment by @mayurinehate in #3712
- feat(ingest) Trim long sql queries in usage by @treff7es in #3725
- fix(ingestion): adds missing port to the connection bootstrap by @sgomezvillamor in #3706
- fix(ingest): add source.config.connection.schema_registry_config to SchemaRegistryClient creation by @lvicentesanchez in #3702
- fix(docker): Fix issues with base64 not working on some platforms by @dexter-mh-lee in #3723
- feat(DataHubGraph): Adding utilities methods to DataHubGraph class. by @varunbharill in #3729
- fix(superset): handle dashboards without charts (#3713) by @grumbler in #3714
New Contributors
- @lvicentesanchez made their first contribution in #3702
- @grumbler made their first contribution in #3714
Full Changelog: v0.8.18...v0.8.19
v0.8.18
DataHub Release 0.8.18 is here!
Release Highlights
- Metadata Service Authentication: Make authenticated requests to the Metadata Service APIs (GraphQL + Rest.li)
- Redshift Lineage: Out-of-the-box support for ingesting Dataset->Dataset lineage from Redshift system tables. Includes Tables, Views, and COPY from S3
- Apache Nifi Connector (Beta): Integration with Apache Nifi to extract DataJobs and DataFlows! Read the source docs here. This source is currently incubating in beta.
- Mode Connector (Beta): Integration with Mode Analytics to extract reports, charts, and more! Read the source docs here. This source is currently incubating in beta.
- Add Aspects without a fork: This is a major milestone towards No-Code UI
  - Watch the No Code UI Sneak Peek
- Glossary Term Transformer: Allows users to add tags or glossary terms to entities based on a regex match filter (Shoutout to Community Member ecooklin!)
- Bug Fixes:
  - [metadata service] Empty search query fails to resolve
  - [metadata service] Log4j vulnerability addressed!! Highly recommend folks to upgrade to latest.
  - [metadata ingestion] [bigquery] Fix handling of partitioned & snapshotted tables for lineage usage, and basic table indexing.
  - [metadata-service] [recommendations] Fix issue where recently viewed and most popular recommendations were not showing up when user urn contains special chars.
  - [metadata ingestion] Add config to specify ca certificate path for datahub-rest sink
  - [metadata ingestion][snowflake] Handling for special characters in snowflake databases and schemas.
  - [ui] Fix Groups page not showing asset ownership correctly
  - [ui] Fix issue where markdown links were not clickable.
  - [metadata service] Improve search & recommendations performance by ~50%, homepage load by ~50%.
  - [cli] Fix deletes by search cannot accept auth token
  - [metadata service][policies] Fix invalid Tag creation policy
  - [metadata service][upgrade] Fix Spring injection of Entity Client inside datahub-upgrade
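As an illustration of what an authenticated request to the Metadata Service GraphQL API might look like, the sketch below builds (without sending) an HTTP request that carries an access token. Several details are assumptions not stated in these notes: the bearer-token `Authorization` header scheme, the `/api/graphql` path, and the hypothetical helper name `build_graphql_request`. Check the Metadata Service Authentication docs for the actual contract.

```python
import json
import urllib.request


def build_graphql_request(gms_base: str, token: str, query: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated GraphQL POST request."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=gms_base.rstrip("/") + "/api/graphql",  # assumed endpoint path
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # assumed bearer-token scheme
            "Content-Type": "application/json",
        },
    )
```

The returned object can be passed to `urllib.request.urlopen` to perform the call. Keeping construction separate from sending makes the header logic easy to inspect and test.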
Backwards Incompatible Changes
- The standalone Spring GraphQL Service has been removed. (Replaced in full by Metadata Service GraphQL API)
New Contributors
- @robscriva made their first contribution in #3600
- @adriangb made their first contribution in #3582
- @bartlomiejolma made their first contribution in #3650
- @anshbansal made their first contribution in #3653
- @ecooklin made their first contribution in #3657
What's Changed
- style(react-app): add default monospace font to font-family by @robscriva in #3600
- feat(boot): Ingest datahub root user info on boot by @jjoyce0510 in #3603
- [refactor] - Remove GMS GraphQL Service by @arunvasudevan in #3605
- feat(auth): Metadata Service Authentication! by @jjoyce0510 in #3598
- docs:remove hubspot form and instead link to acryldata.io by @jeffmerrick in #3488
- fix(docs): Move transformers to be under metadata ingestion by @aseembansal-gogo in #3591
- fix(bigquery-usage): Fix filters and event joining logic. by @varunbharill in #3610
- feat(cli): adding a put command and docs by @swaroopjagadish in #3614
- feat(elastic): adding es logo by @gabe-lyons in #3611
- feat(profiler): dynamically combine queries by @hsheth2 in #3572
- doc(components): Adding DataHub components overview by @jjoyce0510 in #3606
- fix(java client): Fix Profiling NPE + misc improvements by @jjoyce0510 in #3621
- fix(docs-website): fix incorrect managed url by @jeffmerrick in #3618
- fix(ingest): rectify platform urn in kafka connect source by @mayurinehate in #3624
- docs(okta): Added Okta Logout Settings by @serefacet in #3627
- fix(search): Fix issue when query is empty by @dexter-mh-lee in #3620
- fix(redshift-usage): Add docs for redshift usage ingestion. by @varunbharill in #3617
- fix(ci): pin great expectations version by @swaroopjagadish in #3629
- fix(delete): Remove logic that adds an invalid filter for platform field by @dexter-mh-lee in #3619
- feat(metadata-service): support for custom model extensions without forks by @shirshanka in #3630
- fix(kafka-producer): fix debug logging by @claudio-benfatto in #3626
- fix(tests): fix typo in test name by @adriangb in #3582
- feat(cfg): Add configurable GCP log page size by @jjoyce0510 in #3556
- fix(recommendations): Fix issue with recently viewed and most popular recs not showing up by @dexter-mh-lee in #3631
- fix(ingestion): Add config to specify ca certificate path for datahub-rest sink by @dexter-mh-lee in #3632
- fix(ingest): workaround great-expectations compatibility issue by @hsheth2 in #3634
- fix(ingestion): Handling for special characters in snowflake databases and schemas. by @rslanka in #3635
- fix(group ownership): Fixing Groups Profile ownership by @jjoyce0510 in #3638
- feat(autorender): Auto render aspects that don't have frontend components in the UI by @gabe-lyons in #3597
- docs(business glossary): document the business glossary file format by @gabe-lyons in #3639
- fix(ingestion): Enhance supported and unsupported base_objects_accessed for Snowflake Usage by @rslanka in #3608
- feat(quickstart): Simplify docker generate and compare script by @EnricoMi in #3434
- fix(docs): small fixes to docs and docker images for custom metadata … by @swaroopjagadish in #3640
- fix(mongodb): enable version check for document size filter. by @varunbharill in #3644
- docs: Update to DataHub Adopter logos & Townhall details by @maggiehays in #3648
- feat(build): adds support for incremental build in ingestion by @swaroopjagadish in #3647
- fix(description): fix issue where markdown links are unclickable by @gabe-lyons in #3646
- fix(schema): fix bug where key/value toggle would appear on schema tabs with no fields by @gabe-lyons in #3643
- feat(build): Preflight script for metadata ingestion setup on m1 by @treff7es in #3652
- docs(graphql) Adding additional GraphQL docs by @jjoyce0510 in #3649
- docs: correct title of postgres gms by @bartlomiejolma in #3650
- fix(cli): fix for deletion cli by @anshbansal in #3653
- fix(metadata-io) Adds docker engine configuration checks before running docker-based tests by @pedro93 in #3654
- fix(model): Remove unused PDL from pre-nocode days by @dexter-mh-lee in #3659
- fix(docs): fix docs build on m1 by @anshbansal in #3662
- feat(ingest): add --strict-warnings option by @hsheth2 in #3665
- fix(search): Improve search and recs performance by @dexter-mh-lee in #3660
- feat(metadata-model): adding metadata model doc generation and upload… by @swaroopjagadish in #3667
- fix(ingestion): black formatting by @hsheth2 in #3676
- fix(metadata-ingestion): fix requirements for m1 preflight checks by @gabe-lyons in #3677
- fix(kafka): Add back changes to centralize kafka config by @dexter-mh-lee in #3675
- feat(ingestion): anonymous usage stats by @kevinhu in #3668
- docs(scheduling): re-arrange docs related to scheduling, lineage, CLI by @anshbansal in #3669
- feat(delete): support deleting by searc...
v0.8.17
Notable Changes
- Added Recommendations and redesigned the home page!
  - Modular way to add recommendations throughout the application
  - Recommendation modules for top platforms, recently viewed, popular entities, top tags/terms were added to home page
  - Search page also has top tags/terms module on the bottom
- Ingestion Sources
  - DBT enhancements
    - Creating dbt platform entities to capture dbt node types such as models, tests, source, seed, etc. linking dbt entities with other dbt or underlying platform entities.
  - OpenAPI specs
  - Kafka Connect (Regex based transformers, BigQuery sink)
  - Trino Usage (Starburst)
- Improved lineage viz performance and lineage viz UX
  - Improved layout logic
  - Nodes can be dragged and dropped
- Fixes for delete API not always deleting all of an entities data
- Improved documentation for adding a custom Metadata Ingestion Source
- Fixes description rendering for Charts, Dashboards, Flows, Jobs
- Add YAML configuration file for Metadata Service
- Filter search results by Sub-Type (Looker Explore, View, etc)
- Support proxying DataHub Frontend requests to Metadata Service at /api/gms
- Multi-platform (x86, arm64) support for Docker images (Apple M1 support)
- Graph Service: DGraph support (phase 1)
What's Changed
- fix(docs): fix image paths and company logo link by @jeffmerrick in #3435
- feat(docs-site): two small tweaks by @gabe-lyons in #3437
- feat(ingestion): support custom properties to be ingested via business glossary yaml by @gabe-lyons in #3438
- fix(restli entity client): fix case where sortCriterion is null by @gabe-lyons in #3436
- feat(lineage): improved lineage performance + simplified layout logic + some easter eggs by @gabe-lyons in #3357
- docs(metamodel): added DataHub's metadata model diagram by @swaroopjagadish in #3449
- fix(tag+terms): improved error messaging & rules on tag + term mutations by @gabe-lyons in #3448
- fix(browse): disable breadcrumb links on non-browsable entities by @gabe-lyons in #3447
- fix(ingest): fix lookml derived tables parsing by @remisalmon in #3443
- docs(docs-site): small nits for docs site homepage by @gabe-lyons in #3444
- perf(ingest): lazy load ingestion plugins by @hsheth2 in #3430
- Fix docs website by @jeffmerrick in #3446
- fix(restore): Fix restore backup jobs by @dexter-mh-lee in #3445
- fix(ingest): lineage for Airflow subdags by @kevinhu in #3351
- docs: Update to Q3 2021 accomplishments by @maggiehays in #3420
- fix(bigquery): Add gcp logging dependency for bigquery source. by @varunbharill in #3451
- build(frontend): unzip depend on yarnBuild by @gabe-lyons in #3452
- feat(react): add handy webpack analyze command by @gabe-lyons in #3454
- test(CI): show test results on GitHub by @EnricoMi in #3362
- docs(transformers): add example of custom tag function by @WaStCo in #3354
- docs: add guide for using custom sources by @DSchmidtDev in #3324
- feat(dbt-ingestion): added possibility to skip specific models by @AndreasTA-AW in #3340
- fix(mongodb): Support filtering mongodb documents as per size. by @varunbharill in #3456
- fix(mysql): Update default mysql collation to utf8mb4_bin by @jjoyce0510 in #3459
- fix(ingestion): Workaround for Python 3.8/3.9 mypy invalid syntax issue with airflow 2.2.0 by @rslanka in #3460
- fix(ui): Fixing UI User + Group display name by @jjoyce0510 in #3461
- fix(react): fix up yarn test error reporting by @gabe-lyons in #3462
- docs(frontend): remove confusing suggestion to manually create users by @gabe-lyons in #3465
- docs: Overhaul of DataHub Features page by @maggiehays in #3439
- docs: Update TownHall Agenda and TownHall History by @maggiehays in #3463
- fix(tags): fix links to tags when there are special chars in the urls by @gabe-lyons in #3464
- fix(CI): Stabilize gradle build by @EnricoMi in #3413
- docs: update next Townhall date in README.md by @maggiehays in #3466
- perf(react bundle): decrease bundle size by 15% by @gabe-lyons in #3468
- fix(graphql): fixing Graphql engine factory when analytics are disabled by @gabe-lyons in #3467
- feat(recommendations): Recommendations infra P1 by @jjoyce0510 in #3455
- refactor(styling): Improving recommendation Tag / Search query list styling by @jjoyce0510 in #3472
- fix(docs): fix transformer doc example by @aseembansal-gogo in #3469
- fix(ingest): redshift source gets external table types properly by @treff7es in #3371
- fix(recs): Remove removed entities from aggregation by @dexter-mh-lee in #3473
- fix(ui): fix double formatting of entity count on home page by @jjoyce0510 in #3474
- fix(subtypes): fix case where subtypes are not being fetched for leaf datasets by @gabe-lyons in #3476
- feat(ingestion): User configurable dataset profiling. by @rslanka in #3453
- styling(ui): improve tag list, glossary term list recommendation styling by @jjoyce0510 in #3475
- feat(ui): Provide filtering capability for Sub Types inside the UI by @jjoyce0510 in #3479
- fix(ingest): correctly support multiple snowflake databases by @hsheth2 in #3482
- fix(datajobs): fetch dataflow properties from a relationship by @gabe-lyons in #3487
- fix(fk): fix schemaField urn construction in foreign keys by @gabe-lyons in #3486
- fix(fk): trim whitespace from fk constraints in the case the fieldspec has leading or trailing whitespace characters by @gabe-lyons in #3485
- feat(dbt): add dbt logo and platform. by @varunbharill in #3483
- feat(lineage): some ux improvements to lineage interactions by @gabe-lyons in #3478
- refactor(nocode): Final part of No-Code cleanup by @jjoyce0510 in #3477
- fix(browse paths): Adjust Default browse path logic for datasets by @jjoyce0510 in #3495
- fix(lineage backend): fix ownership timestamps by @gabe-lyons in #3498
- tests(smoke): introducing first isolated smoke test: updating tags & terms by @gabe-lyons in #3496
- feat(graphql): extend entity client to support aspect methods directly via java by @gabe-lyons in #3489
- fix(aspects): fix null aspects case by @gabe-lyons in #3501
- Docs: Update to Slack & Townhall details by @maggiehays in #3502
- refactor(profiler): add PerfTimer class and fix typos by @hsheth2 in #3497
- fix tiny typo by @andrewm4894 in #3484
- fix(ingestion): Glue job names by @kevinhu in #3503
- fix(fk): fix foreign key styling with modals by @gabe-lyons in #3500
- docs: add path fix for 'command not found' by @dannylee8 in #3490
- docs: nit, grammar by @dannylee8 in #3491
- docs: nit by @dannylee8 in #3492
- Docs: nits by @dannylee8 in #3493
- add tooltip for owner category in dataset profile page by @saxo-lalrishav in #3470
- feat(ingest) : kafka connect source improvements by @mayurinehate in #3481
- feat(ingest): adding support for read-modify-write capabilities durin… by @swaroopjagadish in #3506
- feat(dbt): Dbt enhancements - dbt nodes, lineage, subtype, etc. by @varunbharill in #3519
- docs (Metadata Model): nits by @dannylee8 in #3525
- fix(ingestion): Enhance logging and error-handling in bigquery usage connector. by @rslanka in https://github.com/linkedin/datahub/pul...
DataHub v0.8.16
Release Highlights
- Important bug-fixes: properties for DataJob and DataFlow, descriptions for Datasets should now correctly show in the UI
- Search redesign! Single search experience across all entity types with left filter bar
- Added searchAcrossEntities endpoint on both GraphQL and Rest.li that pulls search results for all entity types and mixes them together
- Dataset level lineages - Added support for ingesting dataset level lineages for bigquery. Added support for linking external tables in redshift to the corresponding table in the external data catalog.
- Performance optimization: graphql will now directly call the entity service instead of calling the entity resource over http to hydrate graphql models.
- The “filter” input model used for “search” API now supports disjunctive normal form. (OR of ANDs). The previous filter model should continue to work as expected. (criteria array)
- Adding foundations (models) for search insights, or highlights shown in the search result previews.
- Add owner experience improvements: using full text search to find users and groups.
- User & Group Management Screens!
  - View all users (and those who have logged in)
  - View all groups
  - Create new groups
  - Add and remove group members
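The disjunctive normal form mentioned for the search filter model (an OR of ANDs) can be sketched as follows. This is an illustration of the concept only: the type aliases, the tuple-based criterion shape, and the `matches` helper are hypothetical and do not reflect the actual filter input model or its field names.

```python
from typing import Dict, List, Tuple

Criterion = Tuple[str, str]      # (field, required value)
Conjunction = List[Criterion]    # all criteria must hold (AND)
DnfFilter = List[Conjunction]    # at least one conjunction must hold (OR)


def matches(entity: Dict[str, str], dnf: DnfFilter) -> bool:
    """True if the entity satisfies at least one AND-group of the filter."""
    return any(
        all(entity.get(field) == value for field, value in conjunction)
        for conjunction in dnf
    )


# Example filter: (platform = bigquery AND origin = PROD) OR (platform = redshift)
EXAMPLE_FILTER: DnfFilter = [
    [("platform", "bigquery"), ("origin", "PROD")],
    [("platform", "redshift")],
]
```

With this shape, the earlier single-array filter corresponds to a filter holding exactly one conjunction, which is one way to see why the previous model can continue to work unchanged.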
Breaking Changes
None
What's Changed
- feat(ui): Improve add owner search experience by @jjoyce0510 in #3306
- (fix) Set ebean transaction level to be repeatable read by @xdl in #3285
- fix(fonts): fix manrope styling by @gabe-lyons in #3311
- docs(datahub-frontend): add build instructions for the datahub-frontend docker image by @thebouv in #3314
- feat(ingest): support for primary and foreign key extraction from sql sources by @swaroopjagadish in #3316
- feat(transform): adds replace_existing config to set_dataset_browse_path by @sgomezvillamor in #3313
- feat(redshift): added ability to extract external schema from Redshift spectrum by @varunbharill in #3321
- fix(docs): patch link to Airflow Docker compose file by @kevinhu in #3322
- docs: Fix topic_pattern typo in kafka ingestion docs by @serefacet in #3317
- fix(graphql): add ElasticSearch path prefix configuration by @zhoxie-cisco in #3297
- fix(ingest): more robust error handling in lookml sql parsing by @swaroopjagadish in #3325
- fix(ingest): Fix sasl exception for hive ingestion by @serefacet in #3326
- fix(ingest): no error when there are no partition keys by @aseembansal-gogo in #3328
- fix(docs): fix graphql deprecated comment by @gabe-lyons in #3327
- feat(dbt-ingestion): added tags and owner from dbt by @AndreasTA-AW in #3270
- fix(oidc): Tolerate null emails by @jjoyce0510 in #3330
- feat(Snowflake Lineage Ingestion) by @rslanka in #3331
- feat(ingest): support user group filtering for Azure AD by @vlavorini in #3312
- feat(ingest): Redash add parse_table_names_from_sql feature and multiple refactor by @taufiqibrahim in #3267
- feat(ingest): add support for github and looker links in looker views… by @swaroopjagadish in #3332
- fix(git-ignore): Git ignore generated python and avro artifacts by @dexter-mh-lee in #3320
- fix(ingestion): make dbt tag prefix configurable by @remisalmon in #3334
- feat(ingest): add trino source in metadata-ingestion by @mayurinehate in #3307
- feat(ingestion): support Airflow cluster config by @hsheth2 in #3336
- feat: add support for specialization of models through subtypes with … by @swaroopjagadish in #3338
- feat(search): Redesign search page - left filter pane by @dexter-mh-lee in #3337
- feat(users & groups): User & Groups Management GraphQL APIs + UI by @jjoyce0510 in #3318
- fix(pk + autocomplete): some ui fixes by @gabe-lyons in #3347
- fix(urns): prevent corrupted urns from being created by @gabe-lyons in #3348
- fix(ingestion-docker): Codegen and build again by @dexter-mh-lee in #3342
- docs(ingest): fix trino doc by @mayurinehate in #3339
- fix(docker-quickstart): Fix volume mount paths when using quickstart by @dexter-mh-lee in #3341
- fix(autocomplete): Fix empty autocomplete server error by @jjoyce0510 in #3346
- fix(Add custom elastic field mappings for all timeseries fields) by @rslanka in #3350
- fix(gitignore): Fix gitignore to ignore whole directory by @dexter-mh-lee in #3361
- fix(mce_builder): deleted alias by @vlavorini in #3356
- feat(data-platform): Add science and airflow data platform by @dexter-mh-lee in #3363
- fix(ui): fix url encoding issues by @gabe-lyons in #3359
- fix(gitignore): Update gitignore again - remove metadata-ingestion objects by @dexter-mh-lee in #3365
- fix(ci): add run_id to the task instance constructor for airflow by @swaroopjagadish in #3366
- fix(aws-deploy-docs): Fix documentation for elasticsearch by @dexter-mh-lee in #3360
- fix(bigquery_usage): Gracefully failing while parsing GCP log events. by @varunbharill in #3367
- feat(ingest): allow disabling sample values in profiling by @aseembansal-gogo in #3355
- fix(docs): fix docs for developing on metadata ingestion by @aseembansal-gogo in #3353
- test(CI): Timeout build job by @EnricoMi in #3364
- docs(OIDC): add note that root user is still accessible by @aseembansal-gogo in #3372
- test(metadata-io): Run metadata-io tests in parallel by @EnricoMi in #3358
- test(ElasticSearch): Retry ES requests by @EnricoMi in #3377
- fix(ingest): redshift usage properly count queries by @treff7es in #3370
- feat(subtypes): Support Viz for "view" subtypes by @jjoyce0510 in #3376
- fix(graphql): Correctly return tags and legacy global tags field by @jjoyce0510 in #3378
- fix(ingest): fixing support for kafka key schemas when only key schemas are present by @swaroopjagadish in #3379
- fix(search): Small bug fixes for search redesign by @dexter-mh-lee in #3381
- test(airflow): remove unneeded execution_date parameter from test by @hsheth2 in #3368
- feat(ingest): add mariadb as possible source by @aseembansal-gogo in #3245
- fix(search): fixing user and group links in search results by @gabe-lyons in #3383
- fix(subtypes): Fix subtypes tab visibility by @jjoyce0510 in #3386
- Revert "test(ElasticSearch): Retry ES requests" by @gabe-lyons in #3385
- Revert "Revert "test(ElasticSearch): Retry ES requests"" by @gabe-lyons in #3392
- Adding kafka connect data platform by @jjoyce0510 in #3388
- Replace big query logo with the latest by @jjoyce0510 in #3387
- oidc: Add "name" claim extraction if present by @jjoyce0510 in #3384
- feat(ingest): teaching lookml source that athena has 2 parts in its dataset names by @swaroopjagadish in #3393
- fix(ingest): fix issues with lookml view file resolution on non-view … by @swaroopjagadish in #3397
- feat(search): Search insights foundations by @jjoyce0510 in #3391
- fix(graphQL): Populating deprecated Dataset description field by @jjoyce0510 in #3403
- feat(search): Support Boolean OR Filters in Rest.li APIs by @jjoyce0510 in #3344
- fix(lookml): Fixing lookml integration test. by @varunbharill in #3405
- fix(browse): Add more special character handling by @dexter-mh-lee in #3404
- fix(search): Reduce default batch size by @dexter-mh-lee in #3407
- fix(ui): Extract customProperties map from "properties" OR ...
DataHub v0.8.15
Notable Changes
- Support the “NONE” Client Authentication Method for OIDC login.
- Migrated to the new UI for Charts, Dashboards, Data Flows (Pipelines), Data Jobs (Tasks) profile pages
- Primary and Foreign Keys rendered in the UI
- Ingestion
  - Support for redshift-usage source
  - Fixes for looker ingestion
  - datahub cli supports -f/--force option to skip confirmations
Changelog
- #3310 @jjoyce0510 Updating logo
- #3309 @jjoyce0510 Fixing lineage
- #3308 @jjoyce0510 Attach Client ID to token request in Authentication Mode none
- #3256 @aseembansal-gogo feat(ingest): add -f option to skip confirmations for automation en…
- #3298 @gabe-lyons feat(react): show primary keys & foreign keys in the schema
- #3172 @gabe-lyons marking data process aspects as deprecated
- #3301 @jjoyce0510 fix(upgrade): Improving NoCodeUpgrade logic to account for Bootstrap logic
- #3305 @jjoyce0510 feat(oidc): Support NONE client auth method in OIDC (stopgap)
- #3304 @gabe-lyons fix(docs): fix entity doc link
- #3303 @jjoyce0510 feat(UI): UI Migration for Charts, Dashboards, Pipelines, Tasks + Glossary Terms and Links for all.
- #3276 @bboylen feat(react): add groups tab to user profile
- #3299 @swaroopjagadish feat(build): adding support for python codegen for all aspects, not just the snapshot ones
- #3294 @swaroopjagadish fix(ingest): looker explores with joins, parsing failures on lateral flatten
- #3277 @chinmay-bhat feat(ingest): add redshift usage source
- #3290 @adriaanslechten feat(ingest): optional custom headers REST emitter
- #3293 @chinmay-bhat fix(build): update tox.ini to allow new dependencies to be installed
- #3292 @gabe-lyons fix(ingest): update generated files
- #3278 @jjoyce0510 refactor(graphql): GraphQL Public API Refactor + Documentation
- #3287 @swaroopjagadish fix(ingest): fix typo in looker tag generation
- #3275 @gabe-lyons feat(foreign keys): add foreign key models
- #3283 @aseembansal-gogo feat(ingest): add athena logo
- #3280 @gabe-lyons fix(react): fix updates from the UI
- #3279 @swaroopjagadish feat(ingest): add nice semantic run-ids that use source type and time of ingestion
- #3274 @gabe-lyons fix(chartinfo): only map chartinfo inputs if exists
- #3272 @gabe-lyons docs(adoption): updating adoption logos
- #3271 @jjoyce0510 fix(policies): Always ingest non-editable policies on boot
- #3259 @gabe-lyons feat(graphql): Adding write side validation and tests for add+remove API
- #3264 @swaroopjagadish fix(ingest): making lookml recursive and nested includes work
- #3262 @swaroopjagadish fix(ingest): looker cascading derived tables should express lineage to view not underlying table
- #3254 @abdvl fix(web): upgrade remove-markdown package to fix a ReDoS security issue
- #3011 @EnricoMi test(GraphService): Thorough graph service tests
- #3258 @jensenity chore: add banksalad to datahub adoption readme
DataHub v0.8.14
Release Highlights
- Small bug fixes over 0.8.13
Notable Changes
- Fix bug in OIDC config for setting response type
- Add WAU chart in the analytics page
- Starting with acryl_datahub==0.8.13.1 (pypi), Looker and LookML ingestion will now name views differently from before. You will need to delete old LookML metadata to start with a clean slate, or specify view_naming_pattern = "{name}" in both your Looker and LookML ingestion recipes to get the old behavior.
- Populate the user email field in usage statistics to correctly show top users on the entity page
- Full changelog below
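As a minimal sketch of the view_naming_pattern workaround noted above, the snippet below sets the pattern on recipes expressed as Python dictionaries. The source/config layout follows the usual recipe shape, but the helper name with_legacy_view_naming and the placement of the option under source.config are assumptions made for illustration, and connection settings are omitted.

```python
import copy


def with_legacy_view_naming(recipe: dict) -> dict:
    """Return a copy of a recipe with view_naming_pattern set to "{name}".

    Hypothetical helper: assumes the option lives under source.config.
    """
    updated = copy.deepcopy(recipe)
    config = updated.setdefault("source", {}).setdefault("config", {})
    config["view_naming_pattern"] = "{name}"
    return updated


# Skeleton recipes; real ones also need connection settings and a sink.
looker_recipe = {"source": {"type": "looker", "config": {}}}
lookml_recipe = {"source": {"type": "lookml", "config": {}}}

# Per the note above, the pattern must be set in both recipes.
legacy_recipes = [with_legacy_view_naming(r) for r in (looker_recipe, lookml_recipe)]
```

The helper works on a deep copy, so the original recipe dictionaries are left unchanged and can still be used to ingest with the new naming.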
Changelog
- #3215 @aseembansal-gogo feat(ingest): support for env variable in cli
- #3253 @remisalmon fix(ingest): allow ingestion of glossary terms without nodes
- #3255 @swaroopjagadish feat(ingest): looker and lookml improvements - connection, explores, folders
- #3010 @EnricoMi refactor(dao/utils): Move general createRelationshipFilter from Neo4jUtil to QueryUtils
- #2736 @jjoyce0510 rfc(RBAC): Fine-Grained Access Controls in GMS
- #3251 @jjoyce0510 Fixing response type bug
- #3249 @dexter-mh-lee Fix OIDC doc
- #3252 @dexter-mh-lee feat(analytics): Add WAU over the last 2 months chart
- #3250 @gabe-lyons feat(glossary): splitting apart tags & terms into their own visual sections
- #3244 @rslanka fix(usage statistics): populate the email field
- #3238 @aseembansal-gogo fix(ingest): add missing partition keys in schema for glue sources
- #3243 @swaroopjagadish fix(ingest): fixing snowflake and bigquery usage connectors to use real user urns
- #3241 @claudio-benfatto fix(docker): use wait-http-header to avoid printing cleartext credentials
- #3220 @dexter-mh-lee fix(frontend): Add additional sasl config for kafka producer in datahub-frontend