docs: add guides on deleting entities (#7636)
Co-authored-by: Hyejin Yoon <[email protected]> Co-authored-by: Hyejin Yoon <[email protected]>
1 parent 589d354, commit 864ac2d
Showing 5 changed files with 121 additions and 4 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,96 @@ | ||
# Deleting Entities By Urn | ||
|
||
## Why Would You Delete Entities? | ||
You may want to delete a dataset if it is no longer needed, contains incorrect or sensitive information, or was created for testing and has no place in production. | ||
It is possible to [delete entities via CLI](/docs/how/delete-metadata.md), but a programmatic approach scales better when there are many entities to delete. | ||
|
||
There are two methods of deletion: soft delete and hard delete. | ||
**Soft delete** sets the Status aspect of the entity to Removed, which hides the entity and all its aspects from the UI. | ||
**Hard delete** physically deletes all rows for all aspects of the entity. | ||
|
||
For more information about soft delete and hard delete, please refer to [Removing Metadata from DataHub](/docs/how/delete-metadata.md#delete-by-urn). | ||
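
To make the distinction concrete, here is a toy in-memory sketch. It is not how DataHub stores metadata; the `Store` type and the three helper functions are hypothetical names introduced only for illustration. It shows that a soft delete flips a status flag while leaving the other aspects in place, whereas a hard delete drops every aspect row.

```python
from typing import Dict

# Toy model (an assumption for illustration): each entity URN maps to its aspects.
Store = Dict[str, Dict[str, dict]]


def soft_delete(store: Store, urn: str) -> None:
    """Flag the entity as removed; all other aspects stay in the store."""
    store[urn]["status"] = {"removed": True}


def hard_delete(store: Store, urn: str) -> None:
    """Physically drop every aspect row for the entity."""
    store.pop(urn, None)


def is_visible(store: Store, urn: str) -> bool:
    """An entity is shown only if it exists and is not flagged as removed."""
    aspects = store.get(urn)
    return aspects is not None and not aspects.get("status", {}).get("removed", False)


urn = "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_deleted,PROD)"
store: Store = {urn: {"schemaMetadata": {"fields": ["user_id"]}}}

soft_delete(store, urn)
# The entity is hidden, but its schemaMetadata aspect is still stored.
print(is_visible(store, urn), "schemaMetadata" in store[urn])

hard_delete(store, urn)
# After a hard delete, no aspects remain for the URN.
print(urn in store)
```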
|
||
### Goal Of This Guide | ||
This guide will show you how to delete a dataset named `fct_users_deleted`. | ||
You can delete other entities, such as tags, terms, and owners, with the same approach. | ||
|
||
## Prerequisites | ||
For this tutorial, you need to deploy DataHub Quickstart and ingest sample data. | ||
For detailed steps, please refer to [Prepare Local DataHub Environment](/docs/api/tutorials/references/prepare-datahub.md). | ||
|
||
## Delete Datasets With GraphQL | ||
|
||
> 🚫 Hard delete with GraphQL is currently not supported. | ||
> Please check out [API feature comparison table](/docs/api/datahub-apis.md#datahub-api-comparison) for more information. | ||
|
||
### GraphQL Explorer | ||
GraphQL Explorer is the fastest way to experiment with GraphQL without any dependencies. | ||
Navigate to GraphQL Explorer (`http://localhost:9002/api/graphiql`) and run the following mutation. | ||
|
||
```graphql | ||
mutation batchUpdateSoftDeleted { | ||
batchUpdateSoftDeleted(input: | ||
{ urns: ["urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_deleted,PROD)"], | ||
deleted: true }) | ||
} | ||
``` | ||
If you see the following response, the operation was successful: | ||
```json | ||
{ | ||
"data": { | ||
"batchUpdateSoftDeleted": true | ||
}, | ||
"extensions": {} | ||
} | ||
``` | ||
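
If you need to soft delete several URNs at once, the request body can be assembled programmatically. The following is a minimal sketch using only the Python standard library; `build_soft_delete_payload` is a hypothetical helper, not part of the DataHub SDK, and it simply inlines the URNs into the same mutation shown above.

```python
import json
from typing import List


def build_soft_delete_payload(urns: List[str], deleted: bool = True) -> dict:
    """Build the JSON body for a batchUpdateSoftDeleted mutation (hypothetical helper)."""
    # json.dumps produces a double-quoted, escaped list literal,
    # which is also a valid GraphQL list of strings.
    urn_list = json.dumps(urns)
    flag = "true" if deleted else "false"
    query = (
        "mutation batchUpdateSoftDeleted { "
        f"batchUpdateSoftDeleted(input: {{ urns: {urn_list}, deleted: {flag} }}) "
        "}"
    )
    return {"query": query, "variables": {}}


payload = build_soft_delete_payload(
    ["urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_deleted,PROD)"]
)
print(json.dumps(payload, indent=2))
```

The resulting dictionary can be serialized with `json.dumps` and sent as the body of a POST request to the GraphQL endpoint.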
|
||
### CURL | ||
|
||
With CURL, you need to provide an access token. To generate a token, please refer to [Generate Access Token](/docs/api/tutorials/references/generate-access-token.md). | ||
Once you have a token, substitute it for `<my-access-token>` in the following command and run it. | ||
|
||
```shell | ||
curl --location --request POST 'http://localhost:8080/api/graphql' \ | ||
--header 'Authorization: Bearer <my-access-token>' \ | ||
--header 'Content-Type: application/json' \ | ||
--data-raw '{ "query": "mutation batchUpdateSoftDeleted { batchUpdateSoftDeleted(input: { deleted: true, urns: [\"urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_deleted,PROD)\"] }) }", "variables":{}}' | ||
``` | ||
|
||
Expected Response: | ||
```json | ||
{"data":{"batchUpdateSoftDeleted":true},"extensions":{}} | ||
``` | ||
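
The same request can be issued from Python without extra dependencies. This is a sketch under stated assumptions: `build_request`, `soft_delete_succeeded`, and `run_soft_delete` are hypothetical helper names, the endpoint is the one from the curl command, and the check for an `errors` key follows the general GraphQL response convention rather than anything DataHub-specific.

```python
import json
import urllib.request

# Endpoint taken from the curl command; adjust for your deployment.
GRAPHQL_URL = "http://localhost:8080/api/graphql"


def build_request(token: str, query: str) -> urllib.request.Request:
    """Prepare a POST request carrying the GraphQL mutation and bearer token."""
    body = json.dumps({"query": query, "variables": {}}).encode("utf-8")
    return urllib.request.Request(
        GRAPHQL_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def soft_delete_succeeded(response_text: str) -> bool:
    """Return True only if the response reports batchUpdateSoftDeleted as true."""
    payload = json.loads(response_text)
    if payload.get("errors"):
        return False
    return (payload.get("data") or {}).get("batchUpdateSoftDeleted") is True


def run_soft_delete(token: str, query: str) -> bool:
    """Send the request to a running DataHub instance and check the result."""
    with urllib.request.urlopen(build_request(token, query), timeout=10) as resp:
        return soft_delete_succeeded(resp.read().decode("utf-8"))
```

Calling `run_soft_delete` requires a reachable DataHub instance and a valid token; the other two functions can be exercised offline.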
|
||
## Delete Datasets With Python SDK | ||
|
||
The following code deletes a hive dataset named `fct_users_deleted`. | ||
You can refer to the complete code in [delete_dataset.py](https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/examples/library/delete_dataset.py). | ||
|
||
```python | ||
import logging | ||
from datahub.cli import delete_cli | ||
from datahub.emitter.rest_emitter import DatahubRestEmitter | ||
from datahub.emitter.mce_builder import make_dataset_urn | ||
|
||
log = logging.getLogger(__name__) | ||
logging.basicConfig(level=logging.INFO) | ||
|
||
rest_emitter = DatahubRestEmitter(gms_server="http://localhost:8080") | ||
dataset_urn = make_dataset_urn(name="fct_users_deleted", platform="hive") | ||
|
||
delete_cli._delete_one_urn(urn=dataset_urn, soft=True, cached_emitter=rest_emitter) | ||
|
||
log.info(f"Deleted dataset {dataset_urn}") | ||
``` | ||
|
||
We're using the `MetadataChangeProposalWrapper` to change entities in this example. | ||
For more information about the `MetadataChangeProposal`, please refer to [MetadataChangeProposal & MetadataChangeLog Events](/docs/advanced/mcp-mcl.md). | ||
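
The URN passed to the SDK should match the one used in the GraphQL examples. As a quick sanity check, the sketch below rebuilds the URN string by hand. `format_dataset_urn` is a hypothetical stand-in written for this guide, not the SDK's `make_dataset_urn`, and the `PROD` default is an assumption based on the URN shown in the GraphQL examples.

```python
def format_dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Assemble a dataset URN in the shape used by the GraphQL examples (hypothetical helper)."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


# URN copied from the GraphQL mutation in this guide.
expected = "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_deleted,PROD)"
assert format_dataset_urn("hive", "fct_users_deleted") == expected
print(format_dataset_urn("hive", "fct_users_deleted"))
```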
|
||
|
||
## Expected Outcomes | ||
The dataset `fct_users_deleted` has now been deleted, so if you search for a hive dataset named `fct_users_deleted`, you will no longer be able to see it. | ||
|
||
![dataset-deleted](../../imgs/apis/tutorials/dataset-deleted.png) | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
import logging | ||
|
||
from datahub.cli import delete_cli | ||
from datahub.emitter.mce_builder import make_dataset_urn | ||
from datahub.emitter.rest_emitter import DatahubRestEmitter | ||
|
||
log = logging.getLogger(__name__) | ||
logging.basicConfig(level=logging.INFO) | ||
|
||
rest_emitter = DatahubRestEmitter(gms_server="http://localhost:8080") | ||
dataset_urn = make_dataset_urn(name="fct_users_deleted", platform="hive") | ||
|
||
delete_cli._delete_one_urn(urn=dataset_urn, soft=True, cached_emitter=rest_emitter) | ||
|
||
log.info(f"Deleted dataset {dataset_urn}") |