Replies: 3 comments 5 replies
-
I personally think that schemas/datasets are enough of a thing to deserve to be represented by a fully fledged object. But we need to clearly articulate whether or not this object has any kind of model-collection aspect in its semantics in dbt. Here I'm mostly concerned about the inheritance story between schemas and tables (in the database), versus schemas and models (as could be added in the dbt project), versus folders and models (the current dbt way). I'm not concerned about the implementation, nor do I have much doubt about how it should work. I'm concerned about the expectations that users will have, and how it can make the mental model (hu hu) of models more complex to grasp.

Or do you see it happening like this? I'd really prefer not to keep overloading this file, though:

```yaml
# dbt_project.yml
databases: # 'projects'? 'catalogs'? what about data platforms that don't support this third level?
  - name: my_database
    schemas: # or 'datasets'
      - name: important_schema # actual schema name? or schema config --> generate_schema_name?
        tables:
          - name: myTable1
```
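For context on the `generate_schema_name` question in the comment above: dbt's documented default is to use the target schema as-is when a model has no custom schema configured, and to append the custom schema to the target schema otherwise. A rough Python paraphrase of that default logic (the function name here is illustrative, not a dbt internal):

```python
from typing import Optional

def resolve_schema_name(target_schema: str, custom_schema: Optional[str]) -> str:
    """Paraphrase of dbt's default generate_schema_name macro:
    no custom schema -> use the target schema as-is;
    custom schema    -> append it to the target schema."""
    if custom_schema is None:
        return target_schema
    return f"{target_schema}_{custom_schema.strip()}"

# In dev (target schema "dbt_jane"), a model configured with
# schema: important_schema lands in "dbt_jane_important_schema";
# in prod (target schema "analytics"), in "analytics_important_schema".
```

This is why "the configured schema" and "the actual schema name in the database" can legitimately differ by environment, which is exactly the ambiguity flagged in the YAML sketch.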
-
I am thinking about integrating the
@jtcohen6 how do you envision this?
-
Flagging that one aspect of schema management,
-
A more specific version of #5099
Philosophical premise
I am less inclined nowadays to say dbt should aspire to be "Terraform for data warehouses." Rather than a fully generic and extensible framework for managing data warehouse objects, where we take the existence of those object types (tables, schemas, tasks, streams, roles, UDFs, ...) as our starting point, I think we should always take jobs-to-be-done in data as our starting point (models, sources, exposures, tests, descriptions, granting access, ...), and treat data warehouse objects as implementation details / backend mechanisms for achieving those jobs.
Ok, but we need schemas
paraphrasing from dbt-labs/dbt-bigquery#289 (comment)
We're missing a mechanism for users to truly configure schemas/datasets in dbt today. We can agree on that, without necessarily implying that dbt should first / eventually become a totally generic and extensible framework for managing all types of database objects.
dbt is already in the business of creating schemas/datasets, as soon as you define and run a model in a schema/dataset that does not yet exist:
(see `dbt-core/core/dbt/task/run.py`, line 423 at `e95e36d`)
This has come up in the context of labels (dbt-labs/dbt-bigquery#22), persisting descriptions/comments (#1714), grants (dev blog), and managing "orphaned" objects (#4957).
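To make the "orphaned objects" job concrete: it reduces to a set difference between the relations that exist in a warehouse schema and the relations the project knows about. A minimal sketch (the function name is hypothetical; dbt's eventual interface may well differ):

```python
def find_orphans(in_database: set, in_project: set) -> set:
    """Relations present in the warehouse schema but not produced by any
    model/seed/snapshot in the project: candidates for a warning, or for
    dropping, under a schema-management task like the one proposed in #4957."""
    return in_database - in_project

# A leftover table from a since-renamed model shows up as an orphan:
orphans = find_orphans(
    {"dim_customers", "fct_orders", "fct_orders_old"},
    {"dim_customers", "fct_orders"},
)
# orphans == {"fct_orders_old"}
```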
Big ideas:
- When dbt runs `create schema`, it would take adapter-specific configs into account, persist the description as a comment, and so on.
- When dbt runs `create schema` for the first time, it adds every grant. If the schema already exists, it runs an introspective query to figure out the current grants, calculates the diffs, and applies them if needed.
- Via a `dbt manage` task (as proposed in "[CT-426] [Feature] Manage full schema content with dbt by dropping tables/views not maintained in models" #4957 / "Feature CT-426 manage schemas" #5392), it would take action to manage (warn about or drop) orphaned objects in that schema.

Questions:
Should the `database` + `schema` names be the actual database/schema names in the database? Or should they match the schema config which is passed into `generate_schema_name`, and naturally differ by environment? I lean toward the latter, though it could be tricky!