[Feature] dbt should know about Attributes
(foundation for Metrics/Dimensions)
#4621
Replies: 6 comments 4 replies
-
At my current company I have a script that generates yaml for a table which has been created via
I feel like if we want to solve consistent documentation, we should perhaps consider a new builtin CLI command? Something that could generate docs with existing descriptions, as well as point out inconsistent definitions, and perhaps edit the
I'm leaning that way because I think attributes as they are proposed above are sort of trying to be too many things: are we solving metric definitions or consistent docs? Let's be explicit and narrow in on the problem we want to solve 😄
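To make the "point out inconsistent definitions" idea concrete, here is a minimal sketch of such a check. It is not an existing dbt command: the `models` dictionary (model name to column descriptions) and the function `find_inconsistent_descriptions` are hypothetical stand-ins for whatever a real command would read from the project's yaml files, and the sample descriptions are made up.

```python
from collections import defaultdict


def find_inconsistent_descriptions(models):
    """Return column names that carry more than one distinct description.

    `models` maps model name -> {column name -> description}. This shape is
    an assumption for illustration, not dbt's actual manifest format.
    """
    seen = defaultdict(lambda: defaultdict(list))
    for model_name, columns in models.items():
        for column_name, description in columns.items():
            if description:  # undocumented columns are skipped, not flagged
                seen[column_name][description.strip()].append(model_name)
    # Keep only the columns whose descriptions disagree across models.
    return {name: dict(descs) for name, descs in seen.items() if len(descs) > 1}


# Made-up example project.
models = {
    "orders": {
        "customer_id": "Unique ID of the customer",
        "sales_revenue": "Revenue net of refunds",
    },
    "daily_sales": {"sales_revenue": "Gross revenue", "as_of_date": ""},
    "customers": {"customer_id": "Unique ID of the customer"},
}

conflicts = find_inconsistent_descriptions(models)
for column, descriptions in conflicts.items():
    print(column, descriptions)
```

A check like this only addresses the documentation half of the problem; it says nothing about metric definitions, which is the distinction this comment is drawing.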
-
I would like to offer another PoV from GoodData (I'm an employee there), a company that has implemented its own ROLAP engine (historically inspired by MicroStrategy). Same as the issue author, we (and our customers) see great value in building metrics on top of an abstraction over the physical DB (instead of directly over DB columns). We would like to offer our long-term experience in this area to help shape the universal semantic layer design. Let me briefly outline our approach.
What you outline here is basically what we in GoodData call a Logical Data Model (LDM). The LDM consists of
Once an LDM is set up, the end users can then define their metrics using facts and attributes via a custom language called MAQL. A typical simple metric could look like
Such a metric could then be used in many different reports, in each of them broken down by any attributes "compatible" with the Price fact, e.g. Product, Customer, Campaign. (For more info about LDM you can read the intro docs).
I hope that our approach could be of some inspiration in your proposed design which I generally like quite a lot, especially the overall direction of your thinking. I may be biased, though :-)
Couple of questions:
Anyway, thanks for a great kick off!
-
Big thank you @aaronsteers for the clear and thorough write-up, and @tnightengale @david-kubecka for the thoughtful replies! I'm way overdue for responding to this one, in part because it sent me down a flurry of mental paths, and it's taken me a while to gather my thoughts together into anything serviceable.
I believe the move into defining metrics in
Semantically meaningful column properties ought to be inherited, and not re-typed every single time. That inheritance can run in two ways:
The separation/combination I described above is something I want to pursue, and it's something we can do with existing constructs (models and columns) plus a few new capabilities.
So, do we need another construct here? What does a notion of an
I've become increasingly convinced that there's real value in defining column "types." The closest thing to column "types" in dbt today is a mix of descriptions (achieved with reusable
So, imagine a
Then we work toward a synthesized approach whereby:
Lots of questions:
Ok! This is just the beginning. I'm going to turn this into a discussion, in the hope that more folks join in :)
-
An argument to at least allow defining derived/computed metrics based on base metrics: I come from mara-schema, which itself is based on the XML schema used to configure metrics in the Mondrian OLAP engine. A best practice was to base metrics on plain columns and to build derived metrics from these base metrics. So a rate was
a) you do not need to copy the column definition from the base metric to the derived metric
Metabase also uses such a model (base metrics + derived metrics based on them), so syncing such a model to Metabase worked very well.
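The base/derived split described above can be sketched as follows. This is an illustration only, not mara-schema's or Metabase's actual format: `base_metrics`, `derived_metrics`, `aggregate`, and `evaluate` are hypothetical names, and the metrics and rows are made up. The point it shows is that a derived metric refers to base metrics by name, so the column definition lives in exactly one place.

```python
# Base metrics are defined directly on plain columns: (column, aggregation).
base_metrics = {
    "orders": ("order_id", "count"),
    "revenue": ("sales_revenue", "sum"),
}

# Derived metrics reference other metrics by name: (left, operator, right).
derived_metrics = {
    "avg_order_value": ("revenue", "/", "orders"),
}


def aggregate(rows, column, how):
    """Aggregate one column over a list of row dictionaries."""
    values = [row[column] for row in rows if row.get(column) is not None]
    if how == "sum":
        return sum(values)
    if how == "count":
        return len(values)
    raise ValueError(f"unsupported aggregation: {how}")


def evaluate(metric, rows):
    """Evaluate a base metric directly, or a derived metric recursively."""
    if metric in base_metrics:
        column, how = base_metrics[metric]
        return aggregate(rows, column, how)
    left, op, right = derived_metrics[metric]
    left_value, right_value = evaluate(left, rows), evaluate(right, rows)
    if op == "/":
        # A rate over zero rows is undefined rather than an error.
        return left_value / right_value if right_value else None
    raise ValueError(f"unsupported operator: {op}")


rows = [
    {"order_id": 1, "sales_revenue": 10.0},
    {"order_id": 2, "sales_revenue": 30.0},
]
print(evaluate("avg_order_value", rows))
```

If the column behind `revenue` is renamed, only the base metric changes; every derived metric that references it follows automatically.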
-
Hi, how about the Cube JS approach? See a snippet below, where you define measures in a very intuitive, SQL-like way.
cube(
},
By using and sticking to the SQL-like approach, it will be very intuitive, easy for the dbt community to adopt, and very readable, like the mara-schema example. Attributes really sound like something very abstract. Also, you will be able to use a metric serving layer such as CubeJS to read the metric definitions from dbt and serve all your BI/reporting tools from it.
-
I agree with the above: it feels like attributes are similar to columns, and we would then also be spending extra time duplicating things (which this is trying to avoid). However, could it work as an optional YAML file in the project config (or similar), with an ‘attribute: [colalias1,colalias2,colalias3]’ style, as a shortcut to propagate the documentation and meta tags?
-
Describe the Feature
Following from #4071, I called out in my comment that I believe metrics would be better established on top of "attributes" instead of building directly on "columns" and "models" abstractions in dbt.
As a foundation for deeper metadata understandings within dbt and to unify the documentation effort for existing dbt projects, we should first establish some type of ontological definition of what columns "mean", as they relate to an analytical framing.
Proposal
dimension_key, dimension_property, fact, etc.
as_of_date marked as the primary temporal attribute for the table, this will inform how metrics calculations can be performed - and at what grain they are possible.)
sales_revenue, we could set an auto_map_by_name: true property to find and map all references of the column sales_revenue, or we could explicitly map to models and column references.
Benefits
Sample Code
Adapted from my comment on the related metrics topic.
Describe alternatives you've considered
Metrics: The alternative for metrics is to create mappings directly over all tables and columns. The greater the number of tables of different aggregation levels, and/or projections for query optimization, the larger the redundancy of those metrics mappings will be.
Documentation: This is actually stemming from another inquiry I ran into a couple years ago: how to document all columns in all models, without having to put the same text description on every single instance. (And then, how to keep them up to date as you want to update how "sales revenue" is calculated on all of them.) As far as I'm aware, there isn't yet a good solution for this documentation problem, and so my general guidance has been "don't worry about descriptions on columns" - because it's just too much work and no single-source-of-truth to keep them in sync across tables.
Adding attributes would hopefully change that: in each of the many places the column exists, it would always carry (or link to) the same text description for users of the project.
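As a rough sketch of how a single attribute definition could feed every column that refers to it: the `auto_map_by_name` flag comes from the proposal above, but everything else here (the `attributes` dictionary, its `columns` key for explicit model/column mappings, the functions `resolve_attribute` and `describe`, and the sample model names and description text) is assumed for illustration and is not an existing dbt feature.

```python
# Hypothetical attribute registry: one description per attribute.
attributes = {
    "sales_revenue": {
        "description": "Revenue net of refunds.",
        # Map every column literally named `sales_revenue`.
        "auto_map_by_name": True,
        # Explicit (model, column) mappings for columns named differently.
        "columns": [("legacy_sales", "rev_amt")],
    },
}


def resolve_attribute(model, column, attributes):
    """Find the attribute a column maps to; explicit mappings win."""
    for name, spec in attributes.items():
        if (model, column) in spec.get("columns", []):
            return name
    for name, spec in attributes.items():
        if spec.get("auto_map_by_name") and column == name:
            return name
    return None


def describe(model, column, attributes):
    """Return the shared description for a column, or None if unmapped."""
    name = resolve_attribute(model, column, attributes)
    return attributes[name]["description"] if name else None


print(describe("orders", "sales_revenue", attributes))
print(describe("legacy_sales", "rev_amt", attributes))

# Updating the attribute once updates every mapped column.
attributes["sales_revenue"]["description"] = "Revenue net of refunds and tax."
print(describe("legacy_sales", "rev_amt", attributes))
```

Because columns link to the attribute rather than copying its text, changing how "sales revenue" is described is a one-line edit instead of one edit per model.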
Who will this benefit?
This would benefit teams who want column-level descriptions for themselves and their users.
This benefits users, because they get better documentation on columns and can better understand the equivalency (or lack thereof) of similarly named columns across a project.
Are you interested in contributing this feature?
Sure!
Anything else?
Prior art
Inspired by OLAP platforms and BI layers which support ROLAP capabilities: