Hot takes on what we get wrong about the metrics layer and where it fits in the modern data stack
The metrics layer has been all the rage in 2022. It’s just forming in the data stack, but I’m so excited to see it coming alive. Recently dbt Labs incorporated a metrics layer into their product, and Transform open-sourced MetricFlow (their metric creation framework).
A few weeks ago, I was lucky enough to talk about the metrics layer with two of the most prolific product thinkers in the space: Drew Banin (Co-founder of dbt Labs) and Nick Handel (Co-founder of Transform).
We covered everything from the basics of a metrics layer and what people get wrong about it to real-life use cases and its place in the modern data stack.
Before we begin… WTF actually is a metrics layer? Today metrics are often split across different data tools, and different teams or dashboards end up using different definitions for the same metric. The metrics layer aims to fix this by creating a common set of metrics and their definitions.
Drew and Nick dove more into this definition, so let’s jump right into all of their insights and fiery takes. We talked for over an hour, so this is a condensed, edited version of our discussion. (Check out the full recording here.)
How would you explain the metrics layer to a beginner data analyst?
Since it’s a new concept, there’s a lot of confusion about what the metrics layer really is. Drew and Nick cut through the confusion with succinct definitions about creating a common source of truth for metrics.
Drew Banin: “The shortest version I can think of is…”
Define your metrics once and reference them everywhere so that if your metrics ever change, you get updated results everywhere you look at data.
Nick Handel: “The way that I’ve explained it to family and people who are completely out of the space is just, businesses have data. They use that data to measure their operations. The point of this software is basically to make it really easy for the data analysts (the people who are responsible for measuring that data) to define these metrics, and make it easy for the rest of the business to consume that single correct way to measure that data.”
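To make “define once, reference everywhere” concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the `Metric` class, the `REGISTRY` dictionary, the `define` and `compile_query` helpers, and the table and column names are all hypothetical, and none of them come from dbt or MetricFlow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """A hypothetical metric definition: one name, one SQL expression, one table."""
    name: str
    expression: str  # aggregate SQL expression, e.g. "COUNT(DISTINCT project_id)"
    table: str       # source table the expression runs against (assumed name)


# The single shared registry: every consumer looks metrics up here by name.
REGISTRY = {}


def define(metric):
    """Register (or redefine) a metric under its name."""
    REGISTRY[metric.name] = metric


def compile_query(metric_name, group_by=()):
    """Build the SQL for a metric from its one shared definition."""
    m = REGISTRY[metric_name]
    select_cols = [*group_by, f"{m.expression} AS {m.name}"]
    sql = f"SELECT {', '.join(select_cols)} FROM {m.table}"
    if group_by:
        sql += f" GROUP BY {', '.join(group_by)}"
    return sql


define(Metric("weekly_active_projects",
              "COUNT(DISTINCT project_id)",
              "project_runs_last_7_days"))

# Two different consumers reference the same definition by name.
dashboard_sql = compile_query("weekly_active_projects")
notebook_sql = compile_query("weekly_active_projects", group_by=("plan",))
```

Because both consumers go through `compile_query`, calling `define` again with a changed expression updates the dashboard and the notebook at the same time, which is the behavior Drew describes.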
What’s the real problem the metrics layer is trying to solve?
Nick and Drew explained that the metrics layer is motivated by two key ideas: precision and trust.
Nick: “I think we’re all pretty convinced about the value of data. We have all kinds of different, interesting things that we can do with data, and the cost of doing those things is fairly high. There’s a bunch of work to get the data into the place where we can go and do something that’s really interesting and valuable.
“Why does this matter? It’s supposed to make that whole process of getting the data ready for that source of value much easier and also more trustworthy.”
It comes down to those two things: productivity and trust. Is it easy to produce the metric, and is it the right metric? And can you put it into whatever application you’re trying to serve?
Drew: “That’s really good framing. I just look inwards at our organization. The very first metric we ever created was weekly active projects: how many dbt projects were run in the previous seven days? Now we’re about 250 people and we’re measuring so many things across the business with lots of new people around.”
We’re trying to make sure that when someone says ‘weekly active accounts’ or ‘MRR’ or ‘MRR split by managed versus self-service’, we all mean exactly the same thing.
Drew and Nick also emphasized change management as both a major challenge and use case for the metrics layer.
Drew: “I think a lot about the change management part of it. If you get the right people together, you can precisely define a metric at that point in time. But inevitably your business or product will evolve. How do you keep it in sync in perpetuity? That’s the hard part.”
Nick: “I really agree with that. Especially if change management is happening when there are only a few people in the room, and other people who are depending on the same metrics weren’t a part of that conversation.”
How should we think about the metrics layer, and how should it interplay with other components of the modern data stack?
Nick broke the metrics layer down into four key components (semantics, performance, querying, and governance), while Drew focused on its role as a network connecting a diverse set of data tools.
Nick: “The way that I think about the metrics layer is basically four pieces. There are the semantics: How do I go and define this metric? This could range from ‘Here’s a SQL snippet’ or ‘This is the definition of the metric’ to a full semantic layer that has entities and measures and dimensions and relationships.
“Then there’s performance. Great, now I have this semantic model. How do I go and build logic against it, executed against some compute environment (whether it’s a warehouse or just a compute engine on a data lake)?
“Then there’s, how do I query this thing? What are the interfaces that I use to pull it out of the data warehouse or data lake, and resolve it into this quantitative object that I can then go and use in some analysis? That includes both broad ways of consuming data (like a Python interface or GraphQL or a SQL interface) as well as direct integrations (a tool that builds a custom wrapper around a REST or GraphQL API and builds a really first-class experience).
“Then the last piece is governance. There’s organizational governance and technical governance. Organizational governance meaning, does the finance leader agree on the human-understandable definition of revenue in the same way that the technical person who’s defining the logic defines that code?”
Drew: “Just to offer an alternate framing: We can think of it in terms of the experience for the person who wants to consume data to answer some question or solve some problem, and then also the people building the tools where those people are consuming the data.
“It’s a little bit at odds with each other, because the business consumers want to see the exact same metric in every single tool and they want it all to update in real time. So you have this giant network of different tools that conceivably need to talk to each other. That’s a hard thing to organize and make happen in practice.
That’s why the idea that we call this the ‘metrics layer’ makes sense. It’s a single abstraction layer that everything can interface with so that you can get precise and consistent definitions in every single tool.
“To me, that’s where metadata really shines. Like, this is the metric, this is how it’s defined, this is its provenance, here’s where it’s used. This isn’t actually the data itself. It’s attributes of the data. That’s the information that can synchronize all these different tools together around shared data definitions.”
What metadata should we be tracking about our metrics, and why?
Nick and Drew shared that metadata is crucial for understanding metrics because companies lose important tribal knowledge about data outages and anomalies over time as staff changes.
Nick: “The metric is one of the most constant objects in an organization’s life.
Products change, tables change, everything changes. Even the definitions of these metrics evolve. But most businesses end up tracking the same North Star metrics from the very early days. If you can attach metadata to it, that’s incredibly valuable.
“At Airbnb, we tracked nights booked. It was important from the very early days when BI was literally a printed-off graph that they put on the wall, and it’s still an important metric that the company talks about in the public earnings calls. If we had been tracking important metadata through time of what was happening to that metric, there would be a wealth of information that the company could use.”
They explained that these changes are why it’s important for the metrics layer to interact with both the data layer and the business layer: it needs to capture context that affects data analysis and quality.
Nick: “Airbnb had a huge product launch, and different metrics spiked in all different directions. Today, I’m not sure that a data scientist at Airbnb could really understand what happened. They’re trying to use historical data to understand things, and they just don’t have that context. If anything, they really only have context for the last two or three years, when there was somebody who’s in the business who remembers what happened, who did the analysis, etc.”
Drew: “There’s a lot of this that ends up being technical, in terms of how tools integrate with each other, and how you define the metrics and version them. But a lot of it is indeed the social and business context.
In practice, the people who have been around for the longest time have the most context and probably know more than any of our actual systems do.
“We had a period where we had a little bit of data loss for some events we were tracking. It looked like, I think it was, May 2021 was the worst month ever. But really it was just like, no, we didn’t collect the data.
“How would you know that? Where does that information live? Is it a property of the source dataset that propagates through to the metrics? Who’s responsible for encoding that?”
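One possible answer to “where does that information live?” is to store it as metadata attached to the metric itself. The Python sketch below is a rough illustration, not how dbt or any other tool handles it: the `Annotation` class, the `annotations_for` helper, and the metric name `tracked_events` are hypothetical, and the May 2021 dates simply follow Drew’s recollection.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Annotation:
    """A hypothetical note attached to a metric for a date range."""
    metric: str
    start: date
    end: date
    note: str


# Context that would otherwise live only in people's heads.
ANNOTATIONS = [
    Annotation(
        "tracked_events",  # assumed metric name, for illustration only
        date(2021, 5, 1),
        date(2021, 5, 31),
        "Partial data loss: some events were not collected, so the dip is not real.",
    ),
]


def annotations_for(metric, start, end):
    """Return notes whose date range overlaps the queried window [start, end]."""
    return [
        a for a in ANNOTATIONS
        if a.metric == metric and a.start <= end and a.end >= start
    ]


# Anyone charting Q2 2021 gets the warning alongside the numbers.
notes = annotations_for("tracked_events", date(2021, 4, 1), date(2021, 6, 30))
```

A tool that reads from the metrics layer could call something like `annotations_for` whenever it renders a chart, so the explanation travels with the metric and does not depend on who still remembers the incident.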
What are the real use cases for a metrics layer?
Drew and Nick called out a variety of potential applications for the metrics layer, e.g. improving BI and analytics for early-stage data teams, helping business and data people use data models in the same way, and making useful but time-consuming applications (like experimentation, forecasting, and anomaly detection) possible for all companies.
Drew: “I think some of the use cases around BI and analytics are the most clear, obvious, and present for a lot of companies.
Many companies out there are not at the data science and machine learning part of their journeys yet. Things that make business intelligence and reporting better (more precise and more consistent) cover 90% of the problems that they’re trying to solve with data.
“Casting our minds forward, I think that there can be a ton of benefits to leveraging metrics for data science use cases.
“Specifically, one of the things that we’ve seen people do with dbt was really formative for me: they’d build these data models and then use them both for BI reporting and also to power data science applications and modeling. The fact that the data scientist and the BI analysts are using the same datasets means that it’s a lot more likely that they’re consuming the same data in the same way. When you extend it to metrics, there’s like a really natural way to make that happen too.”
Nick: “I do partly agree with that. But also there are a lot of data science and machine learning applications that require very different datasets than what a metric store produces.
“In analytics applications, you try to include as much relevant information as possible. If you have an ecommerce store, people can browse it logged out. So you try to dedupe users and identify them as they log into devices. There’s a whole practice of trying to identify which entities are using your service. That’s really important for analytics because it allows us to get a much clearer picture. But you don’t want to do that for machine learning, because that’s all information leakage and that will ruin your models.
With machine learning, you try to get as close to the raw datasets as possible. With analytical applications, you try to process that information into the clearest and best picture of the world.
“One of the applications that I always think about is experimentation. The reason we built a metrics repo originally was experimentation.
“There were 15–20 people on the data team at the time. We were trying to run more product experiments, and we were doing everything manually. It was really time intensive to go and take assignment logs and metric definitions and join them together.
Basically, we needed some programmatic way to go and construct metrics. It’s a massively useful application for companies that do it, but very few companies have the infrastructure or build the tooling to do that. I think that that’s really unfortunate. And it’s probably the thing that I’m most excited about with the metrics layer.
“If you think about every data application as having some cost and some benefit, the more you can reduce the cost of pursuing that application, the clearer the justification becomes to pursue some new application.
“I think experimentation is one of those examples. I also think about anomaly detection or forecasting. These are things that I think most companies don’t do, not because they’re not useful, but just because producing the datasets to even get started on those applications is really hard.”
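The manual step Nick describes, joining assignment logs to metric values, can be sketched in a few lines of Python. This is a toy illustration with made-up data, not the tooling his team built: the `assignments` list, the `nights_booked` values, and the `metric_by_variant` helper are all hypothetical, and a real experiment analysis would also need significance testing.

```python
from collections import defaultdict

# Hypothetical assignment log: which experiment variant each user was placed in.
assignments = [
    {"user_id": 1, "variant": "control"},
    {"user_id": 2, "variant": "control"},
    {"user_id": 3, "variant": "treatment"},
    {"user_id": 4, "variant": "treatment"},
]

# Hypothetical per-user metric values, as a metrics layer might return them.
nights_booked = {1: 2, 2: 0, 3: 3, 4: 1}


def metric_by_variant(assignments, metric_values):
    """Join assignments to metric values and average the metric per variant."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in assignments:
        # Users with no recorded value count as zero rather than being dropped.
        totals[row["variant"]] += metric_values.get(row["user_id"], 0)
        counts[row["variant"]] += 1
    return {variant: totals[variant] / counts[variant] for variant in counts}


result = metric_by_variant(assignments, nights_booked)
```

The point of a programmatic metric definition is that `metric_values` could be produced for any metric by name, so the same join works for every experiment without someone rewriting the metric logic by hand.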
Let’s jump into some questions about the metrics layer and the modern data stack.
First, let’s talk bundling vs unbundling. Should the metrics layer even be a separate layer, or should it be a part of an existing layer in the stack?
As with every debate in the data ecosystem, we ended up just answering, it depends. Drew and Nick explained that how we solve this problem is ultimately more important than how we define that solution.
Drew: “I’m not in love with the way that we as an ecosystem talk about new tools as being layers, like the missing layer of the data stack. That’s the wrong framing.
“People who build applications don’t think about it that way. They have services, and the services can talk to each other. Some are internal services and some are SaaS services. It becomes a network of connected tools rather than exactly, say, four layers. No one runs an application anymore with exactly the Linux, Apache, MySQL, and PHP (LAMP) stack, right? We’re past that.
The word ‘layer’ makes sense only insofar as it’s a layer of abstraction. But otherwise, I reject the terminology, though I can’t think of anything too much better than that.
“The last thing I’m going to say on bundling and unbundling… For this thing to work, it does need to be an intermediary between a really large network of different tools. Treating it as a boundary like that motivates which tools can build it and provide it. It’s not something you’d see from a BI tool, because it’s not really in a BI tool’s interest to provide the layer to every other BI tool, which is really the thing that you need from this.”
Nick: “I think I generally agree with that.
Basically, people have problems, and companies build technologies to solve problems. If people have problems and there’s a valuable technology to build, then I think it’s worth taking a shot and trying to build that technology and voicing those opinions.
“Ultimately, I think that there are good points there about the connection to different organizational workflows. This isn’t something that I think we’ve done a good job of explaining, but I think that the metrics store and the metrics layer are two different concepts.
“The metrics store extends the metrics layer to include this piece of organizational governance: how do you get a bunch of different business users involved in this conversation, and actually give them a role in something that, frankly, they have a huge stake in? I think that that’s something that isn’t really captured in this conversation around the metrics layer, or headless BI, or any of these different terms. But it’s really, really important.”
For a traditional company that already has a data warehouse and BI layer, where does the metrics layer fit into their stack?
Again, the answer is that it depends (sigh). The metrics layer would live between the data warehouse and BI tool. However, every BI tool is different and some are friendlier to this integration than others.
Nick: “The metrics layer sits on top of the data warehouse and basically wraps it with semantic information. It then allows different endpoints to be consumed from and basically pushes metrics to those different places, whether they’re generic or direct integrations to those tools.”
Drew: “It ends up being very BI tool–dependent. There are some BI tools where this is a very natural type of thing to do, and others where it’s actually quite unnatural.”
If a company has already defined a ton of metrics inside their BI tool, what should they do?
Nick and Drew explained that slow and steady wins the race when you aren’t starting from scratch. Instead of planning a massive overhaul, start with one team or tool, integrate a better metrics layer, and test how it works for your organization.
Nick: “I would advocate for not a massive ‘change everything all at once’. I would advocate for, define some metrics, push those through the APIs and integrations, build something new, potentially replace something old that was hard to manage, and then go from there once you’ve seen how it works and believe in that philosophy.”
Drew: “I’m with you. I think something domain-driven makes a lot of sense. You can validate it and then expand. I’d probably start with… it depends on your tolerance, but the executive dashboard that goes to the CEO. Is that the best place to kick the tires? Maybe not. But if it works there, it’ll work everywhere.”
Can’t a metrics layer just be a part of a feature store?
Since Nick has built several feature stores and metrics layers, he had a strong opinion on this topic: while the metrics layer and feature store are similar, they’re too fundamentally different to merge right now.
Nick: “I have a really strong opinion about this one because I’ve built two feature stores and three metrics layers. These two things are totally different.
“At the core, they’re both derived data. But there are so many nuances to building feature stores and so many nuances to building metric stores. I’m not saying that these two things will never merge. The idea of a derived data repository or something like that sounds wonderful. But I just don’t see it happening in the short term.
Everyone wants features to be specific to their model. No one wants metrics to be specific to their team or their consumption. People want metrics to be consistent. People want features to be unique and whatever benefits their model.
“Real-time versus batch is a super challenging problem in the feature space. Organizational governance is way more important for the metrics layer. The technical definitions are often different. The level of granularity is different for features: you go way finer with features than you do with metrics.”
Do you think a caching layer is important for a metrics layer?
This was a resounding YES from both Drew and Nick. Caching makes the metrics layer fast, which is important for ensuring that data practitioners actually use it. However, it’s important that this caching doesn’t replicate data.
Drew: “I think that the speed with which you can ask a question and get an answer back is really important.
The difference between something taking a minute plus to come back and never coming back at all is negligible in a lot of cases. So, conceptually, I’m very aligned with the idea of caching metric data and being able to serve it up really quickly.
“I’ll just say (and I think we’ve been open about this in the past) we probably won’t do this for V1 of metrics inside dbt. But conceptually, I’m pretty aligned with that being an important part of the system long-term.”
Nick: “Caching is super important. Performance matters a ton, especially to business users. Even 10 seconds is less than an ideal experience.
“I think that there are two important nuances to caching. One is, what do I know ahead of time that I need, and how do I pre-compute that and make that really snappy? And then if I do compute something, how do I then reuse it so that it’s fast next time? I think that’s the point of a caching layer.
“The other one is, I don’t think that caching needs to happen outside of the cloud data warehouse or the data lake. I think that you should use those systems. The replication of data, in my mind, is just so costly and so hard to manage.”
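Nick’s two nuances (reuse what you have already computed, and keep the cache inside the warehouse) can be illustrated with a small Python sketch. It uses an in-memory SQLite database from the standard library as a stand-in for a cloud warehouse; the `bookings` table and the `cached_query` helper are hypothetical, and the sketch deliberately ignores cache invalidation when the source data changes.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a cloud data warehouse
conn.execute("CREATE TABLE bookings (city TEXT, nights INTEGER)")
conn.executemany(
    "INSERT INTO bookings VALUES (?, ?)",
    [("Paris", 2), ("Paris", 3), ("Lima", 1)],
)


def cached_query(conn, sql):
    """Run a metric query, materializing its result as a table in the same database.

    Returns (rows, cache_hit). No invalidation is attempted in this sketch.
    """
    # Name the cache table after a hash of the query text.
    cache_table = "cache_" + hashlib.sha256(sql.encode()).hexdigest()[:16]
    exists = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (cache_table,),
    ).fetchone()
    if exists is None:
        # Cache miss: compute once and store the result next to the source data.
        conn.execute(f"CREATE TABLE {cache_table} AS {sql}")
    rows = conn.execute(f"SELECT * FROM {cache_table}").fetchall()
    return rows, exists is not None


sql = "SELECT city, SUM(nights) AS nights_booked FROM bookings GROUP BY city ORDER BY city"
first_rows, first_hit = cached_query(conn, sql)     # computed and materialized
second_rows, second_hit = cached_query(conn, sql)   # served from the cache table
```

Because the cached result is just another table in the same database, nothing is copied to an external system, which is the replication cost Nick wants to avoid.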
Finally, if you were handed a megaphone and could blast out a message to the entire data world, what would you say?
Drew:
There are a lot of problems in data that you can solve with technology, but some of the hardest and most important ones you have to solve with conversations and people and alignment and sometimes whiteboards. Knowing which kind of problem you’re trying to solve at any given time is going to help you pick the right kind of solution.
Nick:
I think the metrics layer is basically a semantic layer with an additional concept of a metric, which is super important. So I would just say, the metrics layer should be backed by a general-purpose semantic layer. The spec and the definition of that semantic layer and the abstractions are so incredibly important.
Side note: I’m personally super excited about how a metrics layer can interact with an active metadata platform to supercharge knowledge management for data teams. It’s been super exciting to see the metrics layer become more mainstream, which was a prediction I’d made at the start of this year.

Learn more about the metrics layer and my six big ideas in the data world this year.
Report: The Future of the Modern Data Stack in 2022