In this latest episode of our podcast, Mammoth Growth EMEA President Stuart Scott chats with Yair Dovrat, Founder of Zaraz, a lightweight tool that promises to make websites 40% faster with a single line of code.
Yair shares how Zaraz began, explaining how he and his co-founder Yo’av Moshe identified the problem of data accuracy in web analytics and decided to automate the solution. They initially built QA software for web analytics and third-party tools, but when they joined Y Combinator in the winter of 2020, they struggled to sell it. Through customer feedback and coaching from YC’s then-managing partner Michael Seibel, Yair and Yo’av realized the need for better management and visibility of third-party tools on websites, and that’s when inspiration struck.
When Yair and Yo’av were considering what they wanted to do next with Zaraz, they saw an emerging trifecta that they could address:
- In late 2020, Google announced its plans to rank websites based on their Core Web Vitals, such as page load speed.
- At the same time, Google floated the idea of penalizing slow websites, for example by displaying a warning banner when the page loaded. Though this latter action never came to pass, both developments made marketers, web developers, and backend engineers eager for ways to quickly improve their Core Web Vitals.
- Meanwhile, although Google Tag Manager and tools like Segment were making it easier and easier to add tools to your tech stack with less coding, there was no solution for offloading this growing proliferation of third-party pixels.
Yair and Yo’av realized they could position Zaraz as an alternative to Google Tag Manager, one that would be faster, more secure, and privacy-safe. Repositioning Zaraz this way set the company up to be acquired by Cloudflare in December of 2021.
Zaraz works by loading a website’s third-party tools on the server side, on Cloudflare’s network, instead of in the browser. Since users can configure Zaraz to pass only the data that Google Ads and other third-party pixels absolutely need, sites can easily reduce security risks by minimizing their attack surface.
Towards the end of their conversation, Yair and Stuart discuss the importance of standardizing tracking implementations and adopting common APIs to ensure data consistency and trust. Yair concludes by sharing some of his future plans for Zaraz, including privacy-related features, data loss prevention, and expanded capabilities for managed components.
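To make the data-minimization idea concrete, here is a minimal TypeScript sketch of a per-tool allow-list applied on the server before anything is forwarded. It is not Zaraz code or configuration: the tool names, field names, and the `filterForTool` helper are hypothetical.

```typescript
// Hypothetical event payload as it might arrive from the browser.
type EventPayload = Record<string, unknown>;

// Hypothetical per-tool allow-lists: the only fields each tool may receive.
const allowedFields = new Map<string, readonly string[]>([
  ["google-ads", ["event", "value", "currency"]],
  ["mixpanel", ["event", "page", "value"]],
]);

// Copy only the allowed fields. A tool with no allow-list gets an empty
// payload, so a missing configuration fails closed rather than leaking data.
function filterForTool(tool: string, payload: EventPayload): EventPayload {
  const allowed = allowedFields.get(tool) ?? [];
  const filtered: EventPayload = {};
  for (const key of allowed) {
    if (key in payload) filtered[key] = payload[key];
  }
  return filtered;
}

const purchase: EventPayload = {
  event: "purchase",
  value: 59.9,
  currency: "EUR",
  email: "user@example.com",
  page: "/checkout",
};

console.log(filterForTool("google-ads", purchase));
// { event: 'purchase', value: 59.9, currency: 'EUR' }
```

The email address in the source event never reaches the ads tool, because nothing outside the allow-list is copied.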
Stuart Scott (00:05):
Welcome, everyone, to the podcast. I'm here today with Yair, who is the founder of a really exciting product called Zaraz, which is now part of Cloudflare. He's building an alternative to Google Tag Manager that is faster, more secure, and more privacy-safe. He's also got some really interesting perspectives on the marketing technology space as a whole and where things are heading, so I thought it'd be really interesting to have a conversation about that and share some of your thoughts with our audience. I guess the best place to start might be your founding story for Zaraz: the problems you saw, how you came across them, and how you decided to solve them.
Yair Dovrat (00:54):
Cool. Sure. First of all, thanks for having me. I guess the story begins with me being a web analytics consultant for many, many years before starting Zaraz. When Yo’av, my co-founder, and I started working together, we were looking for a big market to tackle and for problems we were very familiar with. He is the expert on everything web, and in my previous work as a consultant, I noticed patterns of issues across my entire customer base. I was working with the leading internet companies in Israel, mostly consumer businesses like e-commerce sites and publishers, sites that bring in a lot of traffic. I was implementing GTM and the entire Google Analytics stack and Google Ads, but also Mixpanel and the rest of the third-party analytics tools.
The problem we actually started with was data accuracy. We noticed what everyone knows: tracking breaks all the time. You have duplicate pageviews all over the place. Sometimes you release a new version of your website and forget to add, I don't know, at the time it was a GA track call, then it was data layer pushes, and now it's a Zaraz track call. You forget to add the event to the button, you're not even aware of it, and suddenly your analytics are just inaccurate and you're spending billions of dollars on marketing in the wrong channels because you don't measure the results correctly. We all know this story. So at the time I went to my boss and said, listen, I see these patterns all over the place. We should automate the solution. And he said, no, our job is to be consultants.
We sell hours of work, not products. So I thought, maybe I'll build it myself. Yo’av and I started building QA software for web analytics and third-party tools. That software was really basic at the time: it would crawl your website, detect all the third-party tools on each page, and check for the most common errors, like duplicate events, duplicate pageviews, and missing events. You could also record a flow, and if we saw in the network traffic that you clicked a button and the event wasn't sent to Mixpanel, for example, we would alert you. With that software we joined Y Combinator, the Silicon Valley startup accelerator. And despite thinking it was a great product, we couldn't sell it. We had a few customers, but the YC partners' expectations were massive. They were like, you need to get to 1 million ACV really, really quickly.
And I was like, I can't sell it for 500 bucks, so how can we do that? I think we had something like 130 demos in one month, and we realized that we had serious issues with that QA software: people didn't want to buy it, because no one wants to deal with yet another GA bug. Even though heads of analytics and marketing departments really liked the idea of having more governance and clean data, creating another bottleneck for the engineering team wasn't very appealing to customers, let's say. So we sat down with the YC partners, and they were saying, okay, so that obviously doesn't work. What else did you hear? What feedback did you get?
So we sat down and asked ourselves, what did we hear? Then we noticed that, as I mentioned, our crawler used to check every page on a customer's website. We used to come to demos with a scan report, and one of the reports was just a basic coverage report: it showed all the third parties detected on your website, the pages where they exist, the pages where they're missing, and the percentages. When people saw this report, their first reaction was, whoa, we have so many third-party tools. We were scanning websites with sometimes 70 to 120 different third-party tools, and people didn't know they were using so many. I think this is really important, because what we realized is that each team has its own third-party solutions: the product team has its product analytics, session recording, or analytics stack, marketing has its conversion pixels, advertising has all the ad performance stuff, and no one really knows what actually ends up being loaded on the website. So the first reaction was, okay, there's definitely a lack of management. No one understood what was going on. Sometimes you have a legacy tool that's been there since the beginning of the company, and it can sit there for 20 years even though no one uses it
Or audits it, right? Because it's such a big job, and often such a mess, that it's a really painful task for someone, and ultimately most people don't recognize it as important or reward it in the right ways.
Yeah, exactly. First of all, people change roles; they come and go, and then someone leaves the company with all the knowledge. Usually it's one or two people who deal with a specific tool's implementation, so the company is left with no internal knowledge of how to deal with it. Also, I think most developers really hate those kinds of tasks. Adding or removing third-party events is not an intellectually stimulating task most of the time, so it gets neglected. And at the time, people didn't realize how big the performance and security risks were, which is something we helped educate the market about. Now I think it's becoming clearer and clearer. But anyway, people were seeing this report and saying, oh my god, we use so many third-party tools. And then the second and third questions were: How does it impact our page speed and loading times? What's the impact on SEO? What happens if one of the tools gets hacked?
When we sat there, we realized that this had come up again and again; many people had asked about it. At the time, our partner was Michael Seibel, who was the CEO of Twitch and the CEO of YC. He asked us, do you know if third-party tools are slowing down websites? We said no. So he told us, okay, take eight hours and figure it out. So that's what we did. We spent a night basically crawling the web, targeting the top websites in the United States. I don't remember how many we actually scanned, but it was in the tens of thousands. We built a simple bot that would load each website as-is a few times and then load it again with all third-party requests blocked, just to get a simple understanding of the impact when you compare the same production website with and without third-party tools.
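The experiment described here, loading the same production page with and without third-party requests and comparing, boils down to two small pieces of logic. The TypeScript sketch below is a minimal illustration under stated assumptions: a naive hostname check decides what counts as third-party (real crawlers typically compare registrable domains using the Public Suffix List), and slowdown is expressed as extra load time relative to the page without third parties. The interview doesn't say which metric or formula the Zaraz crawler actually used, and the timings are made up.

```typescript
// Naive third-party check: a request is first-party if its hostname is the
// site's base domain or a subdomain of it. This is a simplification; it would
// misclassify sites that use several registrable domains of their own.
function isThirdParty(siteDomain: string, requestUrl: string): boolean {
  const host = new URL(requestUrl).hostname;
  return host !== siteDomain && !host.endsWith("." + siteDomain);
}

function mean(values: number[]): number {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Load times (seconds) from several runs of the same page, once as-is and
// once with every third-party request blocked.
interface LoadRuns {
  withThirdParties: number[];
  withoutThirdParties: number[];
}

// Extra load time caused by third parties, as a fraction of the clean load time.
function slowdown(runs: LoadRuns): number {
  const clean = mean(runs.withoutThirdParties);
  return (mean(runs.withThirdParties) - clean) / clean;
}

console.log(isThirdParty("example.com", "https://www.example.com/app.js")); // false
console.log(isThirdParty("example.com", "https://www.googletagmanager.com/gtm.js")); // true

// Hypothetical timings, not measured data.
const runs: LoadRuns = { withThirdParties: [3.1, 2.9, 3.0], withoutThirdParties: [2.0, 2.2, 2.1] };
console.log(`${(slowdown(runs) * 100).toFixed(0)}% slower with third parties`);
```

Averaging several runs on each side smooths out network noise; a median would be a reasonable alternative if outliers are a concern.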
And then we realized the impact is huge. The average slowdown caused by third-party tools was around 40%. To be specific, we're not talking about third-party libraries like jQuery; we're talking mainly about third-party marketing and analytics tools, and also widgets and chatbots, which we also solve for today: all the tools that we copy and paste into our website or put through Google Tag Manager. And I think three important things happened at the same time that made us realize there's a massive business here. One, that year, I think it was winter 2020, just before Covid hit, Google announced that Core Web Vitals were going to be used to rank websites in Google search results. So the reason customers' first reaction to the third-party list was "Does it impact my SEO?" was that Google was pushing its advertisers, its customers, to start dealing with Core Web Vitals and improving their metrics.
It's interesting, actually, that over the last few years all of those vendors have started to provide more options to do things server-side. Whether that's Segment offering more things in cloud mode or Google Tag Manager launching a server-side version, there's definitely a trend in the industry to move stuff out of the browser, I guess for all the reasons you just talked about.
And I must say I'm pretty proud of our team. I think we led on the innovation front; we figured out some things early, and I'm happy that they were picked up by other teams as well. For example, the way we dealt with conversion pixels. Now we see more and more advertising and social media platforms opening their APIs: there's a Conversions API for Facebook, and now there are equivalents for Snap, TikTok, and all of these. But at the time, most of them didn't have those things and were still relying on third-party cookies. If you use third-party cookies, the request must leave from the end user's browser; otherwise you won't be able to retarget users with your ad campaigns or even measure performance well. And I think we were the first, and the credit goes to our CTO: basically, what we did was build the end requests in Workers, on the server, which is not really a server.
Yeah, that's interesting. And do you think... actually, sorry, we should probably go back and talk about what Zaraz actually is. What's the product you're building? I know I've used it very briefly, but I'm sure it would be better in your words.
So from the user's perspective, it's very similar to the flow of Google Tag Manager server-side: you create the third-party tool, then you define the actions you want it to take and the triggers that should fire them. It's, I would say, more oriented towards developers; we try to keep it super, super flexible. Behind the scenes, Zaraz uses a technology we call managed components, which is an open-source project that the community is more and more engaged with, which is really cool to see. So yeah, you can theoretically build any tool and load it with Zaraz as a custom managed component today.
The other thing I think is really interesting is that every time I log into a new Google Tag Manager container, I'm surprised by the amount of mess I see. Often things have been implemented by different people at different points in time, with different standards and naming conventions, but there's also often a huge amount of duplication, right? Because ultimately you can create a single trigger for each event and then fire multiple tags off it, but way too often we see people creating one trigger for every tag. And then the very nature of custom HTML tags, which, despite templates being available now, still seem to make up the majority of tags out there, means that you're often replicating the same business logic and the same code across multiple destinations, or even across multiple tags for the same destination. So you end up executing way more code than you need to. I guess that's one of the big problems you're trying to solve.
Yes. We definitely wanted to offer a much better user experience that would also enforce more governance and consistency, and I hope we managed to improve things, even if a tiny bit. It starts with basic design decisions. In Google Tag Manager, for example, many containers have a different tag for every combination of event and tool. So you'd have Google Analytics event this, Google Analytics event that, and the list just becomes enormous and really hard to keep track of. So the most basic design decision at Zaraz was: let's group everything under a tool. In Zaraz, even the first list you see is much easier to grasp. I think it's easier to map each row in the tools table to a specific tool, or, if you have multiple instances of the same tool, to a specific domain or... So it starts with those things.
But the approach we were trying to take, and this is why I said it's aimed more towards developers, is different from GTM, which I think is kind of a hack. GTM is like a parenthesis in the code. You want to let marketers do the developers' work. In a way, that's why it was created: marketing teams had more and more requests. Hey, please change this pixel. Hey, please add this conversion tracking. So engineering teams said, guys, that's too much work. Here's a tool for you; just manage it yourselves. But this creates a lot of issues. So, for example, we do not use the data layer. We have data layer compatibility, mainly because most of our customers are big enterprise customers shifting from GTM, and we don't want them to redo their entire code base and change every data layer push, but we really took a different approach.
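The contrast between a tag for every event-and-tool combination and the tool-centric layout described above can be sketched as a small data model: triggers are defined once and shared, each tool lists its own actions, and one incoming event can fire actions across several tools. This is an illustrative TypeScript sketch, not Zaraz's actual data model or API; the trigger names, tool entries, and `dispatch` function are hypothetical.

```typescript
// A trigger decides whether an incoming event should fire anything.
type Trigger = (eventName: string) => boolean;

// Each action points at a shared trigger by name instead of owning its own.
interface Action {
  trigger: string;
  sends: string;
}

// Actions are grouped under the tool they belong to.
interface Tool {
  name: string;
  actions: Action[];
}

// Defined once and reused by every tool that needs them.
const triggers: Record<string, Trigger> = {
  pageview: (e) => e === "pageview",
  purchase: (e) => e === "purchase",
};

const tools: Tool[] = [
  {
    name: "Google Analytics",
    actions: [
      { trigger: "pageview", sends: "page_view" },
      { trigger: "purchase", sends: "purchase" },
    ],
  },
  { name: "Facebook Pixel", actions: [{ trigger: "purchase", sends: "Purchase" }] },
];

// Evaluate each trigger once, then run every action attached to a trigger that fired.
function dispatch(eventName: string): string[] {
  const fired = new Set(
    Object.entries(triggers)
      .filter(([, matches]) => matches(eventName))
      .map(([name]) => name),
  );
  const sent: string[] = [];
  for (const tool of tools) {
    for (const action of tool.actions) {
      if (fired.has(action.trigger)) sent.push(`${tool.name}: ${action.sends}`);
    }
  }
  return sent;
}

console.log(dispatch("purchase"));
// [ 'Google Analytics: purchase', 'Facebook Pixel: Purchase' ]
```

Because the purchase trigger exists only once, changing what counts as a purchase updates every tool at the same time.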
That decision to go open source is a really interesting one because I mean ultimately you're building a proprietary product that you're trying to sell, but I think also open sourcing the standard probably changes the way in which you build that and the way in which hopefully it's adopted. Can you talk a bit more about that and why you went down that path, I suppose?
Yeah, I'll be very honest about it. It's a combination of things. When we started, our mission was to make the web faster, more secure, and more private, and we really believe in it. Open sourcing is really in our nature and character, and I think that's true of Cloudflare in general. We also faced an issue: there are thousands of third-party tools, there are always new ones, and we can only do so much. Unlike GTM, where most people just load a custom HTML script so it works with any vendor, we wanted to achieve something bigger. So the decision to enlist the community's help was quite obvious, and we're really happy to see that it's been picked up; now we see vendors releasing their own managed components.
And for vendors it's also great, because it's a way to reach more customers easily. There are some standards you need to follow, but we can add their components to the library of tools we offer, so adoption is very easy. But again, we never want to lock customers in. The idea of open sourcing was also that it's a mutual effort across the internet to change this Wild, Wild West of what third parties are doing on websites. I think the best thing about managed components is that they're a little bit like an app that asks for permissions when you download it: managed components have to get permission to do things. So by design it's much more secure and better for everyone. And we also didn't worry that much, because we know our edge, and our edge is on the edge: it's very hard to compete with the level of service we can give, simply because we have the widest network around the world and Zaraz runs in every data center Cloudflare has.
I think today that's more than 280. So we'll win just by being the best component manager: one that loads faster and gives customers the easiest, smoothest experience. But we would be really happy to see others adopting managed components, and we've actually started seeing that, which is kind of cool.
Is that individual companies adopting managed components within their own stack, or other tool providers offering them as a service?
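The app-permissions analogy can be illustrated with a tiny permission-gated manager: a component declares what it wants, the site owner approves a subset, and every privileged action is checked against what was granted. The permission names and the `ComponentManager` class below are hypothetical, written in TypeScript for illustration; the real Managed Components specification defines its own permissions and APIs.

```typescript
// Hypothetical permission names, for illustration only.
type ComponentPermission = "server_request" | "client_cookie" | "inject_script";

interface ComponentManifest {
  name: string;
  requested: ComponentPermission[];
}

class ComponentManager {
  private granted = new Map<string, Set<ComponentPermission>>();

  // Like an app install prompt: only permissions that were both requested
  // by the component and approved by the site owner are granted.
  install(manifest: ComponentManifest, approved: ComponentPermission[]): void {
    const allowed = manifest.requested.filter((p) => approved.includes(p));
    this.granted.set(manifest.name, new Set(allowed));
  }

  can(component: string, permission: ComponentPermission): boolean {
    return this.granted.get(component)?.has(permission) ?? false;
  }

  // Every privileged action goes through this check before it runs.
  perform(component: string, permission: ComponentPermission, action: () => void): void {
    if (!this.can(component, permission)) {
      throw new Error(`${component} lacks permission "${permission}"`);
    }
    action();
  }
}

const manager = new ComponentManager();
manager.install(
  { name: "chat-widget", requested: ["server_request", "inject_script"] },
  ["server_request"],
);
console.log(manager.can("chat-widget", "server_request")); // true
console.log(manager.can("chat-widget", "inject_script")); // false
```

Denying by default means an unknown component, or a permission nobody approved, simply cannot act.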
Both. There's the scenario of a customer that only uses, for example, GA4, so paying for a complete component manager like Zaraz doesn't necessarily make sense for them. Usually they use Cloudflare Workers to load just that one tool, and they have quite a simple setup. So we've seen some of this, though actually I don't think we've seen companies writing their own managed components. And some people use WebCM, which is something we built and open sourced so that people have a free alternative to Zaraz for loading managed components that is unrelated to us. We try to maintain it, and I think some people have contributed to that project as well, but no one has yet built a full-blown competitor to Zaraz.
Yeah, that makes sense. And one thing you touched on a bit earlier is that you're offering common APIs for things like e-commerce events and track events. I think that's actually a really important point, not just because it makes life easier for developers and makes it easier to maintain an implementation across all these third-party tools, but also because of the implications it has for the data downstream in those tools. One of the most common problems we see is that customers have adopted lots of different tools and implemented them all slightly separately, or maybe they've implemented them the same way to track the same things, but gradually that drifts over time, as one tool's implementation is updated, or one is done at a later point and gains some additional data or uses a slightly different trigger.
And that ultimately creates data silos or inconsistencies between tools, which then start to erode the trust people have in the data. I think that's something that's really easy to underestimate, because you think it's actually not that hard to keep them consistent, but the second you have multiple code paths, in every case I've seen, they start to drift over time. And once people stop trusting the data, they stop using it, or they start spending more time checking it than actually using it for what they intended. I hope that lots of the people who adopt Zaraz actually adopt those common APIs and use them as their primary source of truth, rather than just migrating whatever mess they have in a legacy tool into a new product. Because whilst they'll get some benefits from doing that, right, they'll get the page speed, for example, and some security benefits, they won't necessarily see the long-term improvement in data quality and trust that will actually accelerate their ability to use data, get more people across the organization using it, and ultimately get a lot more value from all the work they're putting into tracking things in the first place.
Yeah, yeah, that's a fair point. I think standardizing how you track is probably the most critical point. If you have this figured out, then the rest is easier. It gets much easier to maintain accuracy across tools, and sometimes within the same tool. I think all of us owe Segment something; I think they were the first to crack this, with analytics.js, which is also open source. So this idea doesn't come from us, it comes from Segment. One thing I think we contributed is that, because Zaraz natively works with other Cloudflare products, there's a cool thing I've started seeing. Instead of a third-party tool, you can have your own custom HTTP tool, or rather, just an HTTP request. So what we've started seeing is people using the Zaraz APIs as the infrastructure for both their third-party implementation and their first-party data collection.
So imagine you implement all the e-commerce funnel steps with our e-commerce API; you can then pass the same payload with a simple HTTP request to your Cloudflare Worker or to any other endpoint on the internet. So you get the same context, the same data. We've started seeing people build complete BI tools with it, because with Workers you can run whatever transformation you want. We've seen cases where people take this data and, from Workers, reach out to Salesforce, for example, to enrich it, and then save it in their own data warehouses or data lakes. So yeah, it's pretty cool. It's another layer: it's not only about standardizing third-party analytics usage; sometimes people rely on the APIs for their own data too, which I think is very powerful.
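The "same context, same data" idea can be sketched as one standardized event fanned out to every destination, including a first-party copy for your own warehouse, so that every payload is derived from a single source. In a real setup the first-party copy might travel over an HTTP request to a Cloudflare Worker, as described above; here each mapper simply returns the payload it would send. The event shape, destination names, and field mappings are hypothetical and are not Zaraz's e-commerce API.

```typescript
// Hypothetical destinations; "warehouse" stands in for a first-party endpoint.
const destinations = ["analytics", "ads", "warehouse"] as const;
type Destination = (typeof destinations)[number];

// One standardized e-commerce event (shape is an assumption for this sketch).
interface OrderCompleted {
  orderId: string;
  total: number;
  currency: string;
}

type Payload = Record<string, unknown>;

// Each destination gets its own field names, but every payload is derived
// from the same source event.
const mappers: Record<Destination, (e: OrderCompleted) => Payload> = {
  analytics: (e) => ({
    event_name: "purchase",
    transaction_id: e.orderId,
    value: e.total,
    currency: e.currency,
  }),
  ads: (e) => ({ conversion: "purchase", order: e.orderId, amount: e.total }),
  warehouse: (e) => ({
    event: "order_completed",
    order_id: e.orderId,
    total: e.total,
    currency: e.currency,
  }),
};

function fanOut(event: OrderCompleted): Record<Destination, Payload> {
  const out = {} as Record<Destination, Payload>;
  for (const destination of destinations) {
    out[destination] = mappers[destination](event);
  }
  return out;
}

const sent = fanOut({ orderId: "A-1001", total: 120, currency: "USD" });
console.log(sent.analytics.value === sent.ads.amount && sent.ads.amount === sent.warehouse.total); // true
```

With a single source event, a later mismatch between the ads platform and the warehouse points at downstream processing rather than at collection.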
That's really interesting. And I think you're right. Again, it's creating that consistency between what you're doing in your data warehouse and BI environment and what you're doing in your marketing technology tools, or knowing that the event that goes to Google Ads is the same event that goes to your data warehouse. Then, if the two don't add up, it's because the data is being processed in a different way, not because you've got some bug in collection that means Google is seeing three times as many conversions as you are, or whatever it is. Again, it's the time those things eat up and the way they prevent you from making progress, because ultimately you're questioning the data more than you're using it.
And I think it's also the flexibility developers have now. With Workers, for example, building what's called today a reverse ETL, where you grab data from a third party, enrich your data with it, and save it, whether to a different third party or your own data store, is just super easy. You can really build it in a few minutes with Workers. It's quite powerful. So yeah, that's cool to see. I think it's a big improvement for all of us.
And what's next for Zaraz, then? What are you doing next?
Oh my god, yeah, there's so much to do. Honestly, we're thinking about a lot of privacy-related features. We take it very seriously. Actually, here's a story: the CNIL, the French data protection authority, issued a ruling and guidance on how Google Analytics is noncompliant unless you load it in a particular way. And then they kind of, yeah,
Because they're exporting data to the US, right?
Exactly. Yeah. So they came up with a set of requirements you need to follow if you want to load Google Analytics and be considered reliable and compliant with their interpretation of GDPR and the other regulations. So we sat down over a weekend and said, okay, we have to give our users a simple toggle for each of the requirements. For example, you can't pass the IP address, so with the toggle on, you can hide your end user's originating IP address from Google Analytics, Facebook, and other tools. We automatically clean the user agent of details like browser versions that, combined with other things, could identify a specific user. So we're really trying to make the technical side of privacy enforcement easier for our customers, but there's a lot of room to grow there. We're thinking about building what we call a data loss prevention feature for Zaraz, where you can continuously scan the entire payload against specific rules: for example, never include an email address in any request going to a specific tool, plus more custom rules. So that's one big thing we want to work on, and it's something a lot of enterprises are more and more concerned about now.
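Both the privacy toggles and the planned data loss prevention rule can be pictured as checks applied to each outgoing request before it reaches a third-party tool. The TypeScript sketch below drops the visitor's IP address, strips version numbers from the user agent, and blocks a request when a simple email pattern appears anywhere in the payload. The request shape, the function names, and the deliberately loose email regex are all hypothetical; this is not how Zaraz implements these settings.

```typescript
// Hypothetical shape of a request about to be sent to a third-party tool.
interface OutgoingRequest {
  tool: string;
  ip?: string;
  userAgent: string;
  payload: Record<string, unknown>;
}

// Remove "/<version>" tokens such as "Chrome/120.0.0.0". Versions written
// without a slash (e.g. "Windows NT 10.0") are left untouched.
function generalizeUserAgent(ua: string): string {
  return ua.replace(/\/[\d.]+/g, "");
}

// Deliberately loose pattern: something@something.something
const EMAIL = /[^\s@]+@[^\s@]+\.[^\s@]+/;

// Walk the payload and return the paths of string values that look like emails.
function findEmails(value: unknown, path = "payload"): string[] {
  if (typeof value === "string") return EMAIL.test(value) ? [path] : [];
  if (value !== null && typeof value === "object") {
    return Object.entries(value).flatMap(([key, v]) => findEmails(v, `${path}.${key}`));
  }
  return [];
}

// Block on email matches, never forward the IP, and forward only a
// generalized user agent.
function applyPrivacyRules(req: OutgoingRequest): OutgoingRequest {
  const leaks = findEmails(req.payload);
  if (leaks.length > 0) {
    throw new Error(`Blocked request to ${req.tool}: email found at ${leaks.join(", ")}`);
  }
  return { tool: req.tool, userAgent: generalizeUserAgent(req.userAgent), payload: req.payload };
}

const ua =
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36";
console.log(generalizeUserAgent(ua));
// Mozilla (Windows NT 10.0; Win64; x64) AppleWebKit (KHTML, like Gecko) Chrome Safari
console.log(findEmails({ user: { email: "jane@example.com" }, value: 10 }));
// [ 'payload.user.email' ]
```

Scanning the whole payload recursively, rather than checking known field names, is what lets a rule like "no email addresses to this tool" catch values that end up in unexpected places.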
Yeah, yeah. Well, when you load someone else's script, you never have a way to ensure that, right? Because you're not loading a fixed script; you're basically requesting a remote resource that could change at any moment while your snippet stays the same, so you wouldn't even be aware of it. So yeah, that's definitely a big, big thing, and I'm really looking forward to releasing it. Another area we're investigating now is allowing more actions with Zaraz. Today, an action could be, I don't know, sending data to Mixpanel or Amplitude or Google Analytics. It could be loading something on my website, like a script. It could be sending an HTTP request. But it could also maybe be triggering an email, or... we're working on something we call Zaraz embeds. For example, when you load a news article, you have those Twitter embeds or other social media embeds.
These are also third-party scripts, and when you load the page, you're actually requesting all the media, the images, the text, which creates a lot of slowdown. As part of Cloudflare, we also have this amazing ability, when we proxy a website, to inject code directly into the HTML. So we're working on rendering those social media posts, for example, on the edge. It'll work so that you tag the place in the HTML where you want Zaraz to load them, and the entire rendering happens on Cloudflare, not on your end user's device, which is pretty cool. So Zaraz embeds is another area we're exploring: basically, how to deal with the things you actually see on a website that belong to third-party tools, like widgets, embeds, things like
Things like live chat as well and other, yeah,
You can already do this with managed components; you can load assets, CSS, and so on with managed components today. But yeah, more complex stuff around the things you actually see on the site. And honestly, I think we have a lot to improve in making configuration easier. I think the latest developments in AI are pretty promising on that front, and I'm curious to see what the team comes up with in the future.
Cool. Well, I think we're a bit over time anyway, so thank you again for joining us today. It's been a really interesting conversation, and I've really enjoyed it.
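Edge-side embed rendering, as described here, means replacing a marked spot in the page's HTML with markup that was already rendered on the server, so the visitor's browser never loads the third-party embed script. The TypeScript sketch below uses a hypothetical placeholder `<div data-embed="social" data-id="...">` and a stand-in render function; it is not Zaraz's markup or API. A regex keeps the sketch short, but a real implementation would use a proper HTML parser, such as the HTMLRewriter API available in Cloudflare Workers.

```typescript
// Produces HTML for one post; in practice this would fetch and render the
// post on the server (here it is a stand-in supplied by the caller).
type RenderPost = (postId: string) => string;

// Hypothetical placeholder markup that marks where an embed should go.
const PLACEHOLDER = /<div data-embed="social" data-id="([^"]+)"><\/div>/g;

// Swap every placeholder for server-rendered markup.
function renderEmbeds(html: string, renderPost: RenderPost): string {
  return html.replace(PLACEHOLDER, (_match: string, postId: string) => renderPost(postId));
}

const page = '<article><p>News text</p><div data-embed="social" data-id="12345"></div></article>';
console.log(renderEmbeds(page, (id) => `<blockquote class="post">Post ${id}</blockquote>`));
// <article><p>News text</p><blockquote class="post">Post 12345</blockquote></article>
```

The page that reaches the browser contains only static markup, so there is no third-party script to download or execute for the embed.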
Yeah, me too. Pleasure to be here. Thank you very much.