The success of your business depends on how you acquire, store, and analyze customer data. At the center of this decision is the CDP (Customer Data Platform). There are two different types of CDPs, each with its own nuances. Understanding the differences between them can have huge ramifications for what you’re able to achieve with your customer data, and how quickly you’re able to do it. In this article we’ll compare and contrast Packaged and Composable CDPs, so you can decide which type is right for your business.
What is a CDP?
A CDP is a centralized, unified customer database that is accessible to other systems. A CDP is composed of three major components:
- Data capture and ingestion - from both your own sources (your app and/or website), as well as 3rd party tools (such as ad networks, your CRM and other SaaS tools)
- Storing, modeling and transforming the data - Data captured and ingested in stage 1 is stored, cleaned and combined to create a single customer profile
- Sending data to downstream tools - this structured data is then made available to other systems and tools, such as your product analytics tool, your email marketing tool, or your data warehouse.
There are two different types of CDPs: Packaged and Composable. And they manage these three components in very distinct ways.
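To make the three components concrete, here is a minimal Python sketch of the flow: ingest, build a single customer profile, then send it downstream. Every name and record in it is a hypothetical stand-in rather than any vendor's API, and real profile building (identity resolution, cleaning) is far more involved than the simple merge shown here.

```python
from collections import defaultdict

def ingest(sources):
    """Stage 1: gather raw records from first-party and third-party sources."""
    return [record for source in sources for record in source]

def build_profiles(records):
    """Stage 2: combine records that share a user_id into one profile each."""
    profiles = defaultdict(dict)
    for record in records:
        # Later records overwrite earlier values for the same field.
        profiles[record["user_id"]].update(record)
    return dict(profiles)

def send_downstream(profiles, destinations):
    """Stage 3: hand every profile to every downstream destination."""
    for destination in destinations:
        for profile in profiles.values():
            destination(profile)

# Hypothetical sample data standing in for an app and a CRM.
app_events = [{"user_id": "u1", "last_page": "/pricing"}]
crm_rows = [
    {"user_id": "u1", "email": "ana@example.com"},
    {"user_id": "u2", "email": "bo@example.com"},
]

email_tool_inbox = []  # stands in for an email marketing tool
profiles = build_profiles(ingest([app_events, crm_rows]))
send_downstream(profiles, [email_tool_inbox.append])
```

In a Packaged CDP all three functions live inside the vendor's product. In a Composable CDP, the second function would run in your data warehouse, with separate tools handling the first and third.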
What is the Difference Between a Packaged CDP and a Composable CDP?
Imagine you’re looking for a new wardrobe for your bedroom. A Packaged CDP is the equivalent of a pre-built wardrobe that you purchase “as-is” from a furniture store. All you have to do is move it into your bedroom, put all your clothes inside, and you’re done. On the other hand, a Composable CDP is like a wardrobe that you need to design and assemble yourself. It requires more work, but you’ll get exactly what you want, something you can’t find in stores.
In the context of a CDP:
- Packaged CDP: A Packaged CDP aims to manage all three of these processes listed above (data capture and ingestion, modeling and transformation, sending data to downstream tools) for you. To do so, they store, model and transform captured data themselves before sending it on to other tools.
- Composable CDP: A Composable CDP is a set of integrated tools (internal, open-source or SaaS) that are assembled to perform some or all functions of a CDP. Composable CDPs use the company’s data warehouse to store, model and transform captured data.
There are two big differences between Packaged CDPs and Composable CDPs:
- Where does the data modeling take place? In a Packaged CDP, it is the vendor (e.g. Segment/mParticle) that handles the data modeling and transformation required for downstream destinations. In the Composable CDP, these functions are centralized in the data warehouse. At this point it’s important to note that most companies that decide to go with a Packaged CDP also maintain a separate data warehouse.
- Number of tools - in the purest sense a Packaged CDP is a single tool for all of your CDP requirements, whereas a Composable CDP has different tools for each function.
While the two types of CDPs are distinct in terms of how they capture, store, and model data before sending it to downstream tools, we often see companies deploy a hybrid approach for multiple reasons:
- Most companies want to own their data, so even if they go with a Packaged CDP, they will still replicate data into their own data warehouse
- Packaged CDP vendors may not be able to serve all of the use cases a company has (more below)
How do I choose between a Packaged CDP and a Composable CDP?
The decision comes down to a range of factors:
- Simple v. Extensive
- Total cost of ownership
- Data flexibility
Simple v. Extensive
Even from just a quick glance at the diagram above, you might wonder, “Why would anyone want a Composable CDP when a Packaged CDP looks significantly simpler?” After all, with a Packaged CDP, you can access most CDP features, and you’d only have to implement and manage one tool.
But in our experience, it’s rarely so cut and dried. You may have some complex use cases which cannot be addressed with any single Packaged CDP. In this case, you could use a Composable CDP or a hybrid approach. Some examples we have seen:
- B2B companies want to convert sales funnel steps, such as ‘Sales Accepted Opportunity’ or ‘Marketing Qualified Lead,’ into analytics events to send to their downstream tools
- E-Commerce companies want an easy way to keep data objects such as product catalogs up to date in ad platforms
- B2B SaaS companies want to model product usage data and then send this to their B2B CRM (e.g. Salesforce) so their sales team can see how potential customers interact with the product
These are just a few examples where a Composable CDP can give you more flexibility and unlock new use cases. It is important to consider what is relevant to your business so as not to add unnecessary complexity.
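As an illustration of the first example, the sketch below converts a CRM stage change into an analytics event. The event names, field names and row shape are all assumptions made for this example. In a Composable CDP this mapping would usually be modeled in the warehouse rather than in application code.

```python
from datetime import datetime, timezone

# Hypothetical mapping from CRM funnel stages to analytics event names.
STAGE_TO_EVENT = {
    "Marketing Qualified Lead": "lead_marketing_qualified",
    "Sales Accepted Opportunity": "opportunity_sales_accepted",
}

def stage_change_to_event(row):
    """Turn one CRM stage-change row into an analytics event, or None if unmapped."""
    event_name = STAGE_TO_EVENT.get(row["stage"])
    if event_name is None:
        return None
    return {
        "event": event_name,
        "user_id": row["contact_id"],
        "timestamp": row["changed_at"].isoformat(),
        "properties": {"account": row["account"]},
    }

# Hypothetical row, as it might appear in a warehouse table of stage changes.
row = {
    "contact_id": "c-42",
    "account": "Acme Inc",
    "stage": "Sales Accepted Opportunity",
    "changed_at": datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc),
}
event = stage_change_to_event(row)
```

The resulting dictionary is the kind of payload a downstream analytics tool could accept as an event.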
Total cost of ownership
The total cost of ownership of your CDP stack is a function of:
- Cost of tools - Subscription/usage fees that you pay for each element of your CDP
- Operational costs - The people cost (typically software engineers and data engineers) required for you to operate your CDP stack (note: the average Data Engineer costs $125k in the US)
Packaged CDPs - and the components of Composable CDPs - aim to make usage and maintenance as easy as possible (i.e. minimize operational cost). They typically charge usage-based subscription fees for this service, since it will save the company from paying for additional data engineers.
For businesses with standard use cases that can be solved out of the box, a Packaged CDP is often the cheapest way to go. This is because moving to a Composable CDP requires a higher operational cost to handle the modeling and transformation of data.
However, for larger companies with more complex use cases, the cost of additional engineers does not scale as fast as the cost of usage-based subscriptions. And at some point, companies will look for cheaper alternatives to satisfy their needs.
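A rough worked example shows how this crossover can happen. The sketch below models total cost of ownership as tool fees plus people cost, using the $125k data engineer figure from above (assumed here to be an annual cost). The usage fees and headcounts are invented purely for illustration and are not real vendor pricing.

```python
DATA_ENGINEER_COST = 125_000  # figure quoted above, assumed to be per year

def total_cost_of_ownership(users, fee_per_thousand_users, engineers):
    """Yearly cost = usage-based tool fees + cost of the people operating the stack."""
    tool_cost = fee_per_thousand_users * users / 1000
    operational_cost = engineers * DATA_ENGINEER_COST
    return tool_cost + operational_cost

# Hypothetical numbers: the packaged option has higher usage fees but
# needs fewer engineers; the composable option is the reverse.
PACKAGED = {"fee_per_thousand_users": 1200, "engineers": 0.5}
COMPOSABLE = {"fee_per_thousand_users": 300, "engineers": 2}

def cheaper_option(users):
    """Return which hypothetical option costs less at a given user volume."""
    packaged = total_cost_of_ownership(users, **PACKAGED)
    composable = total_cost_of_ownership(users, **COMPOSABLE)
    return "packaged" if packaged <= composable else "composable"

small_company = cheaper_option(100_000)
large_company = cheaper_option(500_000)
```

With these assumed inputs, the packaged option is cheaper at 100,000 users and the composable option is cheaper at 500,000. Where the crossover falls for your business depends entirely on your actual fees and staffing.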
It is important to note that this is true for both Packaged CDPs and Composable CDPs. All elements of a Customer Data Platform can be built from open-source alternatives (e.g. Apache Druid for data warehousing, Airbyte/Apache Airflow for ETL, Snowplow/Apache Kafka/RudderStack for event streaming). While many of these tools offer free versions, building and maintaining this tech stack requires data engineering skills and carries significant operational costs. This is why a fully open-source Composable CDP is typically built only by the largest companies, such as Airbnb.
As total cost of ownership is a function of tool costs and data engineering salaries, again we see a range of options in practice:
- Packaged CDP only: small companies with limited tech resources and use cases that can be completely fulfilled by a CDP’s out-of-the-box features
- Packaged CDP, data warehouse + some composable elements (e.g. Reverse ETL tool / Fivetran): mid-to-large companies with greater tech resources, and more complex use cases not satisfied by a Packaged CDP
- Fully composable/open-source: large companies with the most data engineering resources and the highest levels of data transparency are more likely to construct a Composable CDP from fully open-source and self-managed tools
Total cost of ownership is crucial to consider. Moving from the first option to the third requires significant data engineering resources and change management.
Data flexibility
Data flexibility is often touted as one of the main reasons to go for a Composable CDP. If customer data is centralized in your data warehouse, then you can enrich it with any other first party data. Examples include offline sales data and outputs from data science models. If you have a Composable CDP, you have many more options to augment these data before sending them to downstream destinations. In addition, a Composable CDP lets you send data to downstream tools in a variety of different formats.
Packaged CDPs traditionally focus on event-clickstream data. This constraint limits what you can send to downstream tools with different data models. One of the top examples here is Salesforce, which has an object-based data model.
The inability to flexibly do this ‘reverse ETL’ (i.e. sending data from your warehouse to downstream tools) was what gave Hightouch and Census their initial customer base.
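To show what the gap between the two data models looks like, the sketch below rolls a stream of clickstream-style events up into one record per account, which is the shape an object-based tool such as a CRM expects. The field names and sample events are assumptions made for this example, not a real CRM schema. In practice this kind of modeling would typically be done in the warehouse before a reverse ETL tool syncs the result.

```python
from collections import Counter

def events_to_account_objects(events):
    """Roll individual events up into one summary record per account."""
    accounts = {}
    for event in events:
        account = accounts.setdefault(
            event["account_id"],
            {"users": set(), "event_counts": Counter()},
        )
        account["users"].add(event["user_id"])
        account["event_counts"][event["event"]] += 1
    return [
        {
            "account_id": account_id,
            "active_users": len(data["users"]),
            # A Counter returns 0 for events an account never triggered.
            "reports_created": data["event_counts"]["report_created"],
        }
        for account_id, data in sorted(accounts.items())
    ]

# Hypothetical product usage events.
events = [
    {"account_id": "acme", "user_id": "u1", "event": "report_created"},
    {"account_id": "acme", "user_id": "u2", "event": "report_created"},
    {"account_id": "acme", "user_id": "u1", "event": "login"},
    {"account_id": "globex", "user_id": "u9", "event": "login"},
]
account_objects = events_to_account_objects(events)
```

Each item in `account_objects` describes the current state of an account rather than a single action, which is what a sales team would want to see on an account page.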
With reverse ETL and data activation becoming increasingly commoditized, the battleground for data flexibility becomes twofold:
- Ease of data modeling
- Real-time data activation
Ease of data modeling
While leveraging multiple data sources can unlock new use cases, there is added complexity in tying these data sets together. This is why native reverse ETL providers are rapidly diversifying their product offerings to make data modeling more straightforward:
- Census has Entities which makes it easier to define relationships among different data sources
- Hightouch has Customer Studio, Identity Resolution and Match Booster that aim to make data modeling and activation easier for end-users
Real-time data activation
The issue with activating data from your warehouse is that there will always be a delay. This is because data must be sent from the originating source to your data warehouse where it is modeled, before it’s sent to its downstream destination.
This can be problematic in situations where real-time data is required, such as trigger-based email marketing campaigns. For example, when a user signs up for your service, you might want to immediately send them a welcome email describing what they can do next to find value in your product.
Most Packaged CDPs were built from the ground up for real-time data collection, whereas reverse ETL providers are largely batch processors of data. The need for real-time insights is often at odds with the Composable CDP model, since data needs to be sent straight from the event creator to the activation destination.
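The delay introduced by batch processing can be estimated with simple arithmetic. The sketch below assumes a hypothetical reverse ETL sync that runs once an hour on the hour, and works out how long a sign-up event waits before the next sync picks it up. It ignores load, modeling and network time, all of which would add to the delay in a real pipeline.

```python
from datetime import datetime, timedelta

def next_run(after, interval, anchor):
    """Return the first scheduled run at or after `after`.

    The job is assumed to run every `interval`, starting at `anchor`,
    and `after` is assumed not to be earlier than `anchor`.
    """
    elapsed = after - anchor
    runs_needed = -(-elapsed // interval)  # ceiling division on timedeltas
    return anchor + runs_needed * interval

# Hypothetical schedule: one sync per hour, starting at midnight.
midnight = datetime(2024, 1, 15)
sync_interval = timedelta(hours=1)

signup_time = datetime(2024, 1, 15, 9, 5)
batch_delivery = next_run(signup_time, sync_interval, midnight)
batch_delay = batch_delivery - signup_time
```

Under these assumptions, a user who signs up at 09:05 would not receive a warehouse-triggered welcome email until the 10:00 sync at the earliest, whereas a streaming path could forward the event as soon as it is captured.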
Real-time data activation is also a determining factor in time-to-value. Sending batched data enables companies to activate historic data in their warehouse quickly and easily. On the other hand, real-time data relies on the tool being present to capture and send the data to downstream destinations.
As we mentioned earlier, Packaged CDPs have traditionally been associated with event-clickstream data. Setting up these events takes significant engineering and planning resources. Often when comparing Packaged CDPs with Composable CDPs, Packaged CDPs are deemed slow-to-implement because of this.
However, implementing this first party data capture is an important component whether you use a Packaged or a Composable CDP - the data must exist in your data warehouse if you want to reverse ETL it into tools!
Packaged CDPs offer replays of historical data as they store it. And with Composable CDPs, you can batch send this historical data into downstream tools. At a high level, neither option offers a significant advantage in time-to-value, although this may change based on your goals and tech stack.
The idea of "zero copy data" frequently comes up when we're looking at enhancing the movement and processing of data across various systems. The principle suggests that data is only stored in a single place, with the goal of boosting efficiency, minimizing resource consumption, and improving privacy for end users.
As most companies have a data warehouse, Composable CDP advocates often use “zero copy data” to highlight the unnecessary duplication of user data that a Packaged CDP requires to operate. There are two issues with this:
- In practice, data will always be duplicated across systems. In order to activate user data you must first send it to your downstream destinations like ad platforms and SaaS tools.
- Another argument we often see is that you’ll wind up paying ‘double storage’ costs for a Packaged CDP + data warehouse. However, storage is cheap in cloud warehouse billing. The real costs come in with transforming and modeling.
The argument should center less on “zero copy” (as we believe this is not feasible) and more on data copy minimization. In this case a Composable CDP, with the data warehouse as its source of truth, comes out on top.
Nonetheless, the siloed resources and multiple vendors involved in a Composable CDP could increase the risk of errors and make it harder to maintain data privacy across tool integrations. These are key factors to consider as you explore the right type of CDP for your business.
Before You Take the Next Step, Weigh the Nuances of Different Types of CDPs
The choice between a Packaged CDP or Composable CDP largely hinges on your company's specific requirements, technical resources, and long-term goals. Both solutions come with their pros and cons, and understanding these differences can influence the success of your customer data management strategy. The hype around Composable CDPs comes into sharper focus when you realize that smaller organizations with standard use cases and limited technical resources might be better served by Packaged CDPs. Composable CDPs tend to offer larger, technically savvy organizations more flexibility and control.
CDPs are at the center of everything we do at Mammoth Growth. We are partners with all major Packaged CDP providers as well as most of the major players in the Composable CDP space. Contact our experts and let’s talk about your customer data goals.