Self-serve Data Platform

Developing, deploying, monitoring, and accessing Data Products requires a well-architected infrastructure. Managing that infrastructure is not trivial and requires specialized technical skills. For each domain team to own its data products autonomously, the platform must offer a high-level abstraction of the infrastructure that reduces complexity and removes friction. This is the core principle of self-serve data infrastructure as a platform.

The data platform is an extension of the delivery platform, but the technology stack needed to operate data products differs from the one behind operational platforms. For instance, different teams may choose to implement their Data Products with different technologies, which creates the need to provision and connect different sets of infrastructure. This requires basic interoperability and interconnectivity. Before the advent of Data Mesh, each such integration would have been managed as a dedicated project, a model that creates a bottleneck and would have prevented many integrations between products.
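As an illustration of the interoperability described above, here is a minimal sketch in Python. It assumes two data products built on different technologies (one serving CSV text, one backed by SQLite) that both expose the same read interface, so a consumer can combine them without a dedicated integration project. The names `ReadablePort`, `CsvPort`, `SqlitePort`, and `join_on` are hypothetical and not part of any specific platform.

```python
import csv
import io
import sqlite3
from typing import Iterator, Protocol


class ReadablePort(Protocol):
    """Hypothetical common contract: every data product can be read as rows."""

    def read(self) -> Iterator[dict]: ...


class CsvPort:
    """Data product whose team chose to serve CSV text."""

    def __init__(self, text: str) -> None:
        self._text = text

    def read(self) -> Iterator[dict]:
        # DictReader yields one dict per row, keyed by the header line.
        yield from csv.DictReader(io.StringIO(self._text))


class SqlitePort:
    """Data product whose team chose a relational store."""

    def __init__(self, conn: sqlite3.Connection, table: str) -> None:
        self._conn = conn
        self._table = table

    def read(self) -> Iterator[dict]:
        # The table name comes from the product owner, not from user input.
        cursor = self._conn.execute(f'SELECT * FROM "{self._table}"')
        columns = [col[0] for col in cursor.description]
        for row in cursor:
            yield dict(zip(columns, row))


def join_on(left: ReadablePort, right: ReadablePort, key: str) -> list:
    """Combine two products through the shared interface only."""
    index = {row[key]: row for row in right.read()}
    return [{**row, **index[row[key]]} for row in left.read() if row[key] in index]
```

The consumer code in `join_on` never learns which technology sits behind each port, which is the property a shared platform interface is meant to provide.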

To make analytical data product development accessible to generalist developers, a self-serve data platform must do more than simplify provisioning: it needs to provide a new category of tools and interfaces. The platform must support the workflow of a domain data product developer, which includes creating, maintaining, and running data products, without demanding deep specialist knowledge. To achieve this, the self-serve infrastructure needs to include capabilities that lower the cost and the degree of specialization required to build data products. Specifically, a self-serve data platform provides access to scalable polyglot data storage, data product schemas, data pipeline declaration and orchestration, data product lineage, compute, and data locality.
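One way to picture such an interface is a declarative specification that a domain team submits and the platform turns into provisioning steps. The sketch below is illustrative only: the classes `OutputPort` and `DataProductSpec`, the `SUPPORTED_STORAGE` catalog, and the functions `validate` and `provisioning_plan` are assumptions made for this example, not the API of any real platform.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class OutputPort:
    """One way consumers can read the data product (hypothetical model)."""
    name: str
    storage: str            # illustrative values: "object-store", "warehouse", "stream"
    schema: dict[str, str]  # column name -> type name


@dataclass(frozen=True)
class DataProductSpec:
    """Declarative description a domain team might hand to the platform."""
    domain: str
    name: str
    owner: str
    output_ports: list[OutputPort] = field(default_factory=list)
    upstream: list[str] = field(default_factory=list)  # lineage: products this one reads


# Assumed catalog of storage types the platform can provision (polyglot storage).
SUPPORTED_STORAGE = {"object-store", "warehouse", "stream"}


def validate(spec: DataProductSpec) -> list[str]:
    """Return human-readable problems; an empty list means the spec is acceptable."""
    problems = []
    if not spec.output_ports:
        problems.append("at least one output port is required")
    for port in spec.output_ports:
        if port.storage not in SUPPORTED_STORAGE:
            problems.append(f"port '{port.name}': unsupported storage '{port.storage}'")
        if not port.schema:
            problems.append(f"port '{port.name}': schema must not be empty")
    return problems


def provisioning_plan(spec: DataProductSpec) -> list[str]:
    """List the ordered steps a platform could run on the team's behalf."""
    steps = [f"register {spec.domain}/{spec.name} (owner: {spec.owner})"]
    for port in spec.output_ports:
        steps.append(f"provision {port.storage} for port '{port.name}'")
        steps.append(f"publish schema for port '{port.name}'")
    for source in spec.upstream:
        steps.append(f"record lineage {source} -> {spec.domain}/{spec.name}")
    return steps
```

With this shape, the domain team states what it needs (storage, schema, upstream sources) and leaves the how to the platform, which is where the reduction in required specialization would come from.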

Last update: April 4, 2023
Created: March 30, 2023