Effective data management with a domain-driven, product-thinking approach

The ideas of “divide and conquer” and “product thinking” can lead to effective data management. Product thinking in data, i.e. “data as a product”, is a mindset, and it does not necessarily need a full-blown data mesh.

Prajwalan Karanjit

Published in

Towards Data Science

6 min read · Mar 19, 2021

Image from Pixabay under Pixabay license

Data as an asset

The idea of treating data as an asset has been around for a while. It is intended to facilitate effective data management, utilization, and possibly monetization of data, and it provides a baseline for data governance. Some of the key activities within data management and governance are ensuring that data is maintained in a timely manner, properly documented, improved in quality, and owned (by the business).

Challenges for successful data management

But how many such data management programs actually succeed? Not many.

A company is recognized for its direct customer-facing value propositions. Its success is measured by product sales and customer engagement. So profit thinking and a focus on differentiating through richer offerings are always the priority. While data is at the core of most products, it is often the outer layer of application functionality and user experience that overshadows the core.

Implementing data management organization-wide rarely succeeds. Value stream leads need to focus on value propositions. Product owners need to focus on ways to improve customer experience. While the objective of data management is to make such things easier, it ends up being an overhead that slows them down and creates unnecessary dependencies.

How to address this?

Addressing this requires us to take a step back (or perhaps several) and look at the information system as a whole. This is where the “domain” concept and “data as a product” thinking can help. For a reference to these terms see here, here, and here.

Data as a “domain product”

All aspects of classical data management and governance become domain-specific. Each domain is fully responsible and accountable for establishing the data catalog, ownership, stewardship, quality checks, and master data management for its data.

Data as a product (or domain product) is one of the fundamental principles of data mesh. Based on this principle, data mesh also brings the notion of a “data product”. A “data product” in a data mesh context is an architectural quantum, very similar to a microservice in application architecture, i.e. the smallest possible unit that is still fully complete and delivers a particular value. But a data product could be as simple as just “data”. The other elements of data mesh data products, such as code and APIs, are enablers and provide leverage.

Each domain should consider the data it exposes as if it were a product. The “product thinking” applied to this data is what makes it product-like. Hence, data is still an asset, but only inside a domain. Outside, it is a product.

For each domain, the product owner and product team develop a merchant mindset. This means…

Just like a merchant maintains an inventory of products, the domain maintains a searchable data catalog.
Just like a merchant ensures that documentation, stock quantity information, customer reviews, etc. are updated and visible for a product, the domain ensures that documentation (definition, schema, API endpoints, etc.) and other metadata for its data are easily available to consumers.
Just like a merchant provides after-sales support for a product, the domain provides access to event logs, API call history, API logs, etc. so that downstream applications can debug and handle errors.
Just like a merchant launching a new product (a software update, for example) provides backward compatibility and the possibility of rollback, the domain's API interfaces and event schemas should cater for API versioning and schema evolution.
Just like a merchant has a product owner, the domain should provide a data owner.
Just like a merchant has a product roadmap, the domain should have a data roadmap indicating data, integration, storage, and architectural changes.
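
As a rough illustration of the first few points, here is a minimal Python sketch of a domain-owned, searchable catalog. It is not part of any data mesh tooling: the `DataProduct` and `DomainCatalog` classes, their field names, the "billing" domain, and the endpoint path are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataProduct:
    """One catalog entry; every field name here is an illustrative assumption."""
    name: str
    domain: str
    owner: str            # the accountable data owner
    description: str
    schema: dict          # field name -> type name
    schema_version: str   # bumped whenever the schema evolves
    endpoint: str         # where consumers read the data


@dataclass
class DomainCatalog:
    """A searchable inventory of the data products one domain exposes."""
    domain: str
    _entries: dict = field(default_factory=dict)

    def register(self, product: DataProduct) -> None:
        # A domain only catalogs the data it is accountable for.
        if product.domain != self.domain:
            raise ValueError(
                f"{product.name} belongs to {product.domain}, not {self.domain}"
            )
        self._entries[product.name] = product

    def search(self, term: str) -> list:
        # Case-insensitive match on name or description.
        term = term.lower()
        return [
            p for p in self._entries.values()
            if term in p.name.lower() or term in p.description.lower()
        ]


# Hypothetical example data for a made-up "billing" domain.
catalog = DomainCatalog(domain="billing")
catalog.register(DataProduct(
    name="invoices",
    domain="billing",
    owner="billing-product-owner@example.com",
    description="Issued customer invoices, one record per invoice",
    schema={"invoice_id": "string", "amount": "decimal", "issued_at": "timestamp"},
    schema_version="1.2.0",
    endpoint="/api/v1/billing/invoices",
))
matches = catalog.search("invoice")
```

Each entry carries its owner, schema, and version alongside the data's location, so a consumer who finds the product also finds who is accountable for it and which contract it currently follows.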

The idea is that, just as a vendor feels accountable for its product, the domain should feel accountable for its data.

Classical product thinking is more than just identifying end-user needs; it also means understanding business drivers, competition, disruptors, and hard costs. By thinking of data as a product, we bring all such qualities into data management.

So, what is a domain?

A domain is some sort of bounded context. A particular business capability, consisting of one or several applications and databases, could be a domain. Likewise, a collection of related applications that belong to different business capabilities may also be considered a domain.

There is more than one way of separating bounded contexts. The main idea here is divide and conquer.

It is about establishing a path of least resistance. The non-invasive model of data governance is also applicable to data management overall, and it is more effective with a domain-driven approach.

By delegating data management responsibility to the domains, we make them accountable and enable them to implement approaches that suit their context. A context-suitable approach is important because an aspect that is vital in one business domain may not be of the same importance in another.

Instead of creating dedicated roles such as data stewards, recognize existing people in such roles. In several cases it is already part of their job, so the work is to make this more formal and explicit by enriching the job descriptions with data responsibilities. People are part of domains, and by doing so we hold the domains accountable for how they define, produce, and use data.

A simple example of divide and conquer is a country. A country has several states or districts. Certainly, some aspects of governance apply across states, but each state or district also has its own governance model.

Contrasts, challenges, and downsides

This is in contrast to having one data management model to rule them all. Even data lakes could be domain-specific, with each domain using different techniques and deployment models (on-premises, cloud). This gives domains more autonomy, and when synergy is required, data exchange between domains is facilitated by either APIs or events.
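
To make the event-based exchange a little more concrete, the following Python sketch shows one domain publishing a versioned event and another domain consuming it. The `EventBus` class is an in-memory stand-in for whatever broker an organization actually uses, and the topic name, payload fields, and handler are assumptions made up for this illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class DomainEvent:
    topic: str            # hypothetical naming: "<domain>.<event>"
    schema_version: int   # lets consumers see which contract the payload follows
    payload: dict


class EventBus:
    """In-memory stand-in for a real message broker (an assumption of this sketch)."""

    def __init__(self) -> None:
        self._subscribers = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[DomainEvent], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, event: DomainEvent) -> None:
        # Deliver the event to every handler registered for its topic.
        for handler in self._subscribers[event.topic]:
            handler(event)


bus = EventBus()
received = []


def on_invoice_issued(event: DomainEvent) -> None:
    # The consuming domain reads only the fields it knows about, so fields
    # added in a later schema version (such as "currency") do not break it.
    received.append((event.payload["invoice_id"], event.payload["amount"]))


# A consuming domain subscribes; the hypothetical billing domain publishes.
bus.subscribe("billing.invoice_issued", on_invoice_issued)
bus.publish(DomainEvent(
    topic="billing.invoice_issued",
    schema_version=2,
    payload={"invoice_id": "INV-1", "amount": 120.0, "currency": "EUR"},
))
```

The two domains share only the event contract, not a database or a deployment model, which is what lets each keep its own data management approach.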

The downsides of this approach are the duplication of data, applications, and infrastructure, which may lead to increased costs. However, this cost and overhead should be seen as an investment, considering the overall benefit. There may also be some technical challenges when cross-domain insights are required.

The return here is highly manageable domains that scale, evolve and grow and perhaps can even be safely carved out or sunset.

Conclusion

Effective data management is not just about data. It is not a siloed activity, and we should consider both the problem space and the solution space as a whole. Every domain is fully accountable for its own data management, and the organization should mandate this accordingly. Every domain should maintain and govern its domain data as a business asset, but with product thinking.