The silo of a data lake

Partner Content

Data lakes hold enormous potential for manufacturers. Yet many organisations are not realising their true value because often only certain individuals know how to navigate them. In short, our data lake silo is human. Until this is addressed, organisations will continue to garner only minimal insights.

Unutilised data lakes

“Data integration is becoming less and less of a challenge.  The biggest pain point we have now is the ability to understand our data and make it useful.”

This self-recognition is quickly becoming commonplace for manufacturers and is the foundation for understanding why data lakes are vastly underutilised (if utilised at all).

Most manufacturers have only a small group of subject matter experts (SMEs) who can effectively navigate their data lake. That expertise has been built up over decades of experience and cannot easily or quickly be shared, creating a bottleneck to making data usable by the broader organisation.

In a recent survey of digital leaders in manufacturing, at least 75% of respondents identified each of Production Managers (93%), Quality (80%), Operations (76%), and Maintenance (75%) as intended data consumers, in addition to SMEs (90%).

To empower all these data consumers, we need a more comprehensive approach to our data lakes. We must reduce the complexity between people and data and alleviate the demand on our SMEs.


Business intelligence (BI) and business analytics (BA) with key performance indicators. Image courtesy of Shutterstock.


Three core capabilities

Enabling your data lake to speak human and become usable across many data consumers requires three core capabilities:

Data Context

Contextualisation is the process of creating meaningful relationships between data sources and types. These relationships connect relevant data from assets or processes across a facility. For example, assets in your facility may have process variables, work orders, documents, and inspection data all residing in your data lake. Contextualisation establishes the relationships that allow users to access live process variables when looking at associated work orders. This helps establish a data foundation that can be utilised by all data consumers.
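To make the idea concrete, here is a minimal sketch of contextualisation in Python. All names (`Asset`, `link_work_order`, the pump and work-order identifiers) are hypothetical illustrations, not any vendor's API: each asset acts as a hub that relates process variables, work orders, and documents, so a user viewing a work order can reach live process values.

```python
from dataclasses import dataclass, field

# Hypothetical sketch: an asset as the hub that relates data types.
@dataclass
class Asset:
    name: str
    process_variables: dict = field(default_factory=dict)  # tag -> latest value
    work_orders: list = field(default_factory=list)
    documents: list = field(default_factory=list)

def link_work_order(assets: dict, asset_name: str, work_order: str) -> None:
    """Create the asset -> work order relationship."""
    assets[asset_name].work_orders.append(work_order)

def variables_for_work_order(assets: dict, work_order: str) -> dict:
    """Follow the relationship from a work order back to its asset's live values."""
    for asset in assets.values():
        if work_order in asset.work_orders:
            return asset.process_variables
    return {}

assets = {"pump-101": Asset("pump-101", {"discharge_pressure_bar": 7.2})}
link_work_order(assets, "pump-101", "WO-2041")

# A user viewing WO-2041 sees live values without knowing the source systems.
print(variables_for_work_order(assets, "WO-2041"))  # {'discharge_pressure_bar': 7.2}
```

In practice these relationships live in a graph or metadata store rather than in-memory objects, but the principle is the same: the relationship, once established, does the navigating for the user.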

Data Discovery

With context established, data must also be readily available for users to query and access. When users can find the data they need, they can make effective use of low-code or no-code solutions, for example building a dashboard that generates real-time insights previously available only intermittently through time-consuming Excel reports. Making data more discoverable removes the need for data consumers to understand the structure and naming conventions of every source system stored in raw form in the data lake.
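A simple sketch of what discovery buys the user: searching human-readable metadata instead of memorising source-system tag conventions. The catalog entries and tag formats below are invented for illustration.

```python
# Hypothetical metadata catalog: raw tags like "PLC3:AI_0047" are opaque,
# but the contextualised descriptions are searchable in plain language.
catalog = [
    {"tag": "PLC3:AI_0047", "description": "Line 2 oven temperature", "unit": "degC"},
    {"tag": "HIST:FT_112", "description": "Line 2 filler throughput", "unit": "units/min"},
    {"tag": "PLC1:AI_0003", "description": "Line 1 oven temperature", "unit": "degC"},
]

def discover(query: str) -> list:
    """Return catalog entries whose description contains every query word."""
    words = query.lower().split()
    return [e for e in catalog if all(w in e["description"].lower() for w in words)]

# A production manager searches in plain language, not source naming conventions.
for entry in discover("line 2 temperature"):
    print(entry["tag"], "->", entry["description"])  # PLC3:AI_0047 -> Line 2 oven temperature
```

Real discovery layers add ranking, type filters, and lineage, but the core shift is the same: the consumer queries meaning, not source structure.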

Data Quality

A subset of data governance, data quality is key to ensuring consumers trust the data. To quote Forrester on the importance of data quality: "Data has no value unless the business trusts it and uses it." In the context of operations, with thousands of time series data points updating once per second (or faster in many cases), consumers must have an easy way to verify the quality of the data driving recommendations, and must be able to set and modify requirements based on their specific use case. For example, a real-time performance monitoring solution will require stricter quality rules than a weekly production report.
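The per-use-case idea can be sketched as a simple freshness check. The rule names and thresholds below are illustrative assumptions, not a real rule set: the real-time use case rejects a point that is a minute stale, while the weekly report accepts it.

```python
import time

# Hypothetical per-use-case quality rules: a real-time monitor demands
# far fresher data than a weekly report does.
RULES = {
    "realtime_monitoring": {"max_staleness_s": 5},
    "weekly_report": {"max_staleness_s": 7 * 24 * 3600},
}

def passes_quality(last_update_ts: float, use_case: str, now: float = None) -> bool:
    """Check whether a data point is fresh enough for the given use case."""
    now = time.time() if now is None else now
    staleness = now - last_update_ts
    return staleness <= RULES[use_case]["max_staleness_s"]

now = 1_000_000.0
reading_ts = now - 60.0  # last updated one minute ago

print(passes_quality(reading_ts, "realtime_monitoring", now))  # False: too stale
print(passes_quality(reading_ts, "weekly_report", now))        # True
```

Production rule engines also check completeness, range, and schema, but the shape is the same: the consumer declares requirements, and the platform verifies each data point against them.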

Until these core capabilities are addressed, the broader organisation will struggle to realise value from data lakes and gaining even minimal insights will continue to rely heavily on subject matter expertise.

The fastest path forward

A common approach to delivering these capabilities is manual effort, with organisations turning to SMEs or partners to contextualise data and manage metadata by hand. While this often succeeds during a proof of concept, manufacturers are quickly realising that the scale required for an enterprise solution demands a more automated approach. This need for automation is why Industrial DataOps platforms, built specifically to address manufacturers' data silo challenges, can be an essential complement to any data lake.


About the author

Cognite is a global industrial Software-as-a-Service (SaaS) leader, with an eye on the future and a drive to digitalise the industrial world.

To learn more about empowering your organisation to extract more value from your data, download this IDC Technology Spotlight or check out Cognite's on-demand webinar on Best Practices for Building an Industry 4.0 Platform.

*all images courtesy of Shutterstock