The 12 Providers That Matter Most And How They Stack Up

Machine Learning Data Catalogs, Q2 2018

In our 29-criteria evaluation of machine learning data catalog (MLDC) providers, we identified the 12 most significant ones (Alation, Cambridge Semantics, Cloudera, Collibra, Hortonworks, IBM, Infogix, Informatica, Oracle, Reltio, Unifi Software, and Waterline Data) and researched, analyzed, and scored them. This report shows how each provider measures up and helps enterprise architecture (EA) professionals make the right choice.

MLDCs Are The Stepping Stone For The Intelligent Business

The four V’s of big data (volume, variety, velocity, and veracity) may be a cliché, but firms are still struggling under the weight of their data: 36% to 38% of global data and analytics decision makers reported that their structured, semistructured, and unstructured data each totaled 1,000 TB or more in 2017, up from only 10% to 14% in 2016. The growth of data is also outpacing organizations’ ability to get value from it. The two biggest challenges our respondents reported in using systems of insight were 1) merging existing business processes to source data to analyze it and implement insights and 2) sourcing, gathering, managing, and governing the data as it grows.

For EA professionals, relying on people and manual processes to provision, manage, and govern data simply does not scale. Enterprises are waking up to this fact and turning to data catalogs to democratize access to data, draw on tribal data knowledge to curate information, apply data policies, and activate all data for business value quickly. Data catalog investment links to:

  • The number of data lakes. Global data and analytics decision makers at organizations investing in, implementing, or expanding their data catalogs report having more than seven data lakes across their enterprise.
  • Prioritization of insights. In addition, 51% of our survey respondents at organizations that are expanding/upgrading their data catalog implementations said that leveraging big data and analytics in decision making was a critical priority.
  • Competitive advantage of AI. Those expanding their data catalog implementations are more likely to mention the use of AI for product testing and innovation (new value-based offerings) rather than traditional customer experience and operational business scenarios.

