DataGEMS

Data Discovery Platform with Generalized Exploratory, Management, and Search Capabilities

A growing number of open datasets from governments, academic institutions, and companies bring new opportunities for innovation, economic growth, and societal benefits. This data is highly heterogeneous, ranging from real-time to historical and from structured tabular data to unstructured text, images or videos. Moreover, its volume and complexity create a “needle-in-the-haystack” problem: discovering, leveraging and combining data within this expanding sea is extremely challenging and time-consuming.

We are proud to be the coordinator of DataGEMS, a data discovery platform that aims to address these challenges by integrating data sharing, discovery and analysis into a single system that covers the whole data lifecycle, i.e., sharing, storing, managing, discovering, analyzing and reusing data and/or metadata, thereby bridging the gap between the data provider and the data consumer. DataGEMS is a next-generation data discovery and management ecosystem that encompasses different types of data (structured, unstructured, real-time and historical) and enables users to (a) enrich data through powerful data profiling mechanisms, (b) seamlessly discover and analyze data across and within datasets (e.g., tables, files, databases, text) using intuitive discovery and analysis mechanisms, such as natural language and patterns, and (c) effectively explore and combine data with the help of stepwise guidance mechanisms during dataset discovery and analysis. DataGEMS builds on the principles of data openness and re-use and will initially be tested and deployed to promote data FAIRness and to benefit diverse user communities and types of users from a range of domains (education, meteorology, and language data).