Conference Paper

EarthServer Search Engine: A Distributed Infrastructure for Earth Science Big Data Retrieval

Abstract: 

Earth Science data are composite, multidimensional and of significant size, and as such pose a number of challenges for their management and handling. With the number of data sources (e.g., drones, sensor networks or higher-level product generators) and the rate of data generation continuously increasing, yet another challenge arises: making the information held in multiple heterogeneous resources available efficiently. Currently, Array Database Management Systems provide the necessary set of tools (storage- and processing-oriented) for managing unlimited-size multidimensional arrays, while OGC standards (e.g., WCS and WCPS) provide guidelines for the design and exposed behavior of web-accessible Earth Science data management engines. These approaches facilitate the realization of standardized, big-data-oriented Earth Science infrastructures, as shown in a number of cases.
Our approach, the EarthServer Search Engine, builds on top of these elements and brings together diverse data sources while also enhancing the user-friendliness of the underlying array database management systems. An abstract data model is introduced that extends and combines metadata from different standards and enables the construction of a one-stop shop, using tools that discover and harvest such data in order to give the end user access to all of them. All the information retrieved is exposed through well-established standards such as OGC CSW, OAI-PMH or OpenSearch. Additionally, a query language is proposed, which adds a rich set of features on top of the Web Coverage Processing Service (WCPS) Language Interface Standard by adopting the FLWOR expression syntax popularized by XQuery. The new language, with the assistance of a number of coordinating services, merges the paths these two widely adopted standards have paved in order to offer a more expressive way of querying array data, to allow mixed search over both data and metadata as well as federated search, and to add result manipulation capabilities with the option of generating mixed results.
Through these, the EarthServer Search Engine allows the discovery and processing of large data sets across heterogeneous infrastructures and answers requests that address metadata while also giving processing directives, offering advanced, provider-independent data analysis in a compact way. As a result, the flow of scientific data resources among working groups and systems is greatly facilitated, and the processing logic is further decoupled from data organization and management.
The effectiveness of this approach has been evaluated in the context of the EarthServer project by bringing together five high-volume array databases.

Authors:

Panagiotis Liakos, George Kakaletris, Panagiota Koltsida

Presenter Biography: 

Panagiotis Liakos is a researcher and developer at the University of Athens, where he received both his BSc (2008) and MSc (2011) in Computer Science. Since February 2010, he has been with the MADgIK Research Group at UoA and has been involved in various EU-funded ICT projects, including PERNASVIP, iMarine and EarthServer. His research interests are in Data Mining, Graph Compression and Social Network Analysis. Panagiotis was the recipient of the First Prize in ACM's WSDM 2013 Data Challenge with a novel graph compression approach. He has over 6 years of professional experience in developing software using Java, Python, Ruby on Rails, PostgreSQL (with PostGIS) and ElasticSearch, among others. Currently, he is a member of the EarthServer research and development team responsible for a series of software elements provided by UoA.

Panagiota Koltsida is a senior software engineer who has worked on several European research projects at the University of Athens (NKUA) and the ATHENA Research Center. She received both her BSc (2006) and MSc (2009) from the Department of Informatics and Telecommunications at the University of Athens. Her research interests include Information Retrieval systems, Web Information Systems and Data Management. She has more than 7 years of experience using Java, XML and web-related technologies. Currently, she is involved in the iMarine and EarthServer EU projects, developing various software elements focusing on marine and big data.
