CEDA document repository

HPFELD : Hosted Processing Facility for the Exploitation of Large Datasets

Marsh, Kevin and Kershaw, Philip and Latter, Barry and Smith, Garin and Boissier, Enguerran (2013) HPFELD : Hosted Processing Facility for the Exploitation of Large Datasets. In: ESA Big Data From Space Conference 2013, 5-7 June 2013, ESRIN, Frascati, Italy.

PDF (HPFELD poster presented at ESA "Big Data from Space" conference, June 2013)
hpfeld_poster_2012.pdf - Presentation
Available under License Creative Commons Attribution.

Abstract

The era of 'big data' means that data centres are under increasing pressure to hold and support datasets which are much larger than before. The sheer volume of such datasets makes it impractical to expect users to download and store them on their local systems. Even if they could do this, they would then face the problem of finding enough local computing resource to process the data in a timely fashion. This is particularly true for Earth Observation (EO) data from satellites. Consequently, these data are effectively unusable by a significant proportion of the user community. A more efficient approach would be to couple the data centre archives themselves to processing capability, and to make that capability available to users over the internet. In this way, remote users could select a pre-configured algorithm (or upload their own) to run on the dataset. The actual processing would run on a host system 'close' to the data archive, and the results would be made available to view on-line or to download to the users' local systems. In such a system, other complementary datasets from the data centre could easily be incorporated into the processing, for example for comparison with different model datasets. The host system would also be able to leverage the power of 'cloud' technologies, with the HPFELD system itself providing the environment in which the processing is performed. The HPFELD project was an attempt to see whether existing technologies (such as G-POD, OPeNDAP and OpenID) could be combined to rapidly produce a demonstration system. It was part funded by the TSB, and was a collaboration between STFC (CEDA) and the commercial companies Magellium and Terradue. The demonstrator system was set up to process METOP IASI L1C and ECMWF data to derive methane, with the aim of making the processing as flexible and easy to use as possible. Both of these datasets are held in the BADC archive (http://badc.nerc.ac.uk). This system has been used to show the benefits of this approach when processing very large datasets.

Item Type: Conference or Workshop Item (Poster)
Additional Information: HPFELD was a one-year project, part funded by the TSB.
Subjects: Data and Information
Computer Science
Atmospheric Sciences
Meteorology and Climatology
Depositing User: Dr Kevin Marsh
Date Deposited: 15 Jul 2013 07:53
Last Modified: 15 Jul 2013 07:53
URI: http://cedadocs.ceda.ac.uk/id/eprint/954
