Publication Date:
2019-07-17
Description:
The development and deployment of data processing systems to process Earth Observing System (EOS) data has proven to be costly and prone to technical and schedule risk. Integration of science algorithms into a robust operational system has been difficult. The core processing system, based on commercial tools, has demonstrated limitations at the rates needed to produce the several terabytes per day for EOS, primarily due to job management overhead. This has motivated an evolution in the EOS Data Information System toward a more distributed one incorporating Science Investigator-led Processing Systems (SIPS). As part of this evolution, the Goddard Earth Sciences Distributed Active Archive Center (GES DAAC) has developed a simplified processing system to accommodate the increased load expected with the advent of reprocessing and launch of a second satellite. This system, the Simple, Scalable, Script-based Science Processor (S42) may also serve as a resource for future SIPS. The current EOSDIS Core System was designed to be general, resulting in a large, complex mix of commercial and custom software. In contrast, many simpler systems, such as the EROS Data Center AVHRR IKM system, rely on a simple directory structure to drive processing, with directories representing different stages of production. The system passes input data to a directory, and the output data is placed in a "downstream" directory. The GES DAAC's Simple Scalable Script-based Science Processing System is based on the latter concept, but with modifications to allow varied science algorithms and improve portability. It uses a factory assembly-line paradigm: when work orders arrive at a station, an executable is run, and output work orders are sent to downstream stations. The stations are implemented as UNIX directories, while work orders are simple ASCII files. The core S4P infrastructure consists of a Perl program called stationmaster, which detects newly arrived work orders and forks a job to run the appropriate executable (registered in a configuration file for that station). Although S4P is written in Perl, the executables associated with a station can be any program that can be run from the command line, i.e., non-interactively. An S4P instance is typically monitored using a simple Graphical User Interface. However, the reliance of S4P on UNIX files and directories also allows visibility into the state of stations and jobs using standard operating system commands, permitting remote monitor/control over low-bandwidth connections. S4P is being used as the foundation for several small- to medium-size systems for data mining, on-demand subsetting, processing of direct broadcast Moderate Resolution Imaging Spectroradiometer (MODIS) data, and Quick-Response MODIS processing. It has also been used to implement a large-scale system to process MODIS Level 1 and Level 2 Standard Products, which will ultimately process close to 2 TB/day.
Keywords:
Earth Resources and Remote Sensing
Type:
International Geoscience and Remote Sensing Symposium; Jul 09, 2001 - Jul 13, 2001; Sydney; Australia
Format:
text
Permalink