Background & Summary

Light from the sun reflected back across the water-air interface carries characteristic spectral signatures of several key water quality constituents due to their wavelength-specific absorption and scattering properties1,2. Chlorophyll a, total suspended solids, and colored dissolved organic matter are the dominant optically active constituents in inland and coastal waters3,4, and common measures of water quality used for the management of ecosystem and public health5,6,7,8. Accurate measurements of spectral reflectance (i.e., the upwelling radiance normalized by the downwelling solar irradiance) are the foundation for synoptic and cost-effective environmental monitoring applications using satellite sensors, automated sensors installed near the water surface and portable instruments for manual field surveys9.

Space-borne instruments have been providing accurate estimates of chlorophyll a and particle backscattering in the open ocean since the late 1990s with data from the Sea-viewing Wide Field-of-view Sensor (SeaWiFS) followed by many others, including the MEdium Resolution Imaging Spectrometer (MERIS) and Moderate Resolution Imaging Spectroradiometer (MODIS) in the 2000s, and the Ocean and Land Colour Instrument (OLCI) and Visible Infrared Imaging Radiometer Suite (VIIRS) over the last decade10,11,12,13,14,15,16,17. However, in coastal and inland waters, uncertainties in these estimates are typically much higher due to factors that include diverse atmospheric contributions, stray light from adjacent land areas, potentially uncorrelated variability of optically active constituents, and, in optically shallow water, bottom reflection9,18,19,20. Further, coarse-resolution imagers with a nominal resolution near 1 km are limited in nearshore and narrow systems where modern high-resolution missions like Landsat-8 and Sentinel-2 offer valid observations21. Overall, the retrieval of water quality in lakes, rivers, estuaries, lagoons and nearshore coastal waters remains an active area of research where improvements are needed so that satellite observations can fulfill their potential and become part of routine monitoring programs for ecosystem states, trends, and public-health alerting systems22,23,24,25,26.

Large and globally representative in situ datasets are essential for the development and validation of bio-optical algorithms to support large-scale monitoring using satellite Earth observation technologies. Such datasets are particularly scarce and geographically fragmented from inland and coastal waters as radiometric measurements are not part of most routine sampling programs and many lakes are remote and difficult to access.

We address these shortcomings with our GLObal Reflectance community dataset for Imaging and optical sensing of Aquatic environments (GLORIA). GLORIA includes over 7000 curated hyperspectral remote sensing reflectance (Rrs, sr−1) and co-located chlorophyll a (Chla, mg m−3), total suspended solids (TSS, g m−3), absorption by colored dissolved organic matter (CDOM) at 440 nm wavelength (aCDOM(440), m−1) and Secchi depth (m) measurements. The data were contributed by researchers affiliated with 59 institutions in 20 countries who made the measurements for a range of objectives under diverse funding sources and resource levels, but shared attention to strict sampling protocols, tenacity to reach remote and inaccessible sites, commitment to establish long-term trend monitoring sites, and the recognition of the value of open-access datasets for public benefit. With its almost global coverage, geomorphic range of water bodies, and 30-year time span (Fig. 2), GLORIA represents the de-facto state of knowledge of in situ coastal and inland water bio-geo-optical diversity. Subsets of the data have already produced significant contributions to global algorithm development for the satellite-based estimation of Chla, TSS, and aCDOM(440) using data-intensive machine-learning methods27,28,29,30,31 or global semi-analytical approaches32. Where they were available, we also provide uncertainty estimates of Rrs and water quality measurements as standard deviations and means from replicate measurements. Nevertheless, some methodological detail which is currently considered relevant may not have been recorded at the time of observation, which limits our ability to retrospectively assess sources of uncertainty to subsets of the global dataset.

GLORIA builds upon the existing data repositories aimed at remote sensing studies of aquatic environments. We address poorly represented optically complex coastal and inland waters in existing open-data platforms such as the SeaWiFS Bio-optical Archive and Storage System (SeaBASS, https://seabass.gsfc.nasa.gov)33,34. In contrast to other relevant data repositories, such as the Lake Bio-optical Measurements and Matchup Data for Remote Sensing (LIMNADES, https://limnades.stir.ac.uk) database, GLORIA is open-access. By carrying out consistent quality control across the entire dataset, and providing comprehensive methodological details associated with each measurement, we have produced an analysis-ready, standalone data package for the community.

The commitment of space agencies towards maintaining and enhancing optical Earth observing systems and the burgeoning fleet of commercial platforms indicate that our coupled reflectance-water quality attribute dataset fills a strong need to facilitate algorithm and application development. We anticipate that our collection of field setups and methodologies will encourage targeted data collection for the calibration and validation of upcoming satellite sensors35,36,37, as well as the growth of in situ observatories38,39,40.

Methods

The GLORIA dataset was collated from the aquatic optics community of researchers or research groups working towards a range of goals including the routine monitoring of high-priority sites, one-off bio-optical characterization of a range of water bodies, data gathering to support algorithm development, or designated sampling for validating equivalent satellite-derived products. Efforts to gather this data started in 2018 with the second atmospheric correction intercomparison exercise (ACIX-II Aqua), an international collaboration to test processors that generate aquatic reflectance products from radiance measurements made at the top of the atmosphere19. Requests for contributions were made at pertinent conference sessions and via the research networks of individuals. These requests were for quality assured remote sensing reflectance spectra at 1 nm intervals within the 350 to 900 nm wavelength range and at least one co-located water quality attribute (Chla, TSS, aCDOM(440), or Secchi depth), and associated uncertainties. The sections below provide more details of the data and processing.

Radiometric data collection and processing

The central radiometric quantity reported in our dataset is remote sensing reflectance, Rrs (sr−1). It is defined as the ratio of the water-leaving radiance just above the water surface (Lw(0+), W m−2 sr−1 nm−1) to the above-water downwelling irradiance (Es, W m−2 nm−1) (Eq. 1, Fig. 1). We use the symbology of Ruddick et al.41 with slight modifications:

$$R_{\mathrm{rs}}(\lambda, \Theta, \phi) = \frac{L_{\mathrm{w}}(\lambda, \Theta, \phi)}{E_{\mathrm{s}}(\lambda)} \tag{1}$$

Fig. 1: Optical processes of absorption and scattering in the atmosphere and the water determine the amount and spectral nature of light received by a sensor. Remote sensing reflectance, the central radiometric quantity of the GLORIA dataset, is the ratio of the water-leaving radiance just above the water surface (Lw) over above-water downwelling irradiance (Es).

Rrs and Lw depend on the viewing nadir angle Θ (measured from the downward vertical axis) and the viewing azimuth angle ɸ (measured clockwise from the sun); λ identifies the wavelength dependence. For aquatic remote sensing applications, it is conventional to define Rrs as derived from a sensor looking straight down, that is, from Lw(λ, Θ = 0), for which ɸ is undefined42. We therefore omit λ, Θ, and ɸ for notational brevity. Several methods and instruments were used for the measurement of the downwelling and upwelling radiometric quantities reported in our dataset. Here we provide brief descriptions of the broad types of methodologies used for their measurement, and a list at the end of this section gives a formal summary.

Lw can be measured directly using a radiometer just above the water surface, looking vertically down and shielded from light reflected off the water surface43. Other common techniques include measurement of the upwelling radiance at nadir below the water surface (Lu(0-))44, or from above the water surface where the sensor is directed at a non-zero nadir angle (Lt)45. Both of these radiance measurements require conversions to Lw, which are referenced in the list at the end of this section. In brief, Lu(0-) can be derived by extrapolating upwelling radiance from measurements at practical depths below the water surface to just below the water surface46. Propagation through the water-air interface by accounting for the reduction of radiance by internal reflection off the water surface yields Lw. The estimation of Lw from Lt is more involved, as Lt contains a considerable amount of sky radiance reflected off the water surface into the sensor field of view (reflected sky radiance) in addition to Lw(Θ, ɸ), where we note the angular dependence to emphasize the need for conversion to Θ = 0. Sky radiance (Lsky) is therefore usually measured simultaneously with Lt at the same azimuth angles and at zenith angles Θz (from the upward vertical axis) near 40°42.
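The conversions described above can be sketched in a few lines of Python. This is only an illustration and not the processing applied by any data contributor: the surface reflectance factor `rho = 0.028` is a commonly quoted value for viewing angles near 40° and low wind speed, and the interface factor `0.54` is a commonly used approximation for transmitting nadir radiance through the water-air interface. Both values, and all function names, are assumptions introduced here.

```python
def lw_from_above_water(lt, lsky, rho=0.028):
    """Approximate Lw by removing reflected sky radiance from Lt.

    rho is an assumed, constant surface reflectance factor; in practice it
    varies with viewing geometry, wind speed and sky conditions.
    """
    return [t - rho * s for t, s in zip(lt, lsky)]


def lw_from_subsurface(lu_0minus, interface_factor=0.54):
    """Propagate Lu(0-) through the water-air interface to obtain Lw.

    interface_factor is an assumed approximation that accounts for internal
    reflection and refraction at the interface.
    """
    return [interface_factor * u for u in lu_0minus]


def rrs_from_lw(lw, es):
    """Eq. 1: Rrs = Lw / Es, evaluated wavelength by wavelength."""
    return [l / e for l, e in zip(lw, es)]


# Illustrative values at two wavelengths (not taken from GLORIA).
lt = [0.012, 0.010]    # total radiance above the surface
lsky = [0.10, 0.05]    # sky radiance
es = [1.0, 0.5]        # downwelling irradiance
rrs = rrs_from_lw(lw_from_above_water(lt, lsky), es)
```

The individual GLORIA contributions used the approaches cited in the list at the end of this section, some of which replace the constant `rho` with a wind-speed-dependent or spectrally resolved correction.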

Three different approaches were used to measure Es in the present dataset; a detailed review is provided by Ruddick et al.47. Most commonly, Es was measured directly using a plane irradiance sensor above the water surface directed straight upwards. The second most used method employed a downward-pointing radiance sensor measuring the radiance reflected from a horizontally held Lambertian plaque with known reflective properties. This method has the advantage that a single sensor can be used for all measurements needed for the calculation of Rrs, potentially reducing cost, equipment load and uncertainties from the intercalibration of several sensors. In some cases, Es was estimated from irradiance measurements below the water surface (just below the surface: Ed(0-), or at depth z: Ed(z)). These measurements are typical of autonomous installations on vertical sensor chains or a single sensor package on a vertically profiling platform44.
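For the plaque method, the standard relationship for an ideal Lambertian reflector is Es = π · L_plaque / R_plaque, where L_plaque is the radiance measured off the plaque and R_plaque its known reflectance. A minimal sketch follows; the function name and example values are hypothetical, and real plaques deviate somewhat from Lambertian behavior.

```python
import math


def es_from_plaque(l_plaque, plaque_reflectance):
    """Estimate Es from radiance reflected by a horizontal Lambertian plaque.

    l_plaque: radiance measured off the plaque, per wavelength.
    plaque_reflectance: calibrated plaque reflectance (0 to 1), per wavelength.
    """
    return [math.pi * l / r for l, r in zip(l_plaque, plaque_reflectance)]


# Illustrative values for a plaque with 99% reflectance at two wavelengths.
es = es_from_plaque([0.30, 0.28], [0.99, 0.99])
```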

The instruments used for the radiometric measurements for each entry of the GLORIA dataset are part of the metadata (file GLORIA_meta_and_lab.csv) and are provided in the list at the end of this section. These include those customarily used for validation of satellite-derived aquatic reflectance, such as RAMSES (TriOS, Germany), HyperOCR (manufactured by Sea-Bird Scientific, USA; previously manufactured by Satlantic Inc., Canada) and C-OPS (Biospherical Instruments Inc., USA). The RAMSES and HyperOCR have 256 channel silicon photodiode array detectors with a 10 nm spectral resolution and a spectral sampling of 3.3 nm per pixel. The typical setup for RAMSES instruments for our dataset is an above-surface installation with a vertical Es sensor and Lsky and Lt sensors at 40–42° zenith and nadir angles, respectively (Fig. 1). HyperOCR instruments are typically installed on a floating frame to measure Es, and Lu or Lw at zero nadir angle while the HyperPRO (and HyperPro II) are free-falling setups of the HyperOCR designed to measure vertical profiles in the water column. The C-OPS configuration is similar to the HyperOCR, but the instrument only has 19 spectral bands of 10 nm width. The HyperSAS is a three-sensor setup of the HyperOCR for above-surface installation on structures overlooking the water or ships, similar to the RAMSES setup. The Water Insight WISP-3 is a self-contained handheld unit with optical inputs for Es, Lsky and Lt leading to separate spectrometers48.

A number of the instruments used are handheld or portable units with a single optical input, which must be pointed at each target in turn to obtain the different radiometric measurements (ASD FieldSpec range, Satlantic HyperGun, Spectra Vista, Spectral Evolution, Spectron Engineering and Photo Research SpectraScan devices).

Some investigators integrated compact spectrometers (manufactured by Ocean Insight, Inc., formerly known as Ocean Optics, Inc., USA) with data loggers and optical fibers on frames or poles that can be pointed away from observation platforms. Measurements would either be accomplished through several instruments and optical fibers oriented for the respective radiometric quantities, or a single sequentially reoriented fiber.

Data contributors provided radiometric measurements interpolated to 1 nm intervals over the 350 to 900 nm wavelength range. The instrument-specific bandwidths of the original measurements are provided in the data table (file GLORIA_meta_and_lab.csv, column ‘Spectral_resolution_nm’). Due to instrument and processing constraints, some spectra span the range from 400 to 750 nm, or nearby bounds. The radiometric data for each GLORIA entry may be from a single measurement, or the mean or median of several measurements over a time interval. When available, the data contributors provided the spectral Rrs means, standard deviations, and numbers of measurements for sampling events. Quality control was conducted on all received spectra (see section Technical validation).
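As an illustration of resampling a spectrum to the 1 nm grid, the sketch below applies linear interpolation to measurements at a native sampling of about 3.3 nm. The interpolation scheme used by each contributor is not specified here, so linear interpolation is an assumption, as are the function name and the choice to return `None` outside the measured range instead of extrapolating.

```python
from bisect import bisect_right


def interpolate_to_1nm(wavelengths, values, start=350, stop=900):
    """Linearly interpolate a spectrum onto an integer 1 nm grid.

    wavelengths must be sorted in ascending order. Grid points outside the
    measured range are returned as None (no extrapolation).
    """
    grid = list(range(start, stop + 1))
    out = []
    for w in grid:
        if w < wavelengths[0] or w > wavelengths[-1]:
            out.append(None)
            continue
        i = bisect_right(wavelengths, w) - 1
        if i == len(wavelengths) - 1:
            # w coincides with the last measured wavelength.
            out.append(values[-1])
            continue
        w0, w1 = wavelengths[i], wavelengths[i + 1]
        frac = (w - w0) / (w1 - w0)
        out.append(values[i] + frac * (values[i + 1] - values[i]))
    return grid, out


# Illustrative spectrum sampled every 3.3 nm.
grid, rrs_1nm = interpolate_to_1nm([400.0, 403.3, 406.6],
                                   [0.0100, 0.0133, 0.0166],
                                   start=399, stop=407)
```

Leaving gaps as missing values mirrors the situation described above, where some spectra only span roughly 400 to 750 nm.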

The measurement setups and instruments used for radiometric measurements are listed below. The number of the method corresponds to the column ‘Measurement_method’ in GLORIA_meta_and_lab.csv. References to published descriptions of the approach and applications are provided where available.

  1.

    Sequential Lt, Lsky, and Es via a plaque on MP (moving platform)

    Instruments: ASD FieldSpec, Photo Research PR-650 SpectraScan Colorimeter, Sea-Bird Scientific/Satlantic HyperGun, Spectra Vista GER1500, Spectral Evolution SR-3500/PSR-1100f, Spectron Engineering SE-590, TriOS RAMSES

    Approach: Mobley45

    Applications: Bresciani et al.49; Kudela et al.50; Zolfaghari et al.51

  2.

    Lt, Lsky, and Es on MP

    Instruments: Water Insight WISP-3

    Approach: Mobley45

    Applications: Hommersom et al.48

  3.

    Lu(0-) and Es on pole connected to a spectrometer via fiber optics from MP or water edge

    Instruments: Ocean Insight/Ocean Optics USB2000/USB2000+/USB4000

    Approach: Chipman et al.52

    Applications: Gurlin et al.53; Schalles and Hladik54; Li et al.55; Mishra et al.56; Brezonik et al.57; Werther et al.58

  4.

    Lw(0+) skylight blocked and Es afloat away from MP

    Instruments: Sea-Bird Scientific/Satlantic HyperOCR

    Approach: Lee et al.55

    Applications: Wei et al.59

  5.

    Lu(0-) afloat away from MP, Es on MP

    Instruments: Sea-Bird Scientific/Satlantic HyperOCR, TriOS RAMSES

  6.

    Lt, Lsky, and Es on MP

    Instruments: Sea-Bird Scientific/Satlantic HyperSAS, TriOS RAMSES

    Approach: Mobley45; Simis and Olsson60

    Applications: Qin et al.61; Warren et al.62

  7.

    Lt, Lsky, and Es on a frame deployed on MP

    Instruments: TriOS RAMSES

    Approach: Mobley45; Mobley63

    Applications: Maciel et al.64; Cairo et al.65; da Silva et al.66

  8.

    Lu(0-) and Ed(0-) in-water profiling from MP, Es on MP

    Instruments: Biospherical C-OPS, Sea-Bird Scientific/Satlantic HyperOCR, TriOS RAMSES

    Approach: Mueller et al.44; Lubac and Loisel67

    Applications: Binding et al.68

  9.

    Lu(0-) and Ed(z) units on a depth adjustable bar (measurements at −0.21 and −0.67 m) on a frame afloat away from MP, Es unit lifted above water surface for Es

    Instruments: TriOS RAMSES

    Approach: Fritz et al.69

  10.

    Lu(0-) and Ed(0-) from winch on MP, Es on MP

    Instruments: TriOS RAMSES

    Approach: Zibordi and Talone70

  11.

    Lt and Es on pole from water edge

    Instruments: TriOS RAMSES

    Approach: Kutser et al.71

  12.

    Lu(0-) and Ed(0-) autonomous in-water profiling from a fixed platform

    Instruments: Sea-Bird Scientific/Satlantic HyperOCR

    Approach: Mueller et al.44

    Applications: Minaudo et al.72

  13.

    Sequential Lt and Es via a plaque, mounted on gimbal stabilized pole from MP

    Instruments: Ocean Insight/Ocean Optics STS-VIS

  14.

    Lu(0-) (and Ed(0-) only for depth information) from in-water profiling from MP, Es recorded simultaneously from same MP very close to profiler deployment

    Instruments: TriOS RAMSES

    Approach: Mueller et al.44; Stramski et al.73

    Applications: Bracher et al.74; Tilstone et al.75

  15.

    Lt, Lsky, Es, combined with one Lu unit (aperture at −0.05 to −0.10 m) placed on a pole

    Instruments: TriOS RAMSES

  16.

    Sequential Lu(0-) and Es via a plaque, both measurements using an optical fiber to a black masked perspex tube

    Instruments: Spectron Engineering SE-590

    Approach: Dekker76

  17.

    Lu(0-) and Ed(z) units on a floating frame (measurements at −0.4 m (Lu) and −0.1 m (Ed)) drifting 10 m away from vessel

    Instruments: TriOS RAMSES

    Approach: Fritz et al.69

SeaBASS data

GLORIA includes approximately 1100 entries from SeaBASS33. We searched SeaBASS for reflectance spectra with concomitant water quality measurements and ensured that these are from inland and coastal waters only by mapping sampling locations of all records from water depths less than 200 m. Where water depth was not part of the SeaBASS record, we assigned it based on the General Bathymetric Chart of the Ocean (GEBCO_2021 Grid sub-ice topo/bathy)77. Several metadata fields were unavailable for this data, but SeaBASS dataset identifiers are provided to allow further research if needed. All SeaBASS data were included in our quality control process. While SeaBASS allows the upload of uncertainty data for radiometry and water quality, the entries we located for inland and coastal waters did not contain this information.

Water sample analysis

Water quality attributes Chla, TSS and aCDOM(440) were determined using well established high-accuracy laboratory methods. The method for each analysis is identified in the columns ‘Chl_method’, ‘TSS_method’, and ‘aCDOM_method’ in the file GLORIA_meta_and_lab.csv and method details are provided in GLORIA_variables_and_methods.xlsx. Where available, data means and standard deviations from replicate analyses of Chla, TSS and aCDOM(440) are provided in separate files.

The most frequently used methods for Chla were solvent-based pigment extraction from filter pads followed by fluorometric (U.S. EPA 445.0) or spectrophotometric (U.S. EPA 446.0) analysis. In the majority of samples, pigments were extracted in 90% acetone with the aid of mechanical tissue grinding. Modifications of these methods included the use of 90% acetone buffered with MgCO3 and different approaches to support the mechanical breakdown of the algal cells. Other methods for Chla followed national and international standards (DIN 38412-16:1985-12, NEN 6520, HJ 897–2017, SL88-2012 and ISO 10260:1992). Methods that included a correction for phaeophytin, a degradation product of Chla78, are indicated by a flag (‘1’) in the data table (column ‘Phaeophytin_correction’), and the corresponding Chla value is found in column ‘Chla’. Where phaeophytin was not corrected for, the flag is ‘0’ and the value is provided in column ‘Chl_plus_phaeo’, unless the correction for phaeophytin was not applicable, as for certain fluorometric instrument setups79. Many investigators also used high-pressure liquid chromatography (HPLC) for Chla determination, in which case the Chla value is found in column ‘Chla’. The only exceptions to lab-determined Chla are measurements from the Thetis profiler in Lake Geneva (Switzerland), where Chla associated with Rrs measurements was estimated from absorption line height at 676 nm80 and the linear relationship between the night-time fluorometric Chla (measured by a WetLabs ECO Triplet BBFL2W) and absorption line height (average coefficient of determination: R2 = 0.92)72.
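Because the chlorophyll value may sit in either of two columns, users may want a small helper that picks whichever is populated and records which one it was. The sketch below assumes that each row has been read as a dictionary of strings and that missing values are empty strings; both are assumptions about the file encoding, and the function name is hypothetical.

```python
def select_chla(row):
    """Return (value, source_column) for chlorophyll from one metadata row.

    Prefers 'Chla' and falls back to 'Chl_plus_phaeo'. Returns (None, None)
    if neither column holds a value.
    """
    def to_float(text):
        return float(text) if text not in ("", None) else None

    chla = to_float(row.get("Chla"))
    if chla is not None:
        return chla, "Chla"
    combined = to_float(row.get("Chl_plus_phaeo"))
    if combined is not None:
        return combined, "Chl_plus_phaeo"
    return None, None


# Illustrative rows (values are not taken from GLORIA).
corrected = {"Chla": "12.5", "Chl_plus_phaeo": "", "Phaeophytin_correction": "1"}
uncorrected = {"Chla": "", "Chl_plus_phaeo": "8.0", "Phaeophytin_correction": "0"}
```

Keeping the source column alongside the value lets an analysis treat phaeophytin-corrected and uncorrected measurements separately if that distinction matters.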

TSS concentration was measured gravimetrically by weighing the dried residue of a water sample filtered on a pre-combusted and pre-weighed filter pad. aCDOM(440) was generally quantified following Mitchell et al.81. In this approach, the optical density of water samples, typically filtered through 0.2 μm pore size polycarbonate membranes to remove particulates, was measured in a spectrophotometer and converted to absorption. Secchi depth was determined as the depth at which a disk, typically black and white and 20 or 30 cm in diameter, is no longer visible to an observer when it is lowered into the water82,83.
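The conversion from optical density to absorption follows the standard relationship a(λ) = ln(10) · OD(λ) / l, where l is the cuvette path length in metres. The sketch below shows only this step; blank subtraction and any baseline correction are omitted, and the function name and example values are hypothetical.

```python
import math


def absorption_from_optical_density(optical_density, path_length_m):
    """Convert base-10 optical density to an absorption coefficient (m^-1)."""
    return math.log(10) * optical_density / path_length_m


# Illustrative example: OD of 0.05 at 440 nm in a 10 cm (0.1 m) cuvette.
acdom_440 = absorption_from_optical_density(0.05, 0.1)
```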

Ancillary and metadata

Each data entry is associated with fields identifying the data contributor, cross-references to other databases, and details describing the sampling site and environmental conditions. Several categorical variables allow cursory stratification of the dataset according to water body type (lake, estuary, coastal ocean, river or other), data collection purpose (e.g., routine surface water monitoring or event-driven sampling), dominant biogeochemical water type (e.g., sediment-dominated or algal-dominated), and optical stability (e.g., low for shallow lakes, rivers and estuaries or high for deep lakes and some coastal ocean environments).

Specific characteristics of the sampling event such as geocoordinates, date and time stamps, environmental conditions (e.g., cloud cover, wind speed and wave height), and environmental settings (e.g., elevation above sea level, dominant land cover and slope) are provided where known. Several metadata fields provide cross references to details of instrumentation, measurement and processing methods for all radiometric and water quality data.

Data Records

The GLORIA dataset is hosted at the PANGAEA Data Publisher for Earth & Environmental Science84. The data is contained in several comma-separated value (csv) files and a Microsoft Excel file provides keys to column names and method details (Table 1). Individual data points are identified across all files using the GLORIA_ID.

Table 1 Files of the GLORIA dataset and their content.

The 7,572 GLORIA Rrs spectra originate from 31 countries over an almost global geographical range from 67°N to 54°S and from 122°W to 178°E (Fig. 2), with the majority of samples from lakes (60%), followed by coastal waters (32%), estuaries (4%), and the remainder from rivers and other water body types. The wide range of radiometric and water quality measurements in GLORIA (Fig. 2) is consistent with the global diversity of Rrs spectral shapes with respect to optical water types85,86 and visual color ranges87,88 (Fig. 3). The range of water quality attributes is comprehensive and their frequency distributions are shown in Fig. 2.

Fig. 2: Summary of the geographical, temporal and water quality distributions of the GLORIA samples. (a) Dots mark the location of each sample and the histograms on the edges of the map show the longitudinal and latitudinal distributions of the dataset. (b) The earliest samples were collected in 1990 and the sampling effort has been steady since 2001. (c–f) Histograms of log-transformed water quality attributes illustrate the extreme range of values and their typical log-normal distributions.

Fig. 3: Summary of the diversity of GLORIA’s Rrs spectra. (a) Thirteen Rrs spectra chosen at random, one from each optical water type displayed in b. (b) Bar chart of the number of GLORIA spectra assigned to each optical water type from Spyrakos et al.85. (c) Chromaticity diagram98 showing the visual color derived from each GLORIA Rrs spectrum using the tristimulus weighting functions according to the Commission Internationale de l’Éclairage (CIE)99; WP: white point.

Technical Validation

All data submitted for inclusion into this compilation had undergone quality control by the providers. Our curation process included detailed information recovery with them to ensure sampling, sample processing, and laboratory analysis methods are fit for purpose. Further checks on the gathered data were carried out as described below.

Reflectance spectra

Reflectance spectra were checked for outliers and unrealistic spectral shapes using a series of quality control indicators (Table 2). By flagging, but keeping, spectra with moderate or suspected quality issues, we were able to retain a larger dataset and we advise the user to inspect the flags to evaluate the dataset for their purposes. The quality control methods are described below. Data entries with quality issues are identified by setting the corresponding quality flag to one (1) in the file GLORIA_qc_flags.csv.

Table 2 Quality control tests and associated flag names in table GLORIA_qc_flags.csv.

The first round of quality control was a procedural detection of high-frequency variability (suspected noise), baseline shifts (e.g., from suboptimal glint removal), the presence of an oxygen absorption feature near 762 nm (e.g., from sensor intercalibration issues), and negative slopes in the ultraviolet to blue part of the spectrum (e.g., from suboptimal diffuse sky radiance correction). These are the first five flags in Table 2.

Additionally, we calculated the Quality Water Index Polynomial (QWIP) score89. This approach was developed to identify hyperspectral aquatic reflectance data that fall outside general trends observed in a large dataset from optically deep waters. Briefly, the QWIP is a fourth-order polynomial that describes a well-formed central tendency for a spectrally integrated metric (Apparent Visible Wavelength90, AVW) to predict a Normalized Difference Index (NDI; λ = 492, 665 nm) across a continuum of water types. For a given spectrum, the difference between the calculated NDI and that predicted from the AVW is known as the QWIP score. If the absolute value of a QWIP score exceeded the prescribed deviation from the polynomial relationship, in this case 0.2, the spectrum was identified by the flag ‘QWIP_fail’ in the file GLORIA_qc_flags.csv (Table 2). AVW and the QWIP score are provided in the file GLORIA_qc_ancillary.csv (Table 3).
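The structure of the QWIP test can be sketched as follows. The AVW is taken here as the Rrs-weighted harmonic mean of wavelength over 400 to 700 nm and the NDI as (Rrs(665) − Rrs(492)) / (Rrs(665) + Rrs(492)), which is our reading of the cited definitions. The polynomial coefficients are deliberately not reproduced and must be supplied from the QWIP reference; the zero coefficients in the example are placeholders, and the sign convention of the score and all function names are assumptions.

```python
def apparent_visible_wavelength(wavelengths, rrs, lo=400, hi=700):
    """Rrs-weighted harmonic mean of wavelength over the visible range."""
    num = 0.0
    den = 0.0
    for w, r in zip(wavelengths, rrs):
        if lo <= w <= hi:
            num += r
            den += r / w
    return num / den


def normalized_difference_index(wavelengths, rrs, blue=492, red=665):
    """NDI from Rrs at two wavelengths that must be present on the grid."""
    lookup = dict(zip(wavelengths, rrs))
    return (lookup[red] - lookup[blue]) / (lookup[red] + lookup[blue])


def qwip_score(wavelengths, rrs, coeffs):
    """Calculated NDI minus the NDI predicted from AVW by a polynomial.

    coeffs: polynomial coefficients, highest order first (user supplied).
    """
    avw = apparent_visible_wavelength(wavelengths, rrs)
    predicted = 0.0
    for c in coeffs:  # Horner's scheme
        predicted = predicted * avw + c
    return normalized_difference_index(wavelengths, rrs) - predicted


def qwip_fail(score, threshold=0.2):
    """True if the score deviates from the polynomial by more than threshold."""
    return abs(score) > threshold


# Illustrative flat spectrum on a 1 nm grid; coefficients are placeholders.
wl = list(range(400, 701))
flat = [0.01] * len(wl)
score = qwip_score(wl, flat, coeffs=[0.0, 0.0, 0.0, 0.0, 0.0])
```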

Table 3 Ancillary information for quality control flags in table GLORIA_qc_ancillary.csv.

On visual inspection, some spectra that passed the above criteria still appeared to have subtle problems. Further issues may be caused by instrument drift, instrument shading, stray light contamination, or errors during sky glint correction, and are often exacerbated by environmental conditions59. Such suspicious spectra can be recognized by experienced practitioners familiar with how inherent optical properties of surface waters vary naturally and determine reflectance through radiative transfer processes (Fig. 1)91. Utilizing this knowledge within the co-author community, we conducted systematic expert elicitation by randomly dividing the Rrs spectra into batches of 400 to 700 and assigning each batch to an expert for identifying suspicious looking data. The spectra that were flagged ‘Suspect’ were then evaluated by three more experts for the purpose of improving consistency across the batches from different individuals. The resulting set of suspect spectra are identified by the flag ‘Suspect’ in the file GLORIA_qc_flags.csv (Table 2).

Uncertainty in Rrs from above-surface measurements by means of reconstruction with a coupled water-atmospheric radiance model

Determining the uncertainty inherent in Rrs observations is challenging because of the variable nature of illumination and water surface conditions during repeat observations. This is especially true for measurements of upwelling light made above the water surface where sun glint and reflected sky radiance contribute to Lt, which applies to about 42% of the samples in GLORIA. To a large extent, spurious observations resulting from such random effects were already removed at source, such that the remaining variability is the result of various quality screening procedures and expert interpretation. However, it is possible to use models of atmospheric irradiance and bio-optical properties to model the most likely contribution of sun glint and reflected sky radiance on the Rrs observation, and thereby test the reported Rrs for physical consistency. To this end, we used the 3C algorithm92 to reconstruct Rrs from records where Lt, Lsky and Es were available.

3C provides a reconstruction of Rrs using nonlinear optimization of atmospheric and water optical models, allowing for a range of optical properties to solve the relationship between the upwelling radiance and downwelling irradiance provided as input. Due to the flexibility of the surface corrections, 3C is proposed to enable robust Rrs to be obtained across a wide range of measurement conditions. The resultant 3C-Rrs is expected to have reduced propagation of error from the variable spectral shape of sky reflectance and glint. This provides an advantage over methods which consider these corrections either constant, or a function of wind speed60, which is the case for the majority of Rrs from above-surface measurements reported in the GLORIA database (column ‘Skyglint_removal’ in GLORIA_meta_and_lab.csv). The difference between 3C-Rrs and the originally reported Rrs is, therefore, an approximate measure of algorithmic uncertainty. A close match between the 3C reconstruction and the originally reported Rrs provides confidence that the reported observation was physically consistent. Larger discrepancies are assumed to be associated with challenging observation conditions, resulting in suspect Lsky, Lt or Es, but can also be caused by water or atmospheric properties which the model cannot reconstruct.

For this analysis, we used the 1589 spectra which included Lt, Lsky, Es, observation time, and geographic location, and for which the method to calculate Rrs was not already based on 3C. This analysis is also independent from the quality flagging in the previous section, so that all observations were included and the results present a worst-case scenario which best represents the algorithmic uncertainty inherent to calculating Rrs, albeit without knowledge of quality control criteria applied prior to the data being reported. The 3C water optical model was configured with wide bounds for the concentration of Chla (initial condition 5 mg m−3, range 0.01–1000 mg m−3) and TSS (initial condition 10 g m−3, range 0–1000 g m−3) whilst otherwise configured as detailed in Groetsch et al.92 and Jordan et al.93.

The median bias between reported and 3C-Rrs was on the order of 0.0005 sr−1, with 3C yielding lower Rrs, as should be expected because incomplete correction relying on a static correction factor for surface reflections leads to higher Rrs (Fig. 4A). Bias gradually decreased with wavelength, which suggests the reported data have been suboptimally corrected for diffuse sky radiance. There is considerable spread in the model-observation bias, with an interquartile range of approximately 0.00004 to 0.0016 sr−1 for Rrs(560).

Fig. 4: Spectral bias of reported Rrs compared with 3C-modeled Rrs from 1589 spectra for which Lt, Lsky and Es were available. (A) Median and interquartile (reported - modeled). (B) Relative bias in Rrs (reported - modeled)/modeled. Discontinuities in the bias spectrum are caused by sensors having different wavelength ranges within parts of the dataset.

In relative terms (Fig. 4B), median bias in Rrs between observed and 3C-Rrs is smallest in the green spectral range (order of 6.4%), where peak Rrs amplitude is typically observed in this dataset, and largest in the UV and NIR regions of the spectrum where Rrs is typically lower. The spread (interquartile range) in the relative bias in Rrs(560) is 5–16%, but much wider in the UV and NIR range, exceeding −30% and 170%.
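The bias statistics reported here can be reproduced in outline for a single wavelength as follows. This sketch assumes paired lists of reported and modeled Rrs and uses the "inclusive" quartile definition from the Python standard library; the quantile convention used for Fig. 4 is not stated, so that choice is an assumption, as is the function name.

```python
from statistics import median, quantiles


def bias_summary(reported, modeled):
    """Median and quartiles of absolute and relative bias at one wavelength.

    Absolute bias is reported - modeled; relative bias divides by modeled.
    """
    absolute = [r - m for r, m in zip(reported, modeled)]
    relative = [(r - m) / m for r, m in zip(reported, modeled)]

    def summarize(x):
        q1, _, q3 = quantiles(x, n=4, method="inclusive")
        return {"median": median(x), "q1": q1, "q3": q3}

    return summarize(absolute), summarize(relative)


# Illustrative Rrs(560) values in sr^-1 (not taken from GLORIA).
reported = [0.011, 0.012, 0.013, 0.014, 0.015]
modeled = [0.010, 0.010, 0.010, 0.010, 0.010]
abs_stats, rel_stats = bias_summary(reported, modeled)
```

Applying the function wavelength by wavelength yields spectra of median and interquartile bias of the kind summarized in Fig. 4.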

The largest differences in Rrs bias between reported and 3C spectra were found between contributed datasets, rather than between observation methods. The majority of datasets showed absolute relative differences in Rrs(400–800) in the 0–10% range, but there are also cases where the difference exceeds 100%.

This analysis points to an overall high degree of uncertainty in the methods using above-water Lt measurements and the need for rigorous quality control by observers. For future work, we suggest adding Rrs model reconstruction as part of the data collection effort, which allows inspection of glint terms to objectively flag observations as suspect, before other quality controls are implemented. Furthermore, to support future algorithmic improvements (e.g., to elaborate bidirectional reflectance distribution functions), all component spectra and observation geometries should be included in datasets and these should be reported at the native resolution of each sensor involved to avoid convolution error when calculating Rrs94.

Water quality

The water quality measurements were investigated using frequency distributions to identify outliers. Separate frequency distributions were created by ‘Water_type’, a subjective classification assigned by the data contributors according to the dominant optical constituent for each water body (TSS-dominated, Chla-dominated, CDOM-dominated, Chla + CDOM-dominated, moderately turbid coastal, clear). Any measurements above three standard deviations from the water-type specific mean were reevaluated to ensure they were of high confidence.
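A screening step of this kind could look like the sketch below, which groups values by water type and returns the identifiers of measurements lying more than three sample standard deviations above the group mean. Whether the original screening used raw or log-transformed values is not stated, so the use of raw values is an assumption, and the function name and identifiers are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def flag_high_values(records, n_sd=3.0):
    """Return IDs of values more than n_sd standard deviations above the
    mean of their water type.

    records: iterable of (gloria_id, water_type, value) tuples.
    """
    groups = defaultdict(list)
    for gid, wtype, value in records:
        groups[wtype].append((gid, value))

    flagged = []
    for items in groups.values():
        values = [v for _, v in items]
        if len(values) < 2:
            continue  # standard deviation is undefined for a single value
        threshold = mean(values) + n_sd * stdev(values)
        flagged.extend(gid for gid, v in items if v > threshold)
    return flagged


# Illustrative data: 19 similar values and one extreme value in one group.
records = [("ID_%d" % i, "clear", 1.0) for i in range(1, 20)]
records.append(("ID_20", "clear", 50.0))
records += [("ID_21", "TSS-dominated", 10.0), ("ID_22", "TSS-dominated", 12.0)]
to_review = flag_high_values(records)
```

As in the procedure described above, flagged entries are candidates for re-evaluation and are not automatically removed.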

Usage Notes

References to method details

The methods used for radiometric measurements and laboratory analyses are identified in the columns ‘Measurement_method’, ‘Chl_method’, ‘TSS_method’, and ‘aCDOM_method’ in the file GLORIA_meta_and_lab.csv. Associated details with references are provided in separate sheets in the file GLORIA_variables_and_methods.xlsx. Looking up the method for a particular measurement requires the ‘Dataset_ID’ and the method name.

Quality flags

Each Rrs measurement is associated with quality flags (file GLORIA_qc_flags.csv). The quality flags are binary and indicate the presence (‘1’) or absence (‘0’) of the quality issue described in Table 2. Missing values indicate that the flag could not be determined because the spectrum did not include the required wavelength range. Some numerical values generated during the quality control are provided in the file GLORIA_qc_ancillary.csv (Table 3).
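A minimal sketch of filtering on these flags is given below, using only the Python standard library. It assumes that GLORIA_qc_flags.csv has a ‘GLORIA_ID’ column and one column per flag, and that missing values are stored as empty fields; the identifiers in the example and the function name are hypothetical.

```python
import csv
import io


def select_unflagged(rows, flags, allow_missing=True):
    """Return GLORIA_IDs for which none of the selected flags is set to 1.

    rows: iterable of dicts, e.g. from csv.DictReader.
    allow_missing: keep entries whose flag could not be determined.
    """
    keep = []
    for row in rows:
        ok = True
        for name in flags:
            value = row[name].strip()
            if value == "":
                ok = ok and allow_missing
            elif float(value) == 1:
                ok = False
        if ok:
            keep.append(row["GLORIA_ID"])
    return keep


# Illustrative file content; real IDs and the full flag list differ.
text = """GLORIA_ID,QWIP_fail,Suspect
GID_1,0,0
GID_2,1,0
GID_3,,0
GID_4,0,1
"""
rows = list(csv.DictReader(io.StringIO(text)))
ids = select_unflagged(rows, flags=("QWIP_fail", "Suspect"))
```

Setting `allow_missing=False` gives a stricter selection that drops spectra for which a flag could not be evaluated.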

Cross-references to other datasets

Some of the data in GLORIA is part of other data publications, or is also included in the community repositories SeaBASS33 and/or LIMNADES. The columns ‘SeaBASS_ID’, ‘LIMNADES_ID’, and ‘LIMNADES_UID’ in the data table (GLORIA_meta_and_lab.csv) provide identifiers used in the respective datasets to facilitate cross referencing entries, for example for the avoidance of duplicates. Other references to prior publication of the data are provided in the ‘Comments’ column in GLORIA_meta_and_lab.csv in the form of a digital object identifier (DOI).