ALBERT

All Library Books, journals and Electronic Records Telegrafenberg

Filter
  • machine learning
  • 2020-2022  (17)
  • 1
    Publication Date: 2021-10-27
    Description: Wheat production plays an important role in Morocco. Current wheat forecast systems use weather and vegetation data during the crop growing phase, thus limiting the earliest possible release date to early spring. However, Morocco's wheat production is mostly rainfed and thus strongly tied to fluctuations in rainfall, which in turn depend on slowly evolving climate dynamics. This offers a source of predictability at longer time scales. Using physically guided causal discovery algorithms, we extract climate precursors for wheat yield variability from gridded fields of geopotential height and sea surface temperatures, which show potential for accurate yield forecasts as early as December, with around 50% explained variance in an out-of-sample cross validation. The detected interactions are physically meaningful and consistent with documented ocean-atmosphere feedbacks. Reliable yield forecasts at such long lead times could provide farmers and policy makers with the information needed for early action and strategic adaptation measures to support food security.
    Keywords: 551.6 ; causal discovery algorithms ; teleconnections ; seasonal forecast ; machine learning ; wheat forecast ; climate precursors
    Language: English
    Type: map
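The out-of-sample cross validation behind the "around 50% explained variance" figure can be illustrated with a minimal sketch. It assumes a single hypothetical precursor index, ordinary least squares, and leave-one-out splitting, and it scores explained variance as 1 - SSE/SST on the held-out predictions. It is not the authors' causal-discovery pipeline; the function names and data are illustrative only.

```python
from statistics import mean


def fit_ols(x, y):
    """Ordinary least squares fit y ~ a + b*x for one predictor."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def loo_explained_variance(x, y):
    """Leave-one-out cross-validated explained variance, 1 - SSE/SST.

    Each year is predicted by a model fitted on all other years, so the
    score reflects out-of-sample skill rather than in-sample fit.
    """
    preds = []
    for i in range(len(x)):
        a, b = fit_ols(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(a + b * x[i])
    my = mean(y)
    sse = sum((yi - pi) ** 2 for yi, pi in zip(y, preds))
    sst = sum((yi - my) ** 2 for yi in y)
    return 1.0 - sse / sst


# Hypothetical December precursor index and yield anomalies (illustrative only).
precursor = [0.2, -1.1, 0.8, 1.5, -0.4, -0.9, 0.3, 1.2]
yield_anom = [0.5, -1.0, 0.4, 1.8, -0.9, -0.3, 0.6, 0.9]
print(round(loo_explained_variance(precursor, yield_anom), 2))
```

Holding out each year in turn keeps the evaluated year out of its own fit, which is what distinguishes an out-of-sample score from an in-sample R².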
  • 2
    Publication Date: 2021-10-07
    Description: The quantification of factors leading to harmfully high levels of particulate matter (PM) remains challenging. This study presents a novel approach using a statistical model that is trained to predict hourly concentrations of particles smaller than 10 μm (PM10) by combining satellite-borne aerosol optical depth (AOD) with meteorological and land-use parameters. The model is shown to accurately predict PM10 (overall R² = 0.77, RMSE = 7.44 μg/m³) for measurement sites in Germany. The capability of satellite observations to map and monitor surface air pollution is assessed by investigating the relationship between AOD and PM10 in the same modeling setup. Sensitivity analyses show that important drivers of modeled PM10 include multiday mean wind flow, boundary layer height (BLH), day of year (DOY), and temperature. Different mechanisms associated with elevated PM10 concentrations are identified in winter and summer. In winter, mean predictions of PM10 concentrations >35 μg/m³ occur when BLH is below ∼500 m. When this is paired with multiday easterly wind flow, mean model predictions of PM10 exceed 40 μg/m³. In summer, PM10 concentrations seem to be driven less by meteorology than by emission or chemical particle formation processes, which are not included in the model. The relationship between AOD and predicted PM10 concentrations depends to a large extent on ambient meteorological conditions. Results suggest that AOD can be used to assess air quality at ground level in a machine learning approach linking it with meteorological conditions.
    Keywords: 551.5 ; aerosol optical depth ; air quality ; PM10 ; machine learning ; drivers of air pollution ; MAIAC
    Language: English
    Type: map
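The two skill scores quoted in the record (R² and RMSE) can be computed as in the minimal sketch below. The abstract does not say whether R² is the coefficient of determination or a squared correlation; the sketch assumes the former. The observation and prediction values are hypothetical.

```python
import math


def rmse(obs, pred):
    """Root-mean-square error, in the units of the data (e.g. ug/m3)."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r_squared(obs, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    m = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - m) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


# Hypothetical hourly PM10 observations and model predictions in ug/m3.
observed = [20.0, 30.0, 40.0]
predicted = [22.0, 28.0, 41.0]
print(rmse(observed, predicted), r_squared(observed, predicted))
```

RMSE keeps the units of the concentrations, which is why the record reports it in μg/m³, while R² is dimensionless.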
  • 3
    Publication Date: 2021-10-06
    Description: Access to credible estimates of water use is critical for making optimal operational decisions and investment plans to ensure reliable and affordable provisioning of water. Furthermore, identifying the key predictors of water use is important for regulators to promote sustainable development policies to reduce water use. In this paper, we propose a data-driven framework, grounded in statistical learning theory, to develop a rigorously evaluated predictive model of state-level, per capita water use in the United States as a function of various geographic, climatic, and socioeconomic variables. Specifically, we compare the accuracy of various statistical methods in predicting the state-level, per capita water use and find that the model based on the random forest algorithm outperforms all other models. We then leverage the random forest model to identify key factors associated with high water-usage intensity among different sectors in the United States. More specifically, irrigated farming, thermoelectric energy generation, and urbanization were identified as the most water-intensive anthropogenic activities, on a per capita basis. Among the climate factors, precipitation was found to be a key predictor of per capita water use, with drier conditions associated with higher water usage. Overall, our study highlights the utility of leveraging data-driven modeling to gain valuable insights related to the water use patterns across expansive geographical areas.
    Keywords: 333.91 ; machine learning ; sustainable water-use ; water analytics ; water consumption
    Language: English
    Type: map
  • 4
    Publication Date: 2021-09-29
    Description: To understand and predict large, complex, and chaotic systems, Earth scientists build simulators from physical laws. Simulators generalize better to new scenarios, require fewer tunable parameters, and are more interpretable than nonphysical deep learning, but procedures for obtaining their derivatives with respect to their inputs are often unavailable. These missing derivatives limit the application of many important tools for forecasting, model tuning, sensitivity analysis, or subgrid‐scale parametrization. Here, we propose to overcome this limitation with deep emulator networks that learn to calculate the missing derivatives. By training directly on simulation data without analyzing source code or equations, this approach supports simulators in any programming language on any hardware without specialized routines for each case. To demonstrate the effectiveness of our approach, we train emulators on complete or partial system states of the chaotic Lorenz‐96 simulator and evaluate the accuracy of their dynamics and derivatives as a function of integration time and training data set size. We further demonstrate that emulator‐derived derivatives enable accurate 4D‐Var data assimilation and closed‐loop training of parametrizations. These results provide a basis for further combining the parsimony and generality of physical models with the power and flexibility of machine learning.
    Description: Plain Language Summary: Many Earth science simulators are implemented as monolithic programs that calculate changes in the state of a system over time. In many cases, using or improving these simulators also requires the derivatives of their outputs with respect to inputs, which describe how future states depend on past states. These derivatives can be difficult or costly to compute. Several recent studies have applied deep learning (DL) to simulation data to construct emulators of their dynamics. Here, we use the fact that DL models can be easily and automatically differentiated to obtain approximate derivatives of the original simulator and test this idea on a simple and common chaotic model of the atmosphere. We verify in several experiments that the emulator derivatives, which require neither additional training nor extensive postprocessing to obtain, can indeed be used as a valid substitute for the derivatives of the simulator.
    Description: Key Points: Deep learning models trained on simulation data can learn the dynamics of Earth science simulators. Deep learning models also learn the input–output derivatives of the state‐update function, which are unavailable for many simulators. We show on Lorenz‐96 that these learned derivatives can be used directly for data assimilation and parametrization tuning.
    Keywords: 550 ; machine learning ; deep learning ; data assimilation ; parametrization tuning ; model Jacobians ; Lorenz‐96
    Type: map
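To make concrete which derivatives the emulators are meant to supply, the sketch below writes out the standard Lorenz-96 tendency, dx_k/dt = (x_{k+1} - x_{k-2}) x_{k-1} - x_k + F with cyclic indices, together with its analytic Jacobian, and checks it against forward finite differences. A Jacobian obtained by automatically differentiating a trained emulator would be compared with a reference of this kind. The forcing F = 8 is a common choice assumed here, not taken from the record, and no emulator is trained in this sketch.

```python
def l96_tendency(x, forcing=8.0):
    """Lorenz-96 tendency dx_k/dt = (x[k+1] - x[k-2]) * x[k-1] - x[k] + F."""
    n = len(x)
    return [(x[(k + 1) % n] - x[(k - 2) % n]) * x[(k - 1) % n] - x[k] + forcing
            for k in range(n)]


def l96_jacobian(x):
    """Analytic Jacobian J[k][j] = d(dx_k/dt)/dx_j (the forcing drops out)."""
    n = len(x)
    jac = [[0.0] * n for _ in range(n)]
    for k in range(n):
        jac[k][(k - 1) % n] += x[(k + 1) % n] - x[(k - 2) % n]
        jac[k][(k + 1) % n] += x[(k - 1) % n]
        jac[k][(k - 2) % n] -= x[(k - 1) % n]
        jac[k][k] -= 1.0
    return jac


def fd_jacobian(f, x, eps=1e-6):
    """Forward finite-difference Jacobian of f at x, column by column."""
    f0 = f(x)
    n = len(x)
    jac = [[0.0] * n for _ in range(n)]
    for j in range(n):
        xp = list(x)
        xp[j] += eps
        fp = f(xp)
        for k in range(n):
            jac[k][j] = (fp[k] - f0[k]) / eps
    return jac


state = [1.0, 2.0, -0.5, 3.0, 0.25, -1.5, 2.5, 0.0]
exact = l96_jacobian(state)
approx = fd_jacobian(l96_tendency, state)
max_err = max(abs(a - b) for ra, rb in zip(exact, approx) for a, b in zip(ra, rb))
print(f"max |analytic - finite difference| = {max_err:.2e}")
```

For 4D-Var, it is the transpose (adjoint) of such a Jacobian that carries gradients backward in time, which is why missing derivatives block that method.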
  • 5
    Publication Date: 2021-09-24
    Description: Data-driven approaches, most prominently deep learning, have become powerful tools for prediction in many domains. A natural question to ask is whether data-driven methods could also be used to predict global weather patterns days in advance. Initial studies show promise, but the lack of a common data set and evaluation metrics makes intercomparison between studies difficult. Here we present a benchmark data set for data-driven medium-range weather forecasting (specifically 3–5 days), a topic of high scientific interest for atmospheric and computer scientists alike. We provide data derived from the ERA5 archive that has been processed to facilitate its use in machine learning models. We propose simple and clear evaluation metrics which will enable a direct comparison between different methods. Further, we provide baseline scores from simple linear regression techniques, deep learning models, as well as purely physical forecasting models. The data set is publicly available at https://github.com/pangeo-data/WeatherBench and the companion code is reproducible with tutorials for getting started. We hope that this data set will accelerate research in data-driven weather forecasting.
    Keywords: 551.6 ; machine learning ; NWP ; artificial intelligence ; benchmark
    Language: English
    Type: map
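The record does not spell out the evaluation metrics. To our understanding, WeatherBench's headline score is an RMSE weighted by the cosine of latitude, so that the densely packed grid points near the poles of a regular latitude-longitude grid do not dominate. The sketch below assumes that form, normalizes the weights to a mean of one, and handles a single forecast time; it should be checked against the metric code in the WeatherBench repository before use.

```python
import math


def lat_weighted_rmse(forecast, truth, lats_deg):
    """Latitude-weighted RMSE for one time step.

    forecast, truth: 2D lists indexed [lat][lon]; lats_deg: latitude of each row.
    Weights are cos(latitude) normalized to a mean of 1 over the rows.
    """
    weights = [math.cos(math.radians(lat)) for lat in lats_deg]
    w_mean = sum(weights) / len(weights)
    total, count = 0.0, 0
    for w, f_row, t_row in zip(weights, forecast, truth):
        for f, t in zip(f_row, t_row):
            total += (w / w_mean) * (f - t) ** 2
            count += 1
    return math.sqrt(total / count)


lats = [-60.0, 0.0, 60.0]
truth = [[0.0] * 4 for _ in lats]
err_equator = [[0.0] * 4, [1.0] * 4, [0.0] * 4]   # unit error only at the equator
err_high_lat = [[0.0] * 4, [0.0] * 4, [1.0] * 4]  # unit error only at 60N
print(lat_weighted_rmse(err_equator, truth, lats),
      lat_weighted_rmse(err_high_lat, truth, lats))
```

The same unit error costs more at the equator than at 60°, reflecting the larger area each equatorial grid cell represents.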
  • 6
    Publication Date: 2021-09-15
    Description: We present a machine learning approach to statistically derive geothermal heat flow (GHF) for Antarctica. The adopted approach estimates GHF from multiple geophysical and geological data sets, assuming that GHF is substantially related to the geodynamic setting of the plates. We apply a Gradient Boosted Regression Tree algorithm to find an optimal prediction model relating GHF to the observables. The geophysical and geological features are primarily global data sets, which are often unreliable in polar regions due to limited data coverage. Quality and reliability of the data sets are reviewed and discussed in line with the estimated GHF model. Predictions for Australia, where an extensive database of GHF measurements exists, demonstrate the validity of the approach. In Antarctica, only a few direct GHF measurements are available. Therefore, we explore the use of regional data sets of Antarctica and its tectonic Gondwana neighbors to refine the predictions. With this, we demonstrate the need for adding reliable data to the machine learning approach. Finally, we present a new geothermal heat flow map, which exhibits intermediate values compared to previous models, ranging from 35 to 156 mW/m², and visible connections to the conjugate margins in Australia, Africa, and India.
    Description: Plain Language Summary: The heat energy transferred from the Earth's interior to the surface (geothermal heat flow) can substantially affect the dynamics of an overlying ice sheet. It can lead to melting at the base and hence decouple the ice sheet from the bedrock. In Antarctica, this parameter is poorly constrained, and only a few thermal gradient measurements exist. Indirect methods, therefore, try to estimate the continental Antarctic heat flow. Here, we use a machine learning approach to combine multiple sources of information on geology, tectonic setting, and heat flow measurements from all continents to predict Antarctic values. We further show that using reliable data is crucial for the resulting prediction and that features should be chosen carefully. The final result exhibits values within the range of previously proposed heat flow maps and shows local similarities to the continents once connected to East Antarctica within the supercontinent Gondwana. We suggest a minimum and maximum heat flow map, which can be used as input for ice sheet modeling and sea level rise predictions.
    Description: Key Points: A new geothermal heat flow map of Antarctica is established by adopting a machine learning approach. Input features include both global and regional geological and tectonic information, and heat flow observations. A Gondwana reconstruction shows connections of heat flow at the conjugate margins of East Antarctica.
    Description: Deutsche Forschungsgemeinschaft (DFG) http://dx.doi.org/10.13039/501100001659
    Keywords: 551 ; 559 ; heat flow ; Antarctica ; machine learning ; gradient boosting regression
    Type: article
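The boosting idea behind a Gradient Boosted Regression Tree model can be sketched with depth-one trees (stumps) on a single feature and squared-error loss: each new stump is fitted to the current residuals, which are the negative gradient of that loss, and is added with a shrinkage factor. This is a simplified stand-in, not the authors' configuration; their model uses many geophysical and geological features and deeper trees, and the step-shaped example data are hypothetical.

```python
def fit_stump(x, r):
    """Best single split on a 1D feature x for residuals r (squared error)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    best = None
    for s in range(1, len(order)):
        left = [r[i] for i in order[:s]]
        right = [r[i] for i in order[s:]]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        thr = (x[order[s - 1]] + x[order[s]]) / 2  # midpoint between neighbours
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    _, thr, lm, rm = best
    return thr, lm, rm


def gradient_boost(x, y, n_rounds=50, learning_rate=0.1):
    """Return a predictor built from boosted stumps (assumes distinct x values)."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient
        thr, lm, rm = fit_stump(x, residuals)
        stumps.append((thr, lm, rm))
        pred = [p + learning_rate * (lm if xi <= thr else rm) for p, xi in zip(pred, x)]

    def predict(xq):
        out = base
        for thr, lm, rm in stumps:
            out += learning_rate * (lm if xq <= thr else rm)
        return out

    return predict


# Hypothetical feature (e.g. a crustal property) with a step change in heat flow.
feature = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
target = [0, 0, 0, 0, 0, 10, 10, 10, 10, 10]
model = gradient_boost(feature, target)
print(round(model(2.0), 3), round(model(8.0), 3))
```

The small learning rate makes each stump correct only part of the remaining error, which trades more rounds for a smoother, less overfit ensemble.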
  • 7
    Publication Date: 2021-09-14
    Description: Abstract
    Keywords: geospatial data ; machine learning ; predictive modelling ; site probability
    Type: Dataset , dataset
  • 8
    Publication Date: 2021-07-21
    Description: In recent years, feedforward neural networks (NNs) have been successfully applied to reconstruct global plasmasphere dynamics in the equatorial plane. These neural network‐based models capture the large‐scale dynamics of the plasmasphere, such as plume formation and erosion of the plasmasphere on the nightside. However, their performance depends strongly on the availability of training data. When the data coverage is limited or non‐existent, as occurs during geomagnetic storms, the performance of NNs significantly decreases, as networks inherently cannot learn from the limited number of examples. This limitation can be overcome by employing physics‐based modeling during strong geomagnetic storms. Physics‐based models show a stable performance during periods of disturbed geomagnetic activity if they are correctly initialized and configured. In this study, we illustrate how to combine the neural network‐ and physics‐based models of the plasmasphere in an optimal way by using data assimilation. The proposed approach utilizes advantages of both neural network‐ and physics‐based modeling and produces global plasma density reconstructions for both quiet and disturbed geomagnetic activity, including extreme geomagnetic storms. We validate the models quantitatively by comparing their output to the in‐situ density measurements from RBSP‐A for an 18‐month out‐of‐sample period from June 30, 2016 to January 01, 2018 and computing performance metrics. To validate the global density reconstructions qualitatively, we compare them to the IMAGE EUV images of the He+ particle distribution in the Earth's plasmasphere for a number of events in the past, including the Halloween storm in 2003.
    Description: Key Points: We develop an approach to combine a neural network with a physics‐based model of the plasmasphere using data assimilation. The approach is extensively validated using in‐situ density measurements and observed plasmapause position derived from the Imager for Magnetopause‐to‐Aurora Global Exploration EUV. The developed model reproduces the plasmasphere dynamics during quiet, moderate, disturbed, and extreme geomagnetic events.
    Description: Geo.X
    Description: EU Horizon 2020
    Description: Deutsche Forschungsgemeinschaft (DFG) http://dx.doi.org/10.13039/501100001659
    Description: Helmholtz Association http://dx.doi.org/10.13039/501100009318
    Keywords: 538.7 ; data assimilation ; Kalman filter ; machine learning ; neural networks ; plasmasphere ; plasma density
    Type: article
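The record names a Kalman filter as the assimilation tool. The textbook scalar Kalman filter below is a minimal sketch of how such a filter blends a model forecast with measurements: the forecast step propagates the estimate and inflates its error variance, and the analysis step pulls the estimate toward an observation by the Kalman gain. How the study assigns the neural-network and physics-based models to these roles is not stated in this record, and a global density reconstruction involves a spatial field rather than the single number used here; the linear growth factor and all values are illustrative assumptions.

```python
def kalman_update(x_b, p_b, y, r):
    """Scalar Kalman analysis step.

    x_b, p_b: background (forecast) estimate and its error variance
    y, r: observation and its error variance
    """
    k = p_b / (p_b + r)        # Kalman gain: weight given to the observation
    x_a = x_b + k * (y - x_b)  # analysis moves toward the observation
    p_a = (1.0 - k) * p_b      # analysis uncertainty shrinks
    return x_a, p_a


def run_filter(observations, x0, p0, r, q, growth=1.0):
    """Alternate forecast and analysis steps; None marks a data gap."""
    x, p = x0, p0
    history = []
    for y in observations:
        x, p = growth * x, growth ** 2 * p + q  # forecast step adds model error q
        if y is not None:
            x, p = kalman_update(x, p, y, r)
        history.append((x, p))
    return history


# Hypothetical density values with a data gap in the middle.
for step, (x, p) in enumerate(run_filter([110.0, None, None, 95.0],
                                         x0=100.0, p0=4.0, r=4.0, q=1.0)):
    print(step, round(x, 2), round(p, 2))
```

During a data gap the error variance grows by q at every step, so the filter leans more heavily on the next available observation.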
  • 9
    Publication Date: 2021-07-04
    Description: Most common machine learning (ML) algorithms work well on balanced training sets, that is, datasets in which all classes are represented approximately equally. Otherwise, the accuracy estimates may be unreliable, and classes with only a few values are often misclassified or neglected. This is known as the class imbalance problem in machine learning, and datasets that do not meet this criterion are referred to as imbalanced data. Most soil class datasets are imbalanced in this sense. One of our main objectives is to compare eight resampling strategies that have been developed to counteract the imbalanced data problem. We compared the performance of five of the most common ML algorithms with the resampling approaches. The highest increase in prediction accuracy was achieved with SMOTE (the synthetic minority oversampling technique). In comparison to the baseline prediction on the original dataset, we achieved an increase of about 10, 20 and 10% in the overall accuracy, kappa index and F‐score, respectively. Regarding the ML approaches, random forest (RF) showed the best performance, with an overall accuracy, kappa index and F‐score of 66, 60 and 57%, respectively. Moreover, the combination of RF and SMOTE improved the accuracy of the individual soil classes compared with RF trained on the original dataset, and allowed better prediction of soil classes with a low number of samples in the corresponding soil profile database, in our case Chernozems. Our results show that balancing existing soil legacy data using synthetic sampling strategies can significantly improve the prediction accuracy in digital soil mapping (DSM).
    Description: Highlights: Spatial distribution of soil classes in Iran can be predicted using machine learning (ML) algorithms. The synthetic minority oversampling technique overcomes the drawback of imbalanced and highly biased soil legacy data. When a random forest model is combined with synthetic sampling strategies, the prediction accuracy of the soil model improves significantly. The resulting new soil map of Iran has a much higher spatial resolution than existing maps and displays new soil classes that have not yet been mapped in Iran.
    Description: Alexander von Humboldt‐Stiftung http://dx.doi.org/10.13039/100005156
    Description: German Research Foundation http://dx.doi.org/10.13039/501100001659
    Description: Soil and Water Research Institute, Agricultural Research, Education and Extension Organization, Karaj, Iran
    Keywords: 631.4 ; covariates ; imbalanced data ; machine learning ; random forest ; soil legacy data
    Type: article
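SMOTE creates synthetic minority-class samples by interpolating between a minority sample and one of its k nearest minority-class neighbours. A minimal sketch of that idea follows, assuming numeric covariates, Euclidean distance, and a hypothetical two-covariate minority class; it is not the implementation used in the study, and the other seven resampling strategies it compares are not reproduced.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate synthetic samples by interpolating toward minority neighbours.

    minority: list of feature vectors from one under-represented class
    (needs at least two samples).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority-class neighbours of base, excluding itself
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(base, p),
        )[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> nn, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nn)])
    return synthetic


# Hypothetical covariate vectors for a rare soil class.
minority_samples = [[0.2, 1.1], [0.4, 0.9], [0.3, 1.3], [0.5, 1.0]]
for point in smote(minority_samples, 3, k=2):
    print([round(v, 3) for v in point])
```

Because every new point lies on a segment between two real minority samples, the method adds variety inside the class's existing region rather than duplicating samples.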
  • 10
    Publication Date: 2021-07-03
    Description: The characterization of uncertainties in geophysical quantities is an important task with widespread applications for time series prediction, numerical modeling, and data assimilation. In this context, machine learning is a powerful tool for estimating complex patterns and their evolution through time. Here, we utilize a supervised machine learning approach to dynamically predict the spatiotemporal uncertainty of near‐surface wind velocities over the ocean. A recurrent neural network (RNN) is trained with reanalyzed 10 m wind velocities and corresponding precalculated uncertainty estimates during the 2012–2016 time period. Afterward, the neural network's performance is examined by analyzing its prediction for the subsequent year 2017. Our experiments show that a recurrent neural network can capture the globally prevalent wind regimes without prior knowledge about underlying physics and learn to derive wind velocity uncertainty estimates that are based only on wind velocity trajectories. At single training locations, the RNN‐based wind uncertainties closely match the true reference values, and the corresponding intra‐annual variations are reproduced with high accuracy. Moreover, the neural network can predict the global lateral distribution of uncertainties with small mismatch values after being trained only at a few isolated locations in different dynamic regimes. The presented approach can be combined with numerical models for a cost‐efficient generation of ensemble simulations or with ensemble‐based data assimilation to sample and predict dynamically consistent error covariance information of atmospheric boundary forcings.
    Description: Plain Language Summary: Machine learning is increasingly used for a wide range of applications in geosciences. In this study, we use an artificial neural network in the context of time series prediction. In particular, the goal is to use a neural network for learning spatial and temporal uncertainties that are associated with globally estimated wind velocities. Three well‐known wind velocity products are used for the time period 2012–2016 in different training, validation, and prediction scenarios. Our experiments show that a neural network can learn the prevailing global wind regimes and associate these with corresponding uncertainty estimates. Such a trained neural network can be used for different applications, for example, a cost‐efficient generation of ensemble simulations or for improving traditional data assimilation schemes.
    Description: Key Points: A recurrent neural network is set up to predict spatiotemporal uncertainties in wind velocity reanalyses. Global uncertainty maps can be derived from only a few individual training locations. This method has benefits for time series prediction, ensemble simulations, and data assimilation.
    Keywords: 551.5 ; machine learning ; artificial neural network ; wind velocity ; atmospheric reanalysis ; ensemble simulation ; data assimilation
    Type: article
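The property that makes a recurrent network suitable for wind trajectories, a hidden state that carries information from earlier time steps, can be shown with the forward pass of a single-layer Elman RNN. This is a minimal sketch with hand-set, hypothetical weights and no training; the record does not specify the architecture (for example, whether gated cells were used), so nothing here should be read as the study's model. Inputs are assumed to be (u, v) wind components per time step, and the single output stands in for an uncertainty estimate.

```python
import math


def rnn_forward(inputs, w_x, w_h, b_h, w_y, b_y):
    """Forward pass of a single-layer Elman RNN.

    inputs: list of input vectors, one per time step (e.g. [u, v])
    w_x: hidden x input weights, w_h: hidden x hidden weights, b_h: hidden bias
    w_y: output x hidden weights, b_y: output bias
    """
    n_hidden = len(b_h)
    h = [0.0] * n_hidden  # hidden state starts at zero
    outputs = []
    for x in inputs:
        # new state depends on the current input and the previous state h
        h = [math.tanh(sum(w_x[i][j] * x[j] for j in range(len(x)))
                       + sum(w_h[i][j] * h[j] for j in range(n_hidden))
                       + b_h[i])
             for i in range(n_hidden)]
        outputs.append([sum(w_y[o][i] * h[i] for i in range(n_hidden)) + b_y[o]
                        for o in range(len(b_y))])
    return outputs


# Hypothetical weights: 2 inputs (u, v), 2 hidden units, 1 output.
w_x = [[1.0, 0.0], [0.0, 1.0]]
w_h = [[0.5, 0.0], [0.0, 0.5]]
b_h = [0.0, 0.0]
w_y = [[1.0, 1.0]]
b_y = [0.0]
print(rnn_forward([[1.0, 0.0], [0.0, 0.0], [0.0, 0.0]], w_x, w_h, b_h, w_y, b_y))
```

After the single non-zero input, the output decays over the following zero-input steps instead of dropping to zero at once, which is the memory effect that lets such a network relate an uncertainty estimate to the recent wind history.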