ALBERT

All Library Books, journals and Electronic Records Telegrafenberg

Filter
  • covariates
Collection
Publisher
Publication period
  • 2020-2022  (1)
  • 1995-1999  (2)
  • 1980-1984
  • 1965-1969
  • 1955-1959
Year
  • 1
    Publication date: 2021-07-04
    Description: Most common machine learning (ML) algorithms work well on balanced training sets, that is, datasets in which all classes are represented approximately equally. Otherwise, the accuracy estimates may be unreliable, and classes with only a few samples are often misclassified or neglected. This is known as the class imbalance problem in machine learning, and datasets that do not meet this criterion are referred to as imbalanced data. Most soil class datasets are imbalanced in this sense. One of our main objectives is to compare eight resampling strategies that have been developed to counteract the imbalanced data problem. We compared the performance of five of the most common ML algorithms in combination with these resampling approaches. The largest increase in prediction accuracy was achieved with SMOTE (the synthetic minority oversampling technique). Compared with the baseline prediction on the original dataset, we achieved increases of about 10, 20 and 10% in the overall accuracy, kappa index and F‐score, respectively. Among the ML approaches, random forest (RF) performed best, with an overall accuracy, kappa index and F‐score of 66, 60 and 57%, respectively. Moreover, combining RF with SMOTE improved the accuracy of the individual soil classes compared with RF trained on the original dataset, and allowed better prediction of soil classes with few samples in the corresponding soil profile database, in our case Chernozems. Our results show that balancing existing soil legacy data using synthetic sampling strategies can significantly improve the prediction accuracy in digital soil mapping (DSM).
    Highlights: The spatial distribution of soil classes in Iran can be predicted using machine learning (ML) algorithms. The synthetic minority oversampling technique overcomes the drawback of imbalanced and highly biased soil legacy data. When a random forest model is combined with synthetic sampling strategies, the prediction accuracy of the soil model improves significantly. The resulting new soil map of Iran has a much higher spatial resolution than existing maps and displays soil classes that have not previously been mapped in Iran.
    Description: Alexander von Humboldt‐Stiftung http://dx.doi.org/10.13039/100005156
    Description: German Research Foundation http://dx.doi.org/10.13039/501100001659
    Description: Soil and Water Research Institute, Agricultural Research, Education and Extension Organization, Karaj, Iran
    Keyword(s): 631.4 ; covariates ; imbalanced data ; machine learning ; random forest ; soil legacy data
    Material type: article
  • 2
    Digital media
    Springer
    Pharmaceutical Research 16 (1999), pp. 1608-1615
    ISSN: 1573-904X
    Keyword(s): tenidap ; pharmacokinetics ; EM algorithm ; nonlinear mixed-effects modelling ; covariates
    Source: Springer Online Journal Archives 1860-2000
    Subject: Chemistry and Pharmacy
    Notes: Abstract. Purpose: To develop a pharmacokinetic model for tenidap and to identify important relationships between the pharmacokinetic parameters and available covariates. Methods: Plasma concentration data from several phase I and phase II studies were used to develop a pharmacokinetic model for tenidap, a novel anti-rheumatic drug. An appropriate pharmacokinetic model was selected on the basis of individual nonlinear regression analyses, and an EM algorithm was used to perform a nonlinear mixed-effects analysis. Scatter plots of posterior individual pharmacokinetic parameters were used to identify possible covariate effects. Results: Predicted responses were in good agreement with the observed data. A bi-exponential model with zero-order absorption was subsequently used to develop the mixed-effects model. Covariate relationships selected on the basis of differences in the objective function, although statistically significant, were not particularly strong. Conclusions: The pharmacokinetics of tenidap can be described by a bi-exponential model with zero-order absorption. Based on differences in the log-likelihood, significant covariate-parameter relationships were identified between smoking and CL, and between gender and both Vss and CLd. Simulated sparse data analyses indicated that the model would be robust for the analysis of sparse data generated in observational studies.
    Material type: Digital media
  • 3
    Digital media
    Springer
    Annals of the Institute of Statistical Mathematics 50 (1998), pp. 627-653
    ISSN: 1572-9052
    Keyword(s): Censored survival data ; immune proportion ; covariates ; mixture models ; failure time data ; exponential family ; boundary hypothesis tests
    Source: Springer Online Journal Archives 1860-2000
    Subject: Mathematics
    Notes: Abstract. We analyse an exponential family of distributions which generalises the exponential distribution for censored failure time data, analogous to the way in which the class of generalised linear models generalises the normal distribution. The parameter of the distribution depends on a linear combination of covariates via a possibly nonlinear link function, and we allow another level of heterogeneity: the data may contain "immune" individuals who are not subject to failure. Thus the data are modelled by a mixture of a distribution from the exponential family and a "mass at infinity" representing individuals who never fail. Our results include large-sample distributions for parameter estimators and for hypothesis test statistics obtained by maximising the likelihood of a sample. The asymptotic distribution of the likelihood ratio test statistic for the hypothesis that there are no immunes in the population is shown to be "non-standard": it is a 50-50 mixture of a chi-squared distribution on 1 degree of freedom and a point mass at 0. Our analysis shows how conditions of "negligibility" of individual covariate values and "sufficient follow-up" are required for the asymptotic properties.
    Material type: Digital media
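The first result credits SMOTE with the largest accuracy gain. SMOTE creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The following is a simplified, pure-Python sketch of that interpolation step, not the study's implementation. The function name `smote` and its parameters `n_synthetic`, `k` and `seed` are hypothetical. Drawing each base sample at random (rather than cycling through every minority sample, as the original algorithm does) is an assumption made for brevity. A maintained implementation is available as `imblearn.over_sampling.SMOTE` in the imbalanced-learn library.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=None):
    """Hypothetical SMOTE-style oversampler (simplified sketch).

    minority: list of feature vectors (lists of floats) from ONE minority class.
    Each synthetic point lies on the segment between a randomly drawn minority
    sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbours of x among the other minority samples (Euclidean distance)
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: math.dist(x, m))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position along the segment x -> nb
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic
```

To bring a class with 20 samples up to a majority class of 100, one would call `smote(minority, 80)` and append the result to the training set. Only the training split should be resampled; oversampling before the train/test split leaks synthetic copies of test neighbours into training and inflates the accuracy estimates.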
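The first result also reports overall accuracy, kappa index and F-score. The following is a minimal sketch of how these three figures can be computed from paired true and predicted labels. It assumes the kappa index is Cohen's kappa and the F-score is the unweighted (macro) mean of per-class F1 scores; the abstract specifies neither. The function name `classification_summary` is hypothetical.

```python
from collections import Counter


def classification_summary(y_true, y_pred):
    """Return (overall accuracy, Cohen's kappa, macro-averaged F1) for paired label lists."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(y_true)
    labels = sorted(set(y_true) | set(y_pred))
    pair_counts = Counter(zip(y_true, y_pred))  # confusion-matrix cells
    true_counts = Counter(y_true)               # row totals (reference classes)
    pred_counts = Counter(y_pred)               # column totals (predicted classes)

    # Overall accuracy = observed agreement = diagonal of the confusion matrix / n.
    p_o = sum(pair_counts[(c, c)] for c in labels) / n
    # Agreement expected by chance from the marginal class frequencies.
    p_e = sum(true_counts[c] * pred_counts[c] for c in labels) / (n * n)
    kappa = (p_o - p_e) / (1 - p_e) if p_e < 1 else float("nan")

    # Per-class F1 (harmonic mean of precision and recall), then the unweighted mean.
    f1_scores = []
    for c in labels:
        tp = pair_counts[(c, c)]
        precision = tp / pred_counts[c] if pred_counts[c] else 0.0
        recall = tp / true_counts[c] if true_counts[c] else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    macro_f1 = sum(f1_scores) / len(f1_scores)
    return p_o, kappa, macro_f1
```

Because kappa discounts the agreement expected by chance, and the macro F-score weights every class equally, both penalise a classifier that simply predicts the dominant soil class more heavily than overall accuracy does.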
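The second result describes tenidap plasma concentrations with a bi-exponential model and zero-order absorption. The following is a sketch of the concentration-time curve under an assumed macro-constant parameterisation. The unit-dose bolus response is taken to be a·exp(−alpha·t) + b·exp(−beta·t). Convolving this response with a constant input rate dose/tk on [0, tk] gives (dose/tk)·Σ (c/λ)(1 − exp(−λt)) while input continues, after which each term decays as exp(−λ(t − tk)). The study reports its estimates as CL, Vss and CLd; the conversion to these macro-constants is not shown here. The function and parameter names are hypothetical.

```python
import math


def biexp_zero_order(t, dose, tk, a, alpha, b, beta):
    """Concentration at time t for a hypothetical bi-exponential disposition model
    with zero-order input of `dose` spread evenly over the duration `tk`.

    a, b: unit-dose macro-coefficients; alpha, beta: disposition rate constants,
    so that an IV bolus of dose D would give D * (a*exp(-alpha*t) + b*exp(-beta*t)).
    """
    if tk <= 0:
        raise ValueError("zero-order input duration tk must be positive")
    if t <= 0:
        return 0.0
    rate = dose / tk  # constant (zero-order) input rate
    conc = 0.0
    for coef, lam in ((a, alpha), (b, beta)):
        if t <= tk:
            # input still running: integral of coef*exp(-lam*(t - s)) for s in [0, t]
            conc += coef / lam * (1.0 - math.exp(-lam * t))
        else:
            # input stopped at tk: amount accrued by tk, then mono-exponential decay
            conc += coef / lam * (1.0 - math.exp(-lam * tk)) * math.exp(-lam * (t - tk))
    return rate * conc
```

As tk shrinks towards zero, the curve approaches the bi-exponential IV-bolus profile, and the two branches meet continuously at t = tk. Both properties provide a quick sanity check on any chosen parameter values.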
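The third result states that, under the hypothesis of no immunes, the likelihood ratio statistic is asymptotically a 50-50 mixture of a point mass at 0 and a chi-squared distribution on 1 degree of freedom. A chi-squared(1) variable is a squared standard normal, so its upper tail at t is erfc(sqrt(t/2)). For t > 0, the mixture p-value is therefore half of that tail. The following standard-library sketch computes it; the function name `boundary_lrt_pvalue` is hypothetical.

```python
import math


def boundary_lrt_pvalue(lr_stat):
    """Asymptotic p-value for a likelihood ratio statistic whose null distribution is
    a 50-50 mixture of a point mass at 0 and chi-squared with 1 degree of freedom."""
    if lr_stat < 0:
        raise ValueError("likelihood ratio statistic must be non-negative")
    if lr_stat == 0:
        return 1.0  # P(T >= 0) = 1 under the mixture
    # Survival function of chi-squared(1): P(X > t) = P(|Z| > sqrt(t)) = erfc(sqrt(t / 2)).
    return 0.5 * math.erfc(math.sqrt(lr_stat / 2.0))
```

For example, the usual 5% critical value of chi-squared(1), about 3.84, corresponds to p ≈ 0.025 under the mixture. The boundary test therefore rejects at the 5% level once the statistic exceeds about 2.71, the 90% quantile of chi-squared(1).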