ALBERT — All Library Books, journals and Electronic Records Telegrafenberg


Results 1 - 3 | 3 results


Synthetic resampling strategies and machine learning for digital soil mapping in Iran (2021)

Taghizadeh‐Mehrjardi, Ruhollah ; Schmidt, Karsten ; Eftekhari, Kamran ; [others]

Blackwell Publishing Ltd | Oxford, UK


Details

Publication date: 2021-07-04

Description: Most common machine learning (ML) algorithms usually work well on balanced training sets, that is, datasets in which all classes are approximately represented equally. Otherwise, the accuracy estimates may be unreliable and classes with only a few values are often misclassified or neglected. This is known as a class imbalance problem in machine learning and datasets that do not meet this criterion are referred to as imbalanced data. Most datasets of soil classes are, therefore, imbalanced data. One of our main objectives is to compare eight resampling strategies that have been developed to counteract the imbalanced data problem. We compared the performance of five of the most common ML algorithms with the resampling approaches. The highest increase in prediction accuracy was achieved with SMOTE (the synthetic minority oversampling technique). In comparison to the baseline prediction on the original dataset, we achieved an increase of about 10, 20 and 10% in the overall accuracy, kappa index and F‐score, respectively. Regarding the ML approaches, random forest (RF) showed the best performance with an overall accuracy, kappa index and F‐score of 66, 60 and 57%, respectively. Moreover, the combination of RF and SMOTE improved the accuracy of the individual soil classes, compared to RF trained on the original dataset and allowed better prediction of soil classes with a low number of samples in the corresponding soil profile database, in our case for Chernozems. Our results show that balancing existing soil legacy data using synthetic sampling strategies can significantly improve the prediction accuracy in digital soil mapping (DSM). Highlights: Spatial distribution of soil classes in Iran can be predicted using machine learning (ML) algorithms. The synthetic minority oversampling technique overcomes the drawback of imbalanced and highly biased soil legacy data. When combining a random forest model with synthetic sampling strategies the prediction accuracy of the soil model improves significantly. The resulting new soil map of Iran has a much higher spatial resolution compared to existing maps and displays new soil classes that have not yet been mapped in Iran.
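Illustrative note: the abstract above names SMOTE as the best-performing resampling strategy. The following is a minimal standard-library sketch of the core SMOTE idea (interpolating between a minority sample and one of its nearest minority-class neighbours). It is not the authors' implementation; the function name `smote`, the parameter `k`, and the sample values are assumptions made for illustration only.

```python
import math
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` within the minority class, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # random position on the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic

# Hypothetical covariate vectors for a rare soil class (illustrative values only)
rare_class = [(1.0, 2.0), (1.5, 1.8), (0.9, 2.4), (1.2, 2.1)]
new_points = smote(rare_class, n_new=6)
```

Because every synthetic point lies on a segment between two existing minority samples, the oversampled class stays inside the range already covered by the data. In practice a maintained implementation such as the `SMOTE` class in the imbalanced-learn package would likely be preferable to this sketch.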

Description: Alexander von Humboldt‐Stiftung http://dx.doi.org/10.13039/100005156

Description: German Research Foundation http://dx.doi.org/10.13039/501100001659

Description: Soil and Water Research Institute, Agricultural Research, Education and Extension Organization, Karaj, Iran

Keyword(s): 631.4 ; covariates ; imbalanced data ; machine learning ; random forest ; soil legacy data

Material type: article



A Pharmacokinetic Model for Tenidap in Normal Volunteers and Rheumatoid Arthritis Patients (1999)

Evans, Lynne ; Aarons, Leon ; Coates, Peter

Springer

Pharmaceutical research 16 (1999), pp. 1608-1615


Details

ISSN: 1573-904X

Keyword(s): tenidap ; pharmacokinetics ; EM algorithm ; nonlinear mixed-effects modelling ; covariates

Source: Springer Online Journal Archives 1860-2000

Subject: Chemistry and Pharmacy

Notes: Abstract. Purpose. To develop a pharmacokinetic model for tenidap and to identify important relationships between the pharmacokinetic parameters and available covariates. Methods. Plasma concentration data from several phase I and phase II studies were used to develop a pharmacokinetic model for tenidap, a novel anti-rheumatic drug. An appropriate pharmacokinetic model was selected on the basis of individual nonlinear regression analyses and an EM algorithm was used to perform a nonlinear mixed-effects analysis. Scatter plots of posterior individual pharmacokinetic parameters were used to identify possible covariate effects. Results. Predicted responses were in good agreement with the observed data. A bi-exponential model with zero order absorption was subsequently used to develop the mixed-effects model. Covariate relationships selected on the basis of differences in the objective function, although statistically significant, were not particularly strong. Conclusions. The pharmacokinetics of tenidap can be described by a bi-exponential model with zero order absorption. Based on differences in the log-likelihood, significant covariate-parameter relationships were identified between smoking and CL, and between gender and Vss and CLd. Simulated sparse data analyses indicated that the model would be robust for the analysis of sparse data generated in observational studies.
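Illustrative note: a bi-exponential disposition with zero-order absorption, as named in the abstract above, is commonly written as a two-compartment model with a constant input rate over a fixed absorption period. The sketch below simulates such a model with a simple forward-Euler scheme. It is not the authors' model code: the split of Vss into `v1` and `v2`, the absorption duration `t_abs`, and all numeric values are hypothetical and are not tenidap estimates.

```python
def simulate(dose, t_abs, cl, cld, v1, v2, t_end, dt=0.01):
    """Forward-Euler simulation of a two-compartment model with zero-order input.

    cl  : elimination clearance from the central compartment
    cld : distribution clearance between the two compartments
    v1, v2 : central and peripheral volumes (assumed here: Vss = v1 + v2)
    Returns (times, central concentrations, amount remaining, amount eliminated).
    """
    a1 = a2 = eliminated = 0.0  # drug amounts in central / peripheral / eliminated
    times, conc = [], []
    n_steps = int(round(t_end / dt))
    for i in range(n_steps + 1):
        t = i * dt
        times.append(t)
        conc.append(a1 / v1)
        # zero-order absorption: constant input rate until t_abs, then none
        rate_in = dose / t_abs if t < t_abs else 0.0
        c1, c2 = a1 / v1, a2 / v2
        elim = cl * c1           # elimination rate
        dist = cld * (c1 - c2)   # net transfer central -> peripheral
        a1 += dt * (rate_in - elim - dist)
        a2 += dt * dist
        eliminated += dt * elim
    return times, conc, a1 + a2, eliminated

# Hypothetical parameter values, chosen only to make the sketch runnable
times, conc, remaining, eliminated = simulate(
    dose=100.0, t_abs=2.0, cl=5.0, cld=2.0, v1=10.0, v2=20.0, t_end=24.0
)
peak_time = times[conc.index(max(conc))]
```

Under these assumptions the central concentration rises while the input is active and declines afterwards, and the drug remaining in the body plus the amount eliminated adds up to the dose, which is a quick sanity check on the simulation.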

Material type: Digital media

URL: http://dx.doi.org/10.1023/A:1018969024101


Asymptotic Properties of a Class of Mixture Models for Failure Data: The Interior and Boundary Cases (1998)

Vu, H. T. V. ; Maller, R. A. ; Zhou, X.

Springer

Annals of the Institute of Statistical Mathematics 50 (1998), pp. 627-653


Details

ISSN: 1572-9052

Keyword(s): Censored survival data ; immune proportion ; covariates ; mixture models ; failure time data ; exponential family ; boundary hypothesis tests

Source: Springer Online Journal Archives 1860-2000

Subject: Mathematics

Notes: Abstract. We analyse an exponential family of distributions which generalises the exponential distribution for censored failure time data, analogous to the way in which the class of generalised linear models generalises the normal distribution. The parameter of the distribution depends on a linear combination of covariates via a possibly nonlinear link function, and we allow another level of heterogeneity: the data may contain "immune" individuals who are not subject to failure. Thus the data is modelled by a mixture of a distribution from the exponential family and a "mass at infinity" representing individuals who never fail. Our results include large sample distributions for parameter estimators and for hypothesis test statistics obtained by maximising the likelihood of a sample. The asymptotic distribution of the likelihood ratio test statistic for the hypothesis that there are no immunes present in the population is shown to be "non-standard"; it is a 50-50 mixture of a chi-squared distribution on 1 degree of freedom and a point mass at 0. Our analysis clearly shows how "negligibility" of individual covariate values and "sufficient followup" conditions are required for the asymptotic properties.
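Illustrative note: the non-standard limit stated in the abstract above (a 50-50 mixture of a point mass at 0 and a chi-squared distribution on 1 degree of freedom) changes how a p-value is read off the likelihood ratio statistic. The sketch below computes that p-value with the standard library only. The function names are hypothetical, and the sketch assumes the conditions of the abstract hold so that the mixture limit applies.

```python
import math

def chi2_1_tail(x):
    """P(X > x) for X ~ chi-squared(1): equals P(|Z| > sqrt(x)) for standard normal Z."""
    return math.erfc(math.sqrt(max(x, 0.0) / 2.0))

def boundary_lrt_pvalue(lr_stat):
    """p-value under a 50-50 mixture of a point mass at 0 and chi-squared(1)."""
    if lr_stat <= 0.0:
        return 1.0  # every outcome of the mixture is at least 0
    return 0.5 * chi2_1_tail(lr_stat)

# For a positive statistic, the mixture p-value is half the naive chi-squared(1) p-value
stat = 2.706
p_mixture = boundary_lrt_pvalue(stat)
p_naive = chi2_1_tail(stat)
```

One consequence is that the 5% critical value under the mixture is the 90% quantile of the chi-squared(1) distribution (about 2.71) rather than the 95% quantile (about 3.84), so a naive chi-squared(1) reference would be conservative for this boundary hypothesis.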

Material type: Digital media

URL: http://dx.doi.org/10.1023/A:1003704728573
