ALBERT

All Library Books, journals and Electronic Records Telegrafenberg

  • 1
    Electronic Resource
    Berkeley, Calif. : Berkeley Electronic Press (now: De Gruyter)
    Statistical applications in genetics and molecular biology 6.2007, 1, art25 
    ISSN: 1544-6115
    Source: Berkeley Electronic Press Academic Journals
    Topics: Biology
    Notes: When trying to learn a model for the prediction of an outcome given a set of covariates, a statistician has many estimation procedures in their toolbox. A few examples of these candidate learners are: least squares, least angle regression, random forests, and spline regression. Previous articles (van der Laan and Dudoit (2003); van der Laan et al. (2006); Sinisi et al. (2007)) theoretically validated the use of cross validation to select an optimal learner among many candidate learners. Motivated by this use of cross validation, we propose a new prediction method for creating a weighted combination of many candidate learners to build the super learner. This article proposes a fast algorithm for constructing a super learner in prediction which uses V-fold cross-validation to select weights to combine an initial set of candidate learners. In addition, this paper contains a practical demonstration of the adaptivity of this so called super learner to various true data generating distributions. This approach for construction of a super learner generalizes to any parameter which can be defined as a minimizer of a loss function.
    Type of Medium: Electronic Resource
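    Example (not part of the catalog record): the abstract above describes using V-fold cross-validation to choose weights for combining candidate learners. A minimal sketch of that idea follows. The candidate learners (`fit_mean`, `fit_poly`), the function name `super_learner`, and the use of unconstrained least squares for the weights are all assumptions made for illustration; the article's own algorithm and constraints on the weights may differ.

```python
import numpy as np

def fit_mean(X, y):
    # Hypothetical candidate learner: always predicts the training mean.
    m = y.mean()
    return lambda Xn: np.full(len(Xn), m)

def fit_poly(degree):
    # Hypothetical candidate learner: polynomial least squares of a given degree.
    def fit(X, y):
        coef = np.polyfit(X, y, degree)
        return lambda Xn: np.polyval(coef, Xn)
    return fit

def super_learner(X, y, learners, V=5, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    folds = rng.permutation(n) % V          # balanced random fold labels 0..V-1
    Z = np.zeros((n, len(learners)))        # out-of-fold predictions, one column per learner
    for v in range(V):
        test = folds == v
        for k, fit in enumerate(learners):
            Z[test, k] = fit(X[~test], y[~test])(X[test])
    # Assumption: weights minimize squared error of the cross-validated predictions.
    weights, *_ = np.linalg.lstsq(Z, y, rcond=None)
    fitted = [fit(X, y) for fit in learners]  # refit every learner on all the data
    def predict(Xn):
        return np.column_stack([f(Xn) for f in fitted]) @ weights
    return weights, predict
```

    A call such as `super_learner(X, y, [fit_mean, fit_poly(1), fit_poly(2)])` returns the weight vector and a prediction function for one-dimensional `X`.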
  • 2
    Electronic Resource
    Berkeley, Calif. : Berkeley Electronic Press (now: De Gruyter)
    Statistical applications in genetics and molecular biology 5.2006, 1, art11 
    ISSN: 1544-6115
    Source: Berkeley Electronic Press Academic Journals
    Topics: Biology
    Notes: A new data filtering method for SELDI-TOF MS proteomic spectra data is described. We examined technical repeats (2 per subject) of intensity versus m/z (mass/charge) of bone marrow cell lysate for two groups of childhood leukemia patients: acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL). As others have noted, the type of data processing as well as experimental variability can have a disproportionate impact on the list of "interesting" proteins (see Baggerly et al. (2004)). We propose a list of processing and multiple testing techniques to correct for 1) background drift; 2) filtering using smooth regression and cross-validated bandwidth selection; 3) peak finding; and 4) methods to correct for multiple testing (van der Laan et al. (2005)). The result is a list of proteins (indexed by m/z) where average expression is significantly different among disease (or treatment, etc.) groups. The procedures are intended to provide a sensible and statistically driven algorithm, which we argue provides a list of proteins that have a significant difference in expression. Given no sources of unmeasured bias (such as confounding of experimental conditions with disease status), proteins found to be statistically significant using this technique have a low probability of being false positives.
    Type of Medium: Electronic Resource
  • 3
    Electronic Resource
    Berkeley, Calif. : Berkeley Electronic Press (now: De Gruyter)
    Statistical applications in genetics and molecular biology 5.2006, 1, art14 
    ISSN: 1544-6115
    Source: Berkeley Electronic Press Academic Journals
    Topics: Biology
    Notes: Simultaneously testing a collection of null hypotheses about a data generating distribution based on a sample of independent and identically distributed observations is a fundamental and important statistical problem involving many applications. Methods based on marginal null distributions (i.e., marginal p-values) are attractive since the marginal p-values can be based on a user supplied choice of marginal null distributions and they are computationally trivial, but they, by necessity, are known to either be conservative or to rely on assumptions about the dependence structure between the test-statistics. Re-sampling based multiple testing (Westfall and Young, 1993) involves sampling from a joint null distribution of the test-statistics, and controlling (possibly in a, for example, step-down fashion) the user supplied type-I error rate under this joint null distribution for the test-statistics. A generally asymptotically valid null distribution avoiding the need for the subset pivotality condition for the vector of test-statistics was proposed in Pollard, van der Laan (2003) for null hypotheses about general real valued parameters. This null distribution was generalized in Dudoit, van der Laan, Pollard (2004) to general null hypotheses and test-statistics. In recent ongoing work (van der Laan, Hubbard (2005)), we propose a new generally asymptotically valid null distribution for the test-statistics and a corresponding bootstrap estimate, whose marginal distributions are user supplied, and can thus be set equal to the (most powerful) marginal null distributions one would use in univariate testing to obtain a p-value. Previously proposed null distributions either relied on a restrictive subset pivotality condition (Westfall and Young) or did not guarantee this latter property (Dudoit, van der Laan, Pollard, 2004). It is argued and illustrated that the resulting new re-sampling based multiple testing methods provide more accurate control of the desired Type-I error in finite samples and are more powerful. We establish formal results and investigate the practical performance of this methodology in a simulation and data analysis.
    Type of Medium: Electronic Resource
  • 4
    Electronic Resource
    Berkeley, Calif. : Berkeley Electronic Press (now: De Gruyter)
    Statistical applications in genetics and molecular biology 4.2005, 1, art29 
    ISSN: 1544-6115
    Source: Berkeley Electronic Press Academic Journals
    Topics: Biology
    Notes: Simultaneously testing a collection of null hypotheses about a data generating distribution based on a sample of independent and identically distributed observations is a fundamental and important statistical problem involving many applications. In this article we propose a new re-sampling based multiple testing procedure asymptotically controlling the probability that the proportion of false positives among the set of rejections exceeds q at level alpha, where q and alpha are user supplied numbers. The procedure involves 1) specifying a conditional distribution for a guessed set of true null hypotheses, given the data, which asymptotically is degenerate at the true set of null hypotheses, and 2) specifying a generally valid null distribution for the vector of test-statistics proposed in Pollard & van der Laan (2003), and generalized in our subsequent article Dudoit, van der Laan, & Pollard (2004), van der Laan, Dudoit, & Pollard (2004), and van der Laan, Dudoit, & Pollard (2004b). Ingredient 1) is established by fitting the empirical Bayes two component mixture model (Efron (2001b)) to the data to obtain an upper bound for marginal posterior probabilities of the null being true, given the data. We establish the finite sample rationale behind our proposal, and prove that this new multiple testing procedure asymptotically controls the desired tail probability for the proportion of false positives under general data generating distributions. In addition, we provide simulation studies establishing that this method is generally more powerful in finite samples than our previously proposed augmentation multiple testing procedure (van der Laan, Dudoit, & Pollard (2004b)) and competing procedures from the literature. Finally, we illustrate our methodology with a data analysis.
    Type of Medium: Electronic Resource
  • 5
    Electronic Resource
    Berkeley, Calif. : Berkeley Electronic Press (now: De Gruyter)
    Statistical applications in genetics and molecular biology 6.2007, 1, art28 
    ISSN: 1544-6115
    Source: Berkeley Electronic Press Academic Journals
    Topics: Biology
    Notes: Microarray studies often need to simultaneously examine thousands of genes to determine which are differentially expressed. One main challenge in those studies is to find suitable multiple testing procedures that provide accurate control of the error rates of interest and meanwhile are most powerful, that is, they return the longest list of truly interesting genes among competitors. Many multiple testing methods have been developed recently for microarray data analysis, especially resampling based methods, such as permutation methods, the null-centered and scaled bootstrap (NCSB) method, and the quantile-transformed-bootstrap-distribution (QTBD) method. Each of these methods has its own merits and limitations. Theoretically permutation methods can fail to provide accurate control of Type I errors when the so-called subset pivotality condition is violated. The NCSB method does not suffer from that limitation, but an impractical number of bootstrap samples are often needed to get proper control of Type I errors. The newly developed QTBD method has the virtues of providing accurate control of Type I errors under few restrictions. However, the relative practical performance of the above three types of multiple testing methods remains unresolved. This paper compares the above three resampling based methods according to the control of family wise error rates (FWER) through data simulations. Results show that among the three resampling based methods, the QTBD method provides relatively accurate and powerful control in more general circumstances.
    Type of Medium: Electronic Resource
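    Example (not part of the catalog record): the abstract above mentions resampling-based FWER control with a null-centered and scaled bootstrap. The sketch below shows one common form of that idea, a single-step max-T procedure on one-sample t-statistics. The function names, the choice of test statistic, the null mean of 0 and null variance cap of 1, and the single-step (rather than step-down) cutoff are assumptions for illustration and are not taken from the article.

```python
import numpy as np

def t_stats(X):
    # One-sample t-statistic per column, testing mean = 0.
    n = X.shape[0]
    return np.sqrt(n) * X.mean(axis=0) / X.std(axis=0, ddof=1)

def maxT_bootstrap(X, alpha=0.05, B=1000, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    T = t_stats(X)
    # Nonparametric bootstrap: resample rows with replacement, recompute statistics.
    Tb = np.array([t_stats(X[rng.integers(0, n, n)]) for _ in range(B)])
    # Center each column at the assumed null mean 0 and shrink variances above 1.
    scale = np.sqrt(np.minimum(1.0, 1.0 / Tb.var(axis=0)))
    Z = scale * (Tb - Tb.mean(axis=0))
    # Single-step max-T cutoff: (1 - alpha) quantile of the maximum absolute statistic.
    cutoff = np.quantile(np.abs(Z).max(axis=1), 1 - alpha)
    return np.flatnonzero(np.abs(T) > cutoff), cutoff
```

    `maxT_bootstrap(X)` takes an n-by-m data matrix and returns the indices of rejected hypotheses together with the common cutoff.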
  • 6
    Electronic Resource
    350 Main Street, Malden, MA 02148, USA, and 108 Cowley Road, Oxford OX4 1JF, UK : Blackwell Publishing
    Risk analysis 22 (2002), S. 0 
    ISSN: 1539-6924
    Source: Blackwell Publishing Journal Backfiles 1879-2005
    Topics: Energy, Environment Protection, Nuclear Power Engineering
    Notes: Probability models incorporating a deterministic versus stochastic infectious dose are described for estimating infection risk due to airborne pathogens that infect at low doses. Such pathogens can be occupational hazards or candidate agents for bioterrorism. Inputs include parameters for the infectious dose model, distribution parameters for ambient pathogen concentrations, the breathing rate, the duration of an exposure period, the anticipated number of exposure periods, and, if a respirator device is used, distribution parameters for respirator penetration values. Application of the models is illustrated with a hypothetical scenario involving exposure to Coccidioides immitis, a fungus present in soil in areas of the southwestern United States. Inhaling C. immitis spores causes a respiratory tract infection and is a recognized occupational hazard in jobs involving soil dust exposure in endemic areas. An uncertainty analysis is applied to risk estimation in the context of selecting respiratory protection with a desired degree of efficacy.
    Type of Medium: Electronic Resource
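    Example (not part of the catalog record): the abstract above lists the inputs to the risk models (concentration distribution, breathing rate, exposure duration, number of exposure periods, respirator penetration). The Monte Carlo sketch below combines those inputs under assumptions that are not from the article: lognormal concentration and penetration, an expected inhaled dose equal to concentration times breathing rate times duration times penetration, and a single-hit exponential dose-response with per-organism infection probability `alpha`. It covers only one stochastic-dose form and does not attempt the deterministic-dose model. All parameter names are hypothetical.

```python
import numpy as np

def infection_risk(alpha, conc_gm, conc_gsd, breathing_rate, duration,
                   n_periods, pen_gm=1.0, pen_gsd=1.0, n_sim=10000, seed=0):
    rng = np.random.default_rng(seed)
    shape = (n_sim, n_periods)
    # Assumed lognormal ambient concentration (organisms per unit air volume).
    conc = rng.lognormal(np.log(conc_gm), np.log(conc_gsd), shape)
    # Assumed lognormal respirator penetration, capped at 1 (no respirator: 1.0).
    pen = np.minimum(rng.lognormal(np.log(pen_gm), np.log(pen_gsd), shape), 1.0)
    # Expected number of organisms inhaled in each exposure period.
    dose = conc * breathing_rate * duration * pen
    # Single-hit exponential model applied to the cumulative expected dose.
    risk = 1.0 - np.exp(-alpha * dose.sum(axis=1))
    return risk.mean()
```

    Setting both geometric standard deviations to 1 removes the randomness, so the result reduces to `1 - exp(-alpha * conc_gm * breathing_rate * duration * pen_gm * n_periods)`, which gives a quick sanity check.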
  • 7
    Electronic Resource
    Springer
    Lifetime data analysis 3 (1997), S. 77-91 
    ISSN: 1572-9249
    Source: Springer Online Journal Archives 1860-2000
    Topics: Mathematics
    Notes: In biostatistical applications interest often focuses on the estimation of the distribution of time T between two consecutive events. If the initial event time is observed and the subsequent event time is only known to be larger or smaller than an observed monitoring time C, then the data conforms to the well understood singly-censored current status model, also known as interval censored data, case I. Additional covariates can be used to allow for dependent censoring and to improve estimation of the marginal distribution of T. Assuming a wrong model for the conditional distribution of T, given the covariates, will lead to an inconsistent estimator of the marginal distribution. On the other hand, the nonparametric maximum likelihood estimator (NPMLE) of F_T requires splitting up the sample in several subsamples corresponding with a particular value of the covariates, computing the NPMLE for every subsample and then taking an average. With a few continuous covariates the performance of the resulting estimator is typically miserable. In van der Laan, Robins (1996) a locally efficient one-step estimator is proposed for smooth functionals of the distribution of T, assuming nothing about the conditional distribution of T, given the covariates, but assuming a model for censoring, given the covariates. The estimators are asymptotically linear if the censoring mechanism is estimated correctly. The estimator also uses an estimator of the conditional distribution of T, given the covariates. If this estimate is consistent, then the estimator is efficient and if it is inconsistent, then the estimator is still consistent and asymptotically normal. In this paper we show that the estimators can also be used to estimate the distribution function in a locally optimal way. Moreover, we show that the proposed estimator can be used to estimate the distribution based on interval censored data (T is now known to lie between two observed points) in the presence of covariates. The resulting estimator also has a known influence curve so that asymptotic confidence intervals are directly available. In particular, one can apply our proposal to the interval censored data without covariates. In Geskus (1992) the information bound for interval censored data with two uniformly distributed monitoring times at the uniform distribution (for T) has been computed. We show that the relative efficiency of our proposal w.r.t. this optimal bound equals 0.994, which is also reflected in finite sample simulations. Finally, the good practical performance of the estimator is shown in a simulation study.
    Type of Medium: Electronic Resource
  • 8
    Electronic Resource
    Springer
    Lifetime data analysis 6 (2000), S. 237-250 
    ISSN: 1572-9249
    Keywords: right-censored data ; reporting delays ; influence curve ; Kaplan-Meier estimator
    Source: Springer Online Journal Archives 1860-2000
    Topics: Mathematics
    Notes: In disease registries there can be a delay between the death of a subject and the reporting of this death to the data analyst. If researchers use the Kaplan-Meier estimator and implicitly assume that subjects who have yet to have death reported are still alive, i.e. are censored at the time of analysis, the Kaplan-Meier estimator is typically inconsistent. Assuming censoring is independent of failure, we provide a simple estimator that is consistent and asymptotically efficient. We also provide estimates of the asymptotic variance of our estimator and simulations that demonstrate the favorable performance of these estimators. Finally, we demonstrate our methods by analyzing AIDS survival data. This analysis underscores the pitfalls of not accounting for delay when estimating the survival distribution and suggests a significant reduction in bias by using our estimator.
    Type of Medium: Electronic Resource
  • 9
    Publication Date: 2009-03-15
    Print ISSN: 0013-936X
    Electronic ISSN: 1520-5851
    Topics: Chemistry and Pharmacology , Energy, Environment Protection, Nuclear Power Engineering