ALBERT

All Library Books, journals and Electronic Records Telegrafenberg

  • 1
    Electronic Resource
    Berkeley, Calif. : Berkeley Electronic Press (now: De Gruyter)
    Statistical applications in genetics and molecular biology 3.2004, 1, art4 
    ISSN: 1544-6115
    Source: Berkeley Electronic Press Academic Journals
    Topics: Biology
    Notes: Likelihood-based cross-validation is a statistical tool for selecting a density estimate based on n i.i.d. observations from the true density among a collection of candidate density estimators. General examples are the selection of a model indexing a maximum likelihood estimator, and the selection of a bandwidth indexing a nonparametric (e.g. kernel) density estimator. In this article, we establish a finite sample result for a general class of likelihood-based cross-validation procedures (as indexed by the type of sample splitting used, e.g. V-fold cross-validation). This result implies that the cross-validation selector performs asymptotically as well (with respect to the Kullback-Leibler distance to the true density) as a benchmark model selector which is optimal for each given dataset and depends on the true density. Crucial conditions of our theorem are that the size of the validation sample converges to infinity, which excludes leave-one-out cross-validation, and that the candidate density estimates are bounded away from zero and infinity. We illustrate these asymptotic results and the practical performance of likelihood-based cross-validation for the purpose of bandwidth selection with a simulation study. Moreover, we use likelihood-based cross-validation in the context of regulatory motif detection in DNA sequences.
    Type of Medium: Electronic Resource
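The bandwidth-selection use case described in the abstract above can be illustrated with a minimal sketch. It is not the authors' code: the Gaussian kernel, the function names, the default of 5 folds, and the small `floor` used to keep the density estimate away from zero are all assumptions made for illustration.

```python
import numpy as np


def gaussian_kde_density(train, points, bandwidth):
    """Gaussian kernel density estimate fitted on `train`, evaluated at `points`."""
    diffs = (points[:, None] - train[None, :]) / bandwidth
    kernel = np.exp(-0.5 * diffs**2) / np.sqrt(2.0 * np.pi)
    return kernel.mean(axis=1) / bandwidth


def cv_log_likelihood(data, bandwidth, n_folds=5, seed=0, floor=1e-12):
    """Average validation log-likelihood under V-fold sample splitting."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(data)), n_folds)
    total = 0.0
    for validation_idx in folds:
        train_mask = np.ones(len(data), dtype=bool)
        train_mask[validation_idx] = False
        density = gaussian_kde_density(
            data[train_mask], data[validation_idx], bandwidth
        )
        # Assumption: floor the estimate so the log stays finite.
        total += np.log(np.maximum(density, floor)).sum()
    return total / len(data)


def select_bandwidth(data, candidates, n_folds=5, seed=0):
    """Return the candidate bandwidth with the highest cross-validated log-likelihood."""
    scores = [cv_log_likelihood(data, h, n_folds, seed) for h in candidates]
    return candidates[int(np.argmax(scores))], scores


if __name__ == "__main__":
    sample = np.random.default_rng(1).normal(size=200)
    best, scores = select_bandwidth(sample, [0.01, 0.3, 5.0])
    print("selected bandwidth:", best)
```

Each fold keeps a validation sample of n/V observations, so the validation size grows with n, in line with the condition that rules out leave-one-out splitting.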
  • 2
    Electronic Resource
    Berkeley, Calif. : Berkeley Electronic Press (now: De Gruyter)
    Statistical applications in genetics and molecular biology 5.2006, 1, art19 
    ISSN: 1544-6115
    Source: Berkeley Electronic Press Academic Journals
    Topics: Biology
    Notes: Consider the standard multiple testing problem where many hypotheses are to be tested, each hypothesis is associated with a test statistic, and large test statistics provide evidence against the null hypotheses. One proposal to provide probabilistic control of Type-I errors is the use of procedures ensuring that the expected number of false positives does not exceed a user-supplied threshold. Among such multiple testing procedures, we derive the most powerful method, meaning the test statistic cutoffs that maximize the expected number of true positives. Unfortunately, these optimal cutoffs depend on the true unknown data generating distribution, so could never be used in a practical setting. We instead consider splitting the sample so that the optimal cutoffs are estimated from a portion of the data, and then testing on the remaining data using these estimated cutoffs. When the null distributions for all test statistics are the same, the obvious way to control the expected number of false positives would be to use a common cutoff for all tests. In this work, we consider the common cutoff method as a benchmark multiple testing procedure. We show that in certain circumstances the use of estimated optimal cutoffs via sample splitting can dramatically outperform this benchmark method, resulting in increased true discoveries, while retaining Type-I error control. This paper is an updated version of the work presented in Rubin et al. (2005), later expanded upon by Wasserman and Roeder (2006).
    Type of Medium: Electronic Resource
  • 3
    Electronic Resource
    Berkeley, Calif. : Berkeley Electronic Press (now: De Gruyter)
    Statistical applications in genetics and molecular biology 3.2004, 1, art13 
    ISSN: 1544-6115
    Source: Berkeley Electronic Press Academic Journals
    Topics: Biology
    Notes: The present article proposes general single-step multiple testing procedures for controlling Type I error rates defined as arbitrary parameters of the distribution of the number of Type I errors, such as the generalized family-wise error rate. A key feature of our approach is the test statistics null distribution (rather than data generating null distribution) used to derive cut-offs (i.e., rejection regions) for these test statistics and the resulting adjusted p-values. For general null hypotheses, corresponding to submodels for the data generating distribution, we identify an asymptotic domination condition for a null distribution under which single-step common-quantile and common-cut-off procedures asymptotically control the Type I error rate, for arbitrary data generating distributions, without the need for conditions such as subset pivotality. Inspired by this general characterization of a null distribution, we then propose as an explicit null distribution the asymptotic distribution of the vector of null value shifted and scaled test statistics. In the special case of family-wise error rate (FWER) control, our method yields the single-step minP and maxT procedures, based on minima of unadjusted p-values and maxima of test statistics, respectively, with the important distinction in the choice of null distribution. Single-step procedures based on consistent estimators of the null distribution are shown to also provide asymptotic control of the Type I error rate. A general bootstrap algorithm is supplied to conveniently obtain consistent estimators of the null distribution. The special cases of t- and F-statistics are discussed in detail. 
    The companion articles focus on step-down multiple testing procedures for control of the FWER (van der Laan et al., 2004b) and on augmentations of FWER-controlling methods to control error rates such as tail probabilities for the number of false positives and for the proportion of false positives among the rejected hypotheses (van der Laan et al., 2004a). The proposed bootstrap multiple testing procedures are evaluated by a simulation study and applied to genomic data in the fourth article of the series (Pollard et al., 2004).
    Type of Medium: Electronic Resource
  • 4
    Electronic Resource
    Berkeley, Calif. : Berkeley Electronic Press (now: De Gruyter)
    Statistical applications in genetics and molecular biology 2.2003, 1, art5 
    ISSN: 1544-6115
    Source: Berkeley Electronic Press Academic Journals
    Topics: Biology
    Notes: Identification of transcription factor binding sites (regulatory motifs) is a major interest in contemporary biology. We propose a new likelihood based method, COMODE, for identifying structural motifs in DNA sequences. Commonly used methods (e.g. MEME, Gibbs motif sampler) model binding sites as families of sequences described by a position weight matrix (PWM) and identify PWMs that maximize the likelihood of observed sequence data under a simple multinomial mixture model. This model assumes that the positions of the PWM correspond to independent multinomial distributions with four cell probabilities. We address supervising the search for DNA binding sites using the information derived from structural characteristics of protein-DNA interactions. We extend the simple multinomial mixture model to a constrained multinomial mixture model by incorporating constraints on the information content profiles or on specific parameters of the motif PWMs. The parameters of this extended model are estimated by maximum likelihood using a nonlinear constraint optimization method. Likelihood-based cross-validation is used to select model parameters such as motif width and constraint type. The performance of COMODE is compared with existing motif detection methods on simulated data that incorporate real motif examples from Saccharomyces cerevisiae. The proposed method is especially effective when the motif of interest appears as a weak signal in the data. Some of the transcription factor binding data of Lee et al. (2002) were also analyzed using COMODE and biologically verified sites were identified.
    Type of Medium: Electronic Resource
  • 5
    Electronic Resource
    Berkeley, Calif. : Berkeley Electronic Press (now: De Gruyter)
    Statistical applications in genetics and molecular biology 3.2004, 1, art15 
    ISSN: 1544-6115
    Source: Berkeley Electronic Press Academic Journals
    Topics: Biology
    Notes: This article shows that any single-step or stepwise multiple testing procedure (asymptotically) controlling the family-wise error rate (FWER) can be augmented into procedures that (asymptotically) control tail probabilities for the number of false positives and the proportion of false positives among the rejected hypotheses. Specifically, given any procedure that (asymptotically) controls the FWER at level alpha, we propose simple augmentation procedures that provide (asymptotic) level-alpha control of: (i) the generalized family-wise error rate, i.e., the tail probability, gFWER(k), that the number of Type I errors exceeds a user-supplied integer k, and (ii) the tail probability, TPPFP(q), that the proportion of Type I errors among the rejected hypotheses exceeds a user-supplied value 0<q<1. Existing approaches for control of the proportion of false positives typically rely on the assumption that the test statistics are independent, while our proposed augmentation procedures control the gFWER and TPPFP for general data generating distributions, with arbitrary dependence structures among variables. Applying the augmentation methods to step-down multiple testing procedures that control the FWER asymptotically exactly at level alpha (van der Laan et al., 2004), yields procedures that also provide exact asymptotic control of the gFWER and TPPFP at level alpha. The adjusted p-values for the gFWER and TPPFP-controlling augmentation procedures are shown to be simple functions of the adjusted p-values for the original FWER-controlling procedure. Finally, two simple conservative procedures are proposed for controlling the false discovery rate.
    Type of Medium: Electronic Resource
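The abstract above does not spell out the augmentation rules, so the following is only a sketch under a stated assumption: the hypotheses rejected by the FWER-controlling procedure are kept, and the next most significant hypotheses are added (k of them for gFWER(k); for TPPFP(q), as many as keep the added fraction of all rejections at or below q). The function names and the input format (a list of FWER-adjusted p-values) are hypothetical.

```python
def _ranked(p_adjusted):
    """Hypothesis indices ordered from most to least significant."""
    return sorted(range(len(p_adjusted)), key=lambda i: p_adjusted[i])


def augment_gfwer(p_adjusted, alpha, k):
    """Assumed gFWER(k) augmentation: FWER rejections plus the next k hypotheses."""
    order = _ranked(p_adjusted)
    r0 = sum(p <= alpha for p in p_adjusted)  # initial FWER rejections
    return order[: min(r0 + k, len(order))]


def augment_tppfp(p_adjusted, alpha, q):
    """Assumed TPPFP(q) augmentation: add the largest m with m / (m + r0) <= q."""
    order = _ranked(p_adjusted)
    r0 = sum(p <= alpha for p in p_adjusted)
    extra = 0
    for m in range(1, len(order) - r0 + 1):
        if m / (m + r0) <= q:
            extra = m
        else:
            break  # the ratio increases with m, so no larger m can qualify
    return order[: r0 + extra]


if __name__ == "__main__":
    adjusted = [0.001, 0.2, 0.04, 0.6, 0.03, 0.9]
    print(augment_gfwer(adjusted, alpha=0.05, k=1))
    print(augment_tppfp(adjusted, alpha=0.05, q=0.3))
```

Because both rules only need the ranking and the count of initial rejections, they can be applied on top of any procedure that returns FWER-adjusted p-values.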
  • 6
    Electronic Resource
    Berkeley, Calif. : Berkeley Electronic Press (now: De Gruyter)
    Statistical applications in genetics and molecular biology 3.2004, 1, art14 
    ISSN: 1544-6115
    Source: Berkeley Electronic Press Academic Journals
    Topics: Biology
    Notes: The present article proposes two step-down multiple testing procedures for asymptotic control of the family-wise error rate (FWER): the first procedure is based on maxima of test statistics (step-down maxT), while the second relies on minima of unadjusted p-values (step-down minP). A key feature of our approach is the characterization and construction of a test statistics null distribution (rather than data generating null distribution) for deriving cut-offs for these test statistics (i.e., rejection regions) and the resulting adjusted p-values. For general null hypotheses, corresponding to submodels for the data generating distribution, we identify an asymptotic domination condition for a null distribution under which the step-down maxT and minP procedures asymptotically control the Type I error rate, for arbitrary data generating distributions, without the need for conditions such as subset pivotality. Inspired by this general characterization, we then propose as an explicit null distribution the asymptotic distribution of the vector of null value shifted and scaled test statistics. Step-down procedures based on consistent estimators of the null distribution are shown to also provide asymptotic control of the Type I error rate. A general bootstrap algorithm is supplied to conveniently obtain consistent estimators of the null distribution.
    Type of Medium: Electronic Resource
  • 7
    Electronic Resource
    [s.l.] : Nature America, Inc.
    Nature genetics 23 (1999), p. 42
    ISSN: 1546-1718
    Source: Nature Archives 1869 - 2009
    Topics: Biology, Medicine
    Notes: [Excerpt] We explore an ensemble of multivariate statistical methods for the analysis of gene expression data from cDNA microarray experiments. The statistical questions we investigate are motivated by the experimental program carried out in the laboratories of professors Brown and Botstein at Stanford ...
    Type of Medium: Electronic Resource
  • 8
  • 9
    Publication Date: 1999-11-01
    Print ISSN: 1061-4036
    Electronic ISSN: 1546-1718
    Topics: Biology, Medicine
    Published by Springer Nature
  • 10
    Publication Date: 2017-05-15
    Print ISSN: 1548-7091
    Electronic ISSN: 1548-7105
    Topics: Biology, Medicine
    Published by Springer Nature