ALBERT

All Library Books, journals and Electronic Records Telegrafenberg

  • 1
    Publication Date: 2013-02-24
    Description: Gaussian processes are widely used in nonparametric regression, classification and spatiotemporal modelling, facilitated in part by a rich literature on their theoretical properties. However, one of their practical limitations is expensive computation, typically on the order of $n^3$ where $n$ is the number of data points, in performing the necessary matrix inversions. For large datasets, storage and processing also lead to computational bottlenecks, and numerical stability of the estimates and predicted values degrades with increasing $n$. Various methods have been proposed to address these problems, including predictive processes in spatial data analysis and the subset-of-regressors technique in machine learning. The idea underlying these approaches is to use a subset of the data, but this raises questions concerning sensitivity to the choice of subset and limitations in estimating fine-scale structure in regions that are not well covered by the subset. Motivated by the literature on compressive sensing, we propose an alternative approach that involves linear projection of all the data points onto a lower-dimensional subspace. We demonstrate the superiority of this approach from a theoretical perspective and through simulated and real data examples.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
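    A minimal sketch of the projection idea, written for this listing rather than taken from the paper: the data vector y is compressed to Φy with a random m × n matrix Φ, so the predictive mean k*ᵀΦᵀ(Φ(K + σ²I)Φᵀ)⁻¹Φy requires only an m × m solve. The squared-exponential kernel, the Gaussian entries of Φ, the noise variance and all function names are illustrative assumptions. For clarity the sketch still forms the full n × n covariance, so it shows the algebra rather than the computational savings.

```python
import math
import random


def rbf(a, b, length_scale=1.0):
    """Squared-exponential covariance between scalar inputs (assumed kernel)."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    m = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, m):
            factor = M[r][col] / M[col][col]
            for c in range(col, m + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        x[r] = (M[r][m] - sum(M[r][c] * x[c] for c in range(r + 1, m))) / M[r][r]
    return x


def compressed_gp_mean(xs, ys, x_new, phi, noise_var=0.1):
    """Predictive mean k*^T Phi^T (Phi (K + noise_var I) Phi^T)^{-1} Phi y."""
    n = len(xs)
    A = [[rbf(xs[i], xs[j]) + (noise_var if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    # Only this m x m system is solved; m = len(phi) projected observations.
    C = matmul(matmul(phi, A), transpose(phi))
    phi_y = [sum(p[j] * ys[j] for j in range(n)) for p in phi]
    alpha = solve(C, phi_y)
    k_star = [rbf(x_new, x) for x in xs]
    phi_k = [sum(p[j] * k_star[j] for j in range(n)) for p in phi]
    return sum(a * b for a, b in zip(phi_k, alpha))


# Usage: compress 40 noise-free observations of sin(x) to m = 10 projections.
random.seed(0)
xs = [i / 10 for i in range(40)]
ys = [math.sin(x) for x in xs]
m = 10
phi = [[random.gauss(0.0, 1.0 / math.sqrt(m)) for _ in xs] for _ in range(m)]
print(compressed_gp_mean(xs, ys, 1.55, phi))
```

    With a square invertible Φ the projected mean coincides with the ordinary GP predictive mean, which makes a convenient sanity check.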
  • 2
    Oxford University Press
    Publication Date: 2013-02-24
    Description: Karl Pearson edited Biometrika for the first 35 years of its existence. Not only did he shape the journal, he also contributed over 200 pieces and inspired, more or less directly, most of the other contributions. The journal could not be separated from the man.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
  • 3
    Publication Date: 2013-02-24
    Description: In the modelling of longitudinal data from several groups, appropriate handling of the dependence structure is of central importance. Standard methods include specifying a single covariance matrix for all groups or independently estimating the covariance matrix for each group without regard to the others, but when these model assumptions are incorrect, these techniques can lead to biased mean effects or loss of efficiency, respectively. Thus, it is desirable to develop methods for simultaneously estimating the covariance matrix for each group that will borrow strength across groups in a way that is ultimately informed by the data. In addition, for several groups with covariance matrices of even medium dimension, it is difficult to manually select a single best parametric model among the huge number of possibilities given by incorporating structural zeros and/or commonality of individual parameters across groups. In this paper we develop a family of nonparametric priors using the matrix stick-breaking process of Dunson et al. (2008) that seeks to accomplish this task by parameterizing the covariance matrices in terms of their modified Cholesky decompositions (Pourahmadi, 1999). We establish some theoretical properties of these priors, examine their effectiveness via a simulation study, and illustrate the priors using data from a longitudinal clinical trial.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
  • 4
    Publication Date: 2013-02-24
    Description: Full Bayesian computational inference for model determination in undirected graphical models is currently restricted to decomposable graphs or other special cases, except for small-scale problems, say up to 15 variables. In this paper we develop new, more efficient methodology for such inference, by making two contributions to the computational geometry of decomposable graphs. The first of these provides sufficient conditions under which it is possible to completely connect two disconnected complete subsets of vertices, or perform the reverse procedure, yet maintain decomposability of the graph. The second is a new Markov chain Monte Carlo sampler for arbitrary positive distributions on decomposable graphs, taking a junction tree representing the graph as its state variable. The resulting methodology is illustrated with numerical experiments on three models.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
  • 5
    Oxford University Press
    Publication Date: 2013-02-24
    Description: In longitudinal data analysis, statistical inference for sparse data and dense data could be substantially different. For kernel smoothing, the estimate of the mean function, the convergence rates and the limiting variance functions are different in the two scenarios. This phenomenon poses challenges for statistical inference, as a subjective choice between the sparse and dense cases may lead to wrong conclusions. We develop methods based on self-normalization that can adapt to the sparse and dense cases in a unified framework. Simulations show that the proposed methods outperform some existing methods.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
  • 6
    Publication Date: 2013-02-24
    Description: The problem of testing smooth components of an extended generalized additive model for equality to zero is considered. Confidence intervals for such components exhibit good across-the-function coverage probabilities if based on the approximate result $\hat{f} \sim N(f, V_f)$, where $f$ is the vector of evaluated values for the smooth component of interest and $V_f$ is the covariance matrix for $f$ according to the Bayesian view of the smoothing process. Based on this result, a Wald-type test of $f = 0$ is proposed. It is shown that care must be taken in selecting the rank used in the test statistic. The method complements previous work by extending applicability beyond the Gaussian case, while considering tests of zero effect rather than testing the parametric hypothesis given by the null space of the component’s smoothing penalty. The proposed $p$-values are routine and efficient to compute from a fitted model, without requiring extra model fits or null distribution simulation.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
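    One generic way to write a rank-r Wald-type statistic of this kind is sketched below; it is not the paper's exact construction, which treats the choice of rank more carefully. Here $\hat{f}$ denotes the estimated vector of evaluated values and $(\lambda_i, u_i)$, $i = 1, \dots, r$, are the $r$ leading eigenpairs of $V_f$ (notation assumed for illustration):

```latex
T_r = \hat{f}^{\mathsf{T}} V_f^{r-} \hat{f},
\qquad
V_f^{r-} = \sum_{i=1}^{r} \lambda_i^{-1} u_i u_i^{\mathsf{T}},
\qquad
T_r \overset{\text{approx.}}{\sim} \chi^2_r \quad \text{under } f = 0 .
```

    The $\chi^2_r$ reference follows if $\hat{f}$ is approximately $N(0, V_f)$ under the null, because the projections $u_i^{\mathsf{T}} \hat{f}$ are then independent $N(0, \lambda_i)$ variables. Small eigenvalues $\lambda_i$ inflate the statistic, which is one reason the rank $r$ matters.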
  • 7
    Publication Date: 2013-02-24
    Description: Motivated by the analysis of genetical genomics data, we introduce a sparse high-dimensional multivariate regression model for studying conditional independence relationships among a set of genes adjusting for possible genetic effects. The precision matrix in the model specifies a covariate-adjusted Gaussian graph, which represents the conditional dependence structure of gene expression after the confounding genetic effects on gene expression are taken into account. We present a covariate-adjusted precision matrix estimation method using a constrained $\ell_1$ minimization, which can be easily implemented by linear programming. Asymptotic convergence rates in various matrix norms and sign consistency are established for the estimators of the regression coefficients and the precision matrix, allowing both the number of genes and the number of the genetic variants to diverge. Simulation shows that the proposed method results in significant improvements in both precision matrix estimation and graphical structure selection when compared to the standard Gaussian graphical model assuming constant means. The proposed method is applied to yeast genetical genomics data for the identification of the gene network among a set of genes in the mitogen-activated protein kinase pathway.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
  • 8
    Publication Date: 2013-02-24
    Description: This paper introduces, constructs and studies a new class of arrays, called strong orthogonal arrays, as suitable designs for computer experiments. A strong orthogonal array of strength t enjoys better space-filling properties than a comparable orthogonal array in all dimensions lower than t while retaining the space-filling properties of the latter in t dimensions. Latin hypercubes based on strong orthogonal arrays of strength t are more space-filling than comparable orthogonal array-based Latin hypercubes in all g dimensions for any 2 ≤ g ≤ t – 1.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
  • 9
    Publication Date: 2013-02-24
    Description: This paper considers the construction of blocked two-level regular designs with weak minimum aberration. We first obtain the minimum value of the number of two-factor interactions which are aliased with the block effects. Based on this result, two methods are then proposed in two different scenarios to construct weak minimum aberration blocked two-level designs with respect to some existing combined wordlength patterns.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
  • 10
    Publication Date: 2013-02-24
    Description: Rathbun et al. (2007) and Waagepetersen (2008) propose estimating functions for the parameters of a Poisson point process intensity that may be applied when space- and/or time-varying covariates are sampled from a probability-based sampling design. This paper demonstrates that Waagepetersen’s estimating function is optimal in a class of weighted estimating functions.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
  • 11
    Publication Date: 2013-02-24
    Description: We show that the proportional likelihood ratio model proposed recently by Luo & Tsai (2012) enjoys model-invariant properties under certain forms of nonignorable missing mechanisms and randomly double-truncated data, so that target parameters in the population can be estimated consistently from those biased samples. We also construct an alternative estimator for the target parameters by maximizing a pseudolikelihood that eliminates a functional nuisance parameter in the model. The corresponding estimating equation has a U-statistic structure. As an added advantage of the proposed method, a simple score-type test is developed to test a null hypothesis on the regression coefficients. Simulations show that the proposed estimator has a small-sample efficiency similar to that of the nonparametric likelihood estimator and performs well for certain nonignorable missing data problems.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
  • 12
    Oxford University Press
    Publication Date: 2013-02-24
    Description: This paper considers benchmarking issues in the context of small area estimation. We find optimal estimators within the class of benchmarked linear estimators under linear constraints. This extends existing results for external and internal benchmarking, and also links the two. Necessary and sufficient conditions for self-benchmarking are found for an augmented model. Most results of this paper are found using ideas of orthogonal projection.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
  • 13
    Publication Date: 2013-02-24
    Description: Suppose we are interested in the effect of a binary treatment on an outcome where that relationship is confounded by an ordinal confounder. We assume that the true confounder is not observed but, rather, we observe a nondifferentially mismeasured version of it. We show that, under certain monotonicity assumptions about its effect on the treatment and on the outcome, an effect measure controlling for the mismeasured confounder will fall between the corresponding crude and true effect measures. We also present results for coarsened and, under further assumptions, multiple misclassified confounders.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
  • 14
    Publication Date: 2013-02-24
    Description: Applying concepts from partial identification to the domain of finite population sampling, we propose a method for interval estimation of a population mean when the probabilities of sample selection lie within a posited interval. The interval estimate is derived from sharp bounds on the Hájek (1971) estimator of the population mean. We demonstrate the method’s utility for sensitivity analysis by applying it to a sample of needles collected as part of a syringe tracking and testing programme in New Haven, Connecticut.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
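    A brute-force sketch of the bounding idea, assuming each inclusion probability π_i is only known to lie in [a_i, b_i]; the function names and the split-point search are illustrative choices, not the paper's algorithm. The Hájek estimator Σ w_i y_i / Σ w_i with w_i = 1/π_i is a weighted mean, which is maximized by giving the smallest admissible weights to the smallest outcomes and the largest weights to the largest ones, so after sorting by y it suffices to try every split point (and symmetrically for the minimum).

```python
def hajek(ys, probs):
    """Hájek estimator: sum(y_i / pi_i) / sum(1 / pi_i)."""
    weights = [1.0 / p for p in probs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)


def hajek_bounds(ys, lower, upper):
    """Bounds on the Hájek estimate when pi_i is only known to lie in [lower_i, upper_i].

    The weights w_i = 1 / pi_i range over [1 / upper_i, 1 / lower_i]. After sorting
    by y, only the n + 1 split points between small and large weights are checked.
    """
    order = sorted(range(len(ys)), key=lambda i: ys[i])
    y = [ys[i] for i in order]
    w_small = [1.0 / upper[i] for i in order]
    w_large = [1.0 / lower[i] for i in order]
    n = len(y)

    def weighted_mean(w):
        return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)

    # Maximum: small weights on the k smallest outcomes, large weights on the rest.
    upper_bound = max(weighted_mean(w_small[:k] + w_large[k:]) for k in range(n + 1))
    # Minimum: the mirror image.
    lower_bound = min(weighted_mean(w_large[:k] + w_small[k:]) for k in range(n + 1))
    return lower_bound, upper_bound


# Hypothetical sample: one outcome per unit and a nominal inclusion probability,
# with each probability allowed to vary by +/- 20 percent.
ys = [1.0, 0.0, 1.0, 1.0, 0.0, 2.5]
pi = [0.2, 0.5, 0.4, 0.3, 0.6, 0.35]
lo_p = [0.8 * p for p in pi]
hi_p = [min(1.0, 1.2 * p) for p in pi]
print(hajek(ys, pi), hajek_bounds(ys, lo_p, hi_p))
```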
  • 15
    Publication Date: 2013-02-24
    Description: Since many environmental processes are spatial in extent, a single extreme event may affect several locations, and the spatial dependence must be taken into account in an appropriate way. This paper proposes a framework for conditional simulation of max-stable processes and gives closed forms for the regular conditional distributions of Brown–Resnick and Schlather processes. We test the method on simulated data and present applications to extreme rainfall around Zurich and extreme temperatures in Switzerland. The proposed framework provides accurate conditional simulations and can handle problems of realistic size.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
  • 16
    Publication Date: 2013-02-24
    Description: We study the nonparametric estimation of the cumulative incidence function and the cause-specific hazard function for current status data with competing risks via kernel smoothing. A smoothed naive nonparametric maximum likelihood estimator and a smoothed full nonparametric maximum likelihood estimator are shown to have pointwise asymptotic normality and faster convergence rates than the corresponding unsmoothed nonparametric likelihood estimators. Using the smoothed estimators and the plug-in principle, we can estimate the cause-specific hazard function, which has not been studied previously. We also propose semi-smoothed estimators of the cause-specific hazard as an alternative to the smoothed estimator and demonstrate that neither is uniformly more efficient than the other. Numerical studies show that a smoothed bootstrap method works well for selecting the bandwidths in the smoothed nonparametric estimation. The use of the estimators is exemplified by an application to cumulative incidence and hazard of subtype-specific HIV infection from a sero-prevalence study in injecting drug users in Thailand.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
  • 17
    Oxford University Press
    Publication Date: 2013-02-24
    Description: Highlights, trends and influences associated with the pages of Biometrika after the editorship of Karl Pearson are identified.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
  • 18
    Publication Date: 2013-02-24
    Description: Copy number variants are an important type of genetic structural variation appearing in germline DNA, ranging from common to rare in a population. Both rare and common copy number variants have been reported to be associated with complex diseases, so it is important to identify both simultaneously based on a large set of population samples. We develop a proportion adaptive segment selection procedure that automatically adjusts to the unknown proportions of the carriers of the segment variants. We characterize the detection boundary that separates the region where a segment variant is detectable by some method from the region where it cannot be detected. Although the detection boundaries are very different for the rare and common segment variants, it is shown that the proposed procedure can reliably identify both whenever they are detectable. Compared with methods for single-sample analysis, this procedure gains power by pooling information from multiple samples. The method is applied to analyse neuroblastoma samples and identifies a large number of copy number variants that are missed by single-sample methods.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
  • 19
    Publication Date: 2013-02-24
    Description: We derive sufficient conditions for the cross-correlation coefficient of a multivariate spatial process to vary with location when the spatial model is augmented with nugget effects. The derived class is valid for any choice of covariance functions, and yields substantial flexibility between multiple processes. The key is to identify the cross-correlation coefficient matrix with a contraction matrix, which can be either diagonal, implying a parsimonious formulation, or a fully general contraction matrix, yielding greater flexibility but added model complexity. We illustrate the approach with a bivariate minimum and maximum temperature dataset in Colorado, allowing the two variables to be positively correlated at low elevations and nearly independent at high elevations, while still yielding a positive definite covariance matrix.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
  • 20
    Publication Date: 2013-02-24
    Description: Variable screening techniques have been proposed to mitigate the impact of high dimensionality in classification problems, including t-test marginal screening (Fan & Fan, 2008) and maximum marginal likelihood screening (Fan & Song, 2010). However, these methods rely on strong modelling assumptions that are easily violated in real applications. To circumvent the parametric modelling assumptions, we propose a new variable screening technique for binary classification based on the Kolmogorov–Smirnov statistic. We prove that this so-called Kolmogorov filter enjoys the sure screening property under much weaker model assumptions. We supplement our theoretical study with a simulation study.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
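    A minimal sketch of a Kolmogorov-type marginal screen, assuming binary labels coded 0/1 and a fixed number d of retained features; both are hypothetical choices here, and the paper's thresholding rule and theory are not reproduced. Each feature is scored by the two-sample Kolmogorov–Smirnov statistic sup_x |F_0(x) − F_1(x)| between the two classes, and the top-scoring features are kept.

```python
def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic sup_x |F_a(x) - F_b(x)|."""
    best = 0.0
    # Both empirical CDFs only jump at observed values, so checking those suffices.
    for x in sorted(set(a) | set(b)):
        fa = sum(1 for v in a if v <= x) / len(a)
        fb = sum(1 for v in b if v <= x) / len(b)
        best = max(best, abs(fa - fb))
    return best


def kolmogorov_filter(X, labels, d):
    """Score each feature by the KS statistic between classes 0 and 1; keep the top d."""
    p = len(X[0])
    scores = []
    for j in range(p):
        class0 = [row[j] for row, c in zip(X, labels) if c == 0]
        class1 = [row[j] for row, c in zip(X, labels) if c == 1]
        scores.append(ks_statistic(class0, class1))
    ranked = sorted(range(p), key=lambda j: scores[j], reverse=True)
    return ranked[:d], scores


# Toy data: feature 0 separates the classes, feature 1 does not.
X = [[0.1, 5.0], [0.2, 5.1], [0.3, 4.9], [1.1, 5.0], [1.2, 4.8], [1.3, 5.2]]
labels = [0, 0, 0, 1, 1, 1]
keep, scores = kolmogorov_filter(X, labels, d=1)
print(keep, scores)
```

    Because the score depends only on the class-conditional distributions of each feature, it makes no parametric assumption about their form, which is the motivation stated in the abstract.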
  • 21
    Publication Date: 2013-02-24
    Description: Cumulative sum or cusum charts are typically used to detect a change in the distribution of a sequence of observations, e.g., shifts in the mean. Usually, after signalling, the chart is restarted by setting it to some value below the signalling threshold. We propose a non-restarting cusum chart which is able to detect periods during which the stream is out of control. Further, we advocate an upper boundary to prevent the cusum chart rising too high, which helps to detect a change back into control. We present an algorithm to control the false discovery rate when considering cusum charts based on multiple streams of data. We consider two definitions of a false discovery: signalling out-of-control when the observations have been in control since the start and signalling out-of-control when the observations have been in control since the last time the chart was at zero. We prove that the false discovery rate is controlled under both these definitions simultaneously. Simulations reveal the difference in false discovery rate control when using these and other desirable definitions of a false discovery.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
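    A minimal sketch of a non-restarting cusum with an upper boundary, using the simple increment x_t − k. The reference value k, threshold h, cap and function name are assumptions for illustration, and the paper's false discovery rate procedure over multiple streams is not shown. The statistic is never reset after a signal, and the cap keeps it low enough to fall back below h soon after the stream returns to control.

```python
def bounded_cusum(xs, k, h, upper):
    """Non-restarting cusum S_t = min(upper, max(0, S_{t-1} + x_t - k)).

    Returns the path of S_t and one flag per step marking S_t >= h, i.e. a
    period judged out of control rather than a single alarm time.
    """
    s = 0.0
    path, flags = [], []
    for x in xs:
        s = min(upper, max(0.0, s + x - k))  # no reset after signalling
        path.append(s)
        flags.append(s >= h)
    return path, flags


# Hypothetical stream: in control (mean 0), a shift to mean 3, then back to 0.
stream = [0.0] * 3 + [3.0] * 5 + [0.0] * 6
path, flags = bounded_cusum(stream, k=0.5, h=4.0, upper=6.0)
print(path)
print(flags)
```

    Without the cap the statistic in this example would climb to 12.5 and take far longer to drop below h after the shift ends, which is the effect the upper boundary is meant to prevent.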
  • 22
    Oxford University Press
    Publication Date: 2012-11-16
    Description: In a matched observational study of treatment effects, a sensitivity analysis asks about the magnitude of the departure from random assignment that would need to be present to alter the conclusions of an analysis that assumes that matching for measured covariates removes all bias. The reported degree of sensitivity to unmeasured biases depends on both the process that generated the data and the chosen methods of analysis, so a poor choice of method may lead to an exaggerated report of sensitivity to bias. This suggests the possibility of performing more than one analysis with a correction for multiple inference, say testing one null hypothesis using two or three different tests. In theory and in an example, it is shown that, in large samples, the gains from testing twice will often be large, because testing twice has the larger of the two design sensitivities of the component tests, and the losses due to correcting for two tests will often be small, because two tests of one hypothesis will typically be highly correlated, so a correction for multiple testing that takes this into account will be small. An illustration uses data from the U.S. National Health and Nutrition Examination Survey concerning lead in the blood of cigarette smokers.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
  • 23
    Publication Date: 2012-11-16
    Description: Inferences related to the second-order properties of functional data, as expressed by covariance structure, can become unreliable when the data are non-Gaussian or contain unusual observations. In the functional setting, it is often difficult to identify atypical observations, as their distinguishing characteristics can be manifold but subtle. In this paper, we introduce the notion of a dispersion operator, investigate its use in probing the second-order structure of functional data, and develop a test for comparing the second-order characteristics of two functional samples that is resistant to atypical observations and departures from normality. The proposed test is a regularized M-test based on a spectrally truncated version of the Hilbert–Schmidt norm of a score operator defined via the dispersion operator. We derive the asymptotic distribution of the test statistic, investigate the behaviour of the test in a simulation study and illustrate the method on a structural biology dataset.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
  • 24
    Publication Date: 2012-11-16
    Description: Linear classifiers are very popular, but can have limitations when classes have distinct subpopulations. General nonlinear kernel classifiers are very flexible, but do not give clear interpretations and may not be efficient in high dimensions. We propose the bidirectional discrimination classification method, which generalizes linear classifiers to two or more hyperplanes. This new family of classification methods gives much of the flexibility of a general nonlinear classifier while maintaining the interpretability, and much of the parsimony, of linear classifiers. They provide a new visualization tool for high-dimensional, low-sample-size data. Although the idea is generally applicable, we focus on the generalization of the support vector machine and distance-weighted discrimination methods. The performance and usefulness of the proposed method are assessed using asymptotics and demonstrated through analysis of simulated and real data. Our method leads to better classification performance in high-dimensional situations where subclusters are present in the data.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
  • 25
    Publication Date: 2012-11-16
    Description: Two transformations are proposed that give orthogonal components with a one-to-one correspondence between the original vectors and the components. The aim is that each component should be close to the vector with which it is paired, orthogonality imposing a constraint. The transformations lead to a variety of new statistical methods, including a unified approach to the identification and diagnosis of collinearities, a method of setting prior weights for Bayesian model averaging, and a means of calculating an upper bound for a multivariate Chebychev inequality. One transformation has the property that duplicating a vector has no effect on the orthogonal components that correspond to nonduplicated vectors, and is determined using a new algorithm that also provides the decomposition of a positive-definite matrix in terms of a diagonal matrix and a correlation matrix. The algorithm is shown to converge to a global optimum.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
  • 26
    Oxford University Press
    Publication Date: 2012-11-16
    Description: Scaled sparse linear regression jointly estimates the regression coefficients and noise level in a linear model. It chooses an equilibrium with a sparse regression method by iteratively estimating the noise level via the mean residual square and scaling the penalty in proportion to the estimated noise level. The iterative algorithm costs little beyond the computation of a path or grid of the sparse regression estimator for penalty levels above a proper threshold. For the scaled lasso, the algorithm is a gradient descent in a convex minimization of a penalized joint loss function for the regression coefficients and noise level. Under mild regularity conditions, we prove that the scaled lasso simultaneously yields an estimator for the noise level and an estimated coefficient vector satisfying certain oracle inequalities for prediction, the estimation of the noise level and the regression coefficients. These inequalities provide sufficient conditions for the consistency and asymptotic normality of the noise-level estimator, including certain cases where the number of variables is of greater order than the sample size. Parallel results are provided for least-squares estimation after model selection by the scaled lasso. Numerical results demonstrate the superior performance of the proposed methods over an earlier proposal of joint convex minimization.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
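    A minimal sketch of the scaled-lasso iteration, assuming a plain coordinate-descent lasso for the objective (1/2n)‖y − Xb‖² + λ‖b‖₁; the solver, starting value, stopping rule and function names are illustrative and not taken from the paper. The loop alternates a lasso fit at penalty λ = λ₀σ with the noise update σ = ‖y − Xb‖/√n until σ settles.

```python
import math


def soft_threshold(z, t):
    return math.copysign(max(abs(z) - t, 0.0), z)


def lasso_cd(X, y, lam, beta=None, sweeps=200):
    """Coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    beta = list(beta) if beta is not None else [0.0] * p
    resid = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            # Correlation of column j with the partial residual (j's own term added back).
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta, resid


def scaled_lasso(X, y, lam0, max_iter=100, tol=1e-10):
    """Alternate a lasso fit at penalty lam0 * sigma with sigma = ||y - X b|| / sqrt(n)."""
    n = len(y)
    sigma = math.sqrt(sum(v * v for v in y) / n)  # start from the all-zero fit
    beta = None
    for _ in range(max_iter):
        beta, resid = lasso_cd(X, y, lam0 * sigma, beta)
        new_sigma = math.sqrt(sum(r * r for r in resid) / n)
        converged = abs(new_sigma - sigma) < tol
        sigma = new_sigma
        if converged:
            break
    return beta, sigma


# Toy orthogonal design: y = 3 * x1 + e, with e orthogonal to every column.
x1 = [1, -1, 1, -1, 1, -1, 1, -1]
x2 = [1, 1, -1, -1, 1, 1, -1, -1]
x3 = [1, 1, 1, 1, -1, -1, -1, -1]
e = [0.2, 0.2, -0.2, -0.2, -0.2, -0.2, 0.2, 0.2]
X = [[a, b, c] for a, b, c in zip(x1, x2, x3)]
y = [3 * a + d for a, d in zip(x1, e)]
beta, sigma = scaled_lasso(X, y, lam0=0.5)
print(beta, sigma)
```

    For this toy design the fixed point can be solved by hand: the residual is λ₀σ·x1 + e, so σ² = λ₀²σ² + ‖e‖²/n and σ = 0.2/√(1 − λ₀²).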
  • 27
    Publication Date: 2012-11-16
    Description: In this article, we propose a regression method for simultaneous supervised clustering and feature selection over a given undirected graph, where homogeneous groups or clusters are estimated as well as informative predictors, with each predictor corresponding to one node in the graph and a connecting path indicating a priori possible grouping among the corresponding predictors. The method seeks a parsimonious model with high predictive power through identifying and collapsing homogeneous groups of regression coefficients. To address computational challenges, we present an efficient algorithm integrating the augmented Lagrange multipliers, coordinate descent and difference convex methods. We prove that the proposed method not only identifies the true homogeneous groups and informative features consistently but also leads to accurate parameter estimation. A gene network dataset is analysed to demonstrate that the method can make a difference by exploring dependency structures among the genes.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
  • 28
    Oxford University Press
    Publication Date: 2012-11-16
    Description: This article proposes a method of moments technique for estimating the sparsity of signals in a random sample. This involves estimating the largest eigenvalue of a large Hermitian trigonometric matrix under mild conditions. As illustration, the method is applied to two well-known problems. The first focuses on the sparsity of a large covariance matrix and the second investigates the sparsity of a sequence of signals observed with stationary, weakly dependent noise. Simulation shows that the proposed estimators can have significantly smaller mean absolute errors than their main competitors.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
  • 29
    Publication Date: 2012-11-16
    Description: We introduce a doubly stochastic marked point process model for supervised classification problems. Regardless of the number of classes or the dimension of the feature space, the model requires only 2–3 parameters for the covariance function. The classification criterion involves a permanental ratio for which an approximation using a polynomial-time cyclic expansion is proposed. The approximation is effective even if the feature region occupied by one class is a patchwork interlaced with regions occupied by other classes. An application to DNA microarray analysis indicates that the cyclic approximation is effective even for high-dimensional data. It can employ feature variables in an efficient way to reduce the prediction error significantly. This is critical when the true classification relies on nonreducible high-dimensional features.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
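The classification criterion in this abstract relies on a permanental ratio, and exact permanents are costly to evaluate, which motivates the approximation the authors propose. The sketch below is only a hedged illustration of that cost, not the paper's cyclic expansion: it computes an exact permanent with Ryser's inclusion–exclusion formula, whose running time grows roughly like 2^n · n², so it is practical only for small matrices. The function name `permanent_ryser` is hypothetical.

```python
from itertools import combinations


def permanent_ryser(a):
    """Exact permanent of a square matrix (list of rows) via Ryser's formula.

    perm(A) = (-1)^n * sum over nonempty column subsets S of
              (-1)^|S| * prod_i sum_{j in S} a[i][j]
    """
    n = len(a)
    if n == 0:
        return 1.0  # the permanent of the empty matrix is 1 by convention
    total = 0.0
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            prod = 1.0
            for row in a:
                prod *= sum(row[j] for j in subset)  # row sum restricted to S
            total += (-1) ** k * prod
    return (-1) ** n * total


# Usage: for a 2 x 2 matrix the permanent is a*d + b*c.
print(permanent_ryser([[1.0, 2.0], [3.0, 4.0]]))  # 10.0
```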
  • 30
    Publication Date: 2012-11-16
    Description: Researchers in the biological sciences nowadays often encounter the curse of dimensionality. To tackle this, sufficient dimension reduction aims to estimate the central subspace, in which all the necessary information supplied by the covariates regarding the response of interest is contained. Subsequent statistical analysis can then be made in a lower-dimensional space while preserving relevant information. Many studies are concerned with the transformed response rather than the original one, but they may have different central subspaces. When estimating the central subspace of the transformed response, direct methods will be inefficient. In this article, we propose a more efficient two-stage estimator of the central subspace of a transformed response. This approach is extended to censored responses and is applied to combining multiple biomarkers. Simulation studies and data examples support the superiority of the procedure.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
  • 31
    Publication Date: 2012-11-16
    Description: Transient semi-Markov processes have traditionally been used to describe the transitions of a patient through the various states of a multistate survival model. A survival distribution in this context is a sojourn through the states until passage to a fatal absorbing state or certain endpoint states. Using complete sojourn data, this paper shows how such survival distributions and associated hazard functions can be estimated nonparametrically and also how nonparametric bootstrap pointwise confidence bands can be constructed for them when patients are subject to independent right censoring from each state during the sojourn. Limitations to the estimability of such survival distributions that result from random censoring with bounded support are clarified. The methods are applicable to any sort of sojourn through any finite state process of arbitrary complexity involving feedback into previously occupied states.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
  • 32
    Publication Date: 2012-11-16
    Description: In some problems involving functional data, it is desired to undertake prediction or classification before the full trajectory of a function is observed. In such cases, it is often preferable to suffer somewhat greater error in return for making a decision relatively early. The prediction and classification problems can be treated similarly, using mean squared prediction error, or classification error, respectively, as the means for quantifying performance, so in this paper we focus principally on classification. We introduce a method for determining when an early decision can reasonably be made, using only part of the trajectory, and we show how to use the method to choose among data types. Our approach is fully nonparametric, and no specific model is required. Properties of the error rate are studied as functions of time and data type. The effectiveness of the proposed method is illustrated in both theoretical and numerical terms. The classification referred to in this paper would be termed supervised classification in machine learning, to distinguish it from unsupervised classification, or clustering.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
  • 33
    Oxford University Press
    Publication Date: 2012-11-16
    Description: Linear mixed models cover a wide range of statistical methods, which have found many uses in the estimation for complex surveys. The purpose of this work is to consider methods by which linear mixed models may be used at the design stage of a survey to incorporate available auxiliary information. This paper reviews the ideas of balanced sampling and the cube algorithm, and proposes an implementation of the latter by which penalized balanced samples can be selected. Such samples can reduce or eliminate the need for linear mixed model weight adjustments, a result demonstrated theoretically and via simulation. Horvitz–Thompson estimators for such samples will be highly efficient for any responses well approximated by a linear mixed model in the auxiliary information. In Monte Carlo experiments using nonparametric and temporal linear mixed models, the strategy of penalized balanced sampling with Horvitz–Thompson estimation dominates a variety of standard strategies.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
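The Horvitz–Thompson estimator mentioned above weights each sampled response by the inverse of its inclusion probability. A minimal sketch, assuming the inclusion probabilities are known and positive; it does not implement the cube algorithm or the penalized balancing step, and the function name is hypothetical. The usage lines check design-unbiasedness by averaging the estimator over every possible simple random sample of size 2 from a toy population of 4 units.

```python
from itertools import combinations


def horvitz_thompson_total(y, pi):
    """Horvitz-Thompson estimate of a population total: sum of y_i / pi_i over sampled units."""
    if len(y) != len(pi):
        raise ValueError("y and pi must have the same length")
    if any(p <= 0.0 or p > 1.0 for p in pi):
        raise ValueError("inclusion probabilities must lie in (0, 1]")
    return sum(yi / p for yi, p in zip(y, pi))


# Toy check: simple random sampling without replacement, n = 2 of N = 4,
# so every unit has inclusion probability n / N = 0.5.
population = [3.0, 5.0, 7.0, 9.0]
n_sample = 2
pi_srs = n_sample / len(population)
estimates = [
    horvitz_thompson_total(list(s), [pi_srs] * n_sample)
    for s in combinations(population, n_sample)
]
mean_estimate = sum(estimates) / len(estimates)
print(mean_estimate, sum(population))  # 24.0 24.0
```

Averaging over all six equally likely samples recovers the population total exactly, which is the unbiasedness property that makes the estimator a natural partner for carefully designed samples.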
  • 34
    Publication Date: 2012-11-16
    Description: Monte Carlo algorithms are commonly used to identify a set of models for Bayesian model selection or model averaging. Because empirical frequencies of models are often zero or one in high-dimensional problems, posterior probabilities calculated from the observed marginal likelihoods, renormalized over the sampled models, are often employed. Such estimates are the only recourse in several newer stochastic search algorithms. In this paper, we prove that renormalization of posterior probabilities over the set of sampled models generally leads to bias that may dominate mean squared error. Viewing the model space as a finite population, we propose a new estimator based on a ratio of Horvitz–Thompson estimators that incorporates observed marginal likelihoods, but is approximately unbiased. This is shown to lead to a reduction in mean squared error compared to the empirical or renormalized estimators, with little increase in computational cost.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
  • 35
    Oxford University Press
    Publication Date: 2012-11-16
    Description: Many proper scoring rules such as the Brier and log scoring rules implicitly reward a probability forecaster relative to a uniform baseline distribution. Recent work has motivated weighted proper scoring rules, which have an additional baseline parameter. To date, two families of weighted proper scoring rules have been introduced: the weighted power and pseudospherical scoring families. These families are compatible with the log scoring rule: when the baseline maximizes the log scoring rule over some set of distributions, the baseline also maximizes the weighted power and pseudospherical scoring rules over the same set. We characterize all weighted proper scoring families and prove a general property: every proper scoring rule is compatible with some weighted scoring family, and every weighted scoring family is compatible with some proper scoring rule.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
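For readers less familiar with the two baseline rules named above, here is a minimal sketch of the Brier and logarithmic scores for a categorical forecast, with a check of propriety: under a true distribution q, the expected score is optimized by reporting q itself. The orientation is an assumption stated in the comments (Brier as a loss, log score as a reward), the example distribution is hypothetical, and the weighted families studied in the paper are not implemented.

```python
import math


def brier_score(p, k):
    """Brier score of forecast p when category k occurs (a loss: lower is better)."""
    return sum((pj - (1.0 if j == k else 0.0)) ** 2 for j, pj in enumerate(p))


def log_score(p, k):
    """Logarithmic score of forecast p when category k occurs (a reward: higher is better)."""
    return math.log(p[k])


def expected_score(score, p, q):
    """Expected score of forecast p when the outcome is drawn from distribution q."""
    return sum(qk * score(p, k) for k, qk in enumerate(q))


q = [0.2, 0.3, 0.5]        # hypothetical true distribution
uniform = [1.0 / 3] * 3    # the uniform forecast
# Propriety: honest forecasting does at least as well as the uniform forecast.
print(expected_score(log_score, q, q) > expected_score(log_score, uniform, q))      # True
print(expected_score(brier_score, q, q) < expected_score(brier_score, uniform, q))  # True
```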
  • 36
    Publication Date: 2012-11-16
    Description: Projective shape consists of the information about a configuration of points that is invariant under projective transformations. It is an important tool in machine vision to pick out features that are invariant to the choice of camera view. The simplest example is the cross ratio for a set of four collinear points. Recent work involving ideas from multivariate robustness enables us to introduce here a natural preshape on projective shape space. This makes it possible to adapt the Procrustes analysis that forms the basis of much methodology in the simpler setting of similarity shape space.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
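The cross ratio mentioned above can be checked numerically. A minimal sketch, assuming the four collinear points are given by coordinates along their common line and using exact rational arithmetic so that invariance holds without rounding; it covers only this one-dimensional case, not the paper's preshape or Procrustes construction, and the function names and example points are hypothetical.

```python
from fractions import Fraction


def cross_ratio(a, b, c, d):
    """Cross ratio (a, b; c, d) of four distinct collinear points given by 1D coordinates."""
    return ((c - a) * (d - b)) / ((c - b) * (d - a))


def projective_map(x, alpha, beta, gamma, delta):
    """Projective transformation of the line: x -> (alpha*x + beta) / (gamma*x + delta)."""
    if alpha * delta - beta * gamma == 0:
        raise ValueError("degenerate transformation")
    denom = gamma * x + delta
    if denom == 0:
        raise ValueError("point is sent to infinity")
    return (alpha * x + beta) / denom


points = [Fraction(0), Fraction(1), Fraction(3), Fraction(7)]
mapped = [projective_map(x, 2, 1, 1, 3) for x in points]
print(cross_ratio(*points), cross_ratio(*mapped))  # 9/7 9/7
```

The mapped points (1/3, 3/4, 7/6, 3/2) are no longer equally related in distance to the originals, yet the cross ratio is unchanged, which is the invariance that projective shape builds on.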
  • 37
    Publication Date: 2012-11-16
    Description: To study disease association with risk factors in epidemiologic studies, cross-sectional sampling is often more focused and less costly for recruiting study subjects who have already experienced initiating events. For time-to-event outcomes, however, such a sampling strategy may be length biased. Coupled with censoring, analysis of length-biased data can be quite challenging, due to induced informative censoring in which the survival time and censoring time are correlated through a common backward recurrence time. We propose to use the proportional mean residual life model of Oakes & Dasu (Biometrika 77, 409–10, 1990) for analysis of censored length-biased survival data. Several nonstandard data structures, including censoring of onset time and cross-sectional data without follow-up, can also be handled by the proposed methodology.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
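The mean residual life at time t is m(t) = E(T - t | T > t), and the proportional mean residual life model of Oakes & Dasu is usually written m(t | Z) = m0(t) exp(βᵀZ). A minimal sketch, assuming fully observed survival times: it computes the empirical m(t) only and makes no correction for length bias or censoring, which is the actual subject of the paper. The function name and data are hypothetical.

```python
def empirical_mean_residual_life(times, t):
    """Empirical m(t): average of (T_i - t) over subjects with T_i > t.

    Assumes complete, unbiased follow-up (no censoring, no length bias).
    Returns None if no subject survives beyond t."""
    residuals = [ti - t for ti in times if ti > t]
    if not residuals:
        return None
    return sum(residuals) / len(residuals)


times = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical survival times
print(empirical_mean_residual_life(times, 2.0))  # 2.0
```

As a sanity check on the model form: an exponential distribution has constant mean residual life equal to its mean, so two exponential groups whose means differ by a factor exp(β) satisfy the proportional model exactly.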
  • 38
    Publication Date: 2012-11-16
    Description: Several two-stage multiple testing procedures have been proposed to detect gene-environment interaction in genome-wide association studies. In this article, we elucidate general conditions that are required for validity and power of these procedures, and we propose extensions of two-stage procedures using the case-only estimator of gene-treatment interaction in randomized clinical trials. We develop a unified estimating equation approach to proving asymptotic independence between a filtering statistic and an interaction test statistic in a range of situations, including marginal association and interaction in a generalized linear model with a canonical link. We assess the performance of various two-stage procedures in simulations and in genetic studies from Women’s Health Initiative clinical trials.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
  • 39
    Publication Date: 2012-11-16
    Description: Resampling-based methods for multiple hypothesis testing often lead to long run times when the number of tests is large. This paper presents a simple rule that substantially reduces computation by allowing resampling to terminate early on a subset of tests. We prove that the method has a low probability of obtaining a set of rejected hypotheses different from those rejected without early stopping, and obtain error bounds for multiple hypothesis testing. Simulation shows that our approach saves more computation than other available procedures.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
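The idea of letting resampling stop early can be illustrated with a simple curtailment rule for a single permutation test. This is a hedged sketch of a generic rule, not the rule proposed in the paper: with B resamples the Monte Carlo p-value is (1 + g)/(1 + B), where g counts resampled statistics at least as large as the observed one, and since g can only grow, sampling can stop as soon as (1 + g)/(1 + B) exceeds α. This curtailment never changes the decision; it saves computation only for tests that will not be rejected. The function name, the mean-difference statistic and the defaults are assumptions.

```python
import random


def curtailed_permutation_test(x, y, alpha=0.05, n_resamples=999, seed=0):
    """Two-sample permutation test of |mean(x) - mean(y)| with early stopping.

    Returns (reject, resamples_used). Stops as soon as rejection at level alpha
    has become impossible, so the decision matches the full n_resamples run."""
    rng = random.Random(seed)

    def stat(a, b):
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = stat(x, y)
    pooled = list(x) + list(y)
    exceed = 0  # resampled statistics at least as large as the observed one
    for b in range(1, n_resamples + 1):
        rng.shuffle(pooled)
        if stat(pooled[:len(x)], pooled[len(x):]) >= observed:
            exceed += 1
            # The final p-value can be no smaller than (1 + exceed) / (1 + n_resamples).
            if (1 + exceed) / (1 + n_resamples) > alpha:
                return False, b
    return (1 + exceed) / (1 + n_resamples) <= alpha, n_resamples


print(curtailed_permutation_test([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))  # (False, 50)
```

With identical samples every resample ties the observed statistic, so the test stops after 50 of the 999 resamples; a clearly significant test still runs to completion.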
  • 40
    Publication Date: 2012-11-16
    Description: We explore the use of estimating equations for efficient statistical inference in the presence of missing data. We propose a semiparametric efficient empirical likelihood approach, and show that the empirical likelihood ratio statistic and its profile counterpart asymptotically follow central chi-square distributions when evaluated at the true parameter. The theoretical properties and practical performance of our approach are demonstrated through numerical simulations and data analysis.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
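The empirical likelihood ratio statistic referred to above can be sketched in its simplest form, the complete-data case for a univariate mean (Owen's construction): with z_i = x_i - μ, the Lagrange multiplier λ solves Σ z_i / (1 + λ z_i) = 0 and the statistic is -2 log R(μ) = 2 Σ log(1 + λ z_i). This is only the basic building block; the paper's missing-data estimating equations and semiparametric efficient weighting are not reproduced, and the function name is hypothetical. At the true mean the statistic is asymptotically χ² with one degree of freedom, whose 95% quantile is about 3.84.

```python
import math


def el_statistic_mean(x, mu, tol=1e-12, max_iter=200):
    """-2 log empirical likelihood ratio for the mean mu of univariate data x."""
    z = [xi - mu for xi in x]
    if min(z) >= 0.0 or max(z) <= 0.0:
        return math.inf  # mu is not in the interior of the convex hull of the data
    # Weights 1 / (n (1 + lam z_i)) must be positive, which confines lam to (lo, hi).
    lo, hi = -1.0 / max(z), -1.0 / min(z)

    def score(lam):
        return sum(zi / (1.0 + lam * zi) for zi in z)

    # score is strictly decreasing on (lo, hi), so bisection finds its unique root.
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log1p(lam * zi) for zi in z)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
for mu in (3.0, 3.5, 4.0):
    print(mu, el_statistic_mean(x, mu))
```

The statistic is essentially zero at the sample mean and grows as μ moves away from it, so a 95% confidence interval is the set of μ with statistic below about 3.84.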
  • 41
    Oxford University Press
    Publication Date: 2012-08-22
    Description: Consider parametric models that are too complicated to allow calculation of a likelihood but from which observations can be simulated. We examine parameter estimators that are linear functions of a possibly large set of candidate features. A combination of simulations based on a fractional design and sets of discriminant analyses is then used to find an optimal estimator of the vector parameter and its covariance matrix. The procedure is an alternative to the approximate Bayesian computation scheme.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
  • 42
    Oxford University Press
    Publication Date: 2012-08-22
    Description: Prior information or background knowledge may suggest that interactions arise only within certain factors. When such knowledge is available, we propose using a new class of designs: designs of variable resolution. Several constructions are presented. Statistical justifications for using such designs from minimum G₂-aberration and design efficiency perspectives are provided.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
  • 43
    Publication Date: 2012-08-22
    Description: Merging data from multiple studies has been widely adopted in biomedical research. In this paper, we consider two major issues related to merging longitudinal datasets. We first develop a rigorous hypothesis testing procedure to assess the validity of data merging, and then propose a flexible joint estimation procedure that enables us to analyse merged data and to account for different within-subject correlations and follow-up schedules in different studies. We establish large sample properties for the proposed procedures. We compare our method with meta-analysis and generalized estimating equations and show that our test provides robust control of Type I error against both misspecification of working correlation structures and heterogeneous dispersion parameters. Our joint estimating procedure leads to an improvement in estimation efficiency on all regression coefficients after data merging is validated.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
  • 44
    Oxford University Press
    Publication Date: 2012-08-22
    Description: Using convex optimization, we construct a sparse estimator of the covariance matrix that is positive definite and performs well in high-dimensional settings. A lasso-type penalty is used to encourage sparsity and a logarithmic barrier function is used to enforce positive definiteness. Consistency and convergence rate bounds are established as both the number of variables and sample size diverge. An efficient computational algorithm is developed and the merits of the approach are illustrated with simulations and a speech signal classification example.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
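The need for a device that enforces positive definiteness can be seen with a small constructed example. A minimal sketch: soft-thresholding the off-diagonal entries of a covariance matrix, the elementwise analogue of a lasso-type penalty, can turn a positive definite matrix into one that is not, and positive definiteness is checked here by attempting a Cholesky factorization. The 21 × 21 matrix, the threshold and all function names are hypothetical choices made for illustration; the paper's barrier-penalized estimator and its algorithm are not implemented.

```python
import math


def soft_threshold_offdiag(s, lam):
    """Soft-threshold the off-diagonal entries of s by lam; the diagonal is left unchanged."""
    p = len(s)
    return [[s[i][j] if i == j
             else math.copysign(max(abs(s[i][j]) - lam, 0.0), s[i][j])
             for j in range(p)] for i in range(p)]


def is_positive_definite(m):
    """True if a Cholesky factorization of the symmetric matrix m succeeds."""
    p = len(m)
    low = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            acc = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if acc <= 0.0:
                    return False  # non-positive pivot: not positive definite
                low[i][i] = math.sqrt(acc)
            else:
                low[i][j] = acc / low[j][j]
    return True


def example_covariance(k=20, c=0.29, a=0.55):
    """Hypothetical (k + 1) x (k + 1) correlation matrix: the first k variables share
    correlation c, and each of them has correlation a with the last variable."""
    p = k + 1
    s = [[1.0 if i == j else c for j in range(p)] for i in range(p)]
    for i in range(k):
        s[i][k] = s[k][i] = a
    return s


s = example_covariance()
t = soft_threshold_offdiag(s, 0.3)
print(is_positive_definite(s), is_positive_definite(t))  # True False
```

With threshold 0.3 the many small correlations (0.29) are set to zero while the large ones only shrink to 0.25; the small entries were what kept the matrix positive definite, so the thresholded matrix fails the Cholesky test.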
  • 45
    Publication Date: 2012-08-22
    Description: Penalization methods have been shown to yield both consistent variable selection and oracle parameter estimation under correct model specification. In this article, we study such methods under model misspecification, where the assumed form of the regression function is incorrect, including generalized linear models for uncensored outcomes and the proportional hazards model for censored responses. Estimation with the adaptive lasso (least absolute shrinkage and selection operator) penalty is proven to achieve sparse estimation of regression coefficients under misspecification. The resulting estimators are selection consistent, asymptotically normal and oracle, where the selection is based on the limiting values of the parameter estimators obtained using the misspecified model without penalization. We further derive conditions under which the penalized estimators from the misspecified model may yield selection consistency under the true model. The robustness is explored numerically via simulation and an application to the Wisconsin Epidemiological Study of Diabetic Retinopathy.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
  • 46
    Oxford University Press
    Publication Date: 2012-08-22
    Description: This paper investigates the definition and the estimation of the Fréchet mean of a random rigid body motion in ℝ^p. The sample space SE(p) contains objects M = (R, t), where R is a p × p rotation matrix and t is a p × 1 translation vector. This work is motivated by applications in biomechanics where the posture of a joint at a given time is expressed as M ∈ SE(3), the rigid body displacement needed to map a system of axes on one segment of the joint to a similar system on the other segment. This posture can also be reported as M⁻¹ = (Rᵀ, −Rᵀt) by interchanging the roles of the two segments. Several definitions of a Fréchet mean for a random motion are proposed using weighted least squares distances. Special emphasis is given to a Fréchet mean that is equivariant with respect to the inverse transform; this means that if P is the Fréchet mean for M then P⁻¹ is the Fréchet mean for M⁻¹, where M is a random SE(p) object. The sampling properties of moment estimators of the Fréchet means are studied in a large concentration setting, where the scatter of the random M's around their mean value P is small, and as the sample size goes to infinity. Some simple exponential family models for SE(p) data that generalize Downs’ (1972) Fisher–von Mises matrix distribution for rotation matrices are introduced and the least squares mean values for these distributions are calculated. Asymptotic comparisons between the estimators presented in this work are carried out for a particular model when p = 2. A numerical example involving the motion of the ankle is presented to illustrate the methodology.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
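The inverse formula M⁻¹ = (Rᵀ, −Rᵀt) follows from treating M = (R, t) as the map x ↦ Rx + t, an action convention assumed here. A minimal sketch for p = 2 in plain Python, checking that M composed with M⁻¹ is the identity motion; it does not compute any of the Fréchet means studied in the paper, and all function names are hypothetical.

```python
import math


def rotation(theta):
    """2 x 2 rotation matrix by angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def matvec(a, v):
    return [sum(a[i][k] * v[k] for k in range(len(v))) for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def apply_motion(m, x):
    """Apply M = (R, t) to a point x as x -> R x + t (assumed convention)."""
    r, t = m
    return [ri + ti for ri, ti in zip(matvec(r, x), t)]


def compose(m1, m2):
    """Motion 'apply m2, then m1': (R1 R2, R1 t2 + t1)."""
    r1, t1 = m1
    r2, t2 = m2
    return matmul(r1, r2), [a + b for a, b in zip(matvec(r1, t2), t1)]


def inverse(m):
    """Inverse motion M^{-1} = (R^T, -R^T t)."""
    r, t = m
    rt = transpose(r)
    return rt, [-v for v in matvec(rt, t)]


m = (rotation(0.7), [1.5, -2.0])
x = [0.3, 0.4]
print(apply_motion(inverse(m), apply_motion(m, x)))  # approximately [0.3, 0.4]
r_id, t_id = compose(m, inverse(m))
print(r_id, t_id)  # approximately the identity rotation and a zero translation
```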
  • 47
    Publication Date: 2012-08-22
    Description: Panel attrition is frequently encountered in panel sample surveys. When it is related to the observed study variable, the classical approach of nonresponse adjustment using a covariate-dependent dropout mechanism can be biased. We consider an efficient method of estimation with monotone panel attrition when the response probability depends on the previous values of the study variable as well as other covariates. Because of the monotone structure of the missing pattern, the response mechanism is missing at random. The proposed estimator is asymptotically optimal in the sense that it minimizes the asymptotic variance of a class of estimators that can be written as a linear combination of the unbiased estimators of the panel estimates for each wave, and incorporates all available information using generalized least squares. Variance estimation is discussed and results from a simulation study are presented.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
  • 48
    Oxford University Press
    Publication Date: 2012-08-22
    Description: We propose a graphical measure, the generalized negative predictive function, to quantify the predictive accuracy of covariates for survival time or recurrent event times. This new measure characterizes the event-free probabilities over time conditional on a thresholded linear combination of covariates and has direct clinical utility. We show that this function is maximized at the set of covariates truly related to event times and thus can be used to compare the predictive accuracy of different sets of covariates. We construct nonparametric estimators for this function under right censoring and prove that the proposed estimators, upon proper normalization, converge weakly to zero-mean Gaussian processes. To bypass the estimation of complex density functions involved in the asymptotic variances, we adopt the bootstrap approach and establish its validity. Simulation studies demonstrate that the proposed methods perform well in practical situations. Two clinical studies are presented.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
  • 49
    Publication Date: 2012-08-22
    Description: Random probability measures are the main tool for Bayesian nonparametric inference, with their laws acting as prior distributions. Many well-known priors used in practice admit different, though equivalent, representations. In terms of computational convenience, stick-breaking representations stand out. In this paper we focus on the normalized inverse Gaussian process and provide a completely explicit stick-breaking representation for it. This result is of interest both from a theoretical viewpoint and for statistical practice.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
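To show what a stick-breaking representation looks like, here is a hedged sketch of the best-known case, the Dirichlet process, where the stick fractions V_k are independent Beta(1, α) draws and the weights are w_k = V_k ∏_{j<k} (1 − V_j). This is not the normalized inverse Gaussian representation derived in the paper, which differs from the Dirichlet case and is not reproduced here. The truncation level, the standard normal base measure and the function name are assumptions.

```python
import random


def dp_stick_breaking(alpha, n_atoms, seed=None):
    """Truncated stick-breaking draw from a Dirichlet process with a N(0, 1) base measure.

    Returns (weights, atoms, leftover), where leftover is the stick mass not yet
    assigned after n_atoms breaks."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    rng = random.Random(seed)
    weights, atoms = [], []
    remaining = 1.0
    for _ in range(n_atoms):
        v = rng.betavariate(1.0, alpha)    # fraction of the remaining stick to break off
        weights.append(remaining * v)
        atoms.append(rng.gauss(0.0, 1.0))  # atom location drawn from the base measure
        remaining *= 1.0 - v
    return weights, atoms, remaining


w, x, leftover = dp_stick_breaking(alpha=2.0, n_atoms=50, seed=1)
print(sum(w) + leftover)  # equals 1 up to rounding
```

The appeal noted in the abstract is visible here: each weight is produced by one cheap sequential update, which is what makes such representations convenient for simulation and posterior computation.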
  • 50
    Publication Date: 2012-08-22
    Description: In most current data modelling for time-dynamic systems, one works with a prespecified differential equation and attempts to estimate its parameters. In contrast, we demonstrate that in the case of functional data, the equation itself can be inferred. Assuming only that the dynamics are described by a first-order nonlinear differential equation with a random component, we obtain data-adaptive dynamic equations from the observed data via a simple smoothing-based procedure. We prove consistency and introduce diagnostics to ascertain the fraction of variance that is explained by the deterministic part of the equation. This approach is shown to yield useful insights into the time-dynamic nature of human growth.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
  • 51
    Publication Date: 2012-08-22
    Description: We consider estimation of scalar functions that determine the dynamics of diffusion processes. It has been recently shown that nonparametric maximum likelihood estimation is ill-posed in this context. We adopt a probabilistic approach and regularize the problem by placing a prior distribution on the unknown functional. A Gaussian prior measure is chosen in the function space by specifying its precision operator as an appropriate differential operator. We establish that a Bayesian–Gaussian conjugate analysis for the drift of one-dimensional nonlinear diffusions is feasible using high-frequency data, by expressing the loglikelihood as a quadratic function of the drift, with sufficient statistics given by the local time process and the end points of the observed path. Computationally efficient posterior inference is carried out using a finite element method. We embed this technology in partially observed situations and adopt a data augmentation approach whereby we iteratively generate missing data paths and draws from the unknown functional. Our methodology is applied to estimate the drift of models used in molecular dynamics and financial econometrics using high- and low-frequency observations. We discuss extensions to other partially observed schemes and connections to other types of nonparametric inference.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
  • 52
    Publication Date: 2012-08-22
    Description: Principal component analysis is commonly used for dimension reduction in analysing high-dimensional data. Multilinear principal component analysis aims to serve a similar function for analysing tensor structure data, and has empirically been shown effective in reducing dimensionality. In this paper, we investigate its statistical properties and demonstrate its advantages. Conventional principal component analysis, which vectorizes the tensor data, may lead to inefficient and unstable prediction due to the often extremely large dimensionality involved. Multilinear principal component analysis, in trying to preserve the data structure, searches for low-dimensional projections and, thereby, decreases dimensionality more efficiently. The asymptotic theory of order-two multilinear principal component analysis, including asymptotic efficiency and distributions of principal components, associated projections, and the explained variance, is developed. A test of dimensionality is also proposed. Finally, multilinear principal component analysis is shown to improve on conventional principal component analysis in analysing the Olivetti faces dataset, an improvement achieved by extracting a more modularly oriented basis set in reconstructing the test faces.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
  • 53
    Oxford University Press
    Publication Date: 2012-08-22
    Description: A general framework for a novel non-geodesic decomposition of high-dimensional spheres or high-dimensional shape spaces for planar landmarks is discussed. The decomposition, principal nested spheres, leads to a sequence of submanifolds with decreasing intrinsic dimensions, which can be interpreted as an analogue of principal component analysis. In a number of real datasets, an apparent one-dimensional mode of variation curving through more than one geodesic component is captured in the one-dimensional component of principal nested spheres. While analysis of principal nested spheres provides an intuitive and flexible decomposition of the high-dimensional sphere, an interesting special case of the analysis results in finding principal geodesics, similar to those from previous approaches to manifold principal component analysis. An adaptation of our method to Kendall’s shape space is discussed, and a computational algorithm for fitting principal nested spheres is proposed. The result provides a coordinate system to visualize the data structure and an intuitive summary of principal modes of variation, as exemplified by several datasets.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
  • 54
    Publication Date: 2012-08-22
    Description: Incidence is an important epidemiological concept most suitably studied using an incident cohort study. However, data are often collected from the more feasible prevalent cohort study, whereby diseased individuals are recruited through a cross-sectional survey and followed in time. In the absence of temporal trends in survival, we derive an efficient nonparametric estimator of the cumulative incidence based on such data and study its asymptotic properties. Arbitrary calendar time variations in disease incidence are allowed. Age-specific incidence and adjustments for both stratified sampling and temporal variations in survival are also discussed. Simulation results are presented and data from the Canadian Study of Health and Aging are analysed to infer the incidence of dementia in the Canadian elderly population.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
  • 55
    Publication Date: 2012-08-22
    Description: It can be more challenging to efficiently model the covariance matrices for multivariate longitudinal data than for the univariate case, due to the correlations arising between multiple responses. The positive-definiteness constraint and the high dimensionality are further obstacles in covariance modelling. In this paper, we develop a data-based method by which the parameters in the covariance matrices are replaced by unconstrained and interpretable parameters with reduced dimensions. The maximum likelihood estimators for the mean and covariance parameters are shown to be consistent and asymptotically normally distributed. Simulations and real data analysis show that the new approach performs very well even when modelling bivariate nonstationary dependence structures.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
  • 56
    Publication Date: 2012-08-22
    Description: Bayesian properties of the signed root likelihood ratio statistic are analysed. Conditions for first-order probability matching are derived by the examination of the Bayesian posterior and frequentist means of this statistic. Second-order matching conditions are shown to arise from matching of the Bayesian posterior and frequentist variances of a mean-adjusted version of the signed root statistic. Conditions for conditional probability matching in ancillary statistic models are derived and discussed.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
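The signed root likelihood ratio statistic is r(θ) = sgn(θ̂ − θ)[2{l(θ̂) − l(θ)}]^{1/2}, where l is the log-likelihood and θ̂ the maximum likelihood estimate. A minimal sketch for a scalar parameter, using a hypothetical Poisson sample as the example model; the Bayesian and frequentist matching conditions analysed in the paper are not computed, and the function names are assumptions.

```python
import math


def signed_root(loglik, theta_hat, theta):
    """r(theta) = sign(theta_hat - theta) * sqrt(2 * (l(theta_hat) - l(theta)))."""
    lr = 2.0 * (loglik(theta_hat) - loglik(theta))
    # max(...) guards against tiny negative values caused by rounding.
    return math.copysign(math.sqrt(max(lr, 0.0)), theta_hat - theta)


def poisson_loglik(total, n):
    """Poisson log-likelihood for n observations with the given sum, up to a constant."""
    def loglik(theta):
        return (total * math.log(theta) if total > 0 else 0.0) - n * theta
    return loglik


counts = [2, 3, 1, 4, 0]  # hypothetical data
ll = poisson_loglik(sum(counts), len(counts))
theta_hat = sum(counts) / len(counts)  # the Poisson MLE is the sample mean
for theta in (1.0, 2.0, 3.0):
    print(theta, signed_root(ll, theta_hat, theta))
```

The statistic is zero at θ̂, positive below it and negative above it, and its square is the usual likelihood ratio statistic; keeping the sign is what allows one-sided inference and probability matching comparisons.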
  • 57
    Publication Date: 2012-08-22
    Description: When a parametric likelihood function is not specified for a model, estimating equations may provide an instrument for statistical inference. Qin and Lawless (1994) illustrated that empirical likelihood makes optimal use of these equations in inferences for fixed low-dimensional unknown parameters. In this paper, we study empirical likelihood for general estimating equations with growing high dimensionality and propose a penalized empirical likelihood approach for parameter estimation and variable selection. We quantify the asymptotic properties of empirical likelihood and its penalized version, and show that penalized empirical likelihood has the oracle property. The performance of the proposed method is illustrated via simulated applications and a data analysis.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
  • 58
    Publication Date: 2012-08-22
    Description: In this article we propose a new model, called the inner envelope model, which leads to efficient estimation in the context of multivariate normal linear regression. The asymptotic distribution and the consistency of its maximum likelihood estimators are established. Theoretical results, simulation studies and examples all show that the efficiency gains can be substantial relative to standard methods and to the maximum likelihood estimators from the envelope model introduced recently by Cook et al. (2010). Compared to the envelope model, the inner envelope model is based on a different construction and it can produce substantial efficiency gains in situations where the envelope model offers no gains. In effect, inner envelopes open a new frontier to the way in which reducing subspaces can be used to improve efficiency in multivariate problems.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology , Mathematics , Medicine
  • 59
    Publication Date: 2012-04-19
    Description: Bayesian analysis of a finite state Markov process, which is popularly used to model multistate event history data, is considered. A new prior process, called a beta-Dirichlet process, is introduced for the cumulative intensity functions and is proved to be conjugate. In addition, the beta-Dirichlet prior is applied to a Bayesian semiparametric regression model. To illustrate the application of the proposed model, we analyse a dataset of credit histories.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology , Mathematics , Medicine
  • 60
    Publication Date: 2012-04-19
    Description: Two-level fractional factorial designs are considered under a baseline parameterization. The criterion of minimum aberration is formulated in this context and optimal designs under this criterion are investigated. The underlying theory and the concept of isomorphism turn out to be significantly different from their counterparts under orthogonal parameterization, and this is reflected in the optimal designs obtained.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology , Mathematics , Medicine
  • 61
    Oxford University Press
    Publication Date: 2012-04-19
    Description: We give a definition of a bounded edge within the causal directed acyclic graph framework. A bounded edge generalizes the notion of a signed edge and is defined in terms of bounds on a ratio of survivor probabilities. We derive rules concerning the propagation of bounds. Bounds on causal effects in the presence of unmeasured confounding are also derived using bounds related to specific edges on a graph. We illustrate the theory developed by an example concerning estimating the effect of antihistamine treatment on asthma in the presence of unmeasured confounding.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology , Mathematics , Medicine
  • 62
    Publication Date: 2012-04-19
    Description: Treatment switching is a frequent occurrence in clinical trials, where, during the course of the trial, patients who fail on the control treatment may change to the experimental treatment. Analysing the data without accounting for switching yields highly biased and inefficient estimates of the treatment effect. In this paper, we propose a novel class of semiparametric semicompeting risks transition survival models to accommodate treatment switches. Theoretical properties of the proposed model are examined and an efficient expectation-maximization algorithm is derived for obtaining the maximum likelihood estimates. Simulation studies are conducted to demonstrate the superiority of the model compared with the intent-to-treat analysis and other methods proposed in the literature. The proposed method is applied to data from a colorectal cancer clinical trial.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology , Mathematics , Medicine
  • 63
    Publication Date: 2012-04-19
    Description: The mean residual life provides the remaining life expectancy of a subject who has survived to a certain time-point. When covariates are present, regression models are needed to study the association between the mean residual life function and potential regression covariates. In this paper, we propose a flexible class of semiparametric mean residual life models where some effects may be time-varying and some may be constant over time. In the presence of right censoring, we use the inverse probability of censoring weighting approach and develop inference procedures for estimating the model parameters. In addition, we provide graphical and numerical methods for model checking and tests for examining whether or not the covariate effects vary with time. Asymptotic and finite sample properties of the proposed estimators are established and the approach is applied to real life datasets collected from clinical trials.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology , Mathematics , Medicine
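The mean residual life at time t is m(t) = E(T − t | T > t). A minimal sketch of its empirical version, assuming fully observed survival times and no covariates, averages T_i − t over the subjects still at risk at t. The censoring weights and time-varying regression effects described above are not reproduced, and the function name `empirical_mrl` is hypothetical.

```python
from typing import Sequence


def empirical_mrl(times: Sequence[float], t: float) -> float:
    """Empirical mean residual life at time t (hypothetical helper).

    Assumes no censoring: averages the remaining lifetimes T_i - t
    over the subjects with T_i > t.
    """
    remaining = [ti - t for ti in times if ti > t]
    if not remaining:
        raise ValueError("no subject survives beyond t")
    return sum(remaining) / len(remaining)


if __name__ == "__main__":
    # Subjects surviving past t = 2 have lifetimes 3 and 4, so m(2) = (1 + 2) / 2.
    print(empirical_mrl([1.0, 2.0, 3.0, 4.0], 2.0))
```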
  • 64
    Publication Date: 2012-04-19
    Description: Covariate measurement error and missing responses are typical features in longitudinal data analysis. There has been extensive research on either covariate measurement error or missing responses, but relatively little work has been done to address both simultaneously. In this paper, we propose a simple method for the marginal analysis of longitudinal data with time-varying covariates, some of which are measured with error, while the response is subject to missingness. Our method has a number of appealing properties: assumptions on the model are minimal, with none needed about the distribution of the mismeasured covariate; implementation is straightforward and its applicability is broad. We provide both theoretical justification and numerical results.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology , Mathematics , Medicine
  • 65
    Publication Date: 2012-04-19
    Description: Combining information from two or more independent surveys is a problem frequently encountered in survey sampling. We consider the case of two independent surveys, where a large sample from survey 1 collects only auxiliary information and a much smaller sample from survey 2 provides information on both the variables of interest and the auxiliary variables. We propose a model-assisted projection method of estimation based on a working model, but the reference distribution is design-based. We generate synthetic or proxy values of a variable of interest by first fitting the working model, relating the variable of interest to the auxiliary variables, to the data from survey 2 and then predicting the variable of interest associated with the auxiliary variables observed in survey 1. The projection estimator of a total is simply obtained from the survey 1 weights and associated synthetic values. We identify the conditions for the projection estimator to be asymptotically unbiased. Domain estimation using the projection method is also considered. Replication variance estimators are obtained by augmenting the synthetic data file for survey 1 with additional synthetic columns associated with the columns of replicate weights. Results from a simulation study are presented.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology , Mathematics , Medicine
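The projection estimator can be sketched in a few lines, assuming a single auxiliary variable x and a straight-line working model y = a + bx fitted by unweighted least squares to the survey 2 data; the paper may fit its working model differently, for example with survey weights. Synthetic values are predicted for the survey 1 units and combined with the survey 1 weights. Both function names are hypothetical, and the replication variance estimator is not shown.

```python
from typing import Sequence, Tuple


def fit_working_model(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical helper: ordinary least squares fit of y = a + b * x (survey 2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def projection_total(
    weights1: Sequence[float],
    x1: Sequence[float],
    x2: Sequence[float],
    y2: Sequence[float],
) -> float:
    """Hypothetical helper: survey 1 weights times synthetic values of y."""
    a, b = fit_working_model(x2, y2)
    synthetic = [a + b * xi for xi in x1]  # predicted y for each survey 1 unit
    return sum(w * s for w, s in zip(weights1, synthetic))


if __name__ == "__main__":
    # Survey 2 gives y = 2x exactly, so each survey 1 unit with x = 1 gets y = 2.
    print(projection_total([10.0, 20.0], [1.0, 1.0], [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```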
  • 66
    Oxford University Press
    Publication Date: 2012-04-19
    Description: We propose a semiparametric proportional likelihood ratio model which is particularly suitable for modelling a nonlinear monotonic relationship between the outcome variable and a covariate. This model extends the generalized linear model by leaving the distribution unspecified, and has a strong connection with semiparametric models such as the selection bias model (Gilbert et al., 1999), the density ratio model (Qin, 1998; Fokianos & Kaimi, 2006), the single-index model (Ichimura, 1993) and the exponential tilt regression model (Rathouz & Gao, 2009). A maximum likelihood estimator is obtained for the new model and its asymptotic properties are derived. An example and simulation study illustrate the use of the model.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology , Mathematics , Medicine
  • 67
    Oxford University Press
    Publication Date: 2012-04-19
    Description: We consider a robust parameter estimator minimizing an empirical approximation to the q-entropy and show its relationship to minimization of power divergences through a simple parameter transformation. The estimator balances robustness and efficiency through a tuning constant q and avoids kernel density smoothing. We derive an upper bound on the estimator's mean squared error under a contaminated reference model and use it as a min-max criterion for selecting q.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology , Mathematics , Medicine
  • 68
    Publication Date: 2012-04-19
    Description: A standard approach to model the extreme values of a stationary process is the peaks over threshold method, which consists of imposing a high threshold, identifying clusters of exceedances of this threshold and fitting the maximum value from each cluster using the generalized Pareto distribution. This approach is strongly justified by underlying asymptotic theory. We propose an alternative model for the distribution of the cluster maxima that accounts for the subasymptotic theory of extremes of a stationary process. This new distribution is a product of two terms, one for the marginal distribution of exceedances and the other for the dependence structure of the exceedance values within a cluster. We illustrate the improvement in fit, measured by the root mean square error of the estimated quantiles, offered by the new distribution over the peaks over threshold analysis using simulated and hydrological data, and we suggest a diagnostic tool to help identify when the proposed model is likely to lead to an improved fit.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology , Mathematics , Medicine
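Before cluster maxima can be fitted, exceedances have to be grouped into clusters. The sketch below uses runs declustering, a common choice assumed here rather than taken from the abstract: a cluster closes once r consecutive observations fall at or below the threshold u. It only extracts the cluster maxima; fitting the generalized Pareto distribution or the proposed subasymptotic model is not shown, and `cluster_maxima` is a hypothetical name.

```python
from typing import List, Optional, Sequence


def cluster_maxima(x: Sequence[float], u: float, r: int) -> List[float]:
    """Return the maximum of each cluster of exceedances of the threshold u.

    Runs declustering (assumed): a cluster closes once r consecutive
    observations are at or below u. Hypothetical helper.
    """
    if r < 1:
        raise ValueError("run length r must be at least 1")
    maxima: List[float] = []
    current: Optional[float] = None  # maximum of the open cluster, if any
    gap = 0  # consecutive non-exceedances since the last exceedance
    for value in x:
        if value > u:
            current = value if current is None else max(current, value)
            gap = 0
        elif current is not None:
            gap += 1
            if gap >= r:
                maxima.append(current)
                current = None
                gap = 0
    if current is not None:
        maxima.append(current)  # close a cluster still open at the end of the series
    return maxima
```

For example, `cluster_maxima([0, 5, 6, 0, 0, 7, 0, 8], u=4, r=2)` returns `[6, 8]`; with `r=3` the two values below the threshold between 6 and 7 no longer close the first cluster, so all exceedances form one cluster and the result is `[8]`.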
  • 69
    Publication Date: 2012-04-19
    Description: Importance sampling is a common technique for Monte Carlo approximation, including that of p-values. Here it is shown that a simple correction of the usual importance sampling p-values provides valid p-values, meaning that a hypothesis test created by rejecting the null hypothesis when the p-value is at most α will also have a Type I error rate of at most α. This correction uses the importance weight of the original observation, which gives valuable diagnostic information under the null hypothesis. Using the corrected p-values can be crucial for multiple testing and also in problems where evaluating the accuracy of importance sampling approximations is difficult. Inverting the corrected p-values provides a useful way to create Monte Carlo confidence intervals that maintain the nominal significance level and use only a single Monte Carlo sample.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology , Mathematics , Medicine
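A minimal sketch of the two estimates, assuming the corrected form in which the observed data point joins the importance sample with its own weight. With test statistic t, importance weights w = (null density)/(proposal density) and proposal draws Y_1, ..., Y_n, the usual estimate is n⁻¹ Σ w(Y_i) 1{t(Y_i) ≥ t(x)}, while the corrected estimate is {w(x) + Σ w(Y_i) 1{t(Y_i) ≥ t(x)}}/(n + 1). Consult the paper for the exact statement. The function names are hypothetical, and the demonstration assumes a N(0, 1) null with a N(2, 1) proposal and t(x) = x.

```python
import math
import random
from typing import Sequence


def usual_is_pvalue(t_obs: float, t_draws: Sequence[float], w_draws: Sequence[float]) -> float:
    """Hypothetical helper: mean of w(Y_i) * 1{t(Y_i) >= t(x)}."""
    n = len(t_draws)
    return sum(w for t, w in zip(t_draws, w_draws) if t >= t_obs) / n


def corrected_is_pvalue(
    t_obs: float, w_obs: float, t_draws: Sequence[float], w_draws: Sequence[float]
) -> float:
    """Hypothetical helper (assumed form): the observation enters with weight w(x)."""
    n = len(t_draws)
    tail = sum(w for t, w in zip(t_draws, w_draws) if t >= t_obs)
    return (w_obs + tail) / (n + 1)


def normal_ratio_weight(y: float) -> float:
    """Density of N(0, 1) divided by density of N(2, 1), evaluated at y."""
    return math.exp(2.0 - 2.0 * y)


if __name__ == "__main__":
    rng = random.Random(1)
    x_obs = 2.5  # observed value; the statistic is t(x) = x
    draws = [rng.gauss(2.0, 1.0) for _ in range(1000)]  # proposal N(2, 1)
    weights = [normal_ratio_weight(y) for y in draws]
    print(usual_is_pvalue(x_obs, draws, weights))
    print(corrected_is_pvalue(x_obs, normal_ratio_weight(x_obs), draws, weights))
```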
  • 70
    Publication Date: 2012-04-19
    Description: Sparse discriminant methods based on independence rules, such as the nearest shrunken centroids classifier (Tibshirani et al., 2002) and features annealed independence rules (Fan & Fan, 2008), have been proposed as computationally attractive tools for feature selection and classification with high-dimensional data. A fundamental drawback of these rules is that they ignore correlations among features and thus could produce misleading feature selection and inferior classification. We propose a new procedure for sparse discriminant analysis, motivated by the least squares formulation of linear discriminant analysis. To demonstrate our proposal, we study the numerical and theoretical properties of discriminant analysis constructed via lasso penalized least squares. Our theory shows that the method proposed can consistently identify the subset of discriminative features contributing to the Bayes rule and at the same time consistently estimate the Bayes classification direction, even when the dimension can grow faster than any polynomial order of the sample size. The theory allows for general dependence among features. Simulated and real data examples show that lassoed discriminant analysis compares favourably with other popular sparse discriminant proposals.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology , Mathematics , Medicine
  • 71
    Oxford University Press
    Publication Date: 2012-04-19
    Description: We propose a method of factor profiled sure independence screening for ultrahigh-dimensional variable selection. The objective of this method is to identify nonzero components consistently from a sparse coefficient vector. The new method assumes that the correlation structure of the high-dimensional data can be well represented by a set of low-dimensional latent factors, which can be estimated consistently by eigenvalue-eigenvector decomposition. The estimated latent factors should then be profiled out from both the response and the predictors. Such an operation, referred to as factor profiling, produces uncorrelated predictors. Therefore, sure independence screening can be applied subsequently and the resulting screening result is consistent for model selection, a major advantage that standard sure independence screening does not share. We refer to the new method as factor profiled sure independence screening. Numerical studies confirm its outstanding performance.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology , Mathematics , Medicine
  • 72
    Publication Date: 2012-04-19
    Description: Karl Pearson’s role in the transformation that took the 19th century statistics of Laplace and Gauss into the modern era of 20th century multivariate analysis is examined from a new point of view. By viewing Pearson’s work in the context of a motto he adopted from Charles Darwin, a philosophical theme is identified in Pearson’s statistical work, and his three major achievements are briefly described.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology , Mathematics , Medicine
  • 73
    Publication Date: 2012-04-19
    Description: This paper considers semiparametric estimation of the Cox proportional hazards model for right-censored and length-biased data arising from prevalent sampling. To exploit the special structure of length-biased sampling, we propose a maximum pseudo-profile likelihood estimator, which can handle time-dependent covariates and is consistent under covariate-dependent censoring. Simulation studies show that the proposed estimator is more efficient than its competitors. A data analysis illustrates the methods and theory.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology , Mathematics , Medicine
  • 74
    Publication Date: 2012-04-19
    Description: We consider the problem of fitting a generalized linear model to overdispersed data, focussing on a quasilikelihood approach in which the variance is assumed to be proportional to that specified by the model, and the constant of proportionality, φ, is used to obtain appropriate standard errors and model comparisons. It is common practice to base an estimate of φ on Pearson's lack-of-fit statistic, with or without Farrington's modification. We propose a new estimator that has a smaller variance, subject to a condition on the third moment of the response variable. We conjecture that this condition is likely to be achieved for the important special cases of count and binomial data. We illustrate the benefits of the new estimator using simulations for both count and binomial data.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology , Mathematics , Medicine
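For reference, the Pearson-based estimate that the abstract describes as common practice is φ̂ = X²/(n − p), where X² = Σ (y_i − μ̂_i)²/V(μ̂_i), n is the number of observations and p the number of fitted parameters. A minimal sketch, assuming the fitted means are already available and defaulting to the count-data variance function V(μ) = μ. Neither Farrington's modification nor the new estimator proposed in the paper is implemented, and `pearson_dispersion` is a hypothetical name.

```python
from typing import Callable, Sequence


def pearson_dispersion(
    y: Sequence[float],
    mu: Sequence[float],
    n_params: int,
    variance: Callable[[float], float] = lambda m: m,
) -> float:
    """Hypothetical helper: Pearson-based estimate of phi, X^2 / (n - p).

    `variance` is the model's variance function V(mu); the default
    V(mu) = mu corresponds to count (Poisson-type) data.
    """
    n = len(y)
    x2 = sum((yi - mi) ** 2 / variance(mi) for yi, mi in zip(y, mu))
    return x2 / (n - n_params)
```

For binomial counts with m trials per observation, one would pass `variance=lambda mu: mu * (1 - mu / m)`.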
  • 75
    Publication Date: 2012-04-19
    Description: Several optimality properties of Dorfman’s (1943) group testing procedure are derived for estimation of the prevalence of a rare disease whose status is classified with error. Exact ranges of disease prevalence are obtained for which group testing provides more efficient estimation when group size increases.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology , Mathematics , Medicine
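To see how misclassification enters estimation from pooled tests, suppose pools of size k are tested with known sensitivity Se and specificity Sp. A pool is positive with probability θ = Se − (Se + Sp − 1)(1 − p)^k, so the observed fraction of positive pools θ̂ can be inverted to give p̂ = 1 − {(Se − θ̂)/(Se + Sp − 1)}^{1/k}. This sketch assumes independent individuals and uses only the first-stage pool results; Dorfman's retesting stage and the optimality results of the paper are not reproduced, and `pooled_prevalence` is a hypothetical name.

```python
def pooled_prevalence(
    positive_pools: int, n_pools: int, k: int, se: float = 1.0, sp: float = 1.0
) -> float:
    """Hypothetical helper: estimate prevalence p from pools of size k.

    se and sp are the assumed known sensitivity and specificity of the test.
    """
    if se + sp <= 1.0:
        raise ValueError("the test must satisfy se + sp > 1")
    theta = positive_pools / n_pools
    # Keep theta in [1 - sp, se], the range of pool-positive probabilities the model allows.
    theta = min(max(theta, 1.0 - sp), se)
    q_k = (se - theta) / (se + sp - 1.0)  # estimate of (1 - p) ** k
    return 1.0 - q_k ** (1.0 / k)
```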
  • 76
    Publication Date: 2012-04-19
    Description: We propose new regression models for parameterizing covariance structures in longitudinal data analysis. Using a novel Cholesky factor, the entries in this decomposition have a moving average and log-innovation interpretation and are modelled as linear functions of covariates. We propose efficient maximum likelihood estimates for joint mean-covariance analysis based on this decomposition and derive the asymptotic distributions of the coefficient estimates. Furthermore, we study a local search algorithm, computationally more efficient than traditional all-subsets selection, based on BIC for model selection, and show its model selection consistency. Thus, a conjecture of Pan & MacKenzie (2003) is verified. We demonstrate the finite-sample performance of the method via analysis of data on CD4 trajectories and through simulations.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology , Mathematics , Medicine
  • 77
    Publication Date: 2012-04-19
    Description: We study allocations that maximize the power of tests of equality of two treatments having binary outcomes. When a normal approximation applies, the asymptotic power is maximized by minimizing the variance, leading to a Neyman allocation that assigns observations in proportion to the standard deviations. This allocation, which in general requires knowledge of the parameters of the problem, is recommended in a large body of literature. Under contiguous alternatives the normal approximation indeed applies, and in this case the Neyman allocation reduces to a balanced design. However, when studying the power under a noncontiguous alternative, a large deviations approximation is needed, and the Neyman allocation is no longer asymptotically optimal. In the latter case, the optimal allocation depends on the parameters, but is rather close to a balanced design. Thus, a balanced design is a viable option for both contiguous and noncontiguous alternatives. Finite sample studies show that a balanced design is indeed generally quite close to being optimal for power maximization. This is good news as implementation of a balanced design does not require knowledge of the parameters.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology , Mathematics , Medicine
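For two binary outcomes with success probabilities p1 and p2, the standard deviations are σ_i = {p_i(1 − p_i)}^{1/2}, and the Neyman allocation assigns the fraction σ1/(σ1 + σ2) of observations to treatment 1. The sketch below, with the hypothetical name `neyman_fraction`, makes the dependence on the unknown parameters explicit and shows that the fraction is exactly 1/2 when p1 = p2. The large-deviations optimal allocation studied in the paper is not computed.

```python
import math


def neyman_fraction(p1: float, p2: float) -> float:
    """Hypothetical helper: Neyman allocation fraction for treatment 1."""
    s1 = math.sqrt(p1 * (1.0 - p1))  # standard deviation of a treatment 1 outcome
    s2 = math.sqrt(p2 * (1.0 - p2))  # standard deviation of a treatment 2 outcome
    return s1 / (s1 + s2)
```

With p1 = 0.5 and p2 = 0.1, for example, σ1 = 0.5 and σ2 = 0.3, so treatment 1 receives a fraction 0.625 of the observations.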
  • 78
    Oxford University Press
    Publication Date: 2012-04-19
    Description: The proportional likelihood ratio model introduced in Luo & Tsai (2012) is adapted to explicitly model the means of observations. This is useful for the estimation of and inference on treatment effects, particularly in designed experiments, and allows the data analyst greater control over model specification and parameter interpretation.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology , Mathematics , Medicine
  • 79
    Publication Date: 2012-05-22
    Description: The infinite dimension of functional data can challenge conventional methods for classification and clustering. A variety of techniques have been introduced to address this problem, particularly in the case of prediction, but the structural models that they involve can be too inaccurate, or too abstract, or too difficult to interpret, for practitioners. In this paper, we develop approaches to adaptively choose components, enabling classification and clustering to be reduced to finite-dimensional problems. We explore and discuss properties of these methodologies. Our techniques involve methods for estimating classifier error rate and cluster tightness, and for choosing both the number of components, and their locations, to optimize these quantities. A major attraction of this approach is that it allows identification of parts of the function domain that convey important information for classification and clustering. It also permits us to determine regions that are relevant to one of these analyses but not the other.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology , Mathematics , Medicine
  • 80
    Publication Date: 2012-05-22
    Description: We generalize the Dunnett test to derive efficacy and futility boundaries for a flexible multi-arm multi-stage clinical trial for a normally distributed endpoint with known variance. We show that the boundaries control the familywise error rate in the strong sense. The method is applicable for any number of treatment arms, number of stages and number of patients per treatment per stage. It can be used for a wide variety of boundary types or rules derived from α-spending functions. Additionally, we show how sample size can be computed under a least favourable configuration power requirement and derive formulae for expected sample sizes.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology , Mathematics , Medicine
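Two widely used α-spending functions give the cumulative type I error that may be spent by information fraction t ∈ (0, 1]: an O'Brien–Fleming-type function α(t) = 2 − 2Φ(z_{1−α/2}/√t) and a Pocock-type function α(t) = α log{1 + (e − 1)t}. The sketch below only evaluates these functions with the standard library; turning the spent error into the multi-arm Dunnett-type efficacy and futility boundaries of the paper requires multivariate normal integration that is not shown. The function names are hypothetical.

```python
import math
from statistics import NormalDist

_STANDARD_NORMAL = NormalDist()


def obrien_fleming_spending(t: float, alpha: float = 0.05) -> float:
    """Hypothetical helper: 2 - 2 * Phi(z_{1 - alpha/2} / sqrt(t))."""
    if not 0.0 < t <= 1.0:
        raise ValueError("information fraction t must lie in (0, 1]")
    z = _STANDARD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return 2.0 - 2.0 * _STANDARD_NORMAL.cdf(z / math.sqrt(t))


def pocock_spending(t: float, alpha: float = 0.05) -> float:
    """Hypothetical helper: alpha * log(1 + (e - 1) * t)."""
    if not 0.0 < t <= 1.0:
        raise ValueError("information fraction t must lie in (0, 1]")
    return alpha * math.log(1.0 + (math.e - 1.0) * t)


if __name__ == "__main__":
    for t in (0.25, 0.5, 0.75, 1.0):
        print(t, obrien_fleming_spending(t), pocock_spending(t))
```

Both functions spend the full α at t = 1; the O'Brien–Fleming type spends very little early on, which yields conservative early boundaries.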
  • 81
    Publication Date: 2012-05-22
    Description: We develop a method for bias correction, which models the error of the target estimator as a function of the corresponding estimator obtained from bootstrap samples, and the original estimators and bootstrap estimators of the parameters governing the model fitted to the sample data. This is achieved by considering a number of plausible parameter values, generating a pseudo original sample for each parameter and bootstrap samples for each such sample, and then searching for an appropriate functional relationship. Under certain conditions, the procedure also permits estimation of the mean square error of the bias corrected estimator. The method is applied for estimating the prediction mean square error in small area estimation of proportions under a generalized mixed model. Empirical comparisons with jackknife and bootstrap methods are presented.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology , Mathematics , Medicine
  • 82
    Publication Date: 2012-05-22
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology , Mathematics , Medicine
  • 83
    Oxford University Press
    Publication Date: 2014-11-25
    Description: A framework for classification is developed with a notion of confidence. In this framework, a classifier consists of two tolerance regions in the predictor space, with a specified coverage level for each class. The classifier also produces an ambiguous region where the classification needs further investigation. Theoretical analysis reveals interesting structures of the confidence-ambiguity trade-off, and the optimal solution is characterized by extending the Neyman–Pearson lemma. We provide general estimating procedures, along with rates of convergence, based on estimates of the conditional probabilities. The method can be easily implemented with good robustness, as illustrated through theory, simulation and a data example.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology , Mathematics , Medicine
  • 84
    Publication Date: 2014-11-25
    Description: Evidence-based rules for optimal treatment allocation are key components in the quest for efficient, effective health-care delivery. Q-learning, an approximate dynamic programming algorithm, is a popular method for estimating optimal sequential decision rules from data. Q-learning requires the modelling of nonsmooth, nonmonotone transformations of the data, complicating the search for adequately expressive, yet parsimonious, statistical models. The default Q-learning working model is multiple linear regression, which not only is misspecified under most data-generating models but also results in nonregular regression estimators, complicating inference. We propose an alternative strategy for estimating optimal sequential decision rules for which the requisite statistical modelling does not depend on nonsmooth, nonmonotone transformed data, does not result in nonregular regression estimators, is consistent under more data-generating models than is Q-learning, results in estimated sequential decision rules that have better sampling properties, and is amenable to established statistical methods for exploratory data analysis, model building and validation. We derive the new method, IQ-learning, via an interchange in the order of certain steps in Q-learning. In simulated experiments, IQ-learning improves upon Q-learning in terms of integrated mean-squared error and power. The method is illustrated using data from a study of major depressive disorder.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology , Mathematics , Medicine
  • 85
    Oxford University Press
    Publication Date: 2014-11-25
    Description: Symmetric binary matrices representing relations are collected in many areas. Our focus is on dynamically evolving binary relational matrices, with interest being on inference on the relationship structure and prediction. We propose a nonparametric Bayesian dynamic model, which reduces dimensionality in characterizing the binary matrix through a lower-dimensional latent space representation, with the latent coordinates evolving in continuous time via Gaussian processes. By using a logistic mapping function from the link probability matrix space to the latent relational space, we obtain a flexible and computationally tractable formulation. Employing Pólya-gamma data augmentation, an efficient Gibbs sampler is developed for posterior computation, with the dimension of the latent space automatically inferred. We provide theoretical results on flexibility of the model, and illustrate its performance via simulation experiments. We also consider an application to co-movements in world financial markets.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology , Mathematics , Medicine
  • 86
    Publication Date: 2014-11-25
    Description: We define three types of neighbour-balanced designs for experiments where the units are arranged in a circle or single line in space or time. The designs are balanced with respect to neighbours at distance one and at distance two. The variants come from allowing or forbidding self-neighbours, and from considering neighbours to be directed or undirected. For two of the variants, we give a method of constructing a design for all values of the number of treatments, except for some small values where it is impossible. In the third case, we give a partial solution that covers all sizes likely to be used in practice.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology , Mathematics , Medicine
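For a circular arrangement, the balance conditions can be checked directly by counting how often each pair of treatments appears at distance one and at distance two. The sketch below, with hypothetical names, treats a design as a list of treatment labels 0, 1, ..., n_treatments − 1 around the circle, counts ordered pairs when neighbours are directed and unordered pairs otherwise, and requires every permitted pair to occur equally often at each distance. It only verifies a candidate design; the constructions given in the paper are not reproduced, and designs on a line would need their end effects handled separately.

```python
from collections import Counter
from itertools import product
from typing import Sequence, Tuple


def neighbour_counts(design: Sequence[int], distance: int, directed: bool) -> Counter:
    """Hypothetical helper: count treatment pairs lying `distance` apart around a circle."""
    n = len(design)
    counts: Counter = Counter()
    for i in range(n):
        a, b = design[i], design[(i + distance) % n]
        pair: Tuple[int, int] = (a, b) if directed else (min(a, b), max(a, b))
        counts[pair] += 1
    return counts


def is_neighbour_balanced(
    design: Sequence[int], n_treatments: int, directed: bool, allow_self: bool
) -> bool:
    """Hypothetical helper: check balance at distances one and two."""
    permitted = [
        (a, b)
        for a, b in product(range(n_treatments), repeat=2)
        if (allow_self or a != b) and (directed or a <= b)
    ]
    for distance in (1, 2):
        counts = neighbour_counts(design, distance, directed)
        # Every permitted pair must occur equally often ...
        if len({counts[p] for p in permitted}) != 1:
            return False
        # ... and no forbidden pair (for example a self-neighbour) may occur at all.
        if sum(counts[p] for p in permitted) != len(design):
            return False
    return True
```

For three treatments, the circle `[0, 1, 2]` passes the undirected check without self-neighbours but fails the directed one, since, for example, treatment 0 never immediately follows treatment 1.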
  • 87
    Oxford University Press
    Publication Date: 2014-11-25
    Description: We consider the problem of estimating the number of types in a corpus using the number of types observed in a sample of tokens from that corpus. We derive exact and asymptotic distributions for the number of observed types, conditioned on the number of tokens and the latent type distribution. We use the asymptotic distributions to derive an estimator of the latent number of types and validate this estimator numerically.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology , Mathematics , Medicine
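As a simple point of comparison, suppose tokens are drawn independently with replacement from a known type distribution (p_1, ..., p_K). Then the expected number of distinct types observed among n tokens is Σ_j {1 − (1 − p_j)^n}. This sampling model is an assumption made for illustration; the exact and asymptotic distributions and the estimator derived in the paper are not reproduced. The sketch also includes a small simulation for checking the formula, and both function names are hypothetical.

```python
import random
from typing import Sequence


def expected_observed_types(probs: Sequence[float], n_tokens: int) -> float:
    """Hypothetical helper: expected distinct types among n_tokens i.i.d. draws."""
    return sum(1.0 - (1.0 - p) ** n_tokens for p in probs)


def simulated_observed_types(
    probs: Sequence[float], n_tokens: int, n_reps: int = 2000, seed: int = 0
) -> float:
    """Hypothetical helper: Monte Carlo average of the number of distinct types."""
    rng = random.Random(seed)
    types = range(len(probs))
    total = 0
    for _ in range(n_reps):
        sample = rng.choices(types, weights=probs, k=n_tokens)
        total += len(set(sample))  # number of distinct types in this sample
    return total / n_reps


if __name__ == "__main__":
    probs = [0.5, 0.3, 0.2]
    print(expected_observed_types(probs, 5), simulated_observed_types(probs, 5))
```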
  • 88
    Publication Date: 2014-11-25
    Description: Exact forms of Taylor expansion for vector-valued functions have been incorrectly used in many statistical publications. We offer two methods to correct this error.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology , Mathematics , Medicine
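A standard counterexample shows where an exact form goes wrong: the mean value theorem has no analogue with a single intermediate point for vector-valued functions. Take the unit circle parameterized by angle:

```latex
\begin{aligned}
f(t) &= (\cos t,\ \sin t)^{\mathrm{T}}, \quad t \in [0, 2\pi],\\
f(2\pi) - f(0) &= (0,\ 0)^{\mathrm{T}},\\
\bigl\lVert f'(\xi)\,(2\pi - 0) \bigr\rVert &= 2\pi\,\bigl\lVert (-\sin\xi,\ \cos\xi)^{\mathrm{T}} \bigr\rVert = 2\pi \neq 0 \quad \text{for every } \xi \in (0, 2\pi),
\end{aligned}
```

so no ξ satisfies f(2π) − f(0) = f′(ξ)(2π − 0). One exact form that does hold is the integral remainder f(b) − f(a) = {∫_0^1 f′(a + s(b − a)) ds}(b − a); whether it coincides with either of the two corrections offered in the paper is not claimed here.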
  • 89
    Publication Date: 2014-11-25
    Description: Regularization aims to improve prediction performance by trading an increase in training error for better agreement between training and prediction errors, which is often captured through decreased degrees of freedom. In this paper we give examples which show that regularization can increase the degrees of freedom in common models, including the lasso and ridge regression. In such situations, both training error and degrees of freedom increase, making the regularization inherently without merit. Two important scenarios are described where the expected reduction in degrees of freedom is guaranteed: all symmetric linear smoothers and convex constrained linear regression models like ridge regression and the lasso, when compared to unconstrained linear regression.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology , Mathematics , Medicine
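For a linear smoother ŷ = Sy, degrees of freedom are usually measured by trace(S). For ridge regression with penalty λ ≥ 0 and design singular values d_j, S = X(XᵀX + λI)⁻¹Xᵀ is symmetric and trace(S) = Σ d_j²/(d_j² + λ). This equals the rank of X at λ = 0 (unconstrained least squares) and decreases as λ grows. The sketch below, with the hypothetical name `ridge_df`, illustrates this guaranteed-reduction case; the settings in which the paper shows degrees of freedom increasing are not reproduced.

```python
from typing import Sequence


def ridge_df(singular_values: Sequence[float], lam: float) -> float:
    """Hypothetical helper: trace(X (X^T X + lam I)^{-1} X^T) = sum d_j^2 / (d_j^2 + lam)."""
    if lam < 0:
        raise ValueError("penalty lam must be non-negative")
    # Zero singular values contribute nothing (and would give 0/0 at lam = 0).
    return sum(d * d / (d * d + lam) for d in singular_values if d != 0)


if __name__ == "__main__":
    for lam in (0.0, 0.5, 2.0, 10.0):
        print(lam, ridge_df([3.0, 1.0, 0.5], lam))
```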
  • 90
    Oxford University Press
    Publication Date: 2014-11-25
    Description: We propose a general framework for dimension reduction in regression to fill the gap between linear and fully nonlinear dimension reduction. The main idea is to first transform each of the raw predictors monotonically and then search for a low-dimensional projection in the space defined by the transformed variables. Both user-specified and data-driven transformations are suggested. In each case, the methodology is first discussed in generality and then a representative method is proposed and evaluated by simulation. The proposed methods are applied to a real dataset.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology , Mathematics , Medicine
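The abstract does not fix a particular transformation or projection method. As a hedged illustration only, the sketch below uses rank-based normal scores as one data-driven monotone transformation and sliced inverse regression (SIR), a standard linear dimension-reduction method, as the projection search. Both are swapped-in choices, not the paper's methods, and the simulated data are hypothetical.

```python
from statistics import NormalDist

import numpy as np


def normal_scores(x: np.ndarray) -> np.ndarray:
    """Monotone, data-driven transform of one predictor: rank -> standard normal quantile."""
    n = len(x)
    ranks = np.argsort(np.argsort(x)) + 1  # ranks 1..n (ties ignored for simplicity)
    inv = NormalDist().inv_cdf
    return np.array([inv(r / (n + 1)) for r in ranks])


def sir_directions(X: np.ndarray, y: np.ndarray, n_slices: int = 10, d: int = 1) -> np.ndarray:
    """Sliced inverse regression: leading eigenvectors of the between-slice covariance."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    evals, evecs = np.linalg.eigh(Xc.T @ Xc / n)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T  # Sigma^{-1/2}
    Z = Xc @ inv_sqrt  # standardized predictors
    M = np.zeros((p, p))
    for idx in np.array_split(np.argsort(y), n_slices):  # slices of roughly equal size, ordered by y
        m = Z[idx].mean(axis=0)
        M += len(idx) / n * np.outer(m, m)
    _, vecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    beta = inv_sqrt @ vecs[:, ::-1][:, :d]  # largest first, mapped back to the predictor scale
    return beta / np.linalg.norm(beta, axis=0)


rng = np.random.default_rng(1)
W = rng.standard_normal((2000, 4))
X_raw = np.exp(W)  # raw predictors: monotone distortions of W
y = W[:, 0] + W[:, 1] + 0.1 * rng.standard_normal(2000)
X_t = np.column_stack([normal_scores(col) for col in X_raw.T])
b = sir_directions(X_t, y)[:, 0]
print(np.round(b, 2))
```

Because the transformation undoes the monotone distortion, the estimated direction should lie close to (1, 1, 0, 0)/sqrt(2) up to sign.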
  • 91
    Publication Date: 2014-11-25
    Description: Establishing cause-effect relationships is a standard goal of empirical science. Once the existence of a causal relationship is established, the precise causal mechanism involved becomes a topic of interest. A particularly popular type of mechanism analysis concerns questions of mediation, i.e., to what extent an effect is direct, and to what extent it is mediated by a third variable. A semiparametric theory has recently been proposed that allows multiply robust estimation of direct and mediated marginal effect functionals in observational studies (Tchetgen Tchetgen & Shpitser, 2012). In this paper we extend the theory to handle parametric models of natural direct and indirect effects within levels of pre-exposure variables with an identity or log link function, where the model for the observed data likelihood is otherwise unrestricted. We show that estimation is generally infeasible in such a model because of the curse of dimensionality associated with the required estimation of auxiliary conditional densities or expectations, given high-dimensional covariates. Thus, we consider multiply robust estimation and propose a more general model which assumes that a subset, but not the entirety, of several working models holds.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
  • 92
    Publication Date: 2014-11-25
    Description: In this paper we derive a new p-value-based multiple testing procedure that improves upon the Hommel procedure by gaining power as well as having a simpler step-up structure similar to the Hochberg procedure. The key to this improvement is that the Hommel procedure can be improved by a consonant procedure. Exact critical constants of this new procedure can be numerically determined. The zeroth-order approximations to the exact critical constants, albeit slightly conservative, are simple to use and need no tabling, and hence are recommended in practice. The proposed procedure is shown to control the familywise error rate under independence among the p-values. Simulations empirically demonstrate familywise error rate control under positive and negative dependence. Power superiority of the proposed procedure over competing ones is also empirically demonstrated. Illustrative examples are given.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
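The Hochberg step-up procedure, whose structure the new method shares, takes only a few lines. This is the classical baseline, not the proposed procedure; the abstract does not give the new critical constants.

```python
def hochberg(pvals: list[float], alpha: float = 0.05) -> list[bool]:
    """Hochberg step-up: reject H_(1), ..., H_(k) for the largest k with p_(k) <= alpha / (m - k + 1)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by increasing p-value
    reject = [False] * m
    for k in range(m, 0, -1):  # step up, starting from the largest p-value
        if pvals[order[k - 1]] <= alpha / (m - k + 1):
            for i in order[:k]:
                reject[i] = True
            break
    return reject


print(hochberg([0.04, 0.045]))            # both rejected: the largest p-value is <= alpha
print(hochberg([0.01, 0.04, 0.03, 0.2]))  # only the smallest p-value is rejected
```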
  • 93
    Publication Date: 2014-11-25
    Description: We propose a multivariate generalization of the univariate two-sample run test based on the shortest Hamiltonian path. The proposed test is distribution-free in finite samples. While most existing two-sample tests perform poorly or are even inapplicable to high-dimensional data, our test can be conveniently used in high-dimensional, low-sample-size settings. We investigate its power when the sample size remains fixed and the dimension of the data grows to infinity. Simulated and real datasets demonstrate our method’s superiority over existing nonparametric two-sample tests.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
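In the univariate run test, the pooled sample is sorted and the runs of same-sample labels are counted; too few runs suggests the two distributions differ. A minimal sketch of that baseline follows, with a permutation p-value (exchangeability of the labels under the null is what makes the test distribution-free). The multivariate version described above orders the points along a shortest Hamiltonian path instead of sorting them, which this sketch does not implement.

```python
import random


def count_runs(x: list[float], y: list[float]) -> int:
    """Number of runs of sample labels in the pooled, sorted sample."""
    labels = [lab for _, lab in sorted([(v, 0) for v in x] + [(v, 1) for v in y])]
    return 1 + sum(a != b for a, b in zip(labels, labels[1:]))


def runs_test_pvalue(x: list[float], y: list[float], n_perm: int = 2000, seed: int = 0) -> float:
    """Permutation p-value: small run counts are evidence that the samples are separated."""
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    observed = count_runs(x, y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reshuffling the pooled values is equivalent to permuting labels
        if count_runs(pooled[:len(x)], pooled[len(x):]) <= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


print(count_runs([1, 2, 3], [4, 5, 6]))  # completely separated samples give 2 runs
print(runs_test_pvalue(list(range(10)), list(range(10, 20))))
```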
  • 94
    Oxford University Press
    Publication Date: 2014-11-25
    Description: We consider a linear regression model and propose an omnibus test to simultaneously check the assumption of independence between the error and predictor variables and the goodness-of-fit of the parametric model. Our approach is based on testing for independence between the predictor and the residual obtained from the parametric fit by using the Hilbert–Schmidt independence criterion (Gretton et al., 2008). The proposed method requires no user-defined regularization, is simple to compute based on only pairwise distances between points in the sample, and is consistent against all alternatives. We develop distribution theory for the proposed test statistic, under both the null and the alternative hypotheses, and devise a bootstrap scheme to approximate its null distribution. We prove the consistency of the bootstrap scheme. A simulation study shows that our method has better power than its main competitors. Two real datasets are analysed to demonstrate the scope and usefulness of our method.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
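A rough sketch of the idea: fit the parametric model, then measure dependence between predictor and residual with the biased HSIC estimate tr(K H L H) / n^2, where H is the centering matrix. The Gaussian kernel and the median-based bandwidth are assumptions (the abstract does not state the paper's kernel), and the bootstrap calibration is omitted, so the code only compares statistic values for a correctly specified and a misspecified linear model on simulated data.

```python
import numpy as np


def gaussian_gram(v: np.ndarray) -> np.ndarray:
    """Gaussian-kernel Gram matrix; sigma^2 set to the median squared pairwise distance (assumed)."""
    d2 = (v[:, None] - v[None, :]) ** 2
    sigma2 = np.median(d2[d2 > 0])
    return np.exp(-d2 / (2.0 * sigma2))


def hsic(a: np.ndarray, b: np.ndarray) -> float:
    """Biased HSIC estimate tr(K H L H) / n^2."""
    n = len(a)
    H = np.eye(n) - np.ones((n, n)) / n
    K, L = gaussian_gram(a), gaussian_gram(b)
    return float(np.trace(K @ H @ L @ H)) / n**2


def ols_residuals(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Residuals from the parametric fit y ~ 1 + x."""
    A = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return y - A @ coef


rng = np.random.default_rng(2)
x = rng.standard_normal(300)
eps = rng.standard_normal(300)
stat_ok = hsic(x, ols_residuals(x, 2.0 * x + eps))            # model correctly specified
stat_bad = hsic(x, ols_residuals(x, 2.0 * x + x**2 + eps))    # quadratic term omitted
print(f"{stat_ok:.4f} {stat_bad:.4f}")
```

Omitting the quadratic term leaves structure in the residuals that depends on x, so the second statistic should be clearly larger; a real test would still need a calibrated null distribution.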
  • 95
    Publication Date: 2014-11-25
    Description: We develop a method for construction of arrays which are nearly orthogonal, in the sense that each column is orthogonal to a large proportion of the other columns, and which are convertible to fully orthogonal arrays via a mapping of the symbols in each column to a possibly smaller set of symbols. These arrays can be useful in computer experiments as designs which accommodate a large number of factors and enjoy attractive space-filling properties. Our construction allows both the mappable nearly orthogonal array and the consequent fully orthogonal array to be either symmetric or asymmetric. Resolvable orthogonal arrays play a key role in the construction.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
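To make "orthogonal to a large proportion of the other columns" concrete, the sketch below counts orthogonal column pairs, treating two columns as orthogonal when every combination of their levels appears equally often. The 8-run array and the 4-to-2 symbol mapping are a hypothetical toy example, not the paper's construction.

```python
from collections import Counter
from itertools import combinations


def columns_orthogonal(a: list, b: list) -> bool:
    """True if every pair of levels (u, v) occurs, and all occur equally often."""
    counts = Counter(zip(a, b))
    cells = len(set(a)) * len(set(b))
    return len(counts) == cells and len(set(counts.values())) == 1


def orthogonal_pair_fraction(array: list[list[int]]) -> float:
    """Proportion of column pairs that are orthogonal."""
    cols = list(zip(*array))
    pairs = list(combinations(range(len(cols)), 2))
    return sum(columns_orthogonal(cols[i], cols[j]) for i, j in pairs) / len(pairs)


# Columns: A (4 levels), B (2 levels), D (4 levels); A and D cannot be orthogonal in 8 runs.
rows = [[0, 0, 0], [1, 0, 2], [2, 0, 1], [3, 0, 3],
        [0, 1, 2], [1, 1, 0], [2, 1, 3], [3, 1, 1]]
mapping = {0: 0, 1: 0, 2: 1, 3: 1}  # collapse 4 symbols to 2
mapped = [[mapping[a], b, mapping[d]] for a, b, d in rows]
print(orthogonal_pair_fraction(rows), orthogonal_pair_fraction(mapped))
```

Here two of the three column pairs are orthogonal before the mapping and all three after it.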
  • 96
    Publication Date: 2014-11-25
    Description: The higher criticism test is effective for testing a joint null hypothesis against a sparse alternative, e.g., for testing the effect of a gene or genetic pathway that consists of d genetic markers. Accurate p-value calculations for the higher criticism test based on the asymptotic distribution require a very large d, which is not the case for the number of genetic variants in a gene or a pathway. In this paper we propose an analytical method for accurately computing the p-value of the higher criticism test for finite-d problems. Unlike previous treatments, this method does not rely on asymptotics in d or on simulation, and is exact for arbitrary d when the test statistics are normally distributed. The method is particularly computationally advantageous when d is not large. We illustrate the proposed method with a case-control genome-wide association study of lung cancer and compare its power with competing methods through simulations.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
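For reference, the sketch below computes one common form of the higher criticism statistic from the sorted p-values p_(1) <= ... <= p_(d). The cutoff alpha0 = 0.5 is an assumption, since variants differ in the range of i. The Monte Carlo null p-value assumes independent Uniform(0, 1) p-values; it is the simulation-based approach that the paper's analytical method avoids, included only as a check for small d. The p-values in the example are hypothetical.

```python
import math
import random


def higher_criticism(pvals: list[float], alpha0: float = 0.5) -> float:
    """HC = max over i <= alpha0 * d of sqrt(d) * (i/d - p_(i)) / sqrt(p_(i) * (1 - p_(i)))."""
    d = len(pvals)
    p_sorted = sorted(min(max(p, 1e-12), 1 - 1e-12) for p in pvals)  # keep away from 0 and 1
    k_max = max(1, int(alpha0 * d))
    return max(
        math.sqrt(d) * (i / d - p) / math.sqrt(p * (1 - p))
        for i, p in enumerate(p_sorted[:k_max], start=1)
    )


def hc_pvalue_mc(hc_obs: float, d: int, n_sim: int = 2000, seed: int = 0) -> float:
    """Monte Carlo null p-value, assuming d independent Uniform(0, 1) p-values."""
    rng = random.Random(seed)
    hits = sum(higher_criticism([rng.random() for _ in range(d)]) >= hc_obs for _ in range(n_sim))
    return (hits + 1) / (n_sim + 1)


signal = [1e-6] * 3 + [0.5] * 7  # hypothetical p-values for d = 10 markers
hc = higher_criticism(signal)
print(round(hc, 1), hc_pvalue_mc(hc, d=len(signal)))
```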
  • 97
    Publication Date: 2014-11-25
    Description: We develop likelihood methods for the Kronecker envelope model in the principal components analysis of matrix observations that have a multivariate normal distribution. Maximum likelihood estimates are derived and the associated likelihood ratio statistic for a test of this Kronecker envelope model is obtained. The asymptotic null distribution of the likelihood ratio statistic is derived as some nuisance parameters approach infinity, and a saddlepoint approximation for this limiting distribution is given. An alternative composite test for the Kronecker envelope model, which can be used when the sample size is too small to use the likelihood ratio test, is also given. Simulation results demonstrate the accuracy of our approximations.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
  • 98
    Publication Date: 2014-11-25
    Description: In marketing research, social science and epidemiological studies, call-back of nonrespondents is standard. If respondents and nonrespondents tend to give different answers, the missing data are called nonignorable, and analysing the respondents alone may produce biased results. To extend earlier work on nonresponse in the presence of call-backs, Alho (1990) proposed modelling the probability of response at each attempt through logistic regression, where outcomes of interest and covariates are explanatory variables. In this paper we propose a semiparametric maximum likelihood approach, and discuss large-sample properties and the semiparametric likelihood ratio statistic used to test whether the data are missing completely at random. Simulations are conducted to evaluate this approach and a modification of the method of Alho (1990). Data from the National Health Interview Survey are used for illustration.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
  • 99
    Publication Date: 2014-11-25
    Description: This paper studies Bayesian variable selection in linear models with general spherically symmetric error distributions. We construct the posterior odds based on a separable prior, which arises as a class of mixtures of Gaussian densities. The posterior odds for comparing nonnull models are shown to be independent of the error distribution, provided it is spherically symmetric. Because of this invariance, we refer to our method as a robust Bayesian variable selection method. We demonstrate that our posterior odds have model selection consistency, and that the priors in our class are the only ones within a large class that are robust in this sense.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
  • 100
    Publication Date: 2014-11-25
    Description: A central question in causal inference with observational studies is the sensitivity of conclusions to unmeasured confounding. The classical Cornfield condition allows us to assess whether an unmeasured binary confounder can explain away the observed relative risk of the exposure on the outcome. It states that for an unmeasured confounder to explain away an observed relative risk, the association between the unmeasured confounder and the exposure and the association between the unmeasured confounder and the outcome must both be larger than the observed relative risk. In this paper, we extend the classical Cornfield condition in three directions. First, we consider analogous conditions for the risk difference and allow for a categorical, not just a binary, unmeasured confounder. Second, we provide more stringent thresholds that the maximum of the above-mentioned associations must satisfy, rather than weaker conditions that both must satisfy. Third, we show that all the earlier results on Cornfield conditions hold under weaker assumptions than previously used. We illustrate the potential applications by real examples, where our new conditions give more information than the classical ones.
    Print ISSN: 0006-3444
    Electronic ISSN: 1464-3510
    Topics: Biology, Mathematics, Medicine
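The classical Cornfield condition is a simple comparison. To illustrate why a threshold on the maximum of the two associations can be more stringent, the sketch also uses a joint bound on the relative risk a binary confounder can induce, RR_EU * RR_UY / (RR_EU + RR_UY - 1), of the kind used in sensitivity analysis. Treat this bound and the threshold derived from it as assumptions for illustration, not as transcriptions of the paper's results. The observed relative risk of 9 is hypothetical.

```python
import math


def classical_cornfield(rr_obs: float, rr_eu: float, rr_uy: float) -> bool:
    """Classical necessary condition: both confounder associations must exceed rr_obs."""
    return rr_eu > rr_obs and rr_uy > rr_obs


def bounding_factor(rr_eu: float, rr_uy: float) -> float:
    """Assumed joint bound on the relative risk a binary confounder can produce."""
    return rr_eu * rr_uy / (rr_eu + rr_uy - 1.0)


def common_threshold(rr_obs: float) -> float:
    """Smallest x with bounding_factor(x, x) >= rr_obs: the larger root of x^2 - 2*rr*x + rr = 0."""
    return rr_obs + math.sqrt(rr_obs * (rr_obs - 1.0))


rr = 9.0                                      # hypothetical observed relative risk
print(classical_cornfield(rr, 10.0, 10.0))    # True: both associations exceed 9
print(round(bounding_factor(10.0, 10.0), 2))  # 5.26: still too weak to explain away rr = 9
print(round(common_threshold(rr), 2))         # 17.49
```

Under this assumed bound, associations of 10 satisfy the classical condition yet cannot fully explain away a relative risk of 9, which is the sense in which a condition on the maximum can be stricter.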