ALBERT

All Library Books, journals and Electronic Records Telegrafenberg


Filter
  • Articles  (338)
  • 2020-2022  (338)
  • Journal of Machine Learning Research  (338)
  • 1
    Publication Date: 2020
    Description: We study the identity testing problem in the context of spin systems or undirected graphical models, where it takes the following form: given the parameter specification of the model $M$ and a sampling oracle for the distribution $\mu_{M^*}$ of an unknown model $M^*$, can we efficiently determine if the two models $M$ and $M^*$ are the same? We consider identity testing for both soft-constraint and hard-constraint systems. In particular, we prove hardness results in two prototypical cases, the Ising model and proper colorings, and explore whether identity testing is any easier than structure learning. For the ferromagnetic (attractive) Ising model, Daskalakis et al. (2018) presented a polynomial-time algorithm for identity testing. We prove hardness results in the antiferromagnetic (repulsive) setting in the same regime of parameters where structure learning is known to require a super-polynomial number of samples. Specifically, for $n$-vertex graphs of maximum degree $d$, we prove that if $|\beta| d = \omega(\log{n})$ (where $\beta$ is the inverse temperature parameter), then there is no polynomial running time identity testing algorithm unless $RP=NP$. In the hard-constraint setting, we present hardness results for identity testing for proper colorings. Our results are based on the presumed hardness of #BIS, the problem of (approximately) counting independent sets in bipartite graphs.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
  • 2
    Publication Date: 2020
    Description: DESlib is an open-source python library providing the implementation of several dynamic selection techniques. The library is divided into three modules: (i) dcs, containing the implementation of dynamic classifier selection methods (DCS); (ii) des, containing the implementation of dynamic ensemble selection methods (DES); (iii) static, with the implementation of static ensemble techniques. The library is fully documented (documentation available online on Read the Docs), has a high test coverage (codecov.io) and is part of the scikit-learn-contrib supported projects. Documentation, code and examples can be found on its GitHub page: https://github.com/scikit-learn-contrib/DESlib.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
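As an illustration of the scikit-learn-style interface described in the DESlib entry above, here is a minimal usage sketch (assuming deslib and scikit-learn are installed; the synthetic dataset, pool size, and choice of the KNORA-Eliminate method are illustrative, not taken from the paper):

```python
# Minimal sketch of dynamic ensemble selection with DESlib (illustrative choices only).
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from deslib.des import KNORAE  # KNORA-Eliminate, a dynamic ensemble selection (DES) method

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
# DSEL: a separate set used to estimate classifier competence at prediction time
X_train, X_dsel, y_train, y_dsel = train_test_split(X_train, y_train, random_state=0)

pool = BaggingClassifier(n_estimators=10, random_state=0).fit(X_train, y_train)
des = KNORAE(pool_classifiers=pool)  # dynamic selection over the trained pool
des.fit(X_dsel, y_dsel)
print("DES accuracy:", des.score(X_test, y_test))
```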
  • 3
    Publication Date: 2020
    Description: Latent space models are effective tools for statistical modeling and visualization of network data. Due to their close connection to generalized linear models, it is also natural to incorporate covariate information in them. The current paper presents two universal fitting algorithms for networks with edge covariates: one based on nuclear norm penalization and the other based on projected gradient descent. Both algorithms are motivated by maximizing the likelihood function for an existing class of inner-product models, and we establish their statistical rates of convergence for these models. In addition, the theory informs us that both methods work simultaneously for a wide range of different latent space models that allow latent positions to affect edge formation in flexible ways, such as distance models. Furthermore, the effectiveness of the methods is demonstrated on a number of real world network data sets for different statistical tasks, including community detection with and without edge covariates, and network assisted learning.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
  • 4
    Publication Date: 2020
    Description: Graph learning from data is a canonical problem that has received substantial attention in the literature. Learning a structured graph is essential for interpretability and identification of the relationships among data. In general, learning a graph with a specific structure is an NP-hard combinatorial problem and thus designing a general tractable algorithm is challenging. Some useful structured graphs include connected, sparse, multi-component, bipartite, and regular graphs. In this paper, we introduce a unified framework for structured graph learning that combines the Gaussian graphical model and spectral graph theory. We propose to convert combinatorial structural constraints into spectral constraints on graph matrices and develop an optimization framework based on block majorization-minimization to solve the structured graph learning problem. The proposed algorithms are provably convergent and practically amenable for a number of graph-based applications such as data clustering. Extensive numerical experiments with both synthetic and real data sets illustrate the effectiveness of the proposed algorithms. An open source R package containing the code for all the experiments is available at https://CRAN.R-project.org/package=spectralGraphTopology.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
  • 5
    Publication Date: 2020
    Description: We consider the problem of clustering with the longest-leg path distance (LLPD) metric, which is informative for elongated and irregularly shaped clusters. We prove finite-sample guarantees on the performance of clustering with respect to this metric when random samples are drawn from multiple intrinsically low-dimensional clusters in high-dimensional space, in the presence of a large number of high-dimensional outliers. By combining these results with spectral clustering with respect to LLPD, we provide conditions under which the Laplacian eigengap statistic correctly determines the number of clusters for a large class of data sets, and prove guarantees on the labeling accuracy of the proposed algorithm. Our methods are quite general and provide performance guarantees for spectral clustering with any ultrametric. We also introduce an efficient, easy to implement approximation algorithm for the LLPD based on a multiscale analysis of adjacency graphs, which allows for the runtime of LLPD spectral clustering to be quasilinear in the number of data points.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
  • 6
    Publication Date: 2020
    Description: We study the misclassification error for community detection in general heterogeneous stochastic block models (SBM) with noisy or partial label information. We establish a connection between the misclassification rate and the notion of minimum energy on the local neighborhood of the SBM. We develop an optimally weighted message passing algorithm to reconstruct labels for SBM based on the minimum energy flow and the eigenvectors of a certain Markov transition matrix. The general SBM considered in this paper allows for unequal-size communities, degree heterogeneity, and different connection probabilities among blocks. We focus on how to optimally weigh the message passing to improve misclassification.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
  • 7
    Publication Date: 2020
    Description: We study the question of whether parallelization in the exploration of the feasible set can be used to speed up convex optimization, in the local oracle model of computation and in the high-dimensional regime. We show that the answer is negative for both deterministic and randomized algorithms applied to essentially any of the interesting geometries and nonsmooth, weakly-smooth, or smooth objective functions. In particular, we show that it is not possible to obtain a polylogarithmic (in the sequential complexity of the problem) number of parallel rounds with a polynomial (in the dimension) number of queries per round. In the majority of these settings and when the dimension of the space is polynomial in the inverse target accuracy, our lower bounds match the oracle complexity of sequential convex optimization, up to at most a logarithmic factor in the dimension, which makes them (nearly) tight. Another conceptual contribution of our work is in providing a general and streamlined framework for proving lower bounds in the setting of parallel convex optimization. Prior to our work, lower bounds for parallel convex optimization algorithms were only known in a small fraction of the settings considered in this paper, mainly applying to Euclidean ($\ell_2$) and $\ell_\infty$ spaces.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
  • 8
    Publication Date: 2020
    Description: This paper considers online convex optimization over a complicated constraint set, which typically consists of multiple functional constraints and a set constraint. The conventional online projection algorithm (Zinkevich, 2003) can be difficult to implement due to the potentially high computation complexity of the projection operation. In this paper, we relax the functional constraints by allowing them to be violated at each round but still requiring them to be satisfied in the long term. This type of relaxed online convex optimization (with long term constraints) was first considered in Mahdavi et al. (2012). That prior work proposes an algorithm to achieve $O(\sqrt{T})$ regret and $O(T^{3/4})$ constraint violations for general problems and another algorithm to achieve an $O(T^{2/3})$ bound for both regret and constraint violations when the constraint set can be described by a finite number of linear constraints. A recent extension in Jenatton et al. (2016) can achieve $O(T^{\max\{\theta,1-\theta\}})$ regret and $O(T^{1-\theta/2})$ constraint violations where $\theta\in (0,1)$. The current paper proposes a new simple algorithm that yields improved performance in comparison to prior works. The new algorithm achieves an $O(\sqrt{T})$ regret bound with $O(1)$ constraint violations.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
  • 9
    Publication Date: 2020
    Description: In the statistical learning framework for regression, interactions are the contributions to the response variable from products of the explanatory variables. In high-dimensional problems, detecting interactions is challenging due to combinatorial complexity and limited data information. We consider detecting interactions by exploring their connections with the principal Hessian matrix. Specifically, we propose a one-step synthetic approach for estimating the principal Hessian matrix by a penalized M-estimator. An alternating direction method of multipliers (ADMM) is proposed to efficiently solve the encountered regularized optimization problem. Based on the sparse estimator, we detect the interactions by identifying its nonzero components. Our method directly targets the interactions, and it requires no structural assumption on the hierarchy of the interaction effects. We show that our estimator is theoretically valid, computationally efficient, and practically useful for detecting the interactions in a broad spectrum of scenarios.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
  • 10
    Publication Date: 2020
    Description: The Neyman-Pearson (NP) paradigm in binary classification seeks classifiers that achieve a minimal type II error while enforcing the prioritized type I error controlled under some user-specified level $\alpha$. This paradigm serves naturally in applications such as severe disease diagnosis and spam detection, where people have clear priorities among the two error types. Recently, Tong, Feng, and Li (2018) proposed a nonparametric umbrella algorithm that adapts all scoring-type classification methods (e.g., logistic regression, support vector machines, random forest) to respect the given type I error (i.e., conditional probability of classifying a class $0$ observation as class $1$ under the 0-1 coding) upper bound $\alpha$ with high probability, without specific distributional assumptions on the features and the responses. Universal as the umbrella algorithm is, it demands an explicit minimum sample size requirement on class $0$, which is often the scarcer class, such as in rare disease diagnosis applications. In this work, we employ the parametric linear discriminant analysis (LDA) model and propose a new parametric thresholding algorithm, which does not need the minimum sample size requirements on class $0$ observations and thus is suitable for small sample applications such as rare disease diagnosis. Leveraging both the existing nonparametric and the newly proposed parametric thresholding rules, we propose four LDA-based NP classifiers, for both low- and high-dimensional settings. On the theoretical front, we prove NP oracle inequalities for one proposed classifier, where the rate for excess type II error benefits from the explicit parametric model assumption. Furthermore, as NP classifiers involve a sample splitting step of class $0$ observations, we construct a new adaptive sample splitting scheme that can be applied universally to NP classifiers, and this adaptive strategy reduces the type II error of these classifiers. The proposed NP classifiers are implemented in the R package nproc.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
  • 11
    Publication Date: 2020
    Description: Mahalanobis distance is a classical tool in multivariate analysis. We suggest here an extension of this concept to the case of functional data. More precisely, the proposed definition concerns those statistical problems where the sample data are real functions defined on a compact interval of the real line. The obvious difficulty for such a functional extension is the non-invertibility of the covariance operator in infinite-dimensional cases. Unlike other recent proposals, our definition is suggested and motivated in terms of the Reproducing Kernel Hilbert Space (RKHS) associated with the stochastic process that generates the data. The proposed distance is a true metric; it depends on a unique real smoothing parameter which is fully motivated in RKHS terms. Moreover, it shares some properties of its finite-dimensional counterpart: it is invariant under isometries, it can be consistently estimated from the data and its sampling distribution is known under Gaussian models. An empirical study for two statistical applications, outlier detection and binary classification, is included. The results are quite competitive when compared to other recent proposals in the literature.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
  • 12
    Publication Date: 2020
    Description: Data-driven analysis has been increasingly used in various decision making processes. As more sources, including reviews, news, and pictures, can now be used for data analysis, the authenticity of data sources is in doubt. While previous literature attempted to detect fake data piece by piece, in the current work, we try to capture the fake data sender's strategic behavior to detect the fake data source. Specifically, we model the tension between a data receiver who makes data-driven decisions and a fake data sender who benefits from misleading the receiver. We propose a potentially infinite horizon continuous time game-theoretic model with asymmetric information to capture the fact that the receiver does not initially know the existence of fake data and learns about it during the course of the game. We use point processes to model the data traffic, where each piece of data can occur at any discrete moment in a continuous time flow. We fully solve the model and employ numerical examples to illustrate the players' strategies and payoffs for insights. Specifically, our results show that maintaining some suspicion about the data sources and understanding that the sender can be strategic are very helpful to the data receiver. In addition, based on our model, we propose a methodology of detecting fake data that is complementary to the previous studies on this topic, which suggested various approaches for analyzing the data piece by piece. We show that after analyzing each piece of data, understanding a source by looking at its whole history of pushing data can be helpful.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
  • 13
    Publication Date: 2020
    Description: This paper studies the nonparametric modal regression problem systematically from a statistical learning viewpoint. Originally motivated by pursuing a theoretical understanding of the maximum correntropy criterion based regression (MCCR), our study reveals that MCCR with a tending-to-zero scale parameter is essentially modal regression. We show that the nonparametric modal regression problem can be approached via the classical empirical risk minimization. Some efforts are then made to develop a framework for analyzing and implementing modal regression. For instance, the modal regression function is described, the modal regression risk is defined explicitly and its Bayes rule is characterized; for the sake of computational tractability, the surrogate modal regression risk, which is termed as the generalization risk in our study, is introduced. On the theoretical side, the excess modal regression risk, the excess generalization risk, the function estimation error, and the relations among the above three quantities are studied rigorously. It turns out that under mild conditions, function estimation consistency and convergence may be pursued in modal regression as in vanilla regression protocols such as mean regression, median regression, and quantile regression. On the practical side, the implementation issues of modal regression including the computational algorithm and the selection of the tuning parameters are discussed. Numerical validations on modal regression are also conducted to verify our findings.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
  • 14
    Publication Date: 2020
    Description: One of the common tasks in unsupervised learning is dimensionality reduction, where the goal is to find meaningful low-dimensional structures hidden in high-dimensional data. Sometimes referred to as manifold learning, this problem is closely related to the problem of localization, which aims at embedding a weighted graph into a low-dimensional Euclidean space. Several methods have been proposed for localization as well as manifold learning. Nonetheless, the robustness properties of most of them are little understood. In this paper, we obtain perturbation bounds for classical scaling and trilateration, which are then applied to derive performance bounds for Isomap, Landmark Isomap, and Maximum Variance Unfolding. A new perturbation bound for Procrustes analysis plays a key role.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
  • 15
    Publication Date: 2020
    Description: Sliced inverse regression is an effective paradigm that achieves the goal of dimension reduction through replacing high dimensional covariates with a small number of linear combinations. It does not impose parametric assumptions on the dependence structure. More importantly, such a reduction of dimension is sufficient in that it does not cause loss of information. In this paper, we adapt the stationary sliced inverse regression to cope with rapidly changing environments. We propose to implement sliced inverse regression in an online fashion. This online learner consists of two steps. In the first step we construct an online estimate for the kernel matrix; in the second step we propose two online algorithms, one motivated by the perturbation method and the other originating from gradient descent optimization, to perform online singular value decomposition. The theoretical properties of this online learner are established. We demonstrate the numerical performance of this online learner through simulations and real world applications. All numerical studies confirm that this online learner performs as well as the batch learner.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
  • 16
    Publication Date: 2020
    Description: In this paper, we extend the methodology developed for Support Vector Machines (SVM) using the $\ell_2$-norm ($\ell_2$-SVM) to the more general case of $\ell_p$-norms with $p>1$ ($\ell_p$-SVM). We derive second order cone formulations for the resulting dual and primal problems. The concept of kernel function, widely applied in $\ell_2$-SVM, is extended to the more general case of $\ell_p$-norms with $p>1$ by defining a new operator called multidimensional kernel. This object gives rise to reformulations of dual problems, in a transformed space of the original data, where the dependence on the original data always appears as homogeneous polynomials. We adapt known solution algorithms to efficiently solve the resulting primal and dual problems, and computational experiments on real-world datasets show rather good behavior in terms of the accuracy of $\ell_p$-SVM with $p>1$.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
  • 17
    Publication Date: 2020
    Description: Feature screening is a powerful tool in processing high-dimensional data. When the sample size N and the number of features p are both large, the implementation of classic screening methods can be numerically challenging. In this paper, we propose a distributed screening framework for big data setup. In the spirit of 'divide-and-conquer', the proposed framework expresses a correlation measure as a function of several component parameters, each of which can be distributively estimated using a natural U-statistic from data segments. With the component estimates aggregated, we obtain a final correlation estimate that can be readily used for screening features. This framework enables distributed storage and parallel computing and thus is computationally attractive. Due to the unbiased distributive estimation of the component parameters, the final aggregated estimate achieves a high accuracy that is insensitive to the number of data segments m. Under mild conditions, we show that the aggregated correlation estimator is as efficient as the centralized estimator in terms of the probability convergence bound and the mean squared error rate; the corresponding screening procedure enjoys sure screening property for a wide range of correlation measures. The promising performances of the new method are supported by extensive numerical examples.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
  • 18
    Publication Date: 2020
    Description: We present new practical local differentially private heavy hitters algorithms achieving optimal or near-optimal worst-case error and running time -- TreeHist and Bitstogram. In both algorithms, server running time is $\tilde O(n)$ and user running time is $\tilde O(1)$, hence improving on the prior state-of-the-art result of Bassily and Smith [STOC 2015] requiring $O(n^{5/2})$ server time and $O(n^{3/2})$ user time. With a typically large number of participants in local algorithms (in the millions), this reduction in time complexity, in particular at the user side, is crucial for making locally private heavy hitters algorithms usable in practice. We implemented Algorithm TreeHist to verify our theoretical analysis and compared its performance with the performance of Google's RAPPOR code.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
  • 19
    Publication Date: 2020
    Description: Dual first-order methods are essential techniques for large-scale constrained convex optimization. However, when recovering the primal solutions, we need $T(\epsilon^{-2})$ iterations to achieve an $\epsilon$-optimal primal solution when we apply an algorithm to the non-strongly convex dual problem with $T(\epsilon^{-1})$ iterations to achieve an $\epsilon$-optimal dual solution, where $T(x)$ can be $x$ or $\sqrt{x}$. In this paper, we prove that the iteration complexity of the primal solutions and dual solutions have the same $O\left(\frac{1}{\sqrt{\epsilon}}\right)$ order of magnitude for the accelerated randomized dual coordinate ascent. When the dual function further satisfies the quadratic functional growth condition, by restarting the algorithm at any period, we establish the linear iteration complexity for both the primal solutions and dual solutions even if the condition number is unknown. When applied to the regularized empirical risk minimization problem, we prove the iteration complexity of $O\left(n\log n+\sqrt{\frac{n}{\epsilon}}\right)$ in both primal space and dual space, where $n$ is the number of samples. Our result takes out the $\left(\log \frac{1}{\epsilon}\right)$ factor compared with the methods based on smoothing/regularization or Catalyst reduction. As far as we know, this is the first time that the optimal $O\left(\sqrt{\frac{n}{\epsilon}}\right)$ iteration complexity in the primal space is established for the dual coordinate ascent based stochastic algorithms. We also establish the accelerated linear complexity for some problems with nonsmooth loss, e.g., the least absolute deviation and SVM.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
  • 20
    Publication Date: 2020
    Description: The success of Deep Learning and its potential use in many safety-critical applications has motivated research on formal verification of Neural Network (NN) models. In this context, verification involves proving or disproving that an NN model satisfies certain input-output properties. Despite the reputation of learned NN models as black boxes, and the theoretical hardness of proving useful properties about them, researchers have been successful in verifying some classes of models by exploiting their piecewise linear structure and taking insights from formal methods such as Satisfiability Modulo Theory. However, these methods are still far from scaling to realistic neural networks. To facilitate progress in this crucial area, we exploit the Mixed Integer Linear Programming (MIP) formulation of verification to propose a family of algorithms based on Branch-and-Bound (BaB). We show that our family contains previous verification methods as special cases. With the help of the BaB framework, we make three key contributions. Firstly, we identify new methods that combine the strengths of multiple existing approaches, accomplishing significant performance improvements over the previous state of the art. Secondly, we introduce an effective branching strategy on ReLU non-linearities. This branching strategy allows us to efficiently and successfully deal with high input dimensional problems with convolutional network architectures, on which previous methods fail frequently. Finally, we propose comprehensive test data sets and benchmarks, which include a collection of previously released test cases. We use the data sets to conduct a thorough experimental comparison of existing and new algorithms and to provide an inclusive analysis of the factors impacting the hardness of verification problems.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
  • 21
    Publication Date: 2020
    Description: Tensor Train decomposition is used across many branches of machine learning. We present T3F—a library for Tensor Train decomposition based on TensorFlow. T3F supports GPU execution, batch processing, automatic differentiation, and versatile functionality for the Riemannian optimization framework, which takes into account the underlying manifold structure to construct efficient optimization methods. The library makes it easier to implement machine learning papers that rely on the Tensor Train decomposition. T3F includes documentation, examples and 94% test coverage.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
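As an illustration of the T3F entry above, a minimal sketch of compressing a dense tensor into Tensor Train format and checking the reconstruction error (assuming TensorFlow 2 with eager execution and the t3f package; the tensor shape and rank cap are arbitrary choices):

```python
# Minimal sketch of TT compression with T3F (illustrative shape and rank only).
import numpy as np
import tensorflow as tf
import t3f

dense = tf.constant(np.random.rand(8, 8, 8, 8), dtype=tf.float32)
tt = t3f.to_tt_tensor(dense, max_tt_rank=4)  # TT decomposition with a cap on the TT-ranks
approx = t3f.full(tt)                        # expand the TT object back to a dense tensor
rel_err = tf.norm(dense - approx) / tf.norm(dense)
print("relative reconstruction error:", float(rel_err))
```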
  • 22
    Publication Date: 2020
    Description: We develop an encompassing framework for matching, covariate balancing, and doubly-robust methods for causal inference from observational data called generalized optimal matching (GOM). The framework is given by generalizing a new functional-analytical formulation of optimal matching, giving rise to the class of GOM methods, for which we provide a single unified theory to analyze tractability and consistency. Many commonly used existing methods are included in GOM and, using their GOM interpretation, can be extended to optimally and automatically trade off balance for variance and outperform their standard counterparts. As a subclass, GOM gives rise to kernel optimal matching (KOM), which, as supported by new theoretical and empirical results, is notable for combining many of the positive properties of other methods in one. KOM, which is solved as a linearly-constrained convex-quadratic optimization problem, inherits both the interpretability and model-free consistency of matching but can also achieve the $\sqrt{n}$-consistency of well-specified regression and the bias reduction and robustness of doubly robust methods. In settings of limited overlap, KOM enables a very transparent method for interval estimation for partial identification and robust coverage. We demonstrate this in examples with both synthetic and real data.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
  • 23
    Publication Date: 2020
    Description: In many areas, practitioners need to analyze large data sets that challenge conventional single-machine computing. To scale up data analysis, distributed and parallel computing approaches are increasingly needed. Here we study a fundamental and highly important problem in this area: How to do ridge regression in a distributed computing environment? Ridge regression is an extremely popular method for supervised learning, and has several optimality properties, thus it is important to study. We study one-shot methods that construct weighted combinations of ridge regression estimators computed on each machine. By analyzing the mean squared error in a high-dimensional random-effects model where each predictor has a small effect, we discover several new phenomena. Infinite-worker limit: The distributed estimator works well for very large numbers of machines, a phenomenon we call 'infinite-worker limit'. Optimal weights: The optimal weights for combining local estimators sum to more than unity, due to the downward bias of ridge. Thus, all averaging methods are suboptimal. We also propose a new Weighted ONe-shot DistributEd Ridge regression algorithm (WONDER). We test WONDER in simulation studies and using the Million Song Dataset as an example. There it can save at least 100x in computation time, while nearly preserving test accuracy.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
  • 24
    Publication Date: 2020
    Description: We study the problem of globally recovering a dictionary from a set of signals via $\ell_1$-minimization. We assume that the signals are generated as i.i.d. random linear combinations of the $K$ atoms from a complete reference dictionary $D^*\in \mathbb R^{K\times K}$, where the linear combination coefficients are from either a Bernoulli-type model or an exact sparse model. First, we obtain a necessary and sufficient norm condition for the reference dictionary $D^*$ to be a sharp local minimum of the expected $\ell_1$ objective function. Our result substantially extends that of Wu and Yu (2015) and allows the combination coefficient to be non-negative. Secondly, we obtain an explicit bound on the region within which the objective value of the reference dictionary is minimal. Thirdly, we show that the reference dictionary is the unique sharp local minimum, thus establishing the first known global property of $\ell_1$-minimization dictionary learning. Motivated by the theoretical results, we introduce a perturbation based test to determine whether a dictionary is a sharp local minimum of the objective function. In addition, we also propose a new dictionary learning algorithm based on Block Coordinate Descent, called DL-BCD, which is guaranteed to decrease the objective function monotonically. Simulation studies show that DL-BCD has competitive performance in terms of recovery rate compared to other state-of-the-art dictionary learning algorithms when the reference dictionary is generated from random Gaussian matrices.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
  • 25
    Publication Date: 2020
    Description: We propose expected policy gradients (EPG), which unify stochastic policy gradients (SPG) and deterministic policy gradients (DPG) for reinforcement learning. Inspired by expected sarsa, EPG integrates (or sums) across actions when estimating the gradient, instead of relying only on the action in the sampled trajectory. For continuous action spaces, we first derive a practical result for Gaussian policies and quadratic critics and then extend it to a universal analytical method, covering a broad class of actors and critics, including Gaussian, exponential families, and policies with bounded support. For Gaussian policies, we introduce an exploration method that uses covariance proportional to the matrix exponential of the scaled Hessian of the critic with respect to the actions. For discrete action spaces, we derive a variant of EPG based on softmax policies. We also establish a new general policy gradient theorem, of which the stochastic and deterministic policy gradient theorems are special cases. Furthermore, we prove that EPG reduces the variance of the gradient estimates without requiring deterministic policies and with little computational overhead. Finally, we provide an extensive experimental evaluation of EPG and show that it outperforms existing approaches on multiple challenging control domains.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
  • 26
    Publication Date: 2020
    Description: This work is concerned with the non-negative rank-1 robust principal component analysis (RPCA), where the goal is to precisely recover the dominant non-negative principal components of a data matrix when a number of measurements could be grossly corrupted with sparse and arbitrarily large noise. Most of the known techniques for solving the RPCA rely on convex relaxation methods by lifting the problem to a higher dimension, which significantly increases the number of variables. As an alternative, the well-known Burer-Monteiro approach can be used to cast the RPCA as a non-convex and non-smooth $\ell_1$ optimization problem with a significantly smaller number of variables. In this work, we show that the low-dimensional formulation of the symmetric and asymmetric positive rank-1 RPCA based on the Burer-Monteiro approach has a benign landscape, i.e., 1) it does not have any spurious local solution, 2) it has a unique global solution, and 3) its unique global solution coincides with the true components. An implication of this result is that simple local search algorithms are guaranteed to achieve a zero global optimality gap when directly applied to the low-dimensional formulation. Furthermore, we provide strong deterministic and probabilistic guarantees for the exact recovery of the true principal components. In particular, it is shown that a constant fraction of the measurements could be grossly corrupted and yet they would not create any spurious local solution.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
  • 27
    Publication Date: 2020
    Description: Derivatives play an important role in bandwidth selection methods (e.g., plug-ins), data analysis and bias-corrected confidence intervals. Therefore, obtaining accurate derivative information is crucial. Although many derivative estimation methods exist, the majority require a fixed design assumption. In this paper, we propose an effective and fully data-driven framework to estimate the first and second order derivative in random design. We establish the asymptotic properties of the proposed derivative estimator, and also propose a fast selection method for the tuning parameters. The performance and flexibility of the method is illustrated via an extensive simulation study.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
  • 28
    Publication Date: 2020
    Description: In this paper we introduce a statistical model, called additively faithful directed acyclic graph (AFDAG), for causal learning from observational data. Our approach is based on additive conditional independence (ACI), a recently proposed three-way statistical relation that shares many similarities with conditional independence but without resorting to multi-dimensional kernels. This distinct feature strikes a balance between a parametric model and a fully nonparametric model, which makes the proposed model attractive for handling large networks. We develop an estimator for AFDAG based on a linear operator that characterizes ACI, and establish the consistency and convergence rates of this estimator, as well as the uniform consistency of the estimated DAG. Moreover, we introduce a modified PC-algorithm to implement the estimating procedure efficiently, so that its complexity is determined by the level of sparseness rather than the dimension of the network. Through simulation studies we show that our method outperforms existing methods when commonly assumed conditions such as Gaussian or Gaussian copula distributions do not hold. Finally, the usefulness of AFDAG formulation is demonstrated through an application to a proteomics data set.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
  • 29
    Publication Date: 2020
    Description: Regularized least-squares (kernel-ridge / Gaussian process) regression is a fundamental algorithm of statistics and machine learning. Because generic algorithms for the exact solution have cubic complexity in the number of data points, large datasets require resorting to approximations. In this work, the computation of the least-squares prediction is itself treated as a probabilistic inference problem. We propose a structured Gaussian regression model on the kernel function that uses projections of the kernel matrix to obtain a low-rank approximation of the kernel and the matrix. A central result is an enhanced way to use the method of conjugate gradients for the specific setting of least-squares regression as encountered in machine learning.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
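To make the conjugate-gradient angle of the entry above concrete, here is a generic sketch of kernel ridge regression solved iteratively with conjugate gradients rather than a cubic-cost factorization. This is not the probabilistic-inference method of the paper; the RBF kernel, synthetic data, and regularization value are arbitrary illustrations.

```python
# Generic sketch: kernel ridge regression via conjugate gradients (not the paper's method).
import numpy as np
from scipy.sparse.linalg import cg

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)

def rbf_kernel(A, B, lengthscale=1.0):
    # squared Euclidean distances between rows of A and rows of B
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

lam = 1e-2                                      # ridge / noise regularization
K = rbf_kernel(X, X)
alpha, info = cg(K + lam * np.eye(len(X)), y)   # iterative solve of (K + lam*I) alpha = y
assert info == 0                                # info == 0 means CG converged

X_new = np.linspace(-3, 3, 5).reshape(-1, 1)
print(rbf_kernel(X_new, X) @ alpha)             # predictions at new inputs
```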
  • 30
    Publication Date: 2020
    Description: Graphical models are commonly used to represent conditional dependence relationships between variables. There are multiple methods available for exploring them from high-dimensional data, but almost all of them rely on the assumption that the observations are independent and identically distributed. At the same time, observations connected by a network are becoming increasingly common, and tend to violate these assumptions. Here we develop a Gaussian graphical model for observations connected by a network with potentially different mean vectors, varying smoothly over the network. We propose an efficient estimation algorithm and demonstrate its effectiveness on both simulated and real data, obtaining meaningful and interpretable results on a statistics coauthorship network. We also prove that our method estimates both the inverse covariance matrix and the corresponding graph structure correctly under the assumption of network “cohesion”, which refers to the empirically observed phenomenon of network neighbors sharing similar traits.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
  • 31
    Publication Date: 2020
    Description: The problem of accurately measuring the similarity between graphs is at the core of many applications in a variety of disciplines. Graph kernels have recently emerged as a promising approach to this problem. There are now many kernels, each focusing on different structural aspects of graphs. Here, we present GraKeL, a library that unifies several graph kernels into a common framework. The library is written in Python and adheres to the scikit-learn interface. It is simple to use and can be naturally combined with scikit-learn's modules to build a complete machine learning pipeline for tasks such as graph classification and clustering. The code is BSD licensed and is available at: https://github.com/ysig/GraKeL.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
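As an illustration of the GraKeL entry above, a minimal graph-classification pipeline combining a GraKeL kernel with a scikit-learn SVM (assuming grakel and scikit-learn are installed; the MUTAG benchmark and the Weisfeiler-Lehman kernel are illustrative choices):

```python
# Minimal sketch of graph classification with GraKeL + scikit-learn (illustrative choices only).
from grakel.datasets import fetch_dataset
from grakel.kernels import WeisfeilerLehman, VertexHistogram
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

MUTAG = fetch_dataset("MUTAG", verbose=False)   # small benchmark of labeled molecular graphs
G, y = MUTAG.data, MUTAG.target
G_train, G_test, y_train, y_test = train_test_split(G, y, random_state=0)

gk = WeisfeilerLehman(n_iter=4, base_graph_kernel=VertexHistogram, normalize=True)
K_train = gk.fit_transform(G_train)             # kernel matrix between training graphs
K_test = gk.transform(G_test)                   # kernel between test and training graphs

clf = SVC(kernel="precomputed").fit(K_train, y_train)
print("test accuracy:", clf.score(K_test, y_test))
```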
  • 32
    Publication Date: 2020
    Description: This paper considers a new identifiability condition for additive noise models (ANMs) in which each variable is determined by an arbitrary Borel measurable function of its parents plus an independent error. It has been shown that ANMs are fully recoverable under some identifiability conditions, such as when all error variances are equal. However, this identifiability condition can be restrictive, and hence, this paper focuses on a relaxed identifiability condition that involves not only error variances, but also the influence of parents. This new class of identifiable ANMs does not put any constraints on the form of dependencies, or distributions of errors, and allows different error variances. The paper further provides a statistically consistent and computationally feasible structure learning algorithm for the identifiable ANMs based on the new identifiability condition. The proposed algorithm assumes that all relevant variables are observed, while it does not assume faithfulness or a sparse graph. Extensive experiments on simulated and real multivariate data demonstrate that the proposed algorithm successfully recovers directed acyclic graphs.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
  • 33
    Publication Date: 2020
    Description: Graphs arise naturally in many real-world applications including social networks, recommender systems, ontologies, biology, and computational finance. Traditionally, machine learning models for graphs have been mostly designed for static graphs. However, many applications involve evolving graphs. This introduces important challenges for learning and inference since nodes, attributes, and edges change over time. In this survey, we review the recent advances in representation learning for dynamic graphs, including dynamic knowledge graphs. We describe existing models from an encoder-decoder perspective, categorize these encoders and decoders based on the techniques they employ, and analyze the approaches in each category. We also review several prominent applications and widely used datasets and highlight directions for future research.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
  • 34
    Publication Date: 2020
    Description: Anomaly detection is not an easy problem since the distribution of anomalous samples is unknown a priori. We explore a novel method that offers a trade-off between one-class and two-class approaches, and leads to a better performance on anomaly detection problems with small or non-representative anomalous samples. The method is evaluated using several data sets and compared to a set of conventional one-class and two-class approaches.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
  • 35
    Publication Date: 2020
    Description: We consider the standard model of distributed optimization of a sum of functions $F(\mathbf z) = \sum_{i=1}^n f_i(\mathbf z)$, where node $i$ in a network holds the function $f_i(\mathbf z)$. We allow for a harsh network model characterized by asynchronous updates, message delays, unpredictable message losses, and directed communication among nodes. In this setting, we analyze a modification of the Gradient-Push method for distributed optimization, assuming that (i) node $i$ is capable of generating gradients of its function $f_i(\mathbf z)$ corrupted by zero-mean bounded-support additive noise at each step, (ii) $F(\mathbf z)$ is strongly convex, and (iii) each $f_i(\mathbf z)$ has Lipschitz gradients. We show that our proposed method asymptotically performs as well as the best bounds on centralized gradient descent that takes steps in the direction of the sum of the noisy gradients of all the functions $f_1(\mathbf z), \ldots, f_n(\mathbf z)$ at each step.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
  • 36
    Publication Date: 2020
    Description: Tree data are ubiquitous because they model a large variety of situations, e.g., the architecture of plants, the secondary structure of RNA, or the hierarchy of XML files. Nevertheless, the analysis of these non-Euclidean data is difficult per se. In this paper, we focus on the subtree kernel, a convolution kernel for tree data introduced by Vishwanathan and Smola in the early 2000s. More precisely, we investigate the influence of the weight function from a theoretical perspective and in real data applications. We establish, on a 2-class stochastic model, that the performance of the subtree kernel is improved when the weight of leaves vanishes, which motivates the definition of a new weight function, learned from the data and not fixed by the user as usually done. To this end, we define a unified framework for computing the subtree kernel from ordered or unordered trees, that is particularly suitable for tuning parameters. We show through eight real data classification problems the great efficiency of our approach, in particular for small data sets, which also underlines the high importance of the weight function. Finally, a visualization tool of the significant features is derived.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
  • 37
    Publication Date: 2020
    Description: Co-training is a well-known semi-supervised learning approach which trains classifiers on two or more different views and exchanges pseudo labels of unlabeled instances in an iterative way. During the co-training process, pseudo labels of unlabeled instances are very likely to be false, especially in the initial training stages, while the standard co-training algorithm adopts a 'draw without replacement' strategy and does not remove these wrongly labeled instances from the training set in later stages. Besides, most of the traditional co-training approaches are implemented for two-view cases, and their extensions to multi-view scenarios are not intuitive. These issues not only degrade their performance and narrow their range of application but also hamper their theoretical foundations. Moreover, there is no optimization model that explains the objective a co-training process manages to optimize. To address these issues, in this study we design a unified self-paced multi-view co-training (SPamCo) framework which draws unlabeled instances with replacement. Two specified co-regularization terms are formulated to develop different strategies for selecting pseudo-labeled instances during training. Both forms share the same optimization strategy, which is consistent with the iteration process in co-training and can be naturally extended to multi-view scenarios. A distributed optimization strategy is also introduced to train the classifier of each view in parallel to further improve the efficiency of the algorithm. Furthermore, the SPamCo algorithm is proved to be PAC learnable, supporting its theoretical soundness. Experiments conducted on synthetic, text categorization, person re-identification, image recognition and object detection data sets substantiate the superiority of the proposed method.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
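For readers unfamiliar with the co-training loop that the SPamCo entry above builds on, here is a toy sketch of the classic two-view scheme with pooled pseudo-labels. It is not the SPamCo algorithm; the synthetic data, confidence threshold, and number of rounds are arbitrary.

```python
# Toy sketch of standard two-view co-training (the baseline the paper generalizes).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
view1, view2 = X[:, :10], X[:, 10:]          # two feature views of the same instances
labeled = np.zeros(len(y), dtype=bool)
labeled[:40] = True                          # small initial labeled set
pseudo = y.copy()                            # pseudo-labels; used only where 'labeled' is True

for _ in range(5):                           # a few co-training rounds
    clf1 = LogisticRegression(max_iter=1000).fit(view1[labeled], pseudo[labeled])
    clf2 = LogisticRegression(max_iter=1000).fit(view2[labeled], pseudo[labeled])
    for clf, view in ((clf1, view1), (clf2, view2)):
        proba = clf.predict_proba(view)
        confident = (proba.max(axis=1) > 0.95) & ~labeled
        pseudo[confident] = clf.classes_[proba.argmax(axis=1)[confident]]
        labeled |= confident                 # added once, never removed ('without replacement')

final = LogisticRegression(max_iter=1000).fit(X[labeled], pseudo[labeled])
print("accuracy on all instances:", final.score(X, y))
```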
  • 38
    Publication Date: 2020
    Description: We consider the skill rating problem for multiplayer games, that is, how to infer player skills from game outcomes in multiplayer games. We formulate the problem as a minimization problem $\arg \min_{s} s^T \Delta s$ where $\Delta$ is a positive semidefinite matrix and $s$ a real-valued function, of which some entries are the skill values to be inferred and other entries are constrained by the game outcomes. We leverage graph-based semi-supervised learning (SSL) algorithms for this problem. We apply our algorithms on several data sets of multiplayer games and obtain very promising results compared to Elo Duelling (see Elo, 1978) and TrueSkill (see Herbrich et al., 2006). As we leverage graph-based SSL algorithms and because games can be seen as relations between sets of players, we then generalize the approach. To this aim, we introduce a new finite model, called hypernode graph, defined to be a set of weighted binary relations between sets of nodes. We define Laplacians of hypernode graphs. Then, we show that the skill rating problem for multiplayer games can be formulated as $\arg \min_{s} s^T \Delta s$ where $\Delta$ is the Laplacian of a hypernode graph constructed from a set of games. From a fundamental perspective, we show that hypernode graph Laplacians are symmetric positive semidefinite matrices with constant functions in their null space. We show that problems on hypernode graphs cannot be solved with graph constructions and graph kernels. We relate hypernode graphs to signed graphs, showing that positive relations between groups can lead to negative relations between individuals.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
  • 39
    Publication Date: 2020
    Description: Stochastic gradient Langevin dynamics (SGLD) is a fundamental algorithm in stochastic optimization. Recent work by Zhang et al. (2017) presents an analysis for the hitting time of SGLD for the first and second order stationary points. The proof in Zhang et al. (2017) is a two-stage procedure through bounding the Cheeger constant, which is rather complicated and leads to loose bounds. In this paper, using intuitions from stochastic differential equations, we provide a direct analysis for the hitting times of SGLD to the first and second order stationary points. Our analysis is straightforward and relies only on basic tools from linear algebra and probability theory. Our direct analysis also leads to tighter bounds compared to Zhang et al. (2017) and shows the explicit dependence of the hitting time on different factors, including dimensionality, smoothness, noise strength, and step size effects. Under suitable conditions, we show that the hitting time of SGLD to first-order stationary points can be dimension-independent. Moreover, we apply our analysis to study several important online estimation problems in machine learning, including linear regression, matrix factorization, and online PCA.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
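The SGLD iteration analyzed in the entry above is a stochastic gradient step plus injected Gaussian noise, $\theta_{t+1} = \theta_t - \eta\, \hat{\nabla} f(\theta_t) + \sqrt{2\eta/\beta}\, \xi_t$, with $\beta$ the inverse temperature. A toy sketch follows (the least-squares objective, step size, and inverse temperature are arbitrary choices, not taken from the paper):

```python
# Toy sketch of the SGLD update: stochastic gradient step plus Gaussian noise.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 5))
b = rng.standard_normal(200)

def stochastic_grad(theta, batch=32):
    # minibatch gradient of the least-squares objective 0.5 * ||A theta - b||^2 / n
    idx = rng.integers(0, len(b), size=batch)
    return A[idx].T @ (A[idx] @ theta - b[idx]) / batch

theta = np.zeros(5)
eta, beta = 1e-3, 1e3                         # step size and inverse temperature
for _ in range(5000):
    noise = rng.standard_normal(5)
    theta = theta - eta * stochastic_grad(theta) + np.sqrt(2.0 * eta / beta) * noise

print("final iterate:", theta)
```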
  • 40
    Publication Date: 2020
    Description: Motivated by modern applications in which one constructs graphical models based on a very large number of features, this paper introduces a new class of cluster-based graphical models, in which variable clustering is applied as an initial step for reducing the dimension of the feature space. We employ model assisted clustering, in which the clusters contain features that are similar to the same unobserved latent variable. Two different cluster-based Gaussian graphical models are considered: the latent variable graph, corresponding to the graphical model associated with the unobserved latent variables, and the cluster-average graph, corresponding to the vector of features averaged over clusters. Our study reveals that likelihood based inference for the latent graph, not analyzed previously, is analytically intractable. Our main contribution is the development and analysis of alternative estimation and inference strategies, for the precision matrix of an unobservable latent vector Z. We replace the likelihood of the data by an appropriate class of empirical risk functions, that can be specialized to the latent graphical model and to the simpler, but under-analyzed, cluster-average graphical model. The estimators thus derived can be used for inference on the graph structure, for instance on edge strength or pattern recovery. Inference is based on the asymptotic limits of the entry-wise estimates of the precision matrices associated with the conditional independence graphs under consideration. While taking the uncertainty induced by the clustering step into account, we establish Berry-Esseen central limit theorems for the proposed estimators. It is noteworthy that, although the clusters are estimated adaptively from the data, the central limit theorems regarding the entries of the estimated graphs are proved under the same conditions one would use if the clusters were known in advance. As an illustration of the usage of these newly developed inferential tools, we show that they can be reliably used for recovery of the sparsity pattern of the graphs we study, under FDR control, which is verified via simulation studies and an fMRI data analysis. These experimental results confirm the theoretically established difference between the two graph structures. Furthermore, the data analysis suggests that the latent variable graph, corresponding to the unobserved cluster centers, can help provide more insight into the understanding of the brain connectivity networks relative to the simpler, average-based, graph.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 41
    Publication Date: 2020
    Description: We consider the problem of clustering and completing a set of tensors with missing data that are drawn from a union of low-rank tensor spaces. In the clustering problem, given partially sampled tensor data composed of a number of subtensors, each chosen from one of a certain number of unknown tensor spaces, we need to group the subtensors that belong to the same tensor space. We provide a geometrical analysis of the sampling pattern and subsequently derive the sampling rate that guarantees correct clustering, under some assumptions, with high probability. Moreover, we investigate the fundamental conditions for finite/unique completability for the union of tensor spaces completion problem. Both deterministic and probabilistic conditions on the sampling pattern to ensure finite/unique completability are obtained. For both the clustering and completion problems, our tensor analysis provides a significantly better bound than the one given by matrix analysis applied to any unfolding of the tensor data.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 42
    Publication Date: 2020
    Description: We consider the problem of estimating the latent structure of a social network based on observed information diffusion events, or cascades, where the observations for a given cascade consist of only the timestamps of infection for infected nodes but not the source of the infection. Most of the existing work on this problem has focused on estimating a diffusion matrix without any structural assumptions on it. In this paper, we propose a novel model based on the intuition that information is more likely to propagate between two nodes if they are interested in similar topics that are also prominent in the information content. In particular, our model endows each node with an influence vector (which measures how authoritative the node is on each topic) and a receptivity vector (which measures how susceptible the node is to each topic). We show how this node-topic structure can be estimated from the observed cascades, and prove the consistency of the estimator. Experiments on synthetic and real data demonstrate the improved performance and better interpretability of our model compared to existing state-of-the-art methods.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 43
    Publication Date: 2020
    Description: The wavelet scattering transform is an invariant and stable signal representation suitable for many signal processing and machine learning applications. We present the Kymatio software package, an easy-to-use, high-performance Python implementation of the scattering transform in 1D, 2D, and 3D that is compatible with modern deep learning frameworks, including PyTorch and TensorFlow/Keras. The transforms are implemented on both CPUs and GPUs, the latter offering a significant speedup over the former. The package also has a small memory footprint. Source code, documentation, and examples are available under a BSD license at https://www.kymat.io. A brief usage sketch follows this record.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
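    A brief usage sketch of the package described above. The import path and constructor arguments follow Kymatio's NumPy frontend as documented around the time of publication; exact module paths may differ across versions.

```python
import numpy as np
from kymatio.numpy import Scattering2D  # PyTorch and TensorFlow/Keras frontends also exist

x = np.random.randn(1, 32, 32).astype(np.float32)   # a batch containing one 32x32 "image"
scattering = Scattering2D(J=2, shape=(32, 32))      # J sets the number of dyadic scales
Sx = scattering(x)                                  # coefficients, spatially subsampled by 2**J
print(Sx.shape)
```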
  • 44
    Publication Date: 2020
    Description: We present new excess risk bounds for general unbounded loss functions including log loss and squared loss, where the distribution of the losses may be heavy-tailed. The bounds hold for general estimators, but they are optimized when applied to $\eta$-generalized Bayesian, MDL, and empirical risk minimization estimators. In the case of log loss, the bounds imply convergence rates for generalized Bayesian inference under misspecification in terms of a generalization of the Hellinger metric as long as the learning rate $\eta$ is set correctly. For general loss functions, our bounds rely on two separate conditions: the $v$-GRIP (generalized reversed information projection) conditions, which control the lower tail of the excess loss; and the newly introduced witness condition, which controls the upper tail. The parameter $v$ in the $v$-GRIP conditions determines the achievable rate and is akin to the exponent in the Tsybakov margin condition and the Bernstein condition for bounded losses, which the $v$-GRIP conditions generalize; favorable $v$ in combination with small model complexity leads to $\tilde{O}(1/n)$ rates. The witness condition allows us to connect the excess risk to an 'annealed' version thereof, by which we generalize several previous results connecting Hellinger and Rényi divergence to KL divergence.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 45
    Publication Date: 2020
    Description: The horseshoe prior is frequently employed in Bayesian analysis of high-dimensional models, and has been shown to achieve minimax optimal risk properties when the truth is sparse. While optimization-based algorithms for the extremely popular Lasso and elastic net procedures can scale to dimension in the hundreds of thousands, algorithms for the horseshoe that use Markov chain Monte Carlo (MCMC) for computation are limited to problems an order of magnitude smaller. This is due to high computational cost per step and growth of the variance of time-averaging estimators as a function of dimension. We propose two new MCMC algorithms for computation in these models that have significantly improved performance compared to existing alternatives. One of the algorithms also approximates an expensive matrix product to give orders of magnitude speedup in high-dimensional applications. We prove guarantees for the accuracy of the approximate algorithm, and show that gradually decreasing the approximation error as the chain extends results in an exact algorithm. The scalability of the algorithm is illustrated in simulations with problem size as large as $N=5,000$ observations and $p=50,000$ predictors, and an application to a genome-wide association study with $N=2,267$ and $p=98,385$. The empirical results also show that the new algorithm yields estimates with lower mean squared error and intervals with better coverage, and elucidates features of the posterior that were often missed by previous algorithms in high dimensions, including bimodality of posterior marginals, indicating uncertainty about which covariates belong in the model.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 46
    Publication Date: 2020
    Description: A new strategy for probabilistic graphical modeling is developed that draws parallels to community detection analysis. The method jointly estimates an undirected graph and homogeneous communities of nodes. The structure of the communities is taken into account when estimating the graph and, at the same time, the structure of the graph is accounted for when estimating communities of nodes. The procedure uses a joint group graphical lasso approach with community detection-based grouping, such that some groups of edges co-occur in the estimated graph. The grouping structure is unknown and is estimated based on community detection algorithms. Theoretical derivations regarding graph convergence and sparsistency, as well as the accuracy of community recovery, are included, while the method's empirical performance is illustrated in an fMRI context as well as with simulated examples.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 47
    Publication Date: 2020
    Description: When the data is distributed across multiple servers, lowering the communication cost between the servers (or workers) while solving the distributed learning problem is an important challenge and is the focus of this paper. In particular, we propose a fast and communication-efficient decentralized framework to solve the distributed machine learning (DML) problem. The proposed algorithm, Group Alternating Direction Method of Multipliers (GADMM), is based on the Alternating Direction Method of Multipliers (ADMM) framework. The key novelty in GADMM is that it solves the problem in a decentralized topology where at most half of the workers are competing for the limited communication resources at any given time. Moreover, each worker exchanges the locally trained model only with two neighboring workers, thereby training a global model with a lower amount of communication overhead in each exchange. We prove that GADMM converges to the optimal solution for convex loss functions, and numerically show that it converges faster and is more communication-efficient than state-of-the-art communication-efficient algorithms such as the Lazily Aggregated Gradient (LAG) and dual averaging, in linear and logistic regression tasks on synthetic and real datasets. Furthermore, we propose Dynamic GADMM (D-GADMM), a variant of GADMM, and prove its convergence under the time-varying network topology of the workers.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 48
    Publication Date: 2020
    Description: We consider a setting where multiple players sequentially choose among a common set of actions (arms). Motivated by an application to cognitive radio networks, we assume that players incur a loss upon colliding, and that communication between players is not possible. Existing approaches assume that the system is stationary, yet this assumption is often violated in practice, e.g., due to signal strength fluctuations. In this work, we design the first multi-player bandit algorithm that provably works in arbitrarily changing environments, where the losses of the arms may even be chosen by an adversary. This resolves an open problem posed by Rosenski et al. (2016).
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 49
    Publication Date: 2020
    Description: An important problem in the field of Topological Data Analysis is defining topological summaries which can be combined with traditional data analytic tools. In recent work, Bubenik introduced the persistence landscape, a stable representation of persistence diagrams amenable to statistical analysis and machine learning tools. In this paper we generalise the persistence landscape to multiparameter persistence modules, providing a stable representation of the rank invariant. We show that multiparameter landscapes are stable with respect to the interleaving distance and persistence weighted Wasserstein distance, and that the collection of multiparameter landscapes faithfully represents the rank invariant. Finally, we provide example calculations and statistical tests to demonstrate a range of potential applications and how one can interpret the landscapes associated to a multiparameter module.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 50
    Publication Date: 2020
    Description: We present a theoretical analysis framework for relational ensemble models. We show that ensembles of collective classifiers can improve predictions for graph data by reducing errors due to variance in both learning and inference. In addition, we propose a relational ensemble framework that combines a relational ensemble learning approach with a relational ensemble inference approach for collective classification. The proposed ensemble techniques are applicable for both single and multiple graph settings. Experiments on both synthetic and real-world data demonstrate the effectiveness of the proposed framework. Finally, our experimental results support the theoretical analysis and confirm that ensemble algorithms that explicitly focus on both learning and inference processes, and aim at reducing errors associated with both, are the best performers.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 51
    Publication Date: 2020
    Description: We consider the problem of unveiling the implicit network structure of node interactions (such as user interactions in a social network), based only on high-frequency timestamps. Our inference is based on the minimization of the least-squares loss associated with a multivariate Hawkes model, penalized by the $\ell_1$ and trace norms of the interaction tensor. We provide a first theoretical analysis of this problem that includes sparsity and low-rank inducing penalizations. This result involves a new data-driven concentration inequality for matrix martingales in continuous time with observable variance, which is of independent interest and has a broad range of possible applications, since it extends to matrix martingales earlier results restricted to the scalar case. A consequence of our analysis is the construction of sharply tuned $\ell_1$ and trace-norm penalizations, which leads to a data-driven scaling of the variability of the information available for each user. Numerical experiments illustrate the significant improvements achieved by the use of such data-driven penalizations.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 52
    Publication Date: 2020
    Description: We study contextual bandit learning with an abstract policy class and continuous action space. We obtain two qualitatively different regret bounds: one competes with a smoothed version of the policy class under no continuity assumptions, while the other requires standard Lipschitz assumptions. Both bounds exhibit data-dependent “zooming” behavior and, with no tuning, yield improved guarantees for benign problems. We also study adapting to unknown smoothness parameters, establishing a price-of-adaptivity and deriving optimal adaptive algorithms that require no additional information.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 53
    Publication Date: 2020
    Description: The Minimum Error Entropy (MEE) principle is an important approach in Information Theoretic Learning (ITL). It is widely applied and studied in various fields for its robustness to noise. In this paper, we study a reproducing kernel-based distributed MEE algorithm, DMEE, which is designed to work with both fully supervised data and semi-supervised data. The divide-and-conquer approach is employed, so there is no inter-node communication overhead. As with other distributed algorithms, DMEE significantly reduces the computational complexity and memory requirement on single computing nodes. With fully supervised data, the learning rates we prove match the minimax-optimal learning rates of classical pointwise kernel-based regression. Under semi-supervised learning scenarios, we show that DMEE exploits unlabeled data effectively, in the following sense. First, in settings with weak regularity assumptions, additional unlabeled data significantly improves the learning rates of DMEE. Second, with sufficient unlabeled data, labeled data can be distributed to many more computing nodes, so that each node holds only O(1) labels, without spoiling the learning rates in terms of the number of labels. This conclusion overcomes the saturation phenomenon with respect to the unlabeled data size. It parallels a recent result for regularized least squares (Lin and Zhou, 2018) and suggests that an inflation of unlabeled data is a remedy for MEE learning problems with decentralized data sources when privacy protection is a concern. Our work concerns pairwise learning with a non-convex loss. The theoretical analysis is achieved by distributed U-statistics and error decomposition techniques in integral operators.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 54
    Publication Date: 2020
    Description: We consider worker skill estimation for the single-coin Dawid-Skene crowdsourcing model. In practice, skill estimation is challenging because worker assignments are sparse and irregular due to the arbitrary and uncontrolled availability of workers. We formulate skill estimation as a rank-one correlation-matrix completion problem, where the observed components correspond to observed label correlations between workers. We show that the correlation matrix can be successfully recovered and skills are identifiable if and only if the sampling matrix (observed components) does not have a bipartite connected component. We then propose a projected gradient descent scheme and show that skill estimates converge to the desired global optima for such sampling matrices. Our proof is original and the results are surprising in light of the fact that even the weighted rank-one matrix factorization problem is NP-hard in general. Next, we derive sample complexity bounds in terms of spectral properties of the signless Laplacian of the sampling matrix. Our proposed scheme achieves state-of-the-art performance on a number of real-world datasets. An illustrative toy implementation of the rank-one completion view follows this record.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
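    The following toy sketch illustrates the rank-one correlation-completion view described above: observed pairwise label correlations are fit by a rank-one model via projected gradient descent. It is an illustrative implementation with arbitrary step-size and iteration choices, not the paper's exact scheme.

```python
import numpy as np

def estimate_skills(C, mask, lr=0.02, n_iter=5000, seed=0):
    """Fit a skill vector s so that s_i * s_j matches the observed correlations
    C[i, j] on the entries flagged by mask, projecting onto [-1, 1] each step."""
    rng = np.random.default_rng(seed)
    s = rng.uniform(0.2, 0.8, size=C.shape[0])
    for _ in range(n_iter):
        R = mask * (np.outer(s, s) - C)          # residuals on observed pairs only
        s = np.clip(s - lr * 4.0 * (R @ s), -1.0, 1.0)
    return s

# Toy usage: 8 workers, roughly 60% of worker pairs observed.
rng = np.random.default_rng(2)
s_true = rng.uniform(0.3, 0.9, size=8)
mask = (rng.random((8, 8)) < 0.6).astype(float)
mask = np.maximum(mask, mask.T)                   # symmetric sampling pattern
np.fill_diagonal(mask, 0.0)
C = mask * np.outer(s_true, s_true)
print(np.round(estimate_skills(C, mask) / s_true, 2))  # ratios close to 1 (up to a global sign)
```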
  • 55
    Publication Date: 2020
    Description: We derive generalization and excess risk bounds for neural networks using a family of complexity measures based on a multilevel relative entropy. The bounds are obtained by introducing the notion of generated hierarchical coverings of neural networks and by using the technique of chaining mutual information introduced by Asadi et al. '18. The resulting bounds are algorithm-dependent and multiscale: they exploit the multilevel structure of neural networks. This, in turn, leads to an empirical risk minimization problem with a multilevel entropic regularization. The minimization problem is resolved by introducing a multiscale extension of the celebrated Gibbs posterior distribution, proving that the derived distribution achieves the unique minimum. This leads to a new training procedure for neural networks with performance guarantees, which exploits the chain rule of relative entropy rather than the chain rule of derivatives (as in backpropagation), and which takes into account the interactions between different scales of the hypothesis sets of neural networks corresponding to different depths of the hidden layers. To obtain an efficient implementation of the latter, we further develop a multilevel Metropolis algorithm simulating the multiscale Gibbs distribution, with an experiment for a two-layer neural network on the MNIST data set.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 56
    Publication Date: 2020
    Description: Apache Mahout is a library for scalable machine learning (ML) on distributed dataflow systems, offering various implementations of classification, clustering, dimensionality reduction and recommendation algorithms. Mahout was a pioneer in large-scale machine learning in 2008, when it started and targeted MapReduce, which was the predominant abstraction for scalable computing in industry at that time. Mahout has been widely used by leading web companies and is part of several commercial cloud offerings. In recent years, Mahout migrated to a general framework enabling a mix of dataflow programming and linear algebraic computations on backends such as Apache Spark and Apache Flink. This design allows users to execute data preprocessing and model training in a single, unified dataflow system, instead of requiring a complex integration of several specialized systems. Mahout is maintained as a community-driven open source project at the Apache Software Foundation, and is available under https://mahout.apache.org.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 57
    Publication Date: 2020
    Description: Closed surfaces provide a useful model for $3$-d shapes, with the data typically consisting of a cloud of points in $\mathbb{R}^3$. The existing literature on closed surface modeling focuses on frequentist point estimation methods that join surface patches along the edges, with surface patches created via Bézier surfaces or tensor products of B-splines. However, the resulting surfaces are not smooth along the edges and the geometric constraints required to join the surface patches lead to computational drawbacks. In this article, we develop a Bayesian model for closed surfaces based on tensor products of a cyclic basis resulting in infinitely smooth surface realizations. We impose sparsity on the control points through a double-shrinkage prior. Theoretical properties of the support of our proposed prior are studied and it is shown that the posterior achieves the optimal rate of convergence under reasonable assumptions on the prior. The proposed approach is illustrated with some examples.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 58
    Publication Date: 2020
    Description: Kernel techniques are among the most widely applied and influential tools in machine learning, with applications in virtually all areas of the field. To combine this expressive power with computational efficiency, numerous randomized schemes have been proposed in the literature, among which random Fourier features (RFF) are probably the simplest and most popular. While RFFs were originally designed for the approximation of kernel values, recently they have been adapted to kernel derivatives, and hence to the solution of large-scale tasks involving function derivatives. Unfortunately, the understanding of the RFF scheme for the approximation of higher-order kernel derivatives is quite limited due to the challenging, polynomially growing nature of the underlying function class in the empirical process. To tackle this difficulty, we establish a finite-sample deviation bound for a general class of polynomial-growth functions under an $\alpha$-exponential Orlicz condition on the distribution of the sample. Instantiating this result for RFFs, our finite-sample uniform guarantee implies almost sure convergence with a tight rate for arbitrary kernels with $\alpha$-exponential Orlicz spectrum and any order of derivative. A minimal sketch of the classical RFF construction follows this record.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
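    For reference, the following is a minimal sketch of the classical random Fourier feature construction for the Gaussian kernel, the kernel-value case on which the derivative approximations discussed above build; the bandwidth and feature count are arbitrary illustrative choices.

```python
import numpy as np

def rff_features(X, n_features=2000, sigma=1.0, seed=0):
    """Random Fourier features z(x) = sqrt(2/D) cos(W x + b) approximating the
    Gaussian kernel exp(-||x - y||^2 / (2 sigma^2)) via z(x)^T z(y)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=1.0 / sigma, size=(d, n_features))  # draws from the kernel's spectral density
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

X = np.random.randn(100, 3)
Z = rff_features(X)
K_approx = Z @ Z.T
K_exact = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1) / 2.0)
print(np.abs(K_approx - K_exact).max())   # small approximation error
```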
  • 59
    Publication Date: 2020
    Description: In spite of its urgent importance in the era of big data, testing high-dimensional parameters in generalized linear models (GLMs) in the presence of high-dimensional nuisance parameters has been largely under-studied, especially with regard to constructing powerful tests for general (and unknown) alternatives. Most existing tests are powerful only against certain alternatives and may yield incorrect Type 1 error rates under high-dimensional nuisance parameter situations. In this paper, we propose the adaptive interaction sum of powered score (aiSPU) test in the framework of penalized regression with a non-convex penalty, the truncated Lasso penalty (TLP), which can maintain correct Type 1 error rates while yielding high statistical power across a wide range of alternatives. To calculate its p-values analytically, we derive its asymptotic null distribution. Simulations demonstrate its superior finite-sample performance over several representative existing methods. In addition, we apply it and other representative tests to an Alzheimer's Disease Neuroimaging Initiative (ADNI) data set, detecting possible gene-gender interactions for Alzheimer's disease. An R package, “aispu”, implementing the proposed test is available on GitHub.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 60
    Publication Date: 2020
    Description: metric-learn is an open source Python package implementing supervised and weakly-supervised distance metric learning algorithms. As part of scikit-learn-contrib, it provides a unified interface compatible with scikit-learn, which makes it easy to perform cross-validation, model selection, and pipelining with other machine learning estimators. metric-learn is thoroughly tested and available on PyPI under the MIT license. A brief usage sketch follows this record.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
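    A short usage sketch of the scikit-learn-compatible interface described above, here with the package's NCA learner inside a scikit-learn pipeline; constructor arguments are illustrative and may differ slightly between releases.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from metric_learn import NCA

X, y = load_iris(return_X_y=True)
# Learn a Mahalanobis-type metric with NCA, then classify in the learned space.
pipe = make_pipeline(NCA(max_iter=100), KNeighborsClassifier(n_neighbors=3))
print(cross_val_score(pipe, X, y, cv=5).mean())
```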
  • 61
    Publication Date: 2020
    Description: In this paper we adopt the familiar sparse, high-dimensional linear regression model and focus on the important but often overlooked task of prediction. In particular, we consider a new empirical Bayes framework that incorporates data in the prior in two ways: one is to center the prior for the non-zero regression coefficients and the other is to provide some additional regularization. We show that, in certain settings, the asymptotic concentration of the proposed empirical Bayes posterior predictive distribution is very fast, and we establish a Bernstein–von Mises theorem which ensures that the derived empirical Bayes prediction intervals achieve the targeted frequentist coverage probability. The empirical prior has a convenient conjugate form, so posterior computations are relatively simple and fast. Finally, our numerical results demonstrate the proposed method's strong finite-sample performance in terms of prediction accuracy, uncertainty quantification, and computation time compared to existing Bayesian methods.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 62
    Publication Date: 2020
    Description: How can we effectively exploit the collected samples when solving a continuous control task with Reinforcement Learning? Recent results have empirically demonstrated that multiple policy optimization steps can be performed with the same batch by using off-distribution techniques based on importance sampling. However, when dealing with off-distribution optimization, it is essential to take into account the uncertainty introduced by the importance sampling process. In this paper, we propose and analyze a class of model-free, policy search algorithms that extend the recent Policy Optimization via Importance Sampling (Metelli et al., 2018) by incorporating two advanced variance reduction techniques: per-decision and multiple importance sampling. For both of them, we derive a high-probability bound, of independent interest, and then we show how to employ it to define a suitable surrogate objective function that can be used for both action-based and parameter-based settings. The resulting algorithms are finally evaluated on a set of continuous control tasks, using both linear and deep policies, and compared with modern policy optimization methods.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 63
    Publication Date: 2020
    Description: We introduce the Gluon Time Series Toolkit (GluonTS), a Python library for deep-learning-based time series modeling for ubiquitous tasks, such as forecasting and anomaly detection. GluonTS simplifies the time series modeling pipeline by providing the necessary components and tools for quick model development, efficient experimentation and evaluation. In addition, it contains reference implementations of state-of-the-art time series models that enable simple benchmarking of new algorithms. A brief, version-dependent usage sketch follows this record.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
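    A hedged usage sketch of a typical GluonTS forecasting workflow. The module paths and estimator below follow early GluonTS releases (DeepAR with the MXNet trainer) and are assumptions that may not match later versions of the library.

```python
from gluonts.dataset.repository.datasets import get_dataset
from gluonts.evaluation.backtest import make_evaluation_predictions
from gluonts.model.deepar import DeepAREstimator
from gluonts.mx.trainer import Trainer   # located at gluonts.trainer in some releases

dataset = get_dataset("m4_hourly")       # a built-in benchmark dataset
estimator = DeepAREstimator(
    freq=dataset.metadata.freq,
    prediction_length=dataset.metadata.prediction_length,
    trainer=Trainer(epochs=5),
)
predictor = estimator.train(dataset.train)
# Iterators over sample forecasts and the corresponding ground-truth series.
forecast_it, ts_it = make_evaluation_predictions(dataset.test, predictor=predictor, num_samples=100)
```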
  • 64
    Publication Date: 2020
    Description: Standard approaches for Bayesian inference focus solely on approximating the posterior distribution. Typically, this approximation is, in turn, used to calculate expectations for one or more target functions—a computational pipeline that is inefficient when the target function(s) are known upfront. We address this inefficiency by introducing a framework for target–aware Bayesian inference (TABI) that estimates these expectations directly. While conventional Monte Carlo estimators have a fundamental limit on the error they can achieve for a given sample size, our TABI framework is able to breach this limit; it can theoretically produce arbitrarily accurate estimators using only three samples, while we show empirically that it can also breach this limit in practice. We utilize our TABI framework by combining it with adaptive importance sampling approaches and show both theoretically and empirically that the resulting estimators are capable of converging faster than the standard $\mathcal{O}(1/N)$ Monte Carlo rate, potentially producing rates as fast as $\mathcal{O}(1/N^2)$. We further combine our TABI framework with amortized inference methods, to produce a method for amortizing the cost of calculating expectations. Finally, we show how TABI can be used to convert any marginal likelihood estimator into a target aware inference scheme and demonstrate the substantial benefits this can yield.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 65
    Publication Date: 2020
    Description: Gradient Boosting Decision Trees (GBDTs) and Random Forests (RFs) have been used in many real-world applications. They are often a standard recipe for building state-of-the-art solutions to machine learning and data mining problems. However, training and prediction are computationally very expensive for large, high-dimensional problems. This article presents an efficient and open source software toolkit called ThunderGBM which exploits high-performance Graphics Processing Units (GPUs) for GBDTs and RFs. ThunderGBM supports classification, regression and ranking, and can run on single or multiple GPUs of a machine. Our experimental results show that ThunderGBM outperforms the existing libraries while producing similar models, and can handle high-dimensional problems where existing GPU-based libraries fail. Documentation, examples, and more details about ThunderGBM are available at https://github.com/xtra-computing/thundergbm. A brief usage sketch follows this record.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
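    A short usage sketch of the toolkit's scikit-learn-style Python wrapper described above; the class name follows the project's documentation, and any constructor parameters beyond the defaults should be checked against the installed version.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from thundergbm import TGBMClassifier   # GPU-accelerated GBDT; a regressor wrapper is also provided

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = TGBMClassifier()                   # tree depth, number of trees, etc. are configurable
clf.fit(X_tr, y_tr)
print((clf.predict(X_te) == y_te).mean())
```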
  • 66
    Publication Date: 2020
    Description: This paper considers hidden Markov models where the observations are given as the sum of a latent state, which lies in a general state space, and some independent noise with unknown distribution. It is shown that these fully nonparametric translation models are identifiable with respect to both the distribution of the latent variables and the distribution of the noise, under little more than a light-tail assumption on the latent variables. Two nonparametric estimation methods are proposed, and we prove that the corresponding estimators are consistent with respect to the weak convergence topology. These results are illustrated with numerical experiments.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 67
    Publication Date: 2020
    Description: We consider the problem of estimating the location of a single change point in a network generated by a dynamic stochastic block model mechanism. This model produces community structure in the network that exhibits change at a single time epoch. We propose two methods of estimating the change point, together with the model parameters, before and after its occurrence. The first employs a least-squares criterion function that takes into consideration the full structure of the stochastic block model and is evaluated at each point in time. Hence, as an intermediate step, it requires estimating the community structure based on a clustering algorithm at every time point. The second method comprises the following two steps: in the first one, a least-squares function is used and evaluated at each time point, but ignoring the community structure and considering only a random graph generating mechanism exhibiting a change point. Once the change point is identified, in the second step, all network data before and after it are used together with a clustering algorithm to obtain the corresponding community structures and subsequently estimate the generating stochastic block model parameters. The first method, since it requires knowledge of the community structure and hence clustering at every point in time, is significantly more computationally expensive than the second one. On the other hand, it requires a significantly less stringent identifiability condition for consistent estimation of the change point and the model parameters than the second method; however, it also requires a condition on the misclassification rate of misallocating network nodes to their respective communities that may fail to hold in many realistic settings. Despite the apparent stringency of the identifiability condition for the second method, we show that networks generated by a stochastic block mechanism exhibiting a change in their structure can easily satisfy this condition under a multitude of scenarios, including merging/splitting communities, nodes joining another community, etc. Further, for both methods under their respective identifiability and certain additional regularity conditions, we establish rates of convergence and derive the asymptotic distributions of the change point estimators. The results are illustrated on synthetic data. In summary, this work provides an in-depth investigation of the novel problem of change point analysis for networks generated by stochastic block models, identifies key conditions for the consistent estimation of the change point, and proposes a computationally fast algorithm that solves the problem in many settings that occur in applications. Finally, it discusses challenges posed by employing clustering algorithms in this problem, which require additional investigation for their full resolution.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 68
    Publication Date: 2020
    Description: This paper considers stochastic optimization problems for a large class of objective functions, including convex and continuous submodular. Stochastic proximal gradient methods have been widely used to solve such problems; however, their applicability remains limited when the problem dimension is large and the projection onto a convex set is computationally costly. Instead, stochastic conditional gradient algorithms are proposed as an alternative solution, which relies on (i) approximating gradients via a simple averaging technique requiring a single stochastic gradient evaluation per iteration, and (ii) solving a linear program to compute the descent/ascent direction. The gradient averaging technique reduces the noise of gradient approximations as time progresses, and replacing the projection step of proximal methods by a linear program lowers the computational complexity of each iteration. We show that under convexity and smoothness assumptions, our proposed stochastic conditional gradient method converges to the optimal objective function value at a sublinear rate of $\mathcal{O}(1/t^{1/3})$. Further, for a monotone and continuous DR-submodular function and subject to a general convex body constraint, we prove that our proposed method achieves a $((1-1/e)\text{OPT} -\epsilon)$ guarantee (in expectation) with $\mathcal{O}{(1/\epsilon^3)}$ stochastic gradient computations. This guarantee matches the known hardness results and closes the gap between deterministic and stochastic continuous submodular maximization. Additionally, we achieve a $((1/e)\text{OPT} -\epsilon)$ guarantee after operating on $\mathcal{O}{(1/\epsilon^3)}$ stochastic gradients for the case where the objective function is continuous DR-submodular but non-monotone and the constraint set is a down-closed convex body. By using stochastic continuous optimization as an interface, we also provide the first $(1-1/e)$ tight approximation guarantee for maximizing a monotone but stochastic submodular set function subject to a general matroid constraint and a $(1/e)$ approximation guarantee for the non-monotone case. A minimal sketch of the averaged stochastic conditional gradient step follows this record.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
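    The following is a minimal sketch of the averaged stochastic conditional gradient (Frank-Wolfe) step described above, for minimization over the probability simplex, where the linear program reduces to selecting a vertex. The averaging and step-size schedules are illustrative, not the paper's exact choices.

```python
import numpy as np

def stochastic_frank_wolfe(stoch_grad, x0, n_iter=500, seed=0):
    """Averaged stochastic conditional gradient over the probability simplex:
    maintain a running average d of noisy gradients, call the linear oracle,
    and move toward the chosen vertex."""
    rng = np.random.default_rng(seed)
    x, d = x0.copy(), np.zeros_like(x0)
    for t in range(1, n_iter + 1):
        rho, gamma = t ** (-2.0 / 3.0), 1.0 / t      # illustrative schedules
        d = (1.0 - rho) * d + rho * stoch_grad(x, rng)
        v = np.zeros_like(x)
        v[np.argmin(d)] = 1.0                        # linear oracle over the simplex
        x = x + gamma * (v - x)                      # convex combination keeps x feasible
    return x

# Toy usage: minimize ||x - a||^2 / 2 over the simplex from noisy gradients.
a = np.array([0.1, 0.7, 0.2])
stoch_grad = lambda x, rng: (x - a) + 0.1 * rng.normal(size=x.shape)
print(stochastic_frank_wolfe(stoch_grad, np.ones(3) / 3))   # approaches a
```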
  • 69
    Publication Date: 2020
    Description: We introduce a new convex optimization problem, termed quadratic decomposable submodular function minimization (QDSFM), which allows one to model a number of learning tasks on graphs and hypergraphs. The problem exhibits close ties to decomposable submodular function minimization (DSFM) yet is much more challenging to solve. We approach the problem via a new dual strategy and formulate an objective that can be optimized through a number of double-loop algorithms. The outer loop uses either random coordinate descent (RCD) or alternating projection (AP) methods, for both of which we prove linear convergence rates. The inner loop computes projections onto cones generated by base polytopes of the submodular functions via the modified min-norm-point or Frank-Wolfe algorithms. We also describe two new applications of QDSFM: hypergraph-adapted PageRank and semi-supervised learning. The proposed hypergraph-based PageRank algorithm can be used for local hypergraph partitioning and comes with provable performance guarantees. For hypergraph-adapted semi-supervised learning, we provide numerical experiments demonstrating the efficiency of our QDSFM solvers and their significant improvements in prediction accuracy when compared to state-of-the-art methods.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 70
    Publication Date: 2020
    Description: A factor-augmented vector autoregressive (FAVAR) model is defined by a VAR equation that captures lead-lag correlations amongst a set of observed variables $X$ and latent factors $F$, and a calibration equation that relates another set of observed variables $Y$ with $F$ and $X$. The latter equation is used to estimate the factors that are subsequently used in estimating the parameters of the VAR system. The FAVAR model has become popular in applied economic research, since it can summarize a large number of variables of interest as a few factors through the calibration equation and subsequently examine their influence on core variables of primary interest through the VAR equation. However, there is increasing need for examining lead-lag relationships between a large number of time series, while incorporating information from another high-dimensional set of variables. Hence, in this paper we investigate the FAVAR model under high-dimensional scaling. We introduce an appropriate identification constraint for the model parameters, which when incorporated into the formulated optimization problem yields estimates with good statistical properties. Further, we address a number of technical challenges introduced by the fact that estimates of the VAR system model parameters are based on estimated rather than directly observed quantities. The performance of the proposed estimators is evaluated on synthetic data. Further, the model is applied to commodity prices and reveals interesting and interpretable relationships between the prices and the factors extracted from a set of global macroeconomic indicators.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 71
    Publication Date: 2020
    Description: High-dimensional piecewise-stationary graphical models represent a versatile class for modelling time-varying networks arising in diverse application areas, including biology, economics, and social sciences. There has been recent work in offline detection and estimation of regime changes in the topology of sparse graphical models. However, the online setting remains largely unexplored, despite its high relevance to applications in sensor networks and other engineering monitoring systems, as well as financial markets. To that end, this work introduces a novel scalable online algorithm for detecting an unknown number of abrupt changes in the inverse covariance matrix of sparse Gaussian graphical models with small delay. The proposed algorithm is based upon monitoring the conditional log-likelihood of all nodes in the network and can be extended to a large class of continuous and discrete graphical models. We also investigate asymptotic properties of our procedure under certain mild regularity conditions on the graph size, sparsity level, number of samples, and the pre- and post-change topologies of the network. Numerical experiments on both synthetic and real data illustrate the good performance of the proposed methodology, both in terms of computational and statistical efficiency, across numerous experimental settings.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 72
    Publication Date: 2020
    Description: Deep generative models have been praised for their ability to learn smooth latent representations of images, text, and audio, which can then be used to generate new, plausible data. Motivated by these success stories, there has been a surge of interest in developing deep generative models for automated molecule design. However, these models face several difficulties due to the unique characteristics of molecular graphs—their underlying structure is not Euclidean or grid-like, they remain isomorphic under permutation of the nodes’ labels, and they come with varying numbers of nodes and edges. In this paper, we first propose a novel variational autoencoder for molecular graphs, whose encoder and decoder are specially designed to account for the above properties by means of several technical innovations. Moreover, in contrast with the state of the art, our decoder is able to provide the spatial coordinates of the atoms of the molecules it generates. Then, we develop a gradient-based algorithm to optimize the decoder of our model so that it learns to generate molecules that maximize the value of a certain property of interest and, given any arbitrary molecule, it is able to optimize the spatial configuration of its atoms for greater stability. Experiments reveal that our variational autoencoder can discover plausible, diverse and novel molecules more effectively than several state-of-the-art models. Moreover, for several properties of interest, our optimized decoder is able to identify molecules with property values 121% higher than those identified by several state-of-the-art methods based on Bayesian optimization and reinforcement learning.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 73
    Publication Date: 2020
    Description: It is commonplace to encounter heterogeneous or nonstationary data, whose underlying generating process changes across domains or over time. Such a distribution shift feature presents both challenges and opportunities for causal discovery. In this paper, we develop a framework for causal discovery from such data, called Constraint-based causal Discovery from heterogeneous/NOnstationary Data (CD-NOD), to find the causal skeleton and directions and to estimate the properties of mechanism changes. First, we propose an enhanced constraint-based procedure to detect variables whose local mechanisms change and recover the skeleton of the causal structure over observed variables. Second, we present a method to determine causal orientations by making use of independent changes in the data distribution implied by the underlying causal model, benefiting from information carried by changing distributions. After learning the causal structure, we next investigate how to efficiently estimate the “driving force” of the nonstationarity of a causal mechanism. That is, we aim to extract from data a low-dimensional representation of changes. The proposed methods are nonparametric, with no hard restrictions on data distributions and causal mechanisms, and do not rely on window segmentation. Furthermore, we find that data heterogeneity benefits causal structure identification even in the presence of particular types of confounders. Finally, we show the connection between heterogeneity/nonstationarity and soft intervention in causal discovery. Experimental results on various synthetic and real-world data sets (task-fMRI and stock market data) are presented to demonstrate the efficacy of the proposed methods.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 74
    Publication Date: 2020
    Description: This paper studies distributed estimation and support recovery for the high-dimensional linear regression model with heavy-tailed noise. To deal with heavy-tailed noise whose variance can be infinite, we adopt the quantile regression loss function instead of the commonly used squared loss. However, the non-smooth quantile loss poses new challenges to high-dimensional distributed estimation in both computation and theoretical development. To address the challenge, we transform the response variable and establish a new connection between quantile regression and ordinary linear regression. Then, we provide a distributed estimator that is efficient in both computation and communication, where only the gradient information is communicated at each iteration. Theoretically, we show that, after a constant number of iterations, the proposed estimator achieves a near-oracle convergence rate without any restriction on the number of machines. Moreover, we establish the theoretical guarantee for support recovery. A simulation analysis is provided to demonstrate the effectiveness of our method.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 75
    Publication Date: 2020
    Description: We present apricot, an open source Python package for selecting representative subsets from large data sets using submodular optimization. The package implements several efficient greedy selection algorithms that offer strong theoretical guarantees on the quality of the selected set. Additionally, several submodular set functions are implemented, including facility location, which is broadly applicable but requires memory quadratic in the number of examples in the data set, and a feature-based function that is less broadly applicable but can scale to millions of examples. apricot is extremely efficient, using algorithmic speedups such as the lazy greedy algorithm and memoization, as well as code optimization with numba. We demonstrate the use of subset selection by training machine learning models to comparable accuracy using either the full data set or a representative subset thereof. This paper presents an explanation of submodular selection, an overview of the features in apricot, and applications to two data sets. A brief usage sketch follows this record.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
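    A brief usage sketch of subset selection with the package described above, using the facility location selector; the class name follows the apricot documentation, and argument defaults may vary by version.

```python
import numpy as np
from apricot import FacilityLocationSelection

X = np.random.randn(5000, 20)                  # a larger data set
selector = FacilityLocationSelection(250)      # pick 250 representative examples greedily
X_subset = selector.fit_transform(X)
print(X_subset.shape)                          # (250, 20)
```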
  • 76
    Publication Date: 2020
    Description: We consider the problem of decomposing a higher-order tensor with binary entries. Such data problems arise frequently in applications such as neuroimaging, recommendation systems, topic modeling, and sensor network localization. We propose a multilinear Bernoulli model, develop a rank-constrained likelihood-based estimation method, and obtain theoretical accuracy guarantees. In contrast to continuous-valued problems, the binary tensor problem exhibits an interesting phase transition phenomenon according to the signal-to-noise ratio. The error bound for the parameter tensor estimation is established, and we show that the obtained rate is minimax optimal under the considered model. Furthermore, we develop an alternating optimization algorithm with convergence guarantees. The efficacy of our approach is demonstrated through both simulations and analyses of multiple data sets on the tasks of tensor completion and clustering.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 77
    Publication Date: 2020
    Description: Many optimization problems can be cast into the maximum satisfiability (MAX-SAT) form, and many solvers have been developed for tackling such problems. To evaluate a MAX-SAT solver, it is convenient to generate hard MAX-SAT instances with known solutions. Here, we propose a method of generating weighted MAX-2-SAT instances inspired by the frustrated-loop algorithm used by the quantum annealing community. We extend the algorithm to instances of general bipartite couplings, with the associated optimization problem being the minimization of the restricted Boltzmann machine (RBM) energy over the nodal values, which is useful for effectively pre-training the RBM. The hardness of the generated instances can be tuned through a central parameter known as the frustration index. Two versions of the algorithm are presented: the random- and structured-loop algorithms. For the random-loop algorithm, we provide a thorough theoretical and empirical analysis of its mathematical properties from the perspective of frustration, and empirically observe a double phase transition in the hardness scaling driven by the frustration index. For the structured-loop algorithm, we show that it offers an improvement in hardness over the random-loop algorithm in the regime of high loop density, with the variation of hardness tunable through the concentration of frustrated weights.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 78
    Publication Date: 2020
    Description: We study the problem of estimation and testing in logistic regression with class-conditional noise in the observed labels, which has important implications for the Positive-Unlabeled (PU) learning setting. With the key observation that the label noise problem belongs to a special sub-class of generalized linear models (GLM), we discuss convex and non-convex approaches that address this problem. A non-convex approach based on maximum likelihood estimation produces an estimator with several optimal properties, but a convex approach has an obvious advantage in optimization. We demonstrate that in the low-dimensional setting, both estimators are consistent and asymptotically normal, where the asymptotic variance of the non-convex estimator is smaller than that of its convex counterpart. We also quantify the efficiency gap, which provides insight into when the two methods are comparable. In the high-dimensional setting, we show that both estimation procedures achieve $\ell_2$-consistency at the minimax optimal $\sqrt{s\log p/n}$ rates under mild conditions. Finally, we propose an inference procedure using a de-biasing approach. We validate our theoretical findings through simulations and a real-data example.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 79
    Publication Date: 2020
    Description: Off-policy evaluation (OPE) in reinforcement learning allows one to evaluate novel decision policies without needing to conduct exploration, which is often costly or otherwise infeasible. We consider for the first time the semiparametric efficiency limits of OPE in Markov decision processes (MDPs), where actions, rewards, and states are memoryless. We show existing OPE estimators may fail to be efficient in this setting. We develop a new estimator based on cross-fold estimation of $q$-functions and marginalized density ratios, which we term double reinforcement learning (DRL). We show that DRL is efficient when both components are estimated at fourth-root rates and is also doubly robust when only one component is consistent. We investigate these properties empirically and demonstrate the performance benefits due to harnessing memorylessness.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 80
    Publication Date: 2020
    Description: This paper considers the fundamental problem of learning a complete (orthogonal) dictionary from samples of sparsely generated signals. Most existing methods solve for the dictionary (and sparse representations) using heuristic algorithms, usually without theoretical guarantees for either optimality or complexity. The recent $\ell^1$-minimization based methods do provide such guarantees, but the associated algorithms recover the dictionary one column at a time. In this work, we propose a new formulation that maximizes the $\ell^4$-norm over the orthogonal group, to learn the entire dictionary. We prove that under a random data model, with nearly minimum sample complexity, the global optima of the $\ell^4$-norm are very close to signed permutations of the ground truth. Inspired by this observation, we give a conceptually simple and yet effective algorithm based on matching, stretching, and projection (MSP). The algorithm provably converges locally, and the cost per iteration is merely an SVD. In addition to strong theoretical guarantees, experiments show that the new algorithm is significantly more efficient and effective than existing methods, including K-SVD and $\ell^1$-based methods. Preliminary experimental results on mixed real imagery data clearly demonstrate advantages of the dictionary so learned over classic PCA bases. A hedged sketch of the MSP iteration follows this record.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
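    A hedged sketch of the matching, stretching, and projection (MSP) idea described above: repeatedly take the gradient direction of the $\ell^4$ objective and project back onto the orthogonal group via an SVD. The initialization, iteration count, and toy data model are illustrative assumptions.

```python
import numpy as np

def msp(Y, n_iter=50, seed=0):
    """Maximize ||A Y||_4^4 over orthogonal A: form the gradient (the 'matching
    and stretching' step) and project onto the orthogonal group with an SVD."""
    rng = np.random.default_rng(seed)
    n = Y.shape[0]
    A, _ = np.linalg.qr(rng.normal(size=(n, n)))   # random orthogonal start
    for _ in range(n_iter):
        G = (A @ Y) ** 3 @ Y.T                     # gradient direction of the l4 objective
        U, _, Vt = np.linalg.svd(G)
        A = U @ Vt                                 # nearest orthogonal matrix (polar factor)
    return A

# Toy usage: Y = D X with an orthogonal dictionary D and sparse coefficients X.
rng = np.random.default_rng(1)
n, p = 10, 2000
D, _ = np.linalg.qr(rng.normal(size=(n, n)))
X = rng.normal(size=(n, p)) * (rng.random((n, p)) < 0.3)
A = msp(D @ X)
print(np.round(np.abs(A @ D), 1))   # approximately a permutation matrix (up to signs)
```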
  • 81
    Publication Date: 2020
    Description: Structure learning of Bayesian networks has always been a challenging problem. Nowadays, massive networks with thousands of nodes or more, but fewer samples, frequently appear in many areas. We develop a divide-and-conquer framework, called partition-estimation-fusion (PEF), for structure learning of such big networks. The proposed method first partitions nodes into clusters, then learns a subgraph on each cluster of nodes, and finally fuses all learned subgraphs into one Bayesian network. The PEF method is designed in a flexible way so that any structure learning method may be used in the second step to learn a subgraph structure as either a DAG or a CPDAG. In the clustering step, we adapt hierarchical clustering to automatically choose a proper number of clusters. In the fusion step, we propose a novel hybrid method that sequentially adds edges between subgraphs. Extensive numerical experiments demonstrate the competitive performance of our PEF method, in terms of both speed and accuracy, compared to existing methods. Our method can improve the accuracy of structure learning by 20% or more, while reducing running time by up to two orders of magnitude.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 82
    Publication Date: 2020
    Description: Precision medicine is an emerging medical approach that allows physicians to select treatment options based on individual patient information. The goal of precision medicine is to identify the optimal treatment regime (OTR) that yields the most favorable clinical outcome. Prior to adopting any OTR in clinical practice, it is crucial to know the impact of implementing such a policy. Although considerable research has been devoted to estimating the OTR in the literature, less attention has been paid to statistical inference of the OTR. Challenges arise in the nonregular cases where the OTR is not uniquely defined. To deal with nonregularity, we develop a novel inference method for the mean outcome under an OTR (the optimal value function) based on subsample aggregating (subagging); the basic subagging device is sketched after this entry. The proposed method can be applied to multi-stage studies where treatments are sequentially assigned over time. Bootstrap aggregating (bagging) and subagging have been recognized as effective variance reduction techniques to improve unstable estimators or classifiers (Bühlmann and Yu, 2002). However, it remains unknown whether these approaches can yield valid inference results. We show that the proposed confidence interval (CI) for the optimal value function achieves nominal coverage. In addition, due to the variance reduction effect of subagging, our method enjoys certain statistical optimality. Specifically, we show that the mean squared error of the proposed value estimator is strictly smaller than that based on the simple sample-splitting estimator in the nonregular cases. Moreover, under certain conditions, the length of our proposed CI is shown to be on average shorter than CIs constructed based on the existing state-of-the-art method (Luedtke and van der Laan, 2016) and the 'oracle' method, which works as well as if an OTR were known. Extensive numerical studies are conducted to back up our theoretical findings.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
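As background for the inferential procedure above, the snippet below illustrates plain subsample aggregating (subagging) for a generic estimator: each base estimate is computed on a random subsample drawn without replacement, and the estimates are averaged. This is only the generic variance-reduction device referenced in the entry, applied here to a simple nonregular statistic; it does not reproduce the authors' value-function estimator or confidence-interval construction.

```python
import numpy as np

def subagging(data, estimator, subsample_size, n_subsamples=200, seed=0):
    """Average an estimator over random subsamples drawn without replacement."""
    rng = np.random.default_rng(seed)
    n = len(data)
    estimates = []
    for _ in range(n_subsamples):
        idx = rng.choice(n, size=subsample_size, replace=False)
        estimates.append(estimator(data[idx]))
    return np.mean(estimates), np.std(estimates)

# Toy example: a nonregular functional (positive part of the mean),
# the kind of unstable target for which subagging reduces variance.
rng = np.random.default_rng(1)
x = rng.normal(size=500)
point, spread = subagging(x, lambda d: max(d.mean(), 0.0), subsample_size=250)
print(point, spread)
```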
  • 83
    Publication Date: 2020
    Description: We propose a novel, theoretically-grounded, acquisition function for Batch Bayesian Optimization informed by insights from distributionally ambiguous optimization. Our acquisition function is a lower bound on the well-known Expected Improvement function, which requires evaluation of a Gaussian expectation over a multivariate piecewise affine function. Our bound is computed instead by evaluating the best-case expectation over all probability distributions consistent with the same mean and variance as the original Gaussian distribution. Unlike alternative approaches, including Expected Improvement, our proposed acquisition function avoids multi-dimensional integrations entirely, and can be computed exactly - even on large batch sizes - as the solution of a tractable convex optimization problem. Our suggested acquisition function can also be optimized efficiently, since first and second derivative information can be calculated inexpensively as by-products of the acquisition function calculation itself. We derive various novel theorems that ground our work theoretically and we demonstrate superior performance via simple motivating examples, benchmark functions and real-world problems.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 84
    Publication Date: 2020
    Description: In many real-world settings, a team of agents must coordinate its behaviour while acting in a decentralised fashion. At the same time, it is often possible to train the agents in a centralised fashion where global state information is available and communication constraints are lifted. Learning joint action-values conditioned on extra state information is an attractive way to exploit centralised learning, but the best strategy for then extracting decentralised policies is unclear. Our solution is QMIX, a novel value-based method that can train decentralised policies in a centralised end-to-end fashion. QMIX employs a mixing network that estimates joint action-values as a monotonic combination of per-agent values. We structurally enforce that the joint-action value is monotonic in the per-agent values through the use of non-negative weights in the mixing network, which guarantees consistency between the centralised and decentralised policies. To evaluate the performance of QMIX, we propose the StarCraft Multi-Agent Challenge (SMAC) as a new benchmark for deep multi-agent reinforcement learning. We evaluate QMIX on a challenging set of SMAC scenarios and show that it significantly outperforms existing multi-agent reinforcement learning methods. A schematic sketch of such a mixing network follows this entry.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
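For readers wanting to see the monotonic mixing construction described above, here is a schematic PyTorch sketch: per-agent Q-values are combined by a mixing network whose weights are produced by state-conditioned hypernetworks and passed through an absolute value, so the joint value is monotone in each agent's value. Layer sizes and names are illustrative, not the authors' reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MonotonicMixer(nn.Module):
    """Mixes per-agent Q-values into Q_tot, monotone in each input."""
    def __init__(self, n_agents, state_dim, embed_dim=32):
        super().__init__()
        self.n_agents, self.embed_dim = n_agents, embed_dim
        # Hypernetworks: the global state produces the mixing weights/biases.
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Sequential(nn.Linear(state_dim, embed_dim),
                                      nn.ReLU(),
                                      nn.Linear(embed_dim, 1))

    def forward(self, agent_qs, state):
        # agent_qs: (batch, n_agents), state: (batch, state_dim)
        bs = agent_qs.size(0)
        # Non-negative weights (absolute value) enforce monotonicity.
        w1 = torch.abs(self.hyper_w1(state)).view(bs, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(bs, 1, self.embed_dim)
        w2 = torch.abs(self.hyper_w2(state)).view(bs, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(bs, 1, 1)
        hidden = F.elu(torch.bmm(agent_qs.view(bs, 1, -1), w1) + b1)
        q_tot = torch.bmm(hidden, w2) + b2
        return q_tot.view(bs, 1)

mixer = MonotonicMixer(n_agents=3, state_dim=10)
q_tot = mixer(torch.randn(4, 3), torch.randn(4, 10))
print(q_tot.shape)  # torch.Size([4, 1])
```

Because the mixed value is monotone in every per-agent value, each agent can greedily maximise its own Q-values at execution time while still maximising the joint value, which is the centralised-decentralised consistency property the entry describes.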
  • 85
    Publication Date: 2020
    Description: Community detection in large social networks is affected by the degree heterogeneity of nodes. The D-SCORE algorithm for directed networks was introduced to reduce this effect by taking the element-wise ratios of the singular vectors of the adjacency matrix before clustering. Meaningful results were obtained for the statistician citation network, but a rigorous analysis of its performance was missing. First, this paper establishes theoretical guarantees for this algorithm and its variants under the directed degree-corrected block model (Directed-DCBM). Second, this paper provides significant improvements to the original D-SCORE algorithms by attaching the nodes outside of the community cores using information from the original network instead of the singular vectors. A schematic sketch of the ratio-of-singular-vectors idea follows this entry.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
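To illustrate the ratio-of-singular-vectors idea described above, the following sketch follows the SCORE recipe in a directed setting: take the leading singular vectors of the adjacency matrix, divide the trailing ones element-wise by the first to cancel degree heterogeneity, and cluster the resulting ratio coordinates. This is a schematic of the general idea only, with illustrative choices (k-means, simple truncation, no refinement step); it is not the exact D-SCORE algorithm or the paper's proposed improvement.

```python
import numpy as np
from sklearn.cluster import KMeans

def score_style_clustering(A, K, seed=0):
    """Cluster rows of a (possibly asymmetric) adjacency matrix A into K groups."""
    U, _, _ = np.linalg.svd(A, full_matrices=False)   # leading left singular vectors
    leading, rest = U[:, 0], U[:, 1:K]
    ratios = rest / leading[:, None]                  # element-wise ratios remove degree effects
    T = np.log(A.shape[0])
    ratios = np.clip(ratios, -T, T)                   # truncate extreme ratios
    return KMeans(n_clusters=K, n_init=10, random_state=seed).fit_predict(ratios)

# Toy usage on a small two-block directed graph.
rng = np.random.default_rng(1)
n = 100
z = np.repeat([0, 1], n // 2)
P = np.where(z[:, None] == z[None, :], 0.2, 0.05)
A = (rng.random((n, n)) < P).astype(float)
print(score_style_clustering(A, K=2)[:10])
```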
  • 86
    Publication Date: 2020
    Description: Many methods for machine learning rely on approximate inference from intractable probability distributions. Variational inference approximates such distributions by tractable models that can subsequently be used for approximate inference. Learning sufficiently accurate approximations requires a rich model family and careful exploration of the relevant modes of the target distribution. We propose a method for learning accurate GMM approximations of intractable probability distributions based on insights from policy search, using information-geometric trust regions for principled exploration. For efficient improvement of the GMM approximation, we derive a lower bound on the corresponding optimization objective that enables us to update the components independently. Our use of the lower bound ensures convergence to a stationary point of the original objective. The number of components is adapted online by adding new components in promising regions and by deleting components with negligible weight. We demonstrate on several domains that we can learn approximations of complex, multimodal distributions with a quality that is unmet by previous variational inference methods, and that the GMM approximation can be used for drawing samples that are on par with samples created by state-of-the-art MCMC samplers while requiring up to three orders of magnitude less computation.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 87
    Publication Date: 2020
    Description: In this study, we prove that an intrinsic low dimensionality of covariates is the main factor that determines the performance of deep neural networks (DNNs). DNNs generally provide outstanding empirical performance. Hence, numerous studies have actively investigated the theoretical properties of DNNs to understand their underlying mechanisms. In particular, the behavior of DNNs on high-dimensional data is one of the most important questions. However, this issue has not been sufficiently investigated from the perspective of the covariates, even though high-dimensional data often have low intrinsic dimensionality in practice. In this study, we derive bounds on the approximation error and the generalization error of DNNs with intrinsically low-dimensional covariates. We apply the notion of the Minkowski dimension and develop a novel proof technique. Consequently, we show that the convergence rates of these errors do not depend on the nominal high dimensionality of the data, but only on its intrinsic dimension. We further prove that the rate is optimal in the minimax sense. We identify an advantage of DNNs by showing that DNNs can handle a broader class of intrinsically low-dimensional data than other adaptive estimators. Finally, we conduct a numerical simulation to validate the theoretical results.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 88
    Publication Date: 2020
    Description: This work investigates the prediction performance of kriging predictors. We derive non-asymptotic probabilistic error bounds for the prediction error under the uniform metric and $L_p$ metrics when the spectral densities of both the true and the imposed correlation functions decay algebraically. The Matérn family is a prominent class of correlation functions of this kind. Our analysis shows that, when the smoothness of the imposed correlation function exceeds that of the true correlation function, the prediction error becomes more sensitive to the space-filling property of the design points. In particular, the kriging predictor can still reach the optimal rate of convergence if the experimental design scheme is quasi-uniform. Lower bounds on the kriging prediction error are also derived under the uniform metric and $L_p$ metrics. An accurate characterization of this error is obtained when an oversmoothed correlation function and a space-filling design are used. The kriging predictor whose error is being analyzed is recalled after this entry.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
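As a reference point for the entry above, recall the kriging predictor whose error is being bounded. In the standard zero-mean (simple kriging) form, stated here only for orientation, observations $y_i = f(x_i)$ at design points $x_1,\dots,x_n$ and an imposed correlation function $\Psi$ give the predictor
$$
\hat f(x) \;=\; r(x)^{\top} R^{-1} y,
\qquad R_{ij}=\Psi(x_i-x_j),
\qquad r(x)_i=\Psi(x-x_i),
$$
where $R$ is the correlation matrix over the design and $r(x)$ the vector of correlations between $x$ and the design points. The entry's results concern how $\sup_x |f(x)-\hat f(x)|$ and its $L_p$ analogues behave when the smoothness of $\Psi$ differs from that of the true correlation function.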
  • 89
    Publication Date: 2020
    Description: Reinforcement learning (RL) is a popular paradigm for addressing sequential decision tasks in which the agent has only limited environmental feedback. Despite many advances over the past three decades, learning in many domains still requires a large amount of interaction with the environment, which can be prohibitively expensive in realistic scenarios. To address this problem, transfer learning has been applied to reinforcement learning such that experience gained in one task can be leveraged when starting to learn the next, harder task. More recently, several lines of research have explored how tasks, or data samples themselves, can be sequenced into a curriculum for the purpose of learning a problem that may otherwise be too difficult to learn from scratch. In this article, we present a framework for curriculum learning (CL) in reinforcement learning, and use it to survey and classify existing CL methods in terms of their assumptions, capabilities, and goals. Finally, we use our framework to find open problems and suggest directions for future RL curriculum learning research.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 90
    Publication Date: 2020
    Description: Operator-theoretic analysis of nonlinear dynamical systems has attracted much attention in a variety of engineering and scientific fields, endowed with practical estimation methods using data, such as dynamic mode decomposition. In this paper, we address a lifted representation of nonlinear dynamical systems with random noise based on transfer operators, and develop a novel Krylov subspace method for estimating the operators from finite data, taking the unboundedness of the operators into account. For this purpose, we first consider Perron-Frobenius operators with kernel-mean embeddings for such systems. We then extend the Arnoldi method, one of the most classical Krylov subspace methods, so that it can be applied to the present setting; a sketch of the classical Arnoldi iteration follows this entry. However, the Arnoldi method requires the operator to be bounded, which is not necessarily satisfied for transfer operators of nonlinear systems. We accordingly develop the shift-invert Arnoldi method for Perron-Frobenius operators to avoid this problem. We also describe an approach to evaluating the predictive accuracy of the estimated operators on the basis of the maximum mean discrepancy, which is applicable, for example, to anomaly detection in complex systems. The empirical performance of our methods is investigated using synthetic and real-world healthcare data.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
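Since the entry above builds on the Arnoldi method, the following is a minimal NumPy sketch of the classical matrix version: it constructs an orthonormal Krylov basis and the associated Hessenberg matrix for a linear map. The operator-valued, kernel-embedded, and shift-inverted variants developed in the paper are not reproduced here; this only fixes the underlying recursion.

```python
import numpy as np

def arnoldi(matvec, v0, m):
    """Classical Arnoldi iteration.

    matvec : function applying the linear operator to a vector
    v0     : starting vector
    m      : number of Krylov steps
    Returns Q (n, m+1) with orthonormal columns and H ((m+1), m) upper Hessenberg
    such that  A @ Q[:, :m] = Q @ H.
    """
    n = v0.size
    Q = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    Q[:, 0] = v0 / np.linalg.norm(v0)
    for j in range(m):
        w = matvec(Q[:, j])
        for i in range(j + 1):                  # Gram-Schmidt against previous basis vectors
            H[i, j] = Q[:, i] @ w
            w = w - H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-12:                 # invariant subspace found; stop early
            return Q[:, :j + 2], H[:j + 2, :j + 1]
        Q[:, j + 1] = w / H[j + 1, j]
    return Q, H

# Toy usage: Ritz values from a small Krylov subspace approximate extreme eigenvalues.
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 200))
A = A + A.T                                     # symmetric, so the spectrum is real
Q, H = arnoldi(lambda v: A @ v, rng.standard_normal(200), m=30)
print(np.sort(np.linalg.eigvals(H[:-1, :]).real)[-3:])   # top Ritz values
```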
  • 91
    Publication Date: 2020
    Description: Scikit-network is a Python package inspired by scikit-learn for the analysis of large graphs. Graphs are represented by their adjacency matrix in the sparse CSR format of SciPy. The package provides state-of-the-art algorithms for ranking, clustering, classifying, embedding and visualizing the nodes of a graph. High performance is achieved through a mix of fast matrix-vector products (using SciPy), compiled code (using Cython) and parallel processing. The package is distributed under the BSD license, with dependencies limited to NumPy and SciPy. It is compatible with Python 3.6 and newer. Source code, documentation and installation instructions are available online. A minimal usage sketch follows this entry.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
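A minimal usage sketch for the entry above, assuming the scikit-learn-style interface described in the package documentation (a SciPy CSR adjacency matrix in, a fitted estimator exposing the result as an attribute out). Exact method and attribute names have varied across releases, so consult the documentation of the installed version.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sknetwork.clustering import Louvain

# Build a small undirected graph as a sparse CSR adjacency matrix (two triangles
# joined by a single edge), the representation scikit-network works with.
edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)]
row, col = zip(*edges)
n = 6
adjacency = csr_matrix((np.ones(len(edges)), (row, col)), shape=(n, n))
adjacency = adjacency + adjacency.T            # symmetrize

louvain = Louvain()
louvain.fit(adjacency)
print(louvain.labels_)                         # one cluster label per node (attribute name assumed)
```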
  • 92
    Publication Date: 2020
    Description: The individualized treatment recommendation (ITR) is an important analytic framework for precision medicine. The goal of ITR is to assign the best treatments to patients based on their individual characteristics. From the machine learning perspective, the ITR problem can be formulated as a weighted classification problem that maximizes the mean benefit of the recommended treatments given patients' characteristics; this formulation is written out after this entry. Several ITR methods have been proposed in both the binary and the multicategory settings. In practice, one may prefer a more flexible recommendation that includes multiple treatment options. This motivates us to develop methods to obtain a set of near-optimal individualized treatment recommendations that are alternative to each other, called alternative individualized treatment recommendations (A-ITR). We propose two methods to estimate the optimal A-ITR within the outcome weighted learning (OWL) framework. Simulation studies and a real data analysis for Type 2 diabetic patients with injectable antidiabetic treatments are conducted to show the usefulness of the proposed A-ITR framework. We also show the consistency of these methods and obtain an upper bound on the risk between the theoretically optimal recommendation and the estimated one. An R package "aitr" has been developed and can be found at https://github.com/menghaomiao/aitr.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
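The weighted-classification view mentioned above can be made explicit. In the standard outcome weighted learning formulation (stated here generically, with larger outcomes $Y$ assumed better), the value of a rule $d$ and the corresponding empirical weighted classification problem are
$$
V(d)\;=\;\mathbb{E}\!\left[\frac{Y\,\mathbf{1}\{A=d(X)\}}{\pi(A\mid X)}\right],
\qquad
\hat d \;\in\; \arg\max_{d}\;\frac{1}{n}\sum_{i=1}^{n}\frac{Y_i\,\mathbf{1}\{A_i=d(X_i)\}}{\pi(A_i\mid X_i)},
$$
where $\pi(a\mid x)$ is the treatment assignment probability; each observation acts as a classification example with label $A_i$ and weight $Y_i/\pi(A_i\mid X_i)$. The A-ITR framework of the entry then seeks a set of near-optimal rules rather than the single maximizer.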
  • 93
    Publication Date: 2020
    Description: Bayesian optimisation has been successfully applied to a variety of reinforcement learning problems. However, the traditional approach for learning optimal policies in simulators does not utilise the opportunity to improve learning by adjusting certain environment variables: state features that are unobservable and randomly determined by the environment in a physical setting but are controllable in a simulator. This article considers the problem of finding a robust policy while taking into account the impact of environment variables. We present Alternating Optimisation and Quadrature (ALOQ), which uses Bayesian optimisation and Bayesian quadrature to address such settings. We also present Transferable ALOQ (TALOQ), for settings where simulator inaccuracies lead to difficulty in transferring the learnt policy to the physical system. We show that our algorithms are robust to the presence of significant rare events, which may not be observable under random sampling but play a substantial role in determining the optimal policy. Experimental results across different domains show that our algorithms learn robust policies efficiently.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 94
    Publication Date: 2020
    Description: Graph-based semi-supervised learning is the problem of propagating labels from a small number of labelled data points to a larger set of unlabelled data. This paper is concerned with the consistency of optimization-based techniques for such problems, in the limit where the labels have small noise and the underlying unlabelled data is well clustered. We study graph-based probit for binary classification, and a natural generalization of this method to multi-class classification using one-hot encoding. The resulting objective function to be optimized comprises the sum of a quadratic form defined through a rational function of the graph Laplacian, involving only the unlabelled data, and a fidelity term involving only the labelled data. The consistency analysis sheds light on the choice of the rational function defining the optimization.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 95
    Publication Date: 2020
    Description: Topic models have become popular tools for dimension reduction and exploratory analysis of text data, which consist of observed frequencies of a vocabulary of $p$ words in $n$ documents, stored in a $p\times n$ matrix. The main premise is that the mean of this data matrix can be factorized into a product of two non-negative matrices: a $p\times K$ word-topic matrix $A$ and a $K\times n$ topic-document matrix $W$; this factorization is written out after this entry. This paper studies the estimation of a possibly element-wise sparse $A$ when the number of topics $K$ is unknown. In this under-explored context, we derive a new minimax lower bound for the estimation of such $A$ and propose a new computationally efficient algorithm for its recovery. We derive a finite sample upper bound for our estimator, and show that it matches the minimax lower bound in many scenarios. Our estimator adapts to the unknown sparsity of $A$, and our analysis is valid for any finite $n$, $p$, $K$ and document lengths. Empirical results on both synthetic and semi-synthetic data show that our proposed estimator is a strong competitor of the existing state-of-the-art algorithms for both non-sparse and sparse $A$, and has superior performance in many scenarios of interest.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
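For concreteness, the factorization studied in the entry above can be written as follows, in the standard topic-model convention where columns are probability vectors:
$$
\mathbb{E}[X] \;=\; A\,W,
\qquad
A\in\mathbb{R}_{\ge 0}^{p\times K},\;\; \mathbf{1}_p^{\top}A=\mathbf{1}_K^{\top},
\qquad
W\in\mathbb{R}_{\ge 0}^{K\times n},\;\; \mathbf{1}_K^{\top}W=\mathbf{1}_n^{\top},
$$
so column $k$ of $A$ is the word distribution of topic $k$ and column $j$ of $W$ gives the topic weights of document $j$. The entry's contribution concerns estimating a possibly sparse $A$ when $K$ itself is unknown.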
  • 96
    Publication Date: 2020
    Description: In this paper, we propose a unified view of gradient-based algorithms for stochastic convex composite optimization by extending the concept of estimate sequence introduced by Nesterov. More precisely, we interpret a large class of stochastic optimization methods as procedures that iteratively minimize a surrogate of the objective, which covers the stochastic gradient descent method and variants of the incremental approaches SAGA, SVRG, and MISO/Finito/SDCA; the simplest instance of this surrogate view is written out after this entry. This point of view has several advantages: (i) we provide a simple generic proof of convergence for all of the aforementioned methods; (ii) we naturally obtain new algorithms with the same guarantees; (iii) we derive generic strategies to make these algorithms robust to stochastic noise, which is useful when data is corrupted by small random perturbations. Finally, we propose a new accelerated stochastic gradient descent algorithm and a new accelerated SVRG algorithm that is robust to stochastic noise.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
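To see the surrogate viewpoint of the entry above in its simplest instance, note that a stochastic gradient step can be read as exactly minimizing a simple quadratic model of the objective built around the current iterate (shown here only as an illustration of the "iteratively minimize a surrogate" template; the paper's surrogates are more general):
$$
x_k \;=\; \arg\min_{x}\;\Big\{\, f(x_{k-1}) + \langle g_k,\, x-x_{k-1}\rangle + \tfrac{1}{2\eta_k}\|x-x_{k-1}\|^2 \,\Big\}
\;=\; x_{k-1} - \eta_k\, g_k,
$$
where $g_k$ is a stochastic gradient of $f$ at $x_{k-1}$ and $\eta_k$ the step size; variance-reduced methods such as SVRG or SAGA correspond to different choices of the surrogate.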
  • 97
    Publication Date: 2020
    Description: Random forests remain among the most popular off-the-shelf supervised machine learning tools, with a well-established track record of predictive accuracy in both regression and classification settings. Despite their empirical success, as well as a bevy of recent work investigating their statistical properties, a full and satisfying explanation for their success has yet to be put forth. Here we aim to take a step forward in this direction by demonstrating that the additional randomness injected into individual trees serves as a form of implicit regularization, making random forests an ideal model in low signal-to-noise ratio (SNR) settings. Specifically, from a model-complexity perspective, we show that the mtry parameter in random forests serves much the same purpose as the shrinkage penalty in explicitly regularized regression procedures such as lasso and ridge regression; a small empirical illustration of this role of mtry follows this entry. To highlight this point, we design a randomized linear-model-based forward selection procedure intended as an analogue to tree-based random forests and demonstrate its surprisingly strong empirical performance. Numerous demonstrations on both real and synthetic data are provided.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
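The mtry-as-regularization point above is easy to probe empirically. The sketch below, using scikit-learn (where mtry corresponds to the `max_features` argument), compares a heavily randomized forest against a fully greedy one on a low signal-to-noise regression problem; the data-generating choices are illustrative only and are not taken from the paper's experiments.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Low signal-to-noise linear data: a few weak signals among many noise features.
rng = np.random.default_rng(0)
n, p = 300, 50
X = rng.standard_normal((n, p))
y = X[:, :5] @ np.ones(5) * 0.5 + rng.standard_normal(n) * 3.0

# Smaller max_features = more randomization per split (stronger implicit regularization).
for max_features in (1.0, 0.33, 0.1):
    forest = RandomForestRegressor(n_estimators=300, max_features=max_features, random_state=0)
    score = cross_val_score(forest, X, y, cv=5, scoring="r2").mean()
    print(f"max_features={max_features}: CV R^2 = {score:.3f}")
```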
  • 98
    Publication Date: 2020
    Description: Machine learning methods for computational imaging require uncertainty estimation to be reliable in real settings. While Bayesian models offer a computationally tractable way of recovering uncertainty, they need large data volumes to be trained, which in imaging applications implies prohibitively expensive data collection with specific imaging instruments. This paper introduces a novel framework to train variational inference for inverse problems by combining a small amount of experimentally collected data, domain expertise and existing image data sets. In this way, Bayesian machine learning models can solve imaging inverse problems with minimal data collection efforts. Extensive simulated experiments show the advantages of the proposed framework. The approach is then applied to two real experimental optics settings: holographic image reconstruction and imaging through highly scattering media. In both settings, state-of-the-art reconstructions are achieved with little collection of training data.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 99
    Publication Date: 2020
    Description: In observational studies, it is important to evaluate not only the total effect but also the direct and indirect effects of a treatment variable on a response variable. In terms of local structural learning of causal networks, we try to find all possible pairs of total and direct causal effects, which can further be used to calculate indirect causal effects. An intuitive global learning approach is first to find an essential graph over all variables representing all Markov equivalent causal networks, and then to enumerate all equivalent networks and estimate a pair of total and direct effects for each of them. However, it can be inefficient to learn an essential graph and enumerate equivalent networks when the true causal graph is large. In this paper, we propose a local learning approach instead. In the local learning approach, we first learn locally a chain component containing the treatment. Then, if necessary, we learn locally a chain component containing the response. Next, we locally enumerate all possible pairs of the treatment's parents and the response's parents. Finally, based on these pairs, we find all possible pairs of total and direct effects of the treatment on the response.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 100
    Publication Date: 2020
    Description: We derive and present explicit algorithms to facilitate streamlined computing for variational inference for models containing higher level random effects. In the existing literature, streamlined variational inference is restricted to mean field variational Bayes algorithms for two-level random effects models. Here we provide the following extensions: (1) explicit Gaussian response mean field variational Bayes algorithms for three-level models, (2) explicit algorithms for the alternative variational message passing approach in the case of two-level and three-level models, and (3) an explanation of how arbitrarily high levels of nesting can be handled based on the recently published matrix algebraic results of the authors. A pay-off from (2) is a simple extension to non-Gaussian response models. In summary, we remove barriers for streamlining variational inference algorithms based on either the mean field variational Bayes approach or the variational message passing approach when higher level random effects are present.
    Print ISSN: 1532-4435
    Electronic ISSN: 1533-7928
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...