ALBERT

All Library Books, journals and Electronic Records Telegrafenberg

  • 1
    Electronic Resource
    Springer
    Computational complexity 5 (1995), pp. 1-23 
    ISSN: 1420-8954
    Keywords: Machine learning ; computational learning theory ; on-line learning ; linear functions ; worst-case loss bounds ; adaptive filter theory ; 68T05
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract: We present an algorithm for the on-line learning of linear functions which is optimal to within a constant factor with respect to bounds on the sum of squared errors for a worst-case sequence of trials. The bounds are logarithmic in the number of variables. Furthermore, the algorithm is shown to be optimally robust with respect to noise in the data (again to within a constant factor). (An illustrative sketch of a multiplicative-update online linear learner appears after this result list.)
    Type of Medium: Electronic Resource
  • 2
    Electronic Resource
    Springer
    Machine learning 30 (1998), pp. 7-21 
    ISSN: 0885-6125
    Keywords: PAC learning ; multiple-instance examples ; axis-aligned hyperrectangles
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract: We describe a polynomial-time algorithm for learning axis-aligned rectangles in $\mathbb{Q}^d$ with respect to product distributions from multiple-instance examples in the PAC model. Here, each example consists of n elements of $\mathbb{Q}^d$ together with a label indicating whether any of the n points is in the rectangle to be learned. We assume that there is an unknown product distribution D over $\mathbb{Q}^d$ such that all instances are independently drawn according to D. The accuracy of a hypothesis is measured by the probability that it would incorrectly predict whether one of n more points drawn from D was in the rectangle to be learned. Our algorithm achieves accuracy ε with probability 1-δ in $O\left(\frac{d^5 n^{12}}{\epsilon^{20}} \log^2 \frac{nd}{\epsilon\delta}\right)$ time.
    Type of Medium: Electronic Resource
  • 3
    Electronic Resource
    Springer
    Machine learning 18 (1995), pp. 187-230 
    ISSN: 0885-6125
    Keywords: computational learning theory ; on-line learning ; mistake-bounded learning ; function learning
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract: The majority of results in computational learning theory are concerned with concept learning, i.e. with the special case of function learning for classes of functions with range {0, 1}. Much less is known about the theory of learning functions with a larger range such as ℕ or ℝ. In particular, relatively few results exist about the general structure of common models for function learning, and there are only very few nontrivial function classes for which positive learning results have been exhibited in any of these models. We introduce in this paper the notion of a binary branching adversary tree for function learning, which allows us to give a somewhat surprising equivalent characterization of the optimal learning cost for learning a class of real-valued functions (in terms of a max-min definition which does not involve any “learning” model). Another general structural result of this paper relates the cost for learning a union of function classes to the learning costs for the individual function classes. Furthermore, we exhibit an efficient learning algorithm for learning convex piecewise linear functions from ℝ^d into ℝ. Previously, the class of linear functions from ℝ^d into ℝ was the only class of functions with multidimensional domain that was known to be learnable within the rigorous framework of a formal model for online learning. Finally, we give a sufficient condition for an arbitrary class $\mathcal{F}$ of functions from ℝ into ℝ that allows us to learn the class of all functions that can be written as the pointwise maximum of k functions from $\mathcal{F}$. This allows us to exhibit a number of further nontrivial classes of functions from ℝ into ℝ for which there exist efficient learning algorithms.
    Type of Medium: Electronic Resource
  • 4
    Electronic Resource
    Springer
    Machine learning 14 (1994), pp. 27-45 
    ISSN: 0885-6125
    Keywords: Computational learning theory ; concept drift ; concept learning
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract: In this paper we consider the problem of tracking a subset of a domain (called the target) which changes gradually over time. A single (unknown) probability distribution over the domain is used to generate random examples for the learning algorithm and measure the speed at which the target changes. Clearly, the more rapidly the target moves, the harder it is for the algorithm to maintain a good approximation of the target. Therefore we evaluate algorithms based on how much movement of the target can be tolerated between examples while predicting with accuracy ε. Furthermore, the complexity of the class $\mathcal{H}$ of possible targets, as measured by d, its VC-dimension, also affects the difficulty of tracking the target concept. We show that if the problem of minimizing the number of disagreements with a sample from among concepts in a class $\mathcal{H}$ can be approximated to within a factor k, then there is a simple tracking algorithm for $\mathcal{H}$ which can achieve a probability ε of making a mistake if the target movement rate is at most a constant times $\epsilon^2/(k(d + k)\ln\frac{1}{\epsilon})$, where d is the Vapnik-Chervonenkis dimension of $\mathcal{H}$. Also, we show that if $\mathcal{H}$ is properly PAC-learnable, then there is an efficient (randomized) algorithm that with high probability approximately minimizes disagreements to within a factor of 7d + 1, yielding an efficient tracking algorithm for $\mathcal{H}$ which tolerates drift rates up to a constant times $\epsilon^2/(d^2\ln\frac{1}{\epsilon})$. In addition, we prove complementary results for the classes of halfspaces and axis-aligned hyperrectangles showing that the maximum rate of drift that any algorithm (even with unlimited computational power) can tolerate is a constant times $\epsilon^2/d$. (A minimal sketch of a windowed disagreement-minimization tracker appears after this result list.)
    Type of Medium: Electronic Resource
  • 5
    Electronic Resource
    Springer
    Machine learning 27 (1997), pp. 5-5 
    ISSN: 0885-6125
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Type of Medium: Electronic Resource
  • 6
    Electronic Resource
    Springer
    Machine learning 36 (1999), pp. 147-181 
    ISSN: 0885-6125
    Keywords: computational learning theory ; learning with queries ; mistake bounds ; function learning ; learning with noise
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract: We solve an open problem of Maass and Turán, showing that the optimal mistake-bound when learning a given concept class without membership queries is within a constant factor of the optimal number of mistakes plus membership queries required by an algorithm that can ask membership queries. Previously known results imply that the constant factor in our bound is best possible. We then show that, in a natural generalization of the mistake-bound model, the usefulness to the learner of arbitrary “yes-no” questions between trials is very limited. We show that several natural structural questions about relatives of the mistake-bound model can be answered through the application of this general result. Most of these results can be interpreted as saying that learning in apparently less powerful (and more realistic) models is not much more difficult than learning in more powerful models.
    Type of Medium: Electronic Resource
  • 7
    Electronic Resource
    Springer
    Machine learning 37 (1999), pp. 337-354 
    ISSN: 0885-6125
    Keywords: computational learning theory ; concept drift ; context-sensitive learning ; prediction ; PAC learning ; agnostic learning ; uniform convergence ; VC theory
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract: We show that a $\frac{c\epsilon^3}{\mathrm{VCdim}(\mathcal{F})}$ bound on the rate of drift of the distribution generating the examples is sufficient for agnostic learning to relative accuracy ε, where c > 0 is a constant; this matches a known necessary condition to within a constant factor. We establish a $\frac{c\epsilon^2}{\mathrm{VCdim}(\mathcal{F})}$ sufficient condition for the realizable case, also matching a known necessary condition to within a constant factor. We provide a relatively simple proof of a bound of $O\left(\frac{1}{\epsilon^2}\left(\mathrm{VCdim}(\mathcal{F}) + \log\frac{1}{\delta}\right)\right)$ on the sample complexity of agnostic learning in a fixed environment.
    Type of Medium: Electronic Resource
  • 8
    Electronic Resource
    Springer
    Machine learning 14 (1994), pp. 27-45 
    ISSN: 0885-6125
    Keywords: Computational learning theory ; concept drift ; concept learning
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract: In this paper we consider the problem of tracking a subset of a domain (called the target) which changes gradually over time. A single (unknown) probability distribution over the domain is used to generate random examples for the learning algorithm and measure the speed at which the target changes. Clearly, the more rapidly the target moves, the harder it is for the algorithm to maintain a good approximation of the target. Therefore we evaluate algorithms based on how much movement of the target can be tolerated between examples while predicting with accuracy ε. Furthermore, the complexity of the class H of possible targets, as measured by d, its VC-dimension, also affects the difficulty of tracking the target concept. We show that if the problem of minimizing the number of disagreements with a sample from among concepts in a class H can be approximated to within a factor k, then there is a simple tracking algorithm for H which can achieve a probability ε of making a mistake if the target movement rate is at most a constant times ε²/(k(d + k) ln 1/ε), where d is the Vapnik-Chervonenkis dimension of H. Also, we show that if H is properly PAC-learnable, then there is an efficient (randomized) algorithm that with high probability approximately minimizes disagreements to within a factor of 7d + 1, yielding an efficient tracking algorithm for H which tolerates drift rates up to a constant times ε²/(d² ln 1/ε). In addition, we prove complementary results for the classes of halfspaces and axis-aligned hyperrectangles showing that the maximum rate of drift that any algorithm (even with unlimited computational power) can tolerate is a constant times ε²/d.
    Type of Medium: Electronic Resource
  • 9
    Electronic Resource
    Springer
    Machine learning 18 (1995), pp. 187-230 
    ISSN: 0885-6125
    Keywords: computational learning theory ; on-line learning ; mistake-bounded learning ; function learning
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract: The majority of results in computational learning theory are concerned with concept learning, i.e. with the special case of function learning for classes of functions with range {0, 1}. Much less is known about the theory of learning functions with a larger range such as $\mathbb{N}$ or $\mathbb{R}$. In particular, relatively few results exist about the general structure of common models for function learning, and there are only very few nontrivial function classes for which positive learning results have been exhibited in any of these models. We introduce in this paper the notion of a binary branching adversary tree for function learning, which allows us to give a somewhat surprising equivalent characterization of the optimal learning cost for learning a class of real-valued functions (in terms of a max-min definition which does not involve any “learning” model). Another general structural result of this paper relates the cost for learning a union of function classes to the learning costs for the individual function classes. Furthermore, we exhibit an efficient learning algorithm for learning convex piecewise linear functions from $\mathbb{R}^d$ into $\mathbb{R}$. Previously, the class of linear functions from $\mathbb{R}^d$ into $\mathbb{R}$ was the only class of functions with multidimensional domain that was known to be learnable within the rigorous framework of a formal model for online learning. Finally, we give a sufficient condition for an arbitrary class $\mathcal{F}$ of functions from $\mathbb{R}$ into $\mathbb{R}$ that allows us to learn the class of all functions that can be written as the pointwise maximum of k functions from $\mathcal{F}$. This allows us to exhibit a number of further nontrivial classes of functions from $\mathbb{R}$ into $\mathbb{R}$ for which there exist efficient learning algorithms.
    Type of Medium: Electronic Resource
  • 10
    Publication Date: 2020-04-24
    Description: The phenomenon of benign overfitting is one of the key mysteries uncovered by deep learning methodology: deep neural networks seem to predict well, even with a perfect fit to noisy training data. Motivated by this phenomenon, we consider when a perfect fit to training data in linear regression is compatible with accurate prediction. We give a characterization of linear regression problems for which the minimum norm interpolating prediction rule has near-optimal prediction accuracy. The characterization is in terms of two notions of the effective rank of the data covariance. It shows that overparameterization is essential for benign overfitting in this setting: the number of directions in parameter space that are unimportant for prediction must significantly exceed the sample size. By studying examples of data covariance properties that this characterization shows are required for benign overfitting, we find an important role for finite-dimensional data: the accuracy of the minimum norm interpolating prediction rule approaches the best possible accuracy for a much narrower range of properties of the data distribution when the data lie in an infinite-dimensional space than when the data lie in a finite-dimensional space whose dimension grows faster than the sample size. (A minimal numerical sketch of minimum-norm interpolation appears after this result list.)
    Print ISSN: 0027-8424
    Electronic ISSN: 1091-6490
    Topics: Biology, Medicine, Natural Sciences in General
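Sketch for item 1. The record describes worst-case squared-loss bounds that are logarithmic in the number of variables, but does not reproduce the algorithm itself. The following Python sketch shows the multiplicative-update (exponentiated-gradient style) family of online linear learners, which is the standard route to such logarithmic bounds; the learning rate eta, the scale U, and the simplex-normalized weights are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def eg_online(trials, eta=0.1, U=1.0):
    """Multiplicative-update online learner for linear functions (sketch).

    trials: sequence of (x, y) pairs, x an array in R^n, y a real target.
    The relative-entropy geometry behind this update family is what
    yields worst-case loss bounds logarithmic in n.
    """
    trials = list(trials)
    n = len(trials[0][0])
    w = np.full(n, 1.0 / n)           # start uniform on the simplex
    total_loss = 0.0
    for x, y in trials:
        x = np.asarray(x, dtype=float)
        y_hat = U * w.dot(x)          # predict before the label arrives
        total_loss += (y_hat - y) ** 2
        # multiplicative step along the squared-loss gradient
        w = w * np.exp(-2.0 * eta * (y_hat - y) * U * x)
        w = w / w.sum()               # renormalize onto the simplex
    return w, total_loss
```

The bounds cited in the record concern the excess squared loss of such a learner on an adversarially chosen trial sequence, relative to the best fixed linear predictor in hindsight.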
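Sketch for items 4 and 8 (duplicate records of the same paper). The tracking algorithm described there works by (approximately) minimizing disagreements with a recent sample. Below is a minimal sketch of that idea, assuming a hypothetical one-dimensional threshold class so that disagreement minimization can be brute-forced; the window size and the threshold class are illustrative assumptions, not the paper's setting.

```python
def track_drifting_threshold(stream, window=200):
    """Windowed disagreement-minimization tracker (illustrative sketch).

    stream yields (x, label) pairs with x a float and label in {0, 1};
    the drifting target is assumed to be a threshold c_t with
    label = 1 iff x >= c_t.  After predicting on each example, refit
    the threshold with the fewest disagreements on the last `window`
    examples, mirroring the paper's recipe of (approximately)
    minimizing disagreements on a recent sample.
    """
    history, c_hat, mistakes = [], 0.0, 0
    for x, label in stream:
        mistakes += int((x >= c_hat) != bool(label))   # predict, then learn
        history.append((x, label))
        history = history[-window:]
        xs = sorted(pt for pt, _ in history)
        candidates = [xs[0] - 1.0] + xs                # thresholds to try
        c_hat = min(candidates,
                    key=lambda c: sum((x2 >= c) != bool(l2)
                                      for x2, l2 in history))
    return c_hat, mistakes
```

The record's drift-rate condition, a constant times ε²/(k(d + k) ln 1/ε), quantifies how fast the target may move per example while a tracker of this kind can keep its mistake probability near ε.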
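Sketch for item 10. Its description centers on the minimum norm interpolating prediction rule in overparameterized linear regression. The sketch below, with illustrative dimensions and noise level not taken from the paper, uses the standard fact that the pseudoinverse solution is the minimum-Euclidean-norm interpolant, and checks that it fits noisy training data exactly while still being evaluated out of sample.

```python
import numpy as np

rng = np.random.default_rng(0)

# Overparameterized regime: far more directions p than samples n, with a
# decaying covariance so most directions matter little for prediction.
n, p = 50, 2000                                   # illustrative sizes
scales = np.linspace(1.0, 0.01, p)
X = rng.normal(size=(n, p)) * scales
theta_star = np.zeros(p)
theta_star[:5] = 1.0                              # low-dimensional signal
y = X @ theta_star + 0.5 * rng.normal(size=n)     # noisy labels

# Minimum-norm interpolating rule: among all theta with X @ theta = y,
# the pseudoinverse picks the one of smallest Euclidean norm.
theta_hat = np.linalg.pinv(X) @ y
assert np.allclose(X @ theta_hat, y, atol=1e-6)   # perfect fit to noisy data

# Despite interpolating the noise, the rule is judged out of sample.
X_test = rng.normal(size=(1000, p)) * scales
mse = np.mean((X_test @ theta_hat - X_test @ theta_star) ** 2)
print(f"test MSE of the min-norm interpolant: {mse:.3f}")
```

The paper's characterization, in terms of two effective ranks of the data covariance, delineates exactly when such interpolation remains benign.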