What is the real size of a sampled network? The case of the Internet

Fabien Viger, Alain Barrat, Luca Dall’Asta, Cun-Hui Zhang, and Eric D. Kolaczyk
Phys. Rev. E 75, 056111 – Published 17 May 2007

Abstract

Most data concerning the topology of complex networks are the result of mapping projects which bear intrinsic limitations and cannot give access to complete, unbiased datasets. A particularly interesting case is represented by the physical Internet. Router-level Internet mapping projects generally consist of sampling the network from a limited set of sources by using traceroute probes. This methodology, akin to the merging of spanning trees from the different sources to a set of destinations, leads necessarily to a partial, incomplete map of the Internet. The determination of the real Internet topology characteristics from such sampled maps is therefore, in part, a problem of statistical inference. In this paper we present a twofold contribution in order to address this problem. First, we argue that inference of some of the standard topological quantities is, in fact, a version of the so-called “species” problem in statistics, which is important in categorizing the problem and providing some indication of its inherent difficulties. Second, we tackle the issue of estimating arguably the most basic of network characteristics—its number of nodes—and propose two estimators for this quantity, based on subsampling principles. Numerical simulations, as well as an experiment based on probing the Internet, suggest the feasibility of accounting for measurement bias in reporting Internet topology characteristics.
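The sampling mechanism described in the abstract, merging paths from a few sources to a set of destinations, can be illustrated with a minimal sketch. It is not the authors' simulation code: the function names `bfs_parents` and `sample_map` and the toy graph are hypothetical, and the sketch assumes that each probe follows a single shortest path in hop count, which is only an idealization of real traceroute routing.

```python
from collections import deque


def bfs_parents(adj, source):
    """Return a parent map describing one shortest-path tree rooted at source."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:  # first visit gives a shortest path in hops
                parent[v] = u
                queue.append(v)
    return parent


def sample_map(adj, sources, destinations):
    """Merge the source-to-destination paths into one sampled map."""
    nodes, edges = set(), set()
    for s in sources:
        parent = bfs_parents(adj, s)
        for d in destinations:
            if d not in parent:  # destination unreachable from this source
                continue
            v = d
            nodes.add(v)
            while parent[v] is not None:  # walk back towards the source
                u = parent[v]
                nodes.add(u)
                edges.add(frozenset((u, v)))
                v = u
    return nodes, edges


# Toy graph (an assumption for illustration): a path 0-1-2-3-4 with an
# extra node 5 hanging off node 2.
toy = {0: [1], 1: [0, 2], 2: [1, 3, 5], 3: [2, 4], 4: [3], 5: [2]}
nodes, edges = sample_map(toy, sources=[0], destinations=[4])
print(len(nodes), "of", len(toy), "nodes observed")
```

In this toy case node 5 lies on no probed path, so the sampled map misses it. Inferring how many such unseen nodes exist from the sampled map alone is the kind of estimation problem the abstract refers to.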

  • Received 10 January 2007

DOI: https://doi.org/10.1103/PhysRevE.75.056111

©2007 American Physical Society

Authors & Affiliations

Fabien Viger (1), Alain Barrat (2,3,4), Luca Dall’Asta (2,3), Cun-Hui Zhang (5), and Eric D. Kolaczyk (6)

  • (1) LIP6, UMR 7606 du CNRS, Université de Paris-6, 4 place Jussieu, 75005 Paris, France
  • (2) LPT, UMR 8627 du CNRS, 91405 Orsay cedex, France
  • (3) Université Paris-Sud, 91405 Orsay cedex, France
  • (4) Complex Networks Lagrange Laboratory, ISI Foundation, Viale S. Severo 65, 10133 Turin, Italy
  • (5) Department of Statistics, Rutgers University, 504 Hill Center, Busch Campus, Piscataway, NJ 08854-8019, USA
  • (6) Department of Mathematics and Statistics, Boston University, 111 Cummington Street, Boston, MA 02215, USA

Issue

Vol. 75, Iss. 5 — May 2007
