Emergence of Soft Communities from Geometric Preferential Attachment

Zuev, Konstantin; Boguñá, Marián; Bianconi, Ginestra; Krioukov, Dmitri

doi:10.1038/srep09421


Article
Open access
Published: 29 April 2015


Scientific Reports volume 5, Article number: 9421 (2015)


Abstract

All real networks are different, but many have some structural properties in common. There seems to be no consensus on what the most common properties are, but scale-free degree distributions, strong clustering and community structure are frequently mentioned without question. Surprisingly, there exists no simple generative mechanism explaining all three properties at once in growing networks. Here we show how latent network geometry coupled with preferential attachment of nodes to this geometry fills this gap. We call this mechanism geometric preferential attachment (GPA) and validate it against the Internet. GPA gives rise to soft communities that provide a different perspective on the community structure in networks. The connections between GPA and cosmological models, including inflation, are also discussed.

Introduction

One of the fundamental problems in the study of complex networks^1,2,3,4,5 is to identify evolution mechanisms that shape the structure and dynamics of large real networks such as the Internet, the world wide web and various biological and social networks. In particular, how do complex networks grow so that many of them are scale-free and have strong clustering and non-trivial community structure? The preferential attachment (PA) mechanism^6,7,8, where new connections are made preferentially to more popular nodes, is widely accepted as the plausible explanation for the emergence of the scale-free structures (i.e. the power-law degree distributions) in large networks. PA has been empirically validated for many real growing networks^9,10,11,12 using statistical analysis of a sequence of network snapshots, demonstrating that it is indeed a key element of network evolution. Moreover, there is some evidence that the evolution of the community graph — a graph where nodes represent communities and links refer to members shared by two communities — is also driven by PA¹³.

Nevertheless, PA alone cannot explain two other empirically observed universal properties of complex networks: strong clustering¹⁴ and significant community structure¹⁵. Namely, in synthetic networks generated by standard PA, clustering is asymptotically zero¹⁶ and there are no communities¹⁷. To resolve the zero-clustering problem, several modifications of the original PA mechanism have been proposed^18,19,20,21. To the best of our knowledge, however, none of these models captures all three fundamental properties of complex networks: heavy-tailed degree distribution, high clustering and community structure.

In social networks, the presence of communities, which might represent node clusters based on certain social factors such as economic status or political beliefs, is intuitively expected. A remarkable observation^{15,22,23,24,25,26} is that many other networks, including food webs, the world wide web, metabolic, biochemical and financial networks, also admit a reasonable division into informative communities. Since that discovery, community detection has become one of the main tools for the analysis and understanding of network data^17,27.

Despite an enormous amount of attention to community detection algorithms and their efficiency, there have been very few attempts to answer a more fundamental question: what is the actual mechanism that induces community structure in real networks? For social networks, where there is a strong relationship between a high concentration of triangles and the existence of community structure²⁸, triadic closure²⁹ has been proposed as a plausible mechanism for generating communities³⁰. It was also shown by means of a simple agent-based acquaintance model that a large-scale community structure can emerge from the underlying social dynamics³¹. There also exist other contributions in this direction, where the proposed mechanisms and generative models are specifically tailored for social networks^32,33,34,35.

Here we show how latent network geometry coupled with preferential attachment of nodes to this geometry induces community structure as well as power-law degree distributions and strong clustering. We prove that these universal properties of complex networks naturally emerge from the new mechanism that we call geometric preferential attachment (GPA), without appealing to the specific nature (e.g. social) of networks. Using the Internet as an example, we demonstrate that GPA generates networks that are in many ways similar to real networks.

Results

Geometric Preferential Attachment

In growing networks the concept of popularity that PA exploits is just one aspect of node attractiveness; another important aspect is similarity³⁶. Namely, if nodes are similar (“birds of a feather”), then they have a higher chance of being connected (“flock together”), even if they are not popular. This effect, known as homophily in social sciences³⁷, has been observed in many real networks of various kinds^38,39.

The GPA mechanism utilizes the idea that both popularity and similarity are important. We take the node birth time t = 1,2,... as a proxy for the node's popularity: all other things being equal, the older the node (i.e. the smaller t), the more popular it is. The similarity attribute of node t is modeled by a random variable θ_t distributed over a circle that abstracts the “similarity” space. One can think of the similarity space as the image of a certain projection from a space of unknown or not easily measurable attributes of nodes. For social networks, these attributes could be political beliefs, education and social status, whereas for biological networks, they may represent chemical properties of metabolites or geometric properties of protein shapes. While the absolute value of the similarity coordinate does not have any specific meaning, the angular distance θ_st = π − |π − |θ_s − θ_t|| quantifies the similarity between two nodes. Upon its birth, a new node t connects to an existing node s < t if s is both popular enough and similar to t, that is, if s^β θ_st is small, where β is a parameter that controls the relative contributions of popularity and similarity.

The described rule for establishing new connections admits a simple geometric interpretation which is very useful for analytical treatment of the model. Let us define the radial coordinate of node s at time s as r_s = 2 ln s and let it grow with time, so that at time t > s it is r_s(t) = βr_s + (1 − β)r_t. The distance x_st between two points in the hyperbolic plane of curvature K = −1 with polar coordinates (r_s(t),θ_s) and (r_t,θ_t) is approximately⁴⁰

$$x_{st} \approx r_s(t) + r_t + 2\ln\frac{\theta_{st}}{2} = 2\ln\frac{s^{\beta}\, t^{2-\beta}\, \theta_{st}}{2}.$$

Since for any given t, the sets of nodes s < t that minimize s^β θ_st and x_st are the same, new nodes simply connect to the hyperbolically closest existing nodes. Note that the increase of the radial coordinate r_s(t) decreases the effective age of the node and thus models the effect of popularity fading observed in many real networks⁴¹.
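The equivalence between "small s^β θ_st" and "small hyperbolic distance" can be checked numerically. The following is a minimal sketch, assuming the large-distance approximation x_st ≈ r_s(t) + r_t + 2 ln(θ_st/2); the function names are illustrative and do not come from the paper.

```python
import math

def angular_distance(theta_s, theta_t):
    """Angular distance pi - |pi - |theta_s - theta_t|| for angles in [0, 2*pi]."""
    return math.pi - abs(math.pi - abs(theta_s - theta_t))

def radial_coordinate(s, t, beta):
    """r_s(t) = beta * r_s + (1 - beta) * r_t, with r_x = 2 ln x."""
    return beta * 2.0 * math.log(s) + (1.0 - beta) * 2.0 * math.log(t)

def approx_hyperbolic_distance(s, theta_s, t, theta_t, beta):
    """Assumed approximation x_st ~ r_s(t) + r_t + 2 ln(theta_st / 2).

    Requires distinct angles (theta_st > 0), otherwise the logarithm is undefined.
    """
    theta_st = angular_distance(theta_s, theta_t)
    return (radial_coordinate(s, t, beta) + 2.0 * math.log(t)
            + 2.0 * math.log(theta_st / 2.0))

def closest_by_distance(thetas, theta_t, beta):
    """Existing nodes s = 1..t-1 sorted by approximate hyperbolic distance to node t."""
    t = len(thetas) + 1
    return sorted(range(1, t), key=lambda s: approx_hyperbolic_distance(
        s, thetas[s - 1], t, theta_t, beta))

def closest_by_product(thetas, theta_t, beta):
    """Existing nodes s = 1..t-1 sorted by s^beta * theta_st."""
    t = len(thetas) + 1
    return sorted(range(1, t), key=lambda s: s ** beta * angular_distance(
        thetas[s - 1], theta_t))
```

Both rankings agree because, under the assumed approximation, x_st is a monotone function of s^β θ_st once t is fixed.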

But how do new nodes find their positions in this similarity space? The main assumption of our model is that the hidden attribute space of a growing network is likely to contain “hot” regions (e.g. of human activity) and that the hotter the region, the more attractive it is for new nodes. Hot regions can, for instance, represent some hot areas in science. When these regions are projected onto the similarity space, the hotness manifests itself as a higher node density: more scientists working in a hot area. The higher attractiveness of a hot region is then modeled by placing a new node in a region with a probability that increases with the node density in that region. That is, new scientists are expected to begin their careers working in hot areas where many existing scientists are already active, rather than jumping onto some obscure developments that nobody understands. Therefore the higher the node density in a particular section of our similarity space, the higher the probability that a new node is placed in this section. Intuitively we would expect that this process should lead to heterogeneous distributions of node coordinates in the similarity space. This intuition is confirmed by empirical results: if we map real networks to their hyperbolic spaces^42,43, we observe that the resulting empirical angular node density is not uniform (e.g. see Fig. 5(a)) and nodes tend to cluster into tight communities. In the Internet, for example, these communities are groups of Autonomous Systems belonging to the same country.

There are many ways to implement this general idea. For a variety of reasons we found that the most natural and consistent one is as follows. First we define the attractiveness of any location ϕ for a new node t with radial coordinate r_t as the number of existing nodes s < t lying in the hyperbolic disk D_ϕ(r_t) of radius r_t centered at (r_t,ϕ). The higher the attractiveness of a location ϕ, the higher the probability that a new node t will choose this location as its place θ_t = ϕ in the similarity space. We refer to this mechanism as the geometric preferential attachment (GPA) of nodes to the similarity space. This mechanism is illustrated in Fig. 1.

The exact definition of the GPA model is:

Initially the network is empty. New nodes t appear one at a time, t = 1,2,..., and for each t:

1. The angular (similarity) coordinate θ_t of the new node t is determined as follows:
  (a) Sample ϕ_i ~ U[0,2π], i = 1,...,t, uniformly at random. The points (r_t,ϕ_i) in the hyperbolic plane are the “candidate” positions for the newborn node;
  (b) Define the attractiveness A_t(ϕ_i) of the i-th candidate as the number of existing nodes that lie within hyperbolic distance r_t from it;
  (c) Set θ_t = ϕ_i with probability

  $$\Pi_t(\phi_i) = \frac{A_t(\phi_i) + \Lambda}{\sum_{j=1}^{t}\left[A_t(\phi_j) + \Lambda\right]}, \tag{1}$$

  where Λ ≥ 0 is a parameter, called the initial attractiveness.
2. The radial (popularity) coordinate of node t is set to r_t = 2 ln t. The radial coordinates of existing nodes s < t are updated to r_s(t) = βr_s + (1 − β)r_t.
3. Node t connects to the m hyperbolically closest existing nodes (if t ≤ m, then node t connects to all existing nodes).
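The growth procedure above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code. It assumes the approximate hyperbolic distance x ≈ r_s(t) + r_t + 2 ln(θ/2), under which "within distance r_t of the candidate" reduces to θ ≤ 2 exp(−r_s(t)/2), and it assumes candidates are accepted with probability proportional to A_t(ϕ_i) + Λ. The uniform fallback used when every weight is zero (possible only for Λ = 0) is also an assumption, and `grow_gpa` is a hypothetical name.

```python
import math
import random

def angular_distance(a, b):
    """Angular distance pi - |pi - |a - b|| for angles in [0, 2*pi]."""
    return math.pi - abs(math.pi - abs(a - b))

def grow_gpa(n, m, beta, lam, seed=0):
    """Grow a GPA network with n nodes; returns (thetas, edges), nodes labelled 1..n."""
    rng = random.Random(seed)
    thetas, edges = [], []
    for t in range(1, n + 1):
        r_t = 2.0 * math.log(t)
        # Radial coordinates of the existing nodes s < t, updated to time t.
        radii = [beta * 2.0 * math.log(s) + (1.0 - beta) * r_t for s in range(1, t)]
        # Step 1(a): t candidate angles, uniform on [0, 2*pi].
        candidates = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(t)]
        # Step 1(b): attractiveness = number of existing nodes within distance r_t
        # (assumed approximation: theta <= 2 * exp(-r_s(t) / 2)).
        weights = []
        for phi in candidates:
            a = sum(1 for s in range(1, t)
                    if angular_distance(thetas[s - 1], phi)
                    <= 2.0 * math.exp(-radii[s - 1] / 2.0))
            weights.append(a + lam)
        # Step 1(c): pick a candidate with probability proportional to A + Lambda.
        if sum(weights) > 0:
            theta_t = rng.choices(candidates, weights=weights, k=1)[0]
        else:
            theta_t = rng.choice(candidates)  # assumed fallback when all weights are 0
        # Step 3: link to the m closest nodes, i.e. the m smallest s^beta * theta_st.
        order = sorted(range(1, t),
                       key=lambda s: s ** beta * angular_distance(thetas[s - 1], theta_t))
        edges.extend((s, t) for s in order[:m])
        thetas.append(theta_t)
    return thetas, edges
```

The sketch costs O(n³) angular comparisons, which is acceptable for small illustrative networks; a serious implementation would index nodes by angle.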

The GPA model thus has three parameters: the number of links m established by every new node, the speed of popularity fading β and the initial attractiveness Λ. A moment's thought shows that m controls the average degree of the network, k̄ = 2m, because every new node adds m links and every link contributes to the degrees of two nodes. We prove in Methods that the model generates scale-free networks and β controls the power-law exponent γ. The initial attractiveness Λ controls the heterogeneity of the angular node density, namely, the heterogeneity is a decreasing function of Λ. When Λ → ∞, the GPA model becomes manifestly identical to the homogeneous popularity × similarity (PS) model³⁶, where the angular coordinate θ_t of a new node t is sampled uniformly at random on [0, 2π]. Note, however, that in GPA, choosing a position in the similarity space is an active decision made by a node based on the attractiveness of different locations, as opposed to “passive” uniform randomness in PS. In standard PA, the initial attractiveness term is used to control the exponent of the power-law degree distribution^7,8. In what follows we show that in GPA, Λ controls certain properties of the community size distribution.

Figure 2 shows the simulation results for networks of size n = 10³ generated by the GPA model with m = 3 (i.e. each new node connects to the three hyperbolically closest nodes), β = 2/3 and different values of Λ. As expected, the smaller the value of Λ, the more heterogeneous the distribution of angular coordinates. To quantify the difference between the empirical distribution of the angular coordinates and the uniform distribution on [0, 2π], we use the Kolmogorov-Smirnov (KS) statistic, one of the standard distances that measure the difference between two probability distributions. Recall that the KS statistic ρ is defined as the maximum difference between the empirical cumulative distribution function F_n of the sample θ₁,...,θ_n and the uniform cumulative distribution function F_U[0,2π](θ) = θ/2π,

$$\rho = \max_{\theta \in [0,2\pi]} \left| F_n(\theta) - \frac{\theta}{2\pi} \right|, \qquad F_n(\theta) = \frac{1}{n}\sum_{i=1}^{n} \mathbb{1}[\theta_i \le \theta]. \tag{2}$$

The KS statistic as a function of Λ is shown in the bottom panel of Fig. 2. As expected, ρ(Λ) is a decreasing function of Λ.
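For concreteness, the KS distance from the uniform distribution on [0, 2π] can be computed from the sorted sample. The sketch below is illustrative (the function name is not from the paper); it uses the standard fact that the supremum is attained at a sample point, just before or just after a jump of the empirical distribution function.

```python
import math

def ks_uniform_circle(thetas):
    """KS distance between the empirical CDF of angles in [0, 2*pi]
    and the uniform CDF theta / (2*pi)."""
    xs = sorted(thetas)
    n = len(xs)
    rho = 0.0
    for i, x in enumerate(xs, start=1):
        f = x / (2.0 * math.pi)
        # The empirical CDF jumps from (i-1)/n to i/n at x; check both sides.
        rho = max(rho, i / n - f, f - (i - 1) / n)
    return rho
```

A sample concentrated at a single angle gives a large ρ, whereas evenly spread angles give a ρ close to zero.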

Degree Distribution

For each of the three networks depicted in Fig. 2, the statistical procedure for quantifying power-law behavior in empirical data proposed in Ref. 44 accepts the hypothesis that the network is scale-free. It estimates the lower cutoff for the scaling region as k_min = 3, which is consistent with the minimum degree in the networks m = 3. Figure 3(a) shows a doubly logarithmic plot of the empirical degree distributions P(k) ~ k^−γ along with the fitted power-law with exponent γ = 2.5.

These empirical results show that the degree distribution of a network generated by GPA appears to be a power-law. Moreover, quite unexpectedly, the power-law exponent γ remains similar for different values of Λ. These results can be proved analytically (see Methods for details). Remarkably, for any value of Λ, the GPA model produces scale-free networks with the power-law degree distribution identical to the degree distribution in networks growing according to PA and having power-law exponent γ = 1 + 1/β.
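As a worked example of the relation between β and γ, using only values quoted in the text (β = 2/3 in the simulations, γ = 2.1 for the AS Internet):

```latex
\begin{align*}
\gamma &= 1 + \frac{1}{\beta}
  \quad\Longleftrightarrow\quad
  \beta = \frac{1}{\gamma - 1}, \\
\beta = \tfrac{2}{3} &\;\Longrightarrow\; \gamma = 1 + \tfrac{3}{2} = 2.5, \\
\gamma = 2.1 &\;\Longrightarrow\; \beta = \tfrac{1}{1.1} \approx 0.91 .
\end{align*}
```

The first case reproduces the fitted exponent γ = 2.5 reported for the simulated networks.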

Clustering Coefficient

The concept of clustering⁴⁵ quantifies the tendency to form cliques (complete subgraphs) in the neighborhood of a given node. Specifically, the local clustering coefficient of node s is defined as the probability that two nodes s′ and s″, adjacent to s, are also connected to each other. Figure 3(b) shows the average value of the clustering coefficient for nodes of degree k as a function of k for the three networks in Fig. 2. Interestingly, clustering does not depend on Λ either (a proof is in the Methods) and scales approximately as k⁻¹. This means that, on average, the nodes with higher degree have lower clustering, which is consistent with empirical observations of clustering in real complex networks^11,46. For all three GPA networks, the mean clustering (the average of the local clustering coefficients) is high.
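The local clustering coefficient and its average per degree class can be computed directly from an adjacency structure. The sketch below is illustrative: it assumes the graph is given as a dictionary mapping each node to the set of its neighbors, and it assigns zero clustering to nodes of degree below two, which is a convention chosen here rather than one stated in the text.

```python
from itertools import combinations

def local_clustering(adj, s):
    """Fraction of pairs of neighbors of s that are themselves linked."""
    nbrs = adj[s]
    k = len(nbrs)
    if k < 2:
        return 0.0  # convention assumed in this sketch
    linked = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return linked / (k * (k - 1) / 2)

def mean_clustering_by_degree(adj):
    """Average local clustering of the nodes of each degree k."""
    sums, counts = {}, {}
    for s in adj:
        k = len(adj[s])
        sums[k] = sums.get(k, 0.0) + local_clustering(adj, s)
        counts[k] = counts.get(k, 0) + 1
    return {k: sums[k] / counts[k] for k in sums}
```

Plotting the values returned by `mean_clustering_by_degree` against k on doubly logarithmic axes is one way to inspect the approximate k⁻¹ scaling described above.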

Soft Communities

The hyperbolic space underlying a network and the GPA mechanism of node appearance in that space naturally induce community structure and allow communities to be detected in a very intuitive and simple way. A higher density of links within a community indicates that its nodes are more similar to each other than to the other nodes, because links connect only nodes located within a certain similarity distance threshold. All such densely linked nodes are thus close to each other in some area of the similarity space, meaning that the spatial node density is high in this area. Therefore a community becomes a cluster of spatially close nodes and the community structure is encoded in a non-uniform distribution of angular (similarity) coordinates of nodes.

Following the approach in Ref. 47, let us consider the angular gaps θ between consecutive nodes and define a soft community as a group of nodes separated from the rest of the network by two gaps that exceed a certain critical value θ_c. If a network has a total number of n nodes, then the critical gap θ_c is defined as the expected value of the largest gap θ_(n) = max{θ₁,...,θ_n} among n nodes whose angular coordinates are distributed uniformly at random on [0, 2π]. The rationale behind this definition is that if nodes are distributed uniformly in the similarity space and there are no communities, then we do not expect to find any pair of nodes separated by a gap larger than this θ_c. The calculations in the Methods show that, to leading order in n, the critical gap is approximately

$$\theta_c \approx \frac{2\pi \ln n}{n}. \tag{3}$$

Figure 4 shows the statistics of the rescaled gaps θ/θ_c for three GPA-generated networks of size n = 10⁴ with Λ = 0.1, 1 and 10. In the top panel, we can see the organization of nodes on the circle with many consecutive small gaps (θ_i < θ_c) indicating groups of similar nodes (communities) separated by large gaps (θ_i > θ_c) which constitute boundaries between communities, so-called “fault lines”⁹. As expected, smaller values of Λ result in a more heterogeneous distribution of gaps with strong long-range correlations. This effect is clearly visible in the bottom panel, where the sample autocorrelation function is shown: the smaller the Λ, the slower the autocorrelation decays.
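The gap-based definition of soft communities translates into a short procedure: sort the nodes by angle, compute the gaps around the circle and cut wherever a gap exceeds θ_c. The sketch below is illustrative. It assumes the leading-order form θ_c ≈ 2π ln n / n for the critical gap as a default, and the function names are not from the paper.

```python
import math

def critical_gap(n):
    """Assumed leading-order critical gap 2*pi*ln(n)/n for n uniform points."""
    return 2.0 * math.pi * math.log(n) / n

def soft_communities(thetas, theta_c=None):
    """Split nodes into groups separated by angular gaps larger than theta_c.

    Returns a list of lists of node indices, each list ordered by angle.
    Intended for n >= 2 nodes.
    """
    n = len(thetas)
    if theta_c is None:
        theta_c = critical_gap(n)
    order = sorted(range(n), key=lambda i: thetas[i])
    # gaps[j] is the gap between order[j] and the next node around the circle.
    gaps = [(thetas[order[(j + 1) % n]] - thetas[order[j]]) % (2.0 * math.pi)
            for j in range(n)]
    cuts = [j for j in range(n) if gaps[j] > theta_c]
    if not cuts:
        return [order]  # no gap exceeds theta_c: a single community
    communities = []
    for a, b in zip(cuts, cuts[1:] + [cuts[0] + n]):
        # Nodes strictly after cut a, up to and including the node before cut b.
        communities.append([order[j % n] for j in range(a + 1, b + 1)])
    return communities
```

With evenly spread angles no gap exceeds θ_c and the whole network forms one group, which matches the rationale behind the definition.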

Having a geometric interpretation of the community structure, it is now easy to quantify how well communities are separated from each other. For each community C, we define its separation σ(C) from the rest of the network as the rescaled average of the two gaps θ₁,θ₂ > θ_c that separate C from its neighboring communities,

$$\sigma(C) = \frac{\theta_1 + \theta_2}{2\theta_c}. \tag{4}$$

The mean community separation, i.e. the expected separation of the community that a randomly chosen node belongs to, can then be computed as follows:

$$\bar{\sigma} = \sum_{i=1}^{n_c} \frac{n_i}{n}\,\sigma(C_i), \tag{5}$$

where n_i is the size of community C_i and n_c is the total number of communities. The network metric σ̄ can also be viewed as a measure of narrowness (or specialization) of communities. For example, in a scientific collaboration network, where nodes represent scientists and communities correspond to groups with similar research interests, σ̄ quantifies the degree of interdisciplinarity in the network. When σ̄ is large, the boundaries between communities are sharp and each community focuses on its narrow, specific topic. On the other hand, if σ̄ is close to one, then the boundaries are blurred, communities are widely spread and the network is highly interdisciplinary.
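The two separation measures can be illustrated with a few lines of Python. The sketch reads "rescaled average of two gaps" as (θ₁ + θ₂)/(2θ_c) and weights each community by its share of nodes n_i/n. Both readings are assumptions made here from the verbal definitions, and the function names are hypothetical.

```python
def community_separation(gap_left, gap_right, theta_c):
    """Average of the two boundary gaps, rescaled by the critical gap."""
    return (gap_left + gap_right) / (2.0 * theta_c)

def mean_community_separation(communities, theta_c):
    """Size-weighted mean separation.

    communities: list of (size, gap_left, gap_right) tuples, one per community.
    """
    n = sum(size for size, _, _ in communities)
    return sum(size * community_separation(gl, gr, theta_c)
               for size, gl, gr in communities) / n
```

Because both boundary gaps exceed θ_c by definition, each separation is larger than one, so a mean close to one signals barely separated communities.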

The difference in the stochastic behavior of the rescaled gaps in Fig. 4 suggests that the initial attractiveness Λ controls the mean community separation in the GPA-generated networks. This is confirmed by simulation results shown in Fig. 5(c), where the mean community separation is shown as a function of Λ. As expected, it is a monotonically decreasing function, approaching one when Λ is large.

The Internet

To demonstrate the ability of the GPA mechanism to generate graphs that are similar to real networks, and, in particular, to reproduce real non-uniform distributions of similarity node coordinates, we consider the Autonomous Systems (AS) Internet topology⁴⁸ of December 2009. The network consists of N = 25910 nodes (ASs) and M = 63435 links that represent logical relationships between ASs. We embed the AS Internet into its hyperbolic space, i.e. compute the popularity and similarity coordinates {r_i,θ_i}, using HyperMap⁴³, an efficient network mapping algorithm that estimates the latent hyperbolic coordinates of nodes. The network topology has a power-law degree distribution with γ = 2.1 and average node degree k̄ = 2M/N ≈ 4.9. This automatically determines two out of three parameters of the GPA model: m = k̄/2 and β = 1/(γ − 1). In Methods, we explain how to infer the value of Λ from network data using the maximum likelihood method. Here we consider the snapshot of the AS Internet based on the first n = 10³ nodes. The corresponding estimated value of the initial attractiveness is Λ_Int = 0.7.
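The arithmetic behind these two parameters is short enough to spell out. The sketch below is illustrative; it assumes the relation k̄ = 2m between the average degree and the number of links per new node, and the function name is hypothetical.

```python
def gpa_parameters_from_topology(num_nodes, num_links, gamma):
    """Return (mean_degree, m, beta) implied by a topology and its exponent gamma."""
    mean_degree = 2.0 * num_links / num_nodes  # every link adds to two degrees
    m = mean_degree / 2.0                      # assumed relation k = 2m
    beta = 1.0 / (gamma - 1.0)                 # inverse of gamma = 1 + 1/beta
    return mean_degree, m, beta

# Values quoted in the text for the AS Internet of December 2009.
mean_degree, m, beta = gpa_parameters_from_topology(25910, 63435, 2.1)
```

The resulting m is not an integer, so a simulation that adds a fixed number of links per node would need some rounding or randomization rule; the text above does not say which one was used.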

Figure 5(a) shows the histogram of the angular node density for the AS Internet snapshot. We note that it is far from uniform, which is a direct indication of the presence of soft communities. We quantify the degree of heterogeneity of the angular density by the KS distance from the uniform distribution (2) and juxtapose it against the KS distances computed for networks generated by the GPA model with Λ = 0.7 (Fig. 5(b)). The Internet value lies between the 25th and 75th percentiles of the synthetic values, which shows that the degrees of non-uniformity in the Internet and GPA networks are comparable. Fig. 5(c) compares the real network with its synthetic counterpart in terms of the expected mean community separation (5). The GPA mechanism generates networks whose mean community separation matches the Internet value very well. In Fig. 5(d), we compare the community size distribution in the Internet snapshot with the prediction given by the GPA model. Whereas the mean community separations for the Internet and GPA networks are essentially identical, the KS statistics and community size distributions are similar, but the match is not perfect. This effect is explained by the systematic bias present in the inferred values of the angular coordinates {θ_i}. Indeed, the HyperMap method first assumes that all angular coordinates are uniformly distributed over the similarity space, i.e. Λ = ∞, and then perturbs them to maximize a certain likelihood function. This “smoothes” the inferred angular node density and makes it more homogeneous than the true distribution. Nevertheless, although the inferred value of Λ is only an approximation for the true value, the GPA model still captures well the degree of heterogeneity in the real network.

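Finally we note that GPA defined in Eq. (1) admits an interesting interpretation that suggests a model extension that may be useful for real network analysis. The probability that a new node born at time t chooses the angular position ϕ_i can be written as

$$\Pi_t(\phi_i) = p_f\,\frac{A_t(\phi_i)}{\sum_{j=1}^{t} A_t(\phi_j)} + (1 - p_f)\,\frac{1}{t}, \tag{6}$$

where

$$p_f = \frac{\langle A_t\rangle}{\langle A_t\rangle + \Lambda}, \qquad \langle A_t\rangle = \frac{1}{t}\sum_{j=1}^{t} A_t(\phi_j). \tag{7}$$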

Therefore the event of choosing a position on the circle can be understood as follows. With probability p_f the new node is a follower and chooses its position according to pure GPA (Λ = 0). With the remaining probability 1 − p_f the new node chooses its position uniformly at random among the t available positions. We note that Λ controls p_f, since ⟨A_t⟩ ≈ 1. When Λ is constant, p_f is also constant and consequently there is always a fraction of nodes that are placed at random locations. At long times, these random nodes diminish the effect of pure GPA and eventually the angular distribution of nodes becomes indistinguishable from a Poisson point process on the circle. We can then wonder whether a constant value of Λ is a realistic assumption for dealing with real networks. In scientific citation networks, for example, when a new field of science is being formed and not much work has yet been done in it, scientists may decide either to explore a new line of research within the field, or to follow one of the mainstream existing lines. The former case can be modeled by a random choice of the angular position, assuming that subfields are homogeneously distributed. The latter is modeled by the pure GPA term in Eq. (6). However, there is a payoff that does not remain constant during the evolution of the field. At early times, the chances to find an interesting result that would be highly cited and followed by others are very high. At late times, the topic space is crowded and the chances to find something fundamentally new are very slim. Therefore, there is a higher incentive for scientists to take higher risks at early times. This can be modeled by p_f increasing with time, converging to a value close to 1 as time grows to infinity. In turn, this means that Λ is a decreasing function of time, having a large value at the beginning of network evolution and decreasing to small values afterwards.
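The follower/random interpretation can be verified numerically. The sketch below assumes that candidates are accepted with probability proportional to A_t(ϕ_i) + Λ and that p_f = ⟨A_t⟩/(⟨A_t⟩ + Λ), where ⟨A_t⟩ is the mean attractiveness of the t candidates. It then checks that the mixture of pure GPA and a uniform choice gives the same probabilities. Function names are illustrative.

```python
def acceptance_probabilities(attractiveness, lam):
    """Probabilities proportional to A_i + Lambda (assumed acceptance rule)."""
    total = sum(attractiveness) + lam * len(attractiveness)
    return [(a + lam) / total for a in attractiveness]

def mixture_probabilities(attractiveness, lam):
    """Same probabilities written as p_f * (pure GPA) + (1 - p_f) * (uniform).

    Requires at least one candidate with positive attractiveness.
    """
    t = len(attractiveness)
    total_a = sum(attractiveness)
    mean_a = total_a / t
    p_f = mean_a / (mean_a + lam)
    probs = [p_f * a / total_a + (1.0 - p_f) / t for a in attractiveness]
    return probs, p_f
```

For example, with attractiveness values [0, 1, 3, 4] and Λ = 0.7 the two functions return the same four probabilities, and p_f = 2/2.7, so roughly three out of four nodes act as followers.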

Unfortunately, measuring the temporal evolution of Λ in a real network is not yet possible because there is currently no parametric theory describing such evolution that could be used for statistical inference of Λ. However, it is fairly easy to find an approximate value of Λ as a function of time as follows. If timestamps of a real complex network are available, we can pretend that Λ is constant and, using the MLE techniques described in the Methods, infer its value for the subgraph made of nodes that were born before a given time t. This estimate can be thought of as a (possibly weighted) average of Λ(t) in the time window (0,t). By increasing the value of t, we can detect whether Λ is constant (if the estimate does not change with time, beyond statistical fluctuations) or a decreasing function of time. Figure 5(e) shows this estimate as a function of t for the AS Internet, where the strong temporal dependence of Λ is evident.

Discussion

In summary, hyperbolic network geometry, combining popularity and similarity forces driving network evolution and coupled with preferential attachment of nodes to this geometry (GPA), naturally yields scale-free, strongly clustered growing networks with emergent soft community structure. The GPA model has three parameters that can be readily inferred from network data. Using the AS Internet topology as an example, we have seen that the GPA mechanism generates heterogeneous networks that are similar to real networks with respect to key properties, including key aspects of the community size distribution and separation. The mean community separation, a new metric that quantifies the narrowness of communities in a network, is controlled in GPA by the initial attractiveness Λ, which controls the power-law exponent in standard PA.

In the context of the asymptotic equivalence between de Sitter causal sets and popularity × similarity (PS) hyperbolic networks established in Ref. 49, we note that Λ is conceptually similar to the cosmological constant Λ in Einstein's equations in general relativity (GR), where it is also an additive term in the proportionality between the energy-momentum tensor and spacetime curvature. Causal sets^50,51 are random graphs obtained by Poisson sprinkling a collection of nodes onto (a patch of) a Lorentzian manifold; edges in these graphs connect all timelike-separated pairs of nodes. If there is no matter (empty spacetime) but there is only dark energy (positive Λ), then the solution of Einstein's equations is the de Sitter spacetime and the main theorem in Ref. 49 states that the ensemble of PS graphs is asymptotically (n → ∞) identical to an ensemble of causal sets sprinkled onto de Sitter spacetime, which is one of the three maximally symmetric, homogeneous and isotropic Lorentzian manifolds (the other two are Minkowski and anti-de Sitter spacetimes). In this context, the GPA model considered here is a model with cosmological constant Λ and matter. Modeled by high node density, this matter, as in GR, “attracts more matter”, thus increasing the spacetime curvature of which the node density is a proxy. Indeed the main feature of the model is that the higher the node density in a particular region of space, the more nodes will appear in this region later. The main difference with GR is that here we essentially have an analogy with only the 00-component of Einstein's equations. One can envision that other components should describe the coupled dynamics of the similarity space and nodes in it. In the case of a scientific collaboration network, for example, that would be the co-evolution of science (space) itself and interests of scientists (node dynamics in this space). In the model considered here nodes do not move. Finding the laws of their spatial dynamics that may further strengthen the analogy with general relativity is a promising but challenging research direction.

In that context, the decay of initial attractiveness Λ that we found in the Internet must be analogous to the decay of cosmological constant Λ in modern cosmological theories. Cosmic inflation^52,53 is widely accepted as the most plausible resolution of many problems with the classical big bang theory, including the flatness problem, the horizon problem and the magnetic-monopole problem. Inflation is an initial period of accelerated expansion of the universe during which gravity was repulsive. Inflation does not last long and can be modeled as a time-dependent cosmological “constant” Λ that initially has a high value and then decays to zero. The analogies between GPA with decaying Λ and inflation go even further, producing similar outcomes as far as the spatial distribution of events is concerned. Indeed, cosmic inflation has the effect of smoothing out inhomogeneities so that once inflation is over, the universe is nearly flat, isotropic and homogeneous, except for quantum fluctuations of the inflaton field. These fluctuations are the seeds of future inhomogeneities that we observe in the universe at scales smaller than 100 Mpc. In the GPA context, a high value of Λ also has a homogenizing effect. Indeed, if Λ is large, then p_f is small and new nodes choose their angular positions at random, producing a Poisson point process on the circle. Once Λ is small enough, we are left with a random distribution of points with Poisson fluctuations that, as in the universe, are the seeds of future communities in the network (galaxies in the universe), because once Λ is nearly zero, these initial fluctuations are reinforced by pure preferential attachment.

Methods

Invariance of the degree distribution and clustering

Here we prove that the degree distribution and clustering coefficient in the networks generated by the GPA model do not depend on the initial attractiveness Λ. Moreover, the degree distribution is a power law with exponent γ = 1 + 1/β. The proof can be reduced to the proof for the homogeneous PS model³⁶ (Supplementary Information, Section IV). Consider a new node t and let R_t be the radius of a hyperbolic disk centered at this node such that t is connected to all nodes s < t that lie in this disk. Then the probability that nodes t and s < t in the GPA model are connected can be computed as follows:

$$P(s \sim t) = P(x_{st} \le R_t), \tag{8}$$

where x_st is the hyperbolic distance between nodes s = (r_s(t),θ_s) and t = (r_t,θ_t) at time t. Using the total probability theorem,

$$P(x_{st} \le R_t) = \sum_{i=1}^{t} P(x_{st} \le R_t \mid \theta_t = \phi_i)\,\Pi_t(\phi_i), \tag{9}$$

where ϕ₁,...,ϕ_t are the candidate positions generated at Step 1(a) and Π_t(ϕ_i) are the corresponding acceptance probabilities (1). Applying the total probability theorem with respect to node s, whose candidate positions we denote by ϕ′₁,...,ϕ′_s, we have:

$$P(x_{st} \le R_t) = \sum_{i=1}^{t}\sum_{j=1}^{s} P(x_{st} \le R_t \mid \theta_t = \phi_i,\ \theta_s = \phi'_j)\,\Pi_t(\phi_i)\,\Pi_s(\phi'_j). \tag{10}$$

Since the angular coordinates of the candidate positions ϕ_i and ϕ′_j are uniformly distributed on [0, 2π], the conditional probability in (10) is simply α/π, where α is the largest angular distance θ_st for which x_st ≤ R_t. Therefore,

$$P(s \sim t) = \frac{\alpha}{\pi}\sum_{i=1}^{t}\sum_{j=1}^{s}\Pi_t(\phi_i)\,\Pi_s(\phi'_j) = \frac{\alpha}{\pi}, \tag{11}$$

where the last equality holds because Σ_i Π_t(ϕ_i) = Σ_j Π_s(ϕ′_j) = 1. We note that P(s ∼ t) does not depend on Λ and that it is exactly the same as the probability of having a link between nodes t and s < t in the homogeneous PS model. The rest of the proof repeats the proof in Ref. 36 without a change. This leads to

$$P(k) \sim k^{-\left(1 + 1/\beta\right)}, \tag{12}$$

which means that the resulting degree distribution in GPA is identical to PA: it is a power law with exponent γ = 1 + 1/β. Since the connection probability does not depend on Λ, neither does clustering.
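The factor α/π used in the proof is the probability that two independent uniform angles lie within angular distance α of each other. A quick Monte Carlo check, as an illustrative sketch with hypothetical function names:

```python
import math
import random

def angular_distance(a, b):
    """Angular distance pi - |pi - |a - b|| for angles in [0, 2*pi]."""
    return math.pi - abs(math.pi - abs(a - b))

def fraction_within(alpha, trials=100_000, seed=1):
    """Monte Carlo estimate of P(angular distance <= alpha) for two uniform angles."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a = rng.uniform(0.0, 2.0 * math.pi)
        b = rng.uniform(0.0, 2.0 * math.pi)
        if angular_distance(a, b) <= alpha:
            hits += 1
    return hits / trials
```

For 0 ≤ α ≤ π the estimate should approach α/π, since the angular distance between two independent uniform angles is itself uniform on [0, π].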

Critical gap

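To obtain a closed-form expression for the critical gap, we note that for large n, the sequence θ₁,...,θ_n ~ U[0,2π] can be approximately viewed as a realization of the Poisson point process on the circle of unit radius with density λ = n/2π. In this case, the distribution of the angular gaps is approximately exponential with rate λ. The maximum gap θ_(n) then has the following PDF,

$$f(\theta) = n\lambda\, e^{-\lambda\theta}\left(1 - e^{-\lambda\theta}\right)^{n-1},$$

and its expected value can be calculated as follows:

$$\theta_c = \mathbb{E}\left[\theta_{(n)}\right] = \int_0^{\infty} \theta f(\theta)\,d\theta = \frac{H_n}{\lambda} = \frac{2\pi H_n}{n} \approx \frac{2\pi}{n}\left(\ln n + \gamma\right), \tag{13}$$

where H_n is the n-th harmonic number and γ ≈ 0.577 is Euler's constant (not to be confused with the power-law exponent). For large n the leading term is 2π ln n / n.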
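The expected largest gap can be checked by simulation. The sketch below compares the harmonic-number expression 2πH_n/n with the average largest gap among n points placed uniformly at random on the circle; the function names are illustrative.

```python
import math
import random

def harmonic(n):
    """n-th harmonic number H_n = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / k for k in range(1, n + 1))

def expected_max_gap(n):
    """Closed-form expectation 2*pi*H_n/n of the largest gap."""
    return 2.0 * math.pi * harmonic(n) / n

def simulated_max_gap(n, trials=2000, seed=0):
    """Average largest angular gap among n uniform points on the circle."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        pts = sorted(rng.uniform(0.0, 2.0 * math.pi) for _ in range(n))
        gaps = [b - a for a, b in zip(pts, pts[1:])]
        gaps.append(2.0 * math.pi - pts[-1] + pts[0])  # wrap-around gap
        total += max(gaps)
    return total / trials
```

For moderate n the leading-order value 2π ln n / n is noticeably smaller than 2πH_n/n, because the Euler-constant correction is not yet negligible compared with ln n.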

Inference of Λ

The initial attractiveness Λ controls the distribution of angular coordinates θ₁,...,θ_n of the nodes. We therefore first infer θ_i using the HyperMap method⁴³. Given the network embedding into its hyperbolic space, the likelihood function can be written as follows:

where A_t(ϕ) is the attractiveness of location , that is the number of existing nodes at time (t − 1) that lie within distance r_t from (r_t,ϕ). The log-likelihood is then (up to an additive constant):

The multiple integrals in (15) cannot be calculated analytically, since the attractiveness function cannot be written in closed-form. Nevertheless, the log-likelihood can be efficiently estimated be the Monte Carlo method. First, generate N Monte Carlo samples, , j = 1,...,N. The “truncated” samples will be used for estimating the (t − 1)-dimensional integral in (15). Next, precompute all needed attractivenesses, , where t = 2,...n and i = 1,...,t − 1. Then for each value of Λ, the log-likelihood can be estimated as follows (up to a constant):

Computing attractivenesses of the Monte Carlo samples involves computing O(n³N) hyperbolic distances, which is the most computationally intensive part of the algorithm. Having all attractivenesses computed, we can then estimate l(Λ) for any Λ and find the maximum likelihood estimate (MLE) of Λ. An important observation that drastically improves the efficiency of the algorithm is that we do not have to use the entire network to accurately estimate Λ: the first n₀ nodes are often enough. Table 1 shows the MLEs obtained from the first n₀ = 100, 200, 500 and 1000 nodes of the networks generated by the GPA model with Λ = 0, 0.2, 0.5, 0.7, 1 and 2. The corresponding log-likelihood functions are shown in Fig. 6. These simulation results show that the smaller the true value of Λ (and we expect it to be small in real networks, since most of them have community structure), the less network data we need to pin it down. If, for example, Λ = 0, then the MLE of Λ based on the first n₀ = 100 nodes is already zero. The larger the true value of Λ, however, the flatter the log-likelihood is around its maximum, which makes inference more challenging.
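To make the basic operation concrete, the attractiveness A_t(ϕ) defined above (the number of existing nodes within hyperbolic distance r_t of the location (r_t,ϕ)) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `hyperbolic_distance` and `attractiveness` are hypothetical, the hyperbolic plane is assumed to have curvature −1, and the polar coordinates of the existing nodes are taken as given inputs (how they are assigned at each time step is defined by the model, not by this sketch).

```python
import math


def hyperbolic_distance(r1, theta1, r2, theta2):
    """Distance between two points of the hyperbolic plane in polar coordinates.

    Assumes curvature -1 and uses the hyperbolic law of cosines.
    """
    # Angular separation folded into [0, pi].
    dtheta = math.pi - abs(math.pi - abs(theta1 - theta2) % (2 * math.pi))
    arg = (math.cosh(r1) * math.cosh(r2)
           - math.sinh(r1) * math.sinh(r2) * math.cos(dtheta))
    # Guard against rounding slightly below 1 for (nearly) coincident points.
    return math.acosh(max(arg, 1.0))


def attractiveness(r_t, phi, existing_nodes):
    """Count existing nodes (r_s, theta_s) within distance r_t of (r_t, phi)."""
    return sum(
        1
        for r_s, theta_s in existing_nodes
        if hyperbolic_distance(r_t, phi, r_s, theta_s) <= r_t
    )


# Illustrative (made-up) coordinates: one node on the same ray, one opposite.
nodes = [(1.0, 0.0), (1.0, math.pi)]
count = attractiveness(2.0, 0.0, nodes)
```

Each call costs one distance evaluation per existing node, so evaluating it for every t, every sampled location and every Monte Carlo sample is what produces the cubic dependence on n noted above.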

Table 1 Maximum likelihood estimates. True values of the initial attractiveness parameter Λ and its MLEs based on the first n₀ = 100, 200, 500 and 1000 nodes. In all simulations, N = 100 Monte Carlo samples were used in (16).

References

Dorogovtsev, S. N. & Mendes, J. F. F. Evolution of Networks (Oxford University Press, Oxford, 2003).
Newman, M. E. J., Barabási, A.-L. & Watts, D. J. The Structure and Dynamics of Networks (Princeton University Press, Princeton, 2006).
Dorogovtsev, S. N. Lectures on Complex Networks (Oxford University Press, Oxford, 2010).
Newman, M. E. J. Networks: An Introduction (Oxford University Press, Oxford, 2010).
Easley, D. & Kleinberg, J. Networks, Crowds and Markets: Reasoning about a Highly Connected World (Cambridge University Press, Cambridge, 2010).
Barabási, A.-L. & Albert, R. Emergence of scaling in random networks. Science 286, 509–512 (1999).
Krapivsky, P. L., Redner, S. & Leyvraz, F. Connectivity of growing random networks. Phys. Rev. Lett. 85, 4629–4632 (2000).
Dorogovtsev, S. N., Mendes, J. F. F. & Samukhin, A. N. Structure of growing networks with preferential linking. Phys. Rev. Lett. 85, 4633–4636 (2000).
Newman, M. E. J. Clustering and preferential attachment in growing networks. Phys. Rev. E 64, 025102 (2001).
Barabási, A.-L. et al. Evolution of the social network of scientific collaborations. Physica A 311, 590–614 (2002).
Vázquez, A., Pastor-Satorras, R. & Vespignani, A. Large-scale topological and dynamical properties of the Internet. Phys. Rev. E 65, 066130 (2002).
Jeong, H., Néda, Z. & Barabási, A.-L. Measuring preferential attachment in evolving networks. Europhys. Lett. 61, 567–572 (2003).
Pollner, P., Palla, G. & Vicsek, T. Preferential attachment of communities: the same principle, but a higher level. Europhys. Lett. 73, 478–484 (2006).
Watts, D. J. & Strogatz, S. H. Collective dynamics of ‘small-world’ networks. Nature 393, 440–442 (1998).
Girvan, M. & Newman, M. E. J. Community structure in social and biological networks. Proc. Natl. Acad. Sci. USA 99, 7821–7826 (2002).
Bollobás, B. & Riordan, O. M. Mathematical results on scale-free random graphs. In: Bornholdt, S. & Schuster, H. G. (eds.) Handbook of Graphs and Networks, 1–34 (Wiley-VCH, Berlin, 2003).
Fortunato, S. Community detection in graphs. Phys. Rep. 486, 75–174 (2010).
Dorogovtsev, S. N., Mendes, J. F. F. & Samukhin, A. N. Generic scale of the “scale-free” growing networks. Phys. Rev. E 63, 062101 (2001).
Klemm, K. & Eguíluz, V. Highly clustered scale-free networks. Phys. Rev. E 65, 036123 (2002).
Vázquez, A. Growing network with local rules: preferential attachment, clustering hierarchy and degree correlations. Phys. Rev. E 67, 056104 (2003).
Jackson, M. O. & Rogers, B. W. Meeting strangers and friends of friends: How random are social networks? Am. Econ. Rev. 97(3), 890–915 (2007).
Eckmann, J.-P. & Moses, E. Curvature of co-links uncovers hidden thematic layers in the World Wide Web. Proc. Natl. Acad. Sci. USA 99, 5825–5829 (2002).
Ravasz, E., Somera, A. L., Mongru, D. A., Oltvai, Z. N. & Barabási, A.-L. Hierarchical organization of modularity in metabolic networks. Science 297, 1551–1555 (2002).
Holme, P., Huss, M. & Jeong, H. Subnetwork hierarchies of biochemical pathways. Bioinformatics 19, 532–538 (2003).
Boss, M., Elsinger, H., Summer, M. & Thurner, S. Network topology of the interbank market. Quantitative Finance 4, 677–684 (2004).
Guimerà, R. & Amaral, L. A. N. Functional cartography of complex metabolic networks. Nature 433, 895–900 (2005).
Danon, L., Díaz-Guilera, A., Duch, J. & Arenas, A. Comparing community structure identification. J. Stat. Mech., P09008 (2005).
Newman, M. E. J. & Park, J. Why social networks are different from other types of networks. Phys. Rev. E 68, 036122 (2003).
Rapoport, A. Spread of information through a population with socio-structural bias: I. Assumption of transitivity. Bull. Math. Biol. 15, 523–533 (1953).
Bianconi, G., Darst, R. K., Iacovacci, J. & Fortunato, S. Triadic closure as a basic generating mechanism of communities in complex networks. Phys. Rev. E 90, 042806 (2014).
Bhat, U., Krapivsky, P. L. & Redner, S. Emergence of clustering in an acquaintance model without homophily. J. Stat. Mech. P11035 (2014).
Boguñá, M., Pastor-Satorras, R., Díaz-Guilera, A. & Arenas, A. Models of social networks based on social distance attachment. Phys. Rev. E 70, 056122 (2004).
Toivonen, R., Onnela, J.-P., Saramäki, J., Hyvönen, J. & Kaski, K. A model for social networks. Physica A 371, 851–860 (2006).
Lambiotte, R. & Ausloos, M. Coexistence of opposite opinions in a network with communities. J. Stat. Mech. P08026 (2007).
Lambiotte, R., Ausloos, M. & Holyst, J. A. Majority model on a network with communities. Phys. Rev. E 75, 030101(R) (2007).
Papadopoulos, F., Kitsak, M., Serrano, M. Á., Boguñá, M. & Krioukov, D. Popularity versus similarity in growing networks. Nature 489, 537–540 (2012).
McPherson, M., Smith-Lovin, L. & Cook, J. M. Birds of a feather: homophily in social networks. Annu. Rev. Sociol. 27, 415–444 (2001).
Redner, S. How popular is your paper? An empirical study of the citation distribution. Eur. Phys. J. B 4, 131–134 (1998).
Watts, D. J., Dodds, P. S. & Newman, M. E. J. Identity and search in social networks. Science 296, 1302–1305 (2002).
Bonahon, F. Low-dimensional geometry (AMS, Providence, 2009).
van Raan, A. F. J. On growth, ageing and fractal differentiation of science. Scientometrics 47, 347–362 (2000).
Boguñá, M., Papadopoulos, F. & Krioukov, D. Sustaining the Internet with hyperbolic mapping. Nature Communications 1, 62 (2010).
Papadopoulos, F., Psomas, C. & Krioukov, D. Network mapping by replaying hyperbolic growth. IEEE/ACM Transactions on Networking, to appear.
Clauset, A., Shalizi, C. R. & Newman, M. E. J. Power-law distributions in empirical data. SIAM Review 51(4), 661–703 (2009).
Watts, D. J. & Strogatz, S. H. Collective dynamics of ‘small-world’ networks. Nature 393, 440–442 (1998).
Ravasz, E. & Barabási, A.-L. Hierarchical organization in complex networks. Phys. Rev. E 67, 026112 (2003).
Serrano, M. A., Boguñá, M. & Sagués, F. Uncovering the hidden geometry behind metabolic networks. Molecular BioSystems 8, 843–850 (2012).
Claffy, K., Hyun, Y., Keys, K., Fomenkov, M. & Krioukov, D. Internet mapping: from art to science. IEEE DHS CATCH, 205–211 (2009).
Krioukov, D., Kitsak, M., Sinkovits, R. S., Rideout, D., Meyer, D. & Boguñá, M. Network cosmology. Sci. Rep. 2, 793 (2012).
Bombelli, L., Lee, J., Meyer, D. & Sorkin, R. D. Space-time as a causal set. Phys. Rev. Lett. 59, 521 (1987).
Dowker, F. Introduction to causal sets and their phenomenology. Gen. Relativ. Gravit. 45, 1651–1667 (2013).
Lyth, D. H. & Liddle, A. R. The Primordial Density Perturbations: Cosmology, Inflation and the Origin of Structure (Cambridge University Press, Cambridge, 2009).
Mukhanov, V. Physical Foundations of Cosmology (Cambridge University Press, Cambridge, 2005).

Acknowledgements

This work was supported by DARPA grant No. HR0011-12-1-0012; NSF grants No. CNS-1344289, CNS-1442999, CNS-0964236, CNS-1441828, CNS-1039646 and CNS-1345286; by Cisco Systems; by the James S. McDonnell Foundation Scholar Award in Complex Systems; by the Icrea Foundation, funded by the Generalitat de Catalunya; by the MINECO project No. FIS2013-47282-C2-1-P; and by the Generalitat de Catalunya grant No. 2014SGR608.

Author information

Authors and Affiliations

Department of Physics, Northeastern University, Boston, MA, 02115, USA
Konstantin Zuev & Dmitri Krioukov
Departament de Física Fonamental, Universitat de Barcelona, Martí i Franquís 1, Barcelona, 08028, Spain
Marián Boguñá
School of Mathematics, Queen Mary University of London, London, E1 4SN, UK
Ginestra Bianconi
Department of Mathematics and Department of Electrical & Computer Engineering, Northeastern University, Boston, MA, 02115, USA
Dmitri Krioukov

Contributions

K.Z., D.K., M.B. and G.B. designed research; K.Z., D.K. and M.B. performed research, analyzed data, performed simulations and wrote the manuscript; all authors discussed the results and reviewed the manuscript.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Rights and permissions

This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder in order to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

About this article

Cite this article

Zuev, K., Boguñá, M., Bianconi, G. et al. Emergence of Soft Communities from Geometric Preferential Attachment. Sci Rep 5, 9421 (2015). https://doi.org/10.1038/srep09421

Received: 03 February 2015
Accepted: 02 March 2015
Published: 29 April 2015
DOI: https://doi.org/10.1038/srep09421
