Introduction

One of the fundamental problems in the study of complex networks1,2,3,4,5 is to identify evolution mechanisms that shape the structure and dynamics of large real networks such as the Internet, the world wide web and various biological and social networks. In particular, how do complex networks grow so that many of them are scale-free and have strong clustering and non-trivial community structure? The preferential attachment (PA) mechanism6,7,8, where new connections are made preferentially to more popular nodes, is widely accepted as the plausible explanation for the emergence of the scale-free structures (i.e. the power-law degree distributions) in large networks. PA has been empirically validated for many real growing networks9,10,11,12 using statistical analysis of a sequence of network snapshots, demonstrating that it is indeed a key element of network evolution. Moreover, there is some evidence that the evolution of the community graph — a graph where nodes represent communities and links refer to members shared by two communities — is also driven by PA13.

Nevertheless, PA alone cannot explain two other empirically observed universal properties of complex networks: strong clustering14 and significant community structure15. Namely, in synthetic networks generated by standard PA, clustering is asymptotically zero16 and there are no communities17. To resolve the zero-clustering problem, several modifications of the original PA mechanism have been proposed18,19,20,21. To the best of our knowledge, however, none of these models capture all three fundamental properties of complex networks: heavy-tail degree distribution, high clustering and community structure.

In social networks, the presence of communities, that might represent node clusters based on certain social factors such as economic status or political beliefs, is intuitively expected. A remarkable observation15,22,23,24,25,26 is that many other networks, including food webs, the world wide web, metabolic, biochemical and financial networks, also admit a reasonable division into informative communities. Since that discovery, community detection has become one of the main tools for the analysis and understanding of network data17,27.

Despite an enormous amount of attention to community detection algorithms and their efficiency, there were very few attempts to answer a more fundamental question: what is the actual mechanism that induces community structure in real networks? For social networks, where there is a strong relationship between a high concentration of triangles and the existence of community structure28, triadic closure29 has been proposed as a plausible mechanism for generating communities30. It was also shown by means of a simple agent-based acquaintance model that a large-scale community structure can emerge from the underlying social dynamics31. There also exist other contributions in this direction, where proposed mechanisms and generative models are specifically tailored for social networks32,33,34,35.

Here we show how latent network geometry coupled with preferential attachment of nodes to this geometry induces community structure as well as power-law degree distributions and strong clustering. We prove that these universal properties of complex networks naturally emerge from the new mechanism that we call geometric preferential attachment (GPA), without appealing to the specific nature (e.g. social) of networks. Using the Internet as an example, we demonstrate that GPA generates networks that are in many ways similar to real networks.

Results

Geometric Preferential Attachment

In growing networks the concept of popularity that PA exploits is just one aspect of node attractiveness; another important aspect is similarity36. Namely, if nodes are similar (“birds of feather”), then they have a higher chance of being connected (“flock together”), even if they are not popular. This effect, known as homophily in social sciences37, has been observed in many real networks of various nature38,39.

The GPA mechanism utilizes the idea that both popularity and similarity are important. We take the node birth time t = 1,2,... as a proxy for node's popularity: all other things being equal, the older the node (i.e. the smaller t), the more popular it is. The similarity attribute of node t is modeled by a random variable θt distributed over a circle that abstracts the “similarity” space. One can think of the similarity space as an image of a certain projection from a space of unknown or not easily measurable attributes of nodes. For social networks, these attributes could be political beliefs, education and social status, whereas for biological networks, {ai} may represent chemical properties of metabolites or geometric properties of protein shapes. While the absolute value of the similarity coordinate does not have any specific meaning, the angular distance θst = π − | π − | θs − θt || quantifies the similarity between two nodes. Upon its birth, a new node t connects to an existing node s < t if s is both popular enough and similar to t, that is if sβθst is small, where is a parameter that controls the relative contributions of popularity and similarity.

The described rule for establishing new connections admits a simple geometric interpretation which is very useful for analytical treatment of the model. Let us define the radial coordinate of node s at time s as rs = 2 ln s and let it grow with time, so that at time t > s it is rs(t) = βrs + (1 − β)rt. The distance xst between two points in the hyperbolic plane of curvature K = −1 with polar coordinates (rs(t),θs) and (rtt) is approximately40 . Since for any given t, the sets of nodes s < t that minimize sβθst and xst are the same, new nodes simply connect to the hyperbolically closest existing nodes. Note that the increase of the radial coordinate rs(t) decreases the effective age of the node and thus models the effect of popularity fading observed in many real networks41.

But how do new nodes find their positions in this similarity space? The main assumption of our model is that the hidden attribute space of a growing network is likely to contain “hot” regions (e.g. of human activity) and that the hotter the region, the more attractive it is for new nodes. Hot regions can for instance represent some hot areas in science. When these regions are projected onto the similarity space , the hotness manifests itself by a higher node density, more scientists working in a hot area. The higher attractiveness of a hot region is then modeled by placing a new node in this region with the higher probability, the hotter this region is, i.e. the higher the node density in it. That is, new scientists are expected to begin their careers working in hot areas where many existing scientists are already active, versus jumping onto some obscure developments that nobody understands. Therefore the higher the node density in a particular section of our similarity space , the higher the probability that a new node is placed in this section. Intuitively we would expect that this process should lead to heterogeneous distributions of node coordinates in the similarity space. This intuition is confirmed by empirical results: if we map real networks to their hyperbolic spaces42,43, we observe that the resulting empirical angular node density is not uniform (e.g. see Fig. 5(a)) and nodes tend to cluster into tight communities. In the Internet, for example, these communities are groups of Autonomous Systems belonging to the same country.

Figure 5
figure 5

AS Internet vs GPA networks.

Panel (a) shows the histogram of the angular (similarity) coordinates {θi} for the snapshot of the AS Internet consisting of the first n = 103 nodes. All {θi} are inferred by HyperMap43. Panel (b) compares the KS statistics for the Internet and synthetic networks generated by the GPA model (box plot) with γ = 2.1 and Λ = 0.7. The central red mark is the median, the blue horizontal edges of the box are the 25th and 75th percentiles, the black whiskers extend to the most extreme data points not considered outliers. The box plot is obtained from 100 independent generated networks. Panel (c) shows the perfect match between real and synthetic values of the mean community separation (5). Error bars represent plus and minus one standard deviation. Panel (d) juxtaposes the empirical CCDF of the soft community sizes in the Internet against CCDFs obtained for the three GPA-generated networks. Panel (e) shows the temporal evolution of the maximum likelihood estimate for the AS Internet, where the node birth times are their ranks in the decreasing degree order. The yellow star corresponds to the considered snapshot with n = 103 nodes and .

There are many ways to implement this general idea. For a variety of reasons we found that the most natural and consistent one is as follows. First we define the attractiveness of any location for a new node t with radial coordinate rt as the number of existing nodes s < t lying in the hyperbolic disk Dϕ (rt) of radius rt centered at (rt,ϕ). The higher the attractiveness of a location ϕ, the higher the probability that a new node t will chose this location as its place θt = ϕ in the similarity space. We refer to this mechanism as the geometric preferential attachment (GPA) of nodes to the similarity space. This mechanism is illustrated in Fig. 1.

Figure 1
figure 1

Geometric preferential attachment.

At time t, a new node appears at distance rt from the center of the hyperbolic disk denoted by cross. Points ϕ1 and ϕ2 represent two potential locations of the new node and the drop-shaped curves are the boundaries of the hyperbolic disks and of radius rt centered at ϕ1 and ϕ2. Since similarity is attractive and contains more nodes (five) than (none), the new node is more likely to appear at θt = ϕ1.

The exact definition of the GPA model is:

  • Initially the network is empty. New nodes t appear one at a time, t = 1,... and for each t:

  • The angular (similarity) coordinate θt of a new node t is determined as follows:

    1. a

      Sample ϕi ~ U[0,2π], i = 1,...,t, uniformly at random. The set of points in the hyperbolic plane are the “candidate” positions for the newborn node;

    2. b

      Define the attractiveness Ati) of the ith candidate as the number of existing nodes that lie within hyperbolic distance rt from it;

    3. c

      Set θt = ϕi with probability

      where Λ ≥ 0 is a parameter, called the initial attractiveness.

  • The radial (popularity) coordinate of node t is set to rt = 2 ln t. The radial coordinates of existing nodes s < t are updated to rs(t) = βrs + (1 − β)rt.

  • Node t connects to m hyperbolically closest existing nodes (if tm, then node t connects to all existing nodes).

The GPA model has thus three parameters: the number of links m established by every new node, the speed of popularity fading β and the initial attractiveness Λ. A moment's thought shows that m controls the average degree of the network, . We prove in Methods that the model generates scale-free networks and β controls the power-law exponent γ. The initial attractiveness Λ controls the heterogeneity of the angular node density, namely, the heterogeneity is a decreasing function of Λ. When Λ → ∞, the GPA model becomes manifestly identical to the homogeneous popularity × similarity (PS) model36, where the angular coordinate θt of a new node t is sampled uniformly at random on [0, 2π]. Note, however, that in GPA, choosing a position in the similarity space is an active decision made by a node based on the attractiveness of different locations, as opposed to “passive” uniform randomness in PS. In standard PA, the initial attractiveness term is used to control the exponent of the power-law degree distribution7,8. In what follows we show that in GPA, Λ controls certain properties of the community size distribution.

Figure 2 shows the simulation results for networks of size n = 103 generated by the GPA model with m = 3 (i.e. each new node connects to the three hyperbolically closest nodes), β = 2/3 and different values of Λ. As expected, the smaller the value of Λ, the more heterogeneous the distribution of angular coordinates. To quantify the difference between the empirical distribution of the angular coordinates and the uniform distribution on [0, 2π], we use the Kolmogorov-Smirnov (KS) statistic, one of the standard distances that measures the difference between two probability distributions. Recall that the KS statistic ρ is defined as the maximum difference between the values of the empirical distribution of the sample θ1,...,θn and the uniform distribution FU[0,2π](θ) = θ/2π,

The KS statistic as a function of Λ is shown in the bottom panel of Fig. 2. As expected, ρ(Λ) is a decreasing function of Λ.

Figure 2
figure 2

GPA networks.

Synthetic networks of size n = 103 generated according to the GPA model with m = 3, β = 2/3 and Λ = 0.1 (first row), Λ = 1 (second row) and Λ = 10 (third row). The right column shows the corresponding histograms of the angular nodes densities. The bottom panel plots the expected KS statistic ρ (2), as a function of Λ. For each value of Λ, ρ(Λ) is computed by averaging the KS statistics for 100 independently generated networks.

Degree Distribution

For each of the three networks depicted in Fig. 2, the statistical procedure for quantifying power-law behavior in empirical data proposed in Ref. 44 accepts the hypothesis that the network is scale-free. It estimates the lower cutoff for the scaling region as kmin = 3, which is consistent with the minimum degree in the networks m = 3. Figure 3(a) shows a doubly logarithmic plot of the empirical degree distributions P(k) ~ k−γ along with the fitted power-law with exponent γ = 2.5.

Figure 3
figure 3

Degree distribution and clustering.

Panel (a) shows the empirical complementary cumulative degree distribution functions (CCDF) Pc(k) = k′ = kP(k′) for the networks shown in Fig. 2 versus the corresponding power-law fit. The average clustering coefficient as a function of node degree k for these networks is shown in panel (b). The mean clustering for all networks.

These empirical results show that the degree distribution of a network generated by GPA appears to be a power-law. Moreover, quite unexpectedly, the power-law exponent γ remains similar for different values of Λ. These results can be proved analytically (see Methods for details). Remarkably, for any value of Λ, the GPA model produces scale-free networks with the power-law degree distribution identical to the degree distribution in networks growing according to PA and having power-law exponent γ = 1 + 1/β.

Clustering Coefficient

The concept of clustering45 quantifies the tendency to form cliques (complete subgraphs) in the neighborhood of a given node. Specifically, the local clustering coefficient of node s is defined as the probability that two nodes s′ and s″, adjacent to s, are also connected to each other. Figure 3(b) shows the average value of the clustering coefficient for nodes of degree k as a function of k for the three networks in Fig. 2. Interestingly, clustering does not depend on Λ either (a proof is in the Methods) and scales approximately as k−1. This means that, on average, the nodes with higher degree have lower clustering, which is consistent with empirical observations of clustering in real complex networks11,46. For all the three PGA networks, the mean clustering (the average of the local clustering coefficients) is high, .

Soft Communities

The hyperbolic space underlying a network and the GPA mechanism of node appearance in that space naturally induce community structure and allow to detect communities in a very intuitive and simple way. A higher density of links within a community indicates that its nodes are more similar to each other than to the other nodes, because links connect only nodes located within a certain similarity distance threshold. All such densely linked nodes are thus close to each other in some area of the similarity space, meaning that the spatial node density is high in this area. Therefore a community becomes a cluster of spatially close nodes and the community structure is encoded in a non-uniform distribution of angular (similarity) coordinates of nodes.

Following the approach in Ref. 47, let us consider the angular gaps θ between consecutive nodes and define a soft community as a group of nodes separated from the rest of the network by two gaps that exceed a certain critical value θc. If a network has a total number of n nodes, then the critical gap θc is defined as the expected value of the largest gap θ(n) = max{θ1,...,θn}, where θ1,...,θn are distributed uniformly at random on [0, 2π]. The rationale behind this definition is that if nodes are distributed uniformly in the similarity space and there are no communities, then we do not expect to find any pair of nodes separated by a gap larger than this θc. The calculations in the Methods show that the critical gap is approximately

Figure 4 shows the statistics of the rescaled gaps θ/θc for three GPA-generated networks of size n = 104 with Λ = 0.1,1 and 10. In the top panel, we can see the organization of nodes on the circle with many consecutive small gaps (θi < θc) indicating groups of similar nodes (communities) separated by large gaps (θi > θc) which constitute boundaries between communities, so-called “fault lines”9. As expected, smaller values of Λ result into more heterogeneous distribution of gaps with strong long range correlations. This effect is clearly visible in the bottom panel, where the sample autocorrelation function is shown: the smaller the Λ, the slower the autocorrelation decays.

Figure 4
figure 4

Statistics of the rescaled gaps.

Top panel shows the values of rescaled gaps θ/θc for three networks of size n = 104 generated by the GPA model with Λ = 0.1 (left), Λ = 1 (middle) and Λ = 10 (right). The bottom panel shows the sample autocorrelation function of the series in the top panel.

Having a geometric interpretation of the community structure, it is now easy to quantity how well communities are separated from each other. For each community , we define its separation from the rest of the network as the rescaled average of two gaps θ12 > θc that separate from its neighboring communities,

The mean community separation, i.e. the expected separation of a community that a randomly chosen node belongs to, can then be computed as follows:

where ni is the size of community and nc is total number of communities. The network metric can also be viewed as a measure of narrowness (or specialization) of communities. For example, in scientific collaboration network, where nodes represent scientists and communities correspond to groups with similar research interests, quantifies the degree of interdisciplinarity in the network. When is large, the boundaries between communities are sharp and each community focuses on its narrow, specific topic. On the other hand, if is close to one, then the boundaries are blur, communities are wide spread and the network is highly interdisciplinary.

The difference in the stochastic behavior of the rescaled gaps in Fig. 4 suggests that the initial attractiveness Λ controls the mean community separation in the GPA-generated networks. This is confirmed by simulation results shown in Fig. 5(c), where is shown as a function of Λ. As expected, is a monotonically decreasing function, approaching one when Λ is large.

The Internet

To demonstrate the ability of the GPA mechanism to generate graphs that are similar to real networks, and, in particular, to reproduce real non-uniform distributions of similarity node coordinates, we consider the Autonomous Systems (AS) Internet topology48 of December 2009. The network consists of N = 25910 nodes, ASs and M = 63435 links that represent logical relationships between ASs. We embed the AS Internet into its hyperbolic space, i.e compute the popularity and similarity coordinates {rii}, using HyperMap43, an efficient network mapping algorithm that estimates the latent hyperbolic coordinates of nodes. The network topology has a power-law degree distribution with γ = 2.1 and average node degree . This automatically determines two out of three parameters of the GPA model: and β = 1/(γ − 1). In Methods, we explain how to infer the value of Λ from network data using the maximum likelihood method. Here we consider the snapshot of the AS Internet based on the first n = 103 nodes. The corresponding estimated value of the initial attractiveness is ΛInt = 0.7.

Figure 5(a) shows the histogram of the angular node density for the AS Internet snapshot. We note that it is far from uniform, which is a direct indication of the presence of soft communities. We quantify the degree of heterogeneity of the angular density by the KS distance from the uniform distribution (2) and juxtapose it against the KS distances computed for networks generated by the GPA model with Λ = 0.7 (Fig. 5(b)). The Internet value lies within the 25th and 75th percentiles of the synthetic values, which shows that the degrees of non-uniformity in the Internet and GPA networks are comparable. Fig. 5(c) compares the real network with its synthetic counterpart in terms of the expected mean community separation (5). The GPA mechanism generates networks with that match the Internet value very well. In Fig. 5(d), we compare the community size distributions in the Internet snapshot and prediction given by the GPA model. Whereas for the Internet and GPA networks are essentially identical, the KS statistics and community size distributions are similar, but the match is not perfect. This effect is explained by the systematic bias present in the inferred values of the angular coordinates {θi}. Indeed, the HyperMap method first assumes that all angular coordinates are uniformly distributed over the similarity space , i.e. Λ = ∞ and then perturbs them to maximize a certain likelihood function. This “smoothes” the inferred angular node density and makes it more homogeneous than the true distribution. Nevertheless, although the inferred value of Λ is only an approximation for the true value, the GPA model still captures well the degree of heterogeneity in the real network.

Finally we note that GPA defined in Eq. (1) admits an interesting interpretation that suggests a model extension that may be useful for real network analysis. The probability of a new node born at time t to chose the angular position ϕi can be written as

where

Therefore the event of choosing a position on the circle can be understood as follows. With probability pf the new node is a follower and chooses its position according to pure GPA (Λ = 0). With the remaining probability 1 − pf the new node chooses its position uniformly at random among the t available positions. We note that Λ controls pf, since At ≈ 1. When Λ is constant, pf is also constant and consequently there is always a fraction of nodes that are placed at random locations. At long times, these random nodes diminish the effect of pure GPA and eventually the angular distribution of nodes become indistinguishable from a Poisson point process on the circle. We can then wonder whether a constant value of Λ is a realistic assumption for dealing with real networks. In scientific citation networks, for example, when a new field of science is being formed and not much work has yet been done in it, scientists may decide either to explore a new line of research within the field, or to follow one of the mainstream existing lines. The former case can be modeled by a random choice of the angular position, assuming that subfields are homogeneously distributed. The latter is modeled by the pure GPA term in Eq. (6). However, there is a payoff that does not remain constant during the evolution of the field. At early times, the chances to find an interesting result that would be highly cited and followed by others are very high. At late times, the topic space is crowded and the chances to find something fundamentally new are very slim. Therefore, there is a higher incentive for scientists to take higher risks at early times. This can be modeled by pf increasing with time, converging to a value close to 1 as time grows to infinity. In turn, this means that Λ is a decreasing function of time, having a large value at the beginning of network evolution and decreasing to small values afterwards.

Unfortunately, measuring the temporal evolution of Λ in a real network is not yet possible because there currently exists no parametric theory describing such evolution that could be used for statistical inference of Λ. However, it is fairly easy to find an approximate value of Λ as a function of time as follows. If timestamps of a real complex network are available, we can pretend that Λ is constant and infer its value using the MLE techniques described in the Methods for subgraphs made of nodes that were born before a given time t, . This value can be thought of as a (possibly weighted) average of Λ(t) in time window (0,t). By increasing the value of t, we can detect whether Λ is constant (if does not change with time, beyond statistical fluctuations), or a decreasing function of time. Figure 5(e) shows for the AS Internet where the strong temporal dependence of Λ is evident.

Discussion

In summary, hyperbolic network geometry, combining popularity and similarity forces driving network evolution and coupled with preferential attachment of nodes to this geometry (GPA), naturally yields scale-free, strongly clustered growing networks with emergent soft community structure. The GPA model has three parameters that can be readily inferred from network data. Using the AS Internet topology as example, we have seen that the GPA mechanism generates heterogeneous networks that are similar to real networks with respect to key properties, including key aspects of the community size distribution and separation. The mean community separation, a new metric that quantifies the narrowness of communities in a network, is controlled in GPA by initial attractiveness Λ, which controls the power-law exponent in standard PA.

In the context of the asymptotic equivalence between de Sitter causal sets and popularity × similarity (PS) hyperbolic networks established in Ref. 49, we note that Λ is conceptually similar to the cosmological constant Λ in Einstein's equations in general relativity (GR), where it is also an additive term in the proportionality between the energy-momentum tensor and spacetime curvature. Causal sets50,51 are random graphs obtained by Poisson sprinkling a collection of nodes onto (a patch) of a Lorentzian manifold; edges in these graphs connect all timelike-separated pairs of nodes. If there is no matter (empty spacetime) but there is only dark energy (positive Λ), then the solution of Einstein's equations is the de Sitter spacetime and the main theorem in Ref. 49 states that the ensemble of PS graphs is asymptotically (n → ∞) identical to an ensemble of causal sets sprinkled onto de Sitter spacetime, which is one of the three maximally symmetric, homogeneous and isotropic Lorentzian manifolds (the other two are Minkowski and anti-de Sitter spacetimes). In this context, the GPA model considered here is a model with cosmological constant Λ and matter. Modeled by high node density, this matter, as in GR, “attracts more matter”, thus increasing the spacetime curvature of which the node density is a proxy. Indeed the main feature of the model is that the higher the node density in a particular region of space, the more nodes will appear in this region later. The main difference with GR is that here we essentially have an analogy with only the 00-component of Einstein's equations. One can envision that other components should describe the coupled dynamics of the similarity space and nodes in it. In case of scientific collaboration network, for example, that would be the co-evolution of science (space) itself and interests of scientists (node dynamics in this space). In the model considered here nodes do not move. Finding the laws of their spatial dynamics that may further strengthen the analogy with general relativity is a promising but challenging research direction.

In that context, the decay of initial attractiveness Λ that we found in the Internet must be analogous to the decay of cosmological constant Λ in modern cosmological theories. Cosmic inflation52,53 is widely accepted as the most plausible resolution of many problems with the classical big bang theory, including the flatness problem, the horizon problem and the magnetic-monopole problem. Inflation is an initial period of accelerated expansion of the universe during which gravity was repulsive. Inflation does not last long and can be modeled as a time dependent cosmological “constant” Λ that initially has a high value and then decays to zero. The analogies between GPA with decaying Λ and inflation go even further, producing similar outcomes as far as the spatial distribution of events is concerned. Indeed, cosmic inflation has the effect of smoothing out inhomogeneities so that once inflation is over, the universe is nearly flat, isotropic and homogeneous, except for quantum fluctuations of the inflaton field. These fluctuations are the seeds of future inhomogeneities that we observe in the universe at scales smaller than 100Mpc. In the GPA context, a high value of Λ has also a homogenizing effect. Indeed, if Λ is large, then pf is small and new nodes chose their angular positions at random, producing a Poisson point process on the circle. Once Λ is small enough, we are left with a random distribution of points with Poisson fluctuations that, as in the universe, are the seeds of future communities in the network (galaxies in the universe), because once Λ is nearly zero, these initial fluctuations are reinforced by pure preferential attachment.

Methods

Invariance of the degree distribution and clustering

Here we prove that the degree distribution and clustering coefficient in the networks generated by the GPA model do not depend of the initial attractiveness Λ. Moreover, the degree distribution is power-law with exponent γ = 1 + 1/β. The proof can be reduced to the proof for the homogeneous PS model36 (Supplementary Information, Section IV). Consider a new node t and let Rt be the radius of a hyperbolic disk centered at this node such that t is connected to all nodes s < t that lie in this disc. Then the probability that nodes t and s < t in the GPA model are connected can be computed as follows:

where is the hyperbolic distance between nodes s = (rs(t),θs) and t = (rtt) at time t. Using the total probability theorem,

where are the candidate positions generated at Step 1(a) and t(i) are the corresponding acceptance probabilities (1). Applying the total probability theorem with respect to node s, we have:

Since the angular coordinates of the candidate positions and are uniformly distributed on [0, 2π], the probability is simply α/π. Therefore,

where the last equality holds because . We note that does not depend on Λ and that it is exactly the same as the probability of having a link between nodes t and s < t in the homogeneous PS model. The rest of the proof repeats the proof in Ref. 36 without a change. This leads to

which means that the resulting degree distribution in GPA is identical to PA: it is the power-law with exponent γ = 1 + 1/β. Since the connection probability does not depend on Λ, neither does clustering.

Critical gap

To obtain a closed-form expression for the critical gap, we note that for large n, the sequence θ1,...,θn ~ U[0,2π] can be approximately viewed as a realization of the Poisson point process on the circle of unit radius with density λ = n/2π. In this case, the distribution of the angular gaps is approximately exponential with rate λ. The maximum gap θ(n) has then the following PDF and its expected value can be calculated as follows:

where Hn is the nth harmonic number and γ is Euler's constant.

Inference of Λ

The initial attractiveness Λ controls the distribution of angular coordinates θ1,...,θn of the nodes. We therefore first infer θi using the HyperMap method43. Given the network embedding into its hyperbolic space, the likelihood function can be written as follows:

where At(ϕ) is the attractiveness of location , that is the number of existing nodes at time (t − 1) that lie within distance rt from (rt,ϕ). The log-likelihood is then (up to an additive constant):

The multiple integrals in (15) cannot be calculated analytically, since the attractiveness function cannot be written in closed-form. Nevertheless, the log-likelihood can be efficiently estimated be the Monte Carlo method. First, generate N Monte Carlo samples, , j = 1,...,N. The “truncated” samples will be used for estimating the (t − 1)-dimensional integral in (15). Next, precompute all needed attractivenesses, , where t = 2,...n and i = 1,...,t − 1. Then for each value of Λ, the log-likelihood can be estimated as follows (up to a constant):

Computing attractivenesses of the Monte Carlo samples involves computing O(n3N) hyperbolic distances, which is the most computationally intensive part of the algorithm. Having all attractivenesses computed, we can then estimate l(Λ) for any and find the maximum likelihood estimate (MLE) . An important observation that drastically improves the efficiency of the algorithm is that we do not have to use the entire network to accurately estimate , the first nodes are often enough. Table 1 shows the MLEs obtained from the first n0 = 100,200,500 and 1000 nodes of the networks generated by the GPA model with Λ = 0,0.2,0.5,0.7,1 and 2. The corresponding log-likelihood functions are shown in Fig. 6. These simulation results show that the smaller the true value of Λ — and we expect it to be small in real networks since most of them have community structure — the less network data we need to pin down. If, for example, Λ = 0, then the MLE of Λ based on the first n0 = 100 nodes is already zero. The larger the true value of Λ, however, the flatter the log-likelihood is around its maximum, which makes inference more challenging.

Table 1 Maximum likelihood estimates. True values of the initial attractiveness parameter Λ and its MLEs based on the first n0 = 100,200,500 and 1000 nodes. In all simulations, N = 100 Monte Carlo samples were used in (16)
Figure 6
figure 6

Log-likelihood functions.

The estimated log-likelihood functions l(Λ | θ1,...,θn) for synthetic networks of size n = 103 generated by the GPA model with Λ = 0,0.2,0.5,0.7,1 and 2. Each log-likelihood is estimated by (16) using N = 100 Monte Carlo samples and n0 = 500 first nodes.