Proc Natl Acad Sci U S A. Oct 11, 2005; 102(41): 14497–14502.
Published online Oct 4, 2005. doi: 10.1073/pnas.0501426102
PMCID: PMC1240072
Engineering

The “robust yet fragile” nature of the Internet

Abstract

The search for unifying properties of complex networks is popular, challenging, and important. For modeling approaches that focus on robustness and fragility as unifying concepts, the Internet is an especially attractive case study, mainly because its applications are ubiquitous and pervasive, and widely available expositions exist at every level of detail. Nevertheless, alternative approaches to modeling the Internet often make extremely different assumptions and derive opposite conclusions about fundamental properties of one and the same system. Fortunately, a detailed understanding of Internet technology combined with a unique ability to measure the network means that these differences can be understood thoroughly and resolved unambiguously. This article aims to make recent results of this process accessible beyond Internet specialists to the broader scientific community and to clarify several sources of basic methodological differences that are relevant beyond either the Internet or the two specific approaches focused on here (i.e., scale-free networks and highly optimized tolerance networks).

Keywords: complex network, HOT, Internet topology, network design, scale-free network

A popular case study for complex networks has been the Internet, with a central issue being the extent to which its design and evolution have made it “robust yet fragile” (RYF), that is, unaffected by random component failures but vulnerable to targeted attacks on its key components. One line of research portrays the Internet as “scale-free” (SF) with a “hub-like” core structure that makes the network simultaneously robust to random losses of nodes yet fragile to targeted attacks on the highly connected nodes or “hubs” (1-3). This combination of error tolerance and attack vulnerability has been proposed as a previously overlooked “Achilles' heel” of the Internet. The appeal of such a surprising discovery is understandable, because SF methods are quite general and do not depend on any details of Internet technology, economics, or engineering (4, 5).

One purpose of this article is to explore how this SF depiction compares with the real Internet and explain the nature and origin of some important discrepancies. Another purpose is to suggest that a more coherent perspective on the Internet as a complex network, and in particular its RYF nature, is possible in a way that is fully consistent with Internet technology, economics, and engineering. A complete exposition relies on the mathematics of random graphs and statistical physics (6), which underlie the SF theory, as well as on the very details of the Internet ignored in the SF formulation (7). Nevertheless, we aim to show here that the essential issues can be readily understood, if not rigorously proven, by using less technical detail, and the lessons learned are relevant well beyond either the Internet or SF-network models (8-10).

Power Laws and SF Models

One widespread focus of attention has been on “power laws” (or “scaling”) in graph vertex connectivity. For a graph having n vertices, let di denote the degree of vertex i, 1 ≤ i ≤ n. We call D = {d1, d2,..., dn} the degree sequence of the graph, assumed without loss of generality always to be ordered d1 ≥ d2 ≥ · · · ≥ dn. Let G(D) denote the set of all connected simple graphs (i.e., no self-loops or parallel edges) having the same degree sequence D. We will say that graphs g ∈ G(D) have scaling-degree sequence D (or D is scaling) if, for all 1 ≤ k ≤ ns, with ns ≤ n, D satisfies a power-law rank-size relationship of the form k·dk^α ≈ c, where 0 < c and 0 < α are constants and ns determines the range of scaling (11). Because scaling implies log(k) + α log(dk) ≈ log(c), doubly logarithmic plots of degree dk versus rank k yield approximately straight lines of slope -α. In contrast, exponential rank-size relationships (i.e., k·e^(λdk) ≈ c) result in approximately straight lines on semilogarithmic plots.
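As a concrete illustration (not part of the original analysis), the following Python sketch estimates the scaling exponent α of a degree sequence by fitting the slope of rank versus degree on doubly logarithmic axes. The function name and the example sequence are our own hypothetical choices.

```python
import numpy as np

def rank_size_fit(degrees, n_s=None):
    """Estimate (alpha, c) in the rank-size relation k * dk^alpha ~ c.

    Fits log(k) ~ log(c) - alpha * log(dk) over the first n_s ranks, so a
    straight line on a log-log plot of degree versus rank has slope -alpha.
    """
    d = np.sort(np.asarray(degrees, dtype=float))[::-1]   # d1 >= d2 >= ... >= dn
    n_s = n_s or len(d)
    k = np.arange(1, n_s + 1)
    slope, intercept = np.polyfit(np.log(d[:n_s]), np.log(k), 1)
    return -slope, np.exp(intercept)

# A sequence constructed to satisfy k * dk^2 = 10^4 approximately:
alpha, c = rank_size_fit([100, 71, 58, 50, 45, 41, 38, 35, 33, 32])
print(alpha, c)   # roughly alpha = 2 and c = 10,000
```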

The most significant SF claims for the Internet are that the router graph has power-law degree sequences that give rise to hubs, which by SF definition are highly connected vertices that are crucial to the global connectivity of the network and through which most traffic must pass (3). The SF assertion (later formalized in ref. 12) is that such hubs hold the network together, giving it “error tolerance” to random vertex failures (because most vertices have low connectivity, i.e., are nonhubs) but also “attack vulnerability” to targeted hub removal, a previously overlooked Achilles' heel. The rationale for this claim can be illustrated by using the toy networks shown in Fig. 1, all of which have the identical scaling-degree sequence D shown in Fig. 1e. Fig. 1a shows a graph (size issues notwithstanding) that is representative of the type of structure typically found in graphs generated by SF models, in this case preferential attachment (PA). This graph is drawn in two ways: the left and right visualizations emphasize the growth process and Internet properties, respectively. Clearly, the highest-degree nodes are essential for graph connectivity, and this feature can be seen even more clearly for the more idealized SF graph shown in Fig. 1b. Thus, the SF claims would certainly hold if the Internet looked at all like Figs. 1 a and b. As we will see, the Internet looks nothing like these graphs and is much closer to Fig. 1d, which has the same degree sequence D but is otherwise completely different, with high-degree vertices at the periphery of the network, where their removal would have only local effects. Thus, although scaling-degree sequences imply the presence of high-degree vertices, they do not imply that such vertices necessarily form “crucial hubs” in the SF sense.

Fig. 1.
Diversity among graphs having the same degree sequence D. (a) RNDnet: a network consistent with construction by PA. The two networks represent the same graph, but the figure on the right is redrawn to emphasize the role that high-degree hubs play in overall ...

The deeper origins of the claims involving power laws and hubs arise from the SF models' roots in statistical physics, in which any particular graph is interpreted as an element from a larger statistical ensemble of graphs, with probability weights that typically arise either implicitly through some underlying stochastic generation process or by a mechanism that explicitly assigns a weight to each element of the ensemble (13, 14). Although there exist a variety of methods for generating ensembles of graphs having scaling-degree sequences, including PA, generalized random graph, power law random graph (15), and random degree-preserving rewiring (16), the resulting models are widely conjectured to be asymptotically equivalent (e.g., see ref. 6 and references therein).
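For readers unfamiliar with these generation mechanisms, the short sketch below grows a PA graph with networkx. The size and attachment parameters are arbitrary illustrations, not the settings used to produce Fig. 1a.

```python
import networkx as nx

# Preferential attachment (PA): each new vertex attaches to m existing
# vertices chosen with probability proportional to their current degree.
g_pa = nx.barabasi_albert_graph(n=1000, m=2, seed=0)

top_degrees = sorted((deg for _, deg in g_pa.degree()), reverse=True)[:5]
print(top_degrees)   # a handful of high-degree vertices ("hubs") emerge
```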

In particular, for a graph g having degree sequence D, we define the purely graph-theoretic quantity s(g) = Σ(i,j)∈E(g) di dj, where E(g) is the set of edges in the graph. It is easy to check that high s(g) requires high-degree vertices to connect to other high-degree vertices. Normalizing against smax = max{s(g): g ∈ G(D)}, we define the measure 0 ≤ S(g) ≤ 1 of the graph g as S(g) = s(g)/smax. Although s(g) and S(g) can be computed for any graph and do not depend on any particular construction mechanism, they have a special meaning in the context of ensembles of graphs. Specifically, S(g) has a direct interpretation as the relative log-likelihood of a graph resulting from the generalized random-graph construction (17); thus, all of the SF-model–generation mechanisms generate essentially only high S graphs. The S-metric also potentially unifies other aspects of SF graphs, because it is closely related to betweenness, degree correlation (6), and graph assortativity (18) and captures several notions of self-similarity related to graph trimming, coarse graining, and random rewiring (6).
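A direct computation of s(g) is straightforward; the sketch below is our own illustration (not code from refs. 6 or 17). Computing S(g) additionally requires smax, the maximum of s over all of G(D), which in general needs an explicit construction on the degree sequence and is omitted here.

```python
import networkx as nx

def s_metric(g):
    """s(g) = sum of di * dj over all edges (i, j) of g."""
    deg = dict(g.degree())
    return sum(deg[i] * deg[j] for i, j in g.edges())

# Two toy graphs with the same degree sequence [3, 3, 2, 2, 1, 1]:
hubs_adjacent = nx.Graph([("a", "b"), ("a", "c"), ("a", 1), ("b", "d"), ("b", 2), ("c", "d")])
hubs_separated = nx.Graph([("a", "c"), ("a", "d"), ("a", 1), ("b", "c"), ("b", "d"), ("b", 2)])

# Joining the two degree-3 vertices to each other raises s(g): 31 vs. 30.
print(s_metric(hubs_adjacent), s_metric(hubs_separated))
```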

The focus on ensemble-based methods means that the analysis in SF models has implicitly ignored those graphs that are unlikely to result from such constructions, in particular graphs with small S. Thus, although power-law degree distributions are unlikely under some traditional random graph constructions [e.g., Erdős–Rényi random graphs (19)], there are a multitude of other model-generation mechanisms that give rise to power laws (20). The SF-generating mechanisms are only one kind, but they tend to generate only high S graphs, which leaves unexplored an enormous diversity of low S graphs, as seen in Fig. 1. The graphs in Fig. 1 a and b are relatively likely to result from probabilistic construction, whereas the graphs in Fig. 1 c and d are vanishingly unlikely. The PA-type graph shown in Fig. 1a has S(ga) = 0.61 and is typical of the graphs that are likely under a variety of random-generation methods. The graph shown in Fig. 1b is the smax graph and thus by definition has S(gb) = 1.0. It can be thought of both as the most likely graph and also (uniquely) as the most “perfectly” SF graph with this degree sequence. Of course, the sheer enormity of the number of different high S graphs means that any particular graph, even the relatively most likely one, is unlikely in absolute terms to be selected. The graphs in Fig. 1 c and d have the values S(gc) = 0.33 and S(gd) = 0.34, respectively; furthermore, there are relatively few graphs with S values this low, and thus any graphs similar to these are vanishingly unlikely to arise at random (6). The remainder of this article explains in more detail why the underlying forces at work in the evolution of the real router-level Internet avoid the generation of high S graphs and how this feature can be captured in an optimization-based design framework. We also consider what, if anything, this framework has to say about the RYF nature of the Internet.

A Look at the Actual Internet

An obvious starting point for investigating the structure and underlying forces at work in the Internet is to inspect detailed router-level maps from Internet service providers (ISPs). Abilene, the backbone for the Internet2 academic network, is illustrated in Fig. 2 and is an ideal example for many reasons that will be exploited throughout this analysis.** Abilene publishes detailed hardware specifications for each router and link, so Fig. 2 is exact, not an approximation based on indirect measurements. Abilene is also a state-of-the-art network with essentially no difference between physical (i.e., layer two) and Internet-protocol (IP) (i.e., layer three) connectivity. This simplifies the exposition without loss of generality and also eliminates a source of confusion in measured data from networks that use older legacy technologies. Using regional academic networks and commercial ISPs, we verified that all the inferences and conclusions based on Abilene hold in general. Commercial ISPs do not allow publishing such details because of proprietary considerations, but router-level measurement studies (21, 22, ††) further confirm our analysis (7, 23, 24), although this requires additional statistical and Internet-specific expertise beyond the intended scope of this article.

Fig. 2 shows that Abilene is designed as a sparsely connected mesh of uniformly high-speed [10 gigabits per sec (Gbps)], long-range links between routers located in 11 major U.S. cities, with connectivity to regional and local networks provided by some minimal amount of redundancy. These design features are typical of ISP backbones (the main connections and routers composing an ISP's national or international network), which can differ in overall size but are qualitatively similar. One of their most obvious features is the complete absence of SF hubs; high-degree vertices can exist but are found only within the local networks at the far periphery of the network and would not appear anywhere close to the backbone.

Fig. 2.
Router-level topology of Abilene. Each vertex represents a router, and each link represents a physical connection; however, each physical connection can support many virtual connections, giving the appearance of greater connectivity at higher layers of ...

Although the issue of whether the real Internet actually has power laws in its connectivity is beyond the scope of this article, there exists considerable evidence that existing claims of power laws in the router-level Internet may be the result of misinterpretation of available measurements and/or their naive and inappropriate statistical analysis (see ref. 6 for a detailed discussion). It is certainly the case that current router technology, in principle, could support high variability (possibly scaling), but a closer look at existing router technology confirms that if high connectivity exists at all, it will be found toward the network periphery and not in its core. As we will show next, this is a consequence of the need for a high-performance network.

Internet Modeling: An Optimization-Based Approach

Much of the topology of Abilene and other ISP backbones can be understood by using annotated graphs with a few technological and economic constraints that provide a simple, yet surprisingly complete model of the essentials of network design (7). Highly optimized/organized tolerance/tradeoffs (HOT) has been proposed as a conceptual framework for capturing the highly organized, optimized, and RYF structure of complex highly evolved systems (8). HOT seeks an abstract but unified approach to diverse complex systems through models involving optimization of tradeoffs between multiple functional objectives of networks subject to constraints on their components, usually with an explicit source of uncertainty against which solutions must be tolerant, or robust. Constrained optimization and robustness are the universal themes, but models of function, uncertainty, component constraints, and environment are necessarily domain-specific. One consistent result of the HOT framework has been that once functional performance and robustness tradeoffs are considered, then in a variety of toy models engineering design (7) or biological evolution (9) easily generates power laws. This can occur in both deterministic and stochastic HOT models, including models motivated by physics (8). Power laws have been a central focus of the “emergent complexity” view of SF and related methods, which arrive at them in completely different ways than HOT with its focus on “organized complexity.”

Here, we present toy networks reflecting the HOT approach to modeling the router-level Internet, which we will contrast with the corresponding SF-network models. To this end, consider again the example network shown in Fig. 1d, which we argue captures the kind of essential domain-specific tradeoffs that occur in engineering. Accordingly, we refer to this toy network as HOTnet, although it is important to underscore that our results do not depend on designs being formally optimal, which is unlikely to occur in practice. Instead, we will argue that any sensible network-design process with minimally realistic assumptions would produce something qualitatively similar.

A HOT model of the Internet's router-level topology requires two general elements: constraints and functional objectives. First, the technological and economic constraints on components such as routers and links and their interconnection restrict what topologies are feasible or possible. Second, network backbones, and router-level connectivity more generally, are subsystems in the larger, decentralized and layered Internet infrastructure. The consequence is that such subsystems can only be understood fully in terms of the functions that they provide to the higher layers of the protocol stack and the rest of the network. The main purpose for building physical network infrastructures at the lower layers of the protocol stack is to carry effectively the expected or projected overall traffic demand generated at the higher layers, which in turn is ultimately driven by users at the application layer. Such teleological explanations are understandably avoided in physics but are essential for engineering networks, and this gap is responsible for much of the difference between the two approaches described in this article.

A standard metric for network performance adopted here is the maximum throughput of the network under a “gravity model” of end-user traffic demands (25). It assumes that every end vertex i has a total bandwidth demand xi, that two-way traffic is exchanged between all pairs (i, j) of end vertices i and j, and that the flow Xij of traffic between i and j is given by Xij = ρ xi xj, where ρ is some global constant, and is otherwise uncorrelated with all other flows. Although more elaborate metrics are possible, this notion of network performance is a reasonable measure of the ability to provide a fair allocation of end-user bandwidths. Our performance measure for a given network g is then its maximum throughput with gravity flows, computed as P(g) = maxρ Σij Xij, subject to RX ≤ B, where R is the routing matrix obtained by using standard shortest-path routing: R = [Rkl], with Rkl = 1 if flow l passes through router k, and Rkl = 0 otherwise; X is the vector of all flows Xij, indexed to match the routing matrix R; and B is a vector consisting of all router bandwidth capacities.
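To make this performance measure concrete, here is a minimal Python sketch of P(g) under the stated assumptions: gravity flows Xij = ρ xi xj between end vertices, shortest-path routing, and per-router bandwidth caps. Because ρ is the only free variable, the optimization reduces to pushing ρ up until the most heavily loaded router saturates. The function and variable names are our own, not the authors'.

```python
import itertools
import networkx as nx

def gravity_throughput(g, demand, bandwidth):
    """P(g): maximum aggregate gravity-model throughput of graph g.

    `demand` maps end vertices to their bandwidth demand x_i, and
    `bandwidth` maps every vertex (router) to its capacity B_k.  Flows
    X_ij = rho * x_i * x_j are routed on shortest paths; the scalar rho
    is increased until the tightest router hits its capacity.
    """
    load = {v: 0.0 for v in g}                       # traffic crossing each router
    pairs = list(itertools.combinations(demand, 2))
    for i, j in pairs:
        flow = demand[i] * demand[j]                 # unscaled gravity flow
        for v in nx.shortest_path(g, i, j):          # standard shortest-path routing
            load[v] += flow
    rho = min(bandwidth[v] / load[v] for v in g if load[v] > 0)
    return rho * sum(demand[i] * demand[j] for i, j in pairs)
```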

The crucial elements of a design aimed at this notion of performance are realistic router capacities and economic considerations. Hardware technology fundamentally limits the number of data packets that can be processed per unit of time; thus, routers must obey a form of flow conservation in the traffic that they handle. Although total router capacity is constantly increasing as hardware improves, this tradeoff in router utilization cannot be avoided. Fig. 3a shows the router bandwidth-degree limits used in this model. In terms of economics, the cost of installing and operating physical links increases with link distance and can dominate the total budget for the global infrastructure, particularly in the backbone. Although routers impose overall bandwidth limits, the backbone cost is dominated primarily by the installation and operation of links. This cost imposes strong incentives to minimize the number and length of deployed links by aggregating and multiplexing traffic at all levels of the network hierarchy, from the periphery to the core. Thus, the combination of router technology and link costs necessitates that, when moving from the periphery to the network core, link capacities, link lengths, and total router throughput generally increase while router degrees decrease. The result is possibly highly variable bandwidth and router degrees at the network's periphery, with a necessarily much greater uniformity of high-bandwidth, low-degree routers in the core.
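The feasible region sketched in Fig. 3a can be mimicked with a toy constraint of this kind: each port has a maximum line rate, and the sum over all ports cannot exceed the router's total switching capacity. The specific numbers below are illustrative assumptions only, not the specifications of any real router.

```python
def router_throughput(degree, switching_capacity_gbps=50.0, max_port_gbps=10.0):
    """Toy flow-conservation constraint: total throughput of a router as a
    function of its degree.  Low-degree routers are limited by per-port
    line rate; high-degree routers are limited by switching capacity, so
    per-connection bandwidth must fall as degree grows."""
    per_port = min(max_port_gbps, switching_capacity_gbps / degree)
    return degree * per_port

for deg in (2, 5, 10, 50, 100):
    print(deg, router_throughput(deg))   # totals: 20, 50, 50, 50, 50 Gbps
```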

Fig. 3.
HOTnet vs. SFnet. (a) Achieved router utilization: HOTnet (circles) is close to the “efficient frontier,” and SFnet (diamonds) operates significantly below this frontier, with the highly connected hub core router (the diamond in the right ...

As noted above, the network HOTnet shown in Fig. 1d was inspired by the real Abilene network, and its overall connectivity was designed to achieve high performance while maintaining the scaling-degree sequence shown in Fig. 1e. This network uses essentially the Abilene backbone as its core (the inner circle of routers in Fig. 1d) and then assumes that end users (the outer circle) connect through small and greatly compressed single-level regional networks (the middle circle of vertices). This allows us to create a network that uses the same technology as the real Internet but has a scaling-degree sequence. In particular, this scaling vertex degree is achieved in a minimal but technologically plausible way by choosing a gravity model of end-user traffic demands and then aggregating these end users with routers that have high variability in their connectivity but must satisfy a particular router-technology constraint. Although the resulting network shown in Fig. 1d is far too compressed to look like the real Internet, it has the same performance objectives, constraints, and design principles, although simplified, and shows that a scaling-degree sequence is at least plausibly consistent with Internet technology and economics. It also could reasonably be argued that this design-driven toy model grossly oversimplifies real Internet technology and economics, but we next demonstrate that this type of model has superior explanatory power to alternatives that ignore them entirely.

Contrasting HOT and SF Models

In view of the empirical evidence and the engineering arguments against popular SF claims regarding the location and criticality of the highest-connectivity routers, we next quantify more precisely the qualitative observations that we made above to illuminate the key methodological differences behind these different approaches and their resulting models. In doing so, we consider again the four toy models shown in Fig. 1 along with their most relevant properties. To contrast the features of graphs having the same scaling-degree sequence, we first consider the network HOTnet shown in Fig. 1d alongside the “most preferential” network in Fig. 1b, which we denote in the remainder of this article as SFnet. In computing the performance of these two graphs, we observe that P(HOTnet) = 5.76 × 1011 bps, whereas P(SFnet) = 4.89 × 109 bps, a difference of >2 orders of magnitude.

This enormous performance difference can be understood by examining the utilization of individual routers within each network, as illustrated in Fig. 3a. This figure shows the overall feasible configuration region encapsulating the tradeoff between router degree and router throughput (measured in bandwidth) discussed above and represented as B in the computation of performance. Although greatly simplified for use here, this abstract representation of router bandwidth is consistent with real router technology (7), and it is adequate for our purposes because the resulting conclusions depend only on the most general features of Fig. 3a and not on specific details. The unambiguous source of the poor SFnet performance is that the high-degree hubs become saturated and create severe bottlenecks, leaving the rest of the network with low overall utilization. In contrast, the connectivity in HOTnet is such that the core routers are highly used and therefore enable greater overall network throughput.

An additional view into the performance and utilization of these two networks is available by considering the distribution of bandwidth that is actually delivered to the end users in these two networks under maximum-flow conditions, as shown in Fig. 3b. The distribution of achieved end-user bandwidth for HOTnet is highly variable, spanning 4 orders of magnitude (as opposed to five or more found in real networks; see ref. 23), but is considerably higher than what is received by users in SFnet, who get uniformly low bandwidth. Another issue not quantified here is that no matter where the high-degree SF hubs were located physically, the link costs to connect them would be prohibitively high. In contrast, the design aspects incorporated into HOTnet ensure that the deployed routers are used efficiently and the network is able to satisfy end-user bandwidth demands that are highly variable with relatively few long-range links. For network engineers, the combination of superior throughput, high router utilization, low link costs, and realistic end-user bandwidth makes HOTnet highly desirable but SFnet a very poor design choice, although networking reality dictates the need for some degree of overprovisioning that will result in a slightly less efficient network than HOTnet.

Another important comparison between the graphs of HOTnet and SFnet is to investigate the presence of Achilles' heel hubs. Here, we will consider robustness to router failures, defining this robustness as the remaining performance of the network after routers are removed and after rerouting of traffic. That is, addressing the issue of network robustness for the Internet requires, at a minimum, incorporating a simple abstraction of IP routing that accounts for the feedback mechanism by which the real network “sees” damage and works around it. Note that the main mechanism by which users improve robustness to network losses is through link redundancy (e.g., multihoming), but this was not an objective of our heuristic HOT design. However, we still can illustrate the differences between HOTnet and SFnet in some limited way. Fig. 3c shows the impact of deleting routers in succession, always taking the worst case that has not yet been deleted. The measure of performance after deletion of a vertex is the amount of original traffic that still can be served by the remaining network after rerouting but with routers that still have to adhere to their original bandwidth constraints.
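A hedged sketch of this robustness experiment, reusing the hypothetical gravity_throughput() function from the performance sketch above: routers are deleted one at a time (always the not-yet-deleted router whose loss hurts performance most), traffic is rerouted, and the original bandwidth limits stay in force. This is our reading of the procedure behind Fig. 3c, not the authors' code; scoring a disconnecting loss as zero is a coarser simplification than the text, which still counts traffic served among the vertices that remain connected.

```python
import networkx as nx

def performance_under_attack(g, demand, bandwidth, deletions):
    """Remaining gravity-model throughput after successive worst-case
    router deletions with rerouting (assumes gravity_throughput above)."""
    g = g.copy()

    def residual(graph):
        # Served traffic after rerouting; a disconnecting loss scores zero here.
        return gravity_throughput(graph, demand, bandwidth) if nx.is_connected(graph) else 0.0

    history = [residual(g)]
    for _ in range(deletions):
        candidates = [v for v in g if v not in demand]     # attack routers, not end users
        def after_removal(v):
            h = g.copy()
            h.remove_node(v)
            return residual(h)
        worst = min(candidates, key=after_removal)          # most damaging deletion
        g.remove_node(worst)
        history.append(residual(g))
    return history
```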

Consistent with SF claims (3), the SFnet network is indeed fragile to the deletion of worst-case vertices (here, worst case means highest degree) but resilient to deletions of other vertices. In stark contrast, HOTnet is not only robust to worst-case deletions (here, worst cases are low-connectivity core vertices) but also shows high tolerance to deleting other vertices. In particular, loss of high-degree edge routers disconnects only low-bandwidth users and has no other effect on overall connectivity. Because SFnet has such poor nominal performance to start with, its performance when intact is worse than that of HOTnet after the latter has sustained substantial damage. Thus, the Achilles' heel claim for SF networks does not seem to hold simply on the basis of having a scaling-degree sequence D, and we will provide one possible explanation for this difference in the next section. Fortunately, the actual Internet is more like HOTnet than SFnet and also has a great deal of additional robustness. Because the real Internet consists of multiple redundant HOTnet-type backbones that are moderately loaded, the ability to reroute traffic ensures that end users typically experience no discernible degradation in performance when core routers fail. In particular, the real Internet would never experience the type of separation of the network into disjoint components claimed by the Achilles' heel hub argument unless massive losses occurred.

The HOTnet model shown in Fig. 1d and the SFnet model shown in Fig. 1b are just two points in the space G(D) of simple connected graphs having identical scaling-degree D (shown in Fig. 1e). The space G(D) is difficult to visualize, mainly because it is very diverse and has a combinatorially large number of elements. However, some aspects can be explored by projecting this high-dimensional space onto lower dimensions by using macroscopic measures. Here we leverage our previously defined notions of performance P(g) and relative likelihood S(g), and we show the values for our toy networks in Fig. 4. The ability of the P(g) and S(g) measures to help differentiate among graphs in the space G(D) is illuminated further by considering the two other networks shown in Fig. 1.

Fig. 4.
The diversity of graphs having the same degree sequence D. Despite having the identical budget, technology constraint, degree sequence, and traffic-demand model, when computing and plotting for the four network models in Fig. 1 their S(g) (x axis) and ...

Fig. 1c depicts a graph having a heuristically “poor” engineering design (denoted BADnet), and Fig. 1a depicts a graph having “random” connectivity (denoted RNDnet) that is typical of graphs grown by PA. Although all four toy networks shown in Fig. 1 are identical as far as their degree sequence D is concerned, three of them occupy completely opposite corners of the P(g) versus S(g) plane. The BADnet network demonstrates that low S(g) does not necessarily imply high performance, and in general, graphs having low S(g) may be completely different from one another. In contrast, RNDnet shows that other graphs resulting from SF models have the same poor qualitative and quantitative features as SFnet, and for the same reasons. We also observe that graphs having S(g) ≈ 1 are much more alike (essentially unique), and our results to date suggest that if D exhibits high variability, it is impossible to have graphs g ∈ G(D) with both high S(g) and high P(g).

Finally, we consider the graph operation of pairwise degree-preserving rewiring, whereby two randomly chosen edges are rewired subject to the constraint that vertex degrees and network connectivity are preserved. It is easily shown that by a finite succession of such rewirings, any graph g ∈ G(D) can be converted to any other, and thus this process provides a simple mechanism for exploring the space G(D). The additional points in Fig. 4 correspond to rewirings of HOTnet and SFnet, respectively, and demonstrate that although rewiring the former quickly produces networks with poor performance, rewiring the latter produces little improvement in performance. Note that the PA graph shown in Fig. 1a is representative of a large number of graphs resulting from arbitrary pairwise rewiring, thus justifying its name as RNDnet and supporting the conjecture that all SF models generate essentially the same ensemble of graphs.
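A minimal sketch of the pairwise degree-preserving rewiring step (our own illustration): pick two edges (a, b) and (c, d), replace them with (a, d) and (c, b), and reject the swap if it would create a self-loop or parallel edge or disconnect the graph. networkx also ships a connected_double_edge_swap helper for the same operation.

```python
import random
import networkx as nx

def degree_preserving_rewire(g, swaps=1000, seed=0):
    """Random walk on G(D): repeatedly swap edge endpoints,
    (a, b), (c, d) -> (a, d), (c, b), keeping every vertex degree and
    overall connectivity intact."""
    rng = random.Random(seed)
    g = g.copy()
    done = 0
    while done < swaps:
        (a, b), (c, d) = rng.sample(list(g.edges()), 2)
        if len({a, b, c, d}) < 4 or g.has_edge(a, d) or g.has_edge(c, b):
            continue                                  # would create self-loop or parallel edge
        g.remove_edges_from([(a, b), (c, d)])
        g.add_edges_from([(a, d), (c, b)])
        if nx.is_connected(g):
            done += 1
        else:                                         # undo swaps that disconnect the graph
            g.remove_edges_from([(a, d), (c, b)])
            g.add_edges_from([(a, b), (c, d)])
    return g
```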

Although far from comprehensive, the structural metric S(g) provides some understanding of the diversity of graphs in the space G(D). One striking feature of this view is that some of the most celebrated features of SF models, particularly the Achilles' heel vulnerability, seem to hold only for graphs having scaling-degree sequence D and high S(g). It is not a necessary consequence of scaling alone, because it does not apply to networks such as HOTnet that have low S(g), even if they have the same scaling-degree D. However, recalling that in addition to measuring the hub-like nature of a graph, S(g) also has an interpretation as relative graph log-likelihood, the concentration of points in Fig. 4 suggests that the vast majority of graphs resulting from SF models have a relatively high likelihood of occurring and that the likelihood of recovering a graph similar to HOTnet through probabilistic construction is vanishingly small.

The Real RYF Internet

The preceding discussion suggests that probabilistic constructions are unlikely to capture the true router-level structure of the Internet and also that claims of a vulnerability in high-degree nodes are not supported by either engineering data or theory. The true RYF nature of the Internet is a complex and heavily studied issue, but we will sketch some central features. The perception of the Internet as a simple, robust, and homogeneous resource is the result of a layered architecture that uses multiple forms of feedback control that enable robust performance in the presence of frequent disruptions and enormous heterogeneity. The lowest layers of the protocol stack (involving the physical infrastructure such as routers and fiber-optic cables) have hard technological and economic constraints, but each higher layer defines its own, often unique connectivity, and the corresponding network topologies become, by design, increasingly virtual and unconstrained. For example, in contrast to routers and physical links, the allowable connectivity of documents and virtual links in the World Wide Web (WWW) is designed to be essentially completely unconstrained.

An important feature of the Internet's highly organized but largely hidden complexity is to make the full system robust to the perturbations for which it was designed (26) but also potentially quite vulnerable to other perturbations (27). All components must obey the protocols, but because of extensive feedback regulation, the overall system can tolerate otherwise enormous variability within these constraints and still deliver robust functionality to applications, which are also the least constrained components. Because the complete absence of a component is allowed, the system, by design, is robust to components that “fail off” by removal from the network, whether caused by focused attacks or other failures.

Note that it is protocols and feedback regulation and not simple redundancy per se that enables this extraordinary robustness. Another striking aspect of this robust design is a scalability, evolvability, and adaptability to exactly the kind of radical network change (i.e., in both hardware at the lower layers and applications at the highest layer) that the Internet has undergone in transforming from an academic research network to a critical component of the information infrastructure. Unfortunately, the Internet's strong robustness and adaptability coexists with an equally extreme fragility to components “failing on,” particularly by malicious exploitation or hijacking of the very mechanisms that confer its robustness properties at higher levels in the protocol stack. Worms, viruses, spam, and denial-of-service attacks remain familiar examples (28). This RYF tradeoff is a critical aspect of the Internet, and much research is devoted to enhancing these protocols in the face of new challenges. Thus, understanding Internet robustness requires a perspective that incorporates protocols, layering, and feedback regulation, and this view suggests that the most essential RYF features of the Internet actually come from aspects that are only indirectly related to graph connectivity.

The presentation here has emphasized the HOT framework as an alternate approach to SF models when considering the RYF nature of the Internet, and many other choices of functions and constraints are possible. Other researchers might emphasize alternative features that highlight particular tensions (e.g., design tradeoffs at different levels of the IP stack) and would be justified in doing so. The main point is the importance of incorporating issues such as performance, constraints, and tradeoffs (all of the things that make engineering different from physics) when considering the “essential” features of a highly evolved system. Here we denote highly evolved systems as those resulting from an iterative design that incorporates tradeoffs between performance and the use of available resources. Thus, the RYF features of the Internet are the result of its highly evolved nature, and a key objective here has been to incorporate some of the most essential features in a simple model that can be used to highlight the potential dangers of ignoring such aspects entirely.

Conclusion

It is certainly appealing that SF network models can avoid all Internet-specific structures, such as protocol stacks, technological or economic constraints, and user heterogeneity, yet make interesting and testable predictions. Unfortunately, this same generality yields results that collapse when tested against real data or examined by domain experts. Here, we have shown that there exist technological, economic, and graph-theoretic reasons why the most important SF claim (i.e., that the Internet has “hubs” that form an Achilles' heel through which most traffic flows and the loss of which would fragment the Internet and constitute its attack vulnerability) cannot be (and is not) true for the current router-level Internet. More generally, Table 1 shows that SFnet and HOTnet are opposite in essentially every meaningful sense, and the real Internet is much more like HOTnet.

Table 1.
SFnet vs. HOTnet and the real Internet

This raises the more basic question of the applicability to highly evolved systems of unstructured, ensemble-based approaches, of which SF networks are just one example, and a largely parallel story in biology further suggests that the answer may be negative. Here, interesting and testable SF claims about metabolic networks (29, 30) contrast sharply with both real data and concrete HOT models (9). Again, functional descriptions and component constraints, such as conservation of energy and small moieties, the biochemical nature of underlying reactions, and the importance of robustness and evolvability prove essential (31). However, while the router-level story here may be reflective of a broader debate about methodologies appropriate for complex networks, it is expected to take an even greater effort in domains like biology to reach the same level of clarity.

Acknowledgments

This work was partially supported by Boeing, Army Institute for Collaborative Biotechnologies, an Air Force Office of Scientific Research Award FA9550-05-1-0032 “Bio-Inspired Networks,” and the Lee Center for Advanced Networking (California Institute of Technology). Parts of this work were done at the Institute of Pure and Applied Mathematics (University of California, Los Angeles) as part of the 2002 annual program on large-scale communication networks.

Notes

This paper was submitted directly (Track II) to the PNAS office.

Abbreviations: RYF, robust yet fragile; SF, scale-free; PA, preferential attachment; ISP, Internet service provider; IP, Internet protocol; bps, bits per second; HOT, highly optimized/organized tolerance/tradeoffs; RND, random; WWW, World Wide Web.

Footnotes

**Detailed information about the objectives, organization, and development of the Abilene network is available from www.internet2.edu/abilene.

††SKITTER Project. Cooperative Association for Internet Data Analysis, University of California San Diego Supercomputing Center (www.caida.org).

References

1. Barabási, A.-L. & Albert, R. (1999) Science 286, 509-512.
2. Yook, S.-H., Jeong, H. & Barabási, A.-L. (2002) Proc. Natl. Acad. Sci. USA 99, 13382-13386.
3. Albert, R., Jeong, H. & Barabási, A.-L. (2000) Nature 406, 378-382.
4. Albert, R. & Barabási, A.-L. (2002) Rev. Mod. Phys. 74, 47-97.
5. Newman, M. E. J. (2003) SIAM Rev. 45, 167-256.
6. Li, L., Alderson, D., Doyle, J. C. & Willinger, W. (2005) Internet Math., in press.
7. Li, L., Alderson, D., Doyle, J. C. & Willinger, W. (2004) Proc. ACM SIGCOMM.
8. Carlson, J. M. & Doyle, J. C. (2002) Proc. Natl. Acad. Sci. USA 99, Suppl. 1, 2538-2545.
9. Tanaka, R. (2005) Phys. Rev. Lett. 94, 168101.
10. Moritz, M. A., Morais, M. E., Summerell, L. A., Carlson, J. M. & Doyle, J. C. (2005) Proc. Natl. Acad. Sci. USA, in press.
11. Mandelbrot, B. B. (1997) Fractals and Scaling in Finance: Discontinuity, Concentration, Risk (Springer, New York).
12. Bollobás, B. & Riordan, O. (2003) Internet Math. 1, 1-35.
13. Dorogovtsev, S. N. & Mendes, J. F. F. (2003) Evolution of Networks: From Biological Nets to the Internet and WWW (Oxford Univ. Press, New York).
14. Pastor-Satorras, R. & Vespignani, A. (2004) Evolution and Structure of the Internet: A Statistical Physics Approach (Cambridge Univ. Press, Cambridge, MA).
15. Aiello, W., Chung, F. & Lu, L. (2000) Proc. ACM STOC, 171-180.
16. Gkantsidis, C., Mihail, M. & Zegura, E. (2003) Proc. SIAM ALENEX, 16-25.
17. Chung, F. & Lu, L. (2003) Internet Math. 1, 91-113.
18. Newman, M. E. J. (2002) Phys. Rev. Lett. 89, 208701.
19. Erdős, P. & Rényi, A. (1959) Publ. Math. (Debrecen) 9, 290-297.
20. Newman, M. E. J. (2005) arXiv:cond-mat/0412004.
21. Govindan, R. & Tangmunarunkit, H. (2000) Proc. IEEE INFOCOM 3, 1371-1380.
22. Spring, N., Mahajan, R. & Wetherall, D. (2002) Proc. ACM SIGCOMM, 133-145.
23. Alderson, D. (2004) Technological and Economic Drivers and Constraints in the Internet's “Last Mile” (California Institute of Technology, Pasadena), Technical Report CIT-CDS-04-004.
24. Willinger, W., Alderson, D., Doyle, J. C. & Li, L. (2004) Proc. ACM SIGCOMM IMC, 88-100.
25. Zhang, Y., Roughan, M., Lund, C. & Donoho, D. (2003) Proc. ACM SIGCOMM, 301-312.
26. Clark, D. D. (1988) ACM Comput. Commun. Rev. 18, 106-114.
27. Willinger, W. & Doyle, J. C. (2004) in Robust Design: A Repertoire of Biological, Ecological, and Engineering Case Studies, ed. Jen, E. (Oxford Univ. Press, New York).
28. Bank, D. & Richmond, R. (July 18, 2005) Wall Street Journal, Section R, p. 1.
29. Jeong, H., Tombor, B., Albert, R., Oltvai, Z. N. & Barabási, A.-L. (2000) Nature 407, 651-654.
30. Ravasz, E., Somera, A. L., Mongru, D. A., Oltvai, Z. N. & Barabási, A.-L. (2002) Science 297, 1551-1555.
31. Csete, M. & Doyle, J. (2004) Trends Biotechnol. 22, 446-450.
