Internet Facts, Figures, Failure Examples, and Reliability-Associated Observations
Some of the important Internet facts, figures, and examples are as follows:
• In 2000, the Internet-related economy in the United States generated around $830 billion in revenues.
• From 2006 to 2011, developing countries around the globe increased their share of the world’s total number of Internet users from 44% to 62%.
• In 2011, over 2.1 billion people around the globe were using the Internet, and approximately 45% of them were below the age of 25 years.
• In 2001, there were 52,658 Internet-related incidents and failures.
• In 2000, the Internet carried 51% of the information flowing through two-way telecommunication, and by 2007 over 97% of all telecommunicated information was transmitted through the Internet.
• On August 14, 1998, a misconfigured main Internet database server wrongly referred all queries for Internet systems/machines with names ending in “net” to the incorrect secondary database server. As a result, most of the connections to “net” Internet web servers and other end stations malfunctioned for a number of hours.
• On April 25, 1997, a misconfigured router of a Virginia service provider injected an incorrect map into the global Internet, and the Internet providers who accepted this map automatically diverted all their traffic to the Virginia provider. This caused network congestion, instability, and overload of Internet router table memory that ultimately shut down many of the main Internet backbones for around two hours.
• On November 8, 1998, a malformed routing control message, caused by a software fault, triggered an interoperability problem between a number of core Internet backbone routers produced by different vendors. In turn, this caused a widespread loss of network connectivity, in addition to an increase in packet loss and latency. It took many hours for most of the backbone providers to overcome this outage.
A study in 1999 reported the following four Internet reliability-associated observations:
i. Most interprovider path malfunctions result from congestion collapse.
ii. Mean time to failure (MTTF) and mean time to repair (MTTR) for most of the Internet backbone paths are approximately 25 days or less and 20 minutes or less, respectively.
iii. In the Internet backbone infrastructure, only a minute fraction of network paths contributes disproportionately, directly or indirectly, to the number of long-term outages and backbone unavailability.
iv. Mean time to failure and availability of the Internet backbone structure are significantly lower than those of the Public Switched Telephone Network.
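Observations ii and iv can be related through the standard steady-state availability formula, availability = MTTF / (MTTF + MTTR). The following minimal Python sketch plugs in the bounds quoted in observation ii (25 days and 20 minutes) purely as illustrative inputs. Because the study reports both as "or less" values, the result is an illustration of the arithmetic, not a figure taken from the study. The function name is hypothetical.

```python
def steady_state_availability(mttf_minutes, mttr_minutes):
    """Standard steady-state availability: MTTF / (MTTF + MTTR)."""
    return mttf_minutes / (mttf_minutes + mttr_minutes)


# Illustrative inputs only: the bounds quoted in observation ii.
mttf = 25 * 24 * 60   # 25 days expressed in minutes (36,000)
mttr = 20             # 20 minutes

availability = steady_state_availability(mttf, mttr)

# Expected downtime per year implied by that availability, in minutes.
downtime_per_year = (1 - availability) * 365 * 24 * 60

print(f"availability = {availability:.5f}")
print(f"downtime per year = {downtime_per_year:.0f} minutes")
```

With these inputs, the availability is about 0.99944, which corresponds to roughly 292 minutes of downtime per year for such a path.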
Internet Outage Categories and an Approach for Automating Fault Detection in Internet Services
A case study of Internet-related outages, carried out over a period of one year, grouped the outages under 12 categories, as shown in Figure 6.3 (along with their occurrence percentages in parentheses).
FIGURE 6.3 Categories of Internet outages (along with their occurrence percentages in parentheses).
Past experience indicates that many Internet services (e.g., e-commerce and search engines) suffer faults, and quick detection of these faults could be an important factor in improving system availability. For this purpose, an approach called the pinpoint method is considered extremely useful. This method combines the easy deployability of low-level monitors with the ability of higher-level monitors to detect application-level faults. The method is based upon the following three assumptions with respect to the system under observation and its workload:
• The software is composed of a number of interconnected modules with clearly defined narrow interfaces, which could be software subsystems, objects, or simply physical node boundaries.
• There is a large number of basically independent requests (i.e., from different users).
• An interaction with the system is relatively short-lived, and its processing can be decomposed as a path or, more precisely, a tree of the names of elements/parts that participate in the servicing of that request.
The pinpoint method is a three-stage process, and its stages are as follows:
• Stage I: Observing the system. This is concerned with capturing the runtime path of each request served by the system and then, from these paths, extracting two specific low-level behaviors that are likely to reflect high-level functionality (i.e., interactions of parts/components and path shapes).
• Stage II: Learning the patterns in system behavior. This is concerned with constructing a reference model that represents the normal behavior of an application with regard to component interactions and path shapes. The model is developed under the assumption that most of the system functions normally most of the time.
• Stage III: Detecting anomalies in system behaviors. This is concerned with analyzing the ongoing behaviors of the system and detecting anomalies with respect to the reference model.
Additional information on this method is available in Kiciman and Fox.
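The three stages of the pinpoint method can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not the procedure described by Kiciman and Fox: request paths are assumed to be already captured as lists of component names (Stage I), the reference model is just the fraction of requests in which each component-to-component interaction appears (Stage II), and a path is flagged when it uses an interaction seen less often than a threshold (Stage III). The component names, function names, and the `min_freq` threshold are all hypothetical.

```python
from collections import Counter


def edges(path):
    """Return the caller-to-callee pairs along one request path."""
    return list(zip(path, path[1:]))


def learn_reference(paths):
    """Stage II: fraction of observed requests containing each interaction."""
    counts = Counter()
    for path in paths:
        counts.update(set(edges(path)))  # count each interaction once per request
    total = len(paths)
    return {edge: n / total for edge, n in counts.items()}


def is_anomalous(path, reference, min_freq=0.05):
    """Stage III: flag a path that uses an interaction rarely seen in normal operation."""
    return any(reference.get(edge, 0.0) < min_freq for edge in edges(path))


# Stage I (assumed already done): hypothetical paths captured during normal operation.
normal_paths = [["web", "app", "db"]] * 95 + [["web", "app", "cache"]] * 5
reference = learn_reference(normal_paths)

print(is_anomalous(["web", "app", "db"], reference))  # False: matches the reference model
print(is_anomalous(["web", "db"], reference))         # True: web-to-db was never observed
```

The sketch relies on the same assumption as Stage II, namely that most of the observed requests reflect normal behavior. It models only component interactions. Path shapes, the second low-level behavior extracted in Stage I, are not modeled here.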