Technical Highlights

CloudEngine DC switches are designed to support the development trend of DCNs over the next 2-10 years. This includes supporting the Ethernet 400GE standard, telemetry (a next-generation DC network monitoring technology), Layer 2 switching, Layer 3 routing, and other basic technologies in the TCP/IP protocol stack, as well as network management and O&M.

Evolution from 25GE and 100GE to 400GE

Ethernet has transitioned rapidly from supporting rates of 1 Mbit/s, 10 Mbit/s, and 100 Mbit/s (FE), to supporting rates of 1 Gbit/s (GE), 10 Gbit/s (10GE), 40 Gbit/s (40GE), and even 100 Gbit/s (100GE). Network traffic has grown at a rapid pace as services such as big data, smart city, mobile Internet, and cloud computing gain popularity and continue to develop. The increasing demand for bandwidth drives standards organizations to not only optimize current standards, but also push the limits of what is currently possible.

Figure 12.1 shows the physical networking architecture of mainstream DCNs. The spine-leaf architecture is generally used, with 10GE interfaces functioning as downlink interfaces for server access and 40GE interfaces functioning as uplink interfaces on leaf nodes.

To meet growing service requirements, backbone switches are transitioning to 100GE. Server performance can keep pace with these requirements: the network throughput of a mid-range x86 server can reach 20 Gbit/s, and that of a high-end server can reach 40 Gbit/s or even 80 Gbit/s, so the 10GE access link, rather than the server itself, becomes the bottleneck. The 25GE standard was developed to increase the access rate of DC servers accordingly.
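The arithmetic behind this access bottleneck is simple. A quick sketch, using the server throughput figures from the text (the function name is illustrative, not an API):

```python
import math

def nics_required(server_throughput_gbps: float, nic_rate_gbps: float) -> int:
    """Number of NICs of a given rate needed to carry a server's peak throughput."""
    return math.ceil(server_throughput_gbps / nic_rate_gbps)

# Mid-range (~20 Gbit/s) and high-end (~40-80 Gbit/s) x86 servers, per the text.
for server_gbps in (20, 40, 80):
    n10 = nics_required(server_gbps, 10)   # 10GE access
    n25 = nics_required(server_gbps, 25)   # 25GE access
    print(f"{server_gbps} Gbit/s server: {n10} x 10GE ports vs {n25} x 25GE ports")
```

A 20 Gbit/s server already needs two 10GE ports but only one 25GE port, which is the cabling and port-density saving the rest of this section quantifies.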


FIGURE 12.1 Typical network deployment in a DC.

1. 25GE standard

At the IEEE conference held in Beijing in 2014, Microsoft first proposed the 25GE project initiation request for specific TOR-to-server interconnection scenarios. The request was rejected on the assumption that 25GE would fragment industry investment and hinder industry development. Yet the 25GE standard can resolve a series of issues, such as CPU performance matching and Peripheral Component Interconnect Express (PCIe) link width matching, caused by the transition of server NICs from 10 Gbit/s to 40 Gbit/s. Vendors including Microsoft and Qualcomm therefore established the 25GE Ethernet Consortium and invested in 25GE research. To avoid 25GE becoming merely a de facto standard, the IEEE approved the 25GE project in July 2014. Network development has since proven that the 25GE standard can economically and efficiently expand network bandwidth, provide better support for next-generation servers and storage solutions, and cover more interconnection scenarios in the future.

The 25GE standard will be mainly applied to server access in next-generation DCs. The reasons for using 25 Gbit/s instead of 40 Gbit/s are as follows:

1. Advantages in technology implementation

Components of the Cloud DCN Solution ■ 431

Serializer/deserializer (SerDes) technology is essential to 25GE and has been widely used in various circuit and optical fiber communication technologies. For example, SerDes is commonly used in high-speed communication scenarios such as the interconnection between the PCIe bus in a computer and the NIC, and inside switch chips. The serializer receives and serializes transmitted data, and the deserializer at the receiver reconstructs the serial bit stream and converts it into the required data. The number of SerDes connections required by a switch port is called the number of lanes.

SerDes has been developed over many years, and its rate can now reach 25 Gbit/s. When traffic is transmitted from one 25 Gbit/s NIC to another through a switch, only one 25 Gbit/s SerDes lane is required for each end-to-end connection. By contrast, a 40GE interface uses the Quad Small Form-factor Pluggable (QSFP) form factor and is constructed of four parallel 10GE links, each running on its own SerDes lane, requiring four lanes in total.

At the aggregation and backbone layers, 100GE interfaces have gradually become mainstream. Mature IEEE 100 Gbit/s Ethernet standards have been developed and are implemented by running four 25 Gbit/s lanes (in compliance with IEEE 802.3bj) over four optical fibers or copper cables. This lays the foundation for the 25GE standard. A 100GE interface, which has a significant speed advantage over a 40GE interface, can be connected to four remote 25GE interfaces through a QSFP28-to-SFP28 1-to-4 breakout cable.
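The lane arithmetic above can be sketched in a few lines (a minimal illustration; the function name is an assumption, not an API):

```python
def serdes_lanes(port_rate_gbps: int, lane_rate_gbps: int) -> int:
    """Number of SerDes lanes a port needs at a given per-lane rate."""
    assert port_rate_gbps % lane_rate_gbps == 0
    return port_rate_gbps // lane_rate_gbps

print(serdes_lanes(40, 10))    # 40GE QSFP+: 4 x 10 Gbit/s lanes
print(serdes_lanes(25, 25))    # 25GE SFP28: a single 25 Gbit/s lane
print(serdes_lanes(100, 25))   # 100GE QSFP28: 4 x 25 Gbit/s lanes (802.3bj)

# Because 100GE is built from four 25 Gbit/s lanes, one 100GE port can
# break out into four independent 25GE links via a QSFP28-to-SFP28 cable.
breakout_links = serdes_lanes(100, 25)
print(f"1 x 100GE -> {breakout_links} x 25GE")
```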

2. Improved switch performance

Compared with the existing 10GE solution, the single-lane 25GE solution improves performance by 2.5 times. Additionally, 25GE gives switches higher port density than 40GE for rack server connections, as shown in Table 12.1.

TABLE 12.1 Technical Specifications Comparison between the 25GE and 40GE Standards






Factor Type                 25GE                    40GE
Number of SerDes lanes      1                       4
Number of Optical Fibers    2 (LC duplex)           8 in use (12-core MPO)

3. Smooth evolution of the existing topology, reducing costs

Capital expenditure (CapEx) is a key factor that needs to be considered when a new DC technology is used. One of the most important aspects of DC deployment is cabling, which can be a complex and difficult part of DC management.

  • 40GE switches use QSFP+ optical modules. For connections inside a cabinet or between adjacent cabinets, QSFP+ DAC cables can be used. For connections spanning longer distances, MPO cables must be used. In most cases, 12-core optical fibers are used for QSFP+ optical modules. Compared with 10GE interfaces that use 2-core LC-connector optical fibers, 40GE interfaces have much higher costs and are incompatible with 10GE interfaces. If 10GE interfaces are upgraded to 40GE ones, the existing optical fibers must be replaced with MPO cables.
  • 25GE switches use SFP28 optical modules, which are similar to 10GE SFP optical modules in that they use single-lane connections and are compatible with optical fibers using LC connectors. 10GE can be seamlessly upgraded to 25GE, removing the need to re-plan the topology or redeploy cables. (Seamless upgrade is not possible from 10GE to 40GE.) After switches are upgraded or have 25GE optical modules installed, the switches require no further configuration, saving labor resources.

The 25GE standard can use the existing 10GE topology for smooth evolution. This not only reduces the need to purchase new TOR switches and cables, but also reduces the requirements for power supply, heat dissipation, and space when compared with 40GE for rack server connections.

As mentioned earlier, mainstream DCNs typically use the 10GE/40GE network architecture. To support large-scale deployment of services such as AI computing and cloud DCNs, the next-generation DCN architecture is evolving to the 25GE/100GE network architecture. Some Internet giants have already implemented large-scale deployment of 25GE/100GE networks, redefining the edge of DCNs. It is possible that 25GE/100GE will become the mainstream trend before long.

2. 100GE standard

To understand the 100GE standard, it is necessary to know about the standardization of optical modules. Initially, each vendor used its own form factor and dimensions for optical modules, as no industry standard yet existed. The IEEE 802.3 Working Group played an important role in standardizing optical modules, and the Multi Source Agreement (MSA), which is an industry alliance, defined the unified optical module structure (including the form factor type, dimensions, and pin assignment) based on the IEEE standards.

The IEEE established a working group to study the next-generation 100GE standard and subsequently released multiple 100GE standards. Together, the IEEE and MSA have defined more than ten 100GE standards, covering 100GE uplink scenarios that span different distances. Table 12.2 compares these standards.

The 100GBASE series standards are formulated by IEEE 802.3. Figure 12.2 shows the naming conventions, and Table 12.3 describes them.

The transmission features of optical fibers and the manufacturing costs of optical modules determine the application of these

TABLE 12.2 Comparison of Applications of 100GE Optical Module Standards




Standard        Optical Fiber Type                                               Transmission Distance
100GBASE-SR10   Multi-mode fiber, 850 nm center wavelength                       OM2: 30 m; OM3: 100 m; OM4: 150 m
100GBASE-LR4    Single-mode fiber, 1295.56-1309.14 nm center wavelengths         Single-mode (G.652) fiber: 10 km
100GBASE-ER4    Single-mode fiber, 1295.56-1309.14 nm center wavelengths         Single-mode (G.652) fiber: 40 km
100GBASE-SR4    Multi-mode fiber, 850 nm center wavelength (8/12-core MPO)       OM3: 70 m; OM4: 100 m
100G PSM4       Single-mode fiber, 1310 nm center wavelength                     Single-mode (G.652) fiber: 500 m
100G CWDM4      Single-mode fiber, 1310 nm center wavelength                     Single-mode (G.652) fiber: 2 km


FIGURE 12.2 Naming conventions of 100GBASE series standards. TABLE 12.3 Naming Conventions of 100GBASE Series Standards






Field             Value  Description
Rate standard     100    In this example, the value is 100 Gbit/s
PMD type                 Indicates the transmission distance
                  K      Backplane; signal transmission between backplanes, at the 10 cm level
                  C      Copper; high-speed cable connections, at the meter level
                  S      Short-distance transmission, often over multi-mode optical fibers, at the 10 m level
                  D      Transmission distance of 500 m
                  F      Transmission distance of 2 km
                  L      Long-distance transmission; 10 km
                  E      Extended transmission; 40 km
                  Z      Transmission distance of 80 km
Number of lanes          Number of SerDes lanes occupied by 100GE
                  4      Four SerDes lanes (4 x 25 Gbit/s)
                  10     Ten SerDes lanes (10 x 10 Gbit/s)

standards in different scenarios. For example, multi-mode optical fibers are commonly used for short-distance transmission, and single-mode optical fibers are commonly used for long-distance transmission. In DCs, the IEEE 100GBASE series standards can support both long- and short-distance transmission. Among these standards, 100GBASE-SR4 and 100GBASE-LR4 are used most often. However, in most DC interconnection scenarios, the transmission distance supported by 100GBASE-SR4 is too short, and the cost of

100GBASE-LR4 is too high. For medium-distance transmission, the Parallel Single Mode 4-channel (PSM4) and Coarse Wavelength Division Multiplexing 4-lane (CWDM4) standards proposed by the MSA are a cost-effective option.

CWDM4 multiplexes four parallel 25 Gbit/s lanes onto a 100 Gbit/s fiber link by using optical multiplexer (MUX) and demultiplexer (DEMUX) components. This is similar to LR4. The differences are as follows:

1. Different channel spacings

CWDM4 defines a channel spacing of 20 nm, whereas LR4 defines a LAN-WDM spacing of 4.5 nm. A larger channel spacing results in lower requirements for optical components and lower costs.

2. Different lasers

CWDM4 uses a directly modulated laser (DML), whereas LR4 uses an electro-absorption modulated laser (EML), which is composed of a DML and an electro-absorption modulator (EAM).

3. Different temperature control requirements

The lasers used in LR4 require a Thermo Electric Cooler (TEC) driver chip because of the smaller channel spacing.

Due to the preceding differences, the cost of 100GBASE-LR4 optical modules is higher than that of 100G CWDM4 optical modules. In addition to CWDM4, PSM4 is another choice for medium-distance transmission. The 100G PSM4 specification defines requirements for a point-to-point 100 Gbit/s link over eight single-mode fibers (4 transmit and 4 receive), each transmitting at 25 Gbit/s.
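The spacing difference in item 1 can be made concrete. The sketch below compares the standard CWDM4 grid (1271/1291/1311/1331 nm) with the LR4 LAN-WDM wavelengths listed in Table 12.2; the helper name is illustrative:

```python
# CWDM4 uses a coarse 20 nm grid around 1310 nm; LR4 uses the dense
# LAN-WDM grid with roughly 4.5 nm spacing. All values in nm.
cwdm4 = [1271, 1291, 1311, 1331]                # CWDM4 center wavelengths
lan_wdm = [1295.56, 1300.05, 1304.58, 1309.14]  # LR4 center wavelengths

def spacings(grid):
    """Gaps between adjacent channels, rounded to 0.01 nm."""
    return [round(b - a, 2) for a, b in zip(grid, grid[1:])]

print(spacings(cwdm4))    # [20, 20, 20]
print(spacings(lan_wdm))  # [4.49, 4.53, 4.56] -- roughly 4.5 nm
```

The roughly 4x wider CWDM4 channels tolerate more laser wavelength drift, which is why CWDM4 lasers need neither tight temperature control nor the more expensive optics that LR4 requires.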

Because CWDM4 uses a wavelength division multiplexer, a CWDM4 optical module costs more than a PSM4 one. CWDM4, however, requires only two single-mode optical fibers to transmit and receive signals, whereas PSM4 requires eight. As a result, the total cost of a PSM4 link grows faster as the transmission distance increases.
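This trade-off (cheaper modules but four times the fiber for PSM4) can be modeled with a toy cost function. The prices below are hypothetical placeholders chosen only to make the crossover visible; they are not vendor figures:

```python
def link_cost(module_cost, fiber_count, fiber_cost_per_m, distance_m):
    """Total cost of one link: two modules plus the fiber run."""
    return 2 * module_cost + fiber_count * fiber_cost_per_m * distance_m

# Hypothetical prices for illustration only.
PSM4 = dict(module_cost=200, fiber_count=8)    # cheaper module, 8 fibers
CWDM4 = dict(module_cost=350, fiber_count=2)   # pricier module, 2 fibers
FIBER_COST = 0.5  # assumed cost per meter per fiber

for d in (50, 100, 300):
    psm4 = link_cost(PSM4["module_cost"], PSM4["fiber_count"], FIBER_COST, d)
    cwdm4 = link_cost(CWDM4["module_cost"], CWDM4["fiber_count"], FIBER_COST, d)
    print(f"{d} m: PSM4={psm4:.0f}, CWDM4={cwdm4:.0f}")
```

With these numbers PSM4 is cheaper below 100 m, the two break even at 100 m, and CWDM4 wins beyond that, mirroring the qualitative argument in the text.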

A complete optical module solution includes both the optical/electrical interface standards of optical modules and the matching form factor. Optical modules continue to develop toward higher rates, higher density, lower cost, and lower power consumption. For example, the C-form-factor pluggable (CFP) was the first type of module used, but as module integration improved and dimension restrictions became more important, CFP evolved to CFP2 and CFP4, and then to the popular QSFP28 (Table 12.4).

TABLE 12.4 Comparison of 100GE Optical Module Form Factors

Form Factor  Power Consumption (W)  Channel x Rate                   Characteristics
CFP          32                     10 x 10 Gbit/s or 4 x 25 Gbit/s  Large size, high power consumption, long transmission distance
CFP2         12                     4 x 25 Gbit/s                    Large size, high power consumption, long transmission distance
CFP4         -                      4 x 25 Gbit/s                    Medium size, low power consumption
QSFP28       3.5                    4 x 25 Gbit/s                    Small size, low power consumption

After several generations of development, 100GE optical modules have matured. New 100G MSA standards have also been established to support new technologies and development directions, and to promote sustainable development of the related industry chain. Achieving higher bandwidth and lower latency is a never-ending challenge. As related technologies have been introduced, the traditional analog optical transmission system has evolved to digital optical transmission, leveraging sophisticated chip technology and improved chip processing capabilities that make 400GE possible.

3. 400GE standard

The IEEE 802.3 Working Group is responsible for defining the 400GE standard. In 2013, the IEEE initiated the 400GE standard project and formed a study group to discuss the 400GE specifications. After numerous meetings and in-depth research, it officially released the 400GE and 200GE standards (IEEE 802.3bs). The key aspects of these standards are the hierarchical structure, FEC specifications, and physical optical interface transmission mechanism. Table 12.5 describes the transmission distance and encoding scheme of each 400GE standard.


TABLE 12.5 400GE Transmission Distance and Encoding Scheme


Standard        Transmission Distance   Encoding Scheme
400GBASE-SR16   100 m (multi-mode)      16 x 25 Gbit/s NRZ
400GBASE-DR4    500 m                   4 x 100 Gbit/s PAM4
400GBASE-FR8    2 km                    8 x 50 Gbit/s PAM4
400GBASE-LR8    10 km                   8 x 50 Gbit/s PAM4

The 400GBASE-DR4, 400GBASE-FR8, and 400GBASE-LR8 standards, which use PAM4 encoding, have become the focus of attention, while 400GBASE-SR16, a multi-mode standard, has received little attention. PAM4 differs from non-return-to-zero (NRZ) signaling, which was commonly used in the 100GE standards. A PAM4 signal uses four signal levels for transmission, so within each clock period it can carry 2 bits of logic information: 00, 01, 10, or 11. An NRZ signal, by contrast, uses two signal levels (high and low) to represent 1 and 0 in digital logic. Given the same baud rate, PAM4 therefore achieves double the transmission efficiency of NRZ. As a result of this higher efficiency, the IEEE defined PAM4 as the 400GE electrical signaling standard.
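A minimal sketch of the difference, mapping the same bit stream to NRZ and PAM4 symbols (the bit-to-level mapping shown is the natural binary one; real transceivers typically use Gray coding):

```python
# PAM4 maps 2 bits to one of four levels; NRZ maps 1 bit to one of two.
PAM4_LEVELS = {"00": 0, "01": 1, "10": 2, "11": 3}

def pam4_encode(bits: str):
    """Encode a bit string as PAM4 symbols (2 bits per symbol)."""
    assert len(bits) % 2 == 0
    return [PAM4_LEVELS[bits[i:i + 2]] for i in range(0, len(bits), 2)]

def nrz_encode(bits: str):
    """Encode a bit string as NRZ symbols (1 bit per symbol)."""
    return [int(b) for b in bits]

data = "1101001011100100"  # 16 bits
print(len(nrz_encode(data)))   # 16 symbols with NRZ
print(len(pam4_encode(data)))  # 8 symbols with PAM4 -- half the symbols,
                               # i.e. double the bits per symbol period
```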

Because the SerDes rate can reach 25 Gbit/s, and PAM4 modulation doubles the per-lane bit rate to 50 Gbit/s, IEEE 802.3 selected 50 Gbit/s-per-lane PAM4 as the encoding technology for 400GE and 200GE interfaces. Figure 12.3 shows the structure of a 400GE CFP8 optical module.

  • The 400GE MAC uses 400G AUI-16 interfaces (16 x 25 Gbit/s) to transmit data to the 400GE CFP8 optical module through the physical coding sublayer (PCS) and physical medium attachment (PMA) sublayer.
  • During data transmission, the PMA aggregates the 16 lanes into 8 lanes and converts the modulation to PAM4 at a symbol rate of 25 GBd per lane (a bit rate of 50 Gbit/s).
  • The data is then passed to the physical medium dependent (PMD) sublayer. The transmitter and receiver at the PMD implement electrical-to-optical and optical-to-electrical conversion, respectively.

FIGURE 12.3 400GE CFP8 optical module.

  • In the transmit direction, eight transmitters convert electrical signals into optical signals, and the eight wavelengths are then multiplexed onto one 400G link over a single-mode optical fiber.
  • In the receive direction, a demultiplexer separates the eight wavelengths on the optical fiber and sends them to eight receivers for optical-to-electrical conversion. The converted signals are then sent to the PMA.
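The 16-to-8 lane aggregation in the PMA step above can be sketched as pairwise combination of NRZ bit streams into PAM4 symbols. The lane pairing and bit-to-level mapping here are illustrative assumptions, not the exact IEEE 802.3bs bit muxing:

```python
def gearbox(nrz_lanes):
    """Combine adjacent 25 Gbit/s NRZ bit-lanes pairwise into 50 Gbit/s
    PAM4 symbol-lanes (16 lanes in -> 8 lanes out)."""
    assert len(nrz_lanes) % 2 == 0
    pam4_lanes = []
    for i in range(0, len(nrz_lanes), 2):
        msb, lsb = nrz_lanes[i], nrz_lanes[i + 1]
        # Each PAM4 symbol carries one bit from each NRZ lane of the pair.
        pam4_lanes.append([2 * a + b for a, b in zip(msb, lsb)])
    return pam4_lanes

lanes_in = [[1, 0, 1, 1]] * 16              # 16 lanes of NRZ bits
lanes_out = gearbox(lanes_in)
print(len(lanes_in), "->", len(lanes_out))  # 16 -> 8 lanes
```

Each output lane runs at the same symbol rate as its two inputs but carries both of their bit streams, which is exactly why the lane count halves while aggregate throughput is preserved.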

CFP8 optical modules are not well suited to DCNs because they are power-hungry and large. Instead, the Octal Small Form-factor Pluggable (OSFP) and QSFP-Double Density (QSFP-DD) form factors are used. Table 12.6 compares these modules.

400GE QSFP-DD and OSFP optical modules use CDAUI-8 electrical lanes with PAM4 signaling, and use either 56 Gbit/s laser technology (4 x 100 Gbit/s PAM4) or mature 28 Gbit/s laser technology (8 x 50 Gbit/s PAM4) for the optical-layer lanes.

TABLE 12.6 Comparison of 400GE Optical Module Form Factors

Form Factor  Electrical Interface Lanes       Optical Interface Lanes
CFP8         16 x 25 Gbit/s or 8 x 56 Gbit/s  8 x 56 Gbit/s
OSFP         8 x 56 Gbit/s                    8 x 56 Gbit/s
QSFP-DD      8 x 56 Gbit/s                    8 x 56 Gbit/s or 4 x 100 Gbit/s

At present, vendors tend to use 4 x 56 Gbit/s PAM4 for electrical components and existing 28 Gbit/s components for optical components. Although standardization of 200GE started later than that of 400GE, 200GE optical modules are easier to implement. In DC scenarios, it is therefore likely that 200GE optical modules will be used before 400GE optical modules, driving the maturity and large-scale development of 200GE optical modules. As more in-depth research is conducted on single-wavelength 100GE technologies, short-distance 400GE optical modules are expected to become more popular in DC scenarios.
