Intel Atom Processor Architecture

Up to and including the Intel Atom Clover Trail platform, the Intel Atom processor is based on a microarchitecture code-named Saltwell, which uses a two-issue-wide, in-order pipeline and supports Intel Hyper-Threading Technology. The microarchitecture is shown in Figure 2-1.

Figure 2-1. Intel Atom architecture

The front-end area is an optimized pipeline, including

• 32 KB, 8-way set-associative L1 instruction cache

• Branch-prediction unit and instruction translation look-aside buffer (ITLB)

• Two instruction decoders, each of which decodes at most two instructions per cycle

In each cycle, the front end can send at most two instructions to the instruction queue for scheduling. Also in each cycle, the scheduler can dispatch at most two instructions to the integer or SIMD/floating-point execution area through the two issue ports. (Single instruction, multiple data (SIMD) is introduced in the next section.)
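Because the pipeline is two-issue and in order, code with two independent dependence chains gives the front end a pair of instructions it can dispatch in the same cycle, one to each port. The sketch below illustrates the idea with a hypothetical array-summing helper (the function name and the two-accumulator structure are illustrative assumptions, not anything prescribed by the architecture):

```c
#include <stddef.h>

/* Illustrative sketch: summing with two independent accumulators.
 * On a two-issue in-order core such as Saltwell, the two adds in the
 * loop body have no data dependence on each other, so the scheduler
 * can dispatch one to each execution port in the same cycle instead
 * of serializing a single accumulator chain. */
long sum_two_accumulators(const int *a, size_t n) {
    long s0 = 0, s1 = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += a[i];      /* independent chain 1 */
        s1 += a[i + 1];  /* independent chain 2 */
    }
    for (; i < n; i++)   /* odd tail element, if any */
        s0 += a[i];
    return s0 + s1;
}
```

Whether the compiler preserves this exact instruction pairing depends on its scheduling; the point is that independent chains give an in-order machine something to dual-issue.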

The ports for the integer or SIMD/floating-point areas have the following binding features:

Integer execution area

1. Port 0: Arithmetic logic unit 0 (ALU0), shift/rotate unit, and load/store unit.

2. Port 1: Arithmetic logic unit 1, bit-processing unit, jump unit, and LEA.

3. Effective load-to-use latency of zero cycles.

SIMD/floating-point execution area

4. Port 0: SIMD arithmetic logic unit, shuffle unit, SIMD/floating-point multiplication unit, and division unit.

5. Port 1: SIMD arithmetic logic unit and floating-point adder.

6. In the SIMD/floating-point execution area, the SIMD arithmetic logic unit and shuffle unit are 128 bits wide, but 64-bit integer SIMD calculations are limited to port 0.

7. The floating-point adder can perform Add packed single-precision (ADDPS)/Subtract packed single-precision (SUBPS) operations in the 128-bit data path, whereas other floating-point addition operations are performed in the 64-bit data path.

8. The safe instruction recognition algorithm for floating-point/SIMD operations lets newer, shorter integer arithmetic instructions execute directly, without waiting for older floating-point/SIMD instructions (which may raise an exception).

9. The floating-point multiplication pipeline also supports memory loads.

10. Floating-point addition instructions with a load/store reference are dispatched through both ports.
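The ADDPS/SUBPS instructions mentioned in item 7 are what the SSE packed single-precision intrinsics compile to. The sketch below (a hypothetical helper, not from the original text) shows the 128-bit operations that take the full-width data path on the floating-point adder:

```c
#include <immintrin.h>

/* Sketch: _mm_add_ps and _mm_sub_ps compile to ADDPS and SUBPS,
 * the packed single-precision operations that the floating-point
 * adder handles on its full 128-bit data path. Other FP additions
 * (e.g. double precision) go through the 64-bit path. */
void add_sub_ps(const float a[4], const float b[4],
                float sum[4], float diff[4]) {
    __m128 va = _mm_loadu_ps(a);              /* load 4 floats */
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(sum,  _mm_add_ps(va, vb));  /* ADDPS */
    _mm_storeu_ps(diff, _mm_sub_ps(va, vb));  /* SUBPS */
}
```

One ADDPS performs four single-precision additions at once, which is why keeping work in the 128-bit packed form matters on this adder.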

The instruction queue is statically partitioned so that it can schedule instructions from the two threads. In each cycle, the scheduler can select an instruction from either thread and assign it to port 0 or port 1 for execution. The hardware interleaves prefetch/decode/dispatch between the two threads and chooses which thread executes next based on each thread's readiness.
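From software's point of view, the two hardware threads of a Hyper-Threading core simply look like two logical processors. The sketch below (function names are illustrative assumptions) runs two software threads whose instructions, if scheduled onto the same physical core, would share the statically partitioned instruction queue described above:

```c
#include <pthread.h>

/* Illustration only: two software threads doing independent work.
 * On a Hyper-Threading core, the hardware would interleave
 * fetch/decode/dispatch for the two threads based on readiness;
 * each thread still sees its own architectural state. */
static void *accumulate(void *arg) {
    long *slot = (long *)arg;
    long total = 0;
    for (long i = 1; i <= 1000; i++)
        total += i;            /* independent per-thread work */
    *slot = total;
    return NULL;
}

long run_two_threads(void) {
    pthread_t t0, t1;
    long r0 = 0, r1 = 0;
    pthread_create(&t0, NULL, accumulate, &r0);
    pthread_create(&t1, NULL, accumulate, &r1);
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    return r0 + r1;
}
```

Whether the two threads actually land on the same physical core is up to the OS scheduler; affinity APIs would be needed to force it, which this sketch does not attempt.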

Silvermont: Next-Generation Microarchitecture

Intel's Silvermont microarchitecture was designed and co-optimized with Intel's 22 nm SoC process using 3D tri-gate transistors. By taking advantage of this industry-leading technology, Silvermont microarchitecture includes

• A new out-of-order execution engine that enables best-in-class, single-threaded performance.

• A new multi-core and system fabric architecture, scalable up to eight cores, enabling greater performance through higher bandwidth, lower latency, and more efficient out-of-order support for a more balanced and responsive system.

• New Intel architecture instructions and technologies bringing enhanced performance, virtualization, and security management capabilities to support a wide range of products. These instructions build on Intel's existing support for 64-bit and the breadth of the Intel architecture software installed base.

• Enhanced power-management capabilities, including a new intelligent burst technology, low-power C states, and a wider dynamic range of operation that takes advantage of Intel's 3D transistors. Intel Burst Technology 2.0 support for single- and multi-core operation offers great responsiveness scaled for power efficiency.

The microarchitecture is shown in Figure 2-2.

Figure 2-2. Silvermont microarchitecture

Silvermont provides the following benefits and features:

High performance without sacrificing power efficiency: Out-of-order execution pipeline, macro-operation execution pipeline with improved instruction latencies and throughput, and smart pipeline resource management

Power and performance: Efficient branch processing, accurate branch predictors, and fast-recover pipeline

Faster and more efficient access to memory: Low-latency, high-bandwidth caches, out-of-order memory transactions, multiple advanced hardware prefetchers, and balanced core and memory subsystems
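Hardware prefetchers handle regular access patterns on their own, but software can also issue prefetch hints. The sketch below uses the `_mm_prefetch` intrinsic; the lookahead distance of 16 elements is an arbitrary illustrative assumption, not a tuned value for Silvermont:

```c
#include <immintrin.h>
#include <stddef.h>

/* Sketch of software prefetching alongside hardware prefetchers:
 * _mm_prefetch with _MM_HINT_T0 emits a PREFETCHT0 hint asking for
 * a cache line a fixed distance ahead of the current position.
 * The distance (16 ints here) is an assumption for illustration. */
long sum_with_prefetch(const int *a, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + 16 < n)
            _mm_prefetch((const char *)&a[i + 16], _MM_HINT_T0);
        total += a[i];
    }
    return total;
}
```

For a simple sequential walk like this, the hardware stride prefetchers usually make the hint redundant; software prefetching pays off mainly for irregular but predictable patterns the hardware cannot detect.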
