Keeping the pipeline busy requires that the processor begin executing a second instruction before the
first has traveled completely through the pipeline. However, suppose a program has an instruction
that requires summing three numbers:
X = A + B + C
If the processor already has A and B stored in registers but must fetch C from memory, the
instruction cannot execute until the value of C arrives. The wait creates a "bubble," or stall, in the
pipeline. The bubble must move all the way through the pipeline, forcing each stage that contains it
to sit idle and wasting execution resources during that clock cycle. Clearly, the longer the pipeline,
the more significant this problem becomes.
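The effect of a bubble can be illustrated with a small cycle-by-cycle model. The sketch below is
not a model of any real processor: it assumes a simple in-order pipeline that accepts one
instruction per cycle, and the names simulate, stall_index, and stall_cycles are hypothetical. It
counts the total cycles and the number of stage slots left idle by bubbles.

```python
BUBBLE = "bubble"

def simulate(num_instructions, depth, stall_index, stall_cycles):
    """Return (total_cycles, wasted_stage_slots) for a toy in-order pipeline.

    Assumptions (illustrative only): one instruction enters per cycle, and
    instruction number stall_index waits stall_cycles cycles for an operand
    before it can enter, so a bubble enters the pipeline in its place.
    """
    stages = [None] * depth      # stages[0] is the first stage
    next_instr = 0
    stall_left = stall_cycles
    completed = 0
    cycles = 0
    wasted = 0
    while completed < num_instructions:
        if next_instr < num_instructions:
            if next_instr == stall_index and stall_left > 0:
                entering = BUBBLE            # operand not ready yet
                stall_left -= 1
            else:
                entering = next_instr
                next_instr += 1
        else:
            entering = None                  # nothing left to issue
        # Everything in the pipeline advances by one stage.
        stages = [entering] + stages[:-1]
        cycles += 1
        wasted += stages.count(BUBBLE)       # each bubble idles one stage
        if isinstance(stages[-1], int):
            completed += 1                   # instruction reached the last stage
    return cycles, wasted

print(simulate(10, 5, 3, 4))    # (18, 20)
print(simulate(10, 20, 3, 4))   # (33, 80)
```

In this model the same four-cycle wait idles four times as many stage slots in the 20-stage
pipeline as in the 5-stage pipeline, which is the sense in which a longer pipeline makes the
problem more significant.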
Processor stalls often occur as a result of one instruction being dependent on another. If the program
has a branch, such as an IF–THEN statement, the processor has two options. The processor either waits
for the instruction that determines the branch condition to finish (stalling the pipeline) before
deciding which program branch to take, or it predicts which branch the program will follow.
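To make the idea of prediction concrete, the sketch below implements a two-bit saturating counter,
a common textbook prediction scheme. It is offered only as an illustration and is not the NetBurst
algorithm, which this text does not describe. The class and function names, and the choice to start
each counter in the "weakly not taken" state, are assumptions.

```python
class TwoBitPredictor:
    """Textbook two-bit saturating counter, one counter per branch address.

    Counter values 0 and 1 predict "not taken"; 2 and 3 predict "taken".
    """

    def __init__(self):
        self.counters = {}   # branch address -> counter value 0..3

    def predict(self, address):
        # Assumption: unseen branches start at 1 (weakly not taken).
        return self.counters.get(address, 1) >= 2

    def update(self, address, taken):
        state = self.counters.get(address, 1)
        if taken:
            state = min(state + 1, 3)
        else:
            state = max(state - 1, 0)
        self.counters[address] = state


def count_mispredicts(predictor, address, outcomes):
    """Feed actual branch outcomes to the predictor and count wrong guesses."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(address) != taken:
            misses += 1
        predictor.update(address, taken)
    return misses


# A branch that is taken nine times and then falls through, executed twice.
predictor = TwoBitPredictor()
pattern = [True] * 9 + [False]
print(count_mispredicts(predictor, 0x400, pattern))   # 2
print(count_mispredicts(predictor, 0x400, pattern))   # 1
```

Because the counter needs two consecutive wrong guesses to change its prediction, the single
fall-through at the end of the pattern costs one mispredict per pass once the counter has warmed up.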
If the processor predicts the wrong code branch, it must flush the pipeline and start over again with
the IF–THEN statement using the correct branch. The longer the pipeline, the higher the performance
cost of a branch mispredict, because a longer pipeline holds more speculatively executed instructions
that must be discarded when the mispredict occurs. To reduce this cost, the NetBurst design included
an improved branch-prediction algorithm aided by a large branch target array that stored branch
predictions.
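The relationship between pipeline length and mispredict cost can be estimated with a simple
cycles-per-instruction (CPI) calculation. The sketch below is a rough model, not a measurement: it
assumes that every mispredict costs a number of cycles equal to the pipeline depth, and the function
name, parameter names, and sample values are hypothetical.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, pipeline_depth):
    """Estimate average cycles per instruction including mispredict flushes.

    Assumption: each mispredicted branch wastes pipeline_depth cycles.
    A real penalty depends on the stage at which the branch is resolved.
    """
    flush_penalty = pipeline_depth
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty


# Same program and same predictor accuracy, two different pipeline depths.
shallow = effective_cpi(1.0, branch_fraction=0.2, mispredict_rate=0.1, pipeline_depth=10)
deep = effective_cpi(1.0, branch_fraction=0.2, mispredict_rate=0.1, pipeline_depth=20)
print(f"10-stage pipeline: {shallow:.2f} CPI")
print(f"20-stage pipeline: {deep:.2f} CPI")
```

Under these assumptions, doubling the pipeline depth doubles the cycles lost to mispredicts, so a
deeper pipeline benefits more from each improvement in prediction accuracy.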
Hyper-Threading Technology
Intel Hyper-Threading (HT) Technology is a design enhancement for server environments. It takes
advantage of the fact that, according to Intel estimates, the utilization rate for the execution units in a
NetBurst processor is typically only about 35 percent. To improve the utilization rate, HT Technology
adds Multi-Thread-Level Parallelism (MTLP) to the design. In essence, MTLP means that the core
receives two instruction streams from the operating system (OS) to take advantage of idle cycles on
the execution units of the processor. For one physical processor to appear as two distinct processors
to the OS, the design replicates the pieces of the processor with which the OS interacts to create two
logical processors in one package. These replicated components include the instruction pointer, the
interrupt controller, and the general-purpose registers, all of which are collectively referred to as
the Architectural State, or AS (see Figure 5).
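The following toy model illustrates how a second instruction stream can fill idle cycles on shared
execution resources. It is a sketch under strong assumptions, not a description of Intel's
implementation: the core has a single execution slot, each stream is a list of "op" and "stall"
entries, the first stream always has priority, and the names ArchitecturalState and run_core are
hypothetical. Only the architectural state is replicated per stream; the execution slot is shared.

```python
from dataclasses import dataclass, field

@dataclass
class ArchitecturalState:
    """Per-logical-processor state visible to the OS (greatly simplified)."""
    instruction_pointer: int = 0
    registers: dict = field(default_factory=dict)   # unused in this sketch

def run_core(streams):
    """Run one or more instruction streams on a core with one execution slot.

    Each stream is a list of "op" (needs the execution slot for one cycle)
    or "stall" (waiting, for example on memory; needs no execution slot).
    Returns (total_cycles, busy_cycles).
    """
    states = [ArchitecturalState() for _ in streams]
    cycles = 0
    busy = 0
    while any(s.instruction_pointer < len(p) for s, p in zip(states, streams)):
        cycles += 1
        issued = False
        for state, program in zip(states, streams):
            if state.instruction_pointer >= len(program):
                continue                        # this stream has finished
            if program[state.instruction_pointer] == "stall":
                state.instruction_pointer += 1  # one cycle of waiting elapses
            elif not issued:
                state.instruction_pointer += 1  # this stream gets the slot
                issued = True
            # Otherwise the slot is taken; the stream retries next cycle.
        busy += 1 if issued else 0
    return cycles, busy

stream = ["op", "stall", "stall"] * 4
print(run_core([stream]))            # (12, 4)
print(run_core([stream, stream]))    # (13, 8)
```

With one stream the execution slot is busy in 4 of 12 cycles. With two streams it is busy in 8 of
13 cycles, because one stream's operations fall into the other's stall cycles.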
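Figure 5. Hyper-Threading Technology
[Diagram with two panels. Left panel, "IA-32 Processor with Hyper-Threading Technology": two
architectural states (AS1 and AS2) share one processor core and appear as two logical processors on
the system bus. Right panel, "Traditional Dual-processor (D) System": two processors, each with its
own AS and processor core, share the system bus.]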