<architecture> A sequence of functional units ("stages") which performs a task in several steps, like an assembly line in a factory. Each functional unit takes inputs and produces outputs which are stored in its output buffer. One stage's output buffer is the next stage's input buffer. This arrangement allows all the stages to work in parallel thus giving greater throughput than if each input had to pass through the whole pipeline before the next input could enter.
The costs are greater latency and complexity due to the need to synchronise the stages in some way so that different inputs do not interfere. The pipeline will only work at full efficiency if it can be filled and emptied at the same rate that it can process.
Pipelines may be synchronous or asynchronous. A synchronous pipeline has a master clock and each stage must complete its work within one cycle. The minimum clock period is thus determined by the slowest stage. An asynchronous pipeline requires handshaking between stages so that a new output is not written to the interstage buffer before the previous one has been used.
Many CPUs are arranged as one or more pipelines, with different stages performing tasks such as fetch instruction, decode instruction, fetch arguments, arithmetic operations, store results. For maximum performance, these rely on a continuous stream of instructions fetched from sequential locations in memory. Pipelining is often combined with instruction prefetch in an attempt to keep the pipeline busy.
When a branch is taken, the contents of early stages will contain instructions from locations after the branch which should not be executed. The pipeline then has to be flushed and reloaded. This is known as a pipeline break.