X86

2008-06-12 09:36:29

x86 architecture
From Wikipedia, the free encyclopedia

Intel Core 2 Duo, an example of a state-of-the-art x86 compatible multi-core processor.
AMD Athlon (early version), another technically different, but fully compatible x86 implementation.
The generic term x86 refers to the instruction set of the most commercially successful CPU architecture in the history of personal computing.[1] It is used in processors from Intel, AMD, VIA, and others. The name derives from the model numbers of the first few processor generations that were backward compatible with Intel's original 16-bit 8086 CPU, most of which ended in "86".[2] Since then, many extensions have been added to the x86 instruction set, almost always with full backward compatibility.

As the x86 term became common after the introduction of the 80386, it usually implies a binary compatibility with the 32-bit instruction set of the 80386. This may sometimes be emphasized as x86-32 to distinguish it either from the original 16-bit x86-16 or from the newer 64-bit x86-64 (also called x64).[3]

Modern x86-hardware usually has 64-bit capabilities, at least in personal computers and servers. However, to avoid compatibility problems, x86-software usually implies only 32-bit, with the term x86-64 or x64 reserved to denote 64-bit software.[4][5]

Today, the x86 architecture is ubiquitous among desktop and notebook processors, as well as a growing majority among servers and workstations. With the exception of AMD's Geode CPU and Intel's new Silverthorne CPU, the x86 processor architecture is generally uncommon in embedded systems, and niches such as appliances and toys lack any significant x86 presence.[6] A large amount of computer software supports the platform, including operating systems such as MS-DOS, Windows, Linux, BSD, Solaris, and Mac OS X.

Contrary to popular belief, x86 architecture is not completely synonymous with IBM PC compatibility, as IBM PC compatibles use a multitude of other hardware in order to work properly, albeit with some of it standardized. For instance, the original Xbox game console used an x86 chip, but its hardware was different from the typical PC, and the Xbox controllers did not conform to keyboard or joystick scancode standards. Also, the GRID Compass laptop (one of the first on the market) used an x86 chip before the IBM PC compatible market even started.
Contents

    * 1 Chronology
    * 2 History
    * 3 Overview
          o 3.1 Basic properties of the architecture
          o 3.2 Current implementations
    * 4 Segmentation
    * 5 Addressing modes
    * 6 x86 registers
          o 6.1 16-bit
          o 6.2 32-bit
          o 6.3 64-bit
          o 6.4 Miscellaneous/Special Purpose
          o 6.5 Purpose
          o 6.6 Structure
    * 7 Operating modes
          o 7.1 Real mode
          o 7.2 Protected mode
                + 7.2.1 Virtual 8086 mode
          o 7.3 64-bit Long mode
    * 8 Extensions
          o 8.1 Floating point unit
          o 8.2 MMX
          o 8.3 3DNow!
          o 8.4 SSE
          o 8.5 Physical Address Extension (PAE)
    * 9 Virtualization
    * 10 See also
    * 11 Footnotes
    * 12 References
    * 13 External links

Chronology

The table below lists brands of common[7] consumer-targeted processors implementing the x86 instruction set, grouped by generations that highlight important points in x86 history. Note: CPU generations are not strict; each generation is roughly marked by significantly improved or commercially successful processor microarchitecture designs.
Generation     First introduced     Prominent Consumer CPU brands     linear / physical address space     Notable (new) features
1 (IA-16)     1978     Intel 8086, Intel 8088     16-bit / 20-bit (segmented)     first x86 microprocessors
2     1982     Intel 80186, Intel 80188, NEC V20     see above     hardware for fast address calculations, fast mul/div etc
2     1982     Intel 80286     16-bit (30-bit virtual) / 24-bit (segmented)     MMU, for protected mode and a larger address space
3 (IA-32)     1985     Intel386, AMD Am386     32-bit (46-bit virtual) / 32-bit     32-bit instruction set, MMU with paging
4     1989     Intel486     see above     RISC-like pipelining, integrated FPU, on-chip cache
5     1993     Pentium, Pentium MMX     see above     superscalar, 64-bit databus, faster FPU, MMX
5/6     1996     Cyrix 6x86, Cyrix MII     see above     register renaming, speculative execution
6     1995     Pentium Pro, AMD K5     see above / 36-bit physical (PAE)     μ-op translation, PAE (not K5), integrated L2 cache (not K5)
6     1997     AMD K6/-2/3, Pentium II/III     see above     L3-cache support, 3DNow!, SSE
7     1999     Athlon, Athlon XP     see above     superscalar FPU, wide design (up to three x86 instr./clock)
7     2000     Pentium 4     see above     deeply pipelined, high frequency, SSE2, hyper-threading
6/7-M     2003     Pentium M     see above     optimized for low power
8 (x86-64)     2003     Athlon 64     64-bit / 40-bit physical in first impl.     x86-64 instruction set, on-die memory controller, HyperTransport
8     2004     Prescott     see above     very deeply pipelined, very high frequency, SSE3
9     2006     Intel Core, Intel Core 2     see above (some are 32-bit only)     low power, multi-core, lower clock frequency, SSE4
10     2007-2008     AMD Phenom     see above     monolithic quad-core, 128-bit FPUs, SSE4a, HyperTransport 3, native memory controller, on-die L3 cache

History

The x86 architecture first appeared as the Intel 8086 CPU released in 1978, a fully 16-bit design based on the earlier Intel 8085. Although not binary compatible, it was designed to allow assembly language programs written for the 8085 to be mechanically translated into equivalent 8086 assembly. This made the new processor a tempting migration path for 8085 hardware and software vendors, but, mainly due to a wider data bus, not without significant redesign of system hardware. To address this, Intel introduced the almost identical, but externally 8-bit, 8088, which permitted simpler printed circuit boards, demanded fewer (1-bit wide) DRAM chips, and could more easily be interfaced to already established (i.e. low-cost) 8-bit system and peripheral chips. Together with other, non-technical factors, this contributed to IBM building their IBM PC around the 8088, despite the presence of (at the time) better 16-bit microprocessors from Motorola, Zilog, and National Semiconductor. Subsequently, the IBM PC became the dominant personal computer platform, and the 8088 and its successors became the dominant CPU architecture for desktop and laptop computers.

At various times, companies such as IBM, NEC, AMD, TI, STM, Fujitsu, OKI, Siemens, Cyrix, Intersil, C&T, NexGen, and UMC started to design and/or manufacture x86 processors intended for personal computers as well as embedded systems. Such x86 implementations are seldom plain copies but often employ different internal microarchitectures as well as different solutions at the electronic and physical levels. Quite naturally, early compatible chips were 16-bit, while 32-bit designs appeared much later. For the personal computer market, real quantities started to appear around 1990 with i386 and i486 compatible processors, often named similarly to Intel's original chips. Other companies, which designed or manufactured x86 or x87 processors, include ITT Corporation, National Semiconductor, ULSI System Technology, and Weitek.

Following the fully pipelined i486, Intel introduced the Pentium brand name (which, unlike numbers, could be trademarked) for their new line of superscalar x86 designs. With the x86 naming scheme now legally cleared, IBM partnered with Cyrix to produce the 5x86 and then the very efficient 6x86 (M1) and 6x86MX (MII) lines of Cyrix designs, which were the first x86 chips implementing register renaming to enable speculative execution. AMD meanwhile designed and manufactured the advanced but delayed 5k86 (K5), which, internally, was heavily based on AMD's earlier 29K RISC design; similar to NexGen's Nx586, it used a strategy where dedicated pipeline stages decode x86 instructions into uniform and easily handled micro-operations, a method that has remained standard to this day.

Some early versions of these chips had heat dissipation problems. The 6x86 was also affected by a few minor compatibility issues, the Nx586 lacked an FPU and (the then crucial) pin-compatibility, while the K5 had somewhat disappointing performance when it was (eventually) launched. Low customer awareness of alternatives to the Pentium line further contributed to these designs being comparatively unsuccessful, despite the fact that the K5 had very good Pentium compatibility and the 6x86 was significantly faster than the Pentium on integer code.[8] AMD later managed to establish itself as a serious contender with the K6 line of processors, which gave way to the highly successful Athlon and Opteron. There were also other contenders, such as Centaur Technology (IDT), Rise Technology, and Transmeta. VIA Technologies' energy-efficient C3 and C7 processors were designed by Centaur and are in full production today.

The architecture has twice been extended to a larger word size. In 1985, Intel released the 32-bit 386 to gradually replace the earlier 16-bit chips (which were sold for many more years). This extension to the architecture is sometimes called x86-32 to differentiate it from the original "x86-16" or the newer x86-64 extension. However, it was originally referred to as i386 by Intel (and others) and later renamed IA-32 (for Intel Architecture-32-bit) when Intel unveiled its unrelated 64-bit architecture, referred to as IA-64. In 1999-2003, AMD further extended the architecture to 64 bits, originally called x86-64 in AMD documents, but now AMD64. Intel soon adopted AMD's architectural extensions under the name IA-32e, which was later renamed EM64T and finally Intel 64 (not to be confused with the unrelated IA-64 architecture). Microsoft and Sun have used their own vendor-neutral name, x64, for this same architecture.

Overview

Basic properties of the architecture

The x86 architecture is a variable instruction length, primarily two-address "CISC" design with emphasis on backward compatibility. The instruction set is not typical CISC, however, but basically an extended and orthogonalized version of the simple eight-bit architecture of Intel's earlier processors. Words are stored in little-endian order, and 16-bit and 32-bit accesses are allowed to unaligned memory addresses.

To conserve opcode space, most register addresses are three bits, and at most one operand can be in memory (some highly orthogonal "CISC" designs, such as the PDP-11, may use two memory operands). This memory operand may also be the destination, while the other operand, the source, can be either a register or an immediate. This contributes, among other factors, to a code footprint that rivals eight-bit machines and enables efficient use of instruction cache memory. The relatively small number of general registers (also inherited from the 8085) has made register-relative addressing (using small immediate offsets) an important method of accessing operands, especially on the stack. Much work has therefore been invested in making such accesses as fast as register accesses, i.e. a one-cycle instruction throughput in most circumstances.

Current implementations

During execution, current x86 processors employ extra decoding steps to split most instructions into smaller pieces (micro-operations). These are then handed to a control unit that buffers and schedules them in compliance with x86 semantics so that they can be executed by one of three or four execution engines. Furthermore, modern designs are usually superscalar and feature out-of-order execution, which means they can execute multiple x86 instructions simultaneously and not necessarily in the same order as given in the instruction stream.

When Intel first introduced this design approach with the Pentium Pro, they referred to it as a "RISC Core", but soon dropped that term. The approach is similar to the traditional microcode used in their earlier x86 designs, but differs mainly in that the translation from the external instruction set to the internal one occurs asynchronously. Not having to synchronize the CPU's internals with the decode steps relieves a burden on the CPU designers.

Transmeta does not use this approach in their x86 compatible CPUs. Instead they use a translation engine to convert x86 instructions to the CPU's native instructions. Transmeta argues that their approach allows for more power-efficient designs, since the CPU can forgo the complicated decode step of more traditional x86 implementations.

Segmentation

Minicomputers during the late 1970s were running up against the 16-bit, 64 KiB address limit, as memory had become cheaper. Most such companies therefore redesigned their processors to directly handle 32-bit addressing and data. The original 8086, developed from Intel's simple eight-bit microprocessors and primarily aiming at another market, instead adopted a much-criticized concept of segment registers which raised the memory address limit by only 4 bits, to 20 bits (1 MiB).

Data and/or code could be managed within "near" 16-bit segments within this 1 MiB address space, or a compiler could operate in a "far" mode using 32-bit segment:offset pairs reaching (only) 1 MiB. While that would also prove to be quite limiting by the mid-1980s, it worked for the emerging PC market, and made it very simple to translate software from the older 8008, 8080, and 8085 to the newer processor. Seven years later, in 1985, this cumbersome addressing model was effectively factored out by the introduction of 32-bit offset registers in the 80386 design.

In real mode, segmentation is achieved by shifting the segment address left by 4 bits and adding an offset to obtain a final 20-bit address. For example, if DS is A000h and SI is 5677h, DS:SI will point at the absolute address DS × 16 + SI = A5677h. Thus the total address space in real mode is 2^20 bytes, or 1 MiB, quite an impressive figure for 1978. All memory addresses consist of both a segment and an offset; every type of access (code, data, or stack) has a default segment register associated with it (for data the register is usually DS, for code it is CS, and for stack it is SS). For data accesses, the segment register can be explicitly specified (using a segment override prefix) to use any of the four segment registers.

In this scheme, two different segment/offset pairs can point at a single absolute location. Thus, if DS is A111h and SI is 4567h, DS:SI will point at the same A5677h as above. This scheme makes it impossible to use more than four segments at once. CS and SS are vital for the correct functioning of the program, so only DS and ES can be used to point to data segments outside the program (or, more precisely, outside the currently executing segment of the program) or the stack. This scheme was intended as a compatibility measure with Intel's earlier 8-bit processors.
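The real-mode calculation above can be sketched in a few lines of Python. This is only an illustration: the function name `real_mode_address` is hypothetical, and the truncation to 20 bits models the 8086's 20 address lines, a detail the text does not spell out.

```python
def real_mode_address(segment: int, offset: int) -> int:
    """Return the 20-bit physical address for a real-mode segment:offset pair."""
    # Shift the 16-bit segment left by 4 bits (multiply by 16) and add the
    # 16-bit offset. The mask models the 20 address lines of the 8086
    # (an assumption of this sketch, not stated in the text).
    return ((segment << 4) + offset) & 0xFFFFF


# The two examples from the text: different pairs, same absolute address.
first = real_mode_address(0xA000, 0x5677)
second = real_mode_address(0xA111, 0x4567)
print(hex(first), hex(second), first == second)
```

Both calls yield A5677h, which is the aliasing the paragraph describes.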

In protected mode, a segment register no longer contains the physical address of the beginning of a segment, but contains a "selector" that points to a system-level structure called a segment descriptor. A segment descriptor contains the physical address of the beginning of the segment, the length of the segment, and access permissions to that segment. The offset is checked against the length of the segment, with offsets referring to locations outside the segment causing an exception. Offsets referring to locations inside the segment are combined with the physical address of the beginning of the segment to get the physical address corresponding to that offset.
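A minimal sketch of that lookup, under simplifying assumptions: the descriptor table is a plain Python dict keyed by selector, permissions are reduced to a single `writable` flag, and all names (`SegmentDescriptor`, `SegmentFault`, `translate`) are hypothetical rather than taken from any real API.

```python
from dataclasses import dataclass


class SegmentFault(Exception):
    """Raised when an access violates the segment's length or permissions."""


@dataclass
class SegmentDescriptor:
    base: int       # physical address of the beginning of the segment
    length: int     # segment length in bytes
    writable: bool  # simplified stand-in for the access permissions


def translate(table: dict, selector: int, offset: int, write: bool = False) -> int:
    """Translate selector:offset to a physical address, checking the limit."""
    descriptor = table[selector]
    if offset >= descriptor.length:
        raise SegmentFault(f"offset {offset:#x} is outside the segment")
    if write and not descriptor.writable:
        raise SegmentFault("segment is not writable")
    # Inside the segment: combine the base with the offset.
    return descriptor.base + offset


table = {1: SegmentDescriptor(base=0x100000, length=0x1000, writable=False)}
print(hex(translate(table, 1, 0x10)))
```

Real descriptors carry more fields (type, privilege level, granularity); the sketch keeps only the three items the paragraph names.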

The segmented nature can make programming and compiler design difficult because the use of near and far pointers affects performance. The introduction of bank switching schemes such as EEMS made programming even more complicated before the adoption of 32-bit addressing methods with later processors.

Addressing modes

Addressing modes for 16-bit x86 processors can be summarized by this formula:

\begin{Bmatrix}CS:\\DS:\\SS:\\ES:\end{Bmatrix}
\begin{bmatrix}\begin{Bmatrix}IP\\BX\\BP\end{Bmatrix}\end{bmatrix} + 
\begin{bmatrix}\begin{Bmatrix}SI\\DI\end{Bmatrix}\end{bmatrix} +
[displacement]

Addressing modes for 32-bit code on 32-bit or 64-bit x86 processors can be summarized by this formula:

\begin{Bmatrix}CS:\\DS:\\SS:\\ES:\\FS:\\GS:\end{Bmatrix}
\begin{bmatrix}\begin{Bmatrix}EAX\\EBX\\ECX\\EDX\\ESP\\EBP\\ESI\\EDI\end{Bmatrix}\end{bmatrix} + 
\begin{bmatrix}\begin{Bmatrix}EAX\\EBX\\ECX\\EDX\\EBP\\ESI\\EDI\end{Bmatrix}*\begin{Bmatrix}1\\2\\4\\8\end{Bmatrix}\end{bmatrix} +
[displacement]
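As a rough illustration of the 32-bit formula, the following Python sketch computes base + index × scale + displacement, wrapped to 32 bits. The function name is hypothetical, register contents are passed in as plain integers, and the segment base is deliberately left out.

```python
def effective_address(base: int = 0, index: int = 0, scale: int = 1,
                      displacement: int = 0) -> int:
    """Compute a 32-bit effective address: base + index * scale + displacement."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    # Arithmetic wraps around at 32 bits; the segment base is not added here.
    return (base + index * scale + displacement) & 0xFFFFFFFF


# Element 3 of an array of 4-byte items located 8 bytes past the base value:
print(hex(effective_address(base=0x1000, index=3, scale=4, displacement=8)))
```

Because the sketch takes values rather than register names, it cannot enforce the rule visible in the formula that ESP may serve as a base but not as an index.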

Addressing modes for 64-bit code on 64-bit x86 processors can be summarized by these formulas:

\begin{Bmatrix}FS:\\GS:\end{Bmatrix}
\begin{bmatrix}\text{general register}\end{bmatrix} + 
\begin{bmatrix}\text{general register}*\begin{Bmatrix}1\\2\\4\\8\end{Bmatrix}\end{bmatrix} +
[displacement]

and

RIP + [displacement]

The 8086 had 64 KiB of 8-bit (or alternatively 32 K-word of 16-bit) I/O space, and a 64 KiB (one segment) stack in memory supported by hardware. Only words (2 bytes) can be pushed to the stack. The stack grows downwards (toward numerically lower addresses), with its top (the most recently pushed word) pointed to by SS:SP. There are 256 interrupts, which can be invoked by both hardware and software. The interrupts can cascade, using the stack to store the return address.

x86 registers

16-bit

The original Intel 8086 and 8088 have fourteen 16-bit registers. Four of them (AX, BX, CX, DX) are general registers (although each may have an additional purpose; for example, only CX can be used as a counter with the loop instruction). Each can be accessed as two separate bytes (thus BX's high byte can be accessed as BH and its low byte as BL). Four segment registers (CS, DS, SS and ES) are used to form a memory address. There are two pointer registers: SP points to the top of the stack, and BP is used to point at some other place in the stack or in memory (as an offset). Two registers (SI and DI) are for array indexing. The FLAGS register contains flags such as the carry flag, overflow flag and zero flag. Finally, the instruction pointer (IP) points to the current instruction.

32-bit

With the advent of the 32-bit 80386 processor, the 16-bit general-purpose registers, base registers, index registers, instruction pointer, and FLAGS register, but not the segment registers, were expanded to 32 bits. This is represented by prefixing an "E" (for Extended) to the register name (thus the expanded AX became EAX, SI became ESI and so on). The general-purpose registers, base registers, and index registers could all be used as the base in addressing modes, and all of those registers except for the stack pointer could be used as the index in addressing modes. Two new segment registers (FS and GS) were added. With a greater number of registers, instructions and operands, the machine code format was expanded. To provide backward compatibility, segments with executable code can be marked as containing either 16-bit or 32-bit instructions. Special prefixes allow inclusion of 32-bit instructions in a 16-bit segment or vice versa.

64-bit

Starting with the AMD Opteron processor, the x86 architecture in 64-bit long mode extended the 32-bit registers in much the same way that 32-bit protected mode had extended the 16-bit ones (RAX, RBX, RCX, RDX, RSI, RDI, RBP, RSP, RFLAGS, RIP). However, AMD also added 8 additional 64-bit general registers (R8, R9, ..., R15). The addressing modes were not dramatically changed from 32-bit mode, except that addressing was extended to 64 bits, virtual addresses are now sign-extended (so that address space is added equally at the top and bottom of the range), and the role of segment selectors has been dramatically reduced.

Miscellaneous/Special Purpose

x86 processors also include various special/miscellaneous registers such as control registers (CR0 through 4), debug registers (DR0 through 3, plus 6 and 7), test registers (TR4 through 7), descriptor registers (GDTR, LDTR, IDTR), and a task register (TR).

Purpose

Although the main registers are "general-purpose" and can be used for anything, it was envisaged that they be used for the following purposes:

  • AX/EAX/RAX: accumulator
  • BX/EBX/RBX: base
  • CX/ECX/RCX: counter
  • DX/EDX/RDX: data/general
  • SI/ESI/RSI: "source index" for string operations.
  • DI/EDI/RDI: "destination index" for string operations.
  • SP/ESP/RSP: Stack pointer for top address of the stack.
  • BP/EBP/RBP: stack base pointer for holding the address of the current stack frame.
  • IP/EIP/RIP: Instruction pointer. Holds the current instruction address.

No particular purposes were envisaged for the other 8 registers available only in 64-bit mode.

Some instructions assemble to shorter encodings and execute more efficiently when these registers are used for their designed purpose. For example, adding an immediate byte value to AL uses the dedicated accumulator opcode 04h, whilst doing the same with the BL register requires the generic and longer add-to-register encoding 80C3h.
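The size difference can be made concrete with a small encoder sketch in Python. The helper name `encode_add_imm8` and the `REG8` table are illustrative only; the sketch covers just the "add immediate byte to 8-bit register" forms mentioned above, using the standard 3-bit register numbers.

```python
# 3-bit register numbers of the 8-bit registers in the x86 encoding.
REG8 = {"AL": 0, "CL": 1, "DL": 2, "BL": 3, "AH": 4, "CH": 5, "DH": 6, "BH": 7}


def encode_add_imm8(register: str, value: int) -> bytes:
    """Encode 'ADD register, imm8' for an 8-bit register."""
    imm = value & 0xFF
    if register == "AL":
        # Short accumulator form: opcode 04h followed by the immediate.
        return bytes([0x04, imm])
    # Generic form: opcode 80h, then a ModRM byte with mod=11 (register
    # operand), opcode extension /0 (ADD) and the register number.
    modrm = 0xC0 | REG8[register]
    return bytes([0x80, modrm, imm])


print(encode_add_imm8("AL", 5).hex())  # two bytes
print(encode_add_imm8("BL", 5).hex())  # three bytes
```

The accumulator form saves one byte per instruction, which is the efficiency the paragraph refers to.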

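Structure

General Purpose Registers (A, B, C and D)

  • R?X: full 64-bit register (bytes 7 to 0)
  • E?X: low 32 bits of R?X (bytes 3 to 0)
  • ?X: low 16 bits of E?X (bytes 1 to 0)
  • ?H and ?L: high byte (byte 1) and low byte (byte 0) of ?X

Segment Registers (C, D, S, E, F, and G)

  • ?S: 16-bit register (bytes 1 to 0)

Pointer Registers (S and B)

  • R?P: full 64-bit register (bytes 7 to 0)
  • E?P: low 32 bits of R?P (bytes 3 to 0)
  • ?P: low 16 bits of E?P (bytes 1 to 0)

Index Registers (S and D)

  • R?I: full 64-bit register (bytes 7 to 0)
  • E?I: low 32 bits of R?I (bytes 3 to 0)
  • ?I: low 16 bits of E?I (bytes 1 to 0)

Instruction Pointer Register (I)

  • R?P: full 64-bit register (bytes 7 to 0)
  • E?P: low 32 bits of R?P (bytes 3 to 0)
  • ?P: low 16 bits of E?P (bytes 1 to 0)

x86-64-only General Purpose Registers (R8, R9, R10, R11, R12, R13, R14, R15)

  • ?: full 64-bit register (bytes 7 to 0)
  • ?D: low 32 bits (bytes 3 to 0)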
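The nesting of the A, B, C and D registers can be illustrated with simple masks and shifts. This is a sketch only: `register_views` is a hypothetical helper that reports which parts of a 64-bit value each name refers to, and it does not model what happens on writes.

```python
def register_views(value64: int) -> dict:
    """Return the sub-register views of a 64-bit general purpose register."""
    value64 &= 0xFFFFFFFFFFFFFFFF
    return {
        "R?X": value64,                # all 64 bits
        "E?X": value64 & 0xFFFFFFFF,   # low 32 bits
        "?X": value64 & 0xFFFF,        # low 16 bits
        "?H": (value64 >> 8) & 0xFF,   # high byte of the low 16 bits
        "?L": value64 & 0xFF,          # low byte
    }


for name, value in register_views(0x1122334455667788).items():
    print(f"{name:>4} = {value:#x}")
```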

Operating modes

Real mode

Real mode is an operating mode of 80286 and later x86-compatible CPUs. Real mode is characterized by a 20-bit segmented memory address space (meaning that only 1 MiB of memory can be addressed), direct software access to BIOS routines and peripheral hardware, and no concept of memory protection or multitasking at the hardware level. All x86 CPUs in the 80286 series and later start up in real mode at power-on; 80186 CPUs and earlier had only one operational mode, which is equivalent to real mode in later chips.

In order to use more than 64 KiB of memory, the segment registers must be used. This created great complications for C compiler implementors who introduced odd pointer modes such as "near", "far" and "huge" to leverage the implicit nature of segmented architecture to different degrees, with some pointers containing 16-bit offsets within implied segments and other pointers containing segment addresses and offsets within segments.

Protected mode

In addition to real mode, the Intel 80286 supports protected mode, expanding addressable physical memory to 16 MiB and addressable virtual memory to 1 GiB, and providing protected memory, which prevents programs from corrupting one another. This is done by using the segment registers only for storing an index to a segment table. There were two such tables, the Global Descriptor Table (GDT) and the Local Descriptor Table (LDT), each holding up to 8192 segment descriptors, each segment giving access to 64 KiB of memory. The segment table provided a 24-bit base address, which can be added to the desired offset to create an absolute address. Each segment can be assigned one of four ring levels used for hardware-based computer security.
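The figures in this paragraph can be checked with a little arithmetic, and the way a segment register selects a table entry can be sketched as well. The selector bit layout used below (bits 0-1 requested privilege level, bit 2 table indicator, bits 3-15 index) is the standard x86 layout but is not given in the text; the function name is hypothetical.

```python
KIB = 2**10
MIB = 2**20
GIB = 2**30

# Two tables (GDT and LDT) of 8192 descriptors, each covering a 64 KiB segment.
virtual_bytes = 2 * 8192 * 64 * KIB
# A 24-bit base address reaches 2**24 bytes of physical memory.
physical_bytes = 2**24


def decode_selector(selector: int) -> dict:
    """Split a 16-bit protected-mode selector into its fields."""
    selector &= 0xFFFF
    return {
        "index": selector >> 3,                       # entry within the table
        "table": "LDT" if selector & 0x4 else "GDT",  # table indicator bit
        "rpl": selector & 0x3,                        # one of four ring levels
    }


print(virtual_bytes // GIB, "GiB virtual,", physical_bytes // MIB, "MiB physical")
print(decode_selector(0x002B))
```

The 13-bit index field is what limits each table to 8192 descriptors.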

The 80386 introduced support in protected mode for paging, a mechanism making it possible to use virtual memory.

Paging and segmented memory access are required for modern multitasking operating systems. Linux, 386BSD and Windows NT were developed for the 386 because it was the first Intel architecture CPU to support paging and 32-bit segment offsets. The 386 architecture became the basis of all further development in the x86 series. The success of Windows 3.0, the first widely accepted version of Windows, was largely due to its ability to take advantage of 386 features, even though it was used mainly to run multiple DOS sessions rather than to take advantage of the native 32-bit instruction set.

x86 processors that support protected mode boot into real mode for backward compatibility with the older 8086 class of processors. Upon power-on (also known as booting), the processor initializes itself in real mode and then begins loading programs automatically into RAM from ROM and disk. A program inserted somewhere along the boot sequence may be used to put the processor into protected mode. The instruction set in protected mode is backward compatible with the one used in real mode.

Virtual 8086 mode

There is also a sub-mode of operation in 32-bit protected mode, called virtual 8086 mode. This is basically a special hybrid operating mode that allows real mode programs and operating systems to run under the control of a protected mode supervisor operating system. This allows for a great deal of flexibility in running both protected mode programs and real mode programs simultaneously. This mode is available only in the 32-bit version of protected mode; virtual 8086 mode does not exist in the earlier 16-bit version of protected mode, or in the 64-bit long mode.

64-bit Long mode

By 2002, it was obvious that the 32-bit address space of the x86 architecture was limiting its performance in applications requiring large data sets. A 32-bit address space allows the processor to directly address only 4 GiB of data, a size surpassed by applications such as video processing and database engines. With 64-bit addresses, one can directly address 16,777,216 TiB (more than 17 billion GiB) of data, although most 64-bit architectures do not support access to the full 64-bit address space (AMD64, for example, supports only 48 bits, split into 4 paging levels, from a 64-bit address).
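The sizes quoted above follow directly from powers of two, as this short Python sketch shows. The helper name is hypothetical, and the split of a 48-bit address into four 9-bit table indices plus a 12-bit page offset is the usual AMD64 arrangement for 4 KiB pages, which the text does not state explicitly.

```python
GIB = 2**30
TIB = 2**40


def addressable(bits: int, unit: int) -> int:
    """Number of whole units addressable with the given number of address bits."""
    return 2**bits // unit


print(addressable(32, GIB))  # GiB reachable with 32-bit addresses
print(addressable(64, TIB))  # TiB reachable with 64-bit addresses
print(addressable(64, GIB))  # the same space expressed in GiB
print(addressable(48, TIB))  # TiB reachable with 48 implemented bits

# Assumed AMD64 layout for 4 KiB pages: 4 paging levels of 9 index bits each,
# plus a 12-bit offset within the page.
assert 4 * 9 + 12 == 48
```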

AMD, which had traditionally followed the lead of Intel, took the initiative of extending the 32-bit x86 architecture to 64 bits, initially calling it x86-64 and later renaming it AMD64. The Opteron, Athlon 64, Turion 64, and later Sempron families of processors use this architecture. The success of the AMD64 line of processors, coupled with the lukewarm reception of the IA-64 architecture, prompted Intel to reverse-engineer and adopt the AMD64 instruction set, adding new extensions of its own and branding it the EM64T architecture, later re-branded Intel 64.

In their literature and product version names, Microsoft and Sun refer to AMD64/Intel 64 collectively as x64 in the Windows and Solaris operating systems respectively. Linux distributions refer to it either as "x86-64", its variant "x86_64", or "amd64". BSD systems use "amd64" while Mac OS X uses "x86_64".

Long mode is mostly an extension of the 32-bit instruction set, but unlike the 16-to-32-bit transition, many instructions were dropped in 64-bit mode. This does not affect actual binary backward compatibility (legacy code executes in other modes that retain support for those instructions), but it changes the way assemblers and compilers for new code have to work.

This was the first time that a major upgrade of the x86 architecture was initiated and originated by a manufacturer other than Intel. It was also the first time that Intel accepted technology of this nature from an outside source.

Extensions

Floating point unit

Initially, IA-32 included floating-point capabilities only on add-on processors (8087, 80287 and 80387). With the introduction of the 80486, the eight 80x87 floating-point registers, known as ST(0) through ST(7), were built into the CPU. Each register is 80 bits wide and stores numbers in the double extended precision format of the IEEE 754 standard.

These registers are not directly accessible, but are accessed like a stack. The register numbers are not fixed, but are relative to the top of the stack; ST(0) is the top of the stack, ST(1) is the next register below the top of the stack, ST(2) is two below the top of the stack, etc. That means that data is always pushed down from the top of the stack, and operations are always done against the top of the stack. A register therefore cannot simply be accessed at random; access has to follow the stack order.
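The relative numbering can be modelled with eight physical slots and a "top" index, as in the following Python sketch. The class name `X87Stack` and its methods are hypothetical, and details of the real hardware such as tag bits and stack overflow or underflow exceptions are left out.

```python
class X87Stack:
    """Toy model of the x87 register stack: ST(i) is relative to the top."""

    def __init__(self):
        self.physical = [None] * 8  # the eight physical registers
        self.top = 0                # index of the register currently named ST(0)

    def st(self, i: int):
        """Read ST(i), i.e. the register i places below the top of the stack."""
        return self.physical[(self.top + i) % 8]

    def push(self, value) -> None:
        # Pushing moves the top down by one; the old ST(0) becomes ST(1).
        self.top = (self.top - 1) % 8
        self.physical[self.top] = value

    def pop(self):
        value = self.physical[self.top]
        self.physical[self.top] = None
        self.top = (self.top + 1) % 8
        return value


fpu = X87Stack()
fpu.push(1.0)
fpu.push(2.0)
print(fpu.st(0), fpu.st(1))  # most recent value first
```

After the second push, the value pushed first is no longer ST(0) but ST(1), which is the renumbering the paragraph describes.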

MMX

MMX is a SIMD instruction set designed by Intel, introduced in 1997 for Pentium MMX microprocessors. It developed out of a similar unit first used on the Intel i860. It is supported on most subsequent IA-32 processors by Intel and other vendors. MMX is typically used for video applications.

MMX added 8 new "registers" to the architecture, known as MM0 through MM7 (henceforth referred to as MMn). In reality, these new "registers" were just aliases for the existing x87 FPU stack registers. Hence, anything that was done to the floating point stack would also affect the MMX registers. Unlike the FP stack, these MMn registers were fixed, not relative, and therefore they were randomly accessible. The instruction set did not adopt the stack-like semantics so that existing operating systems could still correctly save and restore the register state when multitasking without modifications.

Each of the MMn registers holds a 64-bit integer. However, one of the main concepts of the MMX instruction set is the concept of packed data types, which means that instead of using the whole register for a single 64-bit integer (quadword), two 32-bit integers (doublewords), four 16-bit integers (words) or eight 8-bit integers (bytes) may be used. Also, because MMX's 64-bit MMn registers are aliased to the FPU stack, and each of the stack registers is 80 bits wide, the upper 16 bits of the stack registers go unused in MMX. These bits are set to all ones, which makes the contents look like NaNs or infinities in the floating-point view. This makes it easier to tell whether you are working on floating-point data or MMX data.
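To illustrate packed data, the sketch below treats a Python integer as a 64-bit register holding four 16-bit words and adds two such registers lane by lane, with each lane wrapping around independently. The function name is hypothetical; the behaviour mirrors a wraparound packed word addition, which is an assumption of this example rather than something the text specifies.

```python
def packed_add_words(a: int, b: int) -> int:
    """Add two 64-bit values as four independent 16-bit lanes (wraparound)."""
    result = 0
    for lane in range(4):
        shift = lane * 16
        lane_sum = ((a >> shift) & 0xFFFF) + ((b >> shift) & 0xFFFF)
        # Keep only 16 bits so a carry never spills into the next lane.
        result |= (lane_sum & 0xFFFF) << shift
    return result


a = 0x0001_0002_0003_FFFF
b = 0x0001_0001_0001_0001
print(f"{packed_add_words(a, b):016x}")
```

Note how the lowest lane overflows to zero without disturbing its neighbour, unlike an ordinary 64-bit addition.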

3DNow!

In 1997 AMD introduced 3DNow!. The introduction of this technology coincided with the rise of 3D entertainment applications, and it was designed to improve the CPU's performance in graphics-intensive applications. 3D video game developers and 3D graphics hardware vendors use 3DNow! to enhance their performance on AMD's K6 and Athlon series of processors.

3DNow! was designed to be the natural evolution of MMX from integers to floating point. As such, it uses the exact same register naming convention as MMX, that is, MM0 through MM7. The only difference is that instead of packing byte to quadword integers into these registers, one packs single-precision floating-point numbers into them. The advantage of aliasing these registers with the FPU registers is that the same instructions and data structures used to save the state of the FPU registers can also be used to save 3DNow! register states. Thus no special modifications are required to operating systems that would otherwise not know about them.

SSE

In 1999, Intel introduced the Streaming SIMD Extensions (SSE) instruction set, following in 2000 with SSE2. The first addition made MMX almost obsolete, and the second allowed the instructions to be realistically targeted by conventional compilers. Introduced in 2004 along with the Prescott revision of the Pentium 4 processor, SSE3 added specific memory and thread-handling instructions to boost the performance of Intel's HyperThreading technology. AMD licensed the SSE3 instruction set and implemented most of the SSE3 instructions for its revision E and later Athlon 64 processors. The Athlon 64 does not support HyperThreading and lacks those SSE3 instructions used only for HyperThreading.

SSE discarded all legacy connections to the FPU stack. This also meant that this instruction set discarded all legacy connections to previous generations of SIMD instruction sets like MMX. But it freed the designers to use larger registers, not limited by the size of the FPU registers. The designers created eight 128-bit registers, named XMM0 through XMM7. (Note: in x86-64, the number of SSE XMM registers has been increased from 8 to 16.) However, the downside was that operating systems had to be aware of this new set of instructions in order to be able to save their register states. So Intel created a slightly modified version of Protected mode, called Enhanced mode, which enables the usage of SSE instructions, whereas they stay disabled in regular Protected mode. An OS that is aware of SSE will activate Enhanced mode, whereas an unaware OS will only enter traditional Protected mode.

SSE is a SIMD instruction set that works only on floating-point values, like 3DNow!. However, unlike 3DNow!, it severs all legacy connection to the FPU stack. Because it has larger registers than 3DNow!, SSE can pack twice the number of single-precision floats into its registers. The original SSE was limited to single-precision numbers, like 3DNow!. SSE2 introduced the capability to pack double-precision numbers too, which 3DNow! had no possibility of doing, since a double-precision number is 64 bits in size, which would be the full size of a single 3DNow! MMn register. At 128 bits, the SSE XMMn registers can pack two double-precision floats into one register. Thus SSE2 is much more suitable for scientific calculations than either SSE1 or 3DNow!, which were limited to single precision. SSE3 does not introduce any additional registers.

Physical Address Extension (PAE)

By default, physical addresses are 32 bits wide. However, there exists a page extension mode called Physical Address Extension or PAE, first added in the Intel Pentium Pro, which allows an additional 4 bits of physical addressing. The size of memory in Protected mode is usually limited to 4 GiB. Through tricks in the processor's page and segment memory management systems, x86 operating systems may be able to access more than 32 bits of address space, even without the switchover to the 64-bit paradigm. This mode does not change the length of segment offsets or linear addresses; those are still only 32 bits.
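As a quick check of what the four extra address bits buy, the following Python lines compare the physical memory reachable with 32 and 36 address bits. The helper name is hypothetical; the arithmetic itself is just powers of two.

```python
GIB = 2**30


def physical_limit_gib(address_bits: int) -> int:
    """Physical memory reachable with the given number of address bits, in GiB."""
    return 2**address_bits // GIB


print(physical_limit_gib(32))      # without PAE
print(physical_limit_gib(32 + 4))  # with the 4 additional PAE bits
```

Each extra address bit doubles the reachable memory, so four bits give a sixteen-fold increase while linear addresses stay at 32 bits.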

Virtualization

x86 virtualization is difficult because the architecture did not meet the Popek and Goldberg virtualization requirements until recently. Nevertheless, several commercial products and virtualization projects exist. Other methods, such as the Kernel-based Virtual Machine ("KVM"), require newer processors which provide more hardware support for virtualization.

Intel and AMD have introduced x86 processors with hardware-based virtualization extensions that overcome the classical virtualization limitations of the x86 architecture. These extensions are known as Intel Virtualization Technology (IVT or simply VT), code-named "Vanderpool", and AMD Virtualization (AMD-V), code-named "Pacifica". Although most modern x86 server processors and many modern x86 desktop processors include these extensions, the technology is generally considered immature at this point, with most software-based virtualization outperforming these extensions. This is expected to change as the technology matures.

Footnotes

  1. Unlike the microarchitecture (and the specific electronic and physical implementation) used for a specific chip design.
  2. Intel abandoned its "x86" naming scheme with the Pentium in 1993 (as numbers could not be trademarked). However, the term x86 was already firmly established among technicians, compiler writers, etc.
  3. Intel's names are IA-32 and Intel 64 (EM64T or IA-32e) for x86 and x86-64 respectively. Likewise, AMD today prefers AMD64 over the x86-64 name they once introduced.
  4. Linux* Kernel Compiling. Intel.
  5. Intel Web page search result for "x64".
  6. The embedded processor market is populated by more than 20 different architectures, which, due to the price sensitivity, low power and hardware simplicity requirements, outnumber the x86.
  7. Intel.
  8. It had a slower FPU, however, which is slightly ironic as Cyrix started out as a designer of fast floating-point units for x86 processors.
