2007-06-29 10:30:06

Introduction
By Andrew Binstock

In late 2001 and early 2002, two processor families based on the Intel NetBurst® microarchitecture were introduced: the Pentium® 4 and the Intel® Xeon® processor families. At the time of the launch, Intel undertook an extensive positioning initiative that aimed to distinguish these processor families much more clearly than ever before. A key goal of the effort was to communicate that Intel Xeon chips are intended for high-end workstations and servers, while Pentium 4 chips are primarily for use in desktops.

To make the distinction clearer, Intel abandoned its previous nomenclature, in which the Xeon name was appended to the name of the sibling Pentium processor, as in the popular Pentium® III Xeon® chip. Today, there are just Pentium processors and Xeon processors. The Xeon offerings are further broken down into two branches: the dual-processor Intel Xeon chip and the Intel Xeon processor MP (Multi Processor).

This article explains the technology differences between these two Intel Xeon processor families and how to choose the right one for a specific application. By extension, it also explains how the developer of a particular software application or solution can decide which platform to target.¹

¹ To round out the picture, the Celeron® processor was positioned as the value entry point, with the 64-bit behemoth, the Itanium® processor, aimed at large enterprise servers.

Dual vs. Multi-Processor
Intel Xeon processors, serving dual-processor server and workstation platforms, can be used solely on motherboards that have either one or two sockets. Intel Xeon Processor MP-based systems typically have room for four or more processors, although there is no minimum requirement of four processors on such systems—they are commonly available with configurations in which only one or two sockets are populated. This fact might tempt a prospective customer to purchase only MP processors, so as to gain maximum configuration flexibility. However, the processors serve different purposes due to other features of their architecture—a trait reflected in the premium charged for the MP processors. The most salient of these features is discussed next.

Cache on Hand
At the time of this article, Intel Xeon processors (dual processors) ship with 512KB of level 2 (L2) cache, while the multiprocessor models contain 1MB or 2MB of level 3 (L3) cache in addition to the 512KB of L2 cache. The additional cache means that considerably more data can be stored on the processor, of particular advantage for transaction-oriented servers or systems with high-data throughput. To see how and why this is so, it's important to understand how the three-level cache works.

The Intel NetBurst microarchitecture as it appears in Pentium 4 and Intel Xeon processors has two standard levels of cache. Level 1 (L1) cache is located inside the chip's execution core. Data items and instructions that the execution pipeline will immediately process or has just processed are stored here. At the time of this article, this cache can hold the equivalent of 12,000 microinstructions (the sub-instructions that make up a complete assembly-language instruction). This design enables the entire code of an executing loop to be stored in L1 cache, meaning that the processor has the fastest possible access to all loop instructions.

The L1 cache is itself fed by internal buses that obtain their data and instructions from the L2 cache, which at 512KB is substantially larger than its L1 counterpart. L2 cache on Pentium 4 and Intel Xeon chips (dual processor) is fed by the front-side bus, which is the bus that moves data between the processor and main memory (as well as to and from the AGP graphics subsystem). Data is moved to cache when it's needed immediately or when predictive mechanisms in the processor 'anticipate' it will be needed by upcoming instructions. Programmers can complement these mechanisms by manually preloading the L2 cache with the data items.

The front-side bus (FSB) that feeds the L2 cache can run at various speeds (more on this later). At the base rate of 400MHz, the FSB could deliver 3.2GB/sec to the L2 cache. Even at this speed, if the processor needs an item that is not in either of its caches (a situation known as a cache miss) a steep penalty is imposed on performance: The processor must idle while data is retrieved from memory. Despite the bus speed, the memory itself takes a while to find and deliver the needed data.²

While Intel has built-in, advanced technology to limit the number of cache misses, the optimal solution is to provide an even larger cache that can contain more data. This is particularly true for enterprise applications where many data items may need to be in play simultaneously to complete a transaction. Responding to this need, Intel added a third level of cache (L3) to its high-end 32-bit server-oriented chips—the Intel Xeon Processor MP.

This three-tier design means that if an L2-cache miss occurs, the L3 cache is examined before a fetch to memory is initiated. With up to 2MB of cache at the L3 level, performance is greatly enhanced by the reduction in memory latencies and by the greater flexibility in cache management presented by so large a capacity. Later discussion of performance benchmarks demonstrates this.
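
The lookup order described above can be sketched as a toy model. This is only an illustration of the order in which the levels are consulted; the function name, the addresses, and the cache contents are made up for the example and say nothing about how the hardware is actually implemented.

```python
# Toy model of the lookup order: L1, then L2, then L3 (if present),
# and only then main memory. All addresses and contents are made up.

def find_level(address, levels):
    """Return the name of the first level that holds `address`.

    `levels` is an ordered list of (name, set_of_addresses) pairs,
    fastest level first. If no cache holds the address, the access
    falls through to main memory.
    """
    for name, contents in levels:
        if address in contents:
            return name
    return "memory"


# Hypothetical contents, chosen only so that each level adds one address.
l1 = {0x10}
l2 = {0x10, 0x20}
l3 = {0x10, 0x20, 0x30}

dual_processor = [("L1", l1), ("L2", l2)]                # two cache levels
multi_processor = [("L1", l1), ("L2", l2), ("L3", l3)]   # adds the L3 level

for addr in (0x10, 0x20, 0x30, 0x40):
    print(hex(addr), find_level(addr, dual_processor), find_level(addr, multi_processor))
```

In this sketch, address 0x30 misses both caches of the two-level configuration and goes to memory, whereas the three-level configuration serves it from L3.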

² This situation actually presents ideal conditions for use of Hyper-Threading Technology (HT Technology). Useful work can be performed by a separate thread on a cache miss.

Bus Speeds
The FSBs of the Intel NetBurst microarchitecture run at effective rates of 400MHz and 533MHz. These are effective rates because of some magic that Intel performs: the hardware clocks are 100MHz and 133MHz respectively, but rather than sending data once per tick, as is traditional on buses, Intel sends data four times per tick, so the 100MHz bus attains the data throughput of a 400MHz bus. Because this bus is 64 bits (or 8 bytes) wide, at 400MHz the throughput is 3.2GB/sec. At 533MHz, it's 4.3GB/sec.
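
The arithmetic behind these throughput figures can be checked with a short calculation. This is a minimal sketch: the function name is hypothetical, and it assumes 1GB means 10^9 bytes, which is the convention that reproduces the figures quoted above.

```python
def fsb_throughput_gb_per_sec(base_clock_mhz, transfers_per_clock=4, bus_width_bytes=8):
    """Peak FSB throughput in GB/sec (assuming 1GB = 10**9 bytes).

    base_clock_mhz:      hardware clock of the bus (100 or 133 in the text)
    transfers_per_clock: data transfers per clock tick (four in the text)
    bus_width_bytes:     width of the bus (64 bits = 8 bytes)
    """
    bytes_per_sec = base_clock_mhz * 1_000_000 * transfers_per_clock * bus_width_bytes
    return bytes_per_sec / 1_000_000_000


print(round(fsb_throughput_gb_per_sec(100), 1))  # 3.2
print(round(fsb_throughput_gb_per_sec(133), 1))  # 4.3
```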

Today, the fastest FSB for the dual-processor Intel Xeon chip is 533MHz, while the multiprocessor model tops out at 400MHz. It may seem surprising that the higher-end chip would have a slower bus but, in fact, it's not. The 3.2GB/sec. capacity of the Intel Xeon processor MP is a very difficult pipe to keep filled. Only in situations where a processor is reading huge blocks of data can this level be reached. These situations do occur in imaging and advanced multimedia applications, which, coincidentally, are also computationally intensive. Hence, the faster bus is a better fit with the processor that handles such tasks: the dual-processor Intel Xeon chip.

Enterprise computing, on the other hand, requires processing of numerous smaller data items that, in aggregate, never fill a 3.2GB/sec bus on a sustained basis. In view of this, Intel appears to have chosen not to raise the cost of the Intel Xeon processor MP by giving it a feature not likely to be needed.

Clock Speeds
Clock speeds for the Intel Xeon processor families follow the same model as the bus speeds. Currently, the dual processor model tops out at 3.06GHz, while the fastest Intel Xeon processor MP runs at 2GHz. The speed differential is similar in origin to the difference in bus speeds: the primary target of the dual processor chip is computationally intensive workloads, those that benefit from high clocks and fast buses, whereas the Intel Xeon processor MP targets database and transaction-processing systems. It's important when considering this distinction to remember that multiprocessor models will typically reside on servers with 4 to 8 chips. (In typical cases, that is. Models from IBM and Unisys feature as many as 32 processors.) As such, they are designed to scale performance by the addition of processors. Their focus is handling multiple threads with multiple transactions; hence they need the ability to manage large caches and multiple threads while performing work that is generally not computationally demanding.

Performance Comparison
AnandTech, a hardware analysis group, compared servers running both models of Xeon processors in November 2002. In the company's benchmarks (available at *) run against two databases—one small, the other moderate—the following results were obtained. (All processors were running with Hyper-Threading Technology enabled):

3GB Web database benchmark

System Configuration                      Transactions/sec.
2 x Intel Xeon processor (2.8GHz)         1260
2 x Intel Xeon processor MP (2.0GHz)      1102
4 x Intel Xeon processor MP (2.0GHz)      2070

Intel's decision to add L3 cache to the MP model is borne out by these results: even though the dual processor model's clock is 40% faster, test results are only 14% faster—clearly the cache is optimally used in database transactions. The cache design is even more compelling in light of the remarkable scalability of the Intel Xeon processor MP. Going from 2 to 4 chips resulted in an 88% increase in throughput. This is a truly superior level of processor scalability and demonstrates an optimal use of processor resources.
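
The percentages quoted above follow directly from the table. As a quick check, here is a minimal sketch; the helper name is hypothetical, and the inputs are the clock speeds and transaction rates from the 3GB benchmark.

```python
def percent_gain(new, old):
    """Percentage by which `new` exceeds `old`."""
    return (new / old - 1) * 100


# Clock advantage of the 2.8GHz dual processor model over the 2.0GHz MP model.
print(round(percent_gain(2.8, 2.0)))    # 40

# Throughput advantage of the 2-way dual processor system over the 2-way MP system.
print(round(percent_gain(1260, 1102)))  # 14

# Throughput gain when going from 2 to 4 MP processors.
print(round(percent_gain(2070, 1102)))  # 88
```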

Now, let's examine a similar benchmark run against a much larger database—the kind for which the Intel Xeon processor MP is designed.

25.2GB Ad database benchmark

System Configuration                      Transactions/sec.
2 x Intel Xeon processor (2.8GHz)         1433
2 x Intel Xeon processor MP (2.0GHz)      1485
4 x Intel Xeon processor MP (2.0GHz)      2497

With greater volumes of data now involved, the multiprocessor systems really come into their own. Note the 2.0GHz multiprocessor system is handling more transactions than its 2.8GHz DP sibling. In addition, the scalability between the dual and quad-processing MP systems remains very high.
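
One rough way to see the effect is to normalize each result by processor count and clock speed. This normalization is not part of the original benchmark; it is an illustrative assumption that throughput can be compared per processor per GHz, and the function name is hypothetical.

```python
def tx_per_cpu_ghz(transactions_per_sec, processors, clock_ghz):
    """Transactions/sec. divided by (number of processors x clock in GHz)."""
    return transactions_per_sec / (processors * clock_ghz)


# Figures from the 25.2GB benchmark: (label, transactions/sec., processors, GHz).
results = [
    ("2 x Intel Xeon processor (2.8GHz)", 1433, 2, 2.8),
    ("2 x Intel Xeon processor MP (2.0GHz)", 1485, 2, 2.0),
    ("4 x Intel Xeon processor MP (2.0GHz)", 2497, 4, 2.0),
]

for label, tps, cpus, ghz in results:
    print(label, round(tx_per_cpu_ghz(tps, cpus, ghz)))
# Rounded values, in order: 256, 371, 312
```

By this crude measure, each MP processor completes noticeably more transactions per clock cycle than its dual processor sibling on the larger database.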

Results such as these can be further enhanced by the chipset that supports each processor. Chipsets supporting the dual processor model currently are limited to accessing a maximum of 16GB of RAM, whereas the multiprocessor chipsets scale to 64GB of RAM. This difference emphasizes the distinguishing feature of the Intel Xeon processor MP: much greater headroom for expansion.

Summary
The key distinctions between the two families of Intel Xeon processors can be summarized by their salient features: the dual processor model scales to a maximum of two processors and specializes in computationally demanding applications where fast clocks and FSBs generate a significant benefit. The multiprocessor model scales to 32 processors and by its large cache delivers optimal performance in transactional and database applications where large numbers of data items are in play at once.

As such, Intel recommends that the processors match their software in roughly this manner:

  • Application servers, CRM processing, etc.: Dual processor chips for small installations, Intel Xeon processor MP for sites with moderate to heavy loads.
  • Database servers, ERP packages, enterprise applications: Intel Xeon processor MP.
With front-end servers the picture is a bit less clear.³

Needless to say, overlap exists between these categories. When the server you're considering falls into one of these areas of overlap, Intel recommends examination of the system's projected expansion requirements. If you anticipate a need for further capacity or performance, the Intel Xeon processor MP is the only chip that can provide the headroom. However, if your needs are modest and you expect that they will not grow significantly beyond current levels, the dual processor model will probably suffice.

³ Dual processor chips appear to be well suited for front-end servers such as web servers. However, multiprocessor chips are being increasingly used for this purpose as well.

About the Author
Andrew Binstock is the principal analyst at Pacific Data Works LLC. He was previously a senior technology analyst at PricewaterhouseCoopers, and earlier editor in chief of UNIX Review and C Gazette. He is the lead author of "Practical Algorithms for Programmers," from Addison-Wesley Longman, which is currently in its 12th printing and in use at more than 30 computer-science departments in the United States.



