QPI (CSI) explained [8] [zz] - conghonglei - ChinaUnix blog

Multiprocessor Systems

When the P6 front side bus was first released, it caused a substantial shift in the computer industry by supporting up to four processors without any chipset modifications. As a result, Intel-based systems running Linux or Windows penetrated and then dominated the workstation and entry-level server markets, largely because the existing architectures were priced vastly higher.

However, Intel hesitated to extend itself beyond that point. This hesitancy was partly due to economic incentives to maintain the same infrastructure, but also to the preferences of key OEMs such as IBM, HP and others, who add value in the form of larger multiprocessor systems. Balancing all the different priorities inside Intel while pleasing partners is nearly impossible, and it has handicapped Intel for the past several years. However, it is quite clear that any reservations at Intel disappeared around 2002-2003, when CSI development started.

Intel's patents clearly anticipate two and four processor systems, as shown in Figure 6. Each processor in a dual socket system will require a single coherent full-width CSI link to the other processor, plus one or two half-width links to connect to I/O bridges, making the system fully symmetric (half-width links are shown as dotted lines). Processors in four socket systems will be fully connected, and each processor could also connect directly to the I/O bridge. More likely, each processor, or pair of processors, would connect to a separate I/O bridge to provide higher I/O bandwidth in four socket systems.

Figure 6 – 2 and 4P CSI System Diagrams [2] [34]
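To make the link counts concrete, here is a minimal Python sketch that enumerates the coherent processor-to-processor links in a fully connected system. It only models the topology described above; the function names are illustrative and not taken from any Intel document, and I/O bridge links are deliberately left out.

```python
from itertools import combinations

def fully_connected_links(num_sockets):
    """List every point-to-point link in a fully connected topology.

    Sockets are numbered 0..num_sockets-1 (an arbitrary labeling
    chosen for this sketch); each link is a pair of socket numbers.
    """
    return list(combinations(range(num_sockets), 2))

def links_per_socket(num_sockets):
    """Count how many coherent links each socket needs."""
    links = fully_connected_links(num_sockets)
    return {
        socket: sum(1 for a, b in links if socket in (a, b))
        for socket in range(num_sockets)
    }

if __name__ == "__main__":
    for n in (2, 4):
        print(n, "sockets:", len(fully_connected_links(n)), "links total,",
              links_per_socket(n), "links per socket")
```

In this model a dual socket system needs one coherent link per processor, while a fully connected four socket system needs three per processor and six in total, which is why the link count per socket grows quickly as systems get larger.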

Fully interconnected systems, such as those shown in Figure 6, enjoy several advantages over partially connected solutions. First of all, transactions occur at the speed of the slowest participant. Hence, a system where every caching agent (including the I/O bridge) is only one hop away ensures lower transaction latency. Secondly, lowering transaction latency reduces the number of transactions in flight (since the average transaction lifetime is shorter). This means that the buffers for each caching agent can be smaller, faster and more power efficient. Lastly, operating systems and applications have trouble handling NUMA optimizations, so more symmetrical systems are ideal from a software perspective.
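The first two advantages can be illustrated with a small Python sketch. It compares hop counts in a fully connected four socket system against a four socket ring, which stands in here for a partially connected design (the ring is an assumption for comparison, not a topology from the article). It then applies Little's law, where the average number of transactions in flight equals the issue rate times the average latency. The rate and latency figures in the usage section are made up purely for illustration.

```python
from collections import deque

def hop_counts(adjacency, source):
    """Breadth-first search: hops from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def average_hops(adjacency):
    """Average hop count over all ordered pairs of distinct nodes."""
    total = 0
    pairs = 0
    for src in adjacency:
        for dst, hops in hop_counts(adjacency, src).items():
            if dst != src:
                total += hops
                pairs += 1
    return total / pairs

def transactions_in_flight(rate, latency):
    """Little's law: average occupancy = arrival rate * average latency."""
    return rate * latency

# Four sockets, every socket linked to every other socket.
FULLY_CONNECTED = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2]}
# Four sockets in a ring: each socket linked only to two neighbors.
RING = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}

if __name__ == "__main__":
    print("fully connected:", average_hops(FULLY_CONNECTED))
    print("ring:", average_hops(RING))
    # Hypothetical numbers: 2 transactions issued per time unit,
    # with 50 versus 70 time units of average latency.
    print(transactions_in_flight(2, 50), transactions_in_flight(2, 70))
```

With the same issue rate, the topology with the longer average path keeps more transactions outstanding, so each agent needs deeper buffers to track them.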

Interacting with I/O

Of course, ensuring optimal communication between multiple processors is just one part of system design. The I/O architecture for Intel’s platform is also important, and CSI brings along several important changes in that area as well [36].

As Figure 6 indicates, some CSI-based systems contain multiple I/O hubs, which need to communicate with each other. Since the I/O hubs are not directly connected to one another, Intel’s engineers devised an efficient method to forward I/O transactions (typically PCI-Express) through CSI. Because CSI was optimized for coherent traffic, it lacks many of the features that PCI-Express relies upon, such as I/O-specific packet attributes. To solve this problem, PCI-E packets are tunneled through CSI, leaving much or all of the PCI-E header information intact.
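The general idea of tunneling can be sketched in Python as follows. This is only a conceptual model: the actual CSI packet layout is not described here, so the `TunnelPacket` type, its `dest_hub` routing field, and the fixed header length are all hypothetical names and values invented for the example. The point it shows is that the inner PCI-E packet, header included, is carried as opaque bytes and comes out unchanged at the far I/O hub.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PcieTlp:
    """Simplified stand-in for a PCI-E packet: header bytes plus payload."""
    header: bytes
    payload: bytes

@dataclass(frozen=True)
class TunnelPacket:
    """Hypothetical envelope; field names are assumptions, not CSI's format."""
    dest_hub: int  # hypothetical routing field naming the target I/O hub
    inner: bytes   # the PCI-E packet, carried without modification

def encapsulate(tlp, dest_hub):
    """Wrap a PCI-E packet for transport, leaving its header intact."""
    return TunnelPacket(dest_hub=dest_hub, inner=tlp.header + tlp.payload)

def decapsulate(packet, header_len):
    """Recover the original PCI-E packet at the receiving I/O hub."""
    return PcieTlp(header=packet.inner[:header_len],
                   payload=packet.inner[header_len:])

if __name__ == "__main__":
    # Made-up bytes; a 12-byte header length is assumed for this example.
    original = PcieTlp(header=bytes(range(12)), payload=b"example data")
    tunneled = encapsulate(original, dest_hub=1)
    restored = decapsulate(tunneled, header_len=len(original.header))
    print(restored == original)
```

Because the envelope never interprets the inner bytes, any PCI-E attributes that the transport itself has no notion of survive the trip, which is the property the tunneling approach relies on.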
