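Talking about oprofile, starting from my new computer
I. My new computer
I just moved from Hefei to Suzhou, and my computer changed along with me. The new machine is a hand-me-down from a senior student in the previous class, but it is many times faster than the Celeron I used in Hefei!
A while ago I read a few articles about oprofile, found them very interesting, and wanted to run some experiments myself. So I read the hardware performance monitoring part of volume 3B of the Intel manual and excitedly started experimenting, but the event collection I configured kept failing. Only later did I learn that neither my desktop nor my laptop supported hardware performance monitoring. Luckily the chance has finally come: the machine I use now has an Intel E6550 CPU with two physical cores. A small pity is that it does not support Hyper-Threading, so each core runs a single hardware thread, for two logical processors in total. The currently popular Intel Core i7 has four cores and eight threads, which I envy a lot!
Below I explain some of this CPU's parameters (the fields shown in /proc/cpuinfo):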
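cpu family: the family the processor belongs to. 6 means family 6, which started with the Pentium Pro, Pentium II, Pentium II Xeon, Pentium III and Pentium III Xeon processors and also covers later Intel designs such as this Core 2 Duo.
model: the model number, which identifies the manufacturing technology and which generation of design (or core) within the family the processor is. It is normally read together with cpu family.
stepping: the stepping number identifies the design or manufacturing revision of the processor. It helps control and track changes to the processor, and it lets end users identify more precisely which processor revision is installed in their system and determine its internal design or manufacturing characteristics.
siblings: the number of logical processors in one physical package. Here it is 2, the same as cpu cores, so each core provides exactly one logical processor (no Hyper-Threading).
core id: the ID of the core within its physical package (not a system-wide CPU number; that is the processor field).
cpu cores: the number of cores in this physical package.
apic id: the ID of the local APIC (Advanced Programmable Interrupt Controller) that belongs to this logical processor.
cpuid level: the highest standard CPUID function (leaf) the processor supports, that is, the value returned in EAX when CPUID is executed with EAX=0. I was not sure about this one at first, so if you know better, please leave me a comment.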
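The relationship between siblings and cpu cores can be checked mechanically. The following is a minimal sketch, assuming the usual "key : value" layout of /proc/cpuinfo with one blank-line-separated record per logical processor. The function names and the sample text are hypothetical illustrations, not output captured from the machine described here.

```python
def parse_cpuinfo(text):
    """Split /proc/cpuinfo-style text into one dict per logical processor."""
    cpus = []
    current = {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the record of one logical processor.
            if current:
                cpus.append(current)
                current = {}
            continue
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    if current:
        cpus.append(current)
    return cpus


def hyperthreading_enabled(cpu):
    """True when the package exposes more logical processors than cores."""
    return int(cpu["siblings"]) > int(cpu["cpu cores"])


# Illustrative sample only (assumed values for a dual-core CPU without HT).
SAMPLE = """\
processor\t: 0
cpu family\t: 6
siblings\t: 2
core id\t\t: 0
cpu cores\t: 2

processor\t: 1
cpu family\t: 6
siblings\t: 2
core id\t\t: 1
cpu cores\t: 2
"""

cpus = parse_cpuinfo(SAMPLE)
print(len(cpus), hyperthreading_enabled(cpus[0]))  # prints "2 False"
```

On a real system the same functions could be fed the contents of /proc/cpuinfo read with open("/proc/cpuinfo").read().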
II. oprofile
1 Below are the hardware events my machine supports, with a rough explanation of each
1 CPU_CLK_UNHALTED: (counter: all)
Clock cycles when not halted (min count: 6000)
Unit masks (default 0x0)
----------
0x00: Unhalted core cycles
0x01: Unhalted bus cycles
0x02: Unhalted bus cycles of this core while the other core is halted
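The number of clock cycles during the sampling period in which the processor is not in the halted state. The unit mask selects one of three cases:
(1) core cycles during which the core is not halted
(2) bus cycles during which the core is not halted
(3) bus cycles during which this core is not halted while the other core is halted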
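Each listing gives an event name, a minimum count and a set of unit masks, which are the pieces needed to request an event. As a hedged sketch, the helper below formats them in the name:count:unitmask:kernel:user form used by the legacy opcontrol --event option. The helper name event_spec and its min_count check are my own assumptions for illustration; writing the mask in hex is also just a choice made here.

```python
def event_spec(name, count, unit_mask=0x00, kernel=True, user=True, min_count=0):
    """Format an event as '--event=name:count:unitmask:kernel:user'.

    min_count is the '(min count: N)' value from the event listing;
    a smaller sampling interval is rejected here.
    """
    if count < min_count:
        raise ValueError(
            "count {} is below the minimum {} for {}".format(count, min_count, name)
        )
    return "--event={}:{}:{:#04x}:{}:{}".format(
        name, count, unit_mask, int(kernel), int(user)
    )


# Unhalted core cycles (unit mask 0x00), sampled every 6000 events,
# counting both kernel and user mode.
print(event_spec("CPU_CLK_UNHALTED", 6000, 0x00, min_count=6000))
# prints "--event=CPU_CLK_UNHALTED:6000:0x00:1:1"
```

The count is the sampling interval: one sample is recorded each time the counter has seen that many events, so a smaller count gives more samples at a higher overhead.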
2 INST_RETIRED.ANY_P: (counter: all)
number of instructions retired (min count: 6000)
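The number of retired instructions. Many sources describe retirement as the point where an instruction has finished executing and is removed from the instruction queue, that is, its results have been committed.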
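Unhalted cycles and retired instructions together give cycles per instruction (CPI). The sketch below assumes that each sample stands for roughly one sampling interval's worth of events, so an event total is estimated as samples times interval. The function names and the numbers in the example are hypothetical, not measurements.

```python
def estimated_events(samples, sample_interval):
    """Estimate an event total from a sample count and the sampling interval."""
    return samples * sample_interval


def cpi(cycle_samples, cycle_interval, inst_samples, inst_interval):
    """Cycles per instruction from CPU_CLK_UNHALTED and INST_RETIRED.ANY_P samples."""
    cycles = estimated_events(cycle_samples, cycle_interval)
    instructions = estimated_events(inst_samples, inst_interval)
    if instructions == 0:
        raise ValueError("no retired instructions were sampled")
    return cycles / instructions


# Made-up sample counts, both events sampled every 6000 occurrences.
print(cpi(1500, 6000, 1000, 6000))  # prints 1.5
```

Because the totals are estimates, the ratio is only meaningful when both events collected a reasonable number of samples.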
3 L2_RQSTS: (counter: all)
number of L2 cache requests (min count: 500)
Unit masks (default 0x7f)
----------
0xc0: core: all cores
0x40: core: this core
0x30: prefetch: all inclusive
0x10: prefetch: Hardware prefetch only
0x00: prefetch: exclude hardware prefetch
0x08: (M)ESI: Modified
0x04: M(E)SI: Exclusive
0x02: ME(S)I: Shared
0x01: MES(I): Invalid
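The number of L2 cache requests. The unit mask bits fall into three groups (core, prefetch, MESI state), giving the following cases:
(1) requests from all cores
(2) requests from this core only
(3) all requests, prefetches included
(4) requests issued by the hardware prefetcher only
(5) requests excluding those issued by the hardware prefetcher
(6) requests that find the L2 line in the M (Modified) state
(7) requests that find the L2 line in the E (Exclusive) state
The remaining states (S and I) work the same way.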
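The values in the listing suggest that a unit mask is the bitwise OR of one core selector, one prefetch selector and any set of MESI bits. A small sketch under that assumption (the table and function names are mine) reproduces the default masks shown in the listings:

```python
# Bit values copied from the unit mask listing above.
CORE = {"all": 0xC0, "this": 0x40}
PREFETCH = {"all": 0x30, "hardware": 0x10, "exclude": 0x00}
MESI = {"M": 0x08, "E": 0x04, "S": 0x02, "I": 0x01}


def l2_unit_mask(core="this", prefetch="all", states="MESI"):
    """OR together one core selector, one prefetch selector and MESI bits."""
    mask = CORE[core] | PREFETCH[prefetch]
    for state in states:
        mask |= MESI[state]
    return mask


print(hex(l2_unit_mask()))  # prints 0x7f, the default for L2_RQSTS
```

With prefetch="exclude" the same call yields 0x4f, which matches the default listed for events such as L2_IFETCH that have no prefetch bits.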
4 LLC_MISSES: (counter: all)
L2 cache demand requests from this core that missed the L2 (min count: 6000)
Unit masks (default 0x41)
----------
0x41: No unit mask
The number of L2 (last-level cache) demand requests from this core that missed.
5 LLC_REFS: (counter: all)
L2 cache demand requests from this core (min count: 6000)
Unit masks (default 0x4f)
----------
0x4f: No unit mask
The number of L2 (last-level cache) demand requests from this core.
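LLC_MISSES and LLC_REFS are naturally read as a pair: their quotient is the L2 demand miss ratio of this core. A minimal sketch, with a hypothetical function name and made-up counts:

```python
def l2_miss_ratio(llc_misses, llc_refs):
    """Fraction of L2 demand requests from this core that missed."""
    if llc_refs == 0:
        # No references at all: report 0.0 instead of dividing by zero.
        return 0.0
    return llc_misses / llc_refs


# Made-up event totals for illustration.
print(l2_miss_ratio(2500, 10000))  # prints 0.25
```

Both events have the same minimum count (6000), so using the same sampling interval for both lets the ratio be taken directly on the sample counts.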
6 MISALIGN_MEM_REF: (counter: all)
number of misaligned data memory references (min count: 500)
The number of misaligned data memory references.
7 SEGMENT_REG_LOADS: (counter: all)
number of segment register loads (min count: 500)
The number of segment register loads.
8 DTLB_MISSES: (counter: all)
DTLB miss events (min count: 500)
Unit masks (default 0xf)
----------
0x01: ANY Memory accesses that missed the DTLB.
0x02: MISS_LD DTLB misses due to load operations.
0x04: L0_MISS_LD L0 DTLB misses due to load operations.
0x08: MISS_ST TLB misses due to store operations.
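The number of DTLB misses, split into the following cases:
(1) DTLB misses from any memory access
(2) DTLB misses caused by load operations
(3) L0 DTLB misses caused by load operations
(4) TLB misses caused by store operations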
9 PAGE_WALKS: (counter: all)
Page table walk events (min count: 500)
Unit masks (default 0x2)
----------
0x01: COUNT Number of page-walks executed.
0x02: CYCLES Duration of page-walks in core cycles.
(1) the number of page table walks performed
(2) the time spent in page table walks, measured in core cycles
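The two PAGE_WALKS unit masks measure different quantities, so dividing the CYCLES total (mask 0x02) by the COUNT total (mask 0x01) gives the average cost of one walk. This is only a sketch with a hypothetical function name and made-up totals; the two masks would have to be collected as two separate event settings.

```python
def average_page_walk_cycles(walk_cycles, walk_count):
    """Average core cycles per page walk (CYCLES total / COUNT total)."""
    if walk_count == 0:
        raise ValueError("no page walks were counted")
    return walk_cycles / walk_count


# Made-up totals: 12000 cycles spent in 400 page walks.
print(average_page_walk_cycles(12000, 400))  # prints 30.0
```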
10 MUL: (counter: all)
number of multiplies (min count: 1000)
DIV: (counter: all)
number of divides (min count: 500)
CYCLES_DIV_BUSY: (counter: all)
cycles divider is busy (min count: 1000)
IDLE_DURING_DIV: (counter: all)
cycles divider is busy and all other execution units are idle. (min count: 1000)
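The events above mean, respectively:
(1) the number of multiply operations executed
(2) the number of divide operations executed
(3) the number of cycles the divider is busy
(4) the number of cycles the divider is busy while all other execution units are idle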
11 L2_ADS: (counter: all)
Cycles the L2 address bus is in use. (min count: 500)
Unit masks (default 0x40)
----------
0xc0: All cores
0x40: This core
The number of cycles the L2 address bus is in use.
12 L2_DBUS_BUSY_RD: (counter: all)
Cycles the L2 transfers data to the core. (min count: 500)
Unit masks (default 0x40)
----------
0xc0: All cores
0x40: This core
The number of cycles the L2 data bus spends transferring data to the core.
13 L2_LINES_IN: (counter: all)
number of allocated lines in L2 (min count: 500)
Unit masks (default 0x70)
----------
0xc0: core: all cores
0x40: core: this core
0x30: prefetch: all inclusive
0x10: prefetch: Hardware prefetch only
0x00: prefetch: exclude hardware prefetch
The number of cache lines allocated in L2 (presumably under a write-allocate policy).
14 L2_M_LINES_IN: (counter: all)
number of modified lines allocated in L2 (min count: 500)
Unit masks (default 0x40)
----------
0xc0: All cores
0x40: This core
The number of lines allocated in L2 in the M (Modified) state.
15 L2_LINES_OUT: (counter: all)
number of recovered lines from L2 (min count: 500)
Unit masks (default 0x70)
----------
0xc0: core: all cores
0x40: core: this core
0x30: prefetch: all inclusive
0x10: prefetch: Hardware prefetch only
0x00: prefetch: exclude hardware prefetch
The number of lines evicted (replaced) from L2.
L2_M_LINES_OUT: (counter: all)
number of modified lines removed from L2 (min count: 500)
Unit masks (default 0x70)
----------
0xc0: core: all cores
0x40: core: this core
0x30: prefetch: all inclusive
0x10: prefetch: Hardware prefetch only
0x00: prefetch: exclude hardware prefetch
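The number of M-state (Modified) lines evicted from L2. When such a line is replaced, depending on the cache coherence protocol it is either written back to memory or moved to another state.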
16 L2_IFETCH: (counter: all)
number of L2 cacheable instruction fetches (min count: 500)
Unit masks (default 0x4f)
----------
0xc0: core: all cores
0x40: core: this core
0x08: (M)ESI: Modified
0x04: M(E)SI: Exclusive
0x02: ME(S)I: Shared
0x01: MES(I): Invalid
17 L2_LD: (counter: all)
number of L2 data loads (min count: 500)
Unit masks (default 0x7f)
----------
0xc0: core: all cores
0x40: core: this core
0x30: prefetch: all inclusive
0x10: prefetch: Hardware prefetch only
0x00: prefetch: exclude hardware prefetch
0x08: (M)ESI: Modified
0x04: M(E)SI: Exclusive
0x02: ME(S)I: Shared
0x01: MES(I): Invalid
18 L2_ST: (counter: all)
number of L2 data stores (min count: 500)
Unit masks (default 0x4f)
----------
0xc0: core: all cores
0x40: core: this core
0x08: (M)ESI: Modified
0x04: M(E)SI: Exclusive
0x02: ME(S)I: Shared
0x01: MES(I): Invalid
L2_LOCK: (counter: all)
number of locked L2 data accesses (min count: 500)
Unit masks (default 0x4f)
----------
0xc0: core: all cores
0x40: core: this core
0x08: (M)ESI: Modified
0x04: M(E)SI: Exclusive
0x02: ME(S)I: Shared
0x01: MES(I): Invalid
19 L2_REJECT_BUSQ: (counter: all)
Rejected L2 cache requests (min count: 500)
Unit masks (default 0x7f)
----------
0xc0: core: all cores
0x40: core: this core
0x30: prefetch: all inclusive
0x10: prefetch: Hardware prefetch only
0x00: prefetch: exclude hardware prefetch
0x08: (M)ESI: Modified
0x04: M(E)SI: Exclusive
0x02: ME(S)I: Shared
0x01: MES(I): Invalid
L2_NO_REQ: (counter: all)
Cycles no L2 cache requests are pending (min count: 500)
Unit masks (default 0x40)
----------
0xc0: All cores
0x40: This core
20 EIST_TRANS_ALL: (counter: all)
Intel(tm) Enhanced SpeedStep(r) Technology transitions (min count: 500)
THERMAL_TRIP: (counter: all)
Number of thermal trips (min count: 500)
Unit masks (default 0xc0)
----------
0xc0: No unit mask
The number of thermal trip events. The event is triggered when the CPU temperature rises above a certain threshold.
21 L1 cache related events
L1D_CACHE_LD: (counter: all)
L1 cacheable data read operations (min count: 500)
Unit masks (default 0xf)
----------
0x08: (M)ESI: Modified
0x04: M(E)SI: Exclusive
0x02: ME(S)I: Shared
0x01: MES(I): Invalid
L1D_CACHE_ST: (counter: all)
L1 cacheable data write operations (min count: 500)
Unit masks (default 0xf)
----------
0x08: (M)ESI: Modified
0x04: M(E)SI: Exclusive
0x02: ME(S)I: Shared
0x01: MES(I): Invalid
L1D_CACHE_LOCK: (counter: all)
L1 cacheable lock read operations (min count: 500)
Unit masks (default 0xf)
----------
0x08: (M)ESI: Modified
0x04: M(E)SI: Exclusive
0x02: ME(S)I: Shared
0x01: MES(I): Invalid
L1D_CACHE_LOCK_DURATION: (counter: all)
Duration of L1 data cacheable locked operations (min count: 500)
Unit masks (default 0x10)
----------
0x10: No unit mask
L1D_ALL_REF: (counter: all)
All references to the L1 data cache (min count: 500)
Unit masks (default 0x10)
----------
0x10: No unit mask
L1D_ALL_CACHE_REF: (counter: all)
L1 data cacheable reads and writes (min count: 500)
Unit masks (default 0x2)
----------
0x02: No unit mask
L1D_REPL: (counter: all)
Cache lines allocated in the L1 data cache (min count: 500)
Unit masks (default 0xf)
----------
0x0f: No unit mask
L1D_M_REPL: (counter: all)
Modified cache lines allocated in the L1 data cache (min count: 500)
L1D_M_EVICT: (counter: all)
Modified cache lines evicted from the L1 data cache (min count: 500)
L1D_PEND_MISS: (counter: all)
Total number of outstanding L1 data cache misses at any cycle (min count: 500)
L1D_SPLIT: (counter: all)
Cache line split load/stores (min count: 500)
Unit masks (default 0x1)
----------
0x01: split loads
0x02: split stores
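I'll stop here for now; there are just too many hardware monitoring events! What matters in practice is how to use these events, or combinations of them, flexibly to reach our goal. In later articles I will show how to use oprofile through concrete examples!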