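In a tight loop, the cost of computing the loop index and testing the loop condition is disproportionately large compared with the instructions in the loop body itself. Executing a run of identical instructions back to back can also improve the instruction cache hit rate. The example below initializes a block of memory owned by LoopTest and measures a plain loop against a loop unrolled in groups of 8. Elapsed time (in clock ticks, clocks/sec = 1000000) versus loop count is as follows: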
| Loop count | 8000 | 80000 | 800000 | 8000000 | 80000000 |
|---|---|---|---|---|---|
| loop_time | 10000 | 10000 | 20000 | 210000 | 2140000 |
| extend_loop_time | 0 | 0 | 10000 | 30000 | 390000 |
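The data show that when the loop is short, unrolling makes little difference. Every reading is a multiple of 10000 ticks, which suggests that the small cases sit at the timer's resolution. As the loop count grows, the gap widens and approaches the unroll factor: with groups of 8, the plain loop takes about 7 times as long at 8000000 iterations (210000 vs. 30000 ticks) and about 5.5 times as long at 80000000 iterations (2140000 vs. 390000 ticks).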
The test code is as follows (Fedora 11, running in a virtual machine):
    #include <iostream>
    #include <ctime>
    using namespace std;

    class LoopTest {
    public:
        LoopTest(unsigned long size) {
            loopSize = size - size % 8;   // round down to a multiple of 8
            array = new int[loopSize];
        }
        ~LoopTest() { delete[] array; }
        unsigned long loop_time();
        unsigned long extend_loop_time();
    private:
        unsigned long loopSize;
        int *array;
    };

    // Plain loop: one store per iteration.
    unsigned long LoopTest::loop_time() {
        clock_t start, end;
        unsigned long i;
        start = clock();
        for (i = 0; i < loopSize; i++) {
            array[i] = 1;
        }
        end = clock();
        return end - start;
    }

    // Unrolled loop: eight stores per iteration.
    unsigned long LoopTest::extend_loop_time() {
        clock_t start, end;
        unsigned long i;
        start = clock();
        for (i = 0; i < loopSize / 8; i++) {
            array[i*8+0] = 2;
            array[i*8+1] = 2;
            array[i*8+2] = 2;
            array[i*8+3] = 2;
            array[i*8+4] = 2;
            array[i*8+5] = 2;
            array[i*8+6] = 2;
            array[i*8+7] = 2;
        }
        end = clock();
        return end - start;
    }

    int main() {
        LoopTest *lt = new LoopTest(8000000);
        unsigned long t1 = lt->loop_time();
        unsigned long t2 = lt->extend_loop_time();
        cout << "clocks/sec = " << CLOCKS_PER_SEC << endl;
        cout << t1 << endl;
        cout << t2 << endl;
        delete lt;   // release the test object
        return 0;
    }
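One thing to keep in mind when reading the numbers: loop_time runs first, over freshly allocated memory, so on a typical Linux system it may also pay the first-touch cost of the pages, while extend_loop_time runs over memory that has already been written. A variant that tries to separate the two effects might look like the sketch below. It is not the code that produced the table. The names Timings, time_us and measure are hypothetical, std::chrono::steady_clock replaces clock(), and the volatile pointer is an assumption made so that an optimizing compiler cannot drop the stores.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical result type: elapsed time of each variant, in microseconds.
struct Timings {
    long long plain_us;
    long long unrolled_us;
};

// Hypothetical helper: runs fn once and returns the elapsed time in microseconds.
template <typename Fn>
long long time_us(Fn fn) {
    auto start = std::chrono::steady_clock::now();
    fn();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
}

Timings measure(std::size_t size) {
    const std::size_t n = size - size % 8;  // same rounding as LoopTest
    std::vector<int> buf(n);
    // volatile forces every store to be emitted, so the optimizer
    // cannot remove the stores or merge them into wider ones.
    volatile int* a = buf.data();

    // Warm-up pass (not timed): write every element once, so both timed
    // loops run over memory that has already been touched.
    for (std::size_t i = 0; i < n; ++i) a[i] = 0;

    Timings t;
    // Plain loop: one store per iteration.
    t.plain_us = time_us([&] {
        for (std::size_t i = 0; i < n; ++i) a[i] = 1;
    });
    // Unrolled loop: eight stores per iteration.
    t.unrolled_us = time_us([&] {
        for (std::size_t i = 0; i < n; i += 8) {
            a[i + 0] = 2; a[i + 1] = 2; a[i + 2] = 2; a[i + 3] = 2;
            a[i + 4] = 2; a[i + 5] = 2; a[i + 6] = 2; a[i + 7] = 2;
        }
    });
    return t;
}
```

Calling measure(8000000) and printing both fields gives a pair of numbers comparable to one column of the table. How much of the original gap remains after the warm-up pass will depend on the machine, the compiler and the optimization level, so no expected figures are given here.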