Category: LINUX

2007-10-26 09:00:11

ARM Platform Programming Optimization

            By Stephen Du

            07-5-16

Part 1 Data Types



1.Reason:


    ARM has only 32-bit registers, so if we don't choose the proper data type, more instructions will be needed.


2.Detail:


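*Use 32-bit types for local variables wherever possible (the exception is when 8-bit or 16-bit modular, i.e. wrap-around, arithmetic is really needed).

*Do the calculation in 32-bit variables and convert to the narrower type only at the end.

*Walk an array with an incrementing pointer rather than an index (for short arrays, LDRH does not support shifted-register addressing, so indexing costs extra instructions).

*Unsigned types are faster for division; for other operations signed and unsigned are equally efficient.

*For data stored in memory, use the narrowest type that fits (ARMv4 handles 8-, 16- and 32-bit loads and stores equally well).

*Make type conversions explicit with casts wherever possible.

*Load and store instructions perform the type conversion (sign or zero extension) automatically.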
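The points above can be sketched in a few lines of C. This is only an illustration: the function name `sum_samples` and its parameters are hypothetical and do not come from the original notes. The data stays in memory as `short`, while the arithmetic uses a 32-bit local and a single cast at the end.

```c
/* Hypothetical example: sum 16-bit samples stored in memory.
 * Storage uses the narrow type (short); arithmetic uses a 32-bit local. */
short sum_samples(const short *data, unsigned int n)
{
    int sum = 0;                 /* 32-bit accumulator, no narrowing per iteration */

    for (; n != 0; n--) {
        sum += *data++;          /* pointer increment instead of data[i] */
    }
    return (short)sum;           /* one explicit conversion at the end */
}
```

With a `short` accumulator the compiler would have to re-narrow the value to 16 bits inside the loop, which is the extra work the list warns about.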

Part 2 Loop

1.A fixed-count loop should count down to 0 (--) rather than count up (++). A do-while loop is better than a for loop, but remember that a do-while body always runs at least once.

2.For unsigned counters, testing !=0 is better than testing >0.

3.Where possible, unroll the loop body to reduce the loop overhead.

4.Reduce the number of variables used in the loop.

 

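Example for item 3 (loop unrolled four times):

int checksum(int *data, unsigned int N)
{
    unsigned int i;
    int sum = 0;

    /* main loop: four words per iteration */
    for (i = N / 4; i != 0; i--) {
        sum += *(data++);
        sum += *(data++);
        sum += *(data++);
        sum += *(data++);
    }
    /* remaining N % 4 words */
    for (i = N & 3; i != 0; i--) {
        sum += *(data++);
    }
    return sum;
}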
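Items 1 and 2 can be sketched the same way. The helper below is hypothetical (the name `sum_words` is not from the original notes) and assumes the caller guarantees at least one element, since a do-while body always executes once. Counting down to zero lets the decrement itself set the condition flags on ARM, so no separate compare is needed.

```c
#include <assert.h>

/* Hypothetical helper: sum n words; n must be at least 1. */
int sum_words(const int *data, unsigned int n)
{
    int sum = 0;

    assert(n != 0);              /* do-while always runs the body once */
    do {
        sum += *data++;
    } while (--n != 0);          /* count down and test against zero */
    return sum;
}
```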

Part 3 Pointer Aliasing


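1.Pointer aliasing forces the compiler to load the same data from memory two or more times; avoid this by reading the value once into a local variable.


2.Avoid taking the address of a local variable; a variable whose address is taken has to live in memory rather than in a register.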
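A minimal sketch of item 1 follows. The function names `timers_v1` and `timers_v2` are hypothetical. In the first version the compiler must assume that `step` might point at the same word as `timer1`, so it reloads `*step` after the first store; the second version reads it once.

```c
/* Hypothetical example: add *step to two timers. */

/* step may alias timer1, so *step is loaded twice. */
void timers_v1(int *timer1, int *timer2, int *step)
{
    *timer1 += *step;
    *timer2 += *step;
}

/* Reading *step once into a local removes the second load. */
void timers_v2(int *timer1, int *timer2, const int *step)
{
    int s = *step;
    *timer1 += s;
    *timer2 += s;
}
```

The two versions give the same result whenever `step` does not alias either timer, which is the usual case.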



Part 4 Struct Padding


1.Arranging struct members carefully saves memory. The __packed option forces the compiler to drop all padding, but accessing packed members takes more time.

2.For cross-platform code, avoid __packed and enum types in structs.

3.Within a struct, put 8-bit members before 16-bit ones, 16-bit before 32-bit, and 32-bit before 64-bit: the bigger the member, the later it goes.

4.Avoid big structs with many levels of nesting.

5.Adding padding by hand helps cross-platform code.
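The effect of member order can be seen with two hypothetical layouts that hold the same four members. On typical ABIs with a 4-byte, 4-byte-aligned int, the first is expected to occupy 12 bytes and the second 8, although the exact sizes depend on the compiler and target.

```c
/* Hypothetical layouts with the same four members. */
struct mixed {          /* sizes interleaved: padding after a and after c */
    char  a;
    int   b;
    char  c;
    short d;
};

struct sorted {         /* 8-bit first, then 16-bit, then 32-bit */
    char  a;
    char  c;
    short d;
    int   b;
};
```

Comparing `sizeof(struct mixed)` with `sizeof(struct sorted)` on the target compiler shows how much padding the reordering removes.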



Part 5 Inline And Function Parameters


1.Limit the number of parameters to four or fewer. Under the APCS the first four word-sized arguments are passed in registers; any further arguments have to be pushed onto the stack, and long long style arguments take two words each, so they use up the registers sooner. This stack traffic costs time.


2.If there are too many parameters, group them in a structure and pass a pointer to the structure; this avoids the pushes and pops.


3.Don't use more than 12 local variables, so that variables and parameters can be kept in registers rather than on the stack.


4.Put related functions in the same file where possible, and inline the small functions (don't inline the big ones).

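/* Convert a value 0..15 to its ASCII hex digit. */
unsigned int dec16_to_hex(unsigned int dec16)
{
    if (dec16 < 10) {
        return dec16 + '0';
    }
    return dec16 - 10 + 'A';
}

/* Write the 8 hex digits of a 32-bit value, most significant first. */
void uint_to_hex(char *out, unsigned int in)
{
    unsigned int i;

    for (i = 8; i != 0; i--) {
        in = (in << 4) | (in >> 28);    /* rotate left by 4 bits */
        *out++ = (char)dec16_to_hex(in & 15);
    }
}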
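Item 2 can be sketched as follows. The struct `FillArgs` and the function `fill_area` are hypothetical names chosen for illustration only. Six separate int arguments would not all fit in the four argument registers, whereas a single pointer does.

```c
/* Hypothetical parameter block: six values, too many for r0-r3. */
typedef struct {
    int x, y, width, height;
    int stride;
    int color;
} FillArgs;

/* One pointer argument instead of six separate ones. */
int fill_area(const FillArgs *args)
{
    return args->width * args->height;   /* number of pixels covered */
}
```

The caller fills the structure once and can reuse it across several calls, changing only the fields that differ.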




Part 6 Bit Field



1.Avoid bit-fields; use #define or enum masks instead.

Bad example:

void A(); void B(); void C();

typedef struct {
    unsigned int a:1;
    unsigned int b:1;
    unsigned int c:1;
} P;

void bit(P *p)
{
    if (p->a)
        A();
    if (p->b)
        B();
    if (p->c)
        C();
}

Here the bit-field is accessed through a pointer, so the compiler has to reload it from memory before each test (A, B or C might have changed it); more loads are issued than necessary.

Good example:

typedef unsigned int P;

#define Ba (1u<<0)
#define Bb (1u<<1)
#define Bc (1u<<2)

void bit(P p)
{
    if (p & Ba)
        A();
    if (p & Bb)
        B();
    if (p & Bc)
        C();
}


2.Using AND, OR and XOR on such masks is a good idea.
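Item 2 can be illustrated with a few one-line helpers. The `FLAG_*` macros and the `flags_*` function names are hypothetical; the operations themselves are the ordinary C bitwise operators.

```c
#define FLAG_A (1u << 0)
#define FLAG_B (1u << 1)
#define FLAG_C (1u << 2)

/* OR sets bits, AND with the inverted mask clears them, XOR toggles them. */
unsigned int flags_set(unsigned int f, unsigned int mask)    { return f | mask; }
unsigned int flags_clear(unsigned int f, unsigned int mask)  { return f & ~mask; }
unsigned int flags_toggle(unsigned int f, unsigned int mask) { return f ^ mask; }

/* Non-zero if any bit of mask is set in f. */
int flags_any(unsigned int f, unsigned int mask)             { return (f & mask) != 0; }
```

Several flags can be combined in one call, for example `flags_set(f, FLAG_A | FLAG_C)`, which a bit-field cannot do in a single operation.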

Part 7 Alignment And Endianness

1.Be careful with __attribute__ ((packed)).

It makes the code non-portable. Use a char pointer to access the data instead, which avoids the alignment problems.


2.When the data to be processed differs from the CPU in endianness (for example the CPU is little-endian while the data is big-endian), access it through an 8-bit type: two accesses for a 16-bit value, four for a 32-bit value. Writing a separate code path for each case is also a good idea.

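Sample (reads N little-endian 16-bit samples from a possibly unaligned buffer; N must be at least 1):

void read_audio_data(short *out, char *in, unsigned int N)
{
    unsigned short *data;           /* aligned input pointer */
    unsigned int sample, next;

    switch ((unsigned long)in & 1) {
    case 0:                         /* input pointer is aligned */
        data = (unsigned short *)in;
        do {
            sample = *(data++);
#ifdef __BIG_ENDIAN
            sample = sample << 8 | sample >> 8;     /* swap the two bytes */
#endif
            *(out++) = (short)sample;
        } while (--N);
        break;

    case 1:                         /* input pointer is not aligned */
        data = (unsigned short *)(in - 1);
        sample = *(data++);
#ifdef __BIG_ENDIAN
        sample = sample & 0xff;     /* first byte of the sample */
#else
        sample = sample >> 8;       /* first byte of the sample */
#endif
        do {
            next = *(data++);
            /* complete one sample and start the next */
#ifdef __BIG_ENDIAN
            *(out++) = (short)((next & 0xff00) | sample);
            sample = next & 0xff;
#else
            *(out++) = (short)((next << 8) | sample);
            sample = next >> 8;
#endif
        } while (--N);
        break;
    }
}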
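The byte-by-byte approach from item 2 can be sketched on its own. The helper names `read_be16` and `read_be32` are hypothetical. Because only single bytes are loaded, the code works for any alignment and gives the same result on a little-endian or big-endian CPU.

```c
/* Hypothetical helpers: read big-endian values one byte at a time. */
unsigned int read_be16(const unsigned char *p)
{
    return ((unsigned int)p[0] << 8) | p[1];            /* 2 byte loads */
}

unsigned long read_be32(const unsigned char *p)
{
    return ((unsigned long)p[0] << 24) |
           ((unsigned long)p[1] << 16) |
           ((unsigned long)p[2] << 8)  |
            (unsigned long)p[3];                         /* 4 byte loads */
}
```

This is slower than an aligned halfword or word load, so it is best kept for data whose alignment or byte order is not under your control.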

Part 8 About Divide And Floating


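1.The ARM core (ARMv4/ARMv5) has no divide instruction, so armcc calls a C library routine to do the division in software when needed. Avoiding division is a good idea. If it can't be avoided, use unsigned division or division by a constant, because they are faster; a signed divide costs about 50 cycles, which is a lot of time.


2.Sometimes taking the remainder (mod, %) is a better choice than dividing.


3.When both a=w/c and b=w%c are needed, compute them together so that one division produces both results and a second divide is avoided.


4.Other functions, such as logarithms, are harder still.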
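Item 3 can be sketched as below. The type `Position` and the function `offset_to_position` are hypothetical. Writing the quotient and remainder of the same operands next to each other gives the compiler the chance to serve both from a single call to the divide routine; whether it does so depends on the compiler and its options.

```c
/* Hypothetical example: convert a linear offset into (row, column). */
typedef struct { unsigned int row, col; } Position;

Position offset_to_position(unsigned int offset, unsigned int width)
{
    Position p;
    p.row = offset / width;      /* quotient */
    p.col = offset % width;      /* remainder of the same division */
    return p;
}
```

Both operands are unsigned, in line with item 1, and `width` must not be zero.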



Part 9 Inline Assembly


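1.Sometimes inline assembly works much better, for example for operations that C has no way to express (coprocessor access and the ARMv5E extension instructions). In GCC syntax, a CP15 write looks like this:

asm volatile("mcr p15, 0, %0, c7, c5, 0" : : "r"(0));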



Part 10 Cross Platform


1.On the ARM platform plain char is unsigned, so a loop with a char counter that ends on i>=0 becomes infinite. Watch out for this. It can be solved with a gcc option (-fsigned-char) or by changing the type of i.


2.On older platforms int is 16 bits; on ARM it is 32 bits, so some comparisons give a different result. For example, after

i = 0xf000

i is typically negative where int is 16 bits, but positive on ARM.


3.Data alignment.


4.Big and little endian.


5.Function prototypes (parameter types).


6.Bit-fields.


7.Enumerations (use as few as possible).


8.Inline assembly makes the code non-portable.


9.I/O ports must be accessed through volatile pointers of the correct type, so that the right instructions are used (LDRH/STRH for 16-bit ports, LDR/STR for 32-bit ports).
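Item 1 of this list can be sketched as follows. Both function names are hypothetical. The loop uses an int counter, which is signed on every platform, so the i>=0 test terminates whether or not plain char is signed; the second function reports which kind of char the current compiler uses.

```c
#include <limits.h>

/* Hypothetical example: visit i = n, n-1, ..., 0 and return the number
 * of iterations. With a plain `char i` this loop would never end where
 * char is unsigned (as on ARM), because i >= 0 would always be true. */
unsigned int count_down_inclusive(unsigned int n)
{
    int i;                       /* int is signed on every platform */
    unsigned int iterations = 0;

    for (i = (int)n; i >= 0; i--) {
        iterations++;
    }
    return iterations;
}

/* Non-zero if plain char is unsigned with the current compiler settings. */
int plain_char_is_unsigned(void)
{
    return CHAR_MIN == 0;
}
```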

 
