Category: LINUX
2007-10-26 09:00:11
ARM Platform Programming Optimization
By
Part 1 Data Types
1. Reason:
ARM registers are all 32 bits wide, so a poorly chosen data type forces the compiler to emit extra instructions!
2. Detail:
* Use 32-bit types for local variables wherever possible (unless the code relies on modular, wraparound arithmetic in a narrower type!)
* Do the arithmetic on 32-bit data and convert to the target type only at the end
* Walk arrays with a pointer rather than an index (for short arrays, ldrh does not support shifted-offset addressing)
* Unsigned types are faster for division; for other operations signed and unsigned are equally efficient
* Use the narrowest type that fits for data stored in memory! (ARMv4 handles 8-, 16-, and 32-bit loads and stores equally well!)
* Use explicit casts wherever a conversion is needed
* Loads and stores convert the type automatically, so a cast on them usually costs nothing!
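A minimal sketch of the points above, assuming a hypothetical function `sum_samples` that does not appear in the original notes: the array keeps the narrow `short` type in memory, the accumulator and counter are 32-bit locals, and the array is walked with a pointer instead of an index.

```c
/* Hypothetical example: sum an array of 16-bit samples. */
int sum_samples(const short *data, unsigned int n)
{
    int sum = 0;            /* 32-bit local: no narrowing needed after each add */

    for (; n != 0; n--) {
        sum += *data++;     /* the halfword load widens the value to int */
    }
    return sum;
}
```

Declaring `sum` as `short` instead would typically make the compiler narrow the value again after every addition.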
Part 2
1. A loop with a fixed count should count down to 0 (--) rather than up (++). This holds for both "for" and "do while" loops, and "do while" is better than "for" (but a do while body always runs at least once!!)
2. For unsigned counters, testing "!=0" is better than ">0"
3. Where possible, unroll the loop body as required
4. Reduce the number of variables in the loop
Example for 3:
int checksum(int *data, unsigned int N) {
    unsigned int i;
    int sum = 0;
    /* main loop, unrolled four times */
    for (i = N / 4; i != 0; i--) {
        sum += *(data++);
        sum += *(data++);
        sum += *(data++);
        sum += *(data++);
    }
    /* remaining 0 to 3 elements */
    for (i = N & 3; i != 0; i--) {
        sum += *(data++);
    }
    return sum;
}
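Points 1 and 2 can be sketched in the same way. The function below is a hypothetical variant (the name `checksum_nonzero` is an assumption, not from the original) that uses a counted-down "do while" loop. It is only valid when the caller guarantees at least one element, because the body always runs once.

```c
/* Hypothetical variant: the caller must guarantee n >= 1. */
int checksum_nonzero(const int *data, unsigned int n)
{
    int sum = 0;

    do {
        sum += *data++;
    } while (--n != 0);     /* count down and compare against zero */
    return sum;
}
```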
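Part 3 Pointer Alias
1. When two pointers may alias, the compiler has to load the same data from memory twice or more. Avoid this by copying the value into a local variable.
2. Avoid taking the address of a local variable!! (It forces the variable out of a register and onto the stack.)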
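A small sketch of point 1, using hypothetical function names that are not in the original. In the first version the compiler cannot rule out that `timer1` and `step` point at the same memory, so it generally has to read `*step` twice. The second version reads it once into a local.

```c
/* Hypothetical example: advance two timers by the same step. */
void timers_naive(int *timer1, int *timer2, const int *step)
{
    *timer1 += *step;   /* this store might modify *step ... */
    *timer2 += *step;   /* ... so *step is loaded again here */
}

void timers_cached(int *timer1, int *timer2, const int *step)
{
    int s = *step;      /* single load, value kept in a local */

    *timer1 += s;
    *timer2 += s;
}
```

The two versions give different results only if `step` really does alias one of the timers, which is exactly the case the compiler has to allow for.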
Part 4 Struct Padding
1. Ordering struct members carefully saves memory. The __packed option forces the compiler to drop the padding, but costs access time! For cross-platform code, avoid __packed and enum types! Within a struct, put 8-bit members before 16-bit ones, then 32-bit, then 64-bit: the bigger the member, the later it goes! Avoid big structs with many levels of nesting. Adding padding by hand helps cross-platform portability.
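The effect of member order can be checked with `sizeof`. The two structs below are hypothetical and hold the same four members. On a typical 32-bit ABI the first usually occupies 12 bytes and the second 8, though the exact sizes depend on the compiler and target.

```c
#include <stddef.h>

/* Hypothetical layouts holding the same four members. */
struct mixed  { char a; int b; char c; short d; };   /* padding after a and after d */
struct sorted { char a; char c; short d; int b; };   /* smallest members first */

size_t mixed_size(void)  { return sizeof(struct mixed);  }
size_t sorted_size(void) { return sizeof(struct sorted); }
```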
Part 5 Inline And Function Parameters
1. Limit functions to four parameters or fewer. Under APCS the first four argument words travel in registers; anything beyond that (and a long long counts as two words) has to be pushed onto the stack, and this costs time.
2. If there are too many parameters, group them in a structure and pass a pointer to the structure; this avoids the pushes and pops.
3. Don't use more than 12 local variables, so that they can be allocated to registers rather than the stack!
4. Put functions in the same file where possible, and inline the small functions (don't inline the big ones!!!);
unsigned int dec16_to_hex(unsigned int dec16) {
    if (dec16 < 10) {
        return dec16 + '0';
    }
    return dec16 - 10 + 'A';
}
void uint_to_hex(char *out, unsigned int in) {
    in = (in << 4) | (in >> 28);    /* rotate the top nibble into the low 4 bits */
    *out++ = (char)dec16_to_hex(in & 15);
}
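Point 2 can be sketched as follows. The structure `RectArgs` and the function `rect_area` are hypothetical names chosen for illustration. Six values would not fit in the four argument registers, but a single pointer does.

```c
/* Hypothetical example: six values bundled into one structure. */
typedef struct {
    int x, y, width, height;
    unsigned int color;
    unsigned int flags;
} RectArgs;

/* Only one pointer is passed, so the call needs a single argument register. */
int rect_area(const RectArgs *args)
{
    return args->width * args->height;
}
```

The trade-off is that every member access inside the function becomes a load through the pointer, so this pays off mainly when the argument list is long.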
Part 6 Bit Field
1. Avoid bit-fields; use #define or enum instead!
Bad example:
void A(); void B(); void C();
typedef struct {
    unsigned int a:1;
    unsigned int b:1;
    unsigned int c:1;
} P;
void bit(P *p) {
    if (p->a)
        A();
    if (p->b)
        B();
    if (p->c)
        C();
}
Here the bit-field is accessed through a pointer, so the compiler has to reload it from memory after each call and loads the data more often than necessary!
Good example:
typedef unsigned int P;
#define Ba (1u<<0)
#define Bb (1u<<1)
#define Bc (1u<<2)
void bit(P p) {
    if (p & Ba)
        A();
    if (p & Bb)
        B();
    if (p & Bc)
        C();
}
2. Using AND, OR, and XOR masks to test, set, clear, and toggle the flags is a good idea!
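As a short illustration of point 2, the helpers below show the usual mask operations. The names `FLAG_A`, `set_flag`, and so on are hypothetical and only mirror the `Ba`/`Bb`/`Bc` masks above.

```c
#define FLAG_A (1u << 0)
#define FLAG_B (1u << 1)
#define FLAG_C (1u << 2)

/* Hypothetical helpers: each one is a single logical operation. */
unsigned int set_flag(unsigned int p, unsigned int mask)    { return p | mask;  }
unsigned int clear_flag(unsigned int p, unsigned int mask)  { return p & ~mask; }
unsigned int toggle_flag(unsigned int p, unsigned int mask) { return p ^ mask;  }
int          test_flag(unsigned int p, unsigned int mask)   { return (p & mask) != 0; }
```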
Part 7 Alignment And Endianness
1. Be careful with __attribute__ ((packed)); it makes the code non-portable! Use a char pointer to access the data to avoid alignment problems.
2. When the data to be processed differs from the CPU in endianness (for example, the CPU is little-endian while the data is big-endian), access it through an 8-bit type: two accesses for a 16-bit value, four for a 32-bit value. Writing a separate code path for each case is also a good idea!
Sample:
/* Requires N >= 1. */
void read_audio_data(short *out, char *in, unsigned int N) {
    unsigned short *data;
    unsigned int sample, next;
    switch ((unsigned long)in & 1) {
    case 0:                             /* input pointer is 16-bit aligned */
        data = (unsigned short *)in;
        do {
            sample = *(data++);
#ifdef __BIG_ENDIAN
            sample = sample << 8 | sample >> 8;
#endif
            *(out++) = (short)sample;
        } while (--N);
        break;
    case 1:                             /* input pointer is not aligned */
        data = (unsigned short *)(in - 1);
        sample = *(data++);
#ifdef __BIG_ENDIAN
        sample = sample & 0xff;         /* first byte of the first sample */
#else
        sample = sample >> 8;           /* first byte of the first sample */
#endif
        do {
            next = *(data++);
            /* complete one sample and start the next */
#ifdef __BIG_ENDIAN
            *(out++) = (short)((next & 0xff00) | sample);
            sample = next & 0xff;
#else
            *(out++) = (short)((next << 8) | sample);
            sample = next >> 8;
#endif
        } while (--N);
        break;
    }
}
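The byte-at-a-time approach from point 2 can be sketched as below. The helper names `read_be16` and `read_be32` are hypothetical. Because every access is a single byte, the code works for any alignment and gives the same result on little-endian and big-endian CPUs.

```c
/* Hypothetical helpers: read big-endian values one byte at a time. */
unsigned int read_be16(const unsigned char *p)
{
    return ((unsigned int)p[0] << 8) | (unsigned int)p[1];
}

unsigned long read_be32(const unsigned char *p)
{
    return ((unsigned long)p[0] << 24) | ((unsigned long)p[1] << 16) |
           ((unsigned long)p[2] << 8)  |  (unsigned long)p[3];
}
```

This is simpler than the sample above but slower, since it issues two or four loads per value instead of one.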
Part 8 About Divide And Floating
1. The ARM cores covered here (ARMv4, ARMv5E) have no hardware divide instruction, so armcc calls a C library routine to perform a division when one is needed! Avoiding division is a good idea! If you have to divide, use an unsigned divide or a constant divisor, because they are faster; a signed divide costs about 50 cycles, which is a lot of time!
2. Sometimes a remainder (mod or %) is a better choice than a divide!
3. When you need both a=w/c and b=w%c, compute them together so that one division yields both results;
4. Other algorithms are much harder
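Two of these ideas in a minimal sketch, with hypothetical names and assuming `width` is non-zero and `i < size`. The first function asks for the quotient and remainder of the same unsigned operands side by side, which gives the compiler the chance to serve both from one division. The second shows one way to avoid a division altogether when an index only ever advances by one.

```c
/* Hypothetical example: convert a linear offset into (row, column). */
typedef struct { unsigned int row, col; } Cell;

Cell offset_to_cell(unsigned int offset, unsigned int width)
{
    Cell c;

    c.row = offset / width;     /* quotient and remainder of the same operands, */
    c.col = offset % width;     /* written together so one divide can serve both */
    return c;
}

/* Advance a circular-buffer index without using % at all. */
unsigned int next_index(unsigned int i, unsigned int size)
{
    i++;
    if (i >= size) {
        i = 0;                  /* wrap by comparison instead of division */
    }
    return i;
}
```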
Part 9 Inline Assembly
1. Sometimes inline assembly works much better, for example for operations that C cannot express (coprocessor instructions and the ARMv5E extension instructions!)
asm("mcr cp15,00,c7,c5,
Part 10 Cross Platform
1. On the ARM platform plain char is unsigned, so a loop on a char counter that ends with i>=0 becomes infinite. We need to watch out for this! We can solve it with a gcc option (such as -fsigned-char) or by changing the type of i!
2. On older platforms int is 16 bits; on ARM it becomes 32 bits, so some comparisons that used to be true become false
i = 0xf000
3. Data alignment
4. Big and little endian!
5. Function types (parameter types)
6. Bit-fields
7. Enumerations (use as few as possible)
8. Inline assembly makes the code non-portable
9. I/O ports must be accessed as volatile with the correct type, so that the correct instructions are used (ldrsh/strh or ldr/str)
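Point 1 can be illustrated with a short sketch. The function names are hypothetical. `CHAR_MIN` from `<limits.h>` tells whether plain `char` is unsigned on the current compiler, and declaring the counter as `signed char` (or simply `int`) makes the loop behave the same everywhere.

```c
#include <limits.h>

/* Returns 1 if plain char is unsigned on this compiler, 0 otherwise. */
int char_is_unsigned(void)
{
    return CHAR_MIN == 0;
}

/* Hypothetical example: count the steps from start down to 0 inclusive.
 * With a plain `char i`, the test i >= 0 is always true where char is
 * unsigned; an explicitly signed counter terminates on every platform.
 * Assumes 0 <= start <= 127. */
int count_down(int start)
{
    int steps = 0;
    signed char i;

    for (i = (signed char)start; i >= 0; i--) {
        steps++;
    }
    return steps;
}
```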