
2009-08-10 13:08:07

 


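EABI-compliant cross compiler: arm-linux-gcc-4.3.2 with EABI
·Description
When compiling floating-point code, a cross compiler defaults to hard-float FPA (Floating Point Accelerator) instructions. CPUs without an FPA unit, such as the SAMSUNG S3C2410/S3C2440, have to fall back on FPE (Floating Point Emulation, i.e. soft float), which severely limits speed. Using EABI (Embedded Application Binary Interface) improves on this. ARM EABI brings many innovations, the most prominent being floating point performance: it uses Vector Floating Point (VFP), so it can greatly speed up programs that involve floating-point operations.
The article below describes this in detail.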
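A quick way to check what a given cross compiler actually targets is to look at its predefined macros. The sketch below assumes GCC, which predefines `__ARM_EABI__`, `__SOFTFP__` and `__VFP_FP__` on ARM targets; the function names are made up for illustration, and other compilers may not define these macros at all.

```c
#include <stdio.h>

/* Report the ABI the compiler targets.
 * Assumption: GCC-style predefined macros. On a non-ARM host none of
 * them is defined and the fallback string is returned. */
const char *target_abi(void)
{
#if defined(__ARM_EABI__)
    return "ARM EABI";
#elif defined(__arm__)
    return "ARM, probably the old ABI (OABI)";
#else
    return "not an ARM target";
#endif
}

/* Report the floating-point model. __SOFTFP__ is tested first because
 * a soft-float EABI build can define __VFP_FP__ as well (it describes
 * the number format, not the presence of VFP hardware). */
const char *target_float_model(void)
{
#if defined(__SOFTFP__)
    return "soft float (library calls)";
#elif defined(__VFP_FP__)
    return "VFP hardware float";
#elif defined(__arm__)
    return "FPA or other hardware float";
#else
    return "host default";
#endif
}

/* Print both on one line. */
void print_target(void)
{
    printf("ABI: %s, float: %s\n", target_abi(), target_float_model());
}
```

Calling `print_target()` from a program built with the cross compiler and run on the board shows which combination that toolchain produces.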
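·Benefits for you
The latest Linux software system uses a single EABI-compliant cross compiler together with the new glibc 2.8 library. With this one compiler you can build:
- the Linux kernel (linux-2.6.29)
- the qtopia-2.2.0 graphics system
- busybox
- vivi (open-source bootloader)
- u-boot (open-source bootloader)
- many other Linux applications (such as a web server, boa, madplay)
First, this improves the floating-point performance of your programs; second, you no longer have to spend time switching between different compilers.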
 
 
·Download
(with EABI) 86MB
 

   
Why ARM's EABI Matters
by Andres Calderon and Nelson Castillo


It's common nowadays to hear of the new ARM EABI (embedded application binary interface) Linux port. There are many motivations to start using it, but there is one we especially like -- it's much faster for floating point operations. Since many ARM cores lack a hardware FPU (floating point unit), any software acceleration is more than welcome.

It might be hard to switch to EABI, though. For instance, for the Debian distribution, EABI is actually considered a new port.

Without EABI

The ARM EABI improves the floating point performance. This is not surprising, if you read how your processor is wasting a lot of cycles now. From the Debian wiki:
The current Debian port creates hardfloat FPA instructions. FPA comes from "Floating Point Accelerator." Since the FPA floating point unit was implemented only in very few ARM cores, these days FPA instructions are emulated in kernel via Illegal instruction faults. This is of course very inefficient: about 10 times slower than -msoftfloat for a FIR test program. The FPA unit also has the peculiarity of having mixed-endian doubles, which is usually the biggest grief for ARM porters, along with structure packing issues.
So, what does this mean? It means that the compilers usually generate instructions for a piece of hardware, namely a Floating Point Unit, that is not actually there! When you perform a floating point operation, such as 3.58*x, the CPU runs into an illegal instruction and raises an exception. The kernel catches this specific exception, performs the intended floating point operation, and then resumes executing the program. This is slow because every such operation implies a context switch.
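To make the two paths concrete, here is a minimal sketch of the operation mentioned above. The names `scale` and `scale_all` are made up for illustration, and the comments describe what GCC typically generates, not output captured from the authors' toolchain.

```c
#include <stddef.h>

/* One double-precision multiply, the 3.58*x example from the text.
 *
 * Old ABI, hard-float FPA: the multiply is emitted as an FPA
 * coprocessor instruction. On a core without an FPA unit that
 * instruction is undefined, so each execution traps into the kernel,
 * which emulates it and then returns to the program.
 *
 * EABI, soft float: the multiply is emitted as an ordinary call to a
 * runtime helper (__aeabi_dmul in the ARM run-time ABI) that runs
 * entirely in user space, with no trap. */
double scale(double x)
{
    return 3.58 * x;
}

/* Applying it to an array: under the old ABI this costs one trap per
 * element, which is why the overhead dominates the run time. */
void scale_all(double *v, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        v[i] = scale(v[i]);
}
```

Compiling this file with `-S` under each toolchain and comparing the generated assembly is a simple way to see which path a given compiler takes.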

The benchmark

We decided to run a simple benchmark on our Open Hardware Free ECB_AT91 ARM (ARMv4T) development board, which is based on an Atmel processor.

    
The ECB_AT91, top and bottom

We used a simple benchmark we have used before: the dot product of two given vectors, the Euclidean distance of the vectors, and the FFT (fast Fourier transform) algorithm (complex valued, Cooley and Tukey radix-2). The source code we used is available (GPL).

It's common to use the number of floating point operations per second (FLOPS) performed by a given program for benchmarking purposes. However, this can be misleading, because some operations (e.g. division) take more time than others (e.g. addition). To ensure uniformity, we ran the same program in both setups, with similar compiler flags.
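As a rough illustration of how such a figure is obtained, the sketch below counts the operations of a dot product and converts a measured time into MFLOPS. The helper names and the use of `clock()` for timing are assumptions, not the authors' measurement harness.

```c
#include <stddef.h>
#include <time.h>

/* A dot product of length n does n multiplies and n additions,
 * so `repeats` runs perform 2 * n * repeats operations. */
double dot_flop_count(size_t n, unsigned long repeats)
{
    return 2.0 * (double)n * (double)repeats;
}

/* Convert an operation count and an elapsed time into MFLOPS.
 * Returns 0 when the time is too short to measure. */
double mflops(double flop_count, double seconds)
{
    if (seconds <= 0.0)
        return 0.0;
    return flop_count / seconds / 1e6;
}

/* Run a kernel `repeats` times and return the CPU time in seconds.
 * `kernel` and `ctx` are hypothetical: any function taking an opaque
 * context pointer will do. */
double time_kernel(void (*kernel)(void *), void *ctx, unsigned long repeats)
{
    unsigned long r;
    clock_t start = clock();
    for (r = 0; r < repeats; r++)
        kernel(ctx);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Because the count treats a multiply and an add as equal, the resulting number is only comparable between runs of the same program, which is exactly why the same code was used in both setups.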

First we tried the Old ABI using the Debian distribution (Debian Sid), and an image that we . Then, for the EABI test, we used the , part of the project.

Results


EABI vs. OABI, floating point benchmark (Free_ECB_AT91_V1.5, AT91RM9200)



EABI/OABI speed-up, floating point benchmark (Free_ECB_AT91_V1.5, AT91RM9200)

In each context switch, both the data and instruction caches are flushed, and this hurts the Old ABI's performance. You can see it in the graphs: the performance with the old ABI does not depend on the size (N) of the input data, whereas with EABI the effect of the cache on performance shows clearly. The dot-product performance only goes down when N > 4096 (when we use more than 16 KB of memory); the Atmel processor we're using has a 16 Kbyte data cache.
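The 16 KB threshold can be checked with a little arithmetic. The article does not say which element type is used or how many arrays are counted, so the sketch below takes both as parameters; with 4-byte floats and a single array (an assumption), N = 4096 comes out at exactly 16 KB.

```c
#include <stddef.h>

/* 16 Kbyte data cache, the figure given in the article. */
#define DCACHE_BYTES ((size_t)16 * 1024)

/* Bytes touched by `arrays` arrays of n elements of elem_size bytes. */
size_t working_set_bytes(size_t n, size_t elem_size, size_t arrays)
{
    return n * elem_size * arrays;
}

/* Non-zero when the whole working set fits in the data cache. */
int fits_in_dcache(size_t n, size_t elem_size, size_t arrays)
{
    return working_set_bytes(n, elem_size, arrays) <= DCACHE_BYTES;
}
```

For example, `working_set_bytes(4096, sizeof(float), 1)` is 16384 bytes on a platform with 4-byte floats, and one more element no longer fits.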
 
This introduction comes from: http://linuxdevices.com/articles/AT5920399313.html
         
         
Linux (open and unified BSP and bootloader; adapts automatically to 64M/128M-1G NAND Flash on mini2440/micro2440)
·make yaffs2 image tools (includes 64M and 128M versions) ·binary images
       
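WindowsCE5 (open and unified BSP and bootloader; adapts automatically to 64M/128M-1G NAND Flash on mini2440/micro2440)
·(source code, 4.8M) (2009-7-19) ·(right-click and choose Save As; for mini2440/micro2440)
·(source code; supports loading and displaying a boot image and a progress bar) (2009-7-19) ·(WinCE boot-screen creation tool) (2.5M)
·(234M) ·(34M)
·(6.5M) ·WindowsCE5 image files (includes N35, T35, L80, A80, VGA_1024x768; for testing and learning only) (2009-7-23)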
         
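2440test-20090719 (includes source code and flashable images, 13M)
    2440test is a bare-metal (no operating system) test program originating from Samsung... improved by FriendlyARM (友善之臂); adapts automatically to 64M/128M-1G NAND Flash on mini2440/micro2440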
         
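uCos2-20090719 (includes source code and flashable images, 5.5M)
    uCos2 was contributed by a community member and is for learning and reference only... improved by FriendlyARM (友善之臂); adapts automatically to 64M/128M-1G NAND Flash on mini2440/micro2440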
         
Common tools and software (Tools)
· ·(includes 64M and 128M versions)
· (1.3M) ·
·(WinCE boot-screen creation tool, for WinXP/Vista, 2.5M) ·(Linux logo creation tool, for the Fedora 9 platform, 4.7M)
         
User Manual
·mini2440 : (20M) (2009-7-28)      
·micro2440 : (20M) (2009-7-28)      
         
Disc images
·mini2440 : (2.8G) MD5: 4F3E3716E0419018707DFF3007C5DF24
·micro2440 : (2.61G)    
         