Multimedia processing: using ARM NEON/FPU to improve performance - beyondjdg - ChinaUnix blog

Multimedia processing: using ARM NEON/FPU to improve performance

Category: Embedded

2013-09-22 09:44:17

Some software needs a large amount of floating-point arithmetic; audio processing is one example.

If the CPU has no FPU, these operations have to be implemented in software. For example,
a double-precision multiply may be replaced by a call to __aeabi_dmul.

The ARM floating-point environment is an implementation of the IEEE 754-1985 standard for
binary floating-point arithmetic.
An ARM system might have:
• a VFP coprocessor
• no floating-point hardware.
If you compile for a system with a hardware VFP coprocessor, the ARM compiler makes use of
it. If you compile for a system without a coprocessor, the compiler implements the computations
in software. For example, the compiler option --fpu=vfp selects a hardware VFP coprocessor
and the option --fpu=softvfp specifies that arithmetic operations are to be performed in
software, without the use of any coprocessor instructions.

For details, see:

__aeabi_dmul | arguments: 2 double | result: double | Return x times y

This is slow.

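If the hardware has an FPU, the arithmetic can be done on the FPU directly.
For example, the double multiply above is performed by a single
vmul.f64 instruction, which is much faster.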
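As a minimal sketch of where this matters in audio code, the hypothetical helper below (apply_gain is my name, not from any library) does one double multiply per sample. The GCC options named in the comment are assumptions about the toolchain; other compilers use different switches, such as the --fpu options quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: scale every sample in an audio buffer by a gain.
 * The multiply in the loop is the operation discussed above.
 * Assumed GCC behaviour on ARM EABI targets:
 *   -mfloat-abi=soft                -> a call to __aeabi_dmul per multiply
 *   -mfpu=vfp -mfloat-abi=softfp    -> a hardware VFP multiply (vmul.f64)
 */
void apply_gain(double *samples, size_t count, double gain)
{
    assert(samples != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        samples[i] = samples[i] * gain;
    }
}
```

One way to check which form a given build produces is to compile with -S and look for either __aeabi_dmul or the VFP multiply in the generated assembly.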

I wrote a short piece of code to test this:
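/* volatile keeps the compiler from optimizing the multiply away */
volatile double para_1 = 10.10;
volatile double para_2 = 10.10;
volatile double result;
int index;
/* 0x1000000 = 16,777,216 iterations */
for (index = 0; index < 0x1000000; index++)
{
    result = para_1 * para_2;
}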
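For the same 0x1000000 (about 16.8 million) multiplications, the run takes roughly 1700 ms without the FPU and only about 350 ms with it, close to a 5x difference.
Division shows an even larger gap: about 6,700 ms without the FPU versus only 515 ms with it, roughly 13x.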
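The post does not show how the times were taken. The sketch below is one possible way to do it, assuming the standard C clock() function is good enough for a compute-bound loop; time_multiply_ms and its parameters are hypothetical names, and the original measurements may have used a different timer.

```c
#include <assert.h>
#include <time.h>

/* Runs `iterations` double multiplies and returns the elapsed CPU time in
 * milliseconds. The last product is written to *last_result.
 * As in the test above, volatile keeps the compiler from hoisting or
 * deleting the multiply. */
double time_multiply_ms(unsigned long iterations, double *last_result)
{
    volatile double para_1 = 10.10;
    volatile double para_2 = 10.10;
    volatile double result = 0.0;
    unsigned long index;

    assert(last_result != NULL);

    clock_t start = clock();
    for (index = 0; index < iterations; index++) {
        result = para_1 * para_2;
    }
    clock_t end = clock();

    *last_result = result;
    return (double)(end - start) * 1000.0 / (double)CLOCKS_PER_SEC;
}
```

Calling time_multiply_ms(0x1000000UL, &r) once in a soft-float build and once in a VFP build of the same source gives a pair of numbers comparable to those above. Absolute values depend on the CPU and clock speed.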

If the CPU has an FPU, use it wherever possible; it can improve performance substantially.

ARM also publishes further optimization advice for specific CPUs.
For the Cortex-A9, for example, see:
Cortex-A9 Floating-Point Unit
Revision: r4p1
Technical Reference Manual
==> 1.3 Writing optimal FP code

In addition, on a Cortex-A class ARM core you can also consider using NEON for optimization. For details, see: Introducing NEON Development Article
It extends the SIMD concept by defining groups of instructions operating on vectors stored in 64-bit D, doubleword, registers and 128-bit Q, quadword, vector registers.

NEON support is already integrated into GCC and can be used directly.

NEON intrinsics with GCC
To use NEON intrinsics in GCC, you must specify -mfpu=neon on the compiler
command line:
arm-none-linux-gnueabi-gcc -mfpu=neon intrinsic.c
Depending on your toolchain, you might also have to add -mfloat-abi=softfp to indicate
to the compiler that NEON variables must be passed in general purpose registers.
A complete list of supported intrinsics can be found at

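===> Note also that NEON and VFP are not guaranteed to be present. ARM's recommendation, however, is that a core with VFP should also have NEON (most Cortex-A series cores implement NEON, although it is an optional extension on some of them, such as the Cortex-A9).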
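Because NEON may be absent, code that uses the intrinsics usually keeps a plain C fallback. The following is a minimal sketch, assuming GCC defines __ARM_NEON or __ARM_NEON__ when built with -mfpu=neon; multiply_f32 and HAVE_NEON are hypothetical names. It works on float rather than double, because NEON on ARMv7 handles single-precision only.

```c
#include <assert.h>
#include <stddef.h>

/* Use the intrinsics only when the compiler reports NEON support. */
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Element-wise out[i] = a[i] * b[i] for `count` floats. */
void multiply_f32(const float *a, const float *b, float *out, size_t count)
{
    size_t i = 0;

    assert(count == 0 || (a != NULL && b != NULL && out != NULL));

#if HAVE_NEON
    /* Four floats per iteration in one 128-bit Q register. */
    for (; i + 4 <= count; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(out + i, vmulq_f32(va, vb));
    }
#endif

    /* Scalar tail, and the whole loop when NEON is not available. */
    for (; i < count; i++) {
        out[i] = a[i] * b[i];
    }
}
```

The compile-time guard only covers the case where the build target is known. A binary that must run on cores with and without NEON would need a runtime check as well, which is outside this sketch.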
