Category: LINUX

2011-04-26 17:16:49

Here are the areas that one might need to optimize to achieve the best performance:

A. Optimize Applications
1. Compilers (Intel: the Intel Fortran compiler; AMD: AMD's gnutools)
2. Math libraries (Intel: Intel MKL, the Math Kernel Library; AMD: AMD's ACML, the Core Math Library)
3. Optimization flags when compiling all the software components that go into the production application binaries
4. For MPI applications: preferably MPICH2 or MPICH1, with OSC's mpiexec. I have had bad experiences with OpenMPI (openmpi-1.1.1-8 on Rocks), but the more recent 1.2.5 release is OK. A build-and-launch sketch follows this list.

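As a concrete illustration of items 3 and 4, here is a minimal sketch assuming Intel compilers with MKL and an MPICH2 installation; the source file names, flags, and launch setup are illustrative, not prescriptive:

# hypothetical serial build: Intel Fortran plus MKL, host-tuned optimization
ifort -O3 -xHost -o app app.f90 -mkl=sequential
# hypothetical MPI build through MPICH2's compiler wrapper
mpif90 -O3 -o mpi_app mpi_app.f90
# inside a PBS/Torque job script, OSC's mpiexec picks up the rank count from the job
mpiexec ./mpi_app
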
B. Optimize Linux Operating System on nodes
1. NFS (rsize, wsize, noatime, async in /etc/exports, adjust the number of nfsd daemons); example settings are sketched after this list
2. Ethernet devices (MTU = 9000 (jumbo frames), adaptive interrupt coalescing enabled, TCP Segmentation Offloading enabled, IRQ affinity on SMP (multi-core/multi-processor) machines, TCP Offload Engine if supported)
3. Linux kernel (correct processor family, Preemption Model set to Server, disable preemption of the big kernel lock, Timer Frequency at 100 Hz, CPU Frequency Scaling governor set to Performance)
4. Turn off all unnecessary services

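A minimal sketch of items 1, 2, and 4, assuming an NFS server named master exporting /home, eth0 as the cluster interface, and a Red Hat-style init system; the hostnames, subnet, buffer sizes, IRQ number, and thread count are all illustrative:

# /etc/fstab on a compute node: large read/write sizes, no atime updates
master:/home  /home  nfs  rsize=32768,wsize=32768,noatime,hard,intr  0 0
# /etc/exports on the NFS server: async trades crash safety for throughput
/home 10.1.0.0/255.255.0.0(rw,async,no_root_squash)
# /etc/sysconfig/nfs: more nfsd threads for a large client count
RPCNFSDCOUNT=32

# jumbo frames and offloads on the NIC (the switch must also support MTU 9000)
ifconfig eth0 mtu 9000
ethtool -K eth0 tso on
ethtool -C eth0 adaptive-rx on
# pin the NIC's interrupt to CPU0 (the IRQ number here is illustrative)
echo 1 > /proc/irq/24/smp_affinity

# drop services a compute node does not need
chkconfig cups off
chkconfig bluetooth off
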
C. Optimize /proc-related tunables (set them persistently in /etc/sysctl.conf)
….
####################################
#global read and write socket buffers (256 K)
net.core.rmem_default = 262144
net.core.wmem_default = 262144
#max size of read and write socket buffers (16 MB)
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
#read and write socket buffers specific to TCP
#min, default, max (rmem: 4 KB / ~85 KB / 16 MB; wmem: 4 KB / 64 KB / 16 MB) - default <= rmem_max
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
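#overall TCP memory thresholds (low, pressure, max), in pages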
net.ipv4.tcp_mem = 786432 1048576 1572864

#turn off SACK, DSACK and timestamps; on a local, low-loss HPC network they are not really needed
net.ipv4.tcp_sack = 0
net.ipv4.tcp_dsack = 0
net.ipv4.tcp_timestamps = 0

#prefer low latency over higher throughput in the TCP stack
net.ipv4.tcp_low_latency = 1

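#memory thresholds (in bytes) for IP fragment reassembly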
net.ipv4.ipfrag_high_thresh = 4194303
net.ipv4.ipfrag_low_thresh = 1048575
# don't cache ssthresh from previous connection
net.ipv4.tcp_no_metrics_save = 1

#max number of incoming packets queued for delivery to the device queue (default: 350)
net.core.netdev_max_backlog = 2500

#max accept queue backlog (default: 128)
#net.core.somaxconn = 256
#default 1024; increase if connections from clients are being dropped
#net.ipv4.tcp_max_syn_backlog = 1024

#shmmax: max shared mem segment size in bytes
kernel.shmmax = 4294967296
#shmmin: min shared mem segment size in bytes (1 byte - 2 GB)

#shmmni: max number of shared mem segments
#shmall <= shmmax / shmmni (4096)
#shmall: total shared mem, in pages, system wide
kernel.shmall = 4294967296

#kernel.sem = semmsl semmns semopm semmni
#(max semaphores per array, max semaphores system wide, max ops per semop call, max number of arrays)
#semmsl (up to 8000), semopm (up to 8000), semmni (up to 32767); semmns <= semmsl * semmni
#kernel.shmmax (above) should not exceed total system physical memory

#MPI apps use a lot of semaphores
kernel.sem = 1000 51200 128 1024
############################################

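These settings can be applied live and verified without a reboot; a quick check as root:

# load everything from /etc/sysctl.conf
sysctl -p
# read back a single key
sysctl net.ipv4.tcp_low_latency
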
D. Do not go cheap on networking switches: from my experience, everybody should get a switch with non-blocking (wire-speed) forwarding, e.g. a Nortel BayStack 5510 or an Extreme Networks Summit X450e.

E. (.bashrc) Minimize glibc's malloc/free overhead for MPI applications, especially on NUMA architectures:
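# MALLOC_MMAP_MAX_=0 keeps all allocations on the heap instead of private mmap()s,
# and MALLOC_TRIM_THRESHOLD_=-1 stops glibc from returning heap memory to the OS,
# so pages tend to stay on the NUMA node that first touched them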
export MALLOC_MMAP_MAX_=0
export MALLOC_TRIM_THRESHOLD_=-1
#export MALLOC_TOP_PAD_=2097152
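Note that these variables only affect processes started after the shell sources .bashrc; for batch jobs, make sure the job script exports them as well.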

http://blogold.chinaunix.net/u1/50058/showart_1300144.html
