2008-12-22 20:35:52
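Free registration is required before you can download GotoBLAS.
The current latest version is GotoBLAS-1.14.tar.gz, released on 2007.03.21, which fully supports the Core 2 Duo series.
Installation in the new version is very user-friendly and differs slightly from the usual configure-based build of Linux source code.
For Linux users on x86 and x86_64 CPUs, it provides a quick-install script that quickly detects your SMP setup and compiler type. Supported compilers are tried in this order: PathScale→PGI→Intel→gfortran→g95→g77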
Step 1: Extract the archive
tar -zxvf GotoBLAS-1.14.tar.gz
Step 2: Install GotoBLAS. The 32-bit and 64-bit installs are as follows
For a 32-bit install: ./quickbuild.32bit
For a 64-bit install: ./quickbuild.64bit
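As a rough sketch of Step 2, the choice between the two scripts can be automated from the output of `uname -m`. The function name `pick_quickbuild` and the list of architecture strings are illustrative assumptions, not part of GotoBLAS; a 64-bit machine can also build the 32-bit library, so treat the mapping as a default only.

```shell
#!/bin/sh
# Hypothetical helper: map an architecture string (as printed by `uname -m`)
# to the matching GotoBLAS quick-build script named in Step 2.
pick_quickbuild() {
    case "$1" in
        x86_64|amd64)        echo "./quickbuild.64bit" ;;
        i386|i486|i586|i686) echo "./quickbuild.32bit" ;;
        *) echo "unsupported architecture: $1" >&2; return 1 ;;
    esac
}

# Typical use, run from inside the extracted GotoBLAS directory:
#   script=$(pick_quickbuild "$(uname -m)") && "$script"
```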
After installation, the libraries are built inside the directory you just extracted.
On a Core 2 Duo, for example, three main files are produced:
libgoto.a
libgoto_core2p-r1.14.a (the system names this file automatically according to your CPU type)
libgoto_core2p-r1.14.so
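Because the CPU-specific file names differ from machine to machine, a small check like the following can list what the build actually produced. The helper name `list_goto_libs` is hypothetical, and the `libgoto_*` pattern is only inferred from the file names shown above.

```shell
#!/bin/sh
# Hypothetical helper: print the CPU-specific GotoBLAS libraries
# (libgoto_<cpu>-r<version>.a / .so) found in a build directory.
list_goto_libs() {
    dir="$1"
    for f in "$dir"/libgoto_*.a "$dir"/libgoto_*.so; do
        # An unmatched glob stays literal, so skip names that do not exist.
        if [ -e "$f" ]; then
            basename "$f"
        fi
    done
}

# Typical use, from the parent of the extracted directory:
#   list_goto_libs ./GotoBLAS
```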
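For unusual machines, edit getarch.c and Makefile.rule, uncomment the parameters that match your machine, and recompile to produce the libraries. For the main installation procedure, please read Quickinstallat.txt carefully.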
------------------------------------------------------
Note: (the message printed after installation is as follows)
Done. This library is compiled with following conditions.
Binary ... 64bit
Fortran ... INTEL
SMP ... Enabled. You have to link library with -lpthread option.
-------------------------------------------------------
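The whole point of installing GotoBLAS is to make vasp run faster. How should the Makefile be modified? The steps are as follows:
Step 1: Put one of the three main libraries obtained above into vasp.4.lib, or create a separate directory and place it there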
/vasp/src/vasp.4.lib/libgoto_core2p-r1.14.so
Step 2: Edit the BLAS path in the Makefile. First comment out all existing BLAS paths with the # character, then add the new BLAS path
BLAS= ../vasp.4.lib/libgoto_core2p-r1.14.so
See also the related article on the quantum chemistry (量化) website:
/Experience/CommonSoftwares/VASP/CompileInstallation/200512/27.html
Step 3: Rebuild vasp with the modified Makefile
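The Makefile edit in Step 2 can be sketched as a small shell function. This is an illustration under assumptions: the name `set_blas_path` is hypothetical, and it presumes the BLAS setting sits on single lines beginning with `BLAS=` or `BLAS =`. If the library was built with SMP enabled, the install note above says `-lpthread` must also be passed at link time; that is not handled here.

```shell
#!/bin/sh
# Hypothetical helper: comment out every active BLAS assignment in a makefile
# and put the new path right after the first one (or at the end if none exist).
# Usage: set_blas_path <makefile> <library path>
set_blas_path() {
    makefile="$1"
    lib="$2"
    tmp="$makefile.tmp.$$"
    awk -v lib="$lib" '
        /^[ \t]*BLAS[ \t]*=/ {
            print "#" $0                       # disable the old path
            if (!added) { print "BLAS= " lib; added = 1 }
            next
        }
        { print }                              # everything else is unchanged
        END { if (!added) print "BLAS= " lib }
    ' "$makefile" > "$tmp" && mv "$tmp" "$makefile"
}

# Typical use, from the vasp source directory:
#   set_blas_path Makefile ../vasp.4.lib/libgoto_core2p-r1.14.so
```

Lines that are already commented out are left alone, so the old paths remain in the file for reference.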
By 阿達仔, National Cheng Kung University
2. If you compile the BLAS libraries with threading turned on but the number of threads set to 1, you gain between 33 and 50% speed when doing larger calculations.
The User Configuration part of Makefile.rule:
#
# Beginning of user configuration
#
# This library's version
REVISION = -r1.26
# Which C compiler do you prefer? Default is gcc.
C_COMPILER = GNU
# C_COMPILER = INTEL
# C_COMPILER = PGI
# Now you don't need Fortran compiler to build library.
# If you don't specify Fortran Compiler, GNU g77 compatible
# interface will be used.
# F_COMPILER = G77
# F_COMPILER = G95
# F_COMPILER = GFORTRAN
F_COMPILER = INTEL
# F_COMPILER = PGI
# F_COMPILER = PATHSCALE
# F_COMPILER = IBM
# F_COMPILER = COMPAQ
# F_COMPILER = SUN
# F_COMPILER = F2C
# If you need 64bit binary; some architecture can accept both 32bit and
# 64bit binary(X86_64, SPARC, Power/PowerPC or WINDOWS).
BINARY64 = 1
# If you want to build threaded BLAS
SMP = 1
# You can define maximum number of threads. Basically it should be
# less than actual number of cores. If you don't specify one, it's
# automatically detected by script.
MAX_THREADS = 1
# If you want to use legacy threaded Level 3 implementation.
# Some architecture prefer this algorithm, but it's rare.
# USE_SIMPLE_THREADED_LEVEL3 = 1
# If you want to use GotoBLAS with accelerator like Cell or GPGPU
# This is experimental and currently won't work well.
# USE_ACCERELATOR = 1
# Define accelerator type (won't work)
# USE_CELL_SPU = 1
# Threads are still working for a while after finishing BLAS operation
# to reduce thread activate/deactivate overhead. You can determine
# time out to improve performance. This number should be from 4 to 30
# which corresponds to (1 << n) cycles. For example, if you set to 26,
# thread will be running for (1 << 26) cycles(about 25ms on 3.0GHz
# system). Also you can control this number by GOTO_THREAD_TIMEOUT
# CCOMMON_OPT += -DTHREAD_TIMEOUT=26
# If you need cross compiling
# (you have to set architecture manually in getarch.c!)
# Example : HOST ... G5 OSX, TARGET = CORE2 OSX
# CROSS_SUFFIX = i686-apple-darwin8-
# CROSS_VERSION = -4.0.1
# CROSS_BINUTILS =
# If you need Special memory management;
# Using HugeTLB file system(Linux / AIX / Solaris)
# HUGETLB_ALLOCATION = 1
# Using bigphysarea memory instead of normal allocation to get
# physically contiguous memory.
# BIGPHYSAREA_ALLOCATION = 1
# To get maximum performance with minimum impact to the system,
# mixing memory allocation may be worth trying. In this case,
# you have to define one of ALLOC_HUGETLB or BIGPHYSAREA_ALLOCATION.
# Another allocation will be done by mmap or static allocation.
# (Not implemented yet)
# MIXED_MEMORY_ALLOCATION = 1
# Using static allocation instead of dynamic allocation
# You can't use it with ALLOC_HUGETLB
# STATIC_ALLOCATION = 1
# If you want to use CPU affinity
# CCOMMON_OPT += -DUSE_CPU_AFFINITY
# If you want to use memory affinity (NUMA)
# You can't use it with ALLOC_STATIC
# NUMA_AFFINITY = 1
# If you want to use interleaved memory allocation.
# Default is local allocation(it only works with NUMA_AFFINITY).
# CCOMMON_OPT += -DINTERLEAVED_MAPPING
# If you want to drive whole 64bit region by BLAS. Not all Fortran
# compilers support this. It's safe to keep it commented out if you
# are not sure.
# INTERFACE64 = 1
# If you have special compiler to run script to determine architecture.
GETARCH_CC +=
GETARCH_FLAGS +=
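To reproduce the "threading on, one thread" setup described above without editing by hand, something like the following sketch could set `SMP` and `MAX_THREADS` in Makefile.rule. The helper name `set_rule_option` is hypothetical, and it assumes each option sits on its own `KEY = VALUE` line, possibly commented out with `#`, as in the listing above.

```shell
#!/bin/sh
# Hypothetical helper: set "KEY = VALUE" in a Makefile.rule-style file,
# uncommenting the first matching line if it was commented out.
# Usage: set_rule_option <file> <key> <value>
set_rule_option() {
    file="$1"
    key="$2"
    value="$3"
    tmp="$file.tmp.$$"
    if awk -v key="$key" -v value="$value" '
        BEGIN { pattern = "^[ \t]*#?[ \t]*" key "[ \t]*=" }
        (!done) && ($0 ~ pattern) { print key " = " value; done = 1; next }
        { print }
        END { if (!done) exit 1 }              # key was never seen
    ' "$file" > "$tmp"; then
        mv "$tmp" "$file"
    else
        rm -f "$tmp"
        echo "option $key not found in $file" >&2
        return 1
    fi
}

# Typical use, from inside the GotoBLAS source directory:
#   set_rule_option Makefile.rule SMP 1
#   set_rule_option Makefile.rule MAX_THREADS 1
```

The function fails instead of appending when the key is missing, since a new line at the end of the file would land outside the user configuration section.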