Build Intel's libfftw3xf_intel.a (I don't know whether anyone has compared how the choice of FFT library affects speed):
cd /opt/intel/mkl/10.1.0.015/interfaces/fftw3xf
make libem64t compiler=intel
=====================================================================
1: Compile OpenMPI
cd /home/msemsi/share/openmpi-1.3.2
./configure CC=icc CXX=icpc F77=ifort FC=ifort
Preferably use the same compilers here as the ones you will use to build VASP later; otherwise compilation errors are likely.
Then: make && make install
For OpenMPI settings, consult the relevant documentation. Make sure the ifort, MKL, icc and OpenMPI include directories are on the INCLUDE path, otherwise the build may also fail. For example, on my system:
echo $INCLUDE
/opt/intel/mkl/10.1.0.015/include:/opt/intel/cce/10.1.018/include:/opt/intel/Compiler/11.0/074/include:/usr/local/include:/usr/include:/include:/opt/intel/mkl/10.1.0.015/include
=====================================================================
VASP build procedure:
2: Install VASP lib
mkdir /home/msemsi/share/vasp/vasp.5.lib/
cd /home/msemsi/share/vasp/vasp.5.lib/
tar xvfz /home/msemsi/share/vasp/vasp.5.lib.tar
cd vasp.5.lib/
cp makefile.linux_ifc_P4 makefile
vim makefile
Change the first few lines to:
CPP    = /opt/intel/cce/10.1.018/bin/icc -E -P -C $*.F >$*.f
CC     = /opt/intel/cce/10.1.018/bin/icc
FC     = /opt/intel/Compiler/11.0/074/bin/intel64/ifort
CFLAGS = -O
OFLAGS = -O3 -align -xT
FFLAGS = -I/opt/intel/mkl/10.1.0.015/include/fftw -FR -lowercase -assume byterecl $(OFLAGS)
FREE   = -FR
Everything from DOBJ onward can stay unchanged. In fact, to be on the safe side, it is enough to change only CPP and FC in the original makefile.
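If you change only CPP and FC, you can script the edit instead of doing it in vim. The helper below, `set_make_var`, is hypothetical and is not part of VASP. It makes two assumptions:
- Each variable is assigned on a single line that starts with its name, as in makefile.linux_ifc_P4.
- You want every such assignment rewritten.

Because it rewrites every matching line, run it on a fresh copy of the makefile.

```shell
#!/bin/sh
# set_make_var FILE VAR VALUE   (hypothetical helper, not part of VASP)
# Replace every line of FILE that assigns VAR ("VAR=..." or "VAR = ...")
# with "VAR = VALUE"; all other lines are copied unchanged.
# Assumes the assignment fits on one line (no trailing backslash).
set_make_var() {
    file=$1
    var=$2
    value=$3
    # "^VAR[ \t]*=" does not match e.g. FCL when VAR is FC, or CPP_ when VAR is CPP.
    # Note: awk -v expands backslash escapes in VALUE; plain paths are unaffected.
    awk -v var="$var" -v value="$value" '
        $0 ~ ("^" var "[ \t]*=") { print var " = " value; next }
        { print }
    ' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Example with the paths used in this post:
#   set_make_var makefile FC  /opt/intel/Compiler/11.0/074/bin/intel64/ifort
#   set_make_var makefile CPP '/opt/intel/cce/10.1.018/bin/icc -E -P -C $*.F >$*.f'
```

Single quotes around the CPP value keep the shell from touching `$*`, so make still sees the literal `$*.F`.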
Then make
=====================================================================
3: Compile the VASP parallel general-k version with OpenMPI
cd /home/msemsi/share/vasp/vasp.5.2
tar xvfz /home/msemsi/share/vasp/vasp.5.2.tar
cd vasp.5.2
cp makefile.linux_ifc_P4 makefile
Modify the makefile as follows (parallel version):
.SUFFIXES: .inc .f .f90 .F
#-----------------------------------------------------------------------
# Makefile for Intel Fortran compiler for Pentium/Athlon/Opteron
# bases systems
# we recommend this makefile for both Intel as well as AMD systems
# for AMD based systems appropriate BLAS and fftw libraries are
# however mandatory (whereas they are optional for Intel platforms)
#
# The makefile was tested only under Linux on Intel and AMD platforms
# the following compiler versions have been tested:
#  - ifc.7.1  works stable somewhat slow but reliably
#  - ifc.8.1  fails to compile the code properly
#  - ifc.9.1  recommended (both for 32 and 64 bit)
#  - ifc.10.1 partially recommended (both for 32 and 64 bit)
#             tested build 20080312 Package ID: l_fc_p_10.1.015
#             the gamma only mpi version can not be compiles
#             using ifc.10.1
#
# it might be required to change some of library pathes, since
# LINUX installation vary a lot
# Hence check ***ALL*** options in this makefile very carefully
#-----------------------------------------------------------------------
#
# BLAS must be installed on the machine
# there are several options:
# 1) very slow but works:
#   retrieve the lapackage from ftp.netlib.org
#   and compile the blas routines (BLAS/SRC directory)
#   please use g77 or f77 for the compilation. When I tried to
#   use pgf77 or pgf90 for BLAS, VASP hang up when calling
#   ZHEEV  (however this was with lapack 1.1 now I use lapack 2.0)
# 2) more desirable: get an optimized BLAS
#
# the two most reliable packages around are presently:
# 2a) Intels own optimised BLAS (PIII, P4, PD, PC2, Itanium)
#     http://developer.intel.com/software/products/mkl/
#     this is really excellent, if you use Intel CPU's
#
# 2b) probably fastest SSE2 (4 GFlops on P4, 2.53 GHz, 16 GFlops PD,
#     around 30 GFlops on Quad core)
#     Kazushige Goto's BLAS
#
#
#
#-----------------------------------------------------------------------
# all CPP processed fortran files have the extension .f90
SUFFIX=.f90
#-----------------------------------------------------------------------
# fortran compiler and linker
#-----------------------------------------------------------------------
FC=/opt/intel/Compiler/11.0/074/bin/intel64/ifort
# fortran linker
FCL=$(FC)
#-----------------------------------------------------------------------
# whereis CPP ?? (I need CPP, can't use gcc with proper options)
# that's the location of gcc for SUSE 5.3
#
#  CPP_ =  /usr/lib/gcc-lib/i486-linux/2.7.2/cpp -P -C
#
# that's probably the right line for some Red Hat distribution:
#
#  CPP_ =  /usr/lib/gcc-lib/i386-redhat-linux/2.7.2.3/cpp -P -C
#
#  SUSE X.X, maybe some Red Hat distributions:
CPP_ = ./preprocess <$*.F | /usr/bin/cpp -P -C -traditional >$*$(SUFFIX)
#-----------------------------------------------------------------------
# possible options for CPP:
# NGXhalf             charge density   reduced in X direction
# wNGXhalf            gamma point only reduced in X direction
# avoidalloc          avoid ALLOCATE if possible
# PGF90               work around some for some PGF90 / IFC bugs
# CACHE_SIZE          1000 for PII,PIII, 5000 for Athlon, 8000-12000 P4, PD
# RPROMU_DGEMV        use DGEMV instead of DGEMM in RPRO (depends on used BLAS)
# RACCMU_DGEMV        use DGEMV instead of DGEMM in RACC (depends on used BLAS)
#-----------------------------------------------------------------------
#CPP     = $(CPP_) -DHOST=\"LinuxIFC\" \
          -Dkind8 -DCACHE_SIZE=12000 -DPGF90 -Davoidalloc -DNGXhalf \
#          -DRPROMU_DGEMV -DRACCMU_DGEMV
#-----------------------------------------------------------------------
# general fortran flags (there must a trailing blank on this line)
# byterecl is strictly required for ifc, since otherwise
# the WAVECAR file becomes huge
#-----------------------------------------------------------------------
FFLAGS = -FR -lowercase -assume byterecl 
#-----------------------------------------------------------------------
# optimization
# we have tested whether higher optimisation improves performance
# -axK  SSE1 optimization,  but also generate code executable on all mach.
#       xK improves performance somewhat on XP, and a is required in order
#       to run the code on older Athlons as well
# -xW   SSE2 optimization
# -axW  SSE2 optimization,  but also generate code executable on all mach.
# -tpp6 P3 optimization
# -tpp7 P4 optimization
#-----------------------------------------------------------------------
# ifc.9.1, ifc.10.1 recommended
OFLAG=-O3
OFLAG_HIGH = $(OFLAG)
OBJ_HIGH =
OBJ_NOOPT =
DEBUG  = -FR -O0
INLINE = $(OFLAG)
#-----------------------------------------------------------------------
# the following lines specify the position of BLAS  and LAPACK
# VASP works fastest with the libgoto library
# so that's what we recommend
#-----------------------------------------------------------------------
# mkl.10.0
# set -DRPROMU_DGEMV  -DRACCMU_DGEMV in the CPP lines
#BLAS=-L/opt/intel/mkl100/lib/em64t -lmkl -lpthread
# even faster for VASP Kazushige Goto's BLAS
#
# parallel goto version requires sometimes -libverbs
BLAS= /lib64/libgoto_penrynp-r1.26.so
# LAPACK, simplest use vasp.5.lib/lapack_double
LAPACK= ../vasp.5.lib/lapack_double.o
# use the mkl Intel lapack
#LAPACK= -lmkl_lapack
#-----------------------------------------------------------------------
#LIB  = -L../vasp.5.lib -ldmy \
      ../vasp.5.lib/linpack_double.o $(LAPACK) \
      $(BLAS)
# options for linking, nothing is required (usually)
LINK    =
#-----------------------------------------------------------------------
# fft libraries:
# VASP.5.2 can use fftw.3.1.X (
# since this version is faster on P4 machines, we recommend to use it
#-----------------------------------------------------------------------
#FFT3D   = fft3dfurth.o fft3dlib.o
# alternatively: fftw.3.1.X is slighly faster and should be used if available
#FFT3D   = fftw3d.o fft3dlib.o   /opt/libs/fftw-3.1.2/lib/libfftw3.a
#=======================================================================
# MPI section, uncomment the following lines until
#    general  rules and compile lines
# presently we recommend OPENMPI, since it seems to offer better
# performance than lam or mpich
#
# !!! Please do not send me any queries on how to install MPI, I will
# certainly not answer them !!!!
#=======================================================================
#-----------------------------------------------------------------------
# fortran linker for mpi
#-----------------------------------------------------------------------
FC=/usr/local/bin/mpif90
FCL=$(FC)
#-----------------------------------------------------------------------
# additional options for CPP in parallel version (see also above):
# NGZhalf               charge density   reduced in Z direction
# wNGZhalf              gamma point only reduced in Z direction
# scaLAPACK             use scaLAPACK (usually slower on 100 Mbit Net)
#-----------------------------------------------------------------------
CPP    = $(CPP_) -DMPI -DHOST=\"LinuxIFC\" -DIFC \
     -Dkind8 -DCACHE_SIZE=16000 -DPGF90 -Davoidalloc -DNGZhalf \
     -DMPI_BLOCK=8000 -DRPROMU_DGEMV -DRACCMU_DGEMV
#-----------------------------------------------------------------------
# location of SCALAPACK
# if you do not use SCALAPACK simply leave that section commented out
#-----------------------------------------------------------------------
#BLACS=$(HOME)/archives/SCALAPACK/BLACS/
#SCA_=$(HOME)/archives/SCALAPACK/SCALAPACK
#SCA= $(SCA_)/libscalapack.a \
# $(BLACS)/LIB/blacsF77init_MPI-LINUX-0.a $(BLACS)/LIB/blacs_MPI-LINUX-0.a $(BLACS)/LIB/blacsF77init_MPI-LINUX-0.a
SCA=
#-----------------------------------------------------------------------
# libraries for mpi
#-----------------------------------------------------------------------
LIB     = -L../vasp.5.lib -ldmy \
      ../vasp.5.lib/linpack_double.o \
      -L/opt/intel/mkl/10.1.0.015/lib/em64t \
      -lmkl_em64t -lguide -lpthread -lm \
      -L/opt/intel/Compiler/11.0/074/lib/intel64 -lsvml -limf \
      -L/opt/intel/cce/10.1.018/lib -lsvml -limf
#LINK    = -static
# FFT: fftmpi.o with fft3dlib of Juergen Furthmueller
#FFT3D   = fftmpi.o fftmpi_map.o fft3dfurth.o fft3dlib.o
# alternatively: fftw.3.1.X is slighly faster and should be used if available
FFT3D   = fftmpi.o fftmpi_map.o fftw3d.o fft3dlib.o \
      /opt/intel/mkl/10.1.0.015/lib/em64t/libfftw3xf_intel.a
#-----------------------------------------------------------------------
# general rules and compile lines
#-----------------------------------------------------------------------
BASIC=   symmetry.o symlib.o lattlib.o random.o
SOURCE=  base.o mpi.o smart_allocate.o xml.o \
         constant.o jacobi.o main_mpi.o scala.o \
         asa.o lattice.o poscar.o ini.o xclib.o xclib_grad.o \
         radial.o pseudo.o mgrid.o gridq.o ebs.o \
         mkpoints.o wave.o wave_mpi.o wave_high.o \
         $(BASIC) nonl.o nonlr.o nonl_high.o dfast.o choleski2.o \
         mix.o hamil.o xcgrad.o xcspin.o potex1.o potex2.o \
         metagga.o constrmag.o cl_shift.o relativistic.o LDApU.o \
         paw_base.o egrad.o pawsym.o pawfock.o pawlhf.o paw.o \
         mkpoints_full.o charge.o dipol.o pot.o \
         dos.o elf.o tet.o tetweight.o hamil_rot.o \
         steep.o chain.o dyna.o sphpro.o us.o core_rel.o \
         aedens.o wavpre.o wavpre_noio.o broyden.o \
         dynbr.o rmm-diis.o reader.o writer.o tutor.o xml_writer.o \
         brent.o stufak.o fileio.o opergrid.o stepver.o \
         chgloc.o fast_aug.o fock.o mkpoints_change.o sym_grad.o \
         mymath.o internals.o dimer_heyden.o dvvtrajectory.o vdwforcefield.o \
         hamil_high.o nmr.o force.o \
         pead.o subrot.o subrot_scf.o pwlhf.o gw_model.o optreal.o davidson.o \
         electron.o rot.o electron_all.o shm.o pardens.o paircorrection.o \
         optics.o constr_cell_relax.o stm.o finite_diff.o elpol.o \
         hamil_lr.o rmm-diis_lr.o subrot_cluster.o subrot_lr.o \
         lr_helper.o hamil_lrf.o elinear_response.o ilinear_response.o \
         linear_optics.o linear_response.o \
         setlocalpp.o wannier.o electron_OEP.o electron_lhf.o twoelectron4o.o \
         ratpol.o screened_2e.o wave_cacher.o chi_base.o wpot.o local_field.o \
         ump2.o bse.o acfdt.o chi.o sydmat.o
INC=
vasp: $(SOURCE) $(FFT3D) $(INC) main.o
	rm -f vasp
	$(FCL) -o vasp main.o $(SOURCE) $(FFT3D) $(LIB) $(LINK)
makeparam: $(SOURCE) $(FFT3D) makeparam.o main.F $(INC)
	$(FCL) -o makeparam $(LINK) makeparam.o $(SOURCE) $(FFT3D) $(LIB)
zgemmtest: zgemmtest.o base.o random.o $(INC)
	$(FCL) -o zgemmtest $(LINK) zgemmtest.o random.o base.o $(LIB)
dgemmtest: dgemmtest.o base.o random.o $(INC)
	$(FCL) -o dgemmtest $(LINK) dgemmtest.o random.o base.o $(LIB)
ffttest: base.o smart_allocate.o mpi.o mgrid.o random.o ffttest.o $(FFT3D) $(INC)
	$(FCL) -o ffttest $(LINK) ffttest.o mpi.o mgrid.o random.o smart_allocate.o base.o $(FFT3D) $(LIB)
kpoints: $(SOURCE) $(FFT3D) makekpoints.o main.F $(INC)
	$(FCL) -o kpoints $(LINK) makekpoints.o $(SOURCE) $(FFT3D) $(LIB)
clean:
	-rm -f *.g *.f *.o *.L *.mod ; touch *.F
main.o: main$(SUFFIX)
	$(FC) $(FFLAGS) $(DEBUG) $(INCS) -c main$(SUFFIX)
xcgrad.o: xcgrad$(SUFFIX)
	$(FC) $(FFLAGS) $(INLINE) $(INCS) -c xcgrad$(SUFFIX)
xcspin.o: xcspin$(SUFFIX)
	$(FC) $(FFLAGS) $(INLINE) $(INCS) -c xcspin$(SUFFIX)
makeparam.o: makeparam$(SUFFIX)
	$(FC) $(FFLAGS)$(DEBUG) $(INCS) -c makeparam$(SUFFIX)
makeparam$(SUFFIX): makeparam.F main.F
#
# MIND: I do not have a full dependency list for the include
# and MODULES: here are only the minimal basic dependencies
# if one strucuture is changed then touch_dep must be called
# with the corresponding name of the structure
#
base.o: base.inc base.F
mgrid.o: mgrid.inc mgrid.F
constant.o: constant.inc constant.F
lattice.o: lattice.inc lattice.F
setex.o: setexm.inc setex.F
pseudo.o: pseudo.inc pseudo.F
poscar.o: poscar.inc poscar.F
mkpoints.o: mkpoints.inc mkpoints.F
wave.o: wave.inc wave.F
nonl.o: nonl.inc nonl.F
nonlr.o: nonlr.inc nonlr.F
$(OBJ_HIGH):
	$(CPP)
	$(FC) $(FFLAGS) $(OFLAG_HIGH) $(INCS) -c $*$(SUFFIX)
$(OBJ_NOOPT):
	$(CPP)
	$(FC) $(FFLAGS) $(INCS) -c $*$(SUFFIX)
fft3dlib_f77.o: fft3dlib_f77.F
	$(CPP)
	$(F77) $(FFLAGS_F77) -c $*$(SUFFIX)
.F.o:
	$(CPP)
	$(FC) $(FFLAGS) $(OFLAG) $(INCS) -c $*$(SUFFIX)
.F$(SUFFIX):
	$(CPP)
$(SUFFIX).o:
	$(FC) $(FFLAGS) $(OFLAG) $(INCS) -c $*$(SUFFIX)
# special rules
#-----------------------------------------------------------------------
# these special rules are cummulative (that is once failed
#   in one compiler version, stays in the list forever)
# -tpp5|6|7 P, PII-PIII, PIV
# -xW use SIMD (does not pay of on PII, since fft3d uses double prec)
# all other options do no affect the code performance since -O1 is used
fft3dlib.o : fft3dlib.F
	$(CPP)
	$(FC) -FR -lowercase -O2 -c -xT -unroll0 -vec_report3 $*$(SUFFIX)
fft3dfurth.o : fft3dfurth.F
	$(CPP)
	$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)
radial.o : radial.F
	$(CPP)
	$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)
symlib.o : symlib.F
	$(CPP)
	$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)
symmetry.o : symmetry.F
	$(CPP)
	$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)
wave_mpi.o : wave_mpi.F
	$(CPP)
	$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)
wave.o : wave.F
	$(CPP)
	$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)
dynbr.o : dynbr.F
	$(CPP)
	$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)
asa.o : asa.F
	$(CPP)
	$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)
broyden.o : broyden.F
	$(CPP)
	$(FC) -FR -lowercase -O2 -c $*$(SUFFIX)
us.o : us.F
	$(CPP)
	$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)
LDApU.o : LDApU.F
	$(CPP)
	$(FC) -FR -lowercase -O2 -c $*$(SUFFIX)
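Before running make, check whether the makefile links a threaded MKL. The MPI `LIB` line above pulls in `-lmkl_em64t -lguide`. Because libguide is an Intel OpenMP runtime, this suggests a threaded MKL. That is the situation the mkl_intel_thread vs mkl_sequential warning further down in this post is about.

`check_mkl_threading` below is a hypothetical helper. The library-name patterns it looks for are assumptions:
- `mkl_intel_thread` for the threaded MKL 10.x layer.
- `-lguide` and `-liomp5` for the OpenMP runtimes.

It skips text after a `#` on each line. It does not notice continuation lines of a commented-out variable.

```shell
#!/bin/sh
# check_mkl_threading FILE   (hypothetical helper)
# Print the lines of FILE that link a threaded MKL layer or an Intel OpenMP
# runtime (matches after a '#' on the same line are ignored).
# Returns 1 if any such line is found, 0 otherwise.
check_mkl_threading() {
    if grep -nE '^[^#]*(mkl_intel_thread|-lguide|-liomp5)' "$1"; then
        return 1
    fi
    return 0
}

# Example: check_mkl_threading makefile || echo "threaded MKL: set OMP_NUM_THREADS=1"
```

If it reports a hit, you have two options:
- Relink against the sequential layer. For MKL 10.x on em64t this is commonly written `-lmkl_intel_lp64 -lmkl_sequential -lmkl_core`; verify it against the MKL linking documentation for your version.
- Keep the threaded build and run with `export OMP_NUM_THREADS=1`. On several nodes, OpenMPI's `mpirun -x OMP_NUM_THREADS` forwards that setting to the remote processes.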
Then make
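The benchmark run in the test below writes an OUTCAR. To compare builds, run the same benchmark with the same number of MPI processes for each build and read the wall-clock time VASP reports. Builds worth comparing include different FFT libraries (the question at the top of this post) and -O2 vs -O3.

`outcar_elapsed` is a hypothetical helper. It assumes the timing summary at the end of OUTCAR contains a line of the form `Elapsed time (sec):   <seconds>`; check this against an OUTCAR from your own VASP version.

```shell
#!/bin/sh
# outcar_elapsed OUTCAR   (hypothetical helper)
# Print the number on the last "Elapsed time (sec):" line of OUTCAR.
# Prints nothing and returns 1 if no such line exists (e.g. the run died).
outcar_elapsed() {
    awk '/Elapsed time \(sec\):/ { t = $NF; found = 1 }
         END { if (found) print t; else exit 1 }' "$1"
}

# Example, after the test run in bench.Hg:
#   outcar_elapsed OUTCAR
```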
Test:
cd /home/msemsi/share/vasp
rm -rf bench.Hg
mkdir bench.Hg
cd bench.Hg
tar xvf /home/msemsi/share/vasp/bench.Hg.tar.gz
mpirun -np 4 ../vasp.5.2/vasp
=======================================================================================
4: Compile the version for Gamma-point-only calculations using the settings below. According to the official documentation, this can roughly double the speed.
Serial:   CPP = ... cpp ... -DNGXhalf -DwNGXhalf ...
Parallel: CPP = ... cpp ... -DNGZhalf -DwNGZhalf ...
# In fact you only need to add -DwNGZhalf; -DNGZhalf is already in the original makefile
touch *.F
make vasp
=======================================================================================
5: Compile VASP-NEB with OpenMPI
cd /home/msemsi/share/vasp/vtstcode
tar xvfz ../vasp.5.2.tar
wget
tar xvfz vtstcode.tar.gz
mv vtstcode/* .
Use almost the same makefile as in step 3 ("Compile the VASP parallel general-k version with OpenMPI"). The only difference is to add the following objects between steep.o and chain.o:
dimer.o dynmat.o neb.o lanczos.o instanton.o sd.o cg.o qm.o lbfgs.o bfgs.o fire.o opt.o
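Steps 4 and 5 both come down to small text edits of the step-3 makefile. The sketch uses two hypothetical helpers, neither part of VASP or VTST:
- `add_wngzhalf` appends -DwNGZhalf after -DNGZhalf on any line that does not already contain it.
- `add_neb_objects` splices the VTST object list from step 5 between steep.o and chain.o.

The second helper assumes steep.o and chain.o sit next to each other on one SOURCE line, as in the makefile above.

```shell
#!/bin/sh
# add_wngzhalf FILE   (hypothetical helper, step 4)
# On lines containing -DNGZhalf but not yet -DwNGZhalf, append -DwNGZhalf.
add_wngzhalf() {
    sed '/-DwNGZhalf/!s/-DNGZhalf/-DNGZhalf -DwNGZhalf/' "$1" > "$1.tmp" &&
        mv "$1.tmp" "$1"
}

# add_neb_objects FILE   (hypothetical helper, step 5)
# Insert the VTST objects between steep.o and chain.o. Running it twice is
# harmless: after the first run steep.o is no longer followed by chain.o.
add_neb_objects() {
    neb='dimer.o dynmat.o neb.o lanczos.o instanton.o sd.o cg.o qm.o lbfgs.o bfgs.o fire.o opt.o'
    sed "s/steep\.o  *chain\.o/steep.o $neb chain.o/" "$1" > "$1.tmp" &&
        mv "$1.tmp" "$1"
}

# Example (in the respective source directories):
#   add_wngzhalf makefile && touch *.F && make vasp
#   add_neb_objects makefile && make
```

The `vasp:` rule starts with `rm -f vasp`. If you want to keep the general-k binary, copy it away before rebuilding the Gamma-only version in the same directory.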
Be warned!!!!! The makefile on this web page will give a very slow VASP executable! It would be nice if INTEL support would post an updated version of this makefile. VASP needs to use the sequential MKL libraries (mkl_sequential), not the multithreaded libraries (mkl_intel_thread). Linking with mkl_intel_thread will generate a VASP executable that runs 2 to 10 times slower if the environment variable OMP_NUM_THREADS is not set to 1.
Also, if a VASP built with -O3 is not much faster than one built with -O2, build with -O2 to be on the safe side.
Ref2: Problems compiling fft3dlib.F when installing VASP, and adding Core 2 Duo flags
Problem 2: When compiling the serial or parallel version, the FFT source fft3dlib.F may fail with errors like these:
fortcom: Error: fft3dlib.f90, line 1627: Sharing of a DO termination statement by more than one DO statement is an obsolescent feature in Fortran 95. Use an END DO or CONTINUE statement for each DO statement.   [20]
20 CONTINUE
---^
fortcom: Error: fft3dlib.f90, line 1704: The computed GOTO statement is an obsolescent feature in Fortran 95.
GOTO (10,50,90,130,170,210,250),IGO
------^
fortcom: Error: fft3dlib.f90, line 2625: The computed GOTO statement is an obsolescent feature in Fortran 95.
GOTO (10,50,90,130,170,210,250),IGO
------^
fortcom: Error: fft3dlib.f90, line 3531: The computed GOTO statement is an obsolescent feature in Fortran 95.
GOTO (10,50,90,130,170,210,250),IGO
------^
fortcom: Error: fft3dlib.f90, line 4064: The computed GOTO statement is an obsolescent feature in Fortran 95.
GOTO (1010,1050,1090,1130,1170,1210,1250),IGO
------^
compilation aborted for fft3dlib.f90 (code 1)
make: *** [fft3dlib.o] Error 1
The main cause is that fft3dlib.F is written mostly in F77 syntax, while IFC reads it with F95 rules, so some warnings are to be expected.
Solution: in line 343 of the makefile, remove the "-e95" option, changing
$(FC) -FR -lowercase -O1 -tpp7 -xW -prefetch- -unroll0 -e95 -vec_report3 -c $*$(SUFFIX)
to
$(FC) -FR -lowercase -O1 -tpp7 -xW -prefetch- -unroll0 -vec_report3 -c $*$(SUFFIX)
The "-e95" option turns the warnings produced when F77 code is compiled under F95 rules into errors. Those errors force the compilation to stop instead of letting it continue past them.
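You can apply the -e95 fix with sed instead of editing the makefile by hand. The sketch below uses a hypothetical helper, `strip_e95`, with these limits:
- It removes a space-delimited `-e95` from every line of the file. That is broader than touching only line 343, whose number comes from the author's makefile and will differ in others.
- The step-3 makefile in this post already has no -e95 on its fft3dlib.o rule.

```shell
#!/bin/sh
# strip_e95 FILE   (hypothetical helper)
# Remove the standalone "-e95" option wherever it appears, either between
# other options or at the end of a line; everything else is left intact.
strip_e95() {
    sed -e 's/ -e95 / /g' -e 's/ -e95$//' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Example: strip_e95 makefile && make
```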