VASP Flags Affecting Parallel Runtime

The primary purpose of the Wolfgang cluster is to run ab-initio, first-principles calculations, primarily using the VASP (Vienna Ab-initio Simulation Package) program. Many choices made in VASP affect a calculation's runtime, so this page attempts to explain them for VASP users. It offers only a cursory overview of how these options affect performance on Wolfgang, and is not a replacement for the VASP manual, which should be consulted for more in-depth information about these options. The four main flags which affect the runtime of parallel jobs (and don't change the results) are IALGO, LPLANE, NPAR, and NSIM.
Of these, the flag which influences the runtime the most is IALGO, since it selects the minimization algorithm. After selecting the algorithm, the next most important flag is NPAR, which controls how many bands are treated in parallel.
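As a concrete illustration, here is a minimal Python sketch that appends these four tags to an INCAR file. The values shown are assumptions for illustration only, not tuned recommendations; what works best depends on your system and should be checked against the VASP manual.

```python
# Minimal sketch: append the four parallel-performance tags to an INCAR.
# The values are illustrative assumptions only, not tuned recommendations.
parallel_tags = {
    "IALGO": "48",       # RMM-DIIS minimization algorithm
    "LPLANE": ".TRUE.",  # plane-wise data distribution; suits slow interconnects
    "NPAR": "4",         # how many bands are treated in parallel
    "NSIM": "4",         # bands optimized simultaneously by blocked RMM-DIIS
}

with open("INCAR", "a") as incar:
    for tag, value in parallel_tags.items():
        incar.write(f"{tag} = {value}\n")
```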
Selecting the Number of CPUs

The most important aspect of running VASP on Wolfgang is that the runtime does not scale linearly with the number of CPUs; it scales a little worse than linearly. The exact scaling depends on the particulars of the job, but there is an initial jump in the total CPU hours used of about 20% when going from 2 CPUs to 4 CPUs. Another way to say this is that the wall time for 4 CPUs is 60% of the wall time for 2 CPUs, instead of the linear 50%. Subsequent jumps in the total CPU hours seem to be about 4% for every 2 extra CPUs, so using 8 CPUs would increase the total CPU hours by 20% + 2 * 4% = 28%. In other words, 8 CPUs would give a wall time of 32% of the 2-CPU wall time instead of the linear 25%, and about 53% of the 4-CPU wall time instead of 50%.

This non-linear scaling is primarily the result of using gigabit Ethernet as the interconnect between the dual-CPU nodes. On the 2x2 nodes, there is a performance penalty for running all of the cores at the same time, caused by bottlenecks in memory access and in the CPU internals. The penalty for using larger numbers of compute nodes is lower, because the communications become more distributed and there are performance gains from reducing the problem size on each CPU. But on Wolfgang you will always consume more CPU hours as you use more CPUs, although your result will arrive in less time.

So what does this non-linear scaling mean when submitting jobs? Say you have two equal jobs to run, each taking 20 hours on two processors. You have two choices: run them concurrently on 2 processors each, or sequentially on 4. Running concurrently, the jobs will complete in 20 hours. Run sequentially, each job will take 12 hours, so both will be complete in 24 hours. If you're pushing a deadline, that's 4 extra hours out of a working day. It also means 4 fewer hours on 4 CPUs (16 CPU hours) for the other people using Wolfgang. Another consideration is queueing: if you submit the jobs concurrently, both start at once; if you run them sequentially, somebody else's job may start before your second one does. A short Python sketch below works through this arithmetic.

None of this means you shouldn't use a lot of CPUs, just that you should balance time against CPU usage. Think about when you need the results, and choose a number of CPUs per job that will let you finish by that time, with a preference for minimizing the number of CPUs per job. This keeps your total CPU consumption down and leaves other users room to run their jobs.

A further consideration is the memory requirement of your jobs. Each node has only 2 GB of memory per CPU core, so spreading a job over more CPUs reduces the memory needed on each one.

Other Flags Affecting Runtime (and possibly results)

The vast majority of the CPU time consumed by VASP is spent diagonalizing the bands. This time is proportional to the number of bands times the number of plane waves squared (NBANDS * Npw^2). There is one caveat: the RMM-DIIS algorithm needs some extra bands to increase its convergence rate, so there is a trade-off between the extra time per minimization step for each extra band and the reduction in the number of minimization steps. The VASP manual explains this in more depth and gives a general rule of thumb for the number of bands. Another tag to investigate is ENCUT, the plane-wave cutoff, which determines the number of plane waves. The runtime is also linear in the number of [irreducible] k-points.
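To make the CPU-scaling arithmetic from the section above concrete, here is a short Python sketch encoding the rule of thumb quoted there (about +20% total CPU hours going from 2 to 4 CPUs, then about +4% per 2 additional CPUs). The overhead model is only that empirical rule of thumb, not a benchmark.

```python
def cpu_hour_factor(ncpus: int) -> float:
    """Total CPU-hour overhead relative to a 2-CPU run, per the rule of
    thumb above: +20% at 4 CPUs, plus +4% for every 2 CPUs beyond that."""
    if ncpus <= 2:
        return 1.00
    return 1.20 + 0.04 * ((ncpus - 4) // 2)


def wall_time(ncpus: int, hours_on_2: float) -> float:
    """Wall time on ncpus CPUs, given the same job's wall time on 2 CPUs."""
    total_cpu_hours = 2 * hours_on_2 * cpu_hour_factor(ncpus)
    return total_cpu_hours / ncpus


# The two-job example: each job takes 20 hours on 2 CPUs.
concurrent = wall_time(2, 20)        # both jobs side by side, 2 CPUs each
sequential = 2 * wall_time(4, 20)    # one after the other on 4 CPUs
print(f"concurrent on 2+2 CPUs: {concurrent:.0f} h")       # -> 20 h
print(f"sequential on 4 CPUs:   {sequential:.0f} h")       # -> 24 h
print(f"8-CPU wall time vs 2-CPU: {wall_time(8, 20) / 20:.0%}")  # -> 32%
```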
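The proportionality just quoted (time proportional to bands times plane waves squared, and linear in k-points) can likewise be used to estimate how a settings change scales the diagonalization time. In this sketch the input numbers are made up for illustration; only the ratio between two settings is meaningful.

```python
def relative_diag_cost(nbands: int, npw: int, nkpts: int) -> float:
    """Relative diagonalization cost per the proportionality above:
    NBANDS * (number of plane waves)^2, linear in irreducible k-points."""
    return nbands * npw**2 * nkpts


# Illustrative numbers only: 33% more bands at a fixed cutoff and k-mesh.
base  = relative_diag_cost(nbands=96,  npw=20_000, nkpts=10)
trial = relative_diag_cost(nbands=128, npw=20_000, nkpts=10)
print(f"estimated slowdown: {trial / base:.2f}x")  # -> 1.33x
```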
Of course, the primary factor in determining ENCUT, NBANDS, and your k-point set is the accuracy you need in your calculations. The number of electrons is likewise determined by the pseudopotential selection, which again depends on the accuracy you need. The only way to determine whether you have the required accuracy is testing: start from the predicted settings, then run a set of test calculations with different settings to see if you can get adequate accuracy with shorter runtimes. Of course, this is only practical if you will use the same basic system for multiple calculations. For a system you will use only a few times, you can frequently get a good idea of the settings to use by testing with a small cell of the bulk material and extrapolating.
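As a sketch of such a convergence test, the loop below raises ENCUT until the total energy stops changing by more than a tolerance. Everything here is hypothetical scaffolding: run_vasp stands in for however you submit a job with a given cutoff and read back the total energy (e.g. from OSZICAR), and the 1 meV tolerance is just an example, not a recommendation.

```python
def run_vasp(encut: float) -> float:
    """Hypothetical placeholder: submit a VASP run with ENCUT=encut and
    return the converged total energy. Replace with your own job logic."""
    raise NotImplementedError


def converge_encut(cutoffs, tol_ev: float = 1e-3):
    """Scan increasing cutoffs; stop when the energy change drops below tol_ev."""
    previous = None
    for encut in cutoffs:
        energy = run_vasp(encut)
        if previous is not None and abs(energy - previous) < tol_ev:
            return encut, energy
        previous = energy
    return None  # not converged within the scanned range


# Example scan (will raise until run_vasp is filled in):
# result = converge_encut(range(300, 701, 50))
```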