
[Advance/NeuralMD Pro] Benchmarks on the cloud with 32 GPUs#

We benchmarked molecular dynamics (MD) calculations that use a Neural Network Potential via LAMMPS on a cloud environment with 32 GPUs. The first system is the one used in the benchmark on Mat3ra: a supercell model of the sulfide-type lithium-ion conductor Li10GeP2S12 containing 21600 atoms. We also used a similar supercell model containing 98000 atoms.

Calculation Environment#

The specification of each node in the cloud environment is shown below. The cloud environment has 4 nodes.

  • CPU: AMD EPYC 7J13 (64 cores, 2.55 GHz) × 2
  • Memory: 2048 GB
  • GPU: NVIDIA A100 × 8
  • Storage: 120 GB NVMe + 27.2 TB NVMe SSD (7.68 TB × 4)
  • Network: 2 × 50 Gbps + 16 × 100 Gbps RDMA
  • Compiler: Intel oneAPI 2021.7.1
  • MPI library: OpenMPI 4.1.3
  • Linear algebra library: Intel MKL 2022.2.1
  • CUDA: 11.4

HPC Systems cooperated in the preparation and use of the computing environment.

Results of Benchmarking#

The calculation conditions and results are shown below. Each node in the calculation environment provides 8 GPU devices. For all calculations, we ran 4 MPI processes on each GPU device and set the number of OpenMP threads to 1. Each MD calculation ran for 100 steps with a time step of 0.5 fs. We also show the time needed to run the same MD simulation for 1 ns (day/ns), estimated from the measured calculation times.

                                   GPU × 1  GPU × 2  GPU × 4  GPU × 8  GPU × 16  GPU × 24  GPU × 32
Number of nodes                          1        1        1        1         2         3         4
MPI processes                            4        8       16       32        64        96       128
Calculation time (s), 21600 atoms    15.35     7.84     4.16     2.29      1.41      1.14      1.03
Calculation time (s), 98000 atoms    69.12    35.64    18.08     9.84      4.97      3.42      2.63
day/ns, 21600 atoms                   3.55     1.82     0.96     0.53      0.33      0.26      0.24
day/ns, 98000 atoms                  16.00     8.25     4.19     2.28      1.15      0.79      0.61
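
The day/ns rows can be reproduced from the 100-step timings. The following is a minimal sketch, assuming the estimate is a plain linear extrapolation from 100 steps of 0.5 fs to 1 ns; the function and variable names are illustrative and not part of Advance/NeuralMD or LAMMPS. Because the timings in the table are rounded, a recomputed value can differ from the tabulated one by about 0.01.

```python
SECONDS_PER_DAY = 86_400
FS_PER_NS = 1_000_000
BENCHMARK_STEPS = 100   # steps per benchmark run (from the text)
TIME_STEP_FS = 0.5      # MD time step in fs (from the text)


def days_per_ns(wall_seconds, steps=BENCHMARK_STEPS, time_step_fs=TIME_STEP_FS):
    """Extrapolate a short benchmark run to the wall time (in days) for 1 ns."""
    steps_per_ns = FS_PER_NS / time_step_fs      # 2,000,000 steps for 1 ns
    seconds_per_step = wall_seconds / steps
    return seconds_per_step * steps_per_ns / SECONDS_PER_DAY


# Timings in seconds, copied from the table above.
gpus = [1, 2, 4, 8, 16, 24, 32]
time_21600 = [15.35, 7.84, 4.16, 2.29, 1.41, 1.14, 1.03]
time_98000 = [69.12, 35.64, 18.08, 9.84, 4.97, 3.42, 2.63]

for n, t_small, t_large in zip(gpus, time_21600, time_98000):
    print(f"{n:2d} GPUs: {days_per_ns(t_small):5.2f} day/ns (21600 atoms), "
          f"{days_per_ns(t_large):5.2f} day/ns (98000 atoms)")
```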


Below we show the calculation speed relative to the Mat3ra benchmark calculation run on 1 CPU device (Intel Xeon Platinum, 72 cores) on AWS. We did not run the system containing 98000 atoms on 1 CPU device on AWS, so we estimated its calculation time from the result for the system containing 21600 atoms, assuming that calculation time is proportional to the number of atoms. The figure below shows that the cloud with 32 GPUs accelerates the calculation by a factor of about 260.
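
The proportionality assumption can be sanity-checked against the 1-GPU timings in the results table, where both systems were actually measured. This is only a sketch: it uses the GPU timings, not the AWS CPU result (which is not listed here), and `scale_time_linearly` is a hypothetical helper name.

```python
def scale_time_linearly(t_ref, n_ref, n_target):
    """Estimate run time for n_target atoms, assuming time is proportional to atom count."""
    return t_ref * n_target / n_ref


# Measured 1-GPU timings (seconds for 100 steps) from the results table.
t_21600_1gpu = 15.35
t_98000_1gpu = 69.12

estimate = scale_time_linearly(t_21600_1gpu, n_ref=21_600, n_target=98_000)
relative_error = (estimate - t_98000_1gpu) / t_98000_1gpu

print(f"estimated: {estimate:.2f} s, measured: {t_98000_1gpu:.2f} s, "
      f"relative error: {relative_error:.1%}")
```

On 1 GPU the linear estimate lands within about 1% of the measured time, which suggests the assumption is reasonable at this system size, at least on that hardware.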

We also show, for each system, the calculation speed relative to the calculation run on 1 GPU device, together with the ideal relative speed. The parallelization efficiency for the system containing 21600 atoms decreases as the number of GPUs increases. In contrast, parallelization over 32 GPU devices still accelerates the calculation of the system containing 98000 atoms efficiently. With 32 GPU devices, each GPU device is assigned about 700 atoms for the system containing 21600 atoms and about 3000 atoms for the system containing 98000 atoms. The larger system appears to parallelize efficiently because each device still has enough atoms to work on.
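
The relative speed and parallelization efficiency discussed above can be computed directly from the timings in the results table. The sketch below assumes the usual definitions (speedup = 1-GPU time divided by N-GPU time, efficiency = speedup divided by N); the helper name is illustrative.

```python
def speedup_and_efficiency(times, gpu_counts):
    """Return (gpus, speedup, efficiency) tuples relative to the first (1-GPU) timing."""
    t_single = times[0]
    results = []
    for n, t in zip(gpu_counts, times):
        speedup = t_single / t
        results.append((n, speedup, speedup / n))   # ideal speedup would equal n
    return results


# Timings (seconds for 100 steps) from the results table.
gpus = [1, 2, 4, 8, 16, 24, 32]
time_21600 = [15.35, 7.84, 4.16, 2.29, 1.41, 1.14, 1.03]
time_98000 = [69.12, 35.64, 18.08, 9.84, 4.97, 3.42, 2.63]

results_21600 = speedup_and_efficiency(time_21600, gpus)
results_98000 = speedup_and_efficiency(time_98000, gpus)

for atoms, results in ((21_600, results_21600), (98_000, results_98000)):
    print(f"{atoms} atoms")
    for n, speedup, efficiency in results:
        print(f"  {n:2d} GPUs: speedup {speedup:5.2f}, efficiency {efficiency:4.0%}, "
              f"about {atoms / n:.0f} atoms per GPU")
```

With these numbers, 32 GPUs give a speedup of roughly 15 (about 47% efficiency) for the 21600-atom system and roughly 26 (about 82% efficiency) for the 98000-atom system.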

The benchmark results show that a 1 ns MD calculation of a system containing about 100000 atoms, using a Neural Network Potential, takes about 15 hours on the cloud with 32 GPUs. Clouds with GPUs make it possible to run MD simulations of more realistic models in a short calculation time and with high accuracy.
