Benchmark of Molecular Dynamics Simulations with the Multi-GPU Enabled General-purpose GNN Force Field SevenNet#
SevenNet (Scalable EquiVariance Enabled Neural Network)[1][2] is a general-purpose force field that uses graph neural networks (GNNs) to compute interatomic interactions and perform molecular dynamics simulations for systems of diverse composition. A notable feature of SevenNet among GNN force fields is its ability to run in parallel at high speed across multiple GPUs by utilizing CUDA-aware OpenMPI.
SevenNet is publicly available as an implementation for the LAMMPS molecular dynamics package. In addition, calculations using SevenNet can be performed through a graphical user interface with the integrated nanomaterial analysis GUI Advance/NanoLabo.
In this case study, we benchmarked large-scale molecular dynamics simulations of the sulfide-type lithium ion conductor Li10GeP2S12 by performing parallel execution using multiple GPUs.
Computational Environment#
The specifications of the computational environment used in this case study are as follows:
- CPU: Intel Xeon Gold 6338 (32 cores) × 2
- GPU: NVIDIA A100 (VRAM 40GB) × 8
- Memory: 1024GB
- CUDA: 12.2
The computational environment was created with the cooperation of Fujitsu Limited, utilizing the "Fujitsu Cloud Service HPC" from Fujitsu Computing as a Service (CaaS).
Computational Conditions#
Starting from the structure file for Li10GeP2S12 obtained from the Materials Project (mp-696128), we generated supercells to prepare large-scale systems of 21600 and 98000 atoms. For these systems, we performed molecular dynamics simulations for 100 steps using the pre-trained parallel model "SevenNet-0", with an NVT ensemble (500 K) and a time step of 0.5 fs. The same calculation was repeated while varying the number of GPUs from 1 to 8, and the computation times were measured. The number of MPI processes per GPU was set to 1, with 1 thread per process.
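For reference, a minimal LAMMPS input for such a run might look as follows. This is a sketch under stated assumptions, not the exact script used in this benchmark: the `pair_style e3gnn/parallel` keyword follows the SevenNet LAMMPS patch, but the number of deployed model segments, their file names (`deployed_parallel_*.pt`), and the data file name are placeholders that should be taken from the SevenNet documentation and your own setup.

```
# Sketch of a LAMMPS input for parallel SevenNet MD (NVT, 500 K, 0.5 fs time step).
# Model segment count and file names below are placeholders - see the SevenNet docs.
units           metal
atom_style      atomic
boundary        p p p
read_data       lgps_supercell.data        # supercell built from mp-696128

pair_style      e3gnn/parallel
pair_coeff      * * 5 deployed_parallel_0.pt deployed_parallel_1.pt deployed_parallel_2.pt deployed_parallel_3.pt deployed_parallel_4.pt Li Ge P S

timestep        0.0005                     # metal units use ps, so 0.0005 ps = 0.5 fs
velocity        all create 500.0 12345
fix             1 all nvt temp 500.0 500.0 0.05
run             100
```

The parallel pair style assigns one spatial domain per MPI rank, so the run is launched with one rank per GPU, e.g. `mpirun -np 8 lmp -in in.lgps`.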
Benchmark Results#
The table below shows the computation times for 100 MD steps at each GPU count, together with the simulated time achievable in one day of wall-clock time (ns/day) under the same conditions. For the 21600-atom system, an out-of-memory (OOM) error occurred with 1 GPU; for the 98000-atom system, OOM errors occurred with 4 or fewer GPUs. Multiple GPUs therefore appear to be essential for systems of this scale.
| GPUs | Time for 100 steps, 21600 atoms (s) | Time for 100 steps, 98000 atoms (s) | Simulated time, 21600 atoms (ns/day) | Simulated time, 98000 atoms (ns/day) |
|---|---|---|---|---|
| 1 | OOM | OOM | OOM | OOM |
| 2 | 74 | OOM | 0.060 | OOM |
| 3 | 51 | OOM | 0.088 | OOM |
| 4 | 40 | OOM | 0.112 | OOM |
| 5 | 34 | 204 | 0.132 | 0.022 |
| 6 | 28 | 115 | 0.163 | 0.039 |
| 7 | 25 | 101 | 0.184 | 0.044 |
| 8 | 23 | 87 | 0.204 | 0.052 |
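The ns/day column follows from the 100-step wall times and the 0.5 fs time step; the short script below shows the conversion. Values computed this way agree with the table only to within roughly 10%, suggesting the table's ns/day figures were measured separately rather than derived from the 100-step timings.

```python
SECONDS_PER_DAY = 86400.0

def ns_per_day(wall_seconds: float, n_steps: int = 100, timestep_fs: float = 0.5) -> float:
    """Simulated time (ns) reachable in one day of wall-clock time."""
    simulated_fs = n_steps * timestep_fs                       # fs simulated in the measured run
    fs_per_day = simulated_fs / wall_seconds * SECONDS_PER_DAY
    return fs_per_day / 1.0e6                                  # fs -> ns

# 2-GPU, 21600-atom run: 74 s for 100 steps
print(round(ns_per_day(74.0), 3))  # → 0.058 (table reports 0.060)
```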
Note that the SevenNet paper[1] reported a computational speed of approximately 0.1 ns/day[3] for a roughly 100000-atom Si3N4 system using 8 A100 80GB GPUs, and our results are generally consistent with that.
To examine the parallel efficiency with respect to the number of GPUs used, we plot the relative computational speed. Typically, the computation time with 1 GPU is used as the reference, but since that was not available in our case, we use half the computation time for the 21600-atom system with 2 GPUs as the reference. For the 98000-atom system, we use the same reference, assuming that the computation time scales linearly with the number of atoms.
Ideally, the relative computational speed should scale linearly with the number of GPUs used, and the closer it is to this linear scaling, the better the parallel efficiency. For the 21600-atom system, the parallel efficiency gradually decreases as more GPUs are used. However, for the 98000-atom system, the relative speed remains nearly constant from 6 to 8 GPUs, suggesting that with larger atom counts, multiple GPUs can be utilized more efficiently.
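The reference construction described above can be made concrete. In the sketch below, the timings come from the table; the 1-GPU reference is hypothetical (twice the 2-GPU time for the 21600-atom system, scaled linearly by atom count for the larger system, as in the text), and the efficiency helper simply divides relative speed by GPU count.

```python
# 100-step wall times (s) from the benchmark table.
times_small = {2: 74, 3: 51, 4: 40, 5: 34, 6: 28, 7: 25, 8: 23}   # 21600 atoms
times_large = {5: 204, 6: 115, 7: 101, 8: 87}                      # 98000 atoms

# Hypothetical 1-GPU reference: twice the 2-GPU time for 21600 atoms,
# scaled linearly by atom count for the 98000-atom system.
ref_small = 2 * times_small[2]                 # 148 s
ref_large = ref_small * 98000 / 21600          # ~671 s

def relative_speed(times: dict, ref: float) -> dict:
    """Speedup of each GPU count relative to the (hypothetical) 1-GPU reference."""
    return {g: ref / t for g, t in times.items()}

def efficiency(times: dict, ref: float) -> dict:
    """Fraction of ideal linear scaling achieved at each GPU count."""
    return {g: (ref / t) / g for g, t in times.items()}

print({g: round(s, 2) for g, s in relative_speed(times_small, ref_small).items()})
# → {2: 2.0, 3: 2.9, 4: 3.7, 5: 4.35, 6: 5.29, 7: 5.92, 8: 6.43}
```

The declining efficiency values for the 21600-atom system (1.0 at 2 GPUs down to about 0.8 at 8 GPUs) quantify the gradual loss of parallel efficiency discussed above.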
Additionally, we observed a lower computational speed when using 5 GPUs. Initially, we suspected that an imbalance in the atom distribution among GPUs might be causing inefficiency, but upon verification the atoms were evenly distributed. The slowdown is instead likely due to factors such as the shape of the spatial domain decomposition, which may have increased the communication overhead between GPUs.
Overall, these results demonstrate that molecular dynamics simulations with SevenNet can be significantly accelerated by utilizing parallel execution across multiple GPUs.
Related Pages#
- Integrated nanomaterial analysis GUI Advance/NanoLabo
- Analysis field: Nano & Bio
- Industrial field: Materials & Chemistry
- Advance/NanoLabo Product Information
- Advance/NanoLabo Documentation
References#
1. Y. Park, J. Kim, S. Hwang, and S. Han, "Scalable Parallel Algorithm for Graph Neural Network Interatomic Potentials in Molecular Dynamics Simulations", arXiv:2402.03789 (2024). (https://arxiv.org/abs/2402.03789)
2. S. Batzner, A. Musaelian, L. Sun, M. Geiger, J. P. Mailoa, M. Kornbluth, N. Molinari, T. E. Smidt, and B. Kozinsky, "E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials", Nat. Commun. 13, 2453 (2022). (https://www.nature.com/articles/s41467-022-29939-5)
3. The simulated time depends on the time step, so a direct comparison is not possible.