Benchmark of Molecular Dynamics Simulations with the Multi-GPU Enabled General-purpose GNN Force Field SevenNet#
SevenNet (Scalable EquiVariance Enabled Neural Network)[1][2] is a general-purpose force field that uses graph neural networks (GNNs) to compute interatomic interactions and perform molecular dynamics simulations for systems of diverse composition. A notable feature of SevenNet among GNN force fields is its ability to run in parallel at high speed across multiple GPUs by utilizing CUDA-aware OpenMPI.
SevenNet is publicly available as an implementation for the LAMMPS molecular dynamics package. In addition, calculations using SevenNet can be performed through a graphical user interface with the integrated nanomaterial analysis GUI Advance/NanoLabo.
In this case study, we benchmarked large-scale molecular dynamics simulations of the sulfide-type lithium ion conductor Li10GeP2S12 by performing parallel execution using multiple GPUs.
Computational Environment#
The specifications of the computational environment used in this case study are as follows:
- CPU: Intel Xeon Gold 6338 (32 cores) × 2
- GPU: NVIDIA A100 (VRAM 40GB) × 8
- Memory: 1024GB
- CUDA: 12.2
The computational environment was created with the cooperation of Fujitsu Limited, utilizing the "Fujitsu Cloud Service HPC" from Fujitsu Computing as a Service (CaaS).
Computational Conditions#
Starting from the structure file for Li10GeP2S12 obtained from the Materials Project (mp-696128), we generated supercells to prepare large-scale systems of 21600 and 98000 atoms. For these systems, we performed molecular dynamics simulations for 100 steps using the pre-trained parallel model "SevenNet-0", with an NVT ensemble (500 K) and a time step of 0.5 fs. The same calculation was repeated while varying the number of GPUs from 1 to 8, and the computation times were measured. The number of MPI processes per GPU was set to 1, with 1 thread per process.
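For reference, a minimal LAMMPS input for such a run might look as follows. This is a sketch under stated assumptions, not the exact script used in this benchmark: the `pair_style e3gnn/parallel` keyword follows the SevenNet LAMMPS patch, but the number of deployed model segments, their file names (`deployed_parallel_*.pt`), and the data file name are placeholders that should be taken from the SevenNet documentation and your own setup.

```
# Sketch of a LAMMPS input for parallel SevenNet MD (NVT, 500 K, 0.5 fs time step).
# Model segment count and file names below are placeholders - see the SevenNet docs.
units           metal
atom_style      atomic
boundary        p p p
read_data       lgps_supercell.data        # supercell built from mp-696128

pair_style      e3gnn/parallel
pair_coeff      * * 5 deployed_parallel_0.pt deployed_parallel_1.pt deployed_parallel_2.pt deployed_parallel_3.pt deployed_parallel_4.pt Li Ge P S

timestep        0.0005                     # metal units use ps, so 0.0005 ps = 0.5 fs
velocity        all create 500.0 12345
fix             1 all nvt temp 500.0 500.0 0.05
run             100
```

The parallel pair style assigns one spatial domain per MPI rank, so the run is launched with one rank per GPU, e.g. `mpirun -np 8 lmp -in in.lgps`.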
Benchmark Results#
The table below shows the computation times for 100 MD steps at each GPU count, together with the simulated time achievable in one day of wall-clock time (ns/day) under the same conditions. For the 21600-atom system, an out-of-memory (OOM) error occurred with 1 GPU; for the 98000-atom system, OOM errors occurred with 4 or fewer GPUs. Multiple GPUs therefore appear to be essential for systems of this scale.
| GPUs | Time for 100 steps, 21600 atoms (s) | Time for 100 steps, 98000 atoms (s) | Simulated time, 21600 atoms (ns/day) | Simulated time, 98000 atoms (ns/day) |
|---|---|---|---|---|
| 1 | OOM | OOM | OOM | OOM |
| 2 | 74 | OOM | 0.060 | OOM |
| 3 | 51 | OOM | 0.088 | OOM |
| 4 | 40 | OOM | 0.112 | OOM |
| 5 | 34 | 204 | 0.132 | 0.022 |
| 6 | 28 | 115 | 0.163 | 0.039 |
| 7 | 25 | 101 | 0.184 | 0.044 |
| 8 | 23 | 87 | 0.204 | 0.052 |
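The ns/day column follows from the 100-step wall times and the 0.5 fs time step; the short script below shows the conversion. Values computed this way agree with the table only to within roughly 10%, suggesting the table's ns/day figures were measured separately rather than derived from the 100-step timings.

```python
SECONDS_PER_DAY = 86400.0

def ns_per_day(wall_seconds: float, n_steps: int = 100, timestep_fs: float = 0.5) -> float:
    """Simulated time (ns) reachable in one day of wall-clock time."""
    simulated_fs = n_steps * timestep_fs                       # fs simulated in the measured run
    fs_per_day = simulated_fs / wall_seconds * SECONDS_PER_DAY
    return fs_per_day / 1.0e6                                  # fs -> ns

# 2-GPU, 21600-atom run: 74 s for 100 steps
print(round(ns_per_day(74.0), 3))  # → 0.058 (table reports 0.060)
```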
Note that the SevenNet paper[1] reported a computational speed of approximately 0.1 ns/day[3] for a roughly 100000-atom Si3N4 system using 8 A100 80GB GPUs, and our results are generally consistent with that.
To examine the parallel efficiency with respect to the number of GPUs used, we plot the relative computational speed. Typically, the computation time with 1 GPU is used as the reference, but since that was not available in our case, we use half the computation time for the 21600-atom system with 2 GPUs as the reference. For the 98000-atom system, we use the same reference, assuming that the computation time scales linearly with the number of atoms.
Ideally, the relative computational speed should scale linearly with the number of GPUs used, and the closer it is to this linear scaling, the better the parallel efficiency. For the 21600-atom system, the parallel efficiency gradually decreases as more GPUs are used. However, for the 98000-atom system, the relative speed remains nearly constant from 6 to 8 GPUs, suggesting that with larger atom counts, multiple GPUs can be utilized more efficiently.
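The reference construction described above can be made concrete. In the sketch below, the timings come from the table; the 1-GPU reference is hypothetical (twice the 2-GPU time for the 21600-atom system, scaled linearly by atom count for the larger system, as in the text), and the efficiency helper simply divides relative speed by GPU count.

```python
# 100-step wall times (s) from the benchmark table.
times_small = {2: 74, 3: 51, 4: 40, 5: 34, 6: 28, 7: 25, 8: 23}   # 21600 atoms
times_large = {5: 204, 6: 115, 7: 101, 8: 87}                      # 98000 atoms

# Hypothetical 1-GPU reference: twice the 2-GPU time for 21600 atoms,
# scaled linearly by atom count for the 98000-atom system.
ref_small = 2 * times_small[2]                 # 148 s
ref_large = ref_small * 98000 / 21600          # ~671 s

def relative_speed(times: dict, ref: float) -> dict:
    """Speedup of each GPU count relative to the (hypothetical) 1-GPU reference."""
    return {g: ref / t for g, t in times.items()}

def efficiency(times: dict, ref: float) -> dict:
    """Fraction of ideal linear scaling achieved at each GPU count."""
    return {g: (ref / t) / g for g, t in times.items()}

print({g: round(s, 2) for g, s in relative_speed(times_small, ref_small).items()})
# → {2: 2.0, 3: 2.9, 4: 3.7, 5: 4.35, 6: 5.29, 7: 5.92, 8: 6.43}
```

The declining efficiency values for the 21600-atom system (1.0 at 2 GPUs down to about 0.8 at 8 GPUs) quantify the gradual loss of parallel efficiency discussed above.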
Additionally, we observed a lower computational speed when using 5 GPUs. Initially, we suspected that an imbalance in the atom distribution among GPUs might be causing inefficiency, but upon verification the atoms were evenly distributed. The slowdown is instead likely due to factors such as the shape of the spatial domain decomposition, which may have increased the communication overhead between GPUs.
Overall, these results demonstrate that molecular dynamics simulations with SevenNet can be significantly accelerated by utilizing parallel execution across multiple GPUs.
Related Pages#
- Integrated nanomaterial analysis GUI Advance/NanoLabo
- Analysis field: Nano & Bio
- Industrial field: Materials & Chemistry
- Advance/NanoLabo Product Information
- Advance/NanoLabo Documentation
References#
1. Y. Park, J. Kim, S. Hwang, and S. Han, "Scalable Parallel Algorithm for Graph Neural Network Interatomic Potentials in Molecular Dynamics Simulations", arXiv:2402.03789 (2024). (https://arxiv.org/abs/2402.03789)
2. S. Batzner, A. Musaelian, L. Sun, M. Geiger, J. P. Mailoa, M. Kornbluth, N. Molinari, T. E. Smidt, and B. Kozinsky, "E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials", Nat. Commun. 13, 2453 (2022). (https://www.nature.com/articles/s41467-022-29939-5)
3. The simulated time depends on the time step, so a direct comparison is not possible.