Advance/NeuralMD Documentation
[Advance/NeuralMD Pro] Benchmarks on a machine with 8 NVIDIA H200 GPUs
We benchmarked molecular dynamics calculations in LAMMPS with a neural network potential on a machine equipped with 8 NVIDIA H200 GPUs.
The target systems were the same as in previous benchmarks: 21,600-atom and 98,000-atom supercell models of the sulfide lithium-ion conductor Li10GeP2S12.
Computational Environment and MD Calculation Conditions
The specifications of the computer used in this case study are shown below.
- CPU: Intel Xeon Platinum 8480+ (56 cores) ×2
- GPU: NVIDIA H200 ×8
- CUDA: 12.4
The computational environment was created using the GPU cloud service "GPUSOROBAN" with the cooperation of HIGHRESO Co., Ltd.
Using LAMMPS 2Aug2023 (AdvanceSoft-modified version, bundled with Advance/NanoLabo Tool), we ran molecular dynamics calculations for 21,600-atom and 98,000-atom systems of Li10GeP2S12 with a force field created by NeuralMD. The pre-trained force field files, including the one used in this study, are available in the Force Field Database.
We ran 100 MD steps in the NVT ensemble at 500 K with a time step of 0.5 fs, which corresponds to 50 fs of simulated time. From the measured loop time, we then estimated how much simulated time an equivalent calculation would cover per day (ns/day), which is the reciprocal of the number of days required to simulate 1 ns.
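The extrapolation from a short run to ns/day is simple arithmetic, and a minimal sketch may make it easier to reproduce. The function names below are hypothetical and not part of NeuralMD or LAMMPS; the defaults (100 steps, 0.5 fs) are the conditions stated above, and the sample loop time of 6.78 s is the CPU-only value for the 21,600-atom system reported in the results.

```python
FS_PER_NS = 1.0e6        # femtoseconds in one nanosecond
SECONDS_PER_DAY = 86_400


def ns_per_day(loop_time_s, n_steps=100, timestep_fs=0.5):
    """Simulated nanoseconds per wall-clock day, extrapolated from a short run.

    loop_time_s is the wall-clock time (in seconds) that the MD loop took
    to advance n_steps steps of timestep_fs femtoseconds each.
    """
    simulated_ns = n_steps * timestep_fs / FS_PER_NS
    return simulated_ns * SECONDS_PER_DAY / loop_time_s


def days_per_ns(loop_time_s, n_steps=100, timestep_fs=0.5):
    """Wall-clock days needed to simulate 1 ns (reciprocal of ns/day)."""
    return 1.0 / ns_per_day(loop_time_s, n_steps, timestep_fs)


if __name__ == "__main__":
    # Sample value: CPU-only loop time for the 21,600-atom system.
    print(f"{ns_per_day(6.78):.2f} ns/day")
    print(f"{days_per_ns(6.78):.2f} days per ns")
```

This assumes that the per-step cost measured over 100 steps stays constant over a longer run, so it ignores start-up overhead and any drift in neighbor-list or I/O cost.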
Benchmark Results
The calculation conditions and results are shown in the table below. Calculations were performed under five conditions: CPU only, and with 1, 2, 4, and 8 GPU devices. For the GPU runs, 4 MPI processes were assigned per GPU device.
| | CPU | GPU×1 | GPU×2 | GPU×4 | GPU×8 |
|---|---|---|---|---|---|
| Number of MPI processes | 56 | 4 | 8 | 16 | 32 |
| Number of OpenMP threads | 1 | 2 | 2 | 2 | 2 |
| Number of GPU devices | 0 | 1 | 2 | 4 | 8 |
| Loop time (s), 21,600 atoms | 6.78 | 3.34 | 1.86 | 1.03 | 0.72 |
| Loop time (s), 98,000 atoms | 27.92 | 15.07 | 7.65 | 4.11 | 2.43 |
| ns/day, 21,600 atoms | 0.64 | 1.29 | 2.32 | 4.20 | 6.04 |
| ns/day, 98,000 atoms | 0.16 | 0.29 | 0.57 | 1.05 | 1.78 |
The figure below shows the calculation speed relative to the CPU-only calculation, whose speed is set to 1. Based on the loop times in the table, the speedup was about 2 times with 1 GPU device, and about 9.4 times (21,600 atoms) to 11.5 times (98,000 atoms) with 8 devices.
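The relative speeds can be recomputed directly from the loop times in the table. The sketch below does that and also reports a strong-scaling efficiency relative to the 1-GPU run, defined here as T1 / (n × Tn). That efficiency metric is an addition for illustration and is not a figure reported in the benchmark; the function and variable names are hypothetical.

```python
# Loop times in seconds for 100 MD steps, copied from the results table.
# Keys are "CPU" for the CPU-only run, or the number of GPU devices.
LOOP_TIME_S = {
    21_600: {"CPU": 6.78, 1: 3.34, 2: 1.86, 4: 1.03, 8: 0.72},
    98_000: {"CPU": 27.92, 1: 15.07, 2: 7.65, 4: 4.11, 8: 2.43},
}


def speedup_vs_cpu(times):
    """Speed of each GPU run relative to the CPU-only run (CPU = 1)."""
    cpu = times["CPU"]
    return {n: cpu / t for n, t in times.items() if n != "CPU"}


def gpu_scaling_efficiency(times):
    """Strong-scaling efficiency relative to the 1-GPU run: T1 / (n * Tn)."""
    t1 = times[1]
    return {n: t1 / (n * t) for n, t in times.items() if n != "CPU"}


if __name__ == "__main__":
    for n_atoms, times in LOOP_TIME_S.items():
        speedup = speedup_vs_cpu(times)
        efficiency = gpu_scaling_efficiency(times)
        print(f"{n_atoms} atoms")
        for n_gpu in sorted(speedup):
            print(
                f"  GPU x{n_gpu}: speedup {speedup[n_gpu]:.2f}, "
                f"efficiency {efficiency[n_gpu]:.2f}"
            )
```

With these inputs, the 8-GPU efficiency comes out at roughly 0.58 for the 21,600-atom system and 0.78 for the 98,000-atom system, so the larger system makes better use of the additional devices.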
Comparison with A100
For comparison, we also ran the benchmark on a single NVIDIA A100 80GB device and calculated its relative speed on the same basis.
A single A100 achieves a speed comparable to the 56-core CPU calculation, whereas a single H200 is about twice as fast, which shows that NeuralMD can take advantage of the newer GPU generation. Systems equipped with multiple new-generation GPUs are therefore well suited to molecular dynamics calculations with NeuralMD and make larger and more complex simulations practical.