[Advance/NeuralMD Pro] Benchmarks of multi-GPU environments on Mat3ra

Mat3ra

Mat3ra is a cloud environment provided by Exabyte.io that offers computational resources specialized for simulations such as first-principles and molecular dynamics calculations. Because both Amazon AWS and Microsoft Azure are available on Mat3ra, we benchmarked GPU-accelerated Advance/NeuralMD on both clouds. The computational resources were arranged in cooperation with ITOCHU Techno-Solutions Corporation (CTC), a Mat3ra agent.

Benchmark of Neural Network Potential

Using a Neural Network Potential (NNP) constructed with Advance/NeuralMD, we benchmarked Molecular Dynamics (MD) calculations with LAMMPS. LAMMPS was compiled with GCC 11.2.0, OpenBLAS, OpenMPI 4.1.1, and CUDA 11.5.
The calculated system is a supercell model of the sulfide-type lithium-ion conductor Li₁₀GeP₂S₁₂ (the figure below). It contains 21,600 atoms, which is a relatively large system for NNP calculations. The conditions of the NNP and of the MD calculation are listed in the table below. Calculation times were measured for a 100-step MD run.

Calculation Condition	Set Value
Symmetry Function	Chebyshev Polynomials
Radial Components of Symmetry Function	50
Angular Components of Symmetry Function	30
Cut-off Radius	6.0 Å
Δ-NNP Method	Enabled
NN Structure	2 layers × 40 nodes (twisted tanh)
Ensemble	NVT (T = 500 K)
Time Step	0.5 fs
MD Steps	100
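The table names the descriptor and the network but not their exact formulas. The sketch below shows one common way an NNP of this kind turns neighbour distances into a per-atom energy. It assumes the radial Chebyshev descriptor of Artrith, Urban and Ceder (each component sums T_k(2r/R_c − 1) weighted by a cosine cut-off over neighbours within R_c) and a twisted tanh of the form tanh(x) + a·x. The twist slope `TWIST`, the random weights, and every function name are hypothetical; the 30 angular components and the Δ-NNP baseline term are omitted, so this is an illustration, not Advance/NeuralMD's implementation.

```python
import numpy as np
from numpy.polynomial import chebyshev

R_CUT = 6.0      # cut-off radius in angstrom (table value)
N_RADIAL = 50    # number of radial components (table value)
TWIST = 0.1      # hypothetical slope of the linear "twist" term

def cutoff(r, r_cut=R_CUT):
    """Cosine cut-off: 1 at r = 0, decaying smoothly to 0 at r = r_cut."""
    return np.where(r < r_cut, 0.5 * (np.cos(np.pi * r / r_cut) + 1.0), 0.0)

def radial_descriptor(distances, n_terms=N_RADIAL, r_cut=R_CUT):
    """G_k = sum_j T_k(2 r_j / r_cut - 1) * f_c(r_j), for k = 0 .. n_terms - 1."""
    r = np.asarray(distances, dtype=float)
    r = r[r < r_cut]                           # keep neighbours inside the cut-off
    x = 2.0 * r / r_cut - 1.0                  # map [0, r_cut) onto [-1, 1)
    t = chebyshev.chebvander(x, n_terms - 1)   # shape (n_neighbours, n_terms)
    return (t * cutoff(r)[:, None]).sum(axis=0)

def twisted_tanh(x, a=TWIST):
    """tanh plus a small linear term, so the gradient never drops below a."""
    return np.tanh(x) + a * x

def init_network(n_in, hidden=(40, 40), seed=0):
    """Random weights for n_in -> 40 -> 40 -> 1 (2 layers x 40 nodes)."""
    rng = np.random.default_rng(seed)
    sizes = [n_in, *hidden, 1]
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def atomic_energy(params, g):
    """Forward pass: twisted tanh on the hidden layers, linear output node."""
    h = g
    for w, b in params[:-1]:
        h = twisted_tanh(h @ w + b)
    w, b = params[-1]
    return float((h @ w + b)[0])

# Usage with made-up neighbour distances (angstrom) of one atom.
g = radial_descriptor([2.4, 3.1, 3.9, 5.5, 6.5])
params = init_network(g.size)
print(g.shape, atomic_energy(params, g))
```

In a full NNP the total energy is the sum of such atomic energies over all 21,600 atoms, which is why the descriptor and network evaluations are the natural work to offload to GPUs.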

Results of Benchmark on Amazon AWS

The calculation conditions and times on AWS are shown in the table below. Five cases were run: CPU only, and 1, 2, 4, and 8 GPU devices. Four MPI processes were launched per GPU device.

	CPU only	GPU x 1	GPU x 2	GPU x 4	GPU x 8
Job Queue	OFplus	GOF	G4OF	G4OF	G8OF
CPU Type	Intel Xeon Platinum (72 cores)	Intel Xeon E5-2686 v4	Intel Xeon E5-2686 v4	Intel Xeon E5-2686 v4	Intel Xeon E5-2686 v4
GPU Type	-	NVIDIA V100	NVIDIA V100	NVIDIA V100	NVIDIA V100
MPI Processes	72	4	8	16	32
OpenMP Threads	1	2	2	2	2
GPU Devices	-	1	2	4	8
Calculation Time (s)	150.8	21.6	11.2	6.9	3.9
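The MPI Processes row follows from the number of GPUs multiplied by the processes per GPU. A minimal sketch of that arithmetic, assuming a single node and a block assignment in which consecutive ranks share one device (the actual rank-to-GPU binding used in these runs is not stated); `process_layout` is a hypothetical helper:

```python
def process_layout(n_gpus, ranks_per_gpu, omp_threads=2):
    """Rank and thread counts for one node, with ranks mapped to GPUs in blocks."""
    n_ranks = n_gpus * ranks_per_gpu
    return {
        "mpi_processes": n_ranks,
        "cpu_threads": n_ranks * omp_threads,  # MPI processes x OpenMP threads
        # Assumed block mapping: ranks 0..k-1 -> GPU 0, ranks k..2k-1 -> GPU 1, ...
        "device_of_rank": [rank // ranks_per_gpu for rank in range(n_ranks)],
    }

# AWS runs: 4 MPI processes per GPU, 2 OpenMP threads each.
for n in (1, 2, 4, 8):
    layout = process_layout(n, ranks_per_gpu=4)
    print(n, layout["mpi_processes"], layout["cpu_threads"])
```

With `ranks_per_gpu=3` the same helper reproduces the Azure process counts of 3, 6, and 12.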

The figure below shows the calculation speed relative to the CPU-only case (CPU only = 1).
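Assuming the figure plots the CPU-only time divided by each GPU time, the relative speeds can be recomputed from the AWS table, together with the strong-scaling efficiency with respect to a single GPU. The names here are illustrative:

```python
CPU_TIME = 150.8                                  # s, CPU only (72 cores)
GPU_TIMES = {1: 21.6, 2: 11.2, 4: 6.9, 8: 3.9}    # s, keyed by number of V100s

def relative_speed(cpu_time, gpu_times):
    """Speed relative to the CPU-only run: t_CPU / t_GPU."""
    return {n: cpu_time / t for n, t in gpu_times.items()}

def parallel_efficiency(gpu_times):
    """Strong-scaling efficiency vs. one GPU: t_1 / (n * t_n)."""
    t1 = gpu_times[1]
    return {n: t1 / (n * t) for n, t in gpu_times.items()}

speed = relative_speed(CPU_TIME, GPU_TIMES)
eff = parallel_efficiency(GPU_TIMES)
for n in GPU_TIMES:
    print(f"GPU x {n}: speed {speed[n]:5.1f}x, efficiency {eff[n]:.0%}")
```

For these numbers one V100 runs about 7.0× faster than the 72-core CPU node and eight run about 38.7× faster, while the efficiency relative to one GPU falls from 96% on two GPUs to 69% on eight.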

Benchmark on Microsoft Azure

The calculation conditions and times on Azure are shown in the table below. Four cases were run: CPU only, and 1, 2, and 4 GPU devices. Three MPI processes were launched per GPU device.

	CPU only	GPU x 1	GPU x 2	GPU x 4
Job Queue	OFplus	GPOF	GP2OF	GP4OF
CPU Type	Intel Xeon Platinum 8168 (44 cores)	Intel Xeon E5-2690 v4	Intel Xeon E5-2690 v4	Intel Xeon E5-2690 v4
GPU Type	-	NVIDIA P100	NVIDIA P100	NVIDIA P100
MPI Processes	44	3	6	12
OpenMP Threads	1	2	2	2
GPU Devices	-	1	2	4
Calculation Time (s)	195.4	30.5	16.4	8.5
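To compare the two clouds in a unit common in MD, the wall times can be converted to simulated nanoseconds per day. This sketch assumes each measured time covers exactly the 100 steps of 0.5 fs (50 fs in total); if the measurement also includes setup, the results understate the steady-state throughput. Function and variable names are hypothetical:

```python
TIME_STEP_FS = 0.5       # fs, from the calculation-condition table
N_STEPS = 100            # MD steps per benchmark run
SECONDS_PER_DAY = 86400

def ns_per_day(wall_seconds, n_steps=N_STEPS, dt_fs=TIME_STEP_FS):
    """Simulated time per day of wall time, assuming the timing covers only the MD steps."""
    simulated_ns = n_steps * dt_fs * 1e-6   # 1 fs = 1e-6 ns
    return simulated_ns * SECONDS_PER_DAY / wall_seconds

results = {  # wall times in seconds, copied from the two tables
    "AWS (V100)":   {"CPU": 150.8, "GPU x 1": 21.6, "GPU x 2": 11.2, "GPU x 4": 6.9, "GPU x 8": 3.9},
    "Azure (P100)": {"CPU": 195.4, "GPU x 1": 30.5, "GPU x 2": 16.4, "GPU x 4": 8.5},
}

for cloud, times in results.items():
    for case, t in times.items():
        # speed relative to the CPU-only run on the same cloud, then throughput
        print(f"{cloud:13s} {case:8s} {times['CPU'] / t:5.1f}x  {ns_per_day(t):.3f} ns/day")
```

Under that assumption eight V100s on AWS reach about 1.1 ns/day and four P100s on Azure about 0.51 ns/day.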