Linear Atomic Cluster Expansion Force Fields for Organic Molecules: Beyond RMSE

We demonstrate that fast and accurate linear force fields can be built for molecules using the atomic cluster expansion (ACE) framework. The ACE models parametrize the potential energy surface in terms of body-ordered symmetric polynomials, making the functional form reminiscent of traditional molecular mechanics force fields. We show that four- or five-body ACE force fields improve on the accuracy of empirical force fields by up to a factor of 10, reaching the accuracy typical of recently proposed machine-learning-based approaches. We not only show state-of-the-art accuracy and speed on the widely used MD17 and ISO17 benchmark data sets, but also go beyond RMSE by comparing a number of ML and empirical force fields to ACE on more important tasks such as normal-mode prediction, high-temperature molecular dynamics, dihedral torsional profile prediction, and even bond breaking. We also demonstrate the smoothness, transferability, and extrapolation capabilities of ACE on a new, challenging benchmark data set consisting of the potential energy surface of a flexible druglike molecule.


MD17
1.1 Details of the fits

1.1.1 ACE fits
The ACE fits used a 0.77 Å inner cutoff and a 4.4 Å outer cutoff for the many-body part, and a 5.5 Å outer cutoff for the longer-range pair potential fitted together with the ACE. The only exception was naphthalene, where the outer cutoffs were doubled to account for the longer-range effects of the extended conjugated system. When fitting to both energies and forces, the loss function used a weight of 30 on the energies and 1 on the forces, in both the isolated-atom and the average-energy one-body cases.
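The energy and force weights can be read as row scalings of a single stacked least-squares system. The sketch below assumes that each weight multiplies its residual (so the squared loss carries 30² = 900 on the energies), and it uses random, hypothetical design blocks A_E and A_F in place of actual ACE basis evaluations. It checks that solving the stacked system reproduces the weighted normal equations.

```python
import numpy as np

def weighted_system(A_E, E, A_F, F, w_E=30.0, w_F=1.0):
    """Stack energy and force rows into one least-squares system.

    Assumption: each weight multiplies its residual, so the loss is
    sum((w_E * (A_E c - E))**2) + sum((w_F * (A_F c - F))**2).
    """
    A = np.vstack([w_E * A_E, w_F * A_F])
    y = np.concatenate([w_E * E, w_F * F])
    return A, y

# Hypothetical toy data: 20 configurations, 9 atoms each, 15 basis functions.
rng = np.random.default_rng(1)
n_cfg, n_atoms, n_basis = 20, 9, 15
A_E = rng.normal(size=(n_cfg, n_basis))                # one energy row per configuration
A_F = rng.normal(size=(3 * n_atoms * n_cfg, n_basis))  # one force row per Cartesian component
c_true = rng.normal(size=n_basis)
E = A_E @ c_true + rng.normal(scale=0.01, size=n_cfg)  # noisy synthetic targets
F = A_F @ c_true + rng.normal(scale=0.01, size=A_F.shape[0])

A, y = weighted_system(A_E, E, A_F, F)
c_fit, *_ = np.linalg.lstsq(A, y, rcond=None)

# Same solution from the weighted normal equations.
lhs = 30.0**2 * A_E.T @ A_E + 1.0**2 * A_F.T @ A_F
rhs = 30.0**2 * A_E.T @ E + 1.0**2 * A_F.T @ F
c_normal = np.linalg.solve(lhs, rhs)
```

Scaling rows rather than the loss terms keeps the problem in the standard form expected by generic least-squares solvers.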
The regularized linear least-squares problem was solved either by rank-revealing QR (RRQR) factorisation or by the iterative LSQR algorithm. To use the LSQR algorithm, we rewrite eq (20) in the scaled coordinates Γc by rescaling the design matrix accordingly, i.e., right-multiplying it by Γ⁻¹, so that the regularizer becomes a standard ridge penalty on the new coordinates. Writing the problem in this form allowed us to use the standard implementation of the algorithm in the IterativeSolvers.jl package.
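The rescaling step can be sketched in Python with SciPy's lsqr, whose damp argument minimizes ||Ax - b||² + damp²||x||². This is a minimal sketch, not the Julia code used for the fits: it assumes the regularizer in eq (20) has the form λ||Γc||² with a diagonal Γ, and the matrix, targets and Γ entries are random, hypothetical data. The result is compared against a direct solve of the regularized normal equations.

```python
import numpy as np
from scipy.sparse.linalg import lsqr

def ridge_via_lsqr(A, y, gamma, lam):
    """Solve min_c ||A c - y||^2 + lam * ||Gamma c||^2 for a diagonal Gamma.

    In the scaled coordinates c_tilde = Gamma c this is a standard ridge
    problem with design matrix A Gamma^{-1}, which LSQR handles through
    damp = sqrt(lam).
    """
    A_scaled = A / gamma  # right-multiply by Gamma^{-1}: column j divided by gamma[j]
    c_tilde = lsqr(A_scaled, y, damp=np.sqrt(lam),
                   atol=1e-12, btol=1e-12, iter_lim=10_000)[0]
    return c_tilde / gamma  # map back to the original coordinates: c = Gamma^{-1} c_tilde

# Hypothetical toy problem.
rng = np.random.default_rng(0)
A = rng.normal(size=(200, 30))
y = rng.normal(size=200)
gamma = np.linspace(1.0, 5.0, 30)  # hypothetical diagonal of Gamma
lam = 1e-2

c_lsqr = ridge_via_lsqr(A, y, gamma, lam)
# Reference: normal equations (A^T A + lam * Gamma^2) c = A^T y.
c_direct = np.linalg.solve(A.T @ A + lam * np.diag(gamma**2), A.T @ y)
```

The same change of variables works for any invertible Γ; the diagonal case only makes the product AΓ⁻¹ a cheap column scaling.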
The exact parameters used for each of the MD17 fits are shown in Table S1, where λ denotes the weight on the ridge penalty for LSQR or the tolerance parameter for RRQR.
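For the RRQR route, one common way to turn a tolerance into a truncated least-squares solution is to drop the trailing columns of a column-pivoted QR factorization whose diagonal entries fall below tol times the largest one. The sketch below assumes this relative-tolerance convention, which may differ from how the tolerance in Table S1 was applied by the solver actually used; the data are random and hypothetical.

```python
import numpy as np
from scipy.linalg import qr, solve_triangular

def rrqr_lstsq(A, y, tol):
    """Least squares via column-pivoted QR, truncated at relative tolerance tol."""
    Q, R, perm = qr(A, mode="economic", pivoting=True)  # A[:, perm] = Q @ R
    diag = np.abs(np.diag(R))                            # non-increasing in magnitude due to pivoting
    rank = int(np.sum(diag > tol * diag[0]))             # numerical rank
    z = solve_triangular(R[:rank, :rank], Q[:, :rank].T @ y)
    c = np.zeros(A.shape[1])
    c[perm[:rank]] = z                                   # discarded columns get zero coefficients
    return c, rank

rng = np.random.default_rng(2)
A = rng.normal(size=(100, 8))
y = rng.normal(size=100)
c_full, rank_full = rrqr_lstsq(A, y, tol=1e-10)

A_dup = np.column_stack([A, A[:, 0]])  # add an exactly redundant column
c_dup, rank_dup = rrqr_lstsq(A_dup, y, tol=1e-10)
```

With a redundant column the truncation detects the rank deficiency and still returns a solution with the same fitted values as the full-rank problem.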

ANI training
The ANI models were trained using the TorchANI framework.¹ For training, we followed the tutorial in the documentation, using the default parameters for the cutoffs and for the optimization of the weights. We trained two versions of the potential for each molecule: one with randomly initialized weights, and one pre-trained by starting from the weights of the ANI-2x model. A comparison of the mean absolute errors is shown in Table S2. The pre-trained model achieves much lower errors in every case. Only the pre-trained ANI was included in the comparison table in the main manuscript. It is important to note, though, that a comparison with the randomly initialized model would be fairer, as all the other models were trained from scratch.

Figure: GPU speed-up for ANI. On a GPU, the time per force call per molecule remains constant as long as the system fits into memory. This scaling is in sharp contrast to the CPU performance, which, on the other hand, could be improved using parallel computing.

sGDML
To fit the sGDML models, we used the sgdml all command-line tool of the sGDML package.² For example, with 925 training and 75 validation configurations:
sgdml all train1_1000.npz 925 75

GAP
To fit the GAP models, we used the gap_fit command-line tool of the GAP package.³ As descriptors, a two-body (2B) term and two SOAP descriptors were used. The 2B descriptor had a cutoff of 6 Å, the short-range SOAP kernel a cutoff of 2.5 Å, and the long-range SOAP kernel a cutoff of 4.5 Å.
For both SOAP kernels we used n_max = 6 and l_max = 12, and selected 750 sparse points. The atom sigma was set to 0.3 and 0.5, and the cutoff transition width to 0.5 and 1.0, for the short- and long-range SOAP, respectively. The zeta parameter was 4 and delta was 0.1 for both.
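The zeta and delta parameters enter the SOAP covariance used by GAP, commonly written as δ²(p̂·p̂′)^ζ for normalized SOAP power-spectrum vectors p̂ and p̂′. The short sketch below evaluates this kernel for the settings above (ζ = 4, δ = 0.1) on hypothetical descriptor vectors; it does not compute SOAP descriptors itself.

```python
import numpy as np

def soap_kernel(p1, p2, zeta=4, delta=0.1):
    """Polynomial SOAP covariance delta^2 * (p1_hat . p2_hat)^zeta."""
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    # Normalize both power spectra so only their "angle" matters.
    cos_sim = np.dot(p1, p2) / (np.linalg.norm(p1) * np.linalg.norm(p2))
    return delta**2 * cos_sim**zeta

v = np.array([1.0, 2.0, 3.0])            # hypothetical descriptor vector
k_same = soap_kernel(v, v)               # identical environments
k_scaled = soap_kernel(v, 2.0 * v)       # normalization removes the scale
k_orth = soap_kernel([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
```

Raising the normalized dot product to ζ = 4 sharpens the kernel, so only closely matching environments give covariances near δ².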

Classical Force Field
The modified GAFF force field was fitted using the ForceBalance program with Amber17.⁴,⁵ Bond, angle and dihedral terms were reparametrized, whilst non-bonded and improper terms remained unchanged. Regularization was not used, as overfitting is unlikely given the size of the dataset and the simplicity of the functional form. Both energies and forces were used in the fitting process, with a 1:1 weighting of forces to energies. The parameters were optimized with the Newton-Raphson algorithm using a search tolerance of 0.001.
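As a toy illustration of fitting torsional parameters to energies and forces with 1:1 weighting by Newton-Raphson, the sketch below fits a GAFF-style series E(φ) = c + Σ_n (V_n/2)(1 + cos nφ), with phases fixed at zero (an assumption), to synthetic energies and generalized torsional forces -dE/dφ. It is not ForceBalance, which uses its own objective normalization and trust-radius control; here the 0.001 tolerance is assumed to act on the step norm. Because this model is linear in its parameters, the objective is quadratic and Newton-Raphson converges in one step; the second iteration only confirms convergence.

```python
import numpy as np

def torsion_design(phi, n_terms=3):
    """Design blocks for E = c + sum_n (V_n/2)(1 + cos(n phi)); columns [c, V_1..V_n]."""
    n = np.arange(1, n_terms + 1)
    # Energy rows: dE/dc = 1, dE/dV_n = (1 + cos(n phi)) / 2.
    A_E = np.column_stack([np.ones_like(phi), 0.5 * (1.0 + np.cos(np.outer(phi, n)))])
    # Torsional force rows: F = -dE/dphi = sum_n (V_n/2) n sin(n phi); independent of c.
    A_F = np.column_stack([np.zeros_like(phi), 0.5 * n * np.sin(np.outer(phi, n))])
    return A_E, A_F

def newton_fit(A_E, A_F, E_ref, F_ref, w_E=1.0, w_F=1.0, tol=1e-3, max_iter=50):
    """Newton-Raphson on the objective sum((A p - y)^2) with 1:1 weights by default."""
    A = np.vstack([w_E * A_E, w_F * A_F])
    y = np.concatenate([w_E * E_ref, w_F * F_ref])
    p = np.zeros(A.shape[1])
    for it in range(max_iter):
        grad = 2.0 * A.T @ (A @ p - y)   # gradient of the squared-error objective
        hess = 2.0 * A.T @ A             # exact Hessian (the model is linear in p)
        step = np.linalg.solve(hess, grad)
        p = p - step
        if np.linalg.norm(step) < tol:   # assumed step-norm convergence criterion
            break
    return p, it + 1

phi = np.linspace(-np.pi, np.pi, 73)
true_params = np.array([0.2, 1.0, -0.5, 0.3])  # hypothetical [c, V_1, V_2, V_3]
A_E, A_F = torsion_design(phi)
E_ref, F_ref = A_E @ true_params, A_F @ true_params
params, n_iter = newton_fit(A_E, A_F, E_ref, F_ref)
```

In a real force field fit the model is nonlinear in some parameters (e.g. equilibrium bond lengths), so several Newton iterations and step control are needed.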

Table S1: ACE fit parameters.

Table S2: Pre-trained and randomly initialized ANI models. Mean absolute errors of the energy (meV) and force (meV/Å) predictions of the pre-trained and randomly initialized ANI models.

Table S3: Mean absolute errors of energies (meV) and forces (meV/Å) of ACE models trained on energies and forces using either the average-energy shift or the isolated-atom energy shift, and of ACE models trained on forces only and then shifted to minimize the training energy error.
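For the forces-only models in Table S3, the energy is determined only up to a constant. Since each MD17 data set contains a single molecule with fixed composition, any one-body shift reduces to one constant per molecule. The sketch assumes the training energy error is measured either as RMSE (optimal shift = mean residual) or as MAE (optimal shift = median residual); the caption does not state which was used, and the data are hypothetical.

```python
import numpy as np

def optimal_energy_shift(E_ref, E_pred, metric="rmse"):
    """Constant added to E_pred that minimizes the chosen training energy error."""
    residual = np.asarray(E_ref, dtype=float) - np.asarray(E_pred, dtype=float)
    if metric == "rmse":
        return residual.mean()       # minimizes the sum of squared errors
    if metric == "mae":
        return np.median(residual)   # minimizes the sum of absolute errors
    raise ValueError(f"unknown metric: {metric}")

E_ref = np.array([1.0, 2.0, 3.0, 4.0])                     # hypothetical reference energies
E_pred = E_ref - 5.0 + np.array([0.1, -0.1, 0.2, -0.2])    # forces-only model, offset by about 5
shift_rmse = optimal_energy_shift(E_ref, E_pred, "rmse")
shift_mae = optimal_energy_shift(E_ref, E_pred, "mae")
```

The mean and median coincide here because the residual noise is symmetric; for skewed residuals the two criteria give different shifts.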