Learning Design Rules for Selective Oxidation Catalysts from High-Throughput Experimentation and Artificial Intelligence

The design of heterogeneous catalysts is challenged by the complexity of materials and processes that govern reactivity and by the fact that the number of good catalysts is very small in comparison to the number of possible materials. Here, we show how the subgroup-discovery (SGD) artificial-intelligence approach can be applied to an experimental plus theoretical data set to identify constraints on key physicochemical parameters, the so-called SG rules, which exclusively describe materials and reaction conditions with outstanding catalytic performance. By using high-throughput experimentation, 120 SiO2-supported catalysts containing ruthenium, tungsten, and phosphorus were synthesized and tested in the catalytic oxidation of propylene. As candidate descriptive parameters, the temperature and 10 parameters related to the composition and chemical nature of the catalyst materials, derived from calculated free-atom properties, were offered. The temperature, the phosphorus content, and the composition-weighted electronegativity are identified as key parameters describing high yields toward the value-added oxygenate products acrolein and acrylic acid. The SG rules not only reflect the underlying processes particularly associated with high performance but also guide the design of more complex catalysts containing up to five elements in their composition.

charge density is less than $10^{-5}$, the change in the sum of the eigenvalues is less than $10^{-3}$, and the change in the total energy is less than $10^{-5}$. All other settings were left at their FHI-aims defaults. $r_s$ and $r_{\mathrm{val}}$ are defined as the radii of the valence s state and of the highest occupied state, respectively. These radii also correspond to the maximum and minimum radii of the filled valence shell of an atom. The highest occupied and lowest unoccupied states are defined based on the partial occupancy of the electronic states. The ionization potential and electron affinity are calculated as energy differences between the neutral and charged systems. The values of the elemental properties for the elements considered in our study are shown in Table S1.

Subgroup-discovery approach
The starting point for the SGD approach is a data set in which each data point corresponds to a different material and/or set of external conditions. For each data point, the value of a target of interest (e.g., a materials property or function) and the values of several candidate descriptive parameters, $\phi_1, \ldots, \phi_n$, are known. The candidate descriptive parameters are physicochemical parameters describing the materials and external conditions that are potentially related to the processes governing the target. From this data set, the SGD approach identifies subsets of data, i.e., subgroups (SGs), that present an outstanding distribution of target values. The distribution of target values in a SG may be outstanding because, for instance, it is narrower (i.e., it has a lower standard deviation) or shifted towards lower or higher values with respect to the whole data set. The function measuring how outstanding a SG is with respect to the whole data set is called the quality function.
The SG search is performed in two steps. First, a number of selectors, i.e., conjunctions of statements about the data, are generated. These conjunctions are Boolean functions of the form
$$\sigma(\phi) \equiv \pi_1(\phi) \wedge \pi_2(\phi) \wedge \ldots \wedge \pi_p(\phi), \qquad \phi = (\phi_1, \ldots, \phi_n),$$
where "∧" denotes the "AND" operator and each statement $\pi_i$, referred to as a proposition, is an inequality constraint on one of the descriptive parameters, for instance $\pi_i(\phi) \equiv \phi_i \geq v_i$ or $\pi_j(\phi) \equiv v_j < \phi_j < v_j'$, for some constants $v_i$ to be determined in the analysis. The selectors describe simple convex regions, $\{\phi \in \Phi : \sigma(\phi) = \mathrm{true}\}$, in the descriptive-parameter space $\Phi$, which define SGs of the data set. To keep the number of candidate $v_i$ values computationally tractable, a finite set of cut-offs is determined using k-means clustering. Second, a search algorithm is employed to identify the SGs, defined by the generated selectors, that maximize the quality function. The outcome of this search is a list of SGs ranked according to their quality-function values.
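The selector construction described above can be sketched in Python as follows. This is a minimal illustration, not the CREEDO/realkd implementation: it assumes that the k-means cut-offs for one parameter are taken as midpoints between adjacent sorted cluster centres, and the helper names `kmeans_1d_cutoffs` and `make_selector` are hypothetical.

```python
import numpy as np

def kmeans_1d_cutoffs(values, k=20, n_iter=100, seed=0):
    """Candidate thresholds v for one descriptive parameter (hypothetical helper).

    Runs Lloyd's k-means on the 1-D values and returns the midpoints between
    adjacent sorted cluster centres (an assumed way to turn centres into cut-offs).
    """
    x = np.asarray(values, dtype=float)
    distinct = np.unique(x)
    k = min(k, distinct.size)
    rng = np.random.default_rng(seed)
    centres = rng.choice(distinct, size=k, replace=False)
    for _ in range(n_iter):
        # Assign each value to its nearest centre, then move centres to cluster means.
        labels = np.argmin(np.abs(x[:, None] - centres[None, :]), axis=1)
        updated = np.array([x[labels == j].mean() if np.any(labels == j) else centres[j]
                            for j in range(k)])
        if np.allclose(updated, centres):
            break
        centres = updated
    centres = np.sort(centres)
    return np.unique((centres[:-1] + centres[1:]) / 2.0)

def make_selector(propositions):
    """Build sigma = pi_1 AND ... AND pi_p from (column, lower, upper) triples.

    A bound of None means unconstrained; lower bounds are inclusive and upper
    bounds strict here (a simplifying convention, not taken from the source).
    """
    def sigma(phi):
        phi = np.atleast_2d(np.asarray(phi, dtype=float))
        mask = np.ones(phi.shape[0], dtype=bool)
        for col, lower, upper in propositions:
            if lower is not None:
                mask &= phi[:, col] >= lower
            if upper is not None:
                mask &= phi[:, col] < upper
        return mask  # True for data points inside the subgroup
    return sigma
```

In the analysis described here, 20 clusters were used per parameter; the search step that ranks the resulting selectors by the quality function is not shown.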
The most relevant SGs are those with the highest quality-function values. The selectors defining each of these SGs depend on the key descriptive parameters associated with the outstanding behavior of each subset of materials. The propositions entering the selectors can thus be seen as rules determining the outstanding SG performance. SGD is a supervised descriptive rule-induction technique, since it uses the labels assigned to the data points, i.e., the target values, to identify patterns in the descriptive-parameter space. Furthermore, SGD is based on the maximization of a function that focuses on specific subselections of the data set. For this reason, it is suitable for detecting exceptional local behavior. This contrasts with artificial-intelligence methods such as decision-tree regression, which are based on minimizing the error across the whole data set and thus provide a description of the global behavior. Although global-modelling approaches are suitable for capturing the general trend, they might fail to detect statistically exceptional, interesting regions of the materials space for which only a few observations are available, because these regions do not significantly impact the optimized loss function, and thus the final model. In our SGD analysis, we use $D_{\mathrm{cJS}}$, the cumulative-distribution-function formulation of the Jensen-Shannon divergence $D_{\mathrm{JS}}$, to quantify, along with the coverage term (see Eq. 3), how outstanding a SG is. $D_{\mathrm{JS}}$ is a measure of dissimilarity between two distributions, e.g., $P$ and $P'$, defined by
$$D_{\mathrm{JS}}(P \,\|\, P') = \tfrac{1}{2} D_{\mathrm{KL}}(P \,\|\, M) + \tfrac{1}{2} D_{\mathrm{KL}}(P' \,\|\, M), \qquad M = \tfrac{1}{2}(P + P'),$$
where $D_{\mathrm{KL}}$ is the Kullback-Leibler divergence. Thus, $D_{\mathrm{JS}}$ is a symmetrized version of $D_{\mathrm{KL}}$, and the same divergence value is obtained irrespective of the order of $P$ and $P'$. For discrete distributions, $D_{\mathrm{KL}}$ is defined by
$$D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{x \in \mathcal{X}} P(x) \log \frac{P(x)}{Q(x)},$$
where $\mathcal{X}$ indicates the probability space.
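For discrete distributions, the Kullback-Leibler and Jensen-Shannon definitions translate directly into code. The following Python sketch assumes the natural logarithm (so the Jensen-Shannon divergence is bounded by ln 2) and probability vectors defined on a common support; it does not implement the cumulative-distribution variant used in the analysis.

```python
import numpy as np

def kl_divergence(p, q):
    """Kullback-Leibler divergence sum_x p(x) log(p(x)/q(x)) for discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    support = p > 0  # terms with p(x) = 0 contribute nothing by convention
    return float(np.sum(p[support] * np.log(p[support] / q[support])))

def js_divergence(p, q):
    """Jensen-Shannon divergence: mean KL divergence of p and q from their mixture m."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)  # m > 0 wherever p > 0 or q > 0, so the logs are finite
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```

Unlike `kl_divergence`, `js_divergence(p, q)` equals `js_divergence(q, p)`, and two distributions with disjoint supports reach the maximum value ln 2.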
$D_{\mathrm{KL}}$ is also called relative entropy, owing to the similarity of its expression with the Shannon entropy of a random variable $X$:
$$H(X) = -\sum_{x \in \mathcal{X}} P(x) \log P(x).$$
To build intuition on how $D_{\mathrm{JS}}$ influences the SGD approach, we evaluated $D_{\mathrm{JS}}$ between a fixed normal distribution $P$ (shown in orange in Fig. S1) and several normal distributions $P'$ (shown in black in Fig. S1) whose mean or width was modified with respect to $P$. In the context of SGD, $P$ can be seen as the distribution of the target over the whole data set, whereas $P'$ is analogous to the target distribution in one of the possible SGs of the data set. The width corresponds to the standard deviation of the distributions. Figure S1 shows that $D_{\mathrm{JS}}$ is zero when $P$ and $P'$ are identical. The more the mean of $P'$ is shifted with respect to that of $P$, the higher $D_{\mathrm{JS}}$ becomes (Fig. S1, horizontal panels). Similarly, the narrower $P'$ is with respect to $P$, the higher $D_{\mathrm{JS}}$ becomes (Fig. S1, vertical panels).
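The trend described for Fig. S1 can be reproduced qualitatively with a few lines of Python. This sketch assumes that the normal distributions are discretized on a uniform grid (the grid range and resolution are arbitrary choices) and uses the natural logarithm; it is not the script used to produce Fig. S1.

```python
import numpy as np

GRID = np.linspace(-10.0, 10.0, 4001)  # assumed common support for all distributions

def discretized_normal(mean, std):
    """Normal distribution evaluated on GRID and normalized to a probability vector."""
    w = np.exp(-0.5 * ((GRID - mean) / std) ** 2)
    return w / w.sum()

def js_divergence(p, q):
    """Jensen-Shannon divergence of two probability vectors (natural log)."""
    m = 0.5 * (p + q)
    def kl(a, b):
        support = a > 0  # skip zero-probability bins (also covers exp underflow)
        return np.sum(a[support] * np.log(a[support] / b[support]))
    return float(0.5 * kl(p, m) + 0.5 * kl(q, m))

p = discretized_normal(0.0, 1.0)  # fixed reference, analogous to the whole data set
# P' shifted: same width, increasing mean offset.
shifted = [js_divergence(p, discretized_normal(mu, 1.0)) for mu in (0.0, 0.5, 1.0, 2.0)]
# P' narrowed: same mean, decreasing standard deviation.
narrowed = [js_divergence(p, discretized_normal(0.0, s)) for s in (1.0, 0.5, 0.25)]
```

Both lists start at zero (identical distributions) and increase monotonically, mirroring the horizontal and vertical panels of Fig. S1.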
Because the quality function is maximized during the SG search, the more shifted and narrow the target distribution of a SG is, the more outstanding the SG is considered. When most of the materials in the data set have poor performance, the use of $D_{\mathrm{JS}}$ in the quality function favors the identification of SGs containing the few materials with high performance, i.e., "the needles in the haystack". Finally, $D_{\mathrm{cJS}}$ is used in our SGD approach because it can be calculated efficiently from the empirical cumulative distributions of the data, without the need to estimate probability densities.
We performed the SGD analysis using the CREEDO code, version 0.5.1, as implemented in realkd.³ To generate the propositions about the data, we used k-means clustering with 20 clusters. The "randomized exceptional subgroup discovery" algorithm, based on a Monte-Carlo search, was used for the SG search. Despite the stochastic nature of the search algorithm, identical SGs always have the same quality-function value. The outcome of the SGD approach is a list of SGs ranked by their quality-function values. Based on the same data set and SGD settings, different SGs with near-optimal quality-function values can be identified. In this work, we considered SGs with quality-function values within 40% of the maximum observed (optimal) value. The diversity of SGs that can be identified with similar quality-function values is illustrated in Fig. S3. These SGs are often defined by similar selectors, i.e., with similar descriptive parameters and thresholds, but they present rather different distributions of target values. We focused the discussion on the SG with the highest $D_{\mathrm{cJS}}$ value, and rather low coverage, among all identified SGs with near-optimal quality-function values, because such SGs contain exclusively materials and conditions with exceptional performance. However, other SGs with higher coverage and lower $D_{\mathrm{cJS}}$ values also present similar quality-function values; see, for instance, the SGs shown in Fig. S3. It should also be noted that some of the candidate descriptive parameters might be significantly correlated with each other (Fig. S2). Thus, the same subselection of data points (and therefore the same quality-function value) can be obtained by using different descriptive parameters and thresholds.
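The post-processing step described above, keeping the SGs whose quality is within 40% of the optimum and then choosing the most exclusive one, can be sketched as follows. The `Subgroup` record and the function names are hypothetical, "within 40%" is interpreted as a quality of at least 60% of the maximum, and the divergence field stands in for whichever divergence the quality function uses.

```python
from dataclasses import dataclass

@dataclass
class Subgroup:
    """Hypothetical record for one SG returned by the search."""
    rule: str          # human-readable selector
    quality: float     # quality-function value
    divergence: float  # divergence term of the quality function
    coverage: float    # fraction of data points in the SG

def near_optimal(subgroups, tolerance=0.40):
    """SGs whose quality is at least (1 - tolerance) times the best quality found."""
    q_max = max(sg.quality for sg in subgroups)
    return [sg for sg in subgroups if sg.quality >= (1.0 - tolerance) * q_max]

def most_exclusive(subgroups, tolerance=0.40):
    """Among near-optimal SGs, the one with the largest divergence (typically low coverage)."""
    return max(near_optimal(subgroups, tolerance), key=lambda sg: sg.divergence)
```

Selecting by divergence rather than by quality deliberately trades coverage for exclusivity, which is the choice motivated in the text.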
We have also applied the SGD approach to subselections of the three-component-materials data set, to assess the variability of the SG rules with respect to the data used for training and to verify the performance of the SG rules on unseen three-component-materials data. For this purpose, we split the data set (1220 data points) according to a 10-fold cross-validation scheme, i.e., the data set is split into 10 portions, and each portion is excluded from training in turn. Thus, the SGD is applied to training data sets containing 1098 data points, and the excluded portions (122 data points each) are used as test sets. The SG rules obtained with this scheme using the yield of oxygenates as target (Table S2) constrain, in all cases, the temperature and the composition-weighted electronegativity to intermediate ranges. Additionally, depending on the training data used, one further rule constrains either the phosphorus content or one of four other candidate parameters, each of which is significantly correlated with the phosphorus content: the absolute pairwise Pearson correlation coefficient between each of these parameters and the phosphorus content is higher than 0.92 (Fig. S2A). Thus, the key parameters shown in Table S2 can be considered equivalent to those discussed in Fig. 2 (temperature, electronegativity, and phosphorus content). Furthermore, in most cases the thresholds entering the SG rules obtained by training with 90% of the data set are similar to those obtained with the whole data set (Fig. 2C). In particular, the lower and upper bounds for the electronegativity lie in the ranges [3.882, 3.910] eV and [3.989, 4.031] eV, respectively, compared with 3.910 and 4.002 eV obtained with the whole data set. These results indicate that the SG rules are not significantly affected by the input data used for their derivation.
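The 10-fold split can be sketched as below. Whether the data points were shuffled before splitting, or grouped by catalyst, is not stated in the text; this sketch assumes a random permutation of individual data points, and the function name `kfold_indices` is hypothetical.

```python
import numpy as np

def kfold_indices(n, k=10, seed=0):
    """Yield (train, test) index arrays for k-fold cross-validation over n data points."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)  # assumed random shuffling
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test
```

For 1220 data points and `k=10`, every fold gives 1098 training points and 122 test points, and each data point appears in exactly one test set.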
Moreover, the average yield of oxygenates over the test-set points selected by the SG rules is much higher than the average yield of oxygenates over the entire test sets (Table S2), indicating that the SG rules effectively identify the exceptional materials in unseen three-component-materials data.
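The comparison in Table S2 amounts to two averages per test fold. A minimal sketch, assuming the SG rule has already been evaluated as a Boolean mask over the test-set points (the function name is hypothetical):

```python
import numpy as np

def compare_test_yields(in_subgroup, yields):
    """Return (mean yield of test points selected by the SG rule, mean yield of all test points)."""
    yields = np.asarray(yields, dtype=float)
    in_subgroup = np.asarray(in_subgroup, dtype=bool)
    # An empty selection has no defined mean; report NaN instead of raising a warning.
    selected_mean = float(yields[in_subgroup].mean()) if in_subgroup.any() else float("nan")
    return selected_mean, float(yields.mean())
```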
Finally, we also applied the SGD to a reduced data set of three-component catalysts, from which the data points with an oxygenate yield below 3% were excluded. This reduced data set contains 519 data points. Among the identified SGs with near-optimal quality-function values, the SG with the largest Jensen-Shannon divergence (0.638) contains 12 data points and is described by the selector $300 \leq T \leq 305\,^{\circ}\mathrm{C} \wedge \chi < 4.002\ \mathrm{eV} \wedge x_{\mathrm{P}} > 0.55$, where $T$ is the temperature, $\chi$ the composition-weighted electronegativity, and $x_{\mathrm{P}}$ the phosphorus content. This conjunction is similar to that discussed in Fig. 2, except that it does not impose a lower bound on $\chi$. Therefore, the SG rules are not significantly affected by the exclusion of the low-performing materials from the training data set.
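The reduced-data-set experiment combines a simple filter with the selector quoted above. In this sketch the yield is assumed to be given in percent, and the argument names `T`, `chi`, and `x_p` are hypothetical labels assumed to correspond to the temperature (°C), the composition-weighted electronegativity (eV), and the phosphorus content.

```python
import numpy as np

def exclude_low_yield(y, min_yield=3.0):
    """Boolean mask of data points kept after removing oxygenate yields below min_yield (%)."""
    return np.asarray(y, dtype=float) >= min_yield

def reduced_set_rule(T, chi, x_p):
    """300 <= T <= 305 degC AND chi < 4.002 eV AND x_P > 0.55 (no lower bound on chi)."""
    T, chi, x_p = (np.asarray(a, dtype=float) for a in (T, chi, x_p))
    return (T >= 300.0) & (T <= 305.0) & (chi < 4.002) & (x_p > 0.55)
```

Combining the two masks with `&` gives the data points of the reduced data set that fall inside the SG.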

Choice of compatible E1 and E2 elements for the four- and five-component materials
To select compatible elements for the four- and five-component materials, we searched the HT4CAT database⁴ for structures containing phosphorus and oxygen. Among the elements present in these structures, we selected those showing octahedral coordination and an atomic radius differing from that of tungsten by at most 0.10 Å. This resulted in the elements shown in Table S4.

Table S1. Elemental properties used to derive the composition-dependent candidate descriptive parameters. These are free-atom properties evaluated using DFT-PBEsol and the FHI-aims code.