From Molecules to Materials to Market: A Rational Framework for Product Formulation and Design


Venkat Venkatasubramanian

Purdue University

I will address the early part of the innovation cycle: the discovery stage, in which the design space is explored for improved formulations based on the original idea, and in which that rapidly growing research base is managed using modeling and knowledge-based techniques. I will start with some background on product formulation and design, because that phrase means different things to different people. Then, with the aid of some industrial design case studies, I will argue the need for a rational automated framework through examples of different design problems. Finally, I will summarize the lessons we have learned.

When I speak of product formulation and design, I refer to the systematic identification of the molecular structure or material formulation that would meet a specifically defined need. In other words, you know what you want, but you don't know what structure or formulation will take you there. This fairly broad definition is applicable to a wide variety of situations. For example, engineering materials, polymer composites, catalysts and fuel additives, agrochemicals, and pharmaceutical problems all fit into this framework.

I have three examples and will start with the design of fuel additives, which is a somewhat simpler molecular structure design problem. Then I will discuss rubber compound formulation, a materials design issue. Finally, I will mention some ongoing effort in the area of catalysts.

Overall, a company is interested in the move from molecules to materials, the use of those materials in components, and the integration of the components into a final product. Typically, chemists and chemical engineers work at the early stages of the chain of making materials; then the materials are tossed over a proverbial brick wall where mechanical and industrial engineers make components and particular products.

The chemists and chemical engineers on one side of the wall do not have much interest in or understanding of the constraints of manufacturing and design, and the mechanical and industrial engineers on the other side of the wall certainly do not worry about the more basic research issues. Design choices and manufacturing decisions are made subject to constraints, some of which could have been avoided if the decision makers had been aware of them. The result is inefficiency in the overall design process in going from molecules to engineering materials to markets.

The first example is a case study in the molecular design of fuel additives. Through the combustion process, undesirable, largely carbon-based molecular fragments are created and deposited on the surface of the intake valve. Over time these deposits accumulate and eventually inhibit the proper opening and closing of the valve, resulting in suboptimal combustion and noxious gas releases.

Therefore, the U.S. Environmental Protection Agency (EPA) has mandated a test. Before a fuel can be sold, it needs to be tested in a standardized engine. Previously the engine was from BMW, but now EPA is using American models. The BMW engine is run for 10,000 miles; then it is taken apart. Deposits on the valve are measured to determine the intake valve deposit (IVD). The IVD needed to be less than 100 milligrams before the fuel could be sold. There is a whole market for fuel additives, which trap undesirable molecular fragments and prevent them from depositing on the surface of the valve. The problem is how to design these fuel additives to minimize the IVD and ensure it is 100 milligrams or less.

To exhaustively test all possibilities is very expensive, because for every 10,000-mile test the engine must be disassembled to measure the IVD. Every single data point costs about $8,000 to $10,000 and a considerable amount of time. Therefore, we were asked to develop a model-based approach to this problem in 1995.

A second problem involves Caterpillar, which sells earth-moving equipment. This equipment is largely made of metal—in fact, 99 percent of Caterpillar's machinery is iron—however, there are more than 1,000 rubber components in these machines, including tires, hoses, engine mount gaskets, and other parts. This equipment is used in oil wells in Siberia in the winter as well as oil fields in Kuwait in the summer. When this kind of equipment fails, it is usually the rubber components that give way because of the extreme operating conditions they face; hence a multimillion-dollar machine sits idle because a $1,000 rubber component has failed. This is a major product-liability and warranty headache for Caterpillar.

Component failure is so crucial that Caterpillar does not trust any other company to make these rubber products—not even Goodyear or Firestone. Caterpillar makes its own rubber compound formulations. Rubber component failure is a multilevel issue: machine performance depends on the rubber parts, which depend on the rubber compounds from which they are made. Those materials, in turn, depend on their failure mechanics, which are governed by rubber curing chemistry. In the end, the design- and manufacturing-related issues depend on the quantum chemistry of sulfur crosslinks. This is another problem in which the transformation process goes from molecules to materials to market and has the proverbial brick wall in between.

The challenges and designs we typically see involve very complex chemistry and highly nonlinear systems and processes. There is usually some understanding of first principles and fundamental physics and chemistry but not enough to complete the parts design or the molecular design.

One other problem is combinatorially large search spaces. There are 100 million potential candidates for rubber compound formulation. Some other examples we have worked on at Purdue have involved 10^20 to 10^30 different molecules. Another issue is that typically there are limited and uncertain data. Most often, combinatorial chemistry approaches do not succeed in these cases because obtaining the results is time and labor intensive. Fuel additive design, for example, requires dismantling the engine for every test of a new formula.

What does this mean with respect to design? The traditional approach has been to give a senior experienced engineer or scientist some design objectives and have that person hypothesize a particular molecule or formulation. Guesswork, intuition, and experience are used when the new molecule is synthesized in the laboratory.

After the molecule or formulation is made, it is evaluated to see whether it meets the objectives. If the process is not successful, it begins again. This typical guess-and-test methodology yields a very long and expensive cycle (see Figure 9.1). Clearly there is a need for a more rational approach that removes some of the guess-and-test elements. These problems are so complex that guessing and testing cannot be completely eliminated, but a system that increases the efficiency of the cycle would help.

FIGURE 9.1. The traditional design method is a lengthy and expensive process.



For an automated approach, two problems need to be solved. The first is how to predict the macroscopic properties given the structure or the formulation. The second is how to identify a structure based on a given desired set of properties.

There are three options for modeling choices. Fundamental models depicting the chemistry and physics of the problem can be used to predict material properties, although this type of model is uncommon. A second option is to capture the experience of formulation scientists, the expertise exercised in the guess-and-test approach, in a rule-based model such as an expert system or qualitative reasoning. The final method is a data-driven approach, where data are used to make correlations, largely ignoring the physics and chemistry.

Historically, these problems have been examined using a single approach, but we feel a combination of all three is needed. Understanding the physics and chemistry can provide a base, expertise can guide the search, and data can refine it. The question is how to develop a hybrid framework that mixes all three.

We have used physics and chemistry (including quantum mechanics) to build a primary model. From this and the experience base, some intermediate-level structural descriptors have been developed and mapped to performance using data-driven techniques, whether statistical methods or neural networks. In this way the model can be validated. With the hybrid model we can predict the properties given a structure; for the inverse problem, we can search the design space for the desired properties using a genetic algorithm and obtain the molecular structure or the rubber compound formulation.
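The forward-plus-inverse loop just described can be sketched in a few lines. Everything in the sketch below is invented for illustration: the surrogate "property" model, the two descriptor names, the coefficients, and the target value are hypothetical stand-ins for the real hybrid models, which combined quantum chemistry, expert knowledge, and neural networks. The genetic algorithm itself is a bare-bones real-coded version with truncation selection, blend crossover, and Gaussian mutation.

```python
import random

# Hypothetical surrogate forward model: maps two structural descriptors
# (chain length, polarity) to a predicted property such as IVD in mg.
# The quadratic form and every coefficient are invented for illustration.
def predict_property(descriptors):
    chain, polarity = descriptors
    return 0.1 * (chain - 25.0) ** 2 + 50.0 * (polarity - 0.6) ** 2 + 8.0

def fitness(descriptors, target=10.0):
    # Higher is better: negative distance of predicted property from target.
    return -abs(predict_property(descriptors) - target)

def genetic_search(pop_size=40, generations=60, seed=0):
    rng = random.Random(seed)
    # Random initial population of candidate descriptor vectors.
    pop = [[rng.uniform(0.0, 40.0), rng.uniform(0.0, 1.0)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]          # truncation selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            child = [(x + y) / 2.0 for x, y in zip(a, b)]  # blend crossover
            if rng.random() < 0.3:                         # Gaussian mutation
                i = rng.randrange(len(child))
                child[i] += rng.gauss(0.0, 1.0)
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

best = genetic_search()
print("best descriptors:", best, "predicted property:", predict_property(best))
```

In a real system the `predict_property` stub would be replaced by the validated hybrid model, and the modeler could interrupt and redirect the search, as discussed later in the question period.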

Expecting the EPA to lower the allowable IVD value in the future, we were asked by Lubrizol to design a fuel additive for an intake valve deposit of 10 milligrams. We used a genetic algorithm with the hybrid model (see Figure 9.2) to design candidate molecules and predict their properties.

FIGURE 9.2. Computer-aided design, in this case of fuel additives, makes the design cycle much shorter and more efficient by more narrowly directing the molecular search.



One structure we discovered that came close to meeting our needs (99.3 percent fitness, 12 milligrams IVD) had already been discovered by the Lubrizol scientists through their intuitive guess-and-test approach. However, the hybrid model discovered two other, better structures. The best of the three had completely novel chemistry and was a combination of molecules that had never been thought of. The hybrid model used “out-of-the-box” thinking that opened up possibilities of new chemistry for generating leads in a much shorter time frame.

I would like to return to the rubber compound situation. Many things go into rubber, including activators, sulfur, retarders, accelerators, and so on. A very interesting and complex set of approximately 820 reactions occurs, resulting in curing. Current models cannot handle so much information; we need a more complex modeling environment for this type of situation.

Of the top three rubber formulations the model designed, one had already been found by the formulators at Caterpillar. It meets the design criteria, but it degrades much more quickly than desired. The other two formulations have much better degradation kinetics. The model found better formulations in a matter of hours, when it would typically have taken 2 to 4 weeks. This is a 10- to 50-fold reduction in the time it takes to design a better formulation.

The third area that we started working on about a year ago in collaboration with ExxonMobil is the design of catalysts. This is a different type of product design from the other two examples because here combinatorial chemistry can have an impact. In the traditional approach of product development through experimentation, measurements were made one at a time, so there was time to think about how to develop the models, the hypotheses, the mechanisms, and the candidates to fit the data that you are getting. The current approach obtains a lot of data, but the thinking process—the model development process—is still slow and methodical. To get the most out of combinatorial chemistry, the ability to extract knowledge, not just data, is needed. That knowledge can lead you to an understanding of the process.

Previously, experimental chemistry and modeling were in sync. They were like a horse and buggy on a dirt road. Now, combinatorial chemistry has provided the experiments with a Ferrari, but it can't be driven at 200 mph because we still have the dirt roads that can accommodate only 20 mph traffic. Modeling capabilities are not on par with experimental capabilities. Initially, there will be successes that are obvious and easy to find. Once you have exhausted those, the next solutions will require true knowledge and understanding. Product development will be limited by the dirt road. At Purdue we believe that the interstate is needed, and we see ourselves developing that infrastructure—the modeling infrastructure—to handle the combinatorial chemistry data explosion. We need a modeling superhighway to get the most out of combinatorial chemistry.

What do we mean by a modeling infrastructure? Given a situation in which some high-level chemistry is hypothesized, too much data exist for one scientist to analyze. For example, the 820 reactions of the rubber compound formulation involve over 100 chemical species. It is impossible for one person to write and solve over 100 coupled differential equations without making mistakes. It would take nearly 3 months to explore one scenario. So we have built an environment in which the scenario is specified and the information is automatically translated into equations, the parameters are optimized, and the modeled results are compared with data. This way a scenario can be analyzed in a few hours instead of 3 months. This is the type of modeling highway that can get the most out of combinatorial chemistry situations.
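A toy version of such a scenario-to-equations pipeline might look like the following. The two-reaction network, species names, rate constants, and "measured" value are all hypothetical stand-ins for the roughly 820-reaction, 100-plus-species curing system; the point is only that mass-action rate equations are generated automatically from a declarative reaction list, integrated, and compared with data, rather than being written out by hand.

```python
# Each reaction is (reactants, products, rate-constant name); the species
# and rate constants below are hypothetical examples.
REACTIONS = [
    (["A", "B"], ["C"], "k1"),   # A + B -> C   (e.g., crosslink formation)
    (["C"], ["D"], "k2"),        # C -> D       (e.g., crosslink maturation)
]

def rhs(conc, k):
    """Assemble d[species]/dt for every species from the reaction list."""
    d = {s: 0.0 for s in conc}
    for reactants, products, kname in REACTIONS:
        rate = k[kname]
        for s in reactants:
            rate *= conc[s]          # mass-action kinetics
        for s in reactants:
            d[s] -= rate
        for s in products:
            d[s] += rate
    return d

def simulate(k, c0, t_end=10.0, dt=0.01):
    """Explicit Euler integration; adequate for this small, mild system."""
    conc = dict(c0)
    for _ in range(int(t_end / dt)):
        d = rhs(conc, k)
        for s in conc:
            conc[s] += dt * d[s]
    return conc

def sse_against_data(k, c0, observed_d):
    """Squared error between the model's final [D] and a measurement."""
    return (simulate(k, c0)["D"] - observed_d) ** 2

# Crude grid scan over k1, standing in for a real parameter optimizer.
c0 = {"A": 1.0, "B": 1.0, "C": 0.0, "D": 0.0}
observed_d = 0.7   # hypothetical measured final [D]
best_k1 = min((i / 10.0 for i in range(1, 21)),
              key=lambda k1: sse_against_data({"k1": k1, "k2": 0.5},
                                              c0, observed_d))
print("best k1:", best_k1)
```

A production version would use a stiff ODE solver and a proper optimizer, but the structure is the same: specify the scenario once, and let the environment generate and solve the equations.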

For catalyst development, the modeling highway can be used to see which scenarios fit the data. Predictions of new catalysts are based on that information; those catalysts may give the desired performance, or at the very least they yield new data that can improve the model. The data are useful for revising the model, whether they indicate negative or positive results. Guided experimental design will indicate what part of combinatorial space should be explored.
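As a minimal illustration of guided search versus blanket screening, the sketch below hill-climbs a hypothetical one-variable "catalyst space," concentrating each new experiment around the current best candidate with a shrinking search radius. The response surface and every parameter are invented; a real system would guide the search with a fitted mechanistic model rather than this greedy rule.

```python
import random

# Hypothetical hidden response surface: catalyst performance as a function
# of one composition variable x in [0, 1]. In reality each call would be a
# costly experiment, so the goal is to spend experiments wisely.
def run_experiment(x):
    return -(x - 0.62) ** 2 + 0.9   # invented peak at x = 0.62

def guided_search(n_experiments=20, seed=1):
    """Greedy guided design: sample near the current best candidate,
    narrowing the search region as knowledge accumulates."""
    rng = random.Random(seed)
    best_x = rng.random()
    best_y = run_experiment(best_x)
    radius = 0.5
    for _ in range(n_experiments - 1):
        x = min(1.0, max(0.0, best_x + rng.uniform(-radius, radius)))
        y = run_experiment(x)
        if y > best_y:
            best_x, best_y = x, y
        radius *= 0.8   # focus on the promising part of the space
    return best_x, best_y

best_x, best_y = guided_search()
print("best composition:", best_x, "performance:", best_y)
```

Twenty guided experiments cover the space far more informatively than twenty points on a fixed grid, which is the sense in which guided design tells you where in combinatorial space to look next.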

The guess-and-test approach to product design and formulation is too slow. Combinatorial chemistry can yield much data, but data alone are not what we ultimately want. Knowledge and understanding are what we desire, and they provide the motivation for a model-based framework. I have tried to illustrate these concepts with three examples of actual industrial design problems that we have worked on at Purdue: fuel additives, rubber compound formulation, and the design of catalysts. So far our results have reduced the design or formulation time. More importantly, modeling has also led to better formulations, new chemistry, and an understanding of the driving forces in all of these problems. Nevertheless, we have just scratched the surface of this complex problem domain.


Richard A. Sachleben, U.S. House of Representatives: You have shown us three examples of how you used computational methods to address real-world problems. To follow your highway analogy, is it going to require building a new highway every time you have a target or are you aiming toward a generic modeling system that you can utilize regardless of the target? Designing a new modeling system every time you have a new problem to solve is too difficult.

Venkat Venkatasubramanian: That is a very good point, and as you may suspect, we do have a general framework. While the overall tools and the software architecture are the same for each application, a certain amount of customization will be required. We hope to decrease the model customization time from months to weeks, but there will always be some time needed because the chemistry of each problem is different.

Hans Thomann, ExxonMobil: I have two questions. The first regards multiscale modeling. As you probably know, there have recently been tremendous advances in computational metallurgy, particularly in linking length scales and timescales through parameter passing or embedding. I didn't hear you mention that. Are you using these approaches?

Venkat Venkatasubramanian: Yes. In the Caterpillar work we use these methods because we are going through different scale levels. The details of how we do it are somewhat different from some of the work that has been done in computational metallurgy, but the spirit is the same.

Hans Thomann: For my second question, I am curious to know a little more about the tradeoff between the different components of the hybrid when you start with a constitutive relationship and then use some expert knowledge. There must be some weighting, because there is a tradeoff between the time you allocate to the computation and the weighting factor you put on the experts' opinion. How do you handle that?

Venkat Venkatasubramanian: Due to time constraints, I didn't get a chance to talk about that. This is an interactive framework; it is not a one-shot deal. You are interacting with and guiding the search at any given time, and based on your intuition, you can direct the search, change the weights, and so on.

Participant: [Comment off microphone]

Venkat Venkatasubramanian: No. There are two ways we handle the knowledge-based guidance. It can be done in real time with the modeler going back and forth between iterations and then guiding the iterations either in the forward model development method, for which the modeler proposes different scenarios, or in the inverse model method, for which the algorithm does the search. The modeler can actually stop the search and force it to go some other direction, based on intuition and experience in how the molecular structure evolves.

The second way to handle knowledge-based guidance is embedded in both the forward and the inverse model development but typically involves fuzzy-logic parameters. There you do need some tuning, but you are not bound by that mix alone. That is why it works to first sit down and interact, and then overrule the direction in which the system is moving where necessary.

Participant: [Comment off microphone]

Venkat Venkatasubramanian: Limited visualization. Right now we don't have the fairly sophisticated visualization tools that have come up. At this point we can watch how well we are attaining various properties as the molecular evolution proceeds, in addition to observing how we are approaching acceptable performance levels. In some cases we are undershooting or overshooting our goals, but we can change the weights given to the different objective functions in real time to better reach them. However, our methods are not based on the different kinds of prediction that folks are working on for visualization. We are not using that yet.

Richard C. Alkire, University of Illinois, Urbana-Champaign: In addition to the chemical engineering department, I have an appointment at the computing group at Illinois, and I play the piano. I have been thinking of your work and about the way the eyes see data visually when playing the piano. The fingers touch the keys and the muscles drive them. There is integration to a considerable extent, but it is all connected to the brain.

We have a very large and still growing computational infrastructure in the United States, and we have fingers and eyes and data coming together to solve problems or create solutions. Could you comment in a forward-looking way on how all of these pieces will actually be integrated, and how the data will be structured, so that the most people can access the data in a proper way, on computers for which the data weren't originally intended and compiled, and in a way that allows us to solve problems over and over and learn from them, just as you have envisioned? What is needed between all those fingers and nerve endings that you have described and the brain that coordinates all the pieces and keeps them straight?

Venkat Venkatasubramanian: Certainly we are nowhere near such a level of complexity. That would involve database management and security issues, which we are not looking at right now. Eventually, when these kinds of systems are sitting in companies and institutions, both issues will be somewhat important. We have a long way to go to reach that point.



Venkat Venkatasubramanian is a professor of chemical engineering at Purdue University. He has been a consultant to several major global corporations and institutions, such as Air Products, Arthur D. Little, Amoco, Caterpillar, Dow AgroSciences, Exxon, Lubrizol, the United Nations (UNIDO and UNDP), Indian Oil, ICI (U.K.), Nova Chemicals, and G.D. Searle.