In an earlier article, we discussed how AI techniques were being used to derive the laws of physics. There, the primary question was whether AI could discover physical laws on its own. Now, we consider a hybrid situation: the physical law governing a system is partly understood, but one must also accommodate data-driven insights, without discarding any information the data could provide, and still be able to solve for the evolution of the state space. Is this possible?
To answer this, we must consider the three general types of relations in physics: physical laws, constitutive relations, and empirical relations.
Physical laws are well corroborated by numerous experiments and come with full-blown mathematical machinery for developing the analysis further. Examples include the conservation principles for charge and energy. In epidemiological modelling, the corresponding conservation principle is simply a balance on the head-count of the population at any given time, accounting for births and deaths. Physical laws apply to every system in which they are relevant.
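To make the population balance concrete, here is a minimal sketch in Python of an SIR model with births and deaths. The per-capita birth rate `mu_b` (feeding the susceptible pool) and a common per-capita death rate `mu_d` are illustrative assumptions, not part of any particular published model. Summing the three equations, the infection and recovery terms cancel and the total obeys dN/dt = (mu_b - mu_d) N, so the head-count changes only through births and deaths.

```python
# Minimal sketch: SIR model with vital dynamics (births and deaths).
# Assumptions (illustrative only): births enter S at per-capita rate mu_b,
# and every compartment loses people at the same per-capita death rate mu_d.

def sir_vital_rhs(s, i, r, beta, gamma, mu_b, mu_d):
    """Right-hand side of the SIR equations with births and deaths."""
    n = s + i + r
    ds = mu_b * n - beta * s * i / n - mu_d * s
    di = beta * s * i / n - gamma * i - mu_d * i
    dr = gamma * i - mu_d * r
    return ds, di, dr


def simulate_sir_vital(s, i, r, beta, gamma, mu_b, mu_d, dt, steps):
    """Forward-Euler integration; returns the total head-count S + I + R at every step."""
    totals = [s + i + r]
    for _ in range(steps):
        ds, di, dr = sir_vital_rhs(s, i, r, beta, gamma, mu_b, mu_d)
        s, i, r = s + dt * ds, i + dt * di, r + dt * dr
        totals.append(s + i + r)
    return totals


# Usage: equal birth and death rates, so the total should not drift.
totals = simulate_sir_vital(990.0, 10.0, 0.0, beta=0.3, gamma=0.1,
                            mu_b=0.01, mu_d=0.01, dt=0.1, steps=1000)
print(max(totals) - min(totals))
```

With equal birth and death rates, the printed spread of the total head-count should be at the level of floating-point round-off; with unequal rates, the total grows or shrinks exactly as the birth-minus-death balance dictates.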
Constitutive relations approximate the response of a system to external stimuli. These are macroscopic mathematical relations with broad applicability. They are invoked when there is no “fundamental” way of understanding the observed behaviour; instead, they rely on a considerable number of correlations observed across multiple variables.
Examples abound: the stress-strain relation of an elastic material (Hooke’s Law), friction-factor charts in fluid flow, and models of incidence rates in epidemiology. These relations usually take a general form that applies to a wide range of cases, with only the associated numerical constants varying. They are always constrained by a domain of applicability and bounds of variation. For example, Hooke’s Law applies only to materials loaded within their elastic range. Constitutive relations are sometimes derived using dimensional analysis (the Buckingham Pi theorem).
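As a small illustration of a constitutive relation with a bounded domain, the sketch below encodes Hooke’s Law, σ = Eε, and refuses to evaluate it outside an assumed elastic range. The function name, the `yield_strain` bound, and the modulus and strain values in the usage line are assumptions chosen only for illustration.

```python
def hooke_stress(strain, youngs_modulus, yield_strain):
    """Hooke's law, stress = E * strain, valid only inside the elastic range.

    `yield_strain` is an assumed, material-specific bound: beyond it the
    constitutive relation no longer applies, so we refuse to extrapolate.
    """
    if abs(strain) > yield_strain:
        raise ValueError(
            f"strain {strain} is outside the elastic range (|strain| <= {yield_strain})"
        )
    return youngs_modulus * strain


# Usage with illustrative numbers: E = 200 GPa, elastic up to a strain of 0.002.
print(hooke_stress(0.001, youngs_modulus=200e9, yield_strain=0.002))  # stress in Pa
```

The explicit bounds check is the design point: the numerical constant E is all that changes between materials, while the domain check captures the "constrained by the domain of applicability" caveat.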
Empirical relations, the third type, have no physical basis at all. They are simply curve fits, but they are handy for engineering predictions.
CAN PREDICT, CANNOT EXPLAIN
Unfortunately, both constitutive and empirical relations are very limited in applicability. Unlike rigorous physical-law-based models derived from first principles, both fall under phenomenological models: they quantify the relation between the variables involved but do not explain why the variables relate to each other as observed. Empirical relations are especially limited. We may fit a curve to the observed relation, but curve fitting itself presupposes a functional form, which limits how completely it can capture the dependence between the variables (assuming here that the observations are the truth, free of any measurement error).
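The limitation of a presupposed form is easy to demonstrate. In this sketch (Python with NumPy), the data come from an assumed "true" law, y = exp(x), with no measurement error, but the modeller presupposes a straight line. The least-squares fit leaves a systematic residual, positive at both ends and negative in the middle, that no amount of additional data would remove, because the form itself is wrong.

```python
import numpy as np

# "True" relation, unknown to the modeller (assumed for illustration): y = exp(x).
x = np.linspace(0.0, 2.0, 201)
y = np.exp(x)  # noiseless observations

# The modeller presupposes a straight line, y = a*x + b, and fits it by least squares.
# np.polyfit returns coefficients from the highest degree down: [a, b].
a, b = np.polyfit(x, y, deg=1)
residual = y - (a * x + b)

print(f"fit: y = {a:.3f} x + {b:.3f}")
print(f"max |residual| = {np.max(np.abs(residual)):.3f}")
```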
Sound familiar? In AI parlance, we often encounter such “can predict, cannot explain” situations, especially with neural networks.
Chris Rackauckas at MIT wrote an interesting paper on this, titled “Universal Differential Equations for Scientific Machine Learning”. He asked a simple question: “Given the framework provided by differential equations (deterministic or stochastic) based on conservation principles, is it possible to replace the constitutive and/or empirical relations with a Neural Network and solve them?”
He answered this question thoroughly, with both mathematics and software: if one has to use an empirical relationship to model the behaviour between certain variables, why not use the universal approximation capability of a neural network instead and solve the resulting differential equations?
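To make the idea concrete, here is a minimal sketch in Python with NumPy (not the paper's own software) of a universal-differential-equation right-hand side: a known, first-principles term plus a small neural network, integrated with a classical Runge-Kutta scheme. The network size, the tanh activation, and the linear-decay "known physics" in the usage lines are all assumptions for illustration. In the UDE approach, the network weights (and any unknown physical parameters) would then be fitted by minimizing a loss between the simulated trajectory and observations, using gradients through the solver; that training step is omitted here.

```python
import numpy as np


def mlp(u, params):
    """Tiny one-hidden-layer tanh network mapping a state vector to a correction term."""
    w1, b1, w2, b2 = params
    return w2 @ np.tanh(w1 @ u + b1) + b2


def ude_rhs(u, params, known_rhs):
    """UDE right-hand side: known (first-principles) terms plus a learned network term."""
    return known_rhs(u) + mlp(u, params)


def rk4(f, u0, dt, steps):
    """Classical fourth-order Runge-Kutta integrator; returns the whole trajectory."""
    traj = [np.asarray(u0, dtype=float)]
    for _ in range(steps):
        u = traj[-1]
        k1 = f(u)
        k2 = f(u + 0.5 * dt * k1)
        k3 = f(u + 0.5 * dt * k2)
        k4 = f(u + dt * k3)
        traj.append(u + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4))
    return np.array(traj)


def decay(u):
    """Assumed 'known physics' for the usage example: du/dt = -0.5 u."""
    return -0.5 * u


# Usage: known decay plus a randomly initialised (untrained) network correction.
rng = np.random.default_rng(0)
hidden, dim = 8, 1
params = (
    rng.normal(size=(hidden, dim)),
    np.zeros(hidden),
    0.1 * rng.normal(size=(dim, hidden)),
    np.zeros(dim),
)
traj = rk4(lambda u: ude_rhs(u, params, decay), [1.0], dt=0.01, steps=100)
```

Setting all network weights to zero recovers the purely physics-based model, which is the sense in which the network only fills the gap left by the missing constitutive or empirical relation.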
So, what is the real attraction of chasing a neural network rather than a simple polynomial curve fit, which has a definite form and allows straightforward interpretation?
The real power of a neural network lies in its ability to act as a universal approximator for any continuous function. This does not mean we can compute the function exactly, but it offers a method that, backed by theory, can capture complex dependences between variables in a multi-dimensional space without presupposing the form of the dependence.
In fact, a 2017 paper showed that deep ReLU networks whose hidden layers have width n+1 are sufficient to approximate any continuous scalar-valued function of n input variables on a compact domain, given enough layers. In general, increasing the number of hidden neurons improves the approximation.
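The universal-approximation idea can be checked numerically. The sketch below (Python with NumPy) hand-builds a one-hidden-layer ReLU network whose output is the piecewise-linear interpolant of a target function at equally spaced knots; each hidden unit switches on a change of slope at one knot. Note the assumptions: the input is one-dimensional, the weights are set directly rather than trained, and this is the classical wide, shallow construction rather than the narrow, deep width-(n+1) networks of the 2017 result. It does show the error shrinking as hidden neurons are added.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def build_relu_net(f, a, b, n_hidden):
    """Hand-construct a one-hidden-layer ReLU network that linearly interpolates f
    at n_hidden + 1 equally spaced knots on [a, b] (weights set directly, not trained)."""
    knots = np.linspace(a, b, n_hidden + 1)
    values = f(knots)
    slopes = np.diff(values) / np.diff(knots)                # slope on each segment
    out_w = np.concatenate(([slopes[0]], np.diff(slopes)))   # slope change at each knot
    biases = -knots[:-1]                                     # unit k is active for x > knots[k]
    bias_out = values[0]

    def net(x):
        x = np.asarray(x, dtype=float)
        hidden = relu(x[..., None] + biases)  # all input weights equal 1
        return hidden @ out_w + bias_out

    return net


# Usage: approximate sin on [0, pi] with more and more hidden neurons.
xs = np.linspace(0.0, np.pi, 1001)
for n in (4, 16, 64):
    net = build_relu_net(np.sin, 0.0, np.pi, n)
    print(n, np.max(np.abs(net(xs) - np.sin(xs))))
```

For a smooth target, the interpolation error falls roughly as the square of the knot spacing, so quadrupling the number of hidden neurons cuts the maximum error by about a factor of sixteen.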
Of course, like any good research paper, this one builds on the excellent work of its predecessors, including Lagaris et al., Chen et al. and others. (Check out this insightful video by Prof. Duvenaud.)
APPLICATION OF A COMBINED METHODOLOGY
This methodology has many interesting applications in different domains.
Currently, the power of data analytics is needed most in the fight against COVID-19. (The Department of Health – Abu Dhabi and Saal have launched a website to visualize the latest COVID-19 insights and trends.)
Epidemiological modelling has greatly assisted in predicting trends, thereby aiding capacity management and helping to contain the spread. Models range widely, from deterministic mathematical models to complex spatially explicit stochastic simulations (e.g. the GLEAM Project) and decision-support systems for hospitals. Whatever their form, the objective is usually to predict:
. How many will be infected
. How many will recover
. When the spread will stop
In this context, another team from MIT has published a paper that uses this approach to predict COVID-19 trends for the United States. The paper by Dandekar et al., titled “Quantifying the effect of quarantine control in Covid-19 infectious spread using machine learning”, uses the above methodology to add a new quarantined state to the compartmental models already established in epidemiology. In a nutshell, they use an S-I-R-Q model in which the quarantined population is used to evaluate a quarantine-strength function, which is captured by a neural network. The neural-network-augmented SIR ODE system was trained by minimizing a mean-squared-error loss with respect to parameters that include the neural network’s weights W.
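A minimal sketch of this kind of S-I-R-Q structure is shown below (Python with NumPy). It assumes, for illustration, that a small neural network Q(S, I, R; W) with a softplus output moves infected individuals into a quarantined compartment T at rate Q·I. The network shape, the normalised inputs, the forward-Euler integrator, and the loss on I and R are assumptions, so the equations and loss actually used by Dandekar et al. may differ. Training, that is, minimizing `mse_loss` over W with a gradient-based optimizer, is left out.

```python
import numpy as np


def quarantine_strength(state, W):
    """Hypothetical small network Q(S, I, R; W); softplus keeps Q non-negative."""
    w1, b1, w2, b2 = W  # shapes: (hidden, 3), (hidden,), (hidden,), scalar
    h = np.tanh(w1 @ state + b1)
    return np.logaddexp(0.0, w2 @ h + b2)  # softplus(z) = log(1 + e^z)


def sirq_rhs(y, beta, gamma, W):
    """Assumed S-I-R-Q right-hand side; the four derivatives sum to zero."""
    s, i, r, t = y
    n = y.sum()
    q = quarantine_strength(np.array([s, i, r]) / n, W)
    infection = beta * s * i / n
    return np.array([
        -infection,                   # dS/dt
        infection - (gamma + q) * i,  # dI/dt: recovery and quarantine remove infected
        gamma * i,                    # dR/dt
        q * i,                        # dT/dt: quarantined population
    ])


def simulate_sirq(y0, beta, gamma, W, dt, steps):
    """Forward-Euler trajectory of [S, I, R, T]."""
    ys = [np.asarray(y0, dtype=float)]
    for _ in range(steps):
        ys.append(ys[-1] + dt * sirq_rhs(ys[-1], beta, gamma, W))
    return np.array(ys)


def mse_loss(W, y0, beta, gamma, observed_i, observed_r, dt):
    """Mean-squared error between simulated and observed infected and recovered counts."""
    ys = simulate_sirq(y0, beta, gamma, W, dt, len(observed_i) - 1)
    return np.mean((ys[:, 1] - observed_i) ** 2 + (ys[:, 2] - observed_r) ** 2)
```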
In most battery models, the state space is modelled using a set of partial differential equations that evolve in time. Whether for lithium batteries or ultra batteries, when the battery is on load, the equilibrium cell voltage of the system (informally, and incorrectly, called the open-circuit voltage, or OCV, of the system) is the voltage the system would exhibit at the present state of activity of its reactants if the current through the battery were zero.
This voltage depends on several factors, including the state of charge, chemical potentials, age, and the charge-discharge protocol of the battery. In lithium batteries, it is a complicated function of the state of intercalation/deintercalation of the lithium ions. It is usually curve-fitted and then used along with the charge, mass, species, energy, and momentum conservation equations to model the state of the battery at any time. For example, the 2014 paper by Weng et al., titled “A Unified Open-Circuit-Voltage Model of Lithium-ion Batteries for State-of-Charge Estimation and State-of-Health Monitoring”, proposed the following empirical model for OCV:
Clearly, the authors put in quite a brain-squeeze to come up with the proposed model, which has at least six coefficients. Such empirical OCV relations must be coupled with balance equations (see, for example, “Governing equations for a two-scale analysis of Li-ion battery cells” by Salvadori et al. and the many papers by Prof. Venkat Subramanian), and a system of differential-algebraic equations is then solved to predict the state-space evolution of the system. Using a neural network in place of the empirical OCV fit, alongside the PDEs, could greatly benefit this modelling methodology.
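The coupling idea can be sketched in a drastically simplified, zero-dimensional form: a charge balance (coulomb counting) plus a terminal-voltage equation, with the OCV relation passed in as an interchangeable function. This Python sketch is not the PDE/DAE models cited above; the constant discharge current, the single ohmic resistance `r0_ohm`, and the `toy_ocv` curve with its made-up coefficients are all assumptions (in particular, `toy_ocv` is not Weng et al.'s model). In the hybrid approach, `toy_ocv` would be replaced by a trained neural network without touching the balance equation.

```python
import math


def toy_ocv(soc):
    """Hypothetical OCV curve in volts with made-up coefficients (NOT Weng et al.'s model)."""
    soc = min(max(soc, 1e-3), 1.0)  # clamp to avoid log(0)
    return 3.4 + 0.7 * soc + 0.05 * math.log(soc)


def simulate_discharge(ocv, soc0, current_a, capacity_ah, r0_ohm, dt_s, steps):
    """Zero-dimensional stand-in for the full balance equations:
    charge balance dSOC/dt = -I / Q and terminal voltage V = OCV(SOC) - I * R0.

    `ocv` can be any callable SOC -> volts: an empirical curve fit, or a
    trained neural network in the hybrid approach.
    """
    capacity_as = capacity_ah * 3600.0  # capacity in ampere-seconds (coulombs)
    soc = soc0
    history = []
    for _ in range(steps):
        soc -= current_a * dt_s / capacity_as  # forward-Euler charge balance
        history.append((soc, ocv(soc) - current_a * r0_ohm))
    return history


# Usage: 1 A discharge of a 2 Ah cell for one hour, starting fully charged.
hist = simulate_discharge(toy_ocv, soc0=1.0, current_a=1.0, capacity_ah=2.0,
                          r0_ohm=0.05, dt_s=1.0, steps=3600)
print(hist[-1])  # (state of charge, terminal voltage) after one hour
```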
In conclusion, combining neural networks with ODEs brings together the structured understanding of the system that conservation laws provide and the predictive power of data-driven models. As a hybrid technique, this could significantly benefit the scientific community.