Neurons meet Newton

This is Part 2 of my series of posts on physics-informed machine learning; for backstory, see Part 1.

In Book 1 of the Principia Mathematica, Newton puts forth his celebrated Laws of Motion. He uses them to provide quantitative explanations for a staggering number of measured phenomena (including the inverse-square behavior of gravity, Kepler’s laws of planetary motion, the basis of tides, the precession of the equinoxes, and the orbits of comets). In particular, the Second Law of Motion assumes the form of an ordinary differential equation (ODE):

\[ \frac{d^2 x(t)}{dt^2} = \frac{1}{m} F(t) \]

where \(x(t)\) is the instantaneous position of a body of mass \(m\) and \(F(t)\) is the force acting on it.

Despite the tag “ordinary”, ODEs can be simple to write down yet quickly become complex to solve. For the Second Law, analytical solutions for \(x(t)\) are available only if the force function is well-behaved. If not, one has to resort to numerical methods that involve appropriate discretization of the differential operators involved in the ODE. Natural questions start to emerge here: how does one do the discretization? How finely should we discretize? Does the approximation error induced by the discretization converge to zero, and if so, at what rate?
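
As a small illustration of these questions, here is a minimal sketch that replaces the second derivative with a central difference and measures how the error shrinks as the step size is halved. The force \(F(t) = -\cos t\), the unit mass, and the initial conditions \(x(0) = 1\), \(\dot{x}(0) = 0\) are assumptions chosen only because the exact solution \(x(t) = \cos t\) is then available for comparison; the function name `solve_second_law` is likewise made up for this example.

```python
import math


def solve_second_law(force, m, x0, v0, t_end, n_steps):
    """Integrate x''(t) = force(t) / m with a central-difference scheme.

    Discretization: (x[n+1] - 2 x[n] + x[n-1]) / h^2 = force(t_n) / m.
    Returns the approximate position at t_end.
    """
    h = t_end / n_steps
    x_prev = x0
    # A Taylor expansion supplies the first step from the initial velocity.
    x_curr = x0 + h * v0 + 0.5 * h * h * force(0.0) / m
    for n in range(1, n_steps):
        t_n = n * h
        x_next = 2.0 * x_curr - x_prev + h * h * force(t_n) / m
        x_prev, x_curr = x_curr, x_next
    return x_curr


def force(t):
    # Assumed forcing; with m = 1, x(0) = 1, x'(0) = 0 the exact solution is cos(t).
    return -math.cos(t)


errors = []
for n_steps in (100, 200, 400):
    approx = solve_second_law(force, m=1.0, x0=1.0, v0=0.0, t_end=1.0, n_steps=n_steps)
    errors.append(abs(approx - math.cos(1.0)))

# Ratio of successive errors when the step size is halved.
ratios = [errors[i] / errors[i + 1] for i in range(len(errors) - 1)]
print("errors:", errors)
print("ratios:", ratios)
```

Under these assumptions the error should fall by roughly a factor of four each time the step is halved, which is what one expects from a second-order scheme. A different force or a different discretization will generally behave differently.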

All these matters are doubly/triply exacerbated when we start talking about partial differential equations (PDEs). Here, the unknown is a multivariate function \(u\). Let us be concrete and limit the independent variables to space and time, so that \(u = u(x,t)\). The equation to be solved now involves an arbitrary operator \(\mathcal{N}\) built from partial derivatives:

\[ \mathcal{N}(u) = f \]

where \(f = f(x,t)\) is called the forcing function. Let us assume this is deterministically fixed. If not, then the above equation is called a stochastic PDE.
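
For a concrete instance (an illustrative choice, with \(\kappa > 0\) an assumed diffusivity constant), taking

\[ \mathcal{N}(u) = \frac{\partial u}{\partial t} - \kappa \frac{\partial^2 u}{\partial x^2} \]

turns \(\mathcal{N}(u) = f\) into the one-dimensional heat equation with source term \(f\).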

Unlike for ODEs, there is no general understanding of when (or whether) a generic PDE even admits a solution. The celebrated Navier-Stokes equations are an example of a system of PDEs whose theoretical understanding is incomplete:

\[ \frac{\partial u}{\partial t} + (u \cdot \nabla) u - \nu \Delta u + \frac{1}{\rho} \nabla p = f \]

where \( \Delta, \nabla \) represent the Laplacian and the gradient respectively, \(u\) is the velocity field, \(p\) is the pressure, \(\nu\) is the viscosity, and \(\rho\) is the density. Things have already become hairy, since the above PDE is nonlinear in the unknown \(u\) (through the term \((u \cdot \nabla) u\)). So even a heuristic application of numerical methods may not always work well.

Let us now briefly set aside the 300+ year history of solving ODEs/PDEs, and instead imagine a completely different approach. Suppose we parameterize the solution using a deep feedforward neural network. In concrete terms, if \(\Theta\) represents the weights and biases of the network, we write down:

\[ u = u_\Theta(x,t) \]

and formulate the physics-informed loss function:

\[ L(\Theta) = \sum_{(x_i,t_i) \in S_{\text{int}}} (\mathcal{N}(u_\Theta)(x_i, t_i) - f(x_i, t_i))^2 + \lambda \sum_{(x_j,t_j) \in S_{\text{bdry}}} (u_\Theta(x_j, t_j) - u_0(x_j, t_j))^2 \]

where \(S_{\text{int}}\) denotes a set of collocation points in the interior, \(S_{\text{bdry}}\) denotes a set of boundary points, \(u_0\) denotes the prescribed boundary values, and \(\lambda > 0\) weights the boundary term against the PDE residual. The weights can now be learned using standard neural training paraphernalia (autodiff, Adam, batch normalization, etc.). Once the weights are learned, the solution can be reconstructed by evaluating \(u_\Theta(x,t)\) over the entire domain.
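
To make the loss concrete, here is a minimal, dependency-free sketch on a toy problem. Everything specific in it is an assumption made for illustration: the one-dimensional problem \(u''(x) = -\pi^2 \sin(\pi x)\) on \([0,1]\) with zero boundary values (so \(\mathcal{N} = d^2/dx^2\) and there is no time variable), the four-unit tanh network, the collocation grid, and the value \(\lambda = 10\). Real PINN code would obtain derivatives by autodiff in a framework such as PyTorch or TensorFlow and train with Adam. To stay self-contained, this sketch instead differentiates the tiny network by hand and uses finite-difference gradients with a simple accept/reject step.

```python
import math
import random


def net(theta, x):
    """One-hidden-layer tanh network u_theta(x) and its second x-derivative.

    theta is a flat list [w_1, b_1, a_1, ..., w_K, b_K, a_K] and
    u_theta(x) = sum_k a_k * tanh(w_k * x + b_k).
    """
    u, u_xx = 0.0, 0.0
    for k in range(0, len(theta), 3):
        w, b, a = theta[k], theta[k + 1], theta[k + 2]
        s = math.tanh(w * x + b)
        u += a * s
        # d^2/dx^2 tanh(w x + b) = -2 w^2 s (1 - s^2)
        u_xx += a * (-2.0 * w * w * s * (1.0 - s * s))
    return u, u_xx


def f(x):
    # Assumed forcing; the exact solution of u'' = f with u(0) = u(1) = 0 is sin(pi x).
    return -math.pi ** 2 * math.sin(math.pi * x)


def u0(x):
    # Assumed boundary values.
    return 0.0


S_INT = [i / 20 for i in range(1, 20)]  # interior collocation points
S_BDRY = [0.0, 1.0]                     # boundary points
LAM = 10.0                              # boundary weight (lambda in the text)


def pinn_loss(theta):
    """Physics-informed loss: PDE residual on S_INT plus weighted boundary misfit."""
    interior = sum((net(theta, x)[1] - f(x)) ** 2 for x in S_INT)
    boundary = sum((net(theta, x)[0] - u0(x)) ** 2 for x in S_BDRY)
    return interior + LAM * boundary


def grad(theta, eps=1e-6):
    """Central finite-difference gradient (a stand-in for autodiff)."""
    g = []
    for k in range(len(theta)):
        up, dn = theta[:], theta[:]
        up[k] += eps
        dn[k] -= eps
        g.append((pinn_loss(up) - pinn_loss(dn)) / (2.0 * eps))
    return g


def train(theta, steps=200, lr=1e-3):
    """Gradient descent that only accepts steps which lower the loss."""
    loss = pinn_loss(theta)
    for _ in range(steps):
        g = grad(theta)
        trial = [p - lr * gk for p, gk in zip(theta, g)]
        trial_loss = pinn_loss(trial)
        if trial_loss < loss:
            theta, loss = trial, trial_loss
            lr *= 1.2   # cautiously grow the step after a success
        else:
            lr *= 0.5   # shrink the step after a failure
    return theta, loss


random.seed(0)
theta_init = [random.uniform(-1.0, 1.0) for _ in range(3 * 4)]
initial_loss = pinn_loss(theta_init)
theta_fit, final_loss = train(theta_init)
print("loss before:", initial_loss, "loss after:", final_loss)
```

The accept/reject rule only ensures that the loss does not increase from one accepted step to the next. It says nothing about how close the trained network ends up to the true solution.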

Variants of this idea seem to have been floating around since (at least) the 1990s. As with most ideas based on neural networks, they didn’t gain steam until much later, with an inspirational series of papers by George Karniadakis and co-authors beginning in 2017. They call these models PINNs, short for Physics-Informed Neural Networks.

The simplicity of the above formulation lends itself to a number of extensions. Among many others:

  • Additional training data (in the form of, say, values of the solution at a pre-identified set of collocation points) can be incorporated by throwing in new loss terms.
  • The PDE operator \(\mathcal{N}\) itself can involve unknown parameters (say, \(\mu\)), in which case both \(\Theta\) and \(\mu\) are jointly estimated.
  • Extension to stochastic PDEs can be achieved by taking the expectation of the physics-informed loss function over appropriately defined probability measures.
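
To see why the residual term can pin down an unknown operator parameter, consider a deliberately simplified sketch. All specifics here are assumptions for illustration: the operator is taken to be \(u'' + \mu u\) with zero forcing, the unknown parameter is called `mu` in the code, and a known function \(\sin(2x)\) stands in for a network that has already fit the data. With the solution held fixed, minimizing the residual over `mu` is a one-dimensional least-squares problem with a closed-form answer. A full PINN would instead update the network weights and the parameter together.

```python
import math


def u(x):
    # Stand-in for a trained network that already matches the data (assumption).
    return math.sin(2.0 * x)


def u_xx(x):
    # Second derivative of the stand-in solution.
    return -4.0 * math.sin(2.0 * x)


S_INT = [0.1 * i for i in range(1, 30)]  # assumed collocation points


def residual_loss(mu):
    """Sum of squared residuals of the assumed operator u'' + mu * u."""
    return sum((u_xx(x) + mu * u(x)) ** 2 for x in S_INT)


# Setting the derivative of the quadratic residual_loss(mu) to zero gives:
mu_hat = -sum(u_xx(x) * u(x) for x in S_INT) / sum(u(x) ** 2 for x in S_INT)
print("estimated mu:", mu_hat)
```

Because \(\sin(2x)\) satisfies \(u'' + 4u = 0\), the estimate recovers the value 4 up to floating-point rounding.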

In this manner, the considerable advances in neural network learning over the last five years (and the democratization of software tools for learning neural nets, including powerful packages like TensorFlow and PyTorch) can now directly be ported to the field of numerical PDE analysis. The results are very impressive.

It leads me to wonder more broadly: what other fields in science are waiting for such a clean connection to be made to neural nets?

However, despite all these exciting advances, there are several open questions here.

First, why should the above neural approach to solving PDEs be any better than a standard numerical method? The issues of discretization, solution uniqueness, and convergence all persist (note that the standard PINN formulation does not explicitly discretize the domain, but there is an implicit level of discretization determined by how the collocation points in \(S_{\text{int}}\) are distributed).

Second, how do we know that we have obtained the right solution? One answer is to do a post hoc check: if we see low/zero training loss, then we are good. But a priori there do not seem to be any guarantees on how to achieve low loss, and I find this a bit unsatisfying.

Third, does the solution given by the PINN generalize to all points in the domain? In other words, how can we control the generalization error? See here and here for interesting generalization upper bounds. But I am not entirely sure how powerful these are. In any case getting non-vacuous bounds on neural net generalization is a challenging problem in itself.

Fourth, somewhat unfortunately we have to learn a different network from scratch for each new set of boundary conditions and/or PDE system.

In a later post, I will describe potential avenues towards addressing some of these questions.