# Non-intrusive nonlinear model reduction via machine learning approximations to low-dimensional operators – Advanced Modeling and Simulation in Engineering Sciences

#### By Zhe Bai and Liqian Peng

Dec 30, 2021

To assess the proposed non-intrusive ROM, we consider two parameterized PDEs: (i) the 1D inviscid Burgers' equation and (ii) a 2D convection–diffusion equation. We implement the explicit 4th-order Runge–Kutta solver, and, as implicit methods, backward Euler solved with either Newton–Raphson or fixed-point iteration.
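The explicit Runge–Kutta step and the fixed-point variant of backward Euler can be sketched for a generic ODE \(\dot{x} = f(x, t)\) as below. This is a minimal illustration under stated assumptions, not the authors' implementation; the function names and tolerances are placeholders.

```python
import numpy as np

def rk4_step(f, x, t, dt):
    """One explicit 4th-order Runge-Kutta step for dx/dt = f(x, t)."""
    k1 = f(x, t)
    k2 = f(x + 0.5 * dt * k1, t + 0.5 * dt)
    k3 = f(x + 0.5 * dt * k2, t + 0.5 * dt)
    k4 = f(x + dt * k3, t + dt)
    return x + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def backward_euler_step(f, x, t, dt, tol=1e-12, max_iter=100):
    """One implicit backward Euler step, x_new = x + dt * f(x_new, t + dt),
    solved by fixed-point iteration (converges when dt * Lipschitz(f) < 1)."""
    x_new = x.copy()
    for _ in range(max_iter):
        x_next = x + dt * f(x_new, t + dt)
        if np.linalg.norm(x_next - x_new) < tol:
            return x_next
        x_new = x_next
    return x_new
```

The fixed-point form requires no Jacobian, which is why it applies to non-differentiable regression surrogates such as random forests; the Newton–Raphson variant trades a Jacobian solve per iteration for faster convergence.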

### 1D Burgers’ equation

The experiment first employs a 1D parameterized inviscid Burgers' equation. The input parameters \(\varvec{\mu}=(a,b)\) lie in the space \([1.5, 2]\times [0.02, 0.025]\). In the current setup, the parameters for the online test are fixed at \(\varvec{\mu}=(1.8, 0.0232)\). In the FOM, the problem is solved using a conservative finite-volume formulation with backward Euler in time. The 1D domain is discretized on a grid with 501 nodes, corresponding to \(x = i \times (100/500),\ i = 0, \ldots, 500\). The governing equation, boundary condition, and initial condition are

$$\begin{aligned} \frac{\partial u(x,t)}{\partial t} + \frac{1}{2} \frac{\partial \left(u^2(x,t)\right)}{\partial x}&= 0.02e^{bx}, \end{aligned}$$

(18a)

$$\begin{aligned} u(0,t)&= a, \quad \forall t>0, \end{aligned}$$

(18b)

$$\begin{aligned} u(x,0)&= 1, \quad \forall x \in [0,100]. \end{aligned}$$

(18c)

The solution \(u(x, t)\) is computed in the time interval \(t \in [0,25]\) using a uniform computational time-step size \(\varDelta t\).

#### Data collection

We first perform a time-step verification to choose an appropriate \(\varDelta t\) for the time integrators on this problem. We collect solutions for an increasing number of time steps \(N_t = [25, 50, 100, 200, 400, 800, 1600, 3200, 6400]\) using both the Runge–Kutta and backward Euler integrators. Throughout the paper, we select the time step at \(99\%\) of the asymptotic rate of convergence. The verification results in Fig. 1 show that \(N_t = 200\) is a reasonable number of time steps for the 4th-order Runge–Kutta method and \(N_t = 800\) for the backward Euler method. During the offline stage, we run four full simulations corresponding to the corner parameters of the space \([1.5, 2]\times [0.02, 0.025]\). We then sample the training data from a Latin hypercube: \(N_\text{training}\) and \(N_\text{validation}\) instances of the state, time, and parameters are generated following the criterion that the minimum distance between data points is maximized. For this study, the default size of the training set is \(N_\text{training} = 1000\) and the default size of the validation set is \(N_\text{validation} = 500\). The reduced vector field \(f_r\) is computed for each input pair \((\hat{x}, t; \varvec{\mu})\). Note that the training and validation stages involve only pure machine learning; in the test stage, we evaluate the proposed ROM. The parameters are fixed at \(\varvec{\mu} = (1.8, 0.0232)\) for testing purposes.
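A maximin Latin-hypercube design of the kind described above can be sketched as follows. The candidate-pool strategy and pool size are illustrative assumptions, not the authors' exact sampling procedure.

```python
import numpy as np

def latin_hypercube(n, d, rng):
    """One Latin-hypercube design: n points in [0,1]^d, one per stratum
    in every coordinate."""
    samples = np.empty((n, d))
    for j in range(d):
        perm = rng.permutation(n)
        samples[:, j] = (perm + rng.random(n)) / n
    return samples

def maximin_lhs(n, d, n_candidates=50, seed=0):
    """Among n_candidates random LHS designs, keep the one maximizing the
    minimum pairwise distance between sample points (maximin criterion)."""
    rng = np.random.default_rng(seed)
    best, best_score = None, -np.inf
    for _ in range(n_candidates):
        s = latin_hypercube(n, d, rng)
        diff = s[:, None, :] - s[None, :, :]
        dist = np.sqrt((diff ** 2).sum(-1))
        np.fill_diagonal(dist, np.inf)
        if dist.min() > best_score:
            best, best_score = s, dist.min()
    return best
```

The unit-cube design is then affinely mapped onto the parameter box, e.g. `lo + maximin_lhs(1000, 2) * (hi - lo)` with `lo = [1.5, 0.02]` and `hi = [2.0, 0.025]` for this example.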

#### Model validation

We use SVR with several kernel functions (2nd- and 3rd-order polynomials and the radial basis function), kNN, random forest, boosting, VKOGA, and SINDy as regression models to approximate the reduced velocity. In particular, for each regression method, we vary the model hyperparameters and plot the relative training and validation errors. The relative error (continuous-time model) is defined by

$$\begin{aligned} err = \frac{\Vert \hat{f}_r(\hat{x},t;\varvec{\mu}) - f_r(\hat{x},t;\varvec{\mu})\Vert }{\Vert f_r(\hat{x},t;\varvec{\mu})\Vert }. \end{aligned}$$

(19)

We then plot the learning curve of each regression method and compare the performance of each model on training and validation data over a varying number of training instances in the “Cross-validation and hyperparameter tuning” section. By properly choosing the hyperparameters and the number of training instances, our regression models can effectively balance bias and variance.
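The relative error of Eq. (19) and a learning curve over growing training subsets reduce to a few lines. The quadratic least-squares regressor below is only an illustrative stand-in for the SVR/kNN/VKOGA models above, and the synthetic data are assumptions.

```python
import numpy as np

def relative_error(f_hat, f_true):
    """Relative error of Eq. (19): ||f_hat - f_true|| / ||f_true||."""
    return np.linalg.norm(f_hat - f_true) / np.linalg.norm(f_true)

def learning_curve(fit, predict, X_train, y_train, X_val, y_val, sizes):
    """Refit on growing training subsets; return (size, train_err, val_err)
    triples for plotting a learning curve."""
    curve = []
    for n in sizes:
        model = fit(X_train[:n], y_train[:n])
        curve.append((n,
                      relative_error(predict(model, X_train[:n]), y_train[:n]),
                      relative_error(predict(model, X_val), y_val)))
    return curve

# Illustrative regressor: quadratic least squares (stand-in for SVR, etc.)
fit = lambda X, y: np.linalg.lstsq(np.vander(X.ravel(), 3), y, rcond=None)[0]
predict = lambda coef, X: np.vander(X.ravel(), 3) @ coef
```

Diverging training and validation curves indicate high variance (overfitting); two high, close curves indicate high bias, which is the trade-off tuned in the cross-validation study.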

#### Simulation of the surrogate ROM

We can now solve the problem using the surrogate model along the trajectory of the dynamical system. After applying time integration to the regression-based ROM, we compute the relative error of the proposed models as a function of time. We investigate the implicit backward Euler method, solved with either Newton–Raphson or fixed-point iteration, and the explicit 4th-order Runge–Kutta method. Let \(x(t)\), \(\hat{x}_{b}(t)\), and \(\hat{x}(t)\) denote the solutions of the FOM, the Galerkin ROM, and the non-intrusive ROM respectively. We define the relative errors with respect to the FOM, \(e_\textit{FOM}(t)\), and the Galerkin ROM, \(e_\textit{ROM}(t)\), as

$$\begin{aligned} e_\textit{FOM}(t) = \frac{\Vert \varvec{V}\hat{x}(t) - x(t)\Vert }{\Vert x(t)\Vert }, \end{aligned}$$

(20)

$$\begin{aligned} e_\textit{ROM}(t) = \frac{\Vert \hat{x}(t) - \hat{x}_{b}(t)\Vert }{\Vert \hat{x}_{b}(t)\Vert }. \end{aligned}$$

(21)

The corresponding averaged relative errors over the entire time domain, \(e_\textit{FOM}\) and \(e_\textit{ROM}\), are computed as

$$\begin{aligned} e_\textit{FOM} = \frac{1}{T}\int_{t=0}^T e_\textit{FOM}(t)\,dt, \end{aligned}$$

(22)

$$\begin{aligned} e_\textit{ROM} = \frac{1}{T}\int_{t=0}^T e_\textit{ROM}(t)\,dt. \end{aligned}$$

(23)

Let \(t_\textit{FOM}\), \(t_\textit{ROM}\), and \(\tau\) denote the running times of the FOM, the Galerkin ROM, and the non-intrusive ROM respectively. Define the relative running times with respect to the FOM and the Galerkin ROM:

$$\begin{aligned} \tau_\textit{FOM} = \frac{\tau}{t_\textit{FOM}}, \end{aligned}$$

(24)

and

$$\begin{aligned} \tau_\textit{ROM} = \frac{\tau}{t_\textit{ROM}}. \end{aligned}$$

(25)
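Given snapshot matrices and measured wall-clock times, the metrics of Eqs. (20)–(25) are a few lines of code. This is a sketch; the trapezoidal quadrature is an assumption about how the time integral is discretized.

```python
import numpy as np

def relative_error_in_time(X_rom, X_ref):
    """Eqs. (20)-(21): pointwise-in-time relative error.
    Columns of X_rom and X_ref are state snapshots at the same times."""
    return (np.linalg.norm(X_rom - X_ref, axis=0)
            / np.linalg.norm(X_ref, axis=0))

def time_averaged_error(err_t, t):
    """Eqs. (22)-(23): (1/T) * integral of e(t) dt, trapezoidal rule."""
    dt = np.diff(t)
    return float(np.sum(0.5 * (err_t[1:] + err_t[:-1]) * dt) / (t[-1] - t[0]))

def relative_runtime(tau, t_ref):
    """Eqs. (24)-(25): non-intrusive ROM runtime over FOM or Galerkin
    ROM runtime; values below 1 indicate a speedup."""
    return tau / t_ref
```

For the FOM comparison of Eq. (20), the reduced snapshots \(\hat{x}(t)\) are lifted to full dimension via \(\varvec{V}\hat{x}(t)\) before calling `relative_error_in_time`.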

The following are the simulation results from the backward Euler method with \(N_t=800\). Figure 2 plots the state-space error with respect to the FOM and the Galerkin ROM using the backward Euler integrator. As the validation results predict, SVR3 and SINDy perform better than the other models, achieving a relative ROM error below 1e-4 over the entire time domain, while the relative error with respect to the FOM remains well bounded. Figure 3 plots the Pareto frontier of error as a function of the relative running time using the backward Euler integrator. For differentiable models, the relative time is calculated using the less expensive approach, i.e., Newton's method. By comparison, SINDy requires much less relative time than SVR3 at a comparable level of relative error with respect to both the FOM and the ROM. Table 1 summarizes (i) the online running time of all methods, (ii) the mean time-integrated error with respect to the FOM, and (iii) the mean time-integrated error with respect to the Galerkin ROM using the backward Euler integrator. For differentiable models, we compare the computational time of both Newton's method and fixed-point iteration in backward Euler. For non-differentiable models, i.e. random forest, boosting, and kNN, only the running time of the fixed-point iteration is reported. All the ROM and FOM solutions are computed at the verified backward Euler time step \(\varDelta t = 3.12e\text{-}2\). Note that the non-intrusive ROM, e.g. SINDy with Newton's method, can accelerate the solver by \(10.4\times\) relative to the FOM and \(2.5\times\) relative to the Galerkin ROM, at relative errors of 0.0346 and 4.36e-5 respectively.
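For a differentiable surrogate, the implicit backward Euler step can be solved by Newton–Raphson on the residual \(r(z) = z - x - \varDelta t\, f_r(z, t+\varDelta t)\). The sketch below assumes a user-supplied Jacobian `jac` of the surrogate (available in closed form for SINDy's polynomial library, or via automatic differentiation for SVR); it is not the authors' implementation.

```python
import numpy as np

def newton_backward_euler_step(f, jac, x, t, dt, tol=1e-12, max_iter=20):
    """One backward Euler step: solve r(z) = z - x - dt * f(z, t+dt) = 0
    by Newton-Raphson. jac(z, t) must return the Jacobian df/dz."""
    z = x.copy()
    n = x.size
    for _ in range(max_iter):
        r = z - x - dt * f(z, t + dt)
        if np.linalg.norm(r) < tol:
            break
        # Jacobian of the residual: I - dt * df/dz
        J = np.eye(n) - dt * jac(z, t + dt)
        z = z - np.linalg.solve(J, r)
    return z
```

Each iteration costs a linear solve with the residual Jacobian, but convergence is quadratic near the solution, which is why Newton's method is the cheaper option for the differentiable models in Table 1.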

We next examine the simulation results from the 4th-order Runge–Kutta method with \(N_t=200\). Figure 4 shows the state-space error with respect to the FOM and the Galerkin ROM using the Runge–Kutta integrator. SVR2, SVR3, and SINDy have comparable performance, and yield a larger ROM error than with the backward Euler solver. We notice that the random forest model begins to diverge after \(t>10\) in the explicit scheme; this is explained by its performance in model evaluation in the “Special regression models” section. Figure 5 plots the Pareto frontier of error with respect to the relative running time using the Runge–Kutta integrator. VKOGA has the smallest relative time in both the FOM and ROM comparisons; SINDy requires slightly more running time but outperforms VKOGA in accuracy. Table 2 shows the online running time of all methods, the mean time-integrated error with respect to the FOM, and the mean time-integrated error with respect to the Galerkin ROM using the Runge–Kutta solver. For a fair comparison, all the ROM and FOM solutions are computed at the verified Runge–Kutta time step selected in Fig. 1b. The results show that the SVR-based models, e.g. SVR2 and SVR3, yield the smallest relative errors; however, their computational cost exceeds that of the FOM. Note that the non-intrusive ROM (VKOGA) can speed up the solver by \(6.9\times\) relative to the FOM and \(3.3\times\) relative to the Galerkin ROM, at relative errors of 0.0353 and 0.0164 respectively.

### 2D convection–diffusion

We consider a 2D parameterized nonlinear heat equation. Given the state variable \(u = u(x, y, t)\), the governing equation and initial condition are

$$\begin{aligned} \frac{\partial u(x, y, t)}{\partial t}&= -\mu_0 \nabla^2 u - \frac{\mu_0 \mu_1}{\mu_2} \left( e^{\mu_2 u} - 1\right) + \cos (2\pi x) \cos (2\pi y), \end{aligned}$$

(26a)

$$\begin{aligned} u(x, y, 0)&= 0. \end{aligned}$$

(26b)

The parameters are given by \(\mu_0 = 0.01\) and \((\mu_1, \mu_2) \in [9, 10]^2\). The spatial domain is \([0,1]^2\) and Dirichlet boundary conditions are applied. The FOM uses a finite-difference discretization with \(51 \times 51\) grid points. The full time domain is \([0, 2]\), and we evaluate both the backward Euler and Runge–Kutta methods for time integration with uniform time steps. Figure 6 shows the solution profile at \(t = 2\) with the input parameter \((\mu_1, \mu_2) = (9.5, 9.5)\).
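A finite-difference evaluation of the right-hand side of Eq. (26a) on the \(51 \times 51\) grid can be sketched as follows. The signs follow Eq. (26a) as written; holding the boundary rows and columns fixed at zero is an assumption about the Dirichlet data, and the function name is a placeholder.

```python
import numpy as np

def rhs_2d(u, mu0, mu1, mu2, h, forcing):
    """Right-hand side of Eq. (26a) on the interior of a uniform grid;
    boundary rows/columns are held fixed (homogeneous Dirichlet)."""
    # 5-point Laplacian on interior nodes
    lap = (u[2:, 1:-1] + u[:-2, 1:-1] + u[1:-1, 2:] + u[1:-1, :-2]
           - 4.0 * u[1:-1, 1:-1]) / h ** 2
    dudt = np.zeros_like(u)
    dudt[1:-1, 1:-1] = (-mu0 * lap
                        - (mu0 * mu1 / mu2) * (np.exp(mu2 * u[1:-1, 1:-1]) - 1.0)
                        + forcing[1:-1, 1:-1])
    return dudt

# Grid and source term for the 51 x 51 discretization
n = 51
x = np.linspace(0.0, 1.0, n)
X, Y = np.meshgrid(x, x, indexing="ij")
forcing = np.cos(2.0 * np.pi * X) * np.cos(2.0 * np.pi * Y)
u0 = np.zeros((n, n))
dudt0 = rhs_2d(u0, 0.01, 9.5, 9.5, x[1] - x[0], forcing)
```

At the zero initial condition the nonlinear reaction term vanishes, so the initial rate of change on the interior equals the source term \(\cos(2\pi x)\cos(2\pi y)\).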

#### Data collection

As in the 1D case, we first investigate the appropriate time step \(\varDelta t\) for solving the ODE. We collect solutions for an increasing number of time steps \(N_t = [25, 50, 100, 200, 400, 800, 1600, 3200, 6400]\) for (i) the explicit Runge–Kutta and (ii) the implicit backward Euler integrator. The verification results in Fig. 7 show that \(N_t = 200\) is a reasonable number of time steps for Runge–Kutta and \(N_t = 800\) for the backward Euler method. During the offline stage, we run four full simulations corresponding to the corner parameters of the space \([9, 10]^2\). The training data are then sampled from a Latin hypercube to better cover the parameter space: \(N_\text{training}\) and \(N_\text{validation}\) instances of the state, time, and parameters are generated following the criterion that the minimum distance between data points is maximized. We use the default training-set size \(N_\text{training} = 1000\) and validation-set size \(N_\text{validation} = 500\). The reduced vector field \(f_r\) is then computed for each input pair \((\hat{x}, t; \varvec{\mu})\). In the training and validation stages, we regress the reduced vector field \(f_r\) on the input \((\hat{x}, t; \varvec{\mu})\); in the test stage, we evaluate the ROM. The parameters are fixed at \((\mu_1, \mu_2) = (9.5, 9.5)\).

#### Model validation

We report the performance of SVR (with 2nd- and 3rd-order polynomial and radial basis function kernels), kNN, random forest, boosting, VKOGA, and SINDy as regression models to approximate the reduced velocity. In particular, as in the “Model validation” section, for each regression method we vary the model hyperparameters and plot the relative training and validation errors. The relative error is defined by Eq. (19). Similarly, we plot the learning curve of each regression method and compare the performance of each model on training and validation data over a varying number of training instances in the “Cross-validation and hyperparameter tuning” section. We aim to balance bias and variance in each regression model by properly choosing the hyperparameters and the number of training instances.

#### Simulation of the surrogate ROM

We can now solve this 2D problem using the surrogate model along the trajectory of the dynamical system. After applying time integration to the regression-based ROM, we compute the relative error of the proposed models as a function of time. As in the 1D “Simulation of the surrogate ROM” section, we investigate both the backward Euler and Runge–Kutta integrators. The following are the simulation results from the backward Euler method with \(N_t=800\). Figure 8 plots the state-space error with respect to the FOM and the Galerkin ROM using the backward Euler integrator. VKOGA outperforms the other models, achieving a relative ROM error below 6e-2 over the entire time domain, and its accuracy is closest to that of the Galerkin ROM.

Figure 9 plots the Pareto frontier of error as a function of the relative running time using the backward Euler integrator. VKOGA performs best in terms of both accuracy and time efficiency when compared with the Galerkin ROM.

Table 3 presents the online running time of all methods, the mean time-integrated error with respect to the FOM, and the mean time-integrated error with respect to the Galerkin ROM using the backward Euler integrator for the 2D convection–diffusion equation. For comparison, all the ROM and FOM solutions are computed at the verified backward Euler time step. Note that the non-intrusive ROM, e.g. VKOGA with Newton's method, can reduce the solve time by three orders of magnitude relative to the FOM and by \(111.2\times\) relative to the Galerkin ROM, at relative errors of 0.0059 and 0.0041 respectively.

We examine the simulation results from the 4th-order Runge–Kutta method with \(N_t=200\). Figure 10 shows the state-space error with respect to the FOM and the Galerkin ROM using the Runge–Kutta integrator. SVR2, SVR3, VKOGA, and SINDy have comparable performance, and yield a smaller ROM error than with the backward Euler solver. We notice that the random forest, boosting, and kNN models begin to diverge quickly in the second half of the time domain. Figure 11 plots the Pareto frontier of error as a function of the relative running time using the Runge–Kutta integrator. VKOGA and SINDy outperform the other models in terms of both accuracy and time cost; their relative errors with respect to the Galerkin ROM and the FOM are below 1e-2. Table 4 lists the online running time of all methods, the mean time-integrated error with respect to the FOM, and the mean time-integrated error with respect to the Galerkin ROM using the Runge–Kutta integrator for the 2D convection–diffusion equation. In this comparison, all the ROM and FOM solutions are computed at the verified Runge–Kutta time step. We observe that the computational efficiency of the non-intrusive ROM is significantly better than that of the Galerkin ROM. VKOGA with Runge–Kutta can speed up the solver by \(3406.9\times\) relative to the FOM and \(468.3\times\) relative to the Galerkin ROM, at relative errors of 0.0083 and 0.0069; SINDy with Runge–Kutta can accelerate the solver by \(2524.1\times\) relative to the FOM and \(347.0\times\) relative to the Galerkin ROM, at relative errors of 0.0077 and 0.0066 respectively.