From 2bf03a9e36922281e8b9938a1796355b343250a2 Mon Sep 17 00:00:00 2001 From: Sharv Date: Thu, 12 Jun 2025 03:37:29 -0700 Subject: [PATCH 01/10] Create gray_scott.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Tutorial for Learning Non-Linear Reaction Dynamics for the Gray–Scott Reaction–Diffusion Model using Universal Differential Equations --- docs/src/examples/pde/gray_scott.md | 251 ++++++++++++++++++++++++++++ 1 file changed, 251 insertions(+) create mode 100644 docs/src/examples/pde/gray_scott.md diff --git a/docs/src/examples/pde/gray_scott.md b/docs/src/examples/pde/gray_scott.md new file mode 100644 index 000000000..c58635f96 --- /dev/null +++ b/docs/src/examples/pde/gray_scott.md @@ -0,0 +1,251 @@ +# Learning Non-Linear Reaction Dynamics for the Gray–Scott Reaction–Diffusion Model using Universal Differential Equations + +## Introduction +The Gray–Scott model is a prototypical reaction–diffusion system known for generating a wide variety of spatial patterns — from spots and stripes to labyrinthine structures — driven purely by simple chemical kinetics and diffusion. + +In this tutorial, we’ll employ a Universal Differential Equation (UDE) framework: embedding a small neural network within the PDE’s reaction term to learn unknown dynamics from data, while retaining the known diffusion physics. + + +## Equations of the Gray-Scott Model +The system is governed by the coupled PDEs: +```math +\frac{\partial u}{\partial t} = D_1\,\nabla^2 u + \frac{a\,u^2}{v} + \bar{u} - \alpha +``` + +```math +\frac{\partial v}{\partial t} = D_2\,\nabla^2 v + a\,u^2 + \beta\,v +``` + +where $u$ and $v$ are the two chemical concentrations, $D_1$ and $D_2$ are diffusion coefficients, and $a$, $\bar{u}$, $\alpha$, $\beta$ are reaction parameters. + +In its spatially discretized form (using Neumann boundary conditions and the tridiagonal stencil $[1, -2, 1]$), the Gray–Scott PDE reduces to: + +```math +du = D1 * (A_y u + u A_x) + \frac{a u^2}{v} + \bar{u} - \alpha u +``` + +```math +dv = D2 (A_y v + v A_x) + a u^2 + \beta v +```` +Here $A_x$ and $A_y$ are the 1D Laplacian matrices for x- and y-directions, respectively. + +Now we will dive into the implementation of the UDE. + +## Ground-truth data generation. +```@example gray_scott +using DifferentialEquations +using Lux, Random, Optim, ComponentArrays, Statistics +using LinearAlgebra +using Plots, SciMLBase +using Optimization, OptimizationOptimisers +using SciMLSensitivity + +const N = 32 # Smaller grid for faster training + +# Constants for the "true" model +p_true = (a=1.0, α=1.0, ubar=1.0, β=10.0, D1=0.001, D2=0.1) + +# Laplacian operator with Neumann (zero-flux) boundary conditions +Ax = Array(Tridiagonal([1.0 for i in 1:(N - 1)], [-2.0 for i in 1:N], [1.0 for i in 1:(N - 1)])) +Ay = copy(Ax) +Ax[1, 2] = 2.0 +Ax[end, end - 1] = 2.0 +Ay[1, 2] = 2.0 +Ay[end, end - 1] = 2.0 + +# Initial condition +const uss = (p_true.ubar + p_true.β) / p_true.α +const vss = (p_true.a / p_true.β) * uss^2 +const r0 = zeros(N, N, 2) +r0[:, :, 1] .= uss +r0[:, :, 2] .= vss +r0[div(N,2)-5:div(N,2)+5, div(N,2)-5:div(N,2)+5, 1] .+= 0.1 .* rand.() +r0[div(N,2)-5:div(N,2)+5, div(N,2)-5:div(N,2)+5, 2] .+= 0.1 .* rand.() +``` + +Having set up the grid, parameters, and initial condition, we now generate “ground truth” data by integrating the pure physics Gray–Scott model. This dataset will serve as the target that our UDE aims to learn. 
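Before integrating, it is worth confirming that the unperturbed initial state really is the homogeneous steady state of the reaction kinetics, since it is only the small localized perturbation added above that triggers pattern formation. A quick check of this (a minimal sketch reusing the `p_true`, `uss`, and `vss` values defined above; the `react_u_ss`/`react_v_ss` names are ours) is:

```@example gray_scott
# At (uss, vss) both reaction terms of the true model vanish (up to round-off),
# so without the localized perturbation the field would stay flat forever.
react_u_ss = p_true.a * uss^2 / vss + p_true.ubar - p_true.α * uss
react_v_ss = p_true.a * uss^2 - p_true.β * vss
println((react_u_ss, react_v_ss))   # both ≈ 0
```

With that confirmed, the true model is defined and integrated as follows.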
+ +```@example gray_scott +function true_model!(dr, r, p, t) + a, α, ubar, β, D1, D2 = p + u = @view r[:, :, 1] + v = @view r[:, :, 2] + Du = D1 .* (Ay * u + u * Ax) + Dv = D2 .* (Ay * v + v * Ax) + react_u = a .* u .* u ./ v .+ ubar .- α * u + react_v = a .* u .* u .- β * v + @. dr[:, :, 1] = Du + react_u + @. dr[:, :, 2] = Dv + react_v +end + +tspan = (0.0, 0.1) +tsteps = 0.0:0.01:0.1 +prob_true = ODEProblem(true_model!, r0, tspan, p_true) +solution_true = solve(prob_true, Tsit5(), saveat=tsteps) +data_to_train = Array(solution_true) +``` + +With the ground-truth solutions computed, we can now proceed to visualize the spatiotemporal evolution of $u$ and $v$ on the grid. + +```@example gray_scott +println("Plotting ground truth concentrations at the grid center...") +center = N ÷ 2 +p1 = plot(tsteps, solution_true[center,center,1,:], lw=2, label="U True", color=:blue) +title!(p1, "Ground Truth: Center U") +xlabel!("Time") +ylabel!("Concentration") + +p2 = plot(tsteps, solution_true[center,center,2,:], lw=2, label="V True", color=:red) +title!(p2, "Ground Truth: Center V") +xlabel!("Time") + +p_center_combined = plot(p1, p2, layout=(1,2), size=(900,350)) +display(p_center_combined) +``` + +## Defining the UDE +Now that we have an understanding of the data and its visualization, we can define the neural network and the UDE structure. We replace the $\frac{a u^2}{v} + \bar{u} - \alpha u$ term with a neural network, giving the resultant ODEs. + +```math +du = D1 * (A_y u + u A_x) + \mathcal{N}_\theta(u,v) +``` + +```math +dv = D2 (A_y v + v A_x) + a u^2 + \beta v +```` + +The first step is to do data normalization, or compute the statistical properties of the dataset. We normalize the inputs to the neural network to make the training process more stable and efficient. Neural networks learn best when their input data is scaled to a consistent range, typically centered around zero, which helps prevent the gradients from becoming too large or too small during training. This leads to faster convergence and allows the optimizer to find a good solution more reliably. + +```@example gray_scott +# Calculate mean and std dev for u and v across all space and time +u_data = @view data_to_train[:, :, 1, :] +v_data = @view data_to_train[:, :, 2, :] + +u_mean = mean(u_data) +u_std = std(u_data) +v_mean = mean(v_data) +v_std = std(v_data) + +norm_stats = (u_mean=u_mean, u_std=u_std, v_mean=v_mean, v_std=v_std) +``` + +Next, we define the neural network structure. +```@example gray_scott +rng = Random.default_rng() +nn = Lux.Chain(Lux.Dense(2 => 16, tanh), Lux.Dense(16 => 1)) +p_nn, st_nn = Lux.setup(rng, nn) + +# Add the normalization stats to the non-trainable parameters +p_ude = ComponentArray( + p_physics=(β=p_true.β, D1=p_true.D1, D2=p_true.D2, a=p_true.a), + p_nn=p_nn, + p_norm=norm_stats # Add normalization stats here +) +``` + +The following code describes the UDE formulation with the neural network predictions embedded. + +```@example gray_scott +function create_ude_model(nn_model, nn_state) + function ude_model!(dr, r, p, t) + β, D1, D2, a = p.p_physics + u_mean, u_std, v_mean, v_std = p.p_norm + + u = @view r[:, :, 1] + v = @view r[:, :, 2] + Du = D1 .* (Ay * u + u * Ax) + Dv = D2 .* (Ay * v + v * Ax) + react_v = a .* u .* u .- β * v + @. 
dr[:, :, 2] = Dv + react_v + + nn_reaction_u = similar(u) + for i in 1:N, j in 1:N + # Normalize the input to the NN + u_norm = (u[i, j] - u_mean) / (u_std + 1f-8) # Add epsilon for stability + v_norm = (v[i, j] - v_mean) / (v_std + 1f-8) + input = [u_norm, v_norm] + + # The NN receives normalized data + nn_reaction_u[i, j] = nn_model(input, p.p_nn, nn_state)[1][1] + end + @. dr[:, :, 1] = Du + nn_reaction_u + end + return ude_model! +end + +ude_model! = create_ude_model(nn, st_nn) +prob_ude = ODEProblem(ude_model!, r0, tspan, p_ude) +``` + +## Loss Function and Optimization +The loss function is defined as: +```@example gray_scott +function loss(params_to_train) + prediction = predict(params_to_train) + if prediction.retcode != SciMLBase.ReturnCode.Success + return Inf + end + return sum(abs2, Array(prediction) .- data_to_train) +end +``` +The function used to predict is defined as below. We have explicitly defined the sensealg here to ensure the correct AD is used, which is the `QuadratureAdjoint` here. It is a method that provides a correct gradient without errors is infinitely better than a slightly faster one that fails. It represents a pragmatic engineering choice to ensure the optimization can proceed reliably. + +```@example gray_scott +function predict(params_to_train) + return solve(prob_ude, Tsit5(), p=params_to_train, saveat=solution_true.t, + sensealg=QuadratureAdjoint(autojacvec=ReverseDiffVJP(true))) +end +``` + +## Training +Next, we start the training loop. +```@example gray_scott +loss_history = [] +callback = function (p, l) + push!(loss_history, l) + println("Current loss: ", l) + return false +end +``` +Here, two training stages were used. The first stage uses a high learning rate to quickly move the model's parameters across the loss landscape towards the general area of a good solution. The second stage then uses a much lower learning rate to carefully fine-tune the parameters, allowing the model to settle precisely into a deep minimum without the risk of overshooting it. This two-phase approach combines the benefits of rapid initial progress with a stable and accurate final convergence. + +```@example gray_scott +adtype = Optimization.AutoZygote() +optf = Optimization.OptimizationFunction((p_train, p) -> loss(p_train), adtype) +optprob = Optimization.OptimizationProblem(optf, p_ude) + +println("Phase 1: Training with ADAM (learning rate 0.01)...") +result = Optimization.solve(optprob, ADAM(0.01), callback=callback, maxiters=100) + +println("\nPhase 2: Refining with ADAM (learning rate 0.001)...") +optprob2 = Optimization.OptimizationProblem(optf, result.u) +result2 = Optimization.solve(optprob2, ADAM(0.001), callback=callback, maxiters=100) +println("Training complete.") +``` + +## Results and conclusion +We can visualize the final results of the UDE and the ground-truth data with the following code. + +```@example gray_scott +p_trained = result2.u +final_prediction = predict(p_trained) + +avg_u_true = [mean(data_to_train[:, :, 1, i]) for i in 1:length(tsteps)] +avg_v_true = [mean(data_to_train[:, :, 2, i]) for i in 1:length(tsteps)] +avg_u_pred = [mean(final_prediction[i][:, :, 1]) for i in 1:length(tsteps)] +avg_v_pred = [mean(final_prediction[i][:, :, 2]) for i in 1:length(tsteps)] + +p_comp_u = plot(tsteps, avg_u_true, label="True", color=:blue, lw=2, title="Comparison: Avg. U") +plot!(tsteps, avg_u_pred, label="UDE", color=:black, linestyle=:dash, lw=2) +xlabel!("Time") +ylabel!("Avg. 
Concentration") + +p_comp_v = plot(tsteps, avg_v_true, label="True", color=:red, lw=2, title="Comparison: Avg. V") +plot!(tsteps, avg_v_pred, label="UDE", color=:black, linestyle=:dash, lw=2) + xlabel!("Time") + +p_comp_combined = plot(p_comp_u, p_comp_v, layout=(1, 2), size=(900, 350)) +display(p_comp_combined) +``` + +Now, as we can see, the UDE predictions match the ground-truth data very well, indicating the model has successfully learned the non-linear term. From 8c983c0b8b0483f56a687ad09dbf75c3f433765d Mon Sep 17 00:00:00 2001 From: Sharv Date: Thu, 12 Jun 2025 03:48:33 -0700 Subject: [PATCH 02/10] Update gray_scott.md --- docs/src/examples/pde/gray_scott.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/src/examples/pde/gray_scott.md b/docs/src/examples/pde/gray_scott.md index c58635f96..2f3ab46cc 100644 --- a/docs/src/examples/pde/gray_scott.md +++ b/docs/src/examples/pde/gray_scott.md @@ -107,7 +107,7 @@ display(p_center_combined) Now that we have an understanding of the data and its visualization, we can define the neural network and the UDE structure. We replace the $\frac{a u^2}{v} + \bar{u} - \alpha u$ term with a neural network, giving the resultant ODEs. ```math -du = D1 * (A_y u + u A_x) + \mathcal{N}_\theta(u,v) +du = D1 (A_y u + u A_x) + \mathcal{N}_\theta(u,v) ``` ```math From aa6f71a75a61ec17ce0541365ed6824cb390f6e3 Mon Sep 17 00:00:00 2001 From: Sharv Date: Fri, 1 Aug 2025 22:11:33 +0530 Subject: [PATCH 03/10] Create gray-scott.md --- docs/src/examples/pde/gray-scott.md | 423 ++++++++++++++++++++++++++++ 1 file changed, 423 insertions(+) create mode 100644 docs/src/examples/pde/gray-scott.md diff --git a/docs/src/examples/pde/gray-scott.md b/docs/src/examples/pde/gray-scott.md new file mode 100644 index 000000000..3454a2c11 --- /dev/null +++ b/docs/src/examples/pde/gray-scott.md @@ -0,0 +1,423 @@ +# Learning Non-Linear Reaction Dynamics for the Gray–Scott Reaction–Diffusion Model using Universal Differential Equations + +## Introduction +The Gray–Scott model is a prototypical reaction–diffusion system known for generating a wide variety of spatial patterns — from spots and stripes to labyrinthine structures — driven purely by simple chemical kinetics and diffusion. + +In this tutorial, we’ll employ a Universal Differential Equation (UDE) framework: embedding a small neural network within the PDE’s reaction term to learn unknown dynamics from data, while retaining the known diffusion physics. + + +## Equations of the Gray-Scott Model +The system is governed by the coupled PDEs: +```math +\frac{\partial u}{\partial t} = D_1\,\nabla^2 u + \frac{a\,u^2}{v} + \bar{u} - \alpha +``` + +```math +\frac{\partial v}{\partial t} = D_2\,\nabla^2 v + a\,u^2 + \beta\,v +``` + +where $u$ and $v$ are the two chemical concentrations, $D_1$ and $D_2$ are diffusion coefficients, and $a$, $\bar{u}$, $\alpha$, $\beta$ are reaction parameters. + +In its spatially discretized form (using Neumann boundary conditions and the tridiagonal stencil $[1, -2, 1]$), the Gray–Scott PDE reduces to: + +```math +du = D1 * (A_y u + u A_x) + \frac{a u^2}{v} + \bar{u} - \alpha u +``` + +```math +dv = D2 (A_y v + v A_x) + a u^2 + \beta v +```` +Here $A_x$ and $A_y$ are the 1D Laplacian matrices for x- and y-directions, respectively. + +Now we will dive into the implementation of the UDE. + +## Ground-truth data generation. 
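As a small warm-up for the discretization described above, the sketch below shows the 1D zero-flux Laplacian in isolation (it is self-contained; the tiny grid size `n = 5` and the matrix name `L` are purely illustrative). After the boundary rows are adjusted, every row of the stencil matrix sums to zero, so a constant profile produces no diffusive flux, which is exactly the Neumann condition.

```@example gray_scott
using LinearAlgebra

# 1D [1, -2, 1] Laplacian with zero-flux (Neumann) boundaries on a tiny grid.
n = 5
L = Array(Tridiagonal(ones(n - 1), fill(-2.0, n), ones(n - 1)))
L[1, 2] = 2.0          # mirror the ghost point at the left boundary
L[end, end - 1] = 2.0  # mirror the ghost point at the right boundary
L * ones(n)            # ≈ zeros(n): a constant field has no net flux
```

When such an operator is applied from the right instead, as in `u * Ax` below, the mirrored entries move to `Ax[2, 1]` and `Ax[end - 1, end]`, which is why the x- and y-direction matrices in the following setup are modified at different positions.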
+```@example gray_scott +using LinearAlgebra, DifferentialEquations +using Plots + +const N = 16 +a = 1.0 +α = 1.0 +ubar = 1.0 +β = 10.0 +D1 = 0.001 +D2 = 100.0 + +Ax = Array(Tridiagonal([1.0 for i in 1:(N - 1)], [-2.0 for i in 1:N], + [1.0 for i in 1:(N - 1)])) +Ay = copy(Ax) +Ax[2, 1] = 2.0 +Ax[end - 1, end] = 2.0 +Ay[1, 2] = 2.0 +Ay[end, end - 1] = 2.0 + +uss = (ubar + β) / α +vss = (a / β) * uss^2 +r0 = zeros(N, N, 2) +r0[:, :, 1] .= uss .+ 0.1 .* rand.() +r0[:, :, 2] .= vss +``` + +Having set up the grid, parameters, and initial condition, we now generate “ground truth” data by integrating the pure physics Gray–Scott model. This dataset will serve as the target that our UDE aims to learn. + +```@example gray_scott +function fast_gm!(du, u, p, t) + p = nothing + @inbounds for j in 2:(N - 1), i in 2:(N - 1) + du[i, j, 1] = D1 * + (u[i - 1, j, 1] + u[i + 1, j, 1] + u[i, j + 1, 1] + u[i, j - 1, 1] - + 4u[i, j, 1]) + + a * u[i, j, 1]^2 / u[i, j, 2] + ubar - α * u[i, j, 1] + end + + @inbounds for j in 2:(N - 1), i in 2:(N - 1) + du[i, j, 2] = D2 * + (u[i - 1, j, 2] + u[i + 1, j, 2] + u[i, j + 1, 2] + u[i, j - 1, 2] - + 4u[i, j, 2]) + + a * u[i, j, 1]^2 - β * u[i, j, 2] + end + + @inbounds for j in 2:(N - 1) + i = 1 + du[1, j, 1] = D1 * + (2u[i + 1, j, 1] + u[i, j + 1, 1] + u[i, j - 1, 1] - 4u[i, j, 1]) + + a * u[i, j, 1]^2 / u[i, j, 2] + ubar - α * u[i, j, 1] + end + @inbounds for j in 2:(N - 1) + i = 1 + du[1, j, 2] = D2 * + (2u[i + 1, j, 2] + u[i, j + 1, 2] + u[i, j - 1, 2] - 4u[i, j, 2]) + + a * u[i, j, 1]^2 - β * u[i, j, 2] + end + @inbounds for j in 2:(N - 1) + i = N + du[end, j, 1] = D1 * + (2u[i - 1, j, 1] + u[i, j + 1, 1] + u[i, j - 1, 1] - 4u[i, j, 1]) + + a * u[i, j, 1]^2 / u[i, j, 2] + ubar - α * u[i, j, 1] + end + @inbounds for j in 2:(N - 1) + i = N + du[end, j, 2] = D2 * + (2u[i - 1, j, 2] + u[i, j + 1, 2] + u[i, j - 1, 2] - 4u[i, j, 2]) + + a * u[i, j, 1]^2 - β * u[i, j, 2] + end + + @inbounds for i in 2:(N - 1) + j = 1 + du[i, 1, 1] = D1 * + (u[i - 1, j, 1] + u[i + 1, j, 1] + 2u[i, j + 1, 1] - 4u[i, j, 1]) + + a * u[i, j, 1]^2 / u[i, j, 2] + ubar - α * u[i, j, 1] + end + @inbounds for i in 2:(N - 1) + j = 1 + du[i, 1, 2] = D2 * + (u[i - 1, j, 2] + u[i + 1, j, 2] + 2u[i, j + 1, 2] - 4u[i, j, 2]) + + a * u[i, j, 1]^2 - β * u[i, j, 2] + end + @inbounds for i in 2:(N - 1) + j = N + du[i, end, 1] = D1 * + (u[i - 1, j, 1] + u[i + 1, j, 1] + 2u[i, j - 1, 1] - 4u[i, j, 1]) + + a * u[i, j, 1]^2 / u[i, j, 2] + ubar - α * u[i, j, 1] + end + @inbounds for i in 2:(N - 1) + j = N + du[i, end, 2] = D2 * + (u[i - 1, j, 2] + u[i + 1, j, 2] + 2u[i, j - 1, 2] - 4u[i, j, 2]) + + a * u[i, j, 1]^2 - β * u[i, j, 2] + end + + @inbounds begin + i = 1 + j = 1 + du[1, 1, 1] = D1 * (2u[i + 1, j, 1] + 2u[i, j + 1, 1] - 4u[i, j, 1]) + + a * u[i, j, 1]^2 / u[i, j, 2] + ubar - α * u[i, j, 1] + du[1, 1, 2] = D2 * (2u[i + 1, j, 2] + 2u[i, j + 1, 2] - 4u[i, j, 2]) + + a * u[i, j, 1]^2 - β * u[i, j, 2] + + i = 1 + j = N + du[1, N, 1] = D1 * (2u[i + 1, j, 1] + 2u[i, j - 1, 1] - 4u[i, j, 1]) + + a * u[i, j, 1]^2 / u[i, j, 2] + ubar - α * u[i, j, 1] + du[1, N, 2] = D2 * (2u[i + 1, j, 2] + 2u[i, j - 1, 2] - 4u[i, j, 2]) + + a * u[i, j, 1]^2 - β * u[i, j, 2] + + i = N + j = 1 + du[N, 1, 1] = D1 * (2u[i - 1, j, 1] + 2u[i, j + 1, 1] - 4u[i, j, 1]) + + a * u[i, j, 1]^2 / u[i, j, 2] + ubar - α * u[i, j, 1] + du[N, 1, 2] = D2 * (2u[i - 1, j, 2] + 2u[i, j + 1, 2] - 4u[i, j, 2]) + + a * u[i, j, 1]^2 - β * u[i, j, 2] + + i = N + j = N + du[end, end, 1] = D1 * (2u[i - 1, j, 1] + 2u[i, j - 1, 1] - 4u[i, j, 1]) + + a * u[i, 
j, 1]^2 / u[i, j, 2] + ubar - α * u[i, j, 1] + du[end, end, 2] = D2 * (2u[i - 1, j, 2] + 2u[i, j - 1, 2] - 4u[i, j, 2]) + + a * u[i, j, 1]^2 - β * u[i, j, 2] + end +end + +prob = ODEProblem(fast_gm!, r0, (0.0, 10), p = nothing) +solution_true = solve(prob, FBDF(), saveat=0.5, abstol=1e-6, reltol=1e-6); +``` + +With the ground-truth solutions computed, we can now proceed to visualize the spatiotemporal evolution of $u$ and $v$ on the grid. + +```@example gray_scott +tsteps = 0:0.5:10 +println("Plotting ground truth concentrations at the grid center...") +c = N ÷ 2 +p1 = plot(tsteps, solution_true[c,c,1,:], lw=2, label="U True", color=:blue, title="Ground Truth: Center U") +xlabel!("Time"); ylabel!("Concentration") +p2 = plot(tsteps, solution_true[c,c,2,:], lw=2, label="V True", color=:red, title="Ground Truth: Center V") +xlabel!("Time") +p_center_combined = plot(p1, p2, layout=(1,2), size=(900,350)) +display(p_center_combined) +``` + +## Defining the UDE +Now that we have an understanding of the data and its visualization, we can define the neural network and the UDE structure. We replace the $\frac{a u^2}{v} + \bar{u} - \alpha u$ term with a neural network, giving the resultant ODEs. + +```math +du = D1 (A_y u + u A_x) + \mathcal{N}_\theta(u,v) +``` + +```math +dv = D2 (A_y v + v A_x) + a u^2 + \beta v +```` + +The first step is to define the neural network structure. + +```@example gray_scott +using Lux +using Random +using ComponentArrays +neural_network = Lux.Chain(Dense(2, 16, tanh), Dense(16, 16, tanh), Dense(16, 1, tanh)) +ps, st = Lux.setup(Random.default_rng(), neural_network) +ps = ComponentArray(ps) +global global_st = st +``` + +The following code describes the UDE formulation with the neural network predictions embedded, as well as a function wrapper to make sure the arrays work with `Optimization.jl` + +```@example gray_scott +function ude!(du, u, p, t) + @inbounds for j in 2:(N - 1), i in 2:(N - 1) + u_val = u[i, j, 1] + v_val = u[i, j, 2] + input = [u_val, v_val] + nn_out, _ = Lux.apply(neural_network, input, p, global_st) + du[i, j, 1] = D1 * + (u[i - 1, j, 1] + u[i + 1, j, 1] + u[i, j + 1, 1] + u[i, j - 1, 1] - + 4u[i, j, 1]) + + nn_out[1] + end + + @inbounds for j in 2:(N - 1), i in 2:(N - 1) + du[i, j, 2] = D2 * + (u[i - 1, j, 2] + u[i + 1, j, 2] + u[i, j + 1, 2] + u[i, j - 1, 2] - + 4u[i, j, 2]) + + a * u[i, j, 1]^2 - β * u[i, j, 2] + end + + @inbounds for j in 2:(N - 1) + i = 1 + u_val = u[i, j, 1] + v_val = u[i, j, 2] + input = [u_val, v_val] + nn_out, _ = Lux.apply(neural_network, input, p, global_st) + du[1, j, 1] = D1 * + (2u[i + 1, j, 1] + u[i, j + 1, 1] + u[i, j - 1, 1] - 4u[i, j, 1]) + + nn_out[1] + end + @inbounds for j in 2:(N - 1) + i = 1 + du[1, j, 2] = D2 * + (2u[i + 1, j, 2] + u[i, j + 1, 2] + u[i, j - 1, 2] - 4u[i, j, 2]) + + a * u[i, j, 1]^2 - β * u[i, j, 2] + end + @inbounds for j in 2:(N - 1) + i = N + u_val = u[i, j, 1] + v_val = u[i, j, 2] + input = [u_val, v_val] + nn_out, _ = Lux.apply(neural_network, input, p, global_st) + du[end, j, 1] = D1 * + (2u[i - 1, j, 1] + u[i, j + 1, 1] + u[i, j - 1, 1] - 4u[i, j, 1]) + + nn_out[1] + end + @inbounds for j in 2:(N - 1) + i = N + du[end, j, 2] = D2 * + (2u[i - 1, j, 2] + u[i, j + 1, 2] + u[i, j - 1, 2] - 4u[i, j, 2]) + + a * u[i, j, 1]^2 - β * u[i, j, 2] + end + + @inbounds for i in 2:(N - 1) + j = 1 + u_val = u[i, j, 1] + v_val = u[i, j, 2] + input = [u_val, v_val] + nn_out, _ = Lux.apply(neural_network, input, p, global_st) + du[i, 1, 1] = D1 * + (u[i - 1, j, 1] + u[i + 1, j, 1] + 2u[i, j + 1, 1] - 4u[i, 
j, 1]) + + nn_out[1] + end + @inbounds for i in 2:(N - 1) + j = 1 + du[i, 1, 2] = D2 * + (u[i - 1, j, 2] + u[i + 1, j, 2] + 2u[i, j + 1, 2] - 4u[i, j, 2]) + + a * u[i, j, 1]^2 - β * u[i, j, 2] + end + @inbounds for i in 2:(N - 1) + j = N + u_val = u[i, j, 1] + v_val = u[i, j, 2] + input = [u_val, v_val] + nn_out, _ = Lux.apply(neural_network, input, p, global_st) + du[i, end, 1] = D1 * + (u[i - 1, j, 1] + u[i + 1, j, 1] + 2u[i, j - 1, 1] - 4u[i, j, 1]) + + nn_out[1] + end + @inbounds for i in 2:(N - 1) + j = N + du[i, end, 2] = D2 * + (u[i - 1, j, 2] + u[i + 1, j, 2] + 2u[i, j - 1, 2] - 4u[i, j, 2]) + + a * u[i, j, 1]^2 - β * u[i, j, 2] + end + + @inbounds begin + i = 1 + j = 1 + u_val = u[i, j, 1] + v_val = u[i, j, 2] + input = [u_val, v_val] + nn_out, _ = Lux.apply(neural_network, input, p, global_st) + du[1, 1, 1] = D1 * (2u[i + 1, j, 1] + 2u[i, j + 1, 1] - 4u[i, j, 1]) + + nn_out[1] + du[1, 1, 2] = D2 * (2u[i + 1, j, 2] + 2u[i, j + 1, 2] - 4u[i, j, 2]) + + a * u[i, j, 1]^2 - β * u[i, j, 2] + + i = 1 + j = N + u_val = u[i, j, 1] + v_val = u[i, j, 2] + input = [u_val, v_val] + nn_out, _ = Lux.apply(neural_network, input, p, global_st) + du[1, N, 1] = D1 * (2u[i + 1, j, 1] + 2u[i, j - 1, 1] - 4u[i, j, 1]) + + nn_out[1] + du[1, N, 2] = D2 * (2u[i + 1, j, 2] + 2u[i, j - 1, 2] - 4u[i, j, 2]) + + a * u[i, j, 1]^2 - β * u[i, j, 2] + + i = N + j = 1 + u_val = u[i, j, 1] + v_val = u[i, j, 2] + input = [u_val, v_val] + nn_out, _ = Lux.apply(neural_network, input, p, global_st) + du[N, 1, 1] = D1 * (2u[i - 1, j, 1] + 2u[i, j + 1, 1] - 4u[i, j, 1]) + + nn_out[1] + du[N, 1, 2] = D2 * (2u[i - 1, j, 2] + 2u[i, j + 1, 2] - 4u[i, j, 2]) + + a * u[i, j, 1]^2 - β * u[i, j, 2] + + i = N + j = N + u_val = u[i, j, 1] + v_val = u[i, j, 2] + input = [u_val, v_val] + nn_out, _ = Lux.apply(neural_network, input, p, global_st) + du[end, end, 1] = D1 * (2u[i - 1, j, 1] + 2u[i, j - 1, 1] - 4u[i, j, 1]) + + nn_out[1] + du[end, end, 2] = D2 * (2u[i - 1, j, 2] + 2u[i, j - 1, 2] - 4u[i, j, 2]) + + a * u[i, j, 1]^2 - β * u[i, j, 2] + end +end + +# Create wrapper for ODEProblem to work with Optimization.jl +const nstate = N*N*2 +function ude_wrapper!(du, u, p, t) + + # take a view of just the real state‐portion + du_state = @view du[1:nstate] + u_state = @view u[1:nstate] + + # reshape those into your 3-D arrays + @views du3 = reshape(du_state, N, N, 2) + @views u3 = reshape(u_state, N, N, 2) + + # call your in-place UDE + ude!(du3, u3, p, t) + + return nothing +end +``` + +## Loss Function and Optimization +The loss function is defined as: +```@example gray_scott +function loss_fn(θ) + sol = solve(prob_ude, FBDF(), saveat=tsteps, p = θ, abstol=1e-6, reltol=1e-6) + if sol.retcode != :Success + println("Solver failed at some iteration.") + return Inf + end + pred_u = [mean(reshape(s, N, N, 2)[:, :, 1]) for s in sol.u] + loss = sum(abs2, pred_u .- data_u) + return loss +end +``` + +The callback function is defined as: +```@example gray_scott +function training_callback(state, loss) + push!(loss_hist, loss) + epoch = length(loss_hist) + println("Callback @ Epoch = $(epoch), Loss = $(round(loss, digits=5))") + return false # don't stop training +end +``` +We reshape the initial condition tensor `r0` into a 1D vector to define the initial state of the ODE system. Then, we construct the `ODEProblem` using the UDE dynamics and extract the time series of average `U` and `V` concentrations from the ground-truth solution to serve as training targets. 
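One detail to take care of first: `training_callback` pushes each loss value into a container called `loss_hist`, which the listing above never creates, so the callback would throw an `UndefVarError` on its first call. A minimal initialization (the `Float64` element type is our assumption; any growable numeric vector works) is:

```@example gray_scott
loss_hist = Float64[]   # per-iteration loss values recorded by training_callback
```

With that in place, the reshaping and problem construction described above look as follows.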
+ +```@example gray_scott +u0_vec = reshape(r0, :) +ode_fn = ODEFunction(ude_wrapper!, jac = nothing) +prob_ude = ODEProblem(ode_fn, u0_vec, (0.0, 10.0), ps) + +using Statistics +using SciMLSensitivity +data_u = [mean(s[:, :, 1]) for s in solution_true.u] +data_v = [mean(s[:, :, 2]) for s in solution_true.u] +``` + +Finally, we create the optimization problem, and solve the UDE. + +```@example gray_scott +using Optimization, OptimizationOptimisers, BSON, Zygote + +# Define the OptimizationFunction using EnzymeVJP for AD +optf = OptimizationFunction((θ, p) -> loss_fn(θ), Optimization.AutoZygote()) + +# Create the optimization problem with current ps +optprob = OptimizationProblem(optf, ps) + +# Solve using Adam +res = solve(optprob, Optimisers.Adam(0.001); maxiters=5000, callback=training_callback) +``` + +## Results and conclusion +We can visualize the final results of the UDE and the ground-truth data with the following code. + +```@example gray_scott +# Final model prediction and plot +sol_pred = solve(prob_ude, FBDF(), saveat=tsteps, abstol=1e-6, reltol=1e-6) +pred_u = [mean(s[:, :, 1]) for s in sol_pred.u] + +using Plots +plot(tsteps, data_u; lw=2, label="True ⟨U⟩", color=:blue) +plot!(tsteps, pred_u; lw=2, label="UDE ⟨U⟩", color=:red, ls=:dash) +xlabel!("Time"); ylabel!("Mean U"); title!("Training Fit") +``` + +Now, as we can see, the UDE predictions match the ground-truth data very well, indicating the model has successfully learned the non-linear term. From 0107eb175e4000e1f75ffede14a5538c76144db4 Mon Sep 17 00:00:00 2001 From: Sharv <72983931+sharvmurgai@users.noreply.github.com> Date: Fri, 8 Aug 2025 02:24:04 -0700 Subject: [PATCH 04/10] Update gray-scott.md Final Fix --- docs/src/examples/pde/gray-scott.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/docs/src/examples/pde/gray-scott.md b/docs/src/examples/pde/gray-scott.md index 3454a2c11..47cf203b2 100644 --- a/docs/src/examples/pde/gray-scott.md +++ b/docs/src/examples/pde/gray-scott.md @@ -403,7 +403,8 @@ optf = OptimizationFunction((θ, p) -> loss_fn(θ), Optimization.AutoZygote()) optprob = OptimizationProblem(optf, ps) # Solve using Adam -res = solve(optprob, Optimisers.Adam(0.001); maxiters=5000, callback=training_callback) +res = solve(optprob, Optimisers.Adam(0.003); maxiters=10000, callback=training_callback) +res.objective ``` ## Results and conclusion @@ -411,7 +412,7 @@ We can visualize the final results of the UDE and the ground-truth data with the ```@example gray_scott # Final model prediction and plot -sol_pred = solve(prob_ude, FBDF(), saveat=tsteps, abstol=1e-6, reltol=1e-6) +sol_pred = solve(prob_ude, FBDF(), saveat=tsteps, p = res.u, abstol=1e-6, reltol=1e-6) pred_u = [mean(s[:, :, 1]) for s in sol_pred.u] using Plots From aa5274d28c624f258b218bdb9fdf686d060466db Mon Sep 17 00:00:00 2001 From: Sharv <72983931+sharvmurgai@users.noreply.github.com> Date: Thu, 14 Aug 2025 09:08:09 -0700 Subject: [PATCH 05/10] Delete docs/src/examples/pde/gray_scott.md --- docs/src/examples/pde/gray_scott.md | 251 ---------------------------- 1 file changed, 251 deletions(-) delete mode 100644 docs/src/examples/pde/gray_scott.md diff --git a/docs/src/examples/pde/gray_scott.md b/docs/src/examples/pde/gray_scott.md deleted file mode 100644 index 2f3ab46cc..000000000 --- a/docs/src/examples/pde/gray_scott.md +++ /dev/null @@ -1,251 +0,0 @@ -# Learning Non-Linear Reaction Dynamics for the Gray–Scott Reaction–Diffusion Model using Universal Differential Equations - -## Introduction -The Gray–Scott model is a 
prototypical reaction–diffusion system known for generating a wide variety of spatial patterns — from spots and stripes to labyrinthine structures — driven purely by simple chemical kinetics and diffusion. - -In this tutorial, we’ll employ a Universal Differential Equation (UDE) framework: embedding a small neural network within the PDE’s reaction term to learn unknown dynamics from data, while retaining the known diffusion physics. - - -## Equations of the Gray-Scott Model -The system is governed by the coupled PDEs: -```math -\frac{\partial u}{\partial t} = D_1\,\nabla^2 u + \frac{a\,u^2}{v} + \bar{u} - \alpha -``` - -```math -\frac{\partial v}{\partial t} = D_2\,\nabla^2 v + a\,u^2 + \beta\,v -``` - -where $u$ and $v$ are the two chemical concentrations, $D_1$ and $D_2$ are diffusion coefficients, and $a$, $\bar{u}$, $\alpha$, $\beta$ are reaction parameters. - -In its spatially discretized form (using Neumann boundary conditions and the tridiagonal stencil $[1, -2, 1]$), the Gray–Scott PDE reduces to: - -```math -du = D1 * (A_y u + u A_x) + \frac{a u^2}{v} + \bar{u} - \alpha u -``` - -```math -dv = D2 (A_y v + v A_x) + a u^2 + \beta v -```` -Here $A_x$ and $A_y$ are the 1D Laplacian matrices for x- and y-directions, respectively. - -Now we will dive into the implementation of the UDE. - -## Ground-truth data generation. -```@example gray_scott -using DifferentialEquations -using Lux, Random, Optim, ComponentArrays, Statistics -using LinearAlgebra -using Plots, SciMLBase -using Optimization, OptimizationOptimisers -using SciMLSensitivity - -const N = 32 # Smaller grid for faster training - -# Constants for the "true" model -p_true = (a=1.0, α=1.0, ubar=1.0, β=10.0, D1=0.001, D2=0.1) - -# Laplacian operator with Neumann (zero-flux) boundary conditions -Ax = Array(Tridiagonal([1.0 for i in 1:(N - 1)], [-2.0 for i in 1:N], [1.0 for i in 1:(N - 1)])) -Ay = copy(Ax) -Ax[1, 2] = 2.0 -Ax[end, end - 1] = 2.0 -Ay[1, 2] = 2.0 -Ay[end, end - 1] = 2.0 - -# Initial condition -const uss = (p_true.ubar + p_true.β) / p_true.α -const vss = (p_true.a / p_true.β) * uss^2 -const r0 = zeros(N, N, 2) -r0[:, :, 1] .= uss -r0[:, :, 2] .= vss -r0[div(N,2)-5:div(N,2)+5, div(N,2)-5:div(N,2)+5, 1] .+= 0.1 .* rand.() -r0[div(N,2)-5:div(N,2)+5, div(N,2)-5:div(N,2)+5, 2] .+= 0.1 .* rand.() -``` - -Having set up the grid, parameters, and initial condition, we now generate “ground truth” data by integrating the pure physics Gray–Scott model. This dataset will serve as the target that our UDE aims to learn. - -```@example gray_scott -function true_model!(dr, r, p, t) - a, α, ubar, β, D1, D2 = p - u = @view r[:, :, 1] - v = @view r[:, :, 2] - Du = D1 .* (Ay * u + u * Ax) - Dv = D2 .* (Ay * v + v * Ax) - react_u = a .* u .* u ./ v .+ ubar .- α * u - react_v = a .* u .* u .- β * v - @. dr[:, :, 1] = Du + react_u - @. dr[:, :, 2] = Dv + react_v -end - -tspan = (0.0, 0.1) -tsteps = 0.0:0.01:0.1 -prob_true = ODEProblem(true_model!, r0, tspan, p_true) -solution_true = solve(prob_true, Tsit5(), saveat=tsteps) -data_to_train = Array(solution_true) -``` - -With the ground-truth solutions computed, we can now proceed to visualize the spatiotemporal evolution of $u$ and $v$ on the grid. 
- -```@example gray_scott -println("Plotting ground truth concentrations at the grid center...") -center = N ÷ 2 -p1 = plot(tsteps, solution_true[center,center,1,:], lw=2, label="U True", color=:blue) -title!(p1, "Ground Truth: Center U") -xlabel!("Time") -ylabel!("Concentration") - -p2 = plot(tsteps, solution_true[center,center,2,:], lw=2, label="V True", color=:red) -title!(p2, "Ground Truth: Center V") -xlabel!("Time") - -p_center_combined = plot(p1, p2, layout=(1,2), size=(900,350)) -display(p_center_combined) -``` - -## Defining the UDE -Now that we have an understanding of the data and its visualization, we can define the neural network and the UDE structure. We replace the $\frac{a u^2}{v} + \bar{u} - \alpha u$ term with a neural network, giving the resultant ODEs. - -```math -du = D1 (A_y u + u A_x) + \mathcal{N}_\theta(u,v) -``` - -```math -dv = D2 (A_y v + v A_x) + a u^2 + \beta v -```` - -The first step is to do data normalization, or compute the statistical properties of the dataset. We normalize the inputs to the neural network to make the training process more stable and efficient. Neural networks learn best when their input data is scaled to a consistent range, typically centered around zero, which helps prevent the gradients from becoming too large or too small during training. This leads to faster convergence and allows the optimizer to find a good solution more reliably. - -```@example gray_scott -# Calculate mean and std dev for u and v across all space and time -u_data = @view data_to_train[:, :, 1, :] -v_data = @view data_to_train[:, :, 2, :] - -u_mean = mean(u_data) -u_std = std(u_data) -v_mean = mean(v_data) -v_std = std(v_data) - -norm_stats = (u_mean=u_mean, u_std=u_std, v_mean=v_mean, v_std=v_std) -``` - -Next, we define the neural network structure. -```@example gray_scott -rng = Random.default_rng() -nn = Lux.Chain(Lux.Dense(2 => 16, tanh), Lux.Dense(16 => 1)) -p_nn, st_nn = Lux.setup(rng, nn) - -# Add the normalization stats to the non-trainable parameters -p_ude = ComponentArray( - p_physics=(β=p_true.β, D1=p_true.D1, D2=p_true.D2, a=p_true.a), - p_nn=p_nn, - p_norm=norm_stats # Add normalization stats here -) -``` - -The following code describes the UDE formulation with the neural network predictions embedded. - -```@example gray_scott -function create_ude_model(nn_model, nn_state) - function ude_model!(dr, r, p, t) - β, D1, D2, a = p.p_physics - u_mean, u_std, v_mean, v_std = p.p_norm - - u = @view r[:, :, 1] - v = @view r[:, :, 2] - Du = D1 .* (Ay * u + u * Ax) - Dv = D2 .* (Ay * v + v * Ax) - react_v = a .* u .* u .- β * v - @. dr[:, :, 2] = Dv + react_v - - nn_reaction_u = similar(u) - for i in 1:N, j in 1:N - # Normalize the input to the NN - u_norm = (u[i, j] - u_mean) / (u_std + 1f-8) # Add epsilon for stability - v_norm = (v[i, j] - v_mean) / (v_std + 1f-8) - input = [u_norm, v_norm] - - # The NN receives normalized data - nn_reaction_u[i, j] = nn_model(input, p.p_nn, nn_state)[1][1] - end - @. dr[:, :, 1] = Du + nn_reaction_u - end - return ude_model! -end - -ude_model! = create_ude_model(nn, st_nn) -prob_ude = ODEProblem(ude_model!, r0, tspan, p_ude) -``` - -## Loss Function and Optimization -The loss function is defined as: -```@example gray_scott -function loss(params_to_train) - prediction = predict(params_to_train) - if prediction.retcode != SciMLBase.ReturnCode.Success - return Inf - end - return sum(abs2, Array(prediction) .- data_to_train) -end -``` -The function used to predict is defined as below. 
We have explicitly defined the sensealg here to ensure the correct AD is used, which is the `QuadratureAdjoint` here. It is a method that provides a correct gradient without errors is infinitely better than a slightly faster one that fails. It represents a pragmatic engineering choice to ensure the optimization can proceed reliably. - -```@example gray_scott -function predict(params_to_train) - return solve(prob_ude, Tsit5(), p=params_to_train, saveat=solution_true.t, - sensealg=QuadratureAdjoint(autojacvec=ReverseDiffVJP(true))) -end -``` - -## Training -Next, we start the training loop. -```@example gray_scott -loss_history = [] -callback = function (p, l) - push!(loss_history, l) - println("Current loss: ", l) - return false -end -``` -Here, two training stages were used. The first stage uses a high learning rate to quickly move the model's parameters across the loss landscape towards the general area of a good solution. The second stage then uses a much lower learning rate to carefully fine-tune the parameters, allowing the model to settle precisely into a deep minimum without the risk of overshooting it. This two-phase approach combines the benefits of rapid initial progress with a stable and accurate final convergence. - -```@example gray_scott -adtype = Optimization.AutoZygote() -optf = Optimization.OptimizationFunction((p_train, p) -> loss(p_train), adtype) -optprob = Optimization.OptimizationProblem(optf, p_ude) - -println("Phase 1: Training with ADAM (learning rate 0.01)...") -result = Optimization.solve(optprob, ADAM(0.01), callback=callback, maxiters=100) - -println("\nPhase 2: Refining with ADAM (learning rate 0.001)...") -optprob2 = Optimization.OptimizationProblem(optf, result.u) -result2 = Optimization.solve(optprob2, ADAM(0.001), callback=callback, maxiters=100) -println("Training complete.") -``` - -## Results and conclusion -We can visualize the final results of the UDE and the ground-truth data with the following code. - -```@example gray_scott -p_trained = result2.u -final_prediction = predict(p_trained) - -avg_u_true = [mean(data_to_train[:, :, 1, i]) for i in 1:length(tsteps)] -avg_v_true = [mean(data_to_train[:, :, 2, i]) for i in 1:length(tsteps)] -avg_u_pred = [mean(final_prediction[i][:, :, 1]) for i in 1:length(tsteps)] -avg_v_pred = [mean(final_prediction[i][:, :, 2]) for i in 1:length(tsteps)] - -p_comp_u = plot(tsteps, avg_u_true, label="True", color=:blue, lw=2, title="Comparison: Avg. U") -plot!(tsteps, avg_u_pred, label="UDE", color=:black, linestyle=:dash, lw=2) -xlabel!("Time") -ylabel!("Avg. Concentration") - -p_comp_v = plot(tsteps, avg_v_true, label="True", color=:red, lw=2, title="Comparison: Avg. V") -plot!(tsteps, avg_v_pred, label="UDE", color=:black, linestyle=:dash, lw=2) - xlabel!("Time") - -p_comp_combined = plot(p_comp_u, p_comp_v, layout=(1, 2), size=(900, 350)) -display(p_comp_combined) -``` - -Now, as we can see, the UDE predictions match the ground-truth data very well, indicating the model has successfully learned the non-linear term. 
From 30e26c9e200bdd38b01bf8ccbf75afa5d80da5a5 Mon Sep 17 00:00:00 2001 From: Christopher Rackauckas Date: Fri, 15 Aug 2025 05:42:05 -0400 Subject: [PATCH 06/10] Update gray-scott.md --- docs/src/examples/pde/gray-scott.md | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/docs/src/examples/pde/gray-scott.md b/docs/src/examples/pde/gray-scott.md index 47cf203b2..a48a11016 100644 --- a/docs/src/examples/pde/gray-scott.md +++ b/docs/src/examples/pde/gray-scott.md @@ -5,6 +5,11 @@ The Gray–Scott model is a prototypical reaction–diffusion system known for g In this tutorial, we’ll employ a Universal Differential Equation (UDE) framework: embedding a small neural network within the PDE’s reaction term to learn unknown dynamics from data, while retaining the known diffusion physics. +!!! note + + The data generation and training process specifically avoids the chaotic regime of this equation in order to avoid the potential fitting issues involved in that case. See + [How chaotic is chaos? How some AI for Science / SciML papers are overstating accuracy claims](https://www.stochasticlifestyle.com/how-chaotic-is-chaos-how-some-ai-for-science-sciml-papers-are-overstating-accuracy-claims/) or + [Scientific Machine Learning of Chaotic Systems Discovers Governing Equations for Neural Populations](https://arxiv.org/abs/2507.03631) for more details ## Equations of the Gray-Scott Model The system is governed by the coupled PDEs: From 82e2670c66f8c41f2747a4dbe8a6062e316ec390 Mon Sep 17 00:00:00 2001 From: Christopher Rackauckas Date: Fri, 15 Aug 2025 05:42:46 -0400 Subject: [PATCH 07/10] Update pages.jl --- docs/pages.jl | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/pages.jl b/docs/pages.jl index b8173eb07..c0efb5d62 100644 --- a/docs/pages.jl +++ b/docs/pages.jl @@ -21,9 +21,9 @@ pages = ["index.md", "examples/sde/SDE_control.md"], "Delay Differential Equations (DDEs)" => Any["examples/dde/delay_diffeq.md"], "Partial Differential Equations (PDEs)" => Any[ - "examples/pde/pde_constrained.md", - "examples/pde/brusselator.md" - ], + "examples/pde/pde_constrained.md", + "examples/pde/gray_scott.md" + ], "Hybrid and Jump Equations" => Any["examples/hybrid_jump/hybrid_diffeq.md", "examples/hybrid_jump/bouncing_ball.md"], "Bayesian Estimation" => Any["examples/bayesian/turing_bayesian.md"], From e77df4a5eb2ff37294a218828ae5193dfd8eded4 Mon Sep 17 00:00:00 2001 From: Christopher Rackauckas Date: Fri, 15 Aug 2025 05:43:32 -0400 Subject: [PATCH 08/10] Delete docs/src/examples/pde/brusselator.md --- docs/src/examples/pde/brusselator.md | 273 --------------------------- 1 file changed, 273 deletions(-) delete mode 100644 docs/src/examples/pde/brusselator.md diff --git a/docs/src/examples/pde/brusselator.md b/docs/src/examples/pde/brusselator.md deleted file mode 100644 index fea9aaa14..000000000 --- a/docs/src/examples/pde/brusselator.md +++ /dev/null @@ -1,273 +0,0 @@ -# Learning Nonlinear Reaction Dynamics in the 2D Brusselator PDE Using Universal Differential Equations - -## Introduction - -The Brusselator is a mathematical model used to describe oscillating chemical reactions and spatial pattern formation, capturing how concentrations of chemical species evolve over time and space. In this documentation, we simulate the two-dimensional Brusselator partial differential equation (PDE) on a periodic square domain, generate time-resolved data using a finite difference discretization, and use this data to train a **Universal Differential Equation (UDE)**. 
Specifically, we replace the known nonlinear reaction term with a neural network, enabling us to learn complex dynamics directly from the generated data while preserving the known physical structure of the system. - -## The Brusselator PDE - -The Brusselator PDE is defined on a unit square periodic domain as follows: - -$$\frac{\partial U}{\partial t} = B + U^2V - (A+1)U + \alpha \nabla^2 U + f(x, y, t)$$ - -$$\frac{\partial V}{\partial t} = AU - U^2V + \alpha \nabla^2 V$$ - -where $A=3.4, B=1$ and the forcing term is: - -$$f(x, y, t) = -\begin{cases} -5 & \text{if } (x - 0.3)^2 + (y - 0.6)^2 \leq 0.1^2 \text{ and } t \geq 1.1 \\ -0 & \text{otherwise} -\end{cases}$$ - -and the Laplacian operator is: - -$$\nabla^2 = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}$$ - -These equations are solved over the time interval: - -$$t \in [0, 11.5]$$ - -with the initial conditions: - -$$U(x, y, 0) = 22 \cdot \left( y(1 - y) \right)^{3/2}$$ - -$$V(x, y, 0) = 27 \cdot \left( x(1 - x) \right)^{3/2}$$ - -and the periodic boundary conditions: - -$$U(x + 1, y, t) = U(x, y, t)$$ - -$$V(x, y + 1, t) = V(x, y, t)$$ - -## Numerical Discretization - -To numerically solve this PDE, we discretize the unit square domain using $N$ grid points along each spatial dimension. The variables $U[i,j]$ and $V[i,j]$ then denote the concentrations at the grid point $(i, j)$ at a given time $t$. - -We represent the spatially discretized fields as: - -$$U[i,j] = U(i \cdot \Delta x, j \cdot \Delta y), \quad V[i,j] = V(i \cdot \Delta x, j \cdot \Delta y),$$ - -where $\Delta x = \Delta y = \frac{1}{N}$ for a grid of size $N \times N$. To organize the simulation state efficiently, we store both $ U $ and $ V $ in a single 3D array: - -$$u[i,j,1] = U[i,j], \quad u[i,j,2] = V[i,j],$$ - -giving us a field tensor of shape $(N, N, 2)$. This structure is flexible and extends naturally to systems with additional field variables. - -## Finite Difference Laplacian and Forcing - -For spatial derivatives, we apply a second-order central difference scheme using a three-point stencil. The Laplacian is discretized as: - -$$[\ 1,\ -2,\ 1\ ]$$ - -in both the $ x $ and $ y $ directions, forming a tridiagonal structure in both the x and y directions; applying this 1D stencil (scaled appropriately by $\frac{1}{Δx^2}$ or $\frac{1}{Δy^2}$) along each axis and summing the contributions yields the standard 5-point stencil computation for the 2D Laplacian. Periodic boundary conditions are incorporated by wrapping the stencil at the domain edges, effectively connecting the boundaries. The nonlinear interaction terms are computed directly at each grid point, making the implementation straightforward and local in nature. - -## Generating Training Data - -This provides us with an `ODEProblem` that can be solved to obtain training data. - -```@example bruss -import ComponentArrays as CA, Random, Plots, OrdinaryDiffEq as ODE -import SciMLBase - -N_GRID = 16 -XYD = range(0.0f0, stop = 1.0f0, length = N_GRID) -dx = step(XYD) -T_FINAL = 11.5f0 -SAVE_AT = 0.5f0 -tspan = (0.0f0, T_FINAL) -t_points = range(tspan[1], stop = tspan[2], step = SAVE_AT) -A, B, alpha = 3.4f0, 1.0f0, 10.0f0 - -brusselator_f(x, y, t) = (((x - 0.3f0)^2 + (y - 0.6f0)^2) <= 0.01f0) * (t >= 1.1f0) * 5.0f0 -limit(a, N) = a == 0 ? N : a == N+1 ? 
1 : a - -function init_brusselator(xyd) - println("[Init] Creating initial condition array...") - u0 = zeros(Float32, N_GRID, N_GRID, 2) - for I in CartesianIndices((N_GRID, N_GRID)) - x, y = xyd[I[1]], xyd[I[2]] - u0[I, 1] = 22.0f0 * (y * (1.0f0 - y))^(3.0f0/2.0f0) - u0[I, 2] = 27.0f0 * (x * (1.0f0 - x))^(3.0f0/2.0f0) - end - println("[Init] Done.") - return u0 -end -u0 = init_brusselator(XYD) - -function pde_truth!(du, u, p, t) - A, B, alpha, dx = p - αdx = alpha / dx^2 - for I in CartesianIndices((N_GRID, N_GRID)) - i, j = Tuple(I) - x, y = XYD[i], XYD[j] - ip1, im1 = limit(i+1, N_GRID), limit(i-1, N_GRID) - jp1, jm1 = limit(j+1, N_GRID), limit(j-1, N_GRID) - U, V = u[i, j, 1], u[i, j, 2] - ΔU = u[im1, j, 1] + u[ip1, j, 1] + u[i, jp1, 1] + u[i, jm1, 1] - 4.0f0 * U - ΔV = u[im1, j, 2] + u[ip1, j, 2] + u[i, jp1, 2] + u[i, jm1, 2] - 4.0f0 * V - du[i, j, 1] = αdx*ΔU + B + U^2 * V - (A+1.0f0)*U + brusselator_f(x, y, t) - du[i, j, 2] = αdx*ΔV + A*U - U^2 * V - end -end - -p_tuple = (A, B, alpha, dx) -@time sol_truth = ODE.solve(ODE.ODEProblem(pde_truth!, u0, tspan, p_tuple), ODE.FBDF(), saveat = t_points) -u_true = Array(sol_truth) -``` - -## Visualizing Mean Concentration Over Time - -We can now use this code for training our UDE, and generating time-series plots of the concentrations of species of U and V using the code: - -```@example bruss -import Plots, Statistics - -# Compute average concentration at each timestep -avg_U = [Statistics.mean(snapshot[:, :, 1]) for snapshot in sol_truth.u] -avg_V = [Statistics.mean(snapshot[:, :, 2]) for snapshot in sol_truth.u] - -# Plot average concentrations over time -Plots.plot( - sol_truth.t, avg_U, label = "Mean U", lw = 2, xlabel = "Time", ylabel = "Concentration", - title = "Mean Concentration of U and V Over Time") -Plots.plot!(sol_truth.t, avg_V, label = "Mean V", lw = 2, linestyle = :dash) -``` - -With the ground truth data generated and visualized, we are now ready to construct a Universal Differential Equation (UDE) by replacing the nonlinear term $U^2V$ with a neural network. The next section outlines how we define this hybrid model and train it to recover the reaction dynamics from data. - -## Universal Differential Equation (UDE) Formulation - -In the original Brusselator model, the nonlinear reaction term \( U^2V \) governs key dynamic behavior. In our UDE approach, we replace this known term with a trainable neural network \( \mathcal{N}_\theta(U, V) \), where \( \theta \) are the learnable parameters. - -The resulting system becomes: - -$$ -\frac{\partial U}{\partial t} = 1 + \mathcal{N}_\theta(U, V) - 4.4U + \alpha \nabla^2 U + f(x, y, t) -$$ - -$$ -\frac{\partial V}{\partial t} = 3.4U - \mathcal{N}_\theta(U, V) + \alpha \nabla^2 V -$$ - -Here, $\mathcal{N}_\theta(U, V)$ is trained to approximate the true interaction term $U^2V$ using simulation data. This hybrid formulation allows us to recover unknown or partially known physical processes while preserving the known structural components of the PDE. - -First, we have to define and configure the neural network that has to be used for the training. 
The implementation for that is as follows: - -```@example bruss -import Lux, Random, Optimization as OPT, OptimizationOptimJL as OOJ, - SciMLSensitivity as SMS, Zygote - -model = Lux.Chain(Lux.Dense(2 => 16, tanh), Lux.Dense(16 => 1)) -rng = Random.default_rng() -ps_init, st = Lux.setup(rng, model) -ps_init = CA.ComponentArray(ps_init) -``` - -We use a simple fully connected neural network with one hidden layer of 16 tanh-activated units to approximate the nonlinear interaction term. - -To ensure consistency between the ground truth simulation and the learned Universal Differential Equation (UDE) model, we preserve the same spatial discretization scheme used in the original ODEProblem. This includes: - - - the finite difference Laplacian, - - periodic boundary conditions, and - - the external forcing function. - -The only change lies in the replacement of the known nonlinear term $U^2V$ with a neural network approximation -$\mathcal{N}_\theta(U, V)$. This design enables the UDE to learn complex or unknown dynamics from data while maintaining the underlying physical structure of the system. - -The function below implements this hybrid formulation: - -```@example bruss -function pde_ude!(du, u, ps_nn, t) - αdx = alpha / dx^2 - for I in CartesianIndices((N_GRID, N_GRID)) - i, j = Tuple(I) - x, y = XYD[i], XYD[j] - ip1, im1 = limit(i+1, N_GRID), limit(i-1, N_GRID) - jp1, jm1 = limit(j+1, N_GRID), limit(j-1, N_GRID) - U, V = u[i, j, 1], u[i, j, 2] - ΔU = u[im1, j, 1] + u[ip1, j, 1] + u[i, jp1, 1] + u[i, jm1, 1] - 4.0f0 * U - ΔV = u[im1, j, 2] + u[ip1, j, 2] + u[i, jp1, 2] + u[i, jm1, 2] - 4.0f0 * V - nn_val, _ = model([U, V], ps_nn, st) - val = nn_val[1] - du[i, j, 1] = αdx*ΔU + B + val - (A+1.0f0)*U + brusselator_f(x, y, t) - du[i, j, 2] = αdx*ΔV + A*U - val - end -end -prob_ude_template = ODE.ODEProblem(pde_ude!, u0, tspan, ps_init) -``` - -## Loss Function and Optimization - -To train the neural network -$\mathcal{N}_\theta(U, V)$ embedded in the UDE, we define a loss function that measures how closely the solution of the UDE matches the ground truth data generated earlier. - -The loss is computed as the sum of squared errors between the predicted solution from the UDE and the true solution at each saved time point. If the solver fails (e.g., due to numerical instability or incorrect parameters), we return an infinite loss to discard that configuration during optimization. We use `FBDF()` as the solver due to the stiff nature of the brusselators euqation. Other solvers like `KenCarp47()` could also be used. - -To efficiently compute gradients of the loss with respect to the neural network parameters, we use an adjoint sensitivity method (`GaussAdjoint`), which performs high-accuracy quadrature-based integration of the adjoint equations. This approach enables scalable and memory-efficient training for stiff PDEs by avoiding full trajectory storage while maintaining accurate gradient estimates. - -The loss function and initial evaluation are implemented as follows: - -```@example bruss -println("[Loss] Defining loss function...") -function loss_fn(ps, _) - prob = ODE.remake(prob_ude_template, p = ps) - sol = ODE.solve(prob, ODE.FBDF(), saveat = t_points) - # Failed solve - if !SciMLBase.successful_retcode(sol) - return Inf32 - end - pred = Array(sol) - lval = sum(abs2, pred .- u_true) / length(u_true) - return lval -end -``` - -Once the loss function is defined, we use the ADAM optimizer to train the neural network. 
The optimization problem is defined using SciML's `Optimization.jl` tools, and gradients are computed via automatic differentiation using `AutoZygote()` from `SciMLSensitivity`: - -```@example bruss -println("[Training] Starting optimization...") -import OptimizationOptimisers as OPO -optf = OPT.OptimizationFunction(loss_fn, SMS.AutoZygote()) -optprob = OPT.OptimizationProblem(optf, ps_init) -loss_history = Float32[] - -callback = (ps, l) -> begin - push!(loss_history, l) - println("Epoch $(length(loss_history)): Loss = $l") - false -end -``` - -Finally to run everything: - -```@example bruss -res = OPT.solve(optprob, OPO.Optimisers.Adam(0.01), callback = callback, maxiters = 100) -``` - -```@example bruss -res.objective -``` - -```@example bruss -println("[Plot] Final U/V comparison plots...") -center = N_GRID ÷ 2 -sol_final = ODE.solve(ODE.remake(prob_ude_template, p = res.u), ODE.FBDF(), saveat = t_points) -pred = Array(sol_final) - -p1 = Plots.plot(t_points, u_true[center, center, 1, :], lw = 2, label = "U True") -Plots.plot!(p1, t_points, pred[center, center, 1, :], lw = 2, ls = :dash, label = "U Pred") -Plots.title!(p1, "Center U Concentration Over Time") - -p2 = Plots.plot(t_points, u_true[center, center, 2, :], lw = 2, label = "V True") -Plots.plot!(p2, t_points, pred[center, center, 2, :], lw = 2, ls = :dash, label = "V Pred") -Plots.title!(p2, "Center V Concentration Over Time") - -Plots.plot(p1, p2, layout = (1, 2), size = (900, 400)) -``` - -## Results and Conclusion - -After training the Universal Differential Equation (UDE), we compared the predicted dynamics to the ground truth for both chemical species. - -The low training loss shows us that the neural network in the UDE was able to understand the underlying dynamics, and it was able to learn the $U^2V$ term in the partial differential equation. From 03d7b79f15942bc8f77ac05bb4874622385684f5 Mon Sep 17 00:00:00 2001 From: Christopher Rackauckas Date: Fri, 15 Aug 2025 05:43:41 -0400 Subject: [PATCH 09/10] Rename gray-scott.md to gray_scott.md --- docs/src/examples/pde/{gray-scott.md => gray_scott.md} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename docs/src/examples/pde/{gray-scott.md => gray_scott.md} (100%) diff --git a/docs/src/examples/pde/gray-scott.md b/docs/src/examples/pde/gray_scott.md similarity index 100% rename from docs/src/examples/pde/gray-scott.md rename to docs/src/examples/pde/gray_scott.md From 13dac6a163efb857d16651edda5a160b3546caf4 Mon Sep 17 00:00:00 2001 From: Christopher Rackauckas Date: Fri, 15 Aug 2025 05:45:45 -0400 Subject: [PATCH 10/10] Update docs/src/examples/pde/gray_scott.md --- docs/src/examples/pde/gray_scott.md | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/docs/src/examples/pde/gray_scott.md b/docs/src/examples/pde/gray_scott.md index a48a11016..dddc6c055 100644 --- a/docs/src/examples/pde/gray_scott.md +++ b/docs/src/examples/pde/gray_scott.md @@ -12,6 +12,12 @@ In this tutorial, we’ll employ a Universal Differential Equation (UDE) framewo [Scientific Machine Learning of Chaotic Systems Discovers Governing Equations for Neural Populations](https://arxiv.org/abs/2507.03631) for more details ## Equations of the Gray-Scott Model + +!!! note + + For more information on the semi-discretization of the Gray-Scott model for performance, see + [the DifferentialEquations.jl tutorial](https://docs.sciml.ai/DiffEqDocs/stable/tutorials/faster_ode_example/#Example-Accelerating-Linear-Algebra-PDE-Semi-Discretization). 
+ The system is governed by the coupled PDEs: ```math \frac{\partial u}{\partial t} = D_1\,\nabla^2 u + \frac{a\,u^2}{v} + \bar{u} - \alpha