Gradient descent for fitting models to stars

This is a topic I've written about before, but I wanted to update my code and simplify things. You can clone the GitHub repo if you'd like an example to run for yourself as well.
Basically, the goal is to fit a two dimensional Gaussian distribution to a star in astronomical data (though it'd be easy to generalize this algorithm to the N-dimensional case).
The Gaussian is a good model in most cases, and it's easy to compute; in the past I tried fitting the Moffat function as well, but I found its parameter \(\beta\) hard to fit, so an iterative method probably isn't optimal.
To start, I'll restate the gradient descent (GD) algorithm (you'll find it peppered throughout the literature because it's so well known).
\begin{align}\vec{\theta}_{\kappa+1}&=\vec{\theta}_{\kappa}-\eta\vec{\nabla}_{\vec{\theta}}\text{cost}(\vec{\theta}_{\kappa}),\end{align}
where
\begin{align}\text{cost}(\vec{\theta}_{\kappa}):=\dfrac{1}{MN}\displaystyle\sum_{i,j=1}^{N,M}[m(i…
Basically, the goal is to fit a two dimensional Gaussian distribution to a star in astronomical data (though it'd be easy to generalize this algorithm to the N-dimensional case).
The Gaussian is a good model in most cases, and it's easy to compute; in the past I tried fitting the Moffat function as well, but I found its parameter \(\beta\) hard to fit, so an iterative method probably isn't optimal.
To start, I'll restate the gradient descent (GD) algorithm (you'll find it peppered throughout the literature because it's so well known).
\begin{align}\vec{\theta}_{\kappa+1}&=\vec{\theta}_{\kappa}-\eta\vec{\nabla}_{\vec{\theta}}\text{cost}(\vec{\theta}_{\kappa}),\end{align}
where
\begin{align}\text{cost}(\vec{\theta}_{\kappa}):=\dfrac{1}{MN}\displaystyle\sum_{i,j=1}^{N,M}[m(i…