Merge pull request #232 from MPoL-dev/nufft_bifurcate

splitting NuFFT into base and uu,vv cached class.
MPoL-dev · Dec 13, 2023 · cb28c37 · cb28c37
2 parents ef11906 + bb4ea6d
commit cb28c37
Show file tree

Hide file tree

Showing 8 changed files with 921 additions and 572 deletions.
diff --git a/docs/changelog.md b/docs/changelog.md
@@ -2,6 +2,10 @@
 
 # Changelog
 
+## v0.2.1
+
+- Changed API of {class}`~mpol.fourier.NuFFT`. Previous signature took `uu` and `vv` points at initialization (`__init__`), and the `.forward` method took only an image cube. This behaviour is preserved in a new class {class}`~mpol.fourier.NuFFTCached`. The updated signature of {class}`~mpol.fourier.NuFFT` *does not* take `uu` and `vv` at initialization. Rather, its `forward` method is modified to take an image cube and the `uu` and `vv` points. This allows an instance of this class to be used with new `uu` and `vv` points in each forward call. This follows the standard expectation of a layer (e.g., a linear regression function predicting at new `x`) and the pattern of the TorchKbNuFFT package itself. It is expected that the new `NuFFT` will be the default routine and `NuFFTCached` will only be used in specialized circumstances (and possibly deprecated/removed in future updates). Changes implemented by #232.
+
 ## v0.2.0
 
 - Moved docs build out of combined and into standalone test workflow

diff --git a/docs/ci-tutorials/loose-visibilities.md b/docs/ci-tutorials/loose-visibilities.md
@@ -97,10 +97,10 @@ uu_chan = uu[chan]
 vv_chan = vv[chan]
 ```
 
-and then use these values to initialize a {class}`mpol.fourier.NuFFT` object
+Now, we'll initialize a {class}`mpol.fourier.NuFFT` object
 
 ```{code-cell}
-nufft = fourier.NuFFT(coords=coords, nchan=nchan, uu=uu_chan, vv=vv_chan)
+nufft = fourier.NuFFT(coords=coords, nchan=nchan)
 ```
 
 Now let's put the NuFFT aside for a moment while we initialize the {class}`mpol.gridding.DataAverager` object and create an image for use in the forward model.
@@ -163,7 +163,7 @@ icube = images.ImageCube(coords=coords, nchan=nchan, cube=img_packed_tensor)
 The interesting part of the NuFFT is that it will carry an image plane model all the way to the Fourier plane in loose visibilities, resulting in a model visibility array the same shape as the original visibility data.
 
 ```{code-cell}
-vis_model_loose = nufft(icube())
+vis_model_loose = nufft(icube(), uu_chan, vv_chan)
 print("Loose model visibilities from the NuFFT have shape {:}".format(vis_model_loose.shape))
 print("The original loose data visibilities have shape {:}".format(data.shape))
 ```

diff --git a/docs/ci-tutorials/optimization.md b/docs/ci-tutorials/optimization.md
@@ -312,14 +312,14 @@ $$
 For speed reasons, the {class}`mpol.precomposed.SimpleNet` does not work with the original data visibilities directly, but instead uses an averaged version of them in {class}`~mpol.datasets.GriddedDataset`. To calculate model visibilities corresponding to the original $u,v$ points of the dataset, we will need to use the {class}`mpol.fourier.NuFFT` layer. More detail on this object is in the [Loose Visibilities tutorial](loose-visibilities.md), but basically we instantiate the NuFFT layer relative to some image dimensions and $u,v$ locations 
 
 ```{code-cell} ipython3
-nufft = fourier.NuFFT(coords=coords, nchan=dset.nchan, uu=uu, vv=vv)
+nufft = fourier.NuFFT(coords=coords, nchan=dset.nchan)
 ```
 
 and then we can calculate model visibilities corresponding to some model image (in this case, our optimal image). Since {meth}`mpol.fourier.NuFFT.forward` returns a Pytorch tensor, we'll need to detach it and convert it to numpy. We'll also remove the channel dimension.
 
 ```{code-cell} ipython3
 # note the NuFFT expects a "packed image cube," as output from ImageCube.forward()
-vis_model = nufft(rml.icube())
+vis_model = nufft(rml.icube(), uu, vv)
 # convert from Pytorch to numpy, remove channel dimension
 vis_model = np.squeeze(vis_model.detach().numpy())
 ```

diff --git a/docs/large-tutorials/pyro.ipynb b/docs/large-tutorials/pyro.ipynb
diff --git a/docs/large-tutorials/pyro.md b/docs/large-tutorials/pyro.md
@@ -5,7 +5,7 @@ jupytext:
     extension: .md
     format_name: myst
     format_version: 0.13
-    jupytext_version: 1.14.1
+    jupytext_version: 1.15.2
 kernelspec:
   display_name: Python 3 (ipykernel)
   language: python
@@ -27,7 +27,23 @@ In this tutorial, we will explore how we can use MPoL with a probabilistic progr
 
 In this tutorial, we will use [Stochastic Variational Inference](http://pyro.ai/examples/svi_part_i.html) algorithms to obtain the posterior distribution of the model parameters. These algorithms are quick to implement in Pyro and--important for this tutorial--quick to run. Pyro also has full support for MCMC algorithms like Hamiltonian Monte Carlo and the No U-Turn Sampler (NUTS) ([for example](http://pyro.ai/examples/bayesian_regression_ii.html#HMC)) that are relatively straightforward to use in an extension from this model. However, because their run times are significantly longer than SVI algorithms, more computational resources are needed beyond the scope of this tutorial.
 
-+++
+If the following output says `Using cuda`, then this tutorial was executed on a GPU. We found that it took about 5 minutes to converge the SVI, which is pretty exciting. You may be able to run this on CPU-only machine, but expect the runtime to take significantly longer. You may want to shorten the number of iterations and reduce the number of predictive samples to get a sense that the routine will in fact execute, but be aware that your solution may not fully converge.
+
+```{code-cell} ipython3
+import torch
+if torch.cuda.is_available():
+    device = torch.device('cuda')                   
+else:                                                       
+    device = torch.device('cpu')   
+
+print(f"Using {device}.")    
+    
+```
+
+```{code-cell} ipython3
+# import arviz now, to check UTF-8 loading issue.
+import arviz as az
+```
 
 ## MPoL and models
 
@@ -244,7 +260,6 @@ We also recommend reading Gelman et al. 2020's paper on [Bayesian Workflow](http
 There are many ways to build a Pyro model. In this tutorial we will take a class-based approach and use the [PyroModule](http://pyro.ai/examples/modules.html) construct, but models can just as easily be built using function definitions (for [example](http://pyro.ai/examples/intro_long.html#Models-in-Pyro)).
 
 ```{code-cell} ipython3
-import torch
 from torch import nn
 from mpol import geometry, gridding, images, fourier, utils
 from mpol.constants import deg
@@ -573,9 +588,10 @@ class VisibilityModel(PyroModule):
         self.flayer = fourier.FourierCube(coords=coords)
 
         # create a NuFFT, but only use it for predicting samples
-        self.nufft = fourier.NuFFT(
-            coords=self.coords, nchan=self.nchan, uu=uu, vv=vv, sparse_matrices=False
-        )
+        # store the uu and vv points we might use 
+        self.uu = torch.as_tensor(uu, device=device)
+        self.vv = torch.as_tensor(vv, device=device)
+        self.nufft = fourier.NuFFT(coords=self.coords, nchan=self.nchan)
 
 
     def forward(self, predictive=True):
@@ -592,7 +608,7 @@ class VisibilityModel(PyroModule):
 
         if predictive:
             # use the NuFFT to produce and store samples
-            vis_nufft = self.nufft(img)[0]
+            vis_nufft = self.nufft(img, self.uu, self.vv)[0]
             
             pyro.deterministic("vis_real", torch.real(vis_nufft))
             pyro.deterministic("vis_imag", torch.imag(vis_nufft))
@@ -623,14 +639,9 @@ As we described in the [NuFFT](../ci-tutorials/loose-visibilities.md) tutorial,
 
 When visualizing model or residual visibility values, it is often far more useful to work with the loose visibility values produced from the NuFFT. This is because the loose visibilities can be gridded using a weighting scheme like Briggs robust weighting, which can dramatically increase the sensitivity of the resulting image. So that is why our `VisibilityModel` uses a {class}`~mpol.fourier.NuFFT` layer to produce model visibilities when working in a predictive mode but otherwise uses a more efficient {class}`~mpol.fourier.FourierCube` layer to produce model visibilities when working in a likelihood evaluation loop.
 
-Now we'll do a predictive check with the `VisibilityModel` using the same disk values found by Guzmán et al. 2018. We will also place it on the GPU, if the device is available.
+Now we'll do a predictive check with the `VisibilityModel` using the same disk values found by Guzmán et al. 2018. We will also place it on the GPU with the `.to` call, if the device is available.
 
 ```{code-cell} ipython3
-if torch.cuda.is_available():
-    device = torch.device('cuda')                   
-else:                                                       
-    device = torch.device('cpu')   
-
 # we will use this object throghout the rest of the tutorial, so we'll just call it 'model'
 model = VisibilityModel(coords=coords, distance=distance, uu=uu, vv=vv, weight=weight, data=data, device=device)
 model.to(device);
@@ -1039,14 +1050,14 @@ ax.set_xlabel("radius [au]")
 ax.set_ylabel(r"$I_\nu$ [Jy $\mathrm{arcsec}^{-2}$]");
 ```
 
-We see that there is a slightly larger scatter in the draws compared to the `AutoNormal` guide, especially around 40 au. This is because the `AutoMultivariateNormal` guide captured more of the covariance between parameters, resulting in a greater dispersion of draws.
+We see that there is a *slightly* larger scatter in the draws compared to the `AutoNormal` guide, most noticeable around 40 au. This is because the `AutoMultivariateNormal` guide captured more of the covariance between parameters, resulting in a greater dispersion of draws.
 
 Encouragingly, both our image and 1D profile results compare favorably with those found by [Guzmán et al. 2018](https://ui.adsabs.harvard.edu/abs/2018ApJ...869L..48G/abstract) (compare their Figures 2 & 4). 
 
 The true uncertainty in the radial profile may still be underestimated. As we discussed, one source could be the parameterization of the model. In reality, the disk rings are not perfect Gaussian shapes, and so, as currently implemented, our model could never capture the true intensity profile. 
 
 
-In our opinion, SVI is a very useful inference technique because of its speed and scalability. There is the risk, though, that your guide distribution does not fully capture complex covariances of your posterior distributions. Perhaps some parameter posteriors are significantly non-Gaussian or banana-shaped, and therefore not able to be captured by the multivariate Normal guide. This risk can be hard to assess from SVI fits alone, though there are steps you can take by trying out more [complex guides](https://docs.pyro.ai/en/stable/infer.autoguide.html#) or [writing your own](https://pyro.ai/examples/svi_part_i.html#Guide), parameterized around anticipated covariances. 
+In our opinion, SVI is a very useful inference technique because of its speed and scalability. There is the risk, though, that your guide distribution does not fully capture complex covariances of your posterior distributions. Perhaps some parameter posteriors are significantly non-Gaussian or banana-shaped, and therefore not able to be captured by the multivariate Normal guide. This risk can be hard to assess from SVI fits alone, though there are steps you can take by trying out more [complex guides](https://docs.pyro.ai/en/stable/infer.autoguide.html#) or [writing your own](https://pyro.ai/examples/svi_part_i.html#Guide), parameterized around anticipated covariances.
 
 +++