The research notebook

Monday, December 3, 2012

I've spent the last week at JPL doing radiative transfer calculations with Molpop-CEP together with Moshe Elitzur, Paul Goldsmith and Karen Willacy, applied to photodissociation regions (PDR). I have managed to compute CO, H2O, OH, C+, C and O2. The only molecule that really gave us problems was water. We found that, in some models, the temperature drops to 10 K while part of the PDR still sits above 350 K. This poses a problem for any radiative transfer method that solves the full non-local problem: in some parts of the atmosphere many levels are efficiently pumped, while in the low-temperature regions the upper energy levels are completely unpopulated. The convergence is then dominated by these levels and the iteration never converges.
We have an idea to update the code with an adaptive control of the number of levels in each zone. The idea is to drop energy levels whose population falls below a certain threshold and to solve only for the remaining low-energy levels. The mean intensity in these regions would be essentially unaffected. If these regions lie close to the boundary, they will simply see the boundary condition. If we accumulate the mean intensity along the radiative transfer calculation, the regions where the line has disappeared will simply be transparent. So the mean intensity will still be computed for all transitions, but it will be updated only in the zones where both levels of the transition are active. I will think about the practical implementation of this idea and see if it works.
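A first sketch of the pruning step could look like the following (a minimal sketch in Python; the function name, data layout and threshold value are my assumptions, not Molpop-CEP code):

```python
import numpy as np

def active_levels(populations, threshold=1e-12):
    """Return the indices of the energy levels to keep in one zone.

    Levels whose fractional population falls below `threshold` are
    dropped, but everything up to the highest significantly populated
    level is kept so the reduced level set stays contiguous in energy.
    """
    frac = populations / populations.sum()
    significant = np.where(frac >= threshold)[0]
    n_active = significant.max() + 1 if significant.size > 0 else 1
    return np.arange(n_active)

# A transition would then be updated only in zones where both of its
# levels are active; elsewhere the line is treated as transparent.
```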
Saturday, October 27, 2012
I have recently been working on the deconvolution of spectropolarimetric data observed with space-borne instruments like Hinode. The advantage of such telescopes is that the observations are not corrupted by the atmospheric point spread function (PSF). As a consequence, the observations are remarkably stable. In spite of this advantage, the instrumental PSF (telescope+spectropolarimeter) always affects the observations. This PSF smears the profiles, so that the information obtained at a certain pixel is a combination of the Stokes profiles of many surrounding pixels.
Inferring the physical conditions from this kind of observation has traditionally been carried out using the filling factor approach. A fraction of the pixel is assumed to contain the information from the magnetic region of the pixel itself, while the rest is contamination from the surrounding pixels and from the non-magnetic part of the observed pixel. Correcting for the unresolved structure of the pixel is quite difficult and I guess it can only be done statistically from the observation of many pixels. However, correcting for the smearing can be done if the PSF of the instrument is known. Given that the PSF of Hinode is quite well characterized, there have been recent efforts to deconvolve the data from it.
Deconvolution is an ill-posed problem. Under some simplifying conditions, the effect of a PSF (that we represent as the image $\mathbf{P}$) on an image $\mathbf{O}$ is given by: $$\mathbf{I}=\mathbf{O}*\mathbf{P}$$ where $\mathbf{I}$ is the degraded image and $*$ is the convolution operator. In the Fourier domain, the image degradation process transforms into a simple product:
$$\mathcal{I}=\mathcal{O}\mathcal{P}$$
where the calligraphic fonts refer to the Fourier transforms. Even with a good knowledge of the PSF, deconvolution is an ill-posed problem in the noisy case, given that the deconvolved image is obtained by dividing the Fourier transform of the degraded image by that of the PSF. If noise is present, the information about the high frequencies disappears and the resulting deconvolution becomes a mess.
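A one-dimensional toy example makes the point (pure numpy; the Gaussian PSF and the noise level are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 256
x = np.arange(n)
obj = np.exp(-0.5 * ((x - n / 2) / 5.0) ** 2)         # "true" signal O
psf = np.exp(-0.5 * ((x - n / 2) / 3.0) ** 2)
psf /= psf.sum()
psf_ft = np.fft.fft(np.fft.ifftshift(psf))            # PSF centered at the origin

img = np.real(np.fft.ifft(np.fft.fft(obj) * psf_ft))  # degraded image I = O * P
img += 1e-3 * rng.standard_normal(n)                  # add noise

naive = np.real(np.fft.ifft(np.fft.fft(img) / psf_ft))
# `naive` is dominated by amplified noise at the high frequencies,
# where `psf_ft` is essentially zero: the mess mentioned above.
```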
This can be solved using regularization. There is a recent effort by van Noort (2012) in which he carries out the inversion of the whole observed map while simultaneously taking into account the convolution process. That's a very neat idea but extremely time-consuming.
I have come up with an idea that is much simpler and also gives good results at a fraction of the computing time. Additionally, the deconvolution and the inversion of the Stokes profiles are decoupled, so one can apply the desired inversion scheme once the observations have been deconvolved. The idea is to impose the regularization on the Stokes profiles and not on the images, as is usually done in the deconvolution of natural images. Before reaching the telescope, the Stokes profiles at each pixel can quite generally be written as a series expansion in a set of orthonormal eigenfunctions. If these eigenfunctions are, for instance, the principal components obtained from the observations themselves, the expansion is very compact and can be truncated at a reduced number of eigenfunctions:
$$\mathbf{O}(\lambda)=\sum_{i=1}^N \mathbf{w}_i \phi_i(\lambda)$$
The resulting convolved image is then given by:
$$\mathbf{I}(\lambda)=\sum_{i=1}^N (\mathbf{w}_i * \mathbf{P}) \phi_i(\lambda) + \mathbf{N}$$
where $\mathbf{N}$ is a Gaussian noise image. The advantage of using orthonormal eigenfunctions is that we can project the previous equation onto the eigenfunctions and obtain:
$$\langle \mathbf{I}(\lambda), \phi_k(\lambda) \rangle=\mathbf{w}_k * \mathbf{P} + \mathbf{N}$$
where the noise is still Gaussian with zero mean and the same variance.
The previous equation tells us to deconvolve each of the images obtained by projecting the observed Stokes profiles onto the first $N$ eigenfunctions, and then to recover the final Stokes profiles from the deconvolved coefficient images and the eigenfunctions. The deconvolution is done with a standard Richardson-Lucy scheme, with the advantage that the coefficient images have a very small noise level. The reason is that the first $N$ projections of the Stokes profiles onto the principal components capture essentially all the real signal, while the remaining ones capture mostly noise. This way, the noise is efficiently filtered. One ends up with almost noiseless Stokes profiles to feed into the inversion code of choice.
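A compact sketch of the whole scheme could look as follows (assuming a recent scikit-image for the Richardson-Lucy step; the function name, the offset that keeps the Richardson-Lucy input positive, and the default numbers of modes and iterations are my choices):

```python
import numpy as np
from skimage import restoration

def deconvolve_stokes(cube, psf, n_modes=8, n_iter=20):
    """cube: (ny, nx, nlambda) maps of one Stokes parameter;
    psf: (my, mx) instrumental PSF, normalized to unit sum."""
    ny, nx, nl = cube.shape
    profiles = cube.reshape(ny * nx, nl)

    # Principal components of the observations themselves
    mean = profiles.mean(axis=0)
    _, _, vt = np.linalg.svd(profiles - mean, full_matrices=False)
    basis = vt[:n_modes]                         # eigenfunctions phi_i
    w = ((profiles - mean) @ basis.T).reshape(ny, nx, n_modes)

    # Richardson-Lucy deconvolution of each coefficient image
    for i in range(n_modes):
        m = w[:, :, i]
        offset = m.min() - 1e-6                  # RL wants positive input
        w[:, :, i] = restoration.richardson_lucy(
            m - offset, psf, num_iter=n_iter, clip=False) + offset

    # Reconstruct the (almost noiseless) Stokes profiles
    rec = w.reshape(ny * nx, n_modes) @ basis + mean
    return rec.reshape(ny, nx, nl)
```

Running this independently on I, Q, U and V would give deconvolved, denoised cubes ready for any inversion code.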
Friday, October 19, 2012
Brussels is a strange city. On the one hand, it is the "capital" of the European Union, full of huge government-like buildings with a lot of important-looking people walking around. On the other hand, it is the city where the hotel you booked goes bankrupt without informing its guests in advance, where a simple football match fills up every hotel, and where you see your life pass before your eyes when you take a taxi.
Saturday, October 13, 2012
On Monday we leave for a two-day meeting in Brussels about polarization in AGN. This is a good place to understand (if anybody does) how polarization occurs in AGN. The present paradigm of Seyfert galaxies is that Type-1 and Type-2 are exactly the same kind of object. Type-1 Seyferts are characterized by broad lines (expected to form very close to the black hole that serves as the engine of the AGN) together with narrow lines formed well above the black hole. Both kinds of lines are observed simultaneously in the intensity spectrum. Type-2 Seyferts show only narrow lines; broad lines are not conspicuous in these objects.
The present unification model identifies Type-1 sources with AGN observed face-on, while Type-2 sources correspond to AGN observed edge-on.
Although the broad line region (BLR) is not seen directly in Type-2 sources, the broad lines are observed in polarized light, as a consequence of scattering in the electron clouds above the black hole. Given that scattering off electrons does not depend on wavelength, the clouds behave essentially as a mirror in which the "reflected" light is polarized.
The central engine is expected to be surrounded by a dusty torus. Initially it was assumed to be a smooth distribution of dust, but a clumpy dusty torus gives better results in terms of explaining the observations. If the torus is clumpy, there is a point we still do not understand: in principle, the light from the central engine can also be scattered in the dust clumps and acquire polarization, which would be observed preferentially in the mid-infrared. Does it make sense to calculate/observe the spectropolarimetric characteristics of this scattered radiation? We'll see if we can answer this question in the meeting.
In a micro step ahead, the Metropolis-within-Gibbs code can now restart from a previous run. This is advantageous for doing the computation in batches and for testing whether the chain has already converged or whether we have enough samples for a nice posterior distribution plot.
I have also tested the more elaborate hierarchical Bayesian analysis of coronal loop oscillations. In one of our previous papers (Arregui & Asensio Ramos 2011), we applied a simple Bayesian analysis to oscillations of individual coronal loops. The measured period ($P$) and damping time ($\tau_d$) were related, in our model, to the Alfvén time $\tau_A$, the density contrast between the interior and exterior of the coronal loop $\xi$, and the ratio between the length and radius of the tube $l/R$: $$P=\tau_A \sqrt{2} \sqrt{\frac{\xi+1}{\xi}}$$ and $$\frac{\tau_d}{P} = \frac{2}{\pi} \frac{\xi+1}{\xi-1} \frac{1}{l/R}$$ Under the assumption that $P$ and $\tau_d$ are measured quantities, we applied an MCMC code to sample the posterior distributions. We observed a quite strong prior sensitivity for some of the parameters, but reliable inferences for others.
The idea now is to go much deeper in the analysis. On the one hand, we plan to get closer to the "measurement" phase. The period and the damping time are themselves inferred from the analysis of the fluctuations of coronal loops seen in satellite images of the corona. They are obtained by assuming that the oscillation (measured in pixels versus time from the initial time) follows the approximate model:
$$x(t)=x_\mathrm{osc}(t)+x_\mathrm{trend}(t)$$
The trend absorbs anything that is not purely oscillatory and is given by the following polynomial fit:
$$x_\mathrm{trend}(t)=a_0+a_1 t + a_2 t^2+a_3 t^3+ \cdots$$
The oscillatory part is modeled as a damped oscillation:
$$x_\mathrm{osc}(t)=A_0 \sin \left( \frac{2 \pi}{P} t + \phi \right) \exp \left( - \frac{t}{\tau_d} \right)$$
Consequently, assuming the previous expressions for the period and damping time, the $i$-th loop is described by the set of parameters:
$$\{(A_0)_i, (\tau_A)_i, \xi_i, (l/R)_i, \phi_i, \mathbf{a}_i\}$$
Depending on the degree of the trend polynomial, we have between 7 and 11 parameters to describe no more than a few dozen observed points.
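Putting the pieces together, the forward model for one loop could be coded as below (a sketch; the function name is mine, and $P$ and $\tau_d$ are obtained from the seismology relations quoted above):

```python
import numpy as np

def loop_displacement(t, A0, tau_A, xi, lR, phi, a):
    """Displacement (in pixels) of one loop at times t.

    `a` holds the trend coefficients (a_0, a_1, ...); P and tau_d
    follow from tau_A, the density contrast xi and l/R.
    """
    P = tau_A * np.sqrt(2.0 * (xi + 1.0) / xi)
    tau_d = P * (2.0 / np.pi) * (xi + 1.0) / (xi - 1.0) / lR
    x_osc = A0 * np.sin(2.0 * np.pi * t / P + phi) * np.exp(-t / tau_d)
    x_trend = np.polynomial.polynomial.polyval(t, np.asarray(a))
    return x_osc + x_trend
```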
Our idea is then to consider these parameters as realizations of parametric distributions (hence the hierarchical character) and to learn the parameters of those priors. We propose the following priors, whose parameters we are now trying to learn from the data:
$$(\tau_A)_i \sim \mathrm{IG}(\gamma,\delta)$$
$$(\xi)_i \sim \mathrm{IG}(\epsilon,\eta,1)$$
$$(l/R)_i \sim \mathrm{Beta}(\alpha,\beta,0,2)$$
$$(A_0)_i \sim \mathrm{Gamma}(0.01,0.01)$$
$$(a_j)_i \sim \mathrm{N}(0,\sigma^2)$$
with the data following a normal distribution around the oscillation+trend model above. We consider these distributions sufficiently general, and we now have to play with the hyperpriors for the parameters $\alpha$, $\beta$, $\gamma$, $\delta$, $\epsilon$ and $\eta$.
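For reference, the joint log-prior for one loop under these choices can be evaluated directly with scipy (a sketch; I read $\mathrm{IG}$ as the inverse-gamma distribution, take the third argument of $\mathrm{IG}(\epsilon,\eta,1)$ as a unit shift so that $\xi>1$, and take the extra arguments of the Beta as its support $[0,2]$ — these parameterizations are my assumptions):

```python
import numpy as np
from scipy import stats

def log_prior(tau_A, xi, lR, A0, a, hyper):
    """Sum of the log prior densities for one loop's parameters.

    hyper = (alpha, beta, gamma, delta, eps, eta, sigma).
    """
    alpha, beta, gamma_, delta, eps, eta, sigma = hyper
    lp = stats.invgamma.logpdf(tau_A, gamma_, scale=delta)
    lp += stats.invgamma.logpdf(xi, eps, loc=1.0, scale=eta)      # shifted: xi > 1
    lp += stats.beta.logpdf(lR, alpha, beta, loc=0.0, scale=2.0)  # support [0, 2]
    lp += stats.gamma.logpdf(A0, 0.01, scale=1.0 / 0.01)          # Gamma(0.01, 0.01)
    lp += stats.norm.logpdf(np.asarray(a), 0.0, sigma).sum()      # trend coefficients
    return lp
```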
Tuesday, October 9, 2012
In fact, applying the Metropolis-within-Gibbs code to the relatively simple hierarchical problem we devised to infer properties of the magnetic field in central stars of planetary nebulae was a success. Success means that the sampled marginal posteriors are similar to the analytical expressions we had obtained.
The model is quite simple. We use the observed Stokes V and Stokes I profiles to infer the magnetic field. We model the circular polarization of planetary nebula $i$ as
$$V_i=\alpha B_i \mu_i \frac{dI}{d\lambda}$$
The statistical model we propose, once all the available observations are taken into account, is the following:
$$V_i \sim \mathrm{N}(\alpha B_i \mu_i \, dI/d\lambda, \sigma^2)$$
$$B_i \sim \mathrm{Maxwell}(b_0)$$
$$\mu_i \sim \mathrm{U}(-1,1)$$
$$b_0 \sim \mathrm{Gamma}(\nu,\nu)$$
This is a hierarchical model in which we assume that the magnetic field strength follows a Maxwell distribution with scale $b_0$, over which we put a Gamma prior (which, for very small $\nu$, is equivalent to a Jeffreys prior).
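As a sanity check of the sampler, one can draw synthetic data from this generative model top-down (a sketch; $\alpha$, $\sigma$, $dI/d\lambda$ and the fixed $b_0$ are illustrative values, since a draw from $\mathrm{Gamma}(\nu,\nu)$ with very small $\nu$ is too diffuse to make a useful demo):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_stars, alpha, sigma = 100, 1.0, 1e-3

b0 = 100.0                                    # field scale; b0 ~ Gamma(nu, nu) in the model
B = stats.maxwell.rvs(scale=b0, size=n_stars, random_state=rng)
mu = rng.uniform(-1.0, 1.0, n_stars)          # mu_i ~ U(-1, 1)
dIdl = np.ones(n_stars)                       # derivative of Stokes I, known per star
V = rng.normal(alpha * B * mu * dIdl, sigma)  # noisy Stokes V amplitudes
```

Checking that the posterior for `b0` then concentrates around the value used to generate the data is the kind of consistency test described above.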
Monday, October 8, 2012
Today I managed to finish the Metropolis-within-Gibbs MCMC code. In principle, this code can be applied to several hierarchical Bayesian problems we have open at the moment. All of them share the property that the posterior depends on a large set of parameters $\theta_i$ that have a common prior, which in turn depends on hyperparameters. The advantage is that the influence of each $\theta_i$ on the posterior is independent of the remaining $\theta_j$ with $j \neq i$.
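For concreteness, a minimal version of such a sampler looks like this (a generic sketch, not the actual code; the Gaussian random-walk proposal and fixed step sizes are my simplifications):

```python
import numpy as np

def metropolis_within_gibbs(log_post, x0, step, n_samples, seed=0):
    """Sample from log_post by updating one coordinate at a time.

    Each coordinate gets a Gaussian random-walk proposal, accepted
    with the usual Metropolis rule conditional on the others.
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    lp = log_post(x)
    chain = np.empty((n_samples, x.size))
    for n in range(n_samples):
        for i in range(x.size):
            prop = x.copy()
            prop[i] += step[i] * rng.standard_normal()
            lp_prop = log_post(prop)
            if np.log(rng.uniform()) < lp_prop - lp:
                x, lp = prop, lp_prop        # accept the single-site move
        chain[n] = x
    return chain
```

Restarting from a previous run then simply amounts to passing the last stored sample as `x0`.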
The first problem is the detection of magnetic fields in central stars of planetary nebulae when the noise variance is left as an unknown. The likelihood then becomes a Student-t distribution, and the marginal posterior for the hyperparameters cannot be found in closed form, as it can when the noise variance is known. My guess is that Metropolis-within-Gibbs can be used to rapidly sample from this posterior. Other problems include obtaining global information about the physical parameters of coronal loop oscillations and about the magnetism of the quiet Sun.
In a short discussion, rms suggested developing a Bayesian meta-analysis of some atomic data in astrophysics. This is a really good idea that can be tackled within a hierarchical Bayesian model. We will work on it in the following weeks.