Sunday 17 August 2014

Slicing operations on 1-D spectra

For the last part of GSoC, I implemented slicing operations on spectrum objects. Pythonically, this means overriding the __getitem__ method. This method is passed the slice object and returns a new Spectrum1D object with the requested slicing applied. For example:
>>> from specutils.io import read_fits
>>> spectra = read_fits.read_fits_spectrum1d("test.fits")  # linear spectra file
>>> indexed_spectra = spectra[::2]  # a new spectrum object keeping every other point of the original
>>> spectra.dispersion
[320.66, 321.43, 322.77, 324.23, 326.97, 329.44]
>>> indexed_spectra.dispersion
[320.66, 322.77, 326.97]

The spectra can be sliced using the same interface as Python lists. Internally, an indexer model is used to slice the WCS. This is because the WCS is not a list; it is a mapping from input to output, so Python slicing operations cannot be applied to it directly. The indexer model provides an adapter that translates slice operations for the WCS. It also allows WCS'es to be sliced with negative indices.
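
As a rough illustration (a minimal sketch with a hypothetical class, not the actual specutils implementation), __getitem__ simply forwards the slice to the dispersion and flux arrays and builds a new object; the real code additionally routes the slice through the indexer adapter so the WCS stays consistent:

import numpy as np

class SlicedSpectrum:
    # Hypothetical stand-in for Spectrum1D, shown only to illustrate the slicing idea.
    def __init__(self, dispersion, flux):
        self.dispersion = np.asarray(dispersion)
        self.flux = np.asarray(flux)

    def __getitem__(self, item):
        # Apply the same slice to both arrays so they stay aligned.
        return SlicedSpectrum(self.dispersion[item], self.flux[item])

spec = SlicedSpectrum([320.66, 321.43, 322.77, 324.23, 326.97, 329.44],
                      [1.0, 1.1, 0.9, 1.2, 1.0, 0.8])
print(spec[::2].dispersion)   # [320.66 322.77 326.97]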

For GSoC, this is it. However, I will continue to contribute to Astropy, and specifically to the specutils package. The next step will be to enable transformations between different WCS'es and models, making it easier for users to convert back and forth between different formats.


Monday 4 August 2014

Adding new models to specutils

With the addition of new models, specutils has started to become a very useful tool. This time, I have added a Doppler WCS, a Weighted Combination WCS, and a Serial Composite WCS. These WCS'es, along with the previously implemented Polynomial, Legendre, Chebyshev and B-spline models, enable the user to build very complicated functions from simple building blocks.

Doppler WCS - This model applies a Doppler shift to the input. It is instantiated from the velocity (v), and the Doppler factor is computed using the following formula:

doppler factor = sqrt((1 + v/c) / (1 - v/c)), where c is the speed of light

When the model is called on an input, input * doppler_factor is returned. The inverse of this model can also be obtained, which divides the input by the Doppler factor. The Doppler shift model can also be instantiated from the redshift (z) or from the Doppler factor itself.
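
As a rough illustration (a minimal sketch, not the specutils API), the Doppler factor and its inverse use can be computed like this:

import numpy as np
from astropy import constants as const
from astropy import units as u

def doppler_factor(velocity):
    # Relativistic Doppler factor sqrt((1 + v/c) / (1 - v/c)).
    beta = (velocity / const.c).decompose()
    return np.sqrt((1 + beta) / (1 - beta))

factor = doppler_factor(3000 * u.km / u.s)
shifted = 6563 * u.AA * factor    # calling the model multiplies the input by the factor
restored = shifted / factor       # the inverse model divides it back out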

Weighted Combination WCS - This model combines multiple WCS'es using a weight and a zero point offset for each. Suppose there are n WCS'es. When called with an input, this WCS returns the following:

Sum over i = 1 to n of:  weight_i * (zero_point_offset_i + WCS_i(input))
A WCS can be added using the add_wcs method, along with its weight and zero point offset.

Serial Composite WCS - This model applies multiple WCS'es in order to a particular input. Suppose there are 4 WCS'es. When called, this WCS returns the following:
wcs_4(wcs_3(wcs_2(wcs_1(input))))
The WCS which is added first is applied first. WCS'es can be added using the add_wcs method.

The last two models can be used to construct complicated WCS'es just by adding simple models. Even plain Python callables can be added to these WCS'es, making them very flexible and useful.
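
A rough sketch of the idea behind these two composition models (hypothetical stand-in classes, not the specutils implementation) could look like this:

class WeightedCombinationWCS:
    # Sketch: returns the sum of weight * (zero point offset + wcs(input)).
    def __init__(self):
        self._parts = []

    def add_wcs(self, wcs, weight=1.0, zero_point_offset=0.0):
        self._parts.append((wcs, weight, zero_point_offset))

    def __call__(self, x):
        return sum(w * (off + wcs(x)) for wcs, w, off in self._parts)

class CompositeWCS:
    # Sketch: applies the added WCS'es in the order they were added.
    def __init__(self):
        self._chain = []

    def add_wcs(self, wcs):
        self._chain.append(wcs)

    def __call__(self, x):
        for wcs in self._chain:
            x = wcs(x)
        return x

combo = WeightedCombinationWCS()
combo.add_wcs(lambda x: 2 * x, weight=0.5)                          # plain callables work too
combo.add_wcs(lambda x: x + 10, weight=1.0, zero_point_offset=1.0)
print(combo(5))   # 0.5 * (0 + 10) + 1.0 * (1 + 15) = 21.0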

Apart from these models, specutils now supports reading and writing the multispec linear and log-linear formats as well. The IRAF classes were redesigned to use these building blocks, making the code simpler and more elegant. The next focus will be on implementing slicing operations on these models, and enabling transformations from one model to another.

Saturday 12 July 2014

Using specutils for reading and writing FITS files

I have just finished developing the readers and writers for the FITS IRAF format described here. I am going to give a small tutorial on how the specutils package (an Astropy affiliated package) can be used for this, and also give some details of how it works.

To read a FITS file, simply use the specutils.io.read_fits.read_fits_spectrum1d() method. This method might later be moved to the Spectrum1D class. The following code snippet gives an example:

from specutils.io import read_fits
spectra = read_fits.read_fits_spectrum1d('example.fits')

Depending on the type of FITS file, the returned object can be different. For linear or log-linear formats, the object returned will be a Spectrum1D object. For multispec formats, the object will be a list of Spectrum1D objects. The dispersion and the flux of a spectrum can be accessed using spectrum1Dobject.dispersion and spectrum1Dobject.flux respectively. The dispersion is automatically computed based on the type of dispersion function defined in the header. Other operations, such as slicing the dispersion for a particular set of flux coordinates, will be supported soon.
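
For example (assuming a multispec file with several spectra; the file name here is made up):

from specutils.io import read_fits

spectra = read_fits.read_fits_spectrum1d('multispec_example.fits')
first = spectra[0]            # a Spectrum1D object
print(first.dispersion[:5])   # wavelengths computed from the header
print(first.flux[:5])         # the corresponding flux values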

To write a FITS file, use the specutils.io.write_fits.write() method. Like the reader, this method may be moved to Spectrum1D later. The following code snippet gives an example:

from specutils.io import write_fits
write_fits.write(spectra, 'example.fits')

The spectra object passed can be a list of Spectrum1D objects or a single Spectrum1D object. Depending on the type of the argument, the writer will write in multispec or linear format respectively. If the spectra were read in using the reader, the additional information from the read file will also be written back out.
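
For instance (a sketch; the file names here are made up), writing a list produces a multispec file while writing a single spectrum produces a linear one:

from specutils.io import read_fits, write_fits

spectra = read_fits.read_fits_spectrum1d('multispec_example.fits')   # a list of Spectrum1D objects
write_fits.write(spectra, 'copy_multispec.fits')       # multispec format
write_fits.write(spectra[0], 'single_spectrum.fits')   # linear format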

There will be some more operations supported, but the focus will be on keeping things simple, and handling the complexities in the code itself.

Saturday 21 June 2014

Analysing IRAF multispec format FITS files

Today I am going to analyze a couple of FITS files which have been stored in the IRAF multispec specification described here. I am going to focus on the Legendre and Chebyshev polynomial dispersion functions.
First, take a quick look at the FITS headers in the following files:
  1. Legendre dispersion file headers: http://pastebin.com/FqnSYwGe
  2. Chebyshev dispersion file headers: http://pastebin.com/NDKqNj6n

NAXIS defines the number of dimensions. In the multispec format, there are always two dimensions, since multiple one-dimensional spectra are stored together. These headers have CTYPE1 and CTYPE2 equal to `MULTISPE`, which is necessary to indicate that the spectra are stored in multispec format. NAXIS1 gives the size of the data (the length of the flux array) for each spectrum, and NAXIS2 gives the number of such one-dimensional spectra. In both files, 51 spectra are stored.

One of the most important header keywords is WAT2_XXX. This keyword stores the information needed to compute the dispersion at each point, for each spectrum. `specK` holds the information for the Kth spectrum. There are various numbers separated by spaces within each `specK`, and these numbers describe the exact function to be used to compute the dispersion values. The following list explains these numbers in order:

  1. Aperture number: According to Wikipedia, the aperture number is directly or inversely proportional to the exposure time. This entry always holds an integer value. For both files, this number goes from 1 to 51. This value has no effect on the calculation of the dispersion.
  2. Beam number: I am not sure what this means, but it is always an integer value. For the Legendre file it decreases from 88 to 38, and for the Chebyshev file it increases from 68 to 118.
  3. Dispersion Type: This can be 0 (linear dispersion), 1 (log-linear dispersion) or 2 (non-linear dispersion). As both these files define non-linear polynomial functions, this is always 2.
  4. Dispersion start: This value indicates the dispersion at the first physical pixel. It is not used for computation; however, it can be used to verify whether the function gives the correct output at the first pixel. Unfortunately, this value is the same for all 51 spectra in both files, which implies that it has not been stored correctly. The value matches the output of the 51st spectrum's dispersion function.
  5. Average dispersion delta: This value is equal to the mean of the difference between consecutive dispersion values. Again, it is not used for computation, but can be used to verify the function output. Similar to the previous value, it has been stored incorrectly in both files; it is only correct for the 51st spectrum.
  6. Number of pixels: This value indicates the length of the flux array of this spectrum. It can be at most the value of NAXIS1, and should be equal to PMAX - PMIN (defined later).
  7. Doppler factor (Z): Due to the relative motion of the object and the observer, the Doppler effect can alter the dispersion values. This factor can be used to compute the adjusted dispersion values, using the formula below:

                                 Adjusted dispersion = Dispersion / (1 + Z)
     
  8. Aperture low: This value is for information only. It stores the lower limit of the spatial axis used to compute this dispersion.
  9. Aperture high: Again, this value is for information only. It stores the upper limit of the spatial axis used to compute this dispersion.

    From this point, the function descriptors start. There can be more than one function; in that case these descriptors are repeated, starting from the weight. These descriptors determine the function output. The final dispersion is calculated as:

                                Final dispersion = Sum of all function outputs
     
  10. Weight:  The weight of the function gives the multiplier for the calculated dispersion. Its use becomes more obvious in the formula below.
  11. Zero point offset: The value to be added to all the dispersion values. Combined with the weight and the Doppler factor, the function output can be calculated as:

        Final function output = Weight * (Zero point offset + function output) / (1 + Z)

    In the files given, there is only one function. The Doppler factor and the zero point offset are zero, and the weight is one. So the final dispersion is equal to the function output.
     
  12. Function type code: Until this point, we know how to calculate the final dispersion, if we know the function output. This value stores the type of the function that will be used to compute the output at any given pixel. There are six possibilities:
    1 => Chebyshev polynomial, 2 => Legendre polynomial, 3 => Cubic spline,
    4 => Linear spline, 5 => Pixel coordinate array, and 6 => Sampled coordinate array

    Starting from this point, the numbers may mean different things for different functions. I am explaining the descriptors for Legendre and Chebyshev.
     
  13. Order (O): The order of the Legendre or Chebyshev function.
  14. Minimum pixel value (Pmin): The lower limit of the range of the physical pixel coordinates.
  15. Maximum pixel value (Pmax): The upper limit of the range of the physical pixel coordinates. In combination with the lower limit, they determine the domain of the function. This domain should be mapped to [-1, 1].
  16. Coefficients: There are O coefficients that follow. These coefficients define the Legendre or the Chebyshev functions.
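
To make the calculation concrete, here is a minimal sketch (with made-up example numbers, not values taken from the files above) of how the dispersion for a single Chebyshev descriptor could be evaluated:

import numpy as np
from numpy.polynomial import chebyshev

def multispec_dispersion(pmin, pmax, coefficients, weight=1.0,
                         zero_point_offset=0.0, z=0.0):
    # Evaluate one Chebyshev function descriptor over its pixel range.
    pixels = np.arange(pmin, pmax + 1)
    # Map the physical pixel range [Pmin, Pmax] onto the domain [-1, 1].
    normalized = (2 * pixels - (pmax + pmin)) / (pmax - pmin)
    function_output = chebyshev.chebval(normalized, coefficients)
    # Final function output = Weight * (Zero point offset + output) / (1 + Z)
    return weight * (zero_point_offset + function_output) / (1 + z)

# Hypothetical descriptor: order 3, pixels 1..2048, three coefficients.
dispersion = multispec_dispersion(1, 2048, [5000.0, 100.0, 2.0])
print(dispersion[0], dispersion[-1])   # dispersion at the first and last pixel
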
And that's it. It's a bit tedious to understand, but the format enables a lot of information to be stored. The documentation is not very clear, and I hope this post helped you understand what these parameters stand for.

Friday 30 May 2014

FITS, not so fit

In my previous post, I introduced the FITS format, which can be used to store spectral data. One of the aims of my GSoC project is to implement a reader for this format. The FITS format can hold a very diverse range of information. It has also been around for more than 30 years and has evolved a lot in that time. The basic keywords in this format are well defined, but the newer keywords for storing spectral data are not. This has made the format really messy: there are different keywords that imply the same thing, as well as identical keywords that imply different things. The documentation of this format is incomplete, there are minimal examples, and there is no concrete definition to make things less ambiguous.

To tackle this issue, we decided to approach the format one specification at a time. The first target is long-slit, multi-object or echelle spectrographs (one-dimensional). The reader being implemented follows the Image Reduction and Analysis Facility (IRAF) standard described here. Basically, the FITS files can store mappings from pixel coordinates to wavelengths (or dispersion). These mappings are defined using functions. To save space, many of these one-dimensional mappings are stored together in one FITS file. For each sequence of coordinates, the mapping function is defined in the header. This kind of format is called multispec. There are eight possible dispersion function types that can be defined in the multispec format:

  1. Linear
  2. Log-Linear
  3. Legendre polynomial
  4. Chebyshev polynomial
  5. Linear Spline
  6. Cubic Spline
  7. Pixel array
  8. Sampled array
The first three were already implemented in the specutils package. I added the functionality for the fourth function, and I am currently implementing the linear and cubic splines. I have been searching everywhere for files that store data in this format, but nothing has turned up. Still, for completeness, I am trying to decipher the format specification and implement a reader in case someone has such files. If you do have files in these formats, please get in touch with us!

Understanding these mathematical models has been very challenging, but fulfilling and interesting. Hopefully, with the contribution of developers, the reader will be ready by the midterms! Until next time...

Sunday 18 May 2014

Google summer of code - Revising my passion

I clearly remember my pre-university days, when I was a member of a non-profit organisation called S.P.A.C.E in India. They educate students from all over India on astronomy, and their way of teaching was most interesting. I was part of their practical sessions, including star-gazing sessions from remote parts of India and experiments to measure the circumference of the Earth. They also provided their students with kits, which included a magic stick to measure shadows, a sky map and even a small telescope! I also won a nation-wide contest organised by them, which led me to be part of the team that observed the total solar eclipse in Turkey in 2006. As I entered university, I got lost in Computer Science and slowly my interest in astronomy dwindled. Now, as my university life comes to an end, I have the opportunity to rekindle my passion via the Google Summer of Code program. I will be contributing to Astropy under the umbrella of the Python Software Foundation.

Astropy is a software package written in Python intended to assist astronomy-related computations. I will be contributing to the specutils package. Spectral data has been collected over centuries in various formats. This package will enable reading spectral data into easy-to-use data structures, manipulating the data using utility functions, and writing the data out in various formats. The most common format for storing spectral data is FITS (Flexible Image Transport System), which is endorsed by both NASA and the IAU. Supporting this format will be a major goal of this project. There are various spectral mappings that can be defined in a FITS file. These mappings define functions which give the wavelength (or energy or frequency) at a particular point in the data array. I will go into the details of these mappings and how they are defined in FITS in subsequent blog posts.


Official coding starts tomorrow, 19th May 2014. I am looking forward to a successful project!