by Arjun Radhakrishnan supervised by Prof. Michael Inggs

Accelerating Coherent Pulsar De-dispersion on Graphics Processing Units by Arjun Radhakrishnan supervised by Prof. Michael Inggs

Outline
Graphics Processing Units (GPUs)
Pulsars
Pulsar De-dispersion
Motivation
Implementation
Results
Conclusion & Future Work

Graphics Processing Units
GPUs are massively parallel processors found on consumer graphics cards.
They are generally used to render 3D objects on screen and to calculate the colour of each pixel to display.
They are mass-market products thanks to the video game industry.
Performance tracks Moore's Law, since the majority of on-chip space is devoted to compute units, whereas CPUs devote much of it to cache.
*Source: [7]

Why Use GPUs?
Figure 1: Peak floating point performance of NVIDIA GPUs vs Intel CPUs [2]

Pulsars
Pulsars are highly magnetised, rapidly rotating neutron stars formed after a supernova.
They emit beams of electromagnetic radiation from their magnetic poles.
The beams sweep in a circular path, known as the “lighthouse effect”.
A periodic pulse is observed each time a beam sweeps past Earth.
Figure 2: Pulsar Model [3]

Pulsar Dispersion
Pulsar emissions are distorted as they pass through the ionised Interstellar Medium (ISM).
Lower-frequency components of the pulse are delayed more than higher-frequency components.
The dispersion is corrected by shifting the received signal in time by a frequency-dependent amount.
Figure 3: Pulsar De-dispersion [4]
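
To make the size of the delay concrete, the sketch below evaluates the standard cold-plasma dispersion delay between two frequencies. The formula and the constant are the textbook values (see e.g. Lorimer and Kramer); the function name is hypothetical, and the example numbers (DM of 50 pc/cm3, a 100 MHz band centred on 1450 MHz) are those of the test data described under Implementation Considerations.

```python
# Dispersion constant in MHz^2 pc^-1 cm^3 s (standard textbook value).
D_CONST = 4.148808e3


def dispersion_delay_ms(dm, f_lo_mhz, f_hi_mhz):
    """Delay in milliseconds of f_lo relative to f_hi.

    dm is the dispersion measure in pc/cm^3; frequencies are in MHz.
    (Hypothetical helper name, for illustration only.)
    """
    return 1e3 * D_CONST * dm * (f_lo_mhz ** -2 - f_hi_mhz ** -2)


if __name__ == "__main__":
    # Band edges of a 100 MHz band centred on 1450 MHz, DM = 50 pc/cm^3.
    delay = dispersion_delay_ms(50.0, 1400.0, 1500.0)
    print(f"Delay across the band: {delay:.2f} ms")
```

For these values the bottom of the band arrives roughly 13.6 ms after the top, which is the smearing that de-dispersion has to undo.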

Coherent De-dispersion
Coherent de-dispersion is the most accurate method of removing the dispersion effects of the ISM.
It preserves the amplitude and phase information of the received signal.
The voltage signal is convolved with the inverse transfer function of the ISM.
This transfer function depends on the Dispersion Measure (DM) of the signal, obtained from models of the galactic electron density.
In practice, the Fast Fourier Transform (FFT) is used to turn the convolution into a multiplication in the frequency domain, followed by an inverse FFT.
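
The FFT, multiply, inverse-FFT sequence can be sketched in a few lines. This is a minimal CPU illustration, not the implementation from the talk: it assumes complex baseband samples, uses the textbook form of the ISM transfer function (a unit-amplitude chirp, so its inverse is its complex conjugate), and includes a small pure-Python radix-2 FFT only to stay self-contained. All function names are hypothetical, and the sign of the phase depends on the receiver's conventions.

```python
import cmath
import math

D_CONST = 4.148808e3  # dispersion constant, MHz^2 pc^-1 cm^3 s


def fft(x, inverse=False):
    """Unnormalised recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def ifft(x):
    """Inverse FFT with the 1/n normalisation applied."""
    n = len(x)
    return [v / n for v in fft(x, inverse=True)]


def chirp(n, dm, f0_mhz, bw_mhz):
    """ISM transfer function H sampled on the n FFT bins (assumed sign convention)."""
    h = []
    for k in range(n):
        kk = k if k < n // 2 else k - n   # bins above n/2 are negative offsets
        f = kk * bw_mhz / n               # offset from the centre frequency, MHz
        phase = (2.0 * math.pi * D_CONST * 1e6 * dm * f * f
                 / (f0_mhz ** 2 * (f0_mhz + f)))
        h.append(cmath.exp(1j * phase))
    return h


def dedisperse(voltage, dm, f0_mhz, bw_mhz):
    """Multiply the spectrum by H^-1 (= conj(H), since |H| = 1) and transform back."""
    h = chirp(len(voltage), dm, f0_mhz, bw_mhz)
    spectrum = fft(voltage)
    corrected = [v * hk.conjugate() for v, hk in zip(spectrum, h)]
    return ifft(corrected)
```

A quick check is to smear an impulse by multiplying its spectrum by `chirp(...)` and confirm that `dedisperse` restores it. A narrow 1 MHz band keeps the pure-Python FFT fast; a real pipeline would use an optimised FFT library on much longer blocks.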

Motivation
Why study pulsars?
A major SKA science driver: detection of gravitational waves, tests of strong-field relativity, and analysis of black holes.
GPU acceleration for MeerKAT
Large frequency range (Low: 0.5 – 2.5 GHz, High: 8 – 14.5 GHz)
High bandwidth per polarisation (4 GHz final)
Large number of channels (16384)
More than 10 GB of data per second
Even more important for the SKA, since precision will be a high priority and data storage is not feasible.

Implementation Considerations
Both the CPU and the GPU were tested with single-precision floating point.
A bottleneck in GPU computing is the time taken to send data to the GPU from main memory, so transfers are minimised as far as possible:
Use asynchronous data transfers to hide the latency.
Re-calculate data on the GPU rather than copying it across.
Use shared memory on the GPU for calculations and store to global memory only at the end.
The source data file is simulated dual-polarisation data generated with a DM of 50 pc/cm3 and a 100 MHz bandwidth centred on 1450 MHz.

Basic Program Flow
Figure 4: Program flow (columns: HOST and DEVICE)
Steps shown: read in data; allocate memory on GPU; copy to GPU memory; initiate GPU kernel; begin de-dispersion; parallel FFTs; per-frequency multiplication V(f0) . H^-1(f0), V(f1) . H^-1(f1), ..., V(fn) . H^-1(fn); inverse FFTs; combination (+) into the output array; send data back to host; receive de-dispersed signal; free memory.

Results
Figure 5: Left: Overall speedup (5x). Right: Kernel speedup (12x).

Results
A single GPU was able to coherently de-disperse 50 MHz of bandwidth.
Two GPUs were used for the full 100 MHz.
Scaling across multiple GPUs was linear.
Larger transfer functions were found to increase performance, since memory access overhead was lower.
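
For a sense of scale on transfer-function size: the transfer function has to span at least the dispersive smear across the band. The sketch below estimates that smear in samples and rounds up to a power-of-two FFT length. It assumes complex sampling at a rate equal to the bandwidth, which the slides do not state, and the helper names are hypothetical; it is not a description of how the lengths in the talk were chosen.

```python
import math

D_CONST = 4.148808e3  # dispersion constant, MHz^2 pc^-1 cm^3 s


def smear_samples(dm, f0_mhz, bw_mhz):
    """Dispersive smear across the band, in samples.

    Assumes complex sampling at bw_mhz million samples per second.
    """
    f_lo = f0_mhz - bw_mhz / 2.0
    f_hi = f0_mhz + bw_mhz / 2.0
    delay_s = D_CONST * dm * (f_lo ** -2 - f_hi ** -2)
    return math.ceil(delay_s * bw_mhz * 1e6)


def min_fft_length(dm, f0_mhz, bw_mhz):
    """Smallest power of two that is at least as long as the smear."""
    n = smear_samples(dm, f0_mhz, bw_mhz)
    return 1 << (n - 1).bit_length()


if __name__ == "__main__":
    # Test-data parameters from the talk: DM 50 pc/cm^3, 100 MHz at 1450 MHz.
    print(smear_samples(50.0, 1450.0, 100.0))
    print(min_fft_length(50.0, 1450.0, 100.0))
```

Under these assumptions the smear for the test data is about 1.36 million samples, so the shortest usable power-of-two length is 2^21. In practice longer blocks are preferred, because only the part of each block beyond the smear yields valid output.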

Conclusion
GPUs are significantly faster than CPUs for de-dispersion.
They enabled real-time coherent de-dispersion for the dataset used.
Coherent de-dispersion of a 100 MHz bandwidth signal requires multiple GPUs at present.
Faster memory access would greatly improve the overall speedup.
Testing with real undetected pulsar data is currently under way.

Thank You! Questions?

References
[1] D. R. Lorimer and M. Kramer, Handbook of Pulsar Astronomy, Cambridge University Press, 2005
[2] NVIDIA CUDA Programming Guide
[3] D. Manchester, “CSIRO ATNF Pulsar Education Page”
[4] Jim Cordes, “The SKA as a Radio Synoptic Survey Telescope: Widefield Surveys for Transients, Pulsars and ETI”, SKA Memo 97
[5] John Rowe Animation/Australia Telescope National Facility, CSIRO [Online].
[6] Cornell University Dept. of Astronomy, “Legacy Pulsars: Homepage” [Online].
[7] VR-Zone, “The NVIDIA GeForce GTX 280 1GB bare,” [Online]. zone.com/articles/nvidia-geforce-gtx-280-preview/5872.html?doc=5872
