Low-Frequency Pulsar Surveys and Supercomputing
Matthew Bailes
Outline:
  Baseband Instrumentation
  MultiBOB
  MWA survey vs PKSMB survey
    Data rates
    CPU times
  Low-Frequency Pulsar Monitoring
  The Future Supercomputers
Pulsar “Dedispersion” Incoherent
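Incoherent dedispersion shifts each filterbank channel by its dispersion delay before summing across the band. The sketch below is a minimal illustration, not the survey pipeline: the function names are hypothetical, and the 1500 MHz band top in the usage note is an assumed value that does not appear on the slides. The dispersion constant is the standard cold-plasma value.

```python
# Dispersion constant in s MHz^2 pc^-1 cm^3 (standard cold-plasma value).
K_DM = 4.148808e3


def dispersion_delay_s(dm, f_lo_mhz, f_hi_mhz):
    """Delay in seconds of f_lo relative to f_hi for a dispersion measure dm (pc/cc)."""
    return K_DM * dm * (f_lo_mhz ** -2 - f_hi_mhz ** -2)


def channel_shifts(dm, f_top_mhz, chan_bw_mhz, nchan, tsamp_s):
    """Whole-sample shift for each channel, relative to the centre of the top channel.

    Channels are assumed (hypothetically) to be ordered from high to low frequency.
    """
    f_ref = f_top_mhz - 0.5 * chan_bw_mhz
    shifts = []
    for i in range(nchan):
        f_chan = f_top_mhz - (i + 0.5) * chan_bw_mhz
        shifts.append(round(dispersion_delay_s(dm, f_chan, f_ref) / tsamp_s))
    return shifts
```

For example, `channel_shifts(100.0, 1500.0, 300.0 / 1024, 1024, 64e-6)` gives the per-channel shifts for a 1024-channel, 300 MHz, 64 µs filterbank at DM = 100 pc/cc under the assumed band top. Smearing within each channel is not removed by this method; removing it is what coherent dedispersion is for.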
Coherent Dedispersion
  Unresolved on µs timescales
  From young or millisecond pulsars
  Power-law distribution of energies
  PSR J
Pulsar Timing (Kramer et al.)
CPSR2 Timing (Hotan, Bailes & Ord)
Swinburne Baseband Recorders etc
  1998: Canadian S2 to computer (16 MHz x 2), 100K system + video tapes
  2000: CPSR 20 MHz x 2 + DLT7000 drives x
  : CPSR2 128 MHz x 2 + real-time supercomputer (60 cores)
  2006: DiFX (Deller, Tingay, Bailes & West) Software Correlator (ATNF adopted)
  2007: APSR 1024 MHz x 2 + real-time supercomputer (160 cores)
  2008: MultiBOB 13 x 1024 ch x 64 µs + fibre core supercomputer
dspsr software
  Mature
  Delivers < 100 ns timing on selected pulsars
  Total power estimation every 8 µs with RFI excision
  Write a “loader”
  Can do:
    Giant pulse work
    Pulsar searching (coherent filterbanks)
    Pulsar timing/polarimetry
    Interferometry with pulsar gating
PSRDADA (van Straten) psrdada.sourceforge.net
  Generic UDP data capture system (APSR/MultiBOB)
  Ring Buffer(s)
  Can attach threads to fold/dedisperse etc
  Hierarchical buffers
  Shares available CPU resources/disk
  Web-based control/monitoring
  Free! + hooks to dspsr & psrchive.
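The ring-buffer idea can be sketched in a few lines: a capture thread writes blocks into a fixed set of slots and a processing thread (fold, dedisperse, etc.) drains them in order. This is a generic illustration only and is not PSRDADA's API; the class and method names are hypothetical.

```python
import threading


class RingBuffer:
    """Fixed number of block slots shared by one writer and one reader (illustrative)."""

    def __init__(self, nslots):
        self.slots = [None] * nslots
        self.nslots = nslots
        self.head = 0  # total number of blocks written so far
        self.tail = 0  # total number of blocks read so far
        self.cond = threading.Condition()

    def write(self, block):
        with self.cond:
            # Buffer full: wait until the reader frees a slot.
            while self.head - self.tail == self.nslots:
                self.cond.wait()
            self.slots[self.head % self.nslots] = block
            self.head += 1
            self.cond.notify_all()

    def read(self):
        with self.cond:
            # Buffer empty: wait until the writer fills a slot.
            while self.head == self.tail:
                self.cond.wait()
            block = self.slots[self.tail % self.nslots]
            self.tail += 1
            self.cond.notify_all()
            return block
```

In this sketch the writer blocks when the buffer is full. A real-time UDP capture cannot pause the incoming stream, so a production system would have to size the buffer generously or handle overruns some other way; that choice is outside what the slide states.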
APSR
  Takes 8 Gb/s voltages
  Forms: 16 x 128 channels (with coherent dedispersion), 4 Stokes, umpteen pulsars
  Real-time fold to DM=250 pc/cc.
  O(100) Ops/sample
  Sustaining >>100 Gflops
  ~100K computers.
  June MHz 4bits 768 MHz 2bits
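The APSR figures can be cross-checked with simple arithmetic. The sketch below assumes real (Nyquist) sampling of a 1024 MHz band in two polarisations at 2 bits per sample; that combination is an assumption chosen because it reproduces the quoted 8 Gb/s, and the function names are hypothetical.

```python
def sample_rate(bw_hz, npol):
    """Samples per second for real Nyquist sampling: 2 * BW per polarisation."""
    return 2 * bw_hz * npol


def bit_rate(bw_hz, npol, nbits):
    """Voltage data rate in bits per second."""
    return sample_rate(bw_hz, npol) * nbits


def required_flops(bw_hz, npol, ops_per_sample):
    """Sustained operations per second at a fixed cost per sample."""
    return sample_rate(bw_hz, npol) * ops_per_sample


if __name__ == "__main__":
    # Assumed configuration: 1024 MHz x 2 polarisations, 2-bit samples.
    print(bit_rate(1024e6, 2, 2) / 1e9, "Gb/s")
    print(required_flops(1024e6, 2, 100) / 1e9, "Gflops at 100 ops/sample")
```

Under these assumptions the rate is about 8.2 Gb/s, and at O(100) operations per sample the load is a few hundred Gflops, consistent with the slide's ">>100 Gflops".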
Coherent Dedispersion BW/time
  [Figure: bandwidth (BW) versus year; points annotated (100K) and (300K)]
Coherent Dedispersion Now “trivial” FFT ease ~ B -2 / 3
MultiBOB
  High Resolution Universe Survey (PALFA of the South)
  Werthimer’s iBOB boards
  1024 channels, down to 10 µs sampling
  Two pols
  FPGA coding hard…
  Use software gain equalizer/summer
  ~5 MB/s per beam
  1 Gb/s Fibre to Swinburne (>1000 km fibre)
  Real time searching!
New PKS MB Survey:
  Survey    Beams  Time/pointing  Channels  BW       Sampling  Coverage
  Bailes    13     9 minutes      1024      300 MHz  64 µs     +/- 15 deg
  Kramer    13     70 minutes     1024      300 MHz  64 µs     +/- 3.5 deg
  Johnston  13     4.5 minutes    1024      300 MHz  32 µs     The rest
MWA Samples
  Takes (24x1.3MHz=32 MHz) x 2 x 512
  “Just” 32 GB/s (64 Gsamples/s)
  FFTs it (5 log2(N) ops/pt = 2.2 Tflops)
  X-multiplies & adds: (512)*256*B*4 = 16 TMACs
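The MWA rates follow from the slide's own inputs. In the sketch below the FFT length of 128 points and the 4 bits per sample are assumptions, picked because they reproduce the quoted ~2.2 Tflops and 32 GB/s; the function name is hypothetical. The correlator term uses the slide's formula as written, which reads as roughly N^2/2 tile pairs times 4 polarisation products times bandwidth.

```python
import math


def mwa_rates(bw_hz=32e6, npol=2, ntiles=512, nfft=128, nbits=4):
    """Return (samples/s, bytes/s, FFT flops, correlator MACs/s)."""
    ninputs = npol * ntiles
    samples_per_s = 2 * bw_hz * ninputs              # Nyquist sampling of every input
    bytes_per_s = samples_per_s * nbits / 8          # nbits is an assumed value
    fft_flops = 5 * math.log2(nfft) * samples_per_s  # 5 log2(N) ops per point
    macs_per_s = ntiles * (ntiles // 2) * bw_hz * 4  # slide: 512 * 256 * B * 4
    return samples_per_s, bytes_per_s, fft_flops, macs_per_s


if __name__ == "__main__":
    samples, nbytes, flops, macs = mwa_rates()
    print(samples / 1e9, "Gsamples/s")
    print(nbytes / 1e9, "GB/s")
    print(flops / 1e12, "Tflops (FFT)")
    print(macs / 1e12, "TMACs (cross-multiply)")
```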
Sensitivity:
  ~3-5x PKS
  32 vs 288 MHz
  350 vs 25 K
  700 vs 0.6 deg^2 (folded factor)
PKS vs MWA
  G ~ 3-5 x better
  T_sys ~ 14 x worse ?
  B^(1/2) ~ 3 x worse
  Flux ~ 25 x better (1400 vs 200 MHz)
  t^(1/2) ~ 32 x better
  ~ Parity
  Single Pulse work ~ Comparable
  Coherent search ~ 32x improvement!
  But: There is a limit to the time you can observe a pulsar!
  4m vs 144m -> 5x deeper.
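The comparison multiplies radiometer-equation factors. The sketch below derives the system-temperature and bandwidth penalties from the quoted 350 vs 25 K and 32 vs 288 MHz, and takes the gain and flux advantages directly from the slide. The gain of 4 is an assumed midpoint of the 3-5x range, and the function names are hypothetical.

```python
import math


def radiometer_penalties(tsys_mwa_k=350.0, tsys_pks_k=25.0,
                         bw_mwa_mhz=32.0, bw_pks_mhz=288.0):
    """Factors by which the MWA is worse than PKS: (T_sys ratio, sqrt(bandwidth ratio))."""
    tsys_penalty = tsys_mwa_k / tsys_pks_k
    bw_penalty = math.sqrt(bw_pks_mhz / bw_mwa_mhz)
    return tsys_penalty, bw_penalty


def per_pointing_ratio(gain=4.0, flux=25.0):
    """MWA/PKS sensitivity for equal integration time (gain=4 is an assumed midpoint)."""
    tsys_penalty, bw_penalty = radiometer_penalties()
    return gain * flux / (tsys_penalty * bw_penalty)


if __name__ == "__main__":
    print(radiometer_penalties())
    print(per_pointing_ratio())
```

With these inputs the equal-time ratio is of order unity, which matches the "parity" and "comparable" entries; the t^(1/2) ~ 32x term is then what gives a coherent search its advantage.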
Scattering b=0 1,10,100,1000ms
Scattering b=5d 1,10,50,100ms
b=30 0.5,1ms
Search instrumentation?
  [Diagram: Volts (36 GB/s; 32 MHz, 5 bits x 512) -> FX -> Spectra/Visibilities -> Grid, 2D FFT (uv) -> FBanks -> Dedisp -> Spectra FFT -> Fold -> Pulsars (<1 bit/s). Stage labels include 200 GB/s and 32 bits x 512; branches marked “Correlator” and “Us”, with two stages marked “?”.]
Search Timings
  36,000 “coherent beams”: (768m/4m=192)^2
  36 gigapixels/s
  Dedisperse/CPU core: Gigapixel/120s
  36 x 120 = 4320 cores = 500 machines = 250 kW
  N_FFT = 36,000 * 1024 (DMs)/8192 = 4608 FFTs/sec
  Seek (3s / 8192 x 1024 pt FFT)
  14,000 cores ~ 1800 machines = MW. (M$/yr)
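The core counts follow from three inputs on the slide: the pixel rate, 120 s of one core per gigapixel for dedispersion, and 3 s of one core per long FFT in Seek. The sketch below reproduces that arithmetic; the function name is hypothetical, and 8 cores per machine is taken from the dual quad-core nodes described for the Green Machine.

```python
def search_budget(nbeams, gpix_per_s, dedisp_s_per_gpix=120.0, ndm=1024,
                  fft_len_s=8192.0, seek_s_per_fft=3.0):
    """Return (dedispersion cores, FFTs per second, Seek cores) for real-time searching."""
    dedisp_cores = gpix_per_s * dedisp_s_per_gpix
    ffts_per_s = nbeams * ndm / fft_len_s  # one FFT per beam per DM per 8192 s
    seek_cores = ffts_per_s * seek_s_per_fft
    return dedisp_cores, ffts_per_s, seek_cores


if __name__ == "__main__":
    nbeams = (768 // 4) ** 2  # 192^2 coherent beams
    dedisp, ffts, seek = search_budget(nbeams, 36.0)
    cores_per_machine = 8     # dual quad-core nodes
    print(nbeams, "beams")
    print(dedisp, "dedispersion cores,", dedisp / cores_per_machine, "machines")
    print(ffts, "FFTs/s,", seek, "Seek cores,", seek / cores_per_machine, "machines")
```

Calling `search_budget(1024, 1.0)` gives the scaled-down 32x32-tile case, where the same formula yields 120 dedispersion cores and 128 FFTs/s.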
Swinburne: The Green Machine
  installed May/June 2007
  185 Dell PowerEdge 1950 nodes
  2 quad-core processors (Clovertown: Intel Xeon 64-bit 2.33 GHz)
  16GB RAM
  1TB disk -> 300 TB total
  1640 cores/14 Tflops
  dual channel gigabit ethernet
  CentOS Linux OS
  job queue submission
  20 Gb infiniband (Q1 2008)
  83 kW vs. 130 kW cooling
  Machines: ~1.2M
  Fuel: ~100K/yr
Search Times:
  Depend only upon: Npixels x Nchans x Tsamp^-1
  Requires: No acceleration trials
  PSR J In 8192s, small width from acceleration
Search Timings (32x32 tiles)
  >1024 “coherent beams”
  36->1 gigapixels/s
  Dedisperse/core: Gigapixel/120s
  1 x 120 = 120 cores = 15 machines = 7 kW
  N_FFT = 1024 * 1024 (DMs)/8192 (s/FFT) = 128 FFTs/sec
  Seek (3s / (8192 x 1024) pt FFT)
  378 cores ~ 50 machines = 25 kW.
RRATs
  Log N - Log S (helps with long pointings…)
  1000 x integration time.
  Maybe good RRAT finder.
Monitoring: Monitoring?
Monitoring:
Build Your Own Telescope?
  May be cheaper to build a dedicated PSR telescope than to attempt to process everything from existing telescopes!
  32x32 tile: (2D FFT - 1D FFT - dedisperse - FFT)
  ~2M telescopes
  ~2M “beamformer/receivers”
  ~1M correlator
  ~1M Supercomputer
  ~1M construction
  ~7-8M total
Next-Gen Supercomputers (IO or Tflops?)
  Infiniband 20 Gb (40Gb)
  288 port switch ~10 Tb/s IO Capacity (1-2K/node)
  Teraflop CPU capacities/node (140 Gflops now)
  Teraflop Server or Tflop GPU?
  10 GB/s vs 76 GB/s
  Power (0.1W/$): 2M = 200 kW
Architecture (2011??):
  [Diagram: two 288-port 40 Gb/s switches; 144 Tflops; FX; cost labels 300K and ~1M]
Summary:
  Strong motivation for multiple (~100) tied array beams
  PSRs/deg^2
  Surveys only possible with compact configurations at present
  Future Supercomputers may allow search even with MWA-like telescopes