Determining Strong Lens Mass Models using Convolutional Neural Networks
James Pearson, Simon Dye, Nan Li
Overview
- Strong Gravitational Lensing Theory and Surveys
- Detection & Characterisation - What’s Been Done Already?
- Simulating Lensing
- Machine Learning and Convolutional Neural Networks
- Results
- Summary and Future Work

Here’s a quick overview of what I’ll be talking about. I’ll start with a brief background on gravitational lensing and the surveys carried out to find such events, followed by a brief look at some of the methods used to find and characterise strong lenses. Then I’ll talk about my own work on simulating lensing images and constructing a CNN to analyse them, including an overview of such networks. I’ll discuss my results so far, covering both the general accuracy of the CNN and an investigation of how it learns best, and finish with a quick summary and the plan for what’s next.
Strong Gravitational Lensing
Origin from GR & Effect: In gravitational lensing, light from a background source bends around a massive object in our line of sight, such as the galaxy cluster shown here. I am focusing on strong galaxy-scale lensing, whereby both the source and the lens are single galaxies rather than galaxy clusters. [Image credit: NASA, ESA & L. Calçada]
Strong Gravitational Lensing
Origin from GR & Effect: Because of this, we observe arcs and rings of light - these are Einstein rings (and crosses). An extreme example of such arcs is of course the “Cosmic Horseshoe”, imaged by HST. [Next 2 images] You don’t often see such a complete arc - normally you might see something like these, provided you are viewing in multiple filters.

Uses: Studying these lensing events can help constrain the mass density profile, and hence the dark matter content, of the foreground galaxy; combined with redshift measurements, this can aid galaxy evolution models and dark matter simulations. Also, while the background source is distorted, it is also magnified while maintaining its surface brightness, allowing us to study it as well. If we know the mass profile of the lens, we can potentially reconstruct what the unlensed high-redshift source would look like, as per Simon Dye’s work. [One can also use time delays and the mass model to constrain the cosmological model by inferring the Hubble constant.]

[Image credits: Kavan Ratnatunga (Carnegie Mellon Univ.) and NASA/ESA; NASA, ESA, A. Bolton (Harvard-Smithsonian CfA) and the SLACS Team; Cosmic Horseshoe (LRG): ESA/Hubble & NASA]
Strong Gravitational Lensing
[Images: Sloan Lens ACS (SLACS) Survey (credit: Gunn et al. 2006); CFHTLS Strong Lensing Legacy Survey (credit: CFHT website); Dark Energy Survey (DES) (credit: Reidar Hahn, Fermilab); Large Synoptic Survey Telescope (LSST) (credit: LSST Project/NSF/AURA); Euclid Telescope (credit: ESA)]

Current Surveys: So far the main search for strong lenses has been the Sloan Lens ACS (SLACS) survey, which used follow-up images from Hubble to confirm candidates selected using spectroscopic data from the Sloan Digital Sky Survey. There have been discoveries from other surveys as well, such as the CFHTLS Strong Lensing Legacy Survey (using the CFH Telescope) and lenses found in the Dark Energy Survey. However, so far only a few hundred strong lenses have been found, most lying at low redshift.

Upcoming Surveys: This is set to change in a few years’ time with the advent of the Euclid telescope and the LSST. These are expected to produce billions of galaxy images, thought to contain tens of thousands of strong lensing systems, so there needs to be a quick and accurate way to sift through the catalogues and find these lenses.
Detection & Characterisation - What’s been done already?
Lens Detection:
- Geometrical quantification (Bom et al. 2017; Seidel & Bartelmann 2007)
- Analysis of colour bands & spectroscopy (Maturi et al. 2014; Baron & Poznanski 2016)

Many methods have been proposed for detecting lenses, in a variety of different ways - for example… [Bom et al. used their own Mediatrix Filamentation Method; Seidel & Bartelmann (2007) used their own method of grids of “cells”.]
Detection & Characterisation - What’s been done already?
Lens Detection:
- Non-CNN machine learning (Joseph et al. (Principal Component Analysis); Avestruz et al. (Histogram of Oriented Gradients))
- Convolutional neural networks (Petrillo et al. 2017; Lanusse et al. 2017; Jacobs et al. 2017; Schaefer et al. 2017)

Convolutional Neural Networks: Early last year, convolutional neural networks, or CNNs, were first used for lens detection. CNNs have been shown to be very effective at identifying lenses purely from images, and can do so at great speed, but for this task they require training on tens of thousands, if not hundreds of thousands, of images. Of course, we don’t have that many real images, so they must be simulated instead.
Detection & Characterisation - What’s been done already?
Lens Parameter Estimation:
- Parameter-fitting techniques (Vegetti & Koopmans 2009; Warren & Dye 2003; Nightingale, Dye & Massey 2017)
- CNNs (Hezaveh et al. 2017)

Parameter Estimation: Meanwhile, others have been looking into ways of analysing lensing systems rather than detecting them, using parameter-fitting techniques such as the work done by Simon. Such analysis allows you to work out the parameters of each lens’s mass model and hence potentially reconstruct what the source galaxy looks like. Late last year, Hezaveh et al. published a paper on using CNNs not for lens detection but for predicting these model parameters. This sparked our interest, as they showed promising results, although their method used a very complex network and relatively simple lens simulations. Hence I began work on creating my own CNN for parameter estimation, investigating how best to train the network and ultimately how efficient it can be while remaining relatively simple.

Vegetti & Koopmans (2009): An adaptive-grid method, based on a Bayesian analysis of the surface brightness distribution of highly magnified Einstein rings and arcs, that allows the identification and precise quantification of substructure in single lens galaxies (likelihoods of substructure measurements, with prior and posterior probability functions).

Warren & Dye (2003): The semi-linear (gravitational lens) inversion method. It is based on an older technique that used three nested cycles - inner (best source light profile), middle (best lens mass profile), outer (best value of lambda, i.e. more or less weight on the negentropy term) - to minimise the merit function, chi-squared + lambda x entropy. [Classical entropy is normally negative, so adding it in this case gives negative entropy, i.e. “negentropy”.] The new technique replaces the negentropy with a linear regularisation term, and for a fixed mass distribution the merit minimisation is then linear, solved by matrix inversion, so the innermost cycle is eliminated. Source parameters are linear dimensions and mass parameters are non-linear dimensions, hence “semi-linear”. They find that in most cases the lambda regularisation term can be removed, with speed greatly increased, among other benefits.

Nightingale, Dye & Massey (2017): Based on Nightingale & Dye (2015), which introduced adaptive semi-linear inversion, using an h-means clustering algorithm to derive a source-plane pixelisation that adapts to the lens-model magnification. Unlike previous adaptive schemes, the pixels are completely arbitrary in shape and are not forced to adhere to any prescribed geometric forms which may bias the lens reconstruction. [An h-means cluster is a region in the source plane to which a subset of image pixels is allocated.] This 2017 paper introduces AutoLens, the first entirely automated modelling suite for the analysis of galaxy-scale strong gravitational lenses.
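To make the semi-linear idea concrete, here is a minimal numpy sketch of the linear step: for a fixed lens mass model, the best-fit source pixel values follow from a single regularised linear solve. The mapping matrix F, the regularisation matrix H, the diagonal noise model and all variable names are illustrative assumptions, not taken from the papers themselves.

```python
import numpy as np

def semi_linear_inversion(F, d, noise_sigma, H, lam):
    """Best-fit source for a fixed lens model (semi-linear inversion sketch).

    F           : (n_image, n_source) matrix; column j is the lensed image of
                  unit surface brightness in source-plane pixel j
    d           : (n_image,) observed image pixels
    noise_sigma : (n_image,) per-pixel noise (diagonal covariance assumed)
    H           : (n_source, n_source) regularisation matrix
    lam         : regularisation strength (the lambda weighting above)
    """
    cinv = 1.0 / noise_sigma**2                 # inverse variance per pixel
    A = F.T @ (cinv[:, None] * F) + lam * H     # normal-equation matrix
    b = F.T @ (cinv * d)
    s = np.linalg.solve(A, b)                   # one linear solve: no inner cycle needed
    chi2 = np.sum(((F @ s - d) / noise_sigma) ** 2)
    return s, chi2
```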
My Simulations
Created using data from SLACS and LSST. Lens equation: Y = X − α(X).

Simulating Lensing: To do this, I first needed to simulate my own lensing images, matching what the LSST (and the Euclid telescope) would capture for strong lensing events. I also wanted to be able to quickly and easily control the parameters in the simulations (such as the mass and light profile parameters) in order to test the CNN. Here is a useful schematic of lensing showing the basic distances and angles involved: the solid line shows the path of the light as it leaves the source and is bent into the observer’s view by a lens in between them, while the dashed line indicates where we perceive the light to come from as a result. Because the depths of the lensing and source galaxies are much less than the distances to and between them, we can use the thin lens approximation, in which we treat the galaxies as planes of matter [highlight], with their matter distributions described by a 2D surface density. To model how the light from the source plane is distorted by lensing as it reaches the lens plane, we use the lens equation [appear on slide], written in terms of angles, distances or pixels. It describes how the coordinates of, say, a pixel of light in the source plane (Y) map to new coordinates in the lens plane (X) via some deflection angle (α).

My Simulations: For the mass profile, I use the commonly used Singular Isothermal Ellipsoid (SIE) model, which has been found to fit strong lens profiles well and allows me to vary the ellipticity of the lens. Likewise, I use a Sersic profile for the light of the source and the lens, allowing the Sersic index to vary to better account for realistic variations in the light. I am currently using distributions of parameters determined from real lenses in the SLACS papers, and incorporating Poisson & read noise, point spread functions, simulated spectral energy distributions, and gain values from LSST and Euclid where available. I’m also using the LSST’s throughput curves, shown here, which detail for each filter the fraction of photons that penetrate the atmosphere and are actually recorded by the telescope’s CCD, as a function of wavelength. [The deflection angle comes from the derivative of the deflection potential (a function of the 2D mass profile).]
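As a rough illustration of this pipeline, here is a short Python sketch that ray-traces a circular Sersic source through an SIE lens via the lens equation. It uses the standard Kormann et al. (1994) form of the SIE deflection; the theta_E normalisation convention, the grid size and all parameter values are illustrative assumptions rather than the exact choices used in my code.

```python
import numpy as np

def sie_deflection(x, y, theta_E, q):
    """SIE deflection angle (Kormann et al. 1994 form, x along the major axis).
    Note: conventions for normalising theta_E differ between codes."""
    psi = np.sqrt(q**2 * x**2 + y**2) + 1e-12   # small epsilon avoids 0/0 at the centre
    e = np.sqrt(1.0 - q**2)
    ax = theta_E * np.sqrt(q) / e * np.arctan(e * x / psi)
    ay = theta_E * np.sqrt(q) / e * np.arctanh(e * y / psi)
    return ax, ay

def sersic(x, y, I0=1.0, r_eff=0.5, n=1.5):
    """Circular Sersic light profile, using the common b_n ~ 2n - 1/3 approximation."""
    r = np.hypot(x, y)
    b_n = 2.0 * n - 1.0 / 3.0
    return I0 * np.exp(-b_n * ((r / r_eff) ** (1.0 / n) - 1.0))

def lensed_image(theta_E=1.2, q=0.8, npix=57, fov=5.0):
    """Lens equation Y = X - alpha(X): evaluate the source at the traced-back coords."""
    side = np.linspace(-fov / 2, fov / 2, npix)
    x, y = np.meshgrid(side, side)               # lens-plane coordinates X (arcsec)
    ax, ay = sie_deflection(x, y, theta_E, q)
    return sersic(x - ax, y - ay)                # source evaluated at source-plane coords Y

img = lensed_image()   # a 57x57 noiseless lensed image
```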
My Simulations
My Simulations: Here are a few examples of the lenses my simulations are producing so far [detail on these]. At the moment there is only a single foreground galaxy and a single source galaxy, with no extra galaxies, though more non-lensing galaxies can be added manually in the code. I currently use greyscale, but can create colour images for up to the six filters of LSST.
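A noiseless image like the one sketched above would then be degraded with the observational effects listed on the previous slide. The sketch below shows one plausible toy version of that step; the Gaussian PSF stand-in, the read-noise and sky values, and the function names are illustrative assumptions (only the gain of 0.34 electrons per ADU is a value quoted later in this talk).

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def observe(image_electrons, psf_sigma_pix=2.0, read_noise=9.0,
            sky_electrons=100.0, gain=0.34, rng=None):
    """Toy observing model: PSF blur, sky background, Poisson and read noise.

    image_electrons : noiseless image in electrons
    gain            : electrons per ADU; ADU = electrons / gain
    """
    rng = np.random.default_rng() if rng is None else rng
    blurred = gaussian_filter(image_electrons, psf_sigma_pix)   # Gaussian stand-in for the survey PSF
    expected = blurred + sky_electrons                          # add a flat sky background
    counts = rng.poisson(expected).astype(float)                # photon (Poisson) noise
    counts += rng.normal(0.0, read_noise, counts.shape)         # Gaussian read noise
    return counts / gain                                        # convert electrons to ADU

noisy = observe(np.ones((57, 57)) * 200.0)   # any noiseless electron image
```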
Machine Learning and Convolutional Neural Networks
[Diagram: Input image → Convolution 1 → Pooling 1 → Convolution 2 → Pooling 2 → Fully-Connected → Output. Credit: Cornell University Blogs]

A CNN is trained using batches of images from your training catalogue, with each image fed in turn into the first layer. For a CNN this is usually a convolutional layer, in which a number of convolutional kernels scan across the image, each taking the dot product between its weights and a small region of the image (the “receptive field”). This generates a newly convolved image for each kernel (so, for example, if we used four kernels we would get four images, as seen in the diagram). The matrix values that make up each kernel change as the network is trained.

Pooling layers are often used in CNNs but cannot be trained; they simply reduce the size of the images, in other words downsampling. For example, I use max-pooling layers that find the maximum value in each area of an image and pass these on to the next layer. Pooling layers are important: by reducing the image size they also reduce the number of nodes and parameters, and hence the computational cost of training. Reducing the size also means that further convolutional layers identify more and more abstract features, so the CNN considers both small- and large-scale variations.

Convolutional and max-pooling layers make up the majority of a CNN, and are often repeated multiple times [see diagram]. At the end of a CNN, the last layers are fully-connected layers. These are the simplest type of layer: all the nodes lie in one dimension as a vector, with each node connected to all nodes in the neighbouring layers. The final layer is fully-connected, with a number of nodes equal to the number of outputs you want. For the classification case shown here, the output is two nodes, giving the probabilities of an image belonging to each of two classes, such as whether or not it contains a lens. Alternatively, rather than outputting probabilities, I set the last layer to contain a number of nodes corresponding to the number of parameters I want it to estimate - currently the Einstein radius, the two components of complex ellipticity, and the mass-model normalisation factor of the lens.

The derivative of the error function with respect to the weights is used to compute the error on each node from each image; after each batch, the errors on a weight are summed and used to adjust the weights. This is the backpropagation algorithm, as the error in a layer is used to find the error in the preceding layer. How the sum of the errors is used to alter the weights is determined by specifying an optimizer function. For a while I was using the Stochastic Gradient Descent (SGD) algorithm, which is a basic but effective optimizer. [Adam is a combination of two other methods, both extensions of SGD.]
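A network of this kind can be sketched in a few lines of Keras. The layer counts below match the description on the next slide (six convolutional layers, two max-pooling layers, a large fully-connected layer and a small output layer regressing four parameters), but the filter numbers, kernel sizes and dense-layer width are illustrative guesses, not the actual configuration.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_cnn(input_shape=(57, 57, 1), n_params=4):
    """Six conv layers, two max-pools, then fully-connected regression layers."""
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation="relu", padding="same"),
        layers.Conv2D(32, 3, activation="relu", padding="same"),
        layers.Conv2D(64, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(2),                  # downsample: fewer parameters, larger-scale features
        layers.Conv2D(64, 3, activation="relu", padding="same"),
        layers.Conv2D(128, 3, activation="relu", padding="same"),
        layers.Conv2D(128, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(2),
        layers.Flatten(),
        layers.Dense(512, activation="relu"),    # the large fully-connected layer
        layers.Dense(n_params),                  # linear output: one node per lens parameter
    ])
    model.compile(optimizer="nadam", loss="mse") # MSE error function, NAdam optimizer
    return model
```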
Machine Learning and Convolutional Neural Networks
Complex ellipticity: e = e₁ + i·e₂ = ((1 − q)/(1 + q))·exp(2iφ), for axis ratio q and orientation φ.
Mean squared error: MSE = (1/N) Σᵢ (predictedᵢ − trueᵢ)².
Backpropagation: the error in each layer is found from the derivative of the error function with respect to its weights, ∂E/∂w, working backwards from the output.

My CNN: For my network, I use:
- 6 convolutional layers,
- 2 max-pooling layers,
- a large fully-connected layer and a smaller fully-connected output layer.

Convolutional layers have manually tunable hyperparameters, such as the number and size of the kernels, which I have optimised, and the input images and parameters are all rescaled to improve performance. To train the network, the catalogue of images is fed into the network in batches, and you determine the error of your network by specifying an error function - in my case, the mean squared error of the predicted parameters. This function is used to compute the error on each node through the backpropagation algorithm, so that the network knows which nodes need to be modified. Also key is specifying an optimizer function, which tells the network how it should modify these nodes. Recently I switched the optimizer I was using, which gave a notable change in performance, as we’ll see on the next slide. After training on the whole catalogue of images, the accuracy is checked by quickly testing on a separate validation catalogue, to ensure that the relationships the network has learned generalise to images outside the training catalogue. This validation marks the end of a training epoch; network training involves multiple epochs, and after each one the total error is calculated, which (hopefully) decreases roughly exponentially with the number of epochs.
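Since the network predicts (e₁, e₂) and the results are later converted back to normal ellipticity, a small helper pair like the following is useful. It assumes the (1 − q)/(1 + q) convention written above; other codes use (1 − q²)/(1 + q²), so treat this as a sketch of one convention.

```python
import numpy as np

def to_complex_ellipticity(q, phi):
    """Axis ratio q and orientation phi (radians) -> components (e1, e2)."""
    e = (1.0 - q) / (1.0 + q) * np.exp(2j * phi)
    return e.real, e.imag

def from_complex_ellipticity(e1, e2):
    """Inverse: components (e1, e2) -> axis ratio q and orientation phi."""
    mod = np.hypot(e1, e2)
    q = (1.0 - mod) / (1.0 + mod)
    phi = 0.5 * np.arctan2(e2, e1)
    return q, phi

e1, e2 = to_complex_ellipticity(0.8, 0.3)   # example values
assert np.allclose(from_complex_ellipticity(e1, e2), (0.8, 0.3))
```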
Results - Parameter Accuracies
[Carrying on from the previous slide] This is shown in the left graph, which plots the mean squared error of the network during training as a function of epoch, with a logarithmic y-axis. The graph shows results from three different CNN configurations, each with two lines plotted: one for the training catalogue and one (the noisier one) for the validation catalogue. The first (red) are the results I obtained about a month ago - clearly the error is quite high. Since then I have updated the network, notably by increasing its width, i.e. doubling the number of filters applied by each convolutional layer, and by changing the optimizer from the standard Stochastic Gradient Descent (SGD) algorithm to the more advanced NAdam. This is shown as the pink line. As you can see, this greatly improved the results. The third (blue) results are from that same network continued for another 150 epochs, with the weights initialised to their values at the end of the first 150 epochs.

[Other graphs] The graph in the top right shows each network’s test results for one of the estimated parameters - in this case the first of the two components of complex ellipticity - as a scatterplot of the CNN’s predicted values against the true values for each image, with small x-axis offsets applied so the networks can be distinguished. Points closer to the 45-degree line are therefore more accurate, with the numbers in bold indicating the RMS error for each network. Likewise, the corresponding results converted back to normal ellipticity are shown in the bottom-right graph. You can clearly see how the estimated values get closer and closer to the true values with each network, with the RMS error of each parameter decreasing by roughly a factor of 10 each time.
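Continuing training from the end of a previous run, as done for the blue results, might look like the following in Keras; the checkpoint filename, batch size and the randomly generated stand-in catalogues are all hypothetical.

```python
import numpy as np

# Stand-in catalogues; in practice these are the simulated lens images and
# their true parameters (Einstein radius, e1, e2, normalisation).
train_images = np.random.rand(1024, 57, 57, 1)
train_params = np.random.rand(1024, 4)
val_images = np.random.rand(256, 57, 57, 1)
val_params = np.random.rand(256, 4)

model = build_cnn()                            # same architecture as sketched earlier
model.load_weights("cnn_epoch150.h5")          # hypothetical weights saved after the first 150 epochs
model.compile(optimizer="nadam", loss="mse")   # NAdam optimizer, MSE error function
model.fit(train_images, train_params,
          validation_data=(val_images, val_params),
          epochs=150, batch_size=64)           # continue for another 150 epochs
```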
Results - Parameter Accuracies
Here I tested how the CNN performed for changes in the simulations, with:
- Magenta (new): same as the magenta network before, using the NAdam optimizer and wider convolutional layers.
- Green: including the LSST PSF.
- Teal: including lower SNR (Nvisits = 1) and sky background.
- Blue: including both of the above.
Results - With and Without the Lens Light
[Graphs: MSE vs epochs; predicted vs true values of e₁]

As part of investigating how the CNN trains best, I wanted to see whether the CNN was deriving its mass-model parameters from just the Einstein rings and arcs rather than the image as a whole, to check how the accuracy is affected by the correlation between the lens’s light and mass profiles (in terms of how similar their ellipticity and orientation are), and ultimately to see whether there is some benefit to removing the foreground light. Again we have graphs of training error versus epoch, and of predicted versus true values of the first component of complex ellipticity. Here I trained the CNN on image catalogues where the lens’s light and mass profiles had exactly the same ellipticity and orientation (red), and where these were completely randomised (pink). There was some difference in training, with the matching parameters giving better accuracy; this is investigated further on the next slide. I also tested removing the lens light completely, for both complete Einstein rings (blue) and just lensing arcs (green). The results show that not having the lens light gave the lowest error, doing significantly better than when the lens light was present.
Results - With and Without the Lens Light
[Graph: MSE vs epochs]

Here I wanted to test how the CNN’s accuracy varied, if at all, with the correlation between the light and mass profiles of the lens. To vary this “correlation”, the light profile parameters were set to those of the mass profile but offset by amounts drawn from normal distributions, each with a given standard deviation - so the larger the deviation, the less correlated the two profiles. The graph shows a clear trend between the ellipticity RMS error and the orientation correlation: the CNN finds it easier to determine the ellipticity of the lens when the light profile is oriented similarly to the mass profile. It may actually find it easier when the light is present and exactly matches the mass profile than when the light is absent! In reality, however, this is rarely the case, so it is still better to exclude the lens light.
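The offset scheme described above can be sketched as follows; the parameter names, clipping range and standard deviations are illustrative, not the exact values used.

```python
import numpy as np

rng = np.random.default_rng()

def light_from_mass(mass_e, mass_phi, sigma_e=0.05, sigma_phi=0.2):
    """Draw light-profile shape parameters as the mass-profile values plus
    normally distributed offsets: larger sigmas mean weaker correlation."""
    light_e = np.clip(mass_e + rng.normal(0.0, sigma_e), 0.0, 0.9)
    light_phi = mass_phi + rng.normal(0.0, sigma_phi)
    return light_e, light_phi

# sigma_e = sigma_phi = 0 reproduces the exactly matching case (red), while very
# large sigmas approach the fully randomised case (pink) from the previous slide.
```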
Summary
- Lensing can be used to study distant galaxies, with upcoming surveys expected to produce many thousands of lensing images.
- Created my own CNN to estimate the mass model parameters of lenses.
- Simulated my own images for network training, using SLACS and LSST data.
- In testing the CNN, I found that:
  - increasing the width of the network and changing the optimizer greatly increased accuracy,
  - larger parameter errors can arise depending on the correlation between the lens light and mass profiles,
  - the CNN achieves higher accuracy for images without their lens light,
  - the CNN works extremely well for low-noise, low-PSF images, but is less accurate when both are increased to LSST levels.
Future Work
- Compare simulated LSST and Euclid images.
- Improve the simulations, including external shear, other galaxies, multi-band images and larger catalogues.
- Test my CNN on EAGLE¹ hydrodynamical simulations.
- Compare my CNN to Simon Dye’s parameter-fitting semi-linear inversion technique².
- Ultimately, connect my CNN to the AutoLens³ source reconstructor, giving highly efficient automated software for studying the properties of both lenses and sources.

Questions and Answers:

Why my own simulations? A: I want to simulate what the LSST (and the Euclid telescope) would capture for strong lensing events, and to be able to quickly and easily control the simulation parameters (such as the mass and light profile parameters) in order to test the CNN. I also want to be able to quickly produce my own catalogues, given the large numbers needed - not just one catalogue, but many catalogues for testing - and code that specifically creates strong galaxy-galaxy lenses.

What is the NAdam optimizer? A: Adam is a combination of two other methods (AdaGrad and RMSProp), both extensions of SGD, and NAdam is Adam with Nesterov (accelerated) momentum, where momentum means the optimizer includes the last weight update (Δw).

What are the units of your images? A: Analog-to-Digital Units (ADU), where ADU = electrons/gain (gain = 0.34 electrons per ADU).

External shear? A: Not implemented yet, but I plan to. External shear comes from matter in the neighbourhood; likewise, one could add a mass sheet of uniform surface mass density. It can come from galaxies or clusters near the lens, or near the line of sight, and from perturbations from large-scale structure. Adding external convergence leads to the “mass-sheet degeneracy” problem, as the only observable effect is to rescale the time delays by 1 − κ_external.

Why are LSST & Euclid better than current or other future surveys? A: LSST will survey an enormous area of the (southern) sky (18,000 square degrees, about 134 x 134 degrees) down to a depth of r < 24.5 for single images and r < 27.8 for stacked images. It will cover this area every few nights (up to 15 TB of data per night), continuing over 10 years to produce catalogues thousands of times larger than have ever been compiled. The camera has a 10-square-degree field of view, taking pairs of 15 s exposures. Euclid is designed to study weak lensing, among other things, using a high-resolution optical camera to make very accurate measurements of galaxy shapes.

Size of images? A: I chose 56x56 (or 57x57) pixels based on simulating a variety of lenses of different sizes and redshifts, to ensure there was enough room in the image for the Einstein ring (and lens light) as well as some background to distinguish the light. The images are “postage stamps” of the lenses that would be extracted from larger field-of-view images using, say, Source Extractor.

What makes some images worse than others for the CNN’s predictions? A: Rounder lenses make orientation estimates harder, although orientation obviously matters less for rounder lenses. Larger PSFs and lower SNRs make it much harder. Larger rings can be harder because they are rarer, which might be helped by biasing the catalogue (there is a fine line between biasing the catalogue and simply increasing the number of images).

Other elliptical galaxy mass & light profiles?
A: For light profiles, ellipticals are usually fitted with a Sersic profile, with the de Vaucouleurs profile providing a good fit in most cases. Sometimes multiple Sersic profiles are used, treating the core separately from the halo; for luminous ellipticals, a core-Sersic model is sometimes used, in which the core follows a power-law profile. For mass profiles, the Singular Isothermal Ellipsoid model is often used, which describes mass distributions with flat rotation curves outside the core. There is also the softened power-law ellipsoid, which is outdated as it does not provide the cuspy mass distribution seen in many ellipticals. For dark matter halo distributions there is the NFW model.

Any Questions?

¹ Crain et al. 2015; Schaye et al. 2015. ² Warren & Dye 2003. ³ Nightingale, Dye & Massey 2017.