David J Bastien, Anna M M Scaife, Hongming Tang, Micah Bowles, Fiona Porter, Structured variational inference for simulating populations of radio galaxies, Monthly Notices of the Royal Astronomical Society, Volume 503, Issue 3, May 2021, Pages 3351–3370, https://doi.org/10.1093/mnras/stab588
ABSTRACT
We present a model for generating postage stamp images of synthetic Fanaroff–Riley Class I and Class II radio galaxies suitable for use in simulations of future radio surveys such as those being developed for the Square Kilometre Array. This model uses a fully connected neural network to implement structured variational inference through a variational autoencoder and decoder architecture. In order to optimize the dimensionality of the latent space for the autoencoder, we introduce the radio morphology inception score (RAMIS), a quantitative method for assessing the quality of generated images, and discuss in detail how data pre-processing choices can affect the value of this measure. We examine the 2D latent space of the VAEs and discuss how this can be used to control the generation of synthetic populations, whilst also cautioning how it may lead to biases when used for data augmentation.
1 INTRODUCTION
Since Fanaroff & Riley (1974) first introduced their classification scheme for radio galaxies, various works have focused on finding an explanation for such distinct morphological characteristics. Ledlow & Owen (1996) first showed that the FRI/II transition is a function of both the radio and the optical luminosity. Other works have sought to explain the two groups through the underlying mechanism of the AGN and/or its environment. One of the most active debates considers the relationship between the accretion mechanism and the source morphology (Gendre et al. 2013). Although a correlation with accretion has not yet been established, Croston, Ineson & Hardcastle (2018) present a clear analysis of the two classes and conclude that the difference in the external pressures of the lobes and jets, a distinctive intrinsic property of each class, could be explained by a difference in proton content, which leads to a difference in energetics between the two classes. Kaiser & Best (2007) proposed an evolutionary link between FRIs and FRIIs, postulating that all radio galaxies must have shown FRII morphology at some point near the start of their lives. In sources with weak jets, the lobes come into equilibrium with their surroundings and therefore start close to the core; the jets are then disrupted by the ambient medium and the lobes eventually develop into FRIs.
However, the dividing line between FRI and FRII becomes blurred for low flux density samples. For different redshifts and environments, the morphology is sometimes uncertain and often hybrid morphologies are found. Best (2009) identified sources with one lobe showing FRI-like morphology and the other showing FRII-like morphology. Mingo et al. (2019) made use of a two-sided source classification method and found that |$20 {{\ \rm per\ cent}}$| of their sources were of hybrid morphology.
It is evident that new radio surveys will sharpen this blurred line between FRI and FRII radio galaxies by (i) providing larger samples of sources at different redshifts and environments, and (ii) providing larger samples of sources at low flux levels. Various research groups are thus working on surveys such as the Evolutionary Map of the Universe survey (EMU; Norris et al. 2011) made using the Australian Square Kilometre Array Pathfinder (ASKAP), the MeerKAT International GHz Tiered Extragalactic Exploration survey (MIGHTEE; Jarvis et al. 2017) made using MeerKAT, the GaLactic and Extragalactic All-Sky MWA survey (GLEAM; Hurley-Walker et al. 2016) made using the Murchison Widefield Array (MWA), the Very Large Array Sky Survey (VLASS; Villarreal Hernández & Andernach 2018) made using the Very Large Array (VLA), and the latest LoTSS survey (Shimwell et al. 2019) made using LOFAR.
With the advent of such huge surveys, new automated classification algorithms have been developed to replace the ‘by eye’ classification methods used in early work. These new algorithms are trained using existing databases of pre-classified FRI and FRII sources. Aniyan & Thorat (2017) made use of a deep convolutional neural network for the classification of FRI and FRII galaxies, while Tang, Scaife & Leahy (2019) made use of transfer learning to classify sources across different surveys, introducing cross-survey identification abilities to existing neural networks. Lukic et al. (2019) made use of capsule networks instead of CNNs for classifying images from the LOFAR Two-metre Sky Survey (LoTSS). Ma et al. (2019a) trained an encoder–decoder architecture and used the encoder as a classifier, and finally Wu et al. (2018) engineered the CLARAN network, which was able to locate and classify radio sources and was trained using the Radio Galaxy Zoo catalogue (Banfield et al. 2015).
In radio astronomy, deep learning has found use in radio source classification in a similar way to classical ML. The groundwork was done by Aniyan & Thorat (2017), who made use of CNNs for the classification of radio galaxies. This was followed by other works applying deep learning to source classification: Lukic et al. (2018) made use of CNNs for the classification of compact and extended radio sources from the Radio Galaxy Zoo catalogue; CLARAN (Classifying Radio Sources Automatically with Neural Networks; Wu et al. 2018) made use of the Faster R-CNN (Ren et al. 2015) to identify and classify radio sources; Alger et al. (2018) made use of an ensemble of classifiers including CNNs to perform host galaxy cross-identification; Tang et al. (2019) made use of transfer learning with CNNs to do cross-survey classification; Gheller, Vazza & Bonafede (2018) made use of deep learning for the detection of cosmological diffuse radio sources; and Lukic et al. (2019) performed morphological classification using a novel technique known as capsule networks. However, while most of these previous works focused on the use of classifiers, in this work we focus on the use of neural networks to generate realistic radio galaxies.
Source simulation plays an important role in the development of modern day telescopes. Simulated sources should reflect two very important characteristics: (i) they should reflect the intrinsic properties of the astronomical entities being simulated, for example, if we are dealing with radio galaxies, the simulated sources should show the jet, lobe, and core structure; and (ii) they should simulate the properties of the telescope, such as the beam width or artefacts that might result from any receiver characteristics. In optical astronomy, both aspects have been catered for by various authors and the importance of such simulations is well established. For example, Peterson et al. (2015) simulated astronomical objects for optical surveys and state the importance of such tools for the planning of future observations and for the development of sophisticated analysis software.
In radio astronomy, equivalent simulators will be useful for the development of the SKA. Indeed, to test existing processing tools, in 2018 the SKA released the first SKA data challenge (Bonaldi & Braun 2018), which consisted of a series of generated images with specifications similar to those expected from the SKA. The community were invited to (i) undertake source finding on the generated images, (ii) perform source property characterization, and (iii) identify the different source populations. The final results from the challenge were bench-marked against the input catalogue used to generate the images.
In addition to being used for benchmarking existing analysis tools, source simulators are also useful for improving existing classifiers. Most radio source classifiers make use of data augmentation to train their models, due to a lack of labelled training data (Aniyan & Thorat 2017; Wu et al. 2018; Tang et al. 2019). This augmentation involves rotating, flipping, and inverting each image in the training set. However, Mikołajczyk & Grochowski (2018) concluded that, although fast and easy to implement, such data augmentation techniques do not bring any new visual features to the data. In comparison, using simulated sources as training data might introduce new features and could improve the ability of a classifier.
There are different methods that can be used to simulate radio sources. Radio galaxies can be simulated through astrophysical fluid dynamics simulations: for example, Rossi et al. (2017) made use of magnetohydrodynamics (MHD) to simulate X-shaped sources (Ekers et al. 1978), while Tchekhovskoy & Bromberg (2016) used MHD to simulate FRI and FRII jets. Although such simulations are highly realistic, they exhibit various practical issues when it comes to general use. First, they do not cater for the telescope optics and are indeed designed to simulate perfect sources without those constraints. Secondly, when large populations of sources are needed this method turns out to be computationally inefficient, and more complex models must be employed to cater for different environmental effects.
Bonaldi & Braun (2018) made use of simple Gaussians to mimic radio sources in the SKA Challenge, and Makhathini et al. (2015) modelled radio galaxies using triple Gaussians. Although the generated images are not as realistic as real sources, these simulations take the telescope characteristics into consideration. Bonaldi & Braun (2018) simulated images corresponding to the three frequencies of the SKA, namely 560 MHz, 1.4 GHz, and 9.2 GHz, also adding noise and convolving the sources with the point spread function (PSF) of the telescope. However, even if such simulations respect telescope specifications, the fine structures that are typically found in jets and lobes are absent. Hence an ideal simulator would be one that is able to generate images with the telescope properties while producing structures similar to those made by MHD modelling.
In this paper, we use generative machine learning to simulate radio sources. Generative algorithms were first introduced in 2014 with two main methods: the Generative Adversarial Network (GAN; Goodfellow et al. 2014) and the Variational Autoencoder (VAE; Kingma et al. 2014). Generative algorithms are trained using real data to generate fake (or synthetic) data that look similar, without memorizing the training set. They have been employed across multiple different domains, including the generation of human faces (Karras et al. 2019), the generation of artworks (Elgammal et al. 2017), and to generate radio sources (Ma et al. 2018).
The idea behind the use of neural networks for generating data is based on the ability of neural networks to reduce high dimensional data to lower dimensions, or vice versa. These dimensional transforms can be performed using ‘encoders’ and ‘decoders’. These two types of network are used in the two basic families of generative algorithm: generative adversarial networks (GANs) and variational autoencoders (VAEs). VAEs make use of both encoders and decoders; while encoders are not used in GANs, they constitute an important building block for VAEs. Encoders work by reducing high dimensional data to lower dimensions, using either fully connected layers or CNNs. For VAEs, the encoder can be thought of as an embedding method similar to principal component analysis (PCA; e.g. Rolinek, Zietlow & Martius 2019). The decoder is the generative network common to all generative methods. Decoders use lower dimensional data to produce higher dimensional outputs. Similar to the encoder, this can be done using a fully connected network or a de-convolution network.
These methods have been used in both the optical and radio regimes as generative methods, unsupervised clustering tools, and inference methods. In the radio regime, Ralph et al. (2019) made use of autoencoders combined with self-organizing maps to separate outliers using k-means clustering. Regier et al. (2015) applied the VAE to astronomical optical images as an inference tool, while Spindler, Geach & Smith (2020) designed a VAE that can be used for unsupervised clustering and generation of synthetic optical images from the Sloan Digital Sky Survey (SDSS). Ravanbakhsh et al. (2016) made use of a conditional VAE for the generation of realistic galaxy images to be used as calibration data.
In this work, we use a VAE to implement a structured variational inference approach to generating simulated postage stamp images of radio galaxies in different target classes. In Section 2, we introduce variational autoencoders and outline the statistical basis of their operation; in Section 3, we introduce the data sets used for training the VAEs used in this work; in Section 4, we introduce the concept of the radio astronomy morphology inception score and define how it is calculated; in Section 5, we describe the neural network implemented in this work and how the dimensionality of the latent space is optimized; in Section 6, we give an analysis of the generated images from this network; in Section 7, we discuss the implications of these results and how they differ from previous generative applications in the field; and in Section 8 we draw our conclusions.
2 VARIATIONAL AUTOENCODERS
The first proposed form for an autoencoder was a neural network comprised of two fully connected layers that transformed an input to an output without causing any distortion in the process (Rumelhart, Hinton & Williams 1985). It did so by learning to generate a compact representation of the data, in a process similar to PCA. This compact representation could then be used to regenerate the original input through the use of a decoder. Once trained, the decoder could be disconnected from the encoder to generate new data. In practice however, the original form of the encoder–decoder network was not able to create truly new data samples as the reduced space was not continuous. In order to address this problem of continuity, VAEs were proposed (Kingma & Welling 2013). The reduced space of the VAE is continuous by design, which implies that any point sampled from the reduced or latent space can generate an output that will appear real.
Since their first implementation by Kingma & Welling (2013), VAEs have been used in many different fields, including video generation (Denton & Fergus 2018), text generation (Semeniuta, Severyn & Barth 2017), molecular design (Blaschke et al. 2018), as well as in cosmology (Yi et al. 2020). Various works have focused on improving the original VAE, including both domain specific VAEs and general improvements to the original architecture of Kingma & Welling (2013). Among these are the infoVAE (Zhao, Song & Ermon 2019), the VAE-DGP (Dai et al. 2015), the Hyperspherical VAE (Davidson et al. 2018), the WAE (Tolstikhin et al. 2017), the control VAE (Shao et al. 2020), the semi-amortized VAE (Kim et al. 2018), and the δ-VAE (Razavi et al. 2019), which works to prevent posterior collapse in VAEs.
Arguably, the most significant advances in the field of VAEs were made by Kingma et al. (2014) and Sohn, Lee & Yan (2015). Kingma et al. (2014) improved their original VAE by engineering a semisupervised VAE (sVAE), which incorporates a classifier and a VAE with conditioning. As such, the networks could be trained using both unlabelled and labelled images. Sohn et al. (2015) further modified the sVAE by removing the classifier and focusing on the conditional properties of the training. The so-called Conditional VAE (CVAE) can be used to train a VAE to generate a specific target class, an approach that we follow in this work to generate FRIs and FRIIs (see Section 5.1). The CVAE has been used in various domains, such as dialogue generation (Shen et al. 2017), basketball player modelling (Acuna 2017), drug discovery (Polykovskiy et al. 2018), sustainable building generation (Ge et al. 2019), and machine translation (Pagnoni, Liu & Li 2018). Improvements were also made to the CVAE, such as the CDVAE (Lu, Deshpande & Forsyth 2016), the CF-VAE (Bhattacharyya et al. 2019), and the W-VAE (Ryu et al. 2019).
Further improvements in VAEs were made by combining the VAEs with the discriminator of a GAN. These VAEs are trained with a binary discriminator that classifies the generated images as fake or real. The VAEs are trained so as to generate data that are classified as real by the discriminator. The VAE-GAN (Boesen Lindbo Larsen et al. 2015) is a normal VAE fitted with a GAN discriminator. Similarly, the CVAE-GAN (Bao et al. 2017) and sVAE-GAN are CVAEs and sVAEs, respectively, which are both fitted with a GAN discriminator.
To achieve a continuous latent space Kingma et al. (2014) assumed that the latent space is represented by a continuous random variable with a parametric prior distribution. For neural networks that introduce non-linearity between their layers, calculation of the true posterior distribution for such a variable with respect to an input data set is generally intractable; however, the parameters of a variational approximation to the posterior can be learned instead.
2.1 Variational inference for generative models
Consider that we have N observable data points that are contained in a data set |$\mathbf {X}=\lbrace x^{(i)}\rbrace _{i=1}^{N}$| where |$\mathbf {x}^{(i)} \in \mathbb {R}^D$|. We assume that these data points represent examples of a continuous random variable, |$\mathbf {x}$|, which can be encoded into a lower dimensional latent space, |$\mathbf {z}$|, where |$\mathbf {z} \in \mathbb {R}^d$| with d < D, and vice-versa that a sample drawn from z can be decoded to generate a new example, |$\hat{\mathbf {x}}^{(i)}$|. However, we do not a priori know the distribution of |$\mathbf {z}$| or the parameters of the function that maps one space to the other, |$\mathbf {\theta }$|.
In cases where the encoder and decoder functions are complex and/or non-linear, both the true posterior, |$p_{\mathbf {\theta }}(\mathbf {z}|\mathbf {x})$|, and the marginal likelihood, |$p_{\mathbf {\theta }}(\mathbf {x)}$|, are typically intractable and therefore these relationships cannot be evaluated directly. In such cases a variational approximation to the true posterior is adopted and denoted qϕ(z|x), where ϕ are the parameters of (e.g.) a neural network used to approximate the true transforms. Typically a Gaussian approximation to pθ(z|x) is assumed and the Kullback–Leibler (KL) Divergence (Kullback & Leibler 1951) between pθ(z|x) and qϕ(z|x) is used as an optimization metric.
Since equation (2) is always non-negative, this implies that |$\log p_{\theta } (\mathbf {x}^{(i)}) \ge \mathcal {L}(\theta , \phi ; x^{(i)})$|, which is referred to as the variational or evidence lower bound (ELBO) and is the loss term for training a VAE.
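For reference, the standard decomposition underlying this statement (the numbered equations of the original are not reproduced here, so we quote the usual form) is |$\log p_{\theta }(\mathbf {x}^{(i)}) = D_{\rm KL}\big (q_{\phi }(\mathbf {z}|\mathbf {x}^{(i)}) \, \Vert \, p_{\theta }(\mathbf {z}|\mathbf {x}^{(i)})\big) + \mathcal {L}(\theta , \phi ; \mathbf {x}^{(i)})$|, where the ELBO can be written as |$\mathcal {L}(\theta , \phi ; \mathbf {x}^{(i)}) = \mathbb {E}_{q_{\phi }(\mathbf {z}|\mathbf {x}^{(i)})}\big [\log p_{\theta }(\mathbf {x}^{(i)}|\mathbf {z})\big ] - D_{\rm KL}\big (q_{\phi }(\mathbf {z}|\mathbf {x}^{(i)}) \, \Vert \, p_{\theta }(\mathbf {z})\big)$|, i.e. a reconstruction term plus a regularizing KL term with respect to the prior; the non-negativity of the first KL divergence gives the bound quoted above.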
2.2 Conditional VAEs
3 DATA SET DEFINITIONS
In this work, we make use of FRDEEP, first introduced in Tang et al. (2019). The FRDEEP data set consists of an imbalanced sample of FRI and FRII radio galaxy images, extracted from the FIRST radio survey (Becker, White & Helfand 1994) and subjected to image pre-processing using the method described in Aniyan & Thorat (2017). It makes use of two catalogues to label the images: (i) the combined NVSS and FIRST Galaxies Catalogue (CoNFIG; Gendre, Best & Wall 2010; Gendre et al. 2013) and (ii) the FRICAT catalogue (Capetti, Massaro & Baldi 2017). Here we describe the data set in more detail.
3.1 The combined NVSS and FIRST galaxies catalogue (CoNFIG)
The CoNFIG catalogue of FRI and FRII galaxies is constructed from four subsamples referred to as CoNFIG-1, 2, 3, and 4. These subsamples contain sources that were originally selected from the NVSS survey (Condon et al. 1998) with flux density cuts of |$S_{\rm 1.4\, GHz} \ge 1.3, 0.8, 0.2$|, and 0.005 Jy, respectively. The sources were first identified from the NVSS catalogues using the process put forward by Gendre & Wall (2008), which uses component association to identify radio galaxies. If two or more components were found with |$S_{\rm 1.4\, GHz} \gt 1.3$| Jy, the components were considered to be part of a larger resolved source. Multiple-component sources with |$0.1 \le S_{\rm 1.4\, GHz} \le 1.3$| Jy located within 4 arcmin of the listed sources were set aside for visual inspection. The identified sources were classified as FRI or FRII either by referring to the existing literature or by visual inspection. Sources were classified as FRII if distinct hotspots were displayed at the edges of the lobes and the lobes were aligned; if collimated jets were observed, with hotspots close to the core and jets, the sources were classified as FRI.
Sources smaller than 1 arcsec were classified as compact, while sources with extended morphology were classified as FRI or FRII. Using the above criteria, 849 sources were identified and classified as FRI, FRII, compact (with no morphological classification), or uncertain, and were further tagged as ‘confirmed’ or ‘possible’. This resulted in a final catalogue containing 50 confirmed FRIs and 390 confirmed FRIIs.
3.2 FRICAT
To balance the CoNFIG data set, which consists mostly of FRII sources, we make use of FRICAT. The catalogue consists of 219 sources first identified from the database of Best & Heckman (2012), which was compiled using the method put forward by Best (2009). The initial sample consists of 18 286 radio galaxies. These sources were cross-matched with optical spectroscopic data from data release 7 of the Sloan Digital Sky Survey (SDSS). A redshift selection of z < 0.15 was applied, within which 3357 sources were selected. The FIRST sources were then visually inspected and those extending more than 30 kpc from the optical host were chosen; this corresponds to 11.5 arcsec, which is well suited to morphological classification at 5 arcsec resolution.
The sources were then visually classified: the selection was limited to one-sided and two-sided jets, and focused on those sources whose brightness decreased along the jets with no enhanced brightness at the jet ends. The classification was performed independently by the three authors and accepted if two authors agreed on a classification. The final catalogue consists of 219 FRI sources.
3.3 Combined data
Out of the combined 659 sources from CoNFIG and FRICAT, 600 sources were randomly selected to reduce class imbalance, of which 264 were FRIs and 336 were FRIIs. The training set consists of 550 sources, and 50 sources were assigned to the test set. Once augmented, the training set consists of 198 000 sources, of which 87 120 are FRIs and 110 880 are FRIIs. Our source selection is shown in Table 1 and the data pre-processing and augmentation process is explained in the following section.
Table 1. Number of FRI and FRII sources selected from FRICAT and CoNFIG. 600 sources were selected from the two data sets, with 550 assigned to the training set and 50 to the test set.

| | FRI | FRII | Total |
| --- | --- | --- | --- |
| FRICAT | 219 | – | 219 |
| CoNFIG | 50 | 390 | 440 |
| Total # of sources | 269 | 390 | 659 |
| Total # of selected sources | 264 | 336 | 600 |
| Training | 242 | 308 | 550 |
| Test | 22 | 28 | 50 |
| Train augmented | 87 120 | 110 880 | 198 000 |
3.4 Image pre-processing and data augmentation
An important procedure in machine learning is data pre-processing, which helps to maintain a homogeneous sample space (Aniyan & Thorat 2017). While human classifiers can easily deal with the background noise in images and classify objects oriented differently, ML classifiers perform badly if an attempt is made to classify radio sources without a proper pre-processing procedure. For CNNs, Aniyan & Thorat (2017) and Tang et al. (2019) have shown that the noise should first be clipped at the 3σ level, the pixel values rescaled between 0 and 1, and finally the images augmented through flipping and rotation for the classifier to attain good accuracy. Wu et al. (2018) applied a different pre-processing procedure: they performed a zero-centring of the pixel values, followed by a rescaling of the source and a horizontal flip, to attain good performance. Our image pre-processing makes use of the method used by Aniyan & Thorat (2017) and Tang et al. (2019), applied in two phases. The first phase involves the processes covered by Tang et al. (2019), where the pre-processed data were made available through their git repository.1 These data were then reprocessed to adapt them to the needs of the VAE. To augment the training data set, the sources were rotated from 0 to 360 degrees in intervals of 1 degree. This resulted in a total of 198 000 sources in the training set, with 87 120 FRI sources and 110 880 FRIIs. Table 1 summarizes this source distribution; a minimal sketch of the clipping, rescaling, and rotation steps is shown below.
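The following sketch illustrates the pre-processing and augmentation steps described above (3σ clipping, rescaling to [0, 1], and rotation in 1 degree steps). The function names are illustrative, the noise level is estimated here simply from the image standard deviation, and this is not the authors' exact implementation.

```python
import numpy as np
from scipy.ndimage import rotate

def preprocess(image):
    """Clip background noise at the 3-sigma level and rescale pixel values to [0, 1]."""
    sigma = np.std(image)                        # simple noise estimate from the full image
    clipped = np.where(image > 3.0 * sigma, image, 0.0)
    peak = clipped.max()
    return clipped / peak if peak > 0 else clipped

def augment(image, step_deg=1):
    """Rotate the pre-processed image from 0 to 359 degrees in 1 degree steps."""
    return [rotate(image, angle, reshape=False, order=1, mode="constant")
            for angle in range(0, 360, step_deg)]
```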
4 RADIO ASTRONOMY MORPHOLOGY INCEPTION SCORE
One of the main challenges of developing and implementing generative algorithms is that they are difficult to evaluate and compare. Compared to discriminative methods, where we can make use of metrics like accuracy, the F1-Score, the precision or the recall, generative algorithms do not have such direct measures for comparing different outcomes. While the loss can be used as an indication of the model performance, it cannot be used comparatively across different architectures or algorithms.
To deal with this issue, we make use of an adapted form of the inception score (Szegedy et al. 2014), a measure that can be used for comparison across different generative algorithms. This standardized method makes use of classifiers to quantify the generated image quality by evaluating the degree of uncertainty in its classification. The inception score was first introduced using the inception v1 model, trained on 1000 target classes of the Imagenet Large Scale Visual Recognition (ILSVRC) 2014 classification challenge (Russakovsky et al. 2015). The inception classifier was used to evaluate the uncertainty in image classification: images that were correctly generated were assigned to one of the 1000 classes and assigned a high probability, while those that were incorrectly generated were assigned to multiple classes with low probability. The inception score was then used as a measure of this difference.
A high inception score implies a well performing generative algorithm; the highest inception score to date for the ILSVRC 2014 classification data set is 9.46, based on the de-noising diffusion probabilistic model (Ho, Jain & Abbeel 2020).
Although we wish to use the concept of the inception score to evaluate radio source generation, the inception model itself was not trained using radio sources. To address this we make use of the existing radio source classifier architecture introduced in Tang et al. (2019), which uses an architecture similar to Aniyan & Thorat (2017). This classifier consists of five convolutional layers each with batch normalization and max pooling. The output of the final convolution layer is flattened and input into a fully connected network consisting of three layers. The output from the fully connected layers is then passed through a softmax function to obtain the predictions p(y|x).
This CNN was trained using the FRDEEP data set described in Section 3 using 390 sources, validated using 110 sources, and tested using 55 sources. The data were augmented and pre-processed using the method described in Section 3.4. We make use of the Adagrad optimizer with an initial learning rate of 0.001. The network was trained for 30 epochs and the training was stopped when the validation and training loss stabilized. Using this network, we are able to evaluate p(y|x) and calculate the RAdio Morphology Inception Score (RAMIS), our equivalent of the original inception score for radio astronomy.
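As an illustration, the RAMIS can be computed from the classifier's softmax outputs in the same way as the standard inception score. The sketch below shows that calculation (the exponential of the mean KL divergence between p(y|x) and the marginal p(y)); it is illustrative rather than a reproduction of the authors' exact code or their equation (13).

```python
import numpy as np

def ramis(p_y_given_x, eps=1e-12):
    """Inception-score-style metric from classifier outputs.

    p_y_given_x: array of shape (N, n_classes) holding the softmax output
    p(y|x) of the FRI/FRII classifier for N generated images.
    Returns exp( mean_x KL( p(y|x) || p(y) ) ).
    """
    p_y = p_y_given_x.mean(axis=0, keepdims=True)             # marginal class distribution p(y)
    kl = np.sum(p_y_given_x * (np.log(p_y_given_x + eps)
                               - np.log(p_y + eps)), axis=1)   # per-image KL divergence
    return float(np.exp(kl.mean()))
```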
4.1 Metric performance evaluation
To test the ability of the RAMIS to quantify the quality of generated images, we performed a number of tests, transforming the training images from FRDEEP in different ways and evaluating the RAMIS for each transformation. To perform these tests, we make use of the non-augmented training data set from FRDEEP. Once transformed, these images were passed through the classifier model to obtain p(y|x) and evaluate the RAMIS using equation (13).
The original images with no transformation applied had a RAMIS score of 1.4 when evaluated. We then performed the following transformations on those images, see Fig. 1: (i) blurring of the images using kernels of different sizes; (ii) addition of noise at varying levels; and (iii) cropping of the images.
Experiments 1, 2, and 3: Experiment 1 involves image blurring with kernel sizes from (1,1) (no blurring) to (19,19). Experiment 2 involves cropping of the images, while experiment 3 involves the addition of random noise.
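A minimal sketch of the three transformations (Gaussian blurring, additive random noise, and edge blanking) is given below; the helper names are illustrative and the parameter ranges follow those quoted in the text.

```python
import numpy as np
import cv2

def blur(image, k):
    """Gaussian blur with a (k, k) kernel; k is odd, from 1 (no blur) to 19."""
    return cv2.GaussianBlur(image, (k, k), 0)

def add_noise(image, amplitude, rng=None):
    """Add uniform random noise of the given amplitude and re-clip to [0, 1]."""
    rng = rng or np.random.default_rng()
    return np.clip(image + amplitude * rng.random(image.shape), 0.0, 1.0)

def blank_edges(image, strip):
    """Blank an edge strip of the given width in pixels (0 to 75)."""
    if strip == 0:
        return image.copy()
    out = np.zeros_like(image)
    out[strip:-strip, strip:-strip] = image[strip:-strip, strip:-strip]
    return out
```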
In the first experiment, we performed Gaussian blurring, varying the size of the smoothing kernel from (1,1) to (19,19) pixels in steps of 2. This reduces the effective resolution of the images. The RAMIS was evaluated for each kernel size.
An exponential drop in the RAMIS was observed, from 1.4 for the non-transformed images with a kernel size of (1,1) down to 1.05. The resulting classifier confusion is similar to that obtained when observing with lower resolving power: for example, in row 3 of Fig. 1 the core and the jets are clearly unresolved, which causes class confusion within the model and reduces the RAMIS.
The second experiment involved the addition of random noise to the images, see Fig. 1. We added random noise with amplitudes varying between 0 and 0.011 in steps of 0.001 and evaluated the RAMIS at each stage. With no added noise the RAMIS was 1.4, and a drastic drop in the score is observed from noise levels of 0.001 to 0.003. Above that noise threshold the classifier was strongly influenced by the noise and eventually classifies the images randomly.
The final experiment involved cropping the radio images to different sizes. In Section 3.4, which covers the pre-processing procedure, the input images were cropped to 100 × 100 pixels, i.e. an edge strip with a width of 25 pixels was blanked. This was done to prevent corner differences in the images when rotating. This process may remove information from some images, for example those that include low level features arising from lobes or jets. To quantify this effect, the training images were cropped by blanking edge strips of width 0 to 75 pixels, at which point the image is completely blank. In a similar manner to the previous experiments, the RAMIS dropped exponentially as a function of strip width, converging towards 1.0 (the worst score). At a strip width of 25 pixels, which corresponds to the pre-processing procedure used in this work, a RAMIS of 1.25 was measured.
This analysis is relevant when quantitatively evaluating the performance of VAEs or other generative algorithms. For example, Ma et al. (2019a) cropped their input images to 40 × 40 pixels, corresponding to an edge blanking of 55 pixels. At that level the RAMIS is less than 1.05. This may be explained by the fact that many FRII lobes, which range in radial distance up to 75 pixels from the image centre, have been cut out of the images, causing only the core of the radio galaxy to be seen by the VAE. This will result in poorer performance when generating FRIIs.
5 NETWORK ARCHITECTURE AND IMPLEMENTATION
The unsupervised VAE used in this work consists of two fully connected neural networks: (i) the encoder and (ii) the decoder. While many VAE implementations make use of convolutional layers (e.g. Ma et al. 2018; Ralph et al. 2019; Spindler et al. 2020), in this work we choose to use a fully connected network. There are a number of known issues involved in the use of convolutional VAEs and these are described in more detail in Section 7. Consequently, although the use of convolutional layers may be addressed in future work, here we retain a fully connected architecture for simplicity.
The VAE neural network architecture: The encoder reduces the dimensions of the input images through the use of a funnel architecture that halves at every layer in the encoder and doubles for the decoder.
The decoder, on the other hand, converts the sampled z values into outputs with the same dimension as the encoder input. The decoder input layer takes the latent variable, z, which is sampled from the Gaussian with parameters zμ and |$z_{\sigma ^2}$|; this first layer is followed by a Leaky ReLU and drop-out. The network then mirrors the encoder: the 2nd layer has 256 neurons, the 3rd 512 neurons, the 4th 1024 neurons, the 5th 2048 neurons, and the 6th 4096 neurons, each followed by a Leaky ReLU and drop-out. The final layer has 10 000 neurons and is followed by a sigmoid activation that bounds the output between 0 and 1.
The architecture of the encoder–decoder network is shown in Fig. 2 and detailed in Table 2.
Table 2. Architectures of the VAE and CVAE encoder and decoder networks.

| VAE Encoder | Dimensions/Parameters | VAE Decoder | Dimensions/Parameters | CVAE Encoder | Dimensions/Parameters | CVAE Decoder | Dimensions/Parameters |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Input Layer | 10 000 × 1 | Linear FC | d × 1 | Input Layer | (10 000 + 2) × 1 | Linear FC | (d + 2) × 1 |
| Softplus | – | Leaky ReLU | – | Softplus | – | Leaky ReLU | – |
| Linear FC | 4096 × 1 | Dropout | p = 0.2 | Linear FC | 4096 × 1 | Dropout | p = 0.2 |
| Dropout | p = 0.2 | Linear FC | 256 × 1 | Dropout | p = 0.2 | Linear FC | 256 × 1 |
| Linear FC | 2048 × 1 | Leaky ReLU | – | Linear FC | 2048 × 1 | Leaky ReLU | – |
| Leaky ReLU | – | Dropout | p = 0.2 | Leaky ReLU | – | Dropout | p = 0.2 |
| Dropout | p = 0.2 | Linear FC | 512 × 1 | Dropout | p = 0.2 | Linear FC | 512 × 1 |
| Linear FC | 1024 × 1 | Leaky ReLU | – | Linear FC | 1024 × 1 | Leaky ReLU | – |
| Leaky ReLU | – | Dropout | p = 0.2 | Leaky ReLU | – | Dropout | p = 0.2 |
| Dropout | p = 0.2 | Linear FC | 1024 × 1 | Dropout | p = 0.2 | Linear FC | 1024 × 1 |
| Linear FC | 512 × 1 | Leaky ReLU | – | Linear FC | 512 × 1 | Leaky ReLU | – |
| Leaky ReLU | – | Dropout | p = 0.2 | Leaky ReLU | – | Dropout | p = 0.2 |
| Dropout | p = 0.2 | Linear FC | 2048 × 1 | Dropout | p = 0.2 | Linear FC | 2048 × 1 |
| Linear FC | 256 × 1 | Leaky ReLU | – | Linear FC | 256 × 1 | Leaky ReLU | – |
| Leaky ReLU | – | Dropout | p = 0.2 | Leaky ReLU | – | Dropout | p = 0.2 |
| Dropout | p = 0.2 | Linear FC | 4096 × 1 | Dropout | p = 0.2 | Linear FC | 4096 × 1 |
| zμ, zσ² | d × 1 | Leaky ReLU | – | zμ, zσ² | d × 1 | Leaky ReLU | – |
| | | Dropout | p = 0.2 | | | Dropout | p = 0.2 |
| | | Linear FC | 10 000 × 1 | | | Linear FC | 10 000 × 1 |
| | | Sigmoid | – | | | Sigmoid | – |
For the declarative definition, our encoder and decoder were defined within the model and guide as

|$p_{\theta }(\mathbf {z}) = \mathcal {N}(0, \mathbf {I})$|

|$p_{\theta }(\mathbf {x}|\mathbf {z}) = {\rm Bernoulli}(\mathbf {x}|H_{\theta }(\mathbf {z}))$|, where Hθ(z) represents the decoder.

The guide, which was introduced in Section 2.1, is given as

|$q_{\phi }(z|x) = N(z_{\mu },z_{\sigma ^2})$|, where zμ = Fϕ(x) and |$z_{\sigma ^2}=G_{\phi }(x)$|. Here Fϕ and Gϕ are implemented as the same neural network, whose final layer outputs both zμ, the mean parameters, and |$z_{\sigma ^2}$|, the variance parameters.
The VAE is trained by optimizing the guide to match the model so as to minimize the loss function derived in equation (8).
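A minimal sketch of how such a model/guide pair can be declared in pyro is given below. The Encoder and Decoder modules are small illustrative stand-ins for the fully connected networks of Table 2 (the layer sizes shown here are not the exact configuration used in this work), and disabling argument validation for the Bernoulli likelihood is an implementation detail assumed here to allow continuous pixel values in [0, 1].

```python
import torch
import torch.nn as nn
import pyro
import pyro.distributions as dist

d = 32          # latent dimensionality (one of the values explored later)
x_dim = 10000   # flattened 100 x 100 postage stamp

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Sequential(nn.Linear(x_dim, 512), nn.Softplus())
        self.fc_mu, self.fc_logvar = nn.Linear(512, d), nn.Linear(512, d)

    def forward(self, x):
        h = self.hidden(x)
        return self.fc_mu(h), torch.exp(0.5 * self.fc_logvar(h))  # z_mu, z_sigma

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d, 512), nn.LeakyReLU(),
                                 nn.Linear(512, x_dim), nn.Sigmoid())

    def forward(self, z):
        return self.net(z)

encoder, decoder = Encoder(), Decoder()

def model(x):
    """p_theta(z) = N(0, I) and p_theta(x|z) parameterized by the decoder."""
    pyro.module("decoder", decoder)
    with pyro.plate("data", x.shape[0]):
        z = pyro.sample("z", dist.Normal(torch.zeros(x.shape[0], d),
                                         torch.ones(x.shape[0], d)).to_event(1))
        recon = decoder(z)
        # pixel values in [0, 1] treated as Bernoulli probabilities
        pyro.sample("obs", dist.Bernoulli(recon, validate_args=False).to_event(1), obs=x)

def guide(x):
    """q_phi(z|x) = N(z_mu, z_sigma^2), parameterized by the encoder."""
    pyro.module("encoder", encoder)
    with pyro.plate("data", x.shape[0]):
        z_mu, z_sigma = encoder(x)
        pyro.sample("z", dist.Normal(z_mu, z_sigma).to_event(1))
```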
5.1 Conditional VAE
The conditional VAE is a variation on the unsupervised VAE that also takes in the labels of the data. The labels are the conditions associated with particular data samples and can be used to generate images corresponding to specific conditions. We input these labels as one-hot vectors at two points in the network: first at the input to the encoder, along with the reshaped images, and secondly at the input to the decoder, along with the latent z for image reconstruction. The alterations to the unsupervised VAE are shown in Fig. 3 and Table 2.
Modifications to the VAE for the CVAE model with two additional neurons at the input layer of the encoder and decoder. Labels are forwarded at both the encoder and decoder.
The model was also modified, following the method outlined in Section 2.2, to accommodate this additional information. We alter the definition of the model to include a prior on the class, p(y) = Cat(y|π), and we alter the likelihood, pθ(x|z), to be pθ(x|z, y) = Bernoulli(x|Hθ(z, y)), where Hθ represents the decoder. The variational approximation, or guide, remains unchanged with |$q_{\phi }(z|x) = N(z_{\mu },z_{\sigma ^2})$|, where zμ = Fϕ(x, y) and |$z_{\sigma ^2}=G_{\phi }(x,y)$|.
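Extending the earlier sketch, this conditioning could be expressed in pyro as below; cond_encoder and cond_decoder are assumed conditional variants of the illustrative Encoder/Decoder modules with enlarged inputs of (10 000 + 2) and (d + 2), as in Table 2, d is the latent dimensionality defined in the earlier sketch, and the 50:50 class prior is an assumption made for illustration.

```python
import torch
import pyro
import pyro.distributions as dist

def cvae_model(x, y_onehot):
    pyro.module("cond_decoder", cond_decoder)             # decoder taking [z, y]
    batch = x.shape[0]
    with pyro.plate("data", batch):
        # Class prior p(y) = Cat(y | pi); a 50:50 prior is assumed here
        pi = torch.full((batch, 2), 0.5)
        pyro.sample("y", dist.OneHotCategorical(pi), obs=y_onehot)
        # Latent prior p(z) = N(0, I)
        z = pyro.sample("z", dist.Normal(torch.zeros(batch, d),
                                         torch.ones(batch, d)).to_event(1))
        # Likelihood p(x|z, y) = Bernoulli(x | H_theta(z, y))
        recon = cond_decoder(torch.cat([z, y_onehot], dim=-1))
        pyro.sample("obs", dist.Bernoulli(recon, validate_args=False).to_event(1), obs=x)

def cvae_guide(x, y_onehot):
    pyro.module("cond_encoder", cond_encoder)              # encoder taking [x, y]
    with pyro.plate("data", x.shape[0]):
        z_mu, z_sigma = cond_encoder(torch.cat([x, y_onehot], dim=-1))
        pyro.sample("z", dist.Normal(z_mu, z_sigma).to_event(1))
```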
All networks used in this work were implemented using the Python probabilistic programming library pyro (Bingham et al. 2018; Phan, Pradhan & Jankowiak 2019).2
5.2 Learning rate search and training
The unsupervised VAE was first trained by performing a learning rate search. The learning rate search was performed for the different latent dimensions d = 2, 4, 8, 16, 32, and 64. This was done to identify the ideal learning rate that leads to the lowest train loss after 10 epochs. The VAE was trained using initial learning rates between 0.0005 and 0.0015 in steps of 0.0001. Such a procedure is crucial as the initial learning rate is often considered to be the most important hyperparameter (Smith 2017). Table 3 shows the ideal initial learning rate for each latent dimension d.
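A minimal sketch of such a search, using pyro's SVI machinery with the model/guide pair defined earlier, is shown below; the choice of the Adam optimizer and the data-loading helper are illustrative assumptions rather than the exact procedure used in this work.

```python
import numpy as np
import pyro
from pyro.infer import SVI, Trace_ELBO
from pyro.optim import Adam

def lr_search(model, guide, train_loader, n_epochs=10):
    """Train for n_epochs at each candidate initial learning rate and return
    the rate giving the lowest final training loss."""
    best_lr, best_loss = None, np.inf
    for lr in np.arange(0.0005, 0.00151, 0.0001):
        pyro.clear_param_store()                               # fresh parameters per trial
        svi = SVI(model, guide, Adam({"lr": float(lr)}), loss=Trace_ELBO())
        epoch_loss = np.inf
        for _ in range(n_epochs):
            running = sum(svi.step(x) for x, _ in train_loader)
            epoch_loss = running / len(train_loader.dataset)   # per-sample ELBO loss
        if epoch_loss < best_loss:
            best_lr, best_loss = float(lr), epoch_loss
    return best_lr, best_loss
```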
Table 3. Selected initial learning rates, train loss, test loss, and RAMIS scores for the VAE at different latent dimensions.

| d | 2 | 4 | 8 | 16 | 32 | 64 |
| --- | --- | --- | --- | --- | --- | --- |
| Initial LR (10−3) | 1.07 | 0.96 | 1.04 | 0.94 | 0.88 | 0.94 |
| Train loss | 133.6 | 127.2 | 136.3 | 127.2 | 128.3 | 127.4 |
| Test loss | 125.7 | 121.0 | 127.6 | 120.6 | 121.2 | 121.1 |
| RAMIS | 1.17 | 1.10 | 1.12 | 1.16 | 1.18 | 1.13 |
The identified initial learning rates were used to train the network for 8000 epochs. For all latent dimensions the loss converged towards a minimum, although the minimum differs between latent dimensions. None of the networks overfitted the training data and the test loss remained stable after 6000 epochs. At d = 8 the test loss stabilizes at 130, whereas for the other latent dimensions, i.e. d = 2, 4, 16, 32, and 64, the test loss stabilizes at 120. This difference in test loss had an impact on the RAMIS score across the epochs. Table 3 and Fig. 4 show the train loss, test loss, and RAMIS for the different latent dimensions.
Test loss and RAMIS score of VAE for different latent dimensions.
A similar two-phase procedure was adopted for the training of the conditional VAE. We again made use of a learning rate search for the latent dimensions d = 2, 4, 8, 16, 32, and 64, and the CVAE was trained for each selected latent dimension using the identified learning rate (as shown in Table 4). As for the VAE, the initial learning rate search was performed between 0.0005 and 0.0015 in steps of 0.0001. Once the learning rate was identified, the CVAE was trained for 5000 epochs, with the test loss, train loss, and RAMIS calculated every 10 epochs. In addition to those three metrics, we also generated 100 FRIs and 100 FRIIs, which were classified using the classifier described in Section 4. This was done to determine the class generation efficiency and to identify any class bias in our CVAE. These results are shown in Table 4 and Fig. 5; the additional metrics, denoted FRIcount and FRIIcount, are the respective fractions of generated FRIs and FRIIs that were correctly classified.
Test loss, RAMIS score, and FRII fraction of the CVAE for different latent dimensions.
Table 4. Selected initial learning rates, train loss, test loss, and RAMIS scores for the CVAE at different latent dimensions. FRIcount and FRIIcount are the percentages of generated FRIs and FRIIs correctly classified by the CNN.

| d | 2 | 4 | 8 | 16 | 32 | 64 |
| --- | --- | --- | --- | --- | --- | --- |
| Initial LR (10−3) | 0.90 | 1.08 | 0.90 | 0.94 | 0.88 | 0.96 |
| Train loss | 136.9 | 130.9 | 145.8 | 133.1 | 136.0 | 132.8 |
| Test loss | 129.1 | 125.0 | 136.7 | 124.3 | 127.3 | 125.1 |
| RAMIS | 1.18 | 1.14 | 1.18 | 1.11 | 1.17 | 1.10 |
| FRIcount (per cent) | 57.5 | 75.5 | 53.4 | 37.1 | 65.0 | 75.9 |
| FRIIcount (per cent) | 37.2 | 31.8 | 51.5 | 61.7 | 35.6 | 24.7 |
5.3 Summary and model selection
We make a model selection using the test loss and the RAMIS as our main criteria, taking the mean RAMIS and mean test loss as benchmarks: any model with a RAMIS larger than the mean RAMIS and a test loss lower than the mean test loss was selected as a best performing model. Table 3 shows these metrics for the VAE, from which we chose d = 32 at epoch = 8000 as the best performing model. For the CVAE, only one model conformed to our criteria, at d = 32, with a test loss of 127.33 (lower than the mean of 127.93) and a RAMIS of 1.17 (higher than the mean of 1.15). We therefore settled on d = 32 at epoch = 5000 for our selected CVAE model. Fig. 6 shows the training curves for both the VAE and CVAE. Samples of images generated by the trained models are presented in Appendix A.
6 RESULTS
6.1 Image reconstruction
To qualitatively understand the general ability of the unsupervised VAE model, a selection of images from the training set were fed into the encoder and reconstructed images generated from those latent coordinates were plotted for each latent dimension, see Fig. 7. As the generation is a stochastic process, the output images are not expected to appear identical to the corresponding inputs, but they should appear similar. In each case the image shown is generated at the minimum test loss and maximum RAMIS for each model. All six latent dimensions were able to reconstruct the radio sources; however, some dimensions reproduce images with structures closer to those of the original images. For example, at d = 4, d = 16, and d = 32 the VAE is able to reproduce the asymmetry of the sources: Source 3, which is asymmetric with one lobe brighter than the other, can partially be reconstructed with d = 4 and d = 16. The FRI and FRII division can also be reproduced correctly: Source 1, which is an FRII, is reproduced as a radio source with lobes brighter than the core, while Source 7, which is an FRI, is reconstructed as a radio source with a bright core and low brightness lobes. However, for some latent dimensions, such as d = 8, we observe that sources like Source 5 are reconstructed as triple sources while the input image is a double radio source. Another limitation of the system is its inability to reproduce bent structures: Source 2, which is a bent source, is reconstructed as a straight source. This is assumed to be a consequence of having only a small number of bent sources present in the training data. Finally, one of the major drawbacks of the VAE is the ‘blurriness’ of the generated images. This is a known constraint of VAEs (Boesen Lindbo Larsen et al. 2015) and impacts the RAMIS, as we saw in Section 4.1.
Original and reconstructed images at different latent dimensions. The 1st row shows the original images from the FRDEEP training set. Rows 2 through 7 show the reconstructed images for d = 2, 4, 8, 16, 32, and 64.
Fig. 8 shows the reconstructed images from the unsupervised VAE with d = 32 for three different sources from the training set. We can see that the model initially learns the bright central core of the galaxy before learning the extended structure. The orientation of the source is only learned towards the end of the training process.
Sources generated as a function of training epoch from a model with d = 32. The first column shows the original image from the training set that is used to define the point in latent space from which the synthetic images are generated.
For the conditional VAE, we can use an input image from the training data set to specify a point in latent space and then choose to generate synthetic images as either FRI or FRII.
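A minimal sketch of this conditional generation, reusing the illustrative cond_encoder/cond_decoder modules from Section 5.1 (and an assumed one-hot convention in which [1, 0] denotes FRI and [0, 1] denotes FRII), is given below.

```python
import torch

def generate_both_classes(image):
    """Encode a training image to a latent point, then decode it once with the
    FRI label and once with the FRII label to obtain two conditional versions."""
    y_fri = torch.tensor([[1.0, 0.0]])                          # assumed FRI one-hot label
    y_frii = torch.tensor([[0.0, 1.0]])                         # assumed FRII one-hot label
    x = image.reshape(1, -1)                                    # flatten to (1, 10000)
    with torch.no_grad():
        z_mu, z_sigma = cond_encoder(torch.cat([x, y_fri], dim=-1))
        z = z_mu + z_sigma * torch.randn_like(z_sigma)          # sample the latent point
        fri_image = cond_decoder(torch.cat([z, y_fri], dim=-1)).reshape(100, 100)
        frii_image = cond_decoder(torch.cat([z, y_frii], dim=-1)).reshape(100, 100)
    return fri_image, frii_image
```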
6.2 Noise analysis
As covered in Section 3.4, the training and testing images have been sigma clipped. This resulted in all pixel values below a given threshold being set to zero, creating a gap in pixel values between 0 and 0.0039 and leaving approximately |$95 {{\ \rm per\ cent}}$| of the pixels set to zero. Generating images with exactly zero-valued pixels is difficult for machine learning algorithms, which instead assign infinitesimally small values to these regions in order to mimic the zeroing process.
By saturating the generated images at the 95th pixel percentile and using a log-scale to visualize the data we can observe that artefacts are present in the generated images at a low level. These are illustrated in Fig. 9. These artefacts appear as concentric ring-like structures around the generated sources.
Image of low level noise structures in generated images from a model with d = 32.
The distribution of pixel values in these regions varies between models with different latent dimensions and we suggest that this distribution can be used as an indicator for the performance of the VAE. While zeroing the pixels is computationally difficult, attaining very small values close to zero is an indicator of good network performance. This was evident for models that became stuck in local minima during the training process where the percentage of pixels with values <5 × 10−5 was significantly higher than those which converged to a global minimum.
6.3 d = 2 latent space of the VAE
Although the lower dimensionality of the d = 2 latent space produces sub-optimal generated images, it does enable us to visualize the mapping of the latent variable, z, to the output images. Analysing the latent space is also useful for visualizing the VAE's ability to separate source characteristics in the 2D latent space based on their morphology. To do this, we sample points between −4.0 < z < 4.0 in steps of 0.4 along the two dimensions of the d = 2 model and output the images generated at each point. Fig. 10 shows the organization of the latent space.
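A minimal sketch of this latent-space traversal is shown below; it assumes a d = 2 decoder analogous to the illustrative Decoder module of Section 5, and the grid limits and step follow those quoted above.

```python
import numpy as np
import torch

grid = np.arange(-4.0, 4.0 + 1e-9, 0.4)                  # latent grid from -4 to 4 in steps of 0.4
images = np.zeros((len(grid), len(grid), 100, 100))
with torch.no_grad():
    for i, z1 in enumerate(grid):
        for j, z2 in enumerate(grid):
            z = torch.tensor([[z1, z2]], dtype=torch.float32)
            images[i, j] = decoder(z).reshape(100, 100).numpy()
```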
It can be seen that there is a clear separation between point-like and extended sources in the latent space with point sources being generated at the origin of the latent space. This should be considered when generating sources from the VAE as latent points close to the origin should not be used as these would generate unresolved sources. We also note that moving around the latent space in a clockwise direction results in a change in the orientation of the source, whereby horizontally oriented sources lie towards the latent line z1 = 0.
For latent points with z > 3, it can be seen that part of the structure of the generated radio source extends to the edge of the maximum 100 × 100 pixel image extent and, while large extended sources are spread around the latent space, compact double radio sources are concentrated towards the bottom left-hand corner of the space. Towards the top left of the space we find uncentred sources: these appear to have one lobe located at the centre of the image, with the other lobe appearing in the lower left quarter of the image.
In the same manner as was illustrated for the d = 2 latent space of the MNIST data set (Kingma & Welling 2013, Appendix 2) we can see that the latent variable, z, transitions smoothly between different morphologies represented in the training set, even when those morphologies do not necessarily correspond to an equivalent physical continuum. Whilst this may align with observations of hybrid morphologies in some radio galaxies (e.g. Miraghaei & Best 2017; Mingo et al. 2019), it should be considered carefully when generating synthetic examples for data augmentation as the inclusion of too many intermediate morphologies may bias the model performance when applied to real data.
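One way to probe these smooth transitions directly is to decode points along a straight line between the latent means of two training images, as in the sketch below; the encoder and decoder interfaces are assumptions made for illustration.

```python
import torch

def interpolate_sources(encoder, decoder, img_a, img_b, n_steps=8):
    """Linearly interpolate in latent space between two training images.

    Intermediate decodes illustrate the smooth morphological transition
    discussed above; `encoder` is assumed to return (mu, logvar).
    """
    with torch.no_grad():
        mu_a, _ = encoder(img_a.flatten(start_dim=1))
        mu_b, _ = encoder(img_b.flatten(start_dim=1))
        alphas = torch.linspace(0.0, 1.0, n_steps).unsqueeze(1)
        zs = (1 - alphas) * mu_a + alphas * mu_b   # straight line in latent space
        return decoder(zs)
```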
6.4 Class balance in synthetic source populations
In the case of the conditional VAE, the generated sources were used to evaluate an additional performance metric designed to measure the degree of class separation achieved by the conditional generator. This was calculated by using the classifier described in Section 4 to classify a population of synthetic sources generated by the model with an input specification of a 50:50 FRI:FRII class balance and a uniform random sampling of the latent space in each case. The distributions of classifications are shown in Table 4 for each latent dimension. From Table 4, it can be seen that although a balanced sample was specified at the input to the generator, the resulting synthetic population was found to be imbalanced by the external classifier. More specifically, all models were seen to produce an excess of FRI-type sources compared to FRIIs, with the exception of d = 16.
If, instead of using a uniform distribution, the latent space is sampled randomly from the prior distribution, |$\mathcal {N}(0,1)$|, this behaviour is reversed and the synthetic population is found to be dominated by objects classified as FRII-type sources; however, we note that this is likely due to the fact that sources generated from the latent volume around the origin are predominantly compact, comprising unresolved and only marginally resolved objects, and the classifier is biased towards classifying these objects as FRII sources. This effect is also seen when sampling from the prior distribution over the latent space of the VAE. A selection of sources generated by randomly sampling from the prior distribution for the VAE is shown in Fig. A9.
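The two sampling schemes compared here can be summarized in the following illustrative sketch, in which the decoder and external classifier are placeholders rather than the specific networks used in this work.

```python
import torch

def synthetic_class_balance(decoder, classifier, d, n=1000, sampling='prior'):
    """Generate n sources with a requested 50:50 FRI:FRII balance and report
    the class fractions assigned by an external classifier.

    'uniform' draws z from U(-4, 4) per dimension; 'prior' draws from N(0, 1),
    matching the two sampling schemes compared above.
    """
    with torch.no_grad():
        if sampling == 'uniform':
            z = torch.rand(n, d) * 8.0 - 4.0
        else:
            z = torch.randn(n, d)
        labels = torch.arange(n) % 2                       # requested 50:50 balance
        y = torch.nn.functional.one_hot(labels, 2).float()
        images = decoder(torch.cat([z, y], dim=1))
        predicted = classifier(images).argmax(dim=1)
        return torch.bincount(predicted, minlength=2).float() / n
```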
6.5 CVAE class evaluation
To qualitatively evaluate the CVAE’s ability to produce sources with FRI and FRII characteristics, FRI and FRII sources originating from the same latent coordinates were generated using the model with d = 32 obtained at epoch = 5000. In this way two versions (FRI and FRII) of the ‘same’ source could be generated and compared, with the cross-sectional profile along the principal axis of the FRI source compared with that of the FRII. Fig. 11 shows six selected sources for which the cross-sections along the principal axis have been plotted. For each pair of generated sources the main distinguishing feature lay in the lobe brightness: generated FRII sources had lobes brighter than the core, while for FRIs the pixel intensity decreases away from the core, which in most cases is brighter than the lobes. This is in line with the definition of FRIs and FRIIs, and shows that CVAEs can generate sources with distinctive FRI/II morphological characteristics.

Comparison of FRI and FRII sources. Column 1 shows the FRI sources, Column 2 the corresponding FRII sources, and Column 3 the cross-sectional plot along the jet axis.
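For completeness, one possible way to extract such a cross-sectional profile is via intensity-weighted second moments, as in the numpy sketch below; this is an illustrative procedure and not necessarily the exact method used to produce Fig. 11.

```python
import numpy as np

def principal_axis_profile(img, n_samples=100):
    """Extract the pixel-intensity profile along the principal (jet) axis.

    The axis is estimated from intensity-weighted second moments; profile
    values are taken by nearest-pixel lookup along that axis.
    """
    ys, xs = np.indices(img.shape)
    w = img / img.sum()
    cy, cx = (ys * w).sum(), (xs * w).sum()               # intensity centroid
    cov = np.array([[((xs - cx)**2 * w).sum(), ((xs - cx) * (ys - cy) * w).sum()],
                    [((xs - cx) * (ys - cy) * w).sum(), ((ys - cy)**2 * w).sum()]])
    vals, vecs = np.linalg.eigh(cov)
    vx, vy = vecs[:, np.argmax(vals)]                     # principal axis direction
    t = np.linspace(-img.shape[0] / 2, img.shape[0] / 2, n_samples)
    px = np.clip(np.round(cx + t * vx).astype(int), 0, img.shape[1] - 1)
    py = np.clip(np.round(cy + t * vy).astype(int), 0, img.shape[0] - 1)
    return img[py, px]
```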
7 DISCUSSION
Previous work on the generation of radio galaxy images has been undertaken by Ma et al. (2019a). In that work the authors used a conditional VAE with convolutional layers in both the encoder and decoder, a significantly different architecture to the fully connected network presented here. They evaluated their network based on standard classification scores (precision, recall, F1-score) using the classifier defined in Ma et al. (2019b), which itself was trained on augmented images produced using a VAE. As already noted in Section 4.1, the data pre-processing in Ma et al. (2019a) involved clipping the training images to 40 × 40 pixels, which would have had a significant impact on the RAMIS evaluation measure defined in this work, and which we suggest may have a disproportionately large effect on the generation of FRII galaxies, which were noted to achieve poorer performance metrics than FRIs in the work of Ma et al. (2018).
The range of latent space dimensionalities considered by Ma et al. (2019a) was significantly larger than in this work, with models up to d = 500 being trained. However, their conclusion is in line with the results of this work, which find that a relatively low latent space dimensionality is preferred. Ma et al. (2018) do not explicitly state their preferred dimensionality, but from their Fig. 3 it appears to be in the same range as proposed here.
A further interesting observation in the work of Ma et al. (2019a) was that the generated images contained two artefacts that are not seen in the generated images from this work. The first of these was described as pseudo-structure, particularly in the generated FRII images, and the second was the presence of a grid structure, or pseudo-texture, overlying the images. Ma et al. (2019a) attributed this structure to a bias from the mean square error (MSE) loss used to construct those images, equivalent to equation (7). However, we suggest that it may be a consequence of the convolutional layers used in their network. Chequerboard artefacts in generative algorithms that employ convolutional, and specifically deconvolutional, layers are a known issue. These artefacts arise from kernel overlap in the deconvolution steps of the decoder (Odena, Dumoulin & Olah 2016). Although they can be minimized by the use of deconvolution layers with stride 1, this is typically only used in the final layer of a convolutional decoder, and artefacts produced in earlier layers with larger stride steps can still be present. Alternatively, such chequerboard effects can also be mitigated through the use of up-sampling (see e.g. Spindler et al. 2020). Other high-frequency artefacts are also thought to be caused by the use of max-pooling to subsample the output from convolutional layers in the encoder (e.g. Hénaff & Simoncelli 2015); however, we note that Ma et al. (2019a) do not use max-pooling to subsample, employing an average pooling approach instead.
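To make the distinction concrete, the following PyTorch snippet contrasts the two up-sampling strategies discussed above; it is illustrative only and does not reproduce the architecture of Ma et al. (2019a).

```python
import torch.nn as nn

# Two decoder up-sampling blocks: a transposed convolution (prone to
# chequerboard artefacts when the kernel size is not divisible by the stride)
# versus nearest-neighbour up-sampling followed by a stride-1 convolution,
# which avoids the kernel-overlap issue described by Odena et al. (2016).
deconv_block = nn.ConvTranspose2d(64, 32, kernel_size=3, stride=2, padding=1)

upsample_block = nn.Sequential(
    nn.Upsample(scale_factor=2, mode='nearest'),
    nn.Conv2d(64, 32, kernel_size=3, stride=1, padding=1),
)
```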
While the VAE and CVAE presented in this work can clearly be used to generate realistic radio sources, we note that there remain a number of limitations to this method that should be addressed in future work. The first of these is that VAEs and CVAEs tend to produce blurry images, and while an ideal generative system should generate radio sources with resolution similar to the training set, the resolution of the generated images produced here appears lower than that of the training set, i.e. FIRST. As we have demonstrated, this blurring will affect performance based on the RAMIS evaluation method introduced in this work. As an alternative to the MSE loss, Ma et al. (2019a) also used a pixel-wise cross entropy (PCE) loss, which they propose enabled finer structures to be generated in their output images. Another possibility for addressing this resolution issue is to introduce a discriminative network after the VAE that identifies images as real or fake. This approach is known as a VAE-GAN (Kawai et al. 2020).
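The two reconstruction terms discussed here can be written compactly as below; this is a generic sketch in which the pixel-wise cross entropy assumes pixel values normalized to the range [0, 1].

```python
import torch.nn.functional as F

def reconstruction_loss(x_hat, x, kind='mse'):
    """Per-batch reconstruction term under the two losses discussed above:
    mean square error or pixel-wise (binary) cross entropy."""
    if kind == 'mse':
        return F.mse_loss(x_hat, x, reduction='sum')
    return F.binary_cross_entropy(x_hat, x, reduction='sum')
```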
A second limitation is that VAEs cannot reproduce the sigma clipping applied in data pre-processing. In principle this can be remedied by applying a post-processing sigma clip to the output images, but future applications should also address the nature of the systematic artefacts that appear in this noise.
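A simple illustration of such a post-processing step is given below; the noise estimate used for the threshold is an assumption made for this sketch and is not the pre-processing applied to the training data.

```python
import numpy as np

def clip_generated(img, sigma=3.0):
    """Zero pixels in a generated image that fall below a noise threshold.

    The noise level here is estimated from the standard deviation of the
    faintest half of the pixels -- an illustrative estimate only.
    """
    noise = np.std(img[img < np.percentile(img, 50)])
    out = img.copy()
    out[out < sigma * noise] = 0.0
    return out
```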
A final point of note is that the generator is biased towards the creation of FRI radio galaxies. With the exception of models with a latent space dimensionality of d = 16, the CVAE creates a higher number of FRIs than FRIIs when the output images are passed through a classifier. Most of these misclassified FRIIs had an X-shaped morphology or were double–double sources. Compact FRIIs with smaller angular extent were correctly generated and classified, while those with large angular extent were generated with unclear morphologies. This is similar to the performance mismatch seen by Ma et al. (2018) for their FRI/II populations.
8 CONCLUSIONS
In this work, we have demonstrated the use of generative machine learning methods to simulate realistic radio galaxies. We present results from both an unsupervised variational autoencoder and a conditional variational autoencoder. The networks were trained using sources from the FIRST radio survey and produce radio sources with FRI and FRII morphologies. Furthermore, we have presented a quantitative method for evaluating the performance of these generative models in radio astronomy, formulated as the radio morphology inception score (RAMIS).
Using both the RAMIS as a quantitative measure and by visual inspection of the generated radio sources, we found that VAEs could be used as a method for the generation of realistic radio sources. We found that the lowest model loss was obtained at a latent dimension of d = 32 with a RAMIS of 1.175. However, we also found that the VAE could correctly construct asymmetry in the radio sources at the lower latent dimension of d = 4.
We also investigated the mapping of the latent space to output images by visualizing the generated sources at different latent points in the d = 2 latent space. We identified a systematic distribution of morphologies in the latent space, with extended radio sources separating from point-like sources. We also investigated the class balance in the generated source population for the CVAE, illustrating the difference in outcomes when sampling from the latent space in different ways. We suggest that the results of this investigation can be used where some control is needed over the generation of synthetic radio sources, but caution that they also highlight the potential for bias when such generators are used to augment data sets for training other models.
DATA AND SOURCE CODE AVAILABILITY
Trained models from this work are publicly available on Zenodo (DOI:10.5281/zenodo.4456165). All code is available on GitHub: https://github.com/joshen1307/RAGA. The FRDEEP data set used in this work is publicly available from Zenodo (DOI:10.5281/zenodo.4255826) and should be cited as Tang et al. (2019).
ACKNOWLEDGEMENTS
The authors thank the reviewer for their comments, which significantly improved this paper. The authors would like to thank Alex Shestapoloff and Brooks Paige from the Alan Turing Institute and Tingting Mu from the University of Manchester for early discussions that informed the course of this work. DJB gratefully acknowledges support from Science and Technology Facilities Council (STFC) and the Newton Fund through the Development in Africa through Radio Astronomy (DARA) Big Data program under grant ST/R001898/1. AMS gratefully acknowledges support from an Alan Turing Institute AI Fellowship EP/V030302/1. MB gratefully acknowledges support from the University of Manchester STFC Centre for Doctoral Training (CDT) in Data Intensive Science, grant number ST/P006795/1. FP gratefully acknowledges support from STFC and IBM through the iCASE studentship ST/P006795/1.
Footnotes
Code for this work is available on the git repository: https://github.com/joshen1307/RAGA
REFERENCES
APPENDIX A: IMAGES
We here present samples of generated sources using the trained VAE and CVAE.
Figs A1, A2, A3 and A4: Generated FRI sources from the CVAE that were classified as FRI by the CNN classifier.
Figs A5, A6, A7 and A8: Generated FRII sources from the CVAE that were classified as FRII by the CNN classifier.
Fig. A9: 100 generated sources from the unsupervised VAE for d = 32, where z is sampled randomly from the prior N(0, 1).
Generated sources from the unsupervised VAE for d = 32, where z is sampled randomly from the prior N(0, 1).