Documenti di Didattica
Documenti di Professioni
Documenti di Cultura
ABSTRACT
We train a deep residual convolutional neural network (CNN) to predict the gas-phase
metallicity (Z) of galaxies derived from spectroscopic information (Z ≡ 12+log(O/H))
using only three-band gri images from the Sloan Digital Sky Survey. When trained and
tested on 128 × 128-pixel images, the root mean squared error (RMSE) of Zpred − Ztrue
is only 0.085 dex, vastly outperforming a trained random forest algorithm on the same
data set (RMSE = 0.130 dex). The amount of scatter in Zpred − Ztrue decreases with
increasing image resolution in an intuitive manner. We are able to use CNN-predicted
Zpred and independently measured stellar masses to recover a mass-metallicity relation
with 0.10 dex scatter. Because our predicted MZR shows no more scatter than the
empirical MZR, the difference between Zpred and Ztrue can not be due to purely
random error. This suggests that the CNN has learned a representation of the gas-
phase metallicity, from the optical imaging, beyond what is accessible with oxygen
spectral lines.
1 INTRODUCTION technique (e.g., LeCun et al. 1989), their recent increase in
popularity is driven by the widespread availability of afford-
Large-area sky surveys, both on-going and planned, are revo- able graphics processing units (GPUs), that can be used to
lutionizing our understanding of galaxy evolution. The Dark do general purpose, highly parallel computing. Also, unlike
Energy Survey (DES; The Dark Energy Survey Collabora- more “traditional” ML methods, neural networks excel at
tion 2005) and upcoming Large Synoptic Survey Telescope image classification and regression problems.
(LSST; LSST Dark Energy Science Collaboration 2012) will
scan vast swaths of the sky and create samples of galax- Inferring spectroscopic properties from the imaging
ies of unprecedented size. Spectroscopic follow-up of these taken as part of a large-area photometric survey is, at a ba-
samples will be instrumental in order to understand their sic level, an image regression problem. These problems are
properties. Previously, the Sloan Digital Sky Survey (SDSS; most readily solved by use of convolutions in multiple lay-
York et al. 2000) and its spectroscopic campaign enabled ers of the network (see, e.g., Krizhevsky et al. 2012). Con-
characterization of the mass-metallicity relation (hereafter volutional neural networks (CNNs, or convnets) efficiently
MZR; Tremonti et al. 2004) and the fundamental metallic- learn spatial relations in images whose features are about
ity relation, (hereafter FMR; e.g., Mannucci et al. 2010). As the same sizes as the convolution filters (or kernels) that are
future surveys are accompanied by larger data sets, indi- to be learned through training. CNNs are considered deep
vidual spectroscopic follow-up observations will become in- when the number of convolutional layers is large. Visualizing
creasingly impractical. their filters reveals that increased depth permits the network
Fortunately, the large imaging data sets to be pro- to learn more and more abstract features (e.g., from Gabor
duced are ripe for application of machine learing (ML) meth- filters, to geometric shapes, to faces; Zeiler & Fergus 2014).
ods. ML is already showing promise in studies of galaxy
morphology (e.g., Dieleman et al. 2015; Huertas-Company In this work, we propose to use supervised ML by train-
et al. 2015; Beck et al. 2018; Dai & Tong 2018; Hocking ing CNNs to analyze pseudo-three color images and predict
et al. 2018), gravitational lensing (e.g., Hezaveh et al. 2017; the gas-phase metallicity. We use predicted metallicities to
Lanusse et al. 2018; Petrillo et al. 2017, 2018), galaxy clus- recover the empirical Tremonti et al. (2004) MZR. This pa-
ters (e.g., Ntampaka et al. 2015, 2017), star-galaxy sepa- per is organized as follows: In Section 2, we describe the
ration (e.g., Kim & Brunner 2017), creating mock galaxy acquisition and cleaning of the SDSS data sample. In Sec-
catalogs (e.g., Xu et al. 2013), and asteroid identification tion 3, we discuss selection of the network’s hyperparameters
(e.g., Smirnov & Markov 2017), among many others. ML and outline training the of network. We present the main re-
methods utilizing neural networks have grown to prominence sults in Section 4 and discuss our findings in the context of
in recent years. While neural networks are a relatively old current literature. In Section 5, we interpret the MZR using
c 2018 The Authors
2 Wu and Boada
the metallicity predicted by our CNN. We summarize our function. We use the root mean squared error loss function:
key results and discuss possible future work in Section 6.
Unless otherwise noted, throughout this paper, we use p
a concordance cosmological model (ΩΛ = 0.7, Ωm = 0.3, RMSE ≡ h|ytrue − ypred |2 i, (1)
and H0 = 70 km s−1 Mpc−1 ), assume a Kroupa initial mass where ytrue is the “true” and ypred is the predicted value, and
function (Kroupa 2001), and use AB magnitudes (Oke 1974). y represents the target quantity. It is worth emphasizing
that the Ztrue is the observed metallicity estimated from
SDSS spectra using the R23 method, but for the purpose
of training the network, we label the observed metallicity as
2 DATA the true metallicity. We define Ztrue to be the 50th percentile
To create a large training sample, we select galaxies from metallicity estimated by Tremonti et al. (2004).
the Sloan Digital Sky Survey (SDSS; York et al. 2000) DR7 We randomly split our training sample of ∼ 96, 953 im-
MPA/JHU spectroscopic catalog (Kauffmann et al. 2003; ages into 80% (76,711) training and 20% (19,192) validation
Brinchmann et al. 2004; Tremonti et al. 2004; Salim et al. data sets, respectively. The test data set of 20,466 images
2007). The catalog provides spectroscopically derived prop- is isolated for now, and is not accessible to the CNN un-
erties such as stellar mass (M? ) and gas-phase metallicity til all training is completed. Images and Ztrue answers are
(Z) estimates (Tremonti et al. 2004). We select objects with given to the CNN in “batches” of 256 at a time, until the full
low reduced chi-squared of model fits (rChi2 < 2), and me- training data set has been used for training. Each full round
dian Z estimates available (oh_p50). We supplement the of training using all of the data is called an epoch, and we
data from the spectroscopic catalog with photometry in each compute the loss using the validation data set at the end of
of the five SDSS photometric bands (u, g, r, i, z), along with each epoch. We use gradient descent for each batch to adjust
associated errors from SDSS DR14 (Abolfathi et al. 2018). weight parameters, and each weight’s fractional contribution
We require that galaxies magnitudes are 10 < ugriz < of loss is determined by the backpropagation algorithm (Le-
25 mag, in order to avoid saturated and low signal-to-noise Cun et al. 1989), during which finite partial derivatives are
detections. We enforce a color cut, 0 < u − r < 6, in or- computed and propagated backwards through layers (i.e.,
der to avoid extremely blue or extremely red objects, and using the chain rule for derivatives).
require objects to have spectroscopic redshifts greater than We use a 34-layer residual CNN architecture (He et al.
z = 0.02 with low errors (zerr < 0.01). The median redshift 2015), initialized to weights pre-trained on the ImageNet
is 0.07 and the highest-redshift object has z = 0.38. We data set, which consists of 1.7 million images belonging to
also require that the r-band magnitude measured inside the 1000 categories of objects found on Earth (e.g., cats, horses,
Petrosian radius (petroMag_r; Petrosian 1976) be less than cars, or books; Russakovsky et al. 2014). The CNN is trained
18 mag, corresponding to the spectroscopic flux limit. With for a total of 10 epochs. For more details about the CNN ar-
these conditions we construct an initial sample of 142,182 chitecture, transfer learning, hyperparameter selection, data
objects (there are four objects with duplicate SDSS DR14 augmentation, and the training process, see the Appendix.
identifiers). We set aside 25,000 objects for later testing, and In total, our training process requires 25-30 minutes on our
use the rest for training and validation. GPU and uses under 2 GB of memory.
We create RGB image cutouts of each galaxy with the We evaluate predictions using the RMSE loss func-
SDSS cutout service1 , which converts gri bands to RGB tion, which approaches the standard deviation for Gaussian-
channels according to the algorithm described in Lupton distributed data. We also report the NMAD, or the normal
et al. (2004) (with modifications by the SDSS SkyServer median absolute deviation (e.g., Ilbert et al. 2009; Dahlen
team). Since images are not always available, we are left et al. 2013; Molino et al. 2017):
with 116,429 SDSS images with metallicity measurements,
NMAD(x) ≈ 1.4826 × median x − median(x) ,
(2)
including 20,466/25,000 of the test subsample. We create
128 × 128-pixel JPG images with a pixel scale of 0.00 296, where for a Gaussian-distributed x, the NMAD will also
which corresponds to 3800 ×3800 on the sky. We do not further approximate the standard deviation, σ. NMAD has the dis-
preprocess, clean, or filter the images before using them as tinct advantage in that it is insensitive to outliers and can
inputs to our CNN. be useful for measuring scatter. However, unlike the RMSE,
which quantifies the typical scatter distributed about a cen-
ter of zero, NMAD only describes the scatter around the
(potentially non-zero) median.
3 METHODOLOGY
Before the CNN can be asked to make predictions, it must
be trained to learn the relationships between the input data 4 RESULTS
(the images described above) and the desired output (metal-
licity). The CNN makes predictions using the input images, 4.1 Example predictions
and the error (or loss) is determined based on the differences In Figure 1, we show examples of 128 × 128 pixel gri SDSS
between true and predicted values. The CNN then updates images that are evaluated by the CNN. Rows (a) and (b)
its parameters, or weights, in a way that minimizes the loss depict the galaxies with lowest predicted and lowest true
metallicities, respectively. The CNN associates blue, edge-
on disk galaxies with low metallicities, and is generally ac-
1 http://skyserver.sdss.org/dr14/en/help/docs/api.aspx curate in its predictions. In rows (c) and (d), we show the
Lowest Zpred
(a)
Ztrue = 8.153 Ztrue = 8.212 Ztrue = 8.204 Ztrue = 7.997 Ztrue = 8.308
Zpred = 8.130 Zpred = 8.162 Zpred = 8.177 Zpred = 8.181 Zpred = 8.187
Lowest Ztrue
(b)
Ztrue = 7.896 Ztrue = 7.959 Ztrue = 7.976 Ztrue = 7.997 Ztrue = 8.010
Zpred = 8.211 Zpred = 8.211 Zpred = 8.467 Zpred = 8.181 Zpred = 8.647
Highest Zpred
(c)
Ztrue = 9.211 Ztrue = 9.325 Ztrue = 9.264 Ztrue = 9.095 Ztrue = 9.466
Zpred = 9.200 Zpred = 9.182 Zpred = 9.181 Zpred = 9.181 Zpred = 9.180
Highest Ztrue
(d)
Ztrue = 9.466 Ztrue = 9.461 Ztrue = 9.432 Ztrue = 9.406 Ztrue = 9.405
Zpred = 9.180 Zpred = 9.158 Zpred = 9.029 Zpred = 9.060 Zpred = 8.916
Most underpredicted
(e)
Ztrue = 9.405 Ztrue = 9.336 Ztrue = 8.932 Ztrue = 9.290 Ztrue = 8.870
Zpred = 8.916 Zpred = 8.888 Zpred = 8.495 Zpred = 8.859 Zpred = 8.456
Most overpredicted
(f)
Ztrue = 8.248 Ztrue = 8.363 Ztrue = 8.312 Ztrue = 8.010 Ztrue = 8.293
Zpred = 9.103 Zpred = 9.057 Zpred = 8.995 Zpred = 8.647 Zpred = 8.927
Random
(g)
Ztrue = 8.747 Ztrue = 9.130 Ztrue = 9.131 Ztrue = 8.768 Ztrue = 8.861
Zpred = 8.759 Zpred = 9.093 Zpred = 9.125 Zpred = 8.763 Zpred = 8.735
Figure 1. SDSS imaging with predicted and true metallicities from the test data set. Five examples are shown from each of the following
categories: (a) lowest predicted metallicity, (b) lowest true metallicity, (c) highest predicted metallicity, (d) highest true metallicity, (e)
most under-predicted metallicity, (f) most over-predicted metallicity, and (g) a set of randomly selected galaxies.
Scatter
NMAD = 0.0668
Number of galaxies
Number of galaxies
1500
9.2
1000
1000 500
9.0
500
Number of galaxies
0
0.2 0.0 0.2
Z
8.8
0
Zpred
8.00 8.25 8.50 8.75 9.00 9.25 10
Metallicity
8.6
Scatter
at low metallicity we find that the CNN makes makes over- 0.14
predictions.
A histogram of metallicity residuals is shown in the in- 0.12
set plot of the Figure 3 main panel. The ∆Z distribution is
characterized by an approximately normal distribution with
a heavy tail at large positive residuals; this heavy tail is likely
0.10
due to the systematic over-prediction for low-Ztrue galaxies.
There is also an overabundance of large negative ∆Z corre- 0.08
sponding to under-predictions for high Ztrue , although this
effect is smaller. 0.06
We now turn our attention to the upper panel of Fig- 12 22 42 82 162 322 642 1282
ure 3, which shows how the scatter varies with spectroscop- Image size (pixels)
ically derived metallicity. The RMSE scatter and outlier-
insensitive NMAD are both shown. Marker sizes are propor-
tional in area to the number of samples in each Ztrue bin, and
the horizontal lines are located at the average loss (RMSE Figure 4. The effects of image resolution on CNN performance.
or NMAD) for the full test data set. Red and violet circular markers indicate scatter in the residual
distribution (∆Z) measured using RMSE and NMAD, respec-
Predictions appear to be both accurate and low in scat-
tively. (Each point is analogous to the horizontal lines shown in
ter for galaxies with Ztrue ≈ 9.0, which is representative of
Figure 3.) We also show predictions from a random forest algo-
a typical metallicity in the SDSS sample. Where the predic- rithm as open triangle markers, and constant hZtrue i predictions
tions are systematically incorrect, we find that the RMSE as open square markers.
increases dramatically. However, the same is not true for the
NMAD; at Ztrue < 8.5, it asymptotes to ∼ 0.10 dex, even
though the running median is incorrect by approximately
the same amount! This discrepancy is because the NMAD
determines the scatter about the median and not ∆Z = 0,
the same images to 64×64, 32×32, · · · , 2×2, and even 1×1
and thus, this metric becomes somewhat unreliable when
pixels via re-binning. All images retain their three channels,
the binned samples do not have a median value close to
so the smallest 1 × 1 image is effectively the pixels in each of
zero. Fortunately, the global median of ∆Z is −0.006 dex,
the gri bands averaged together with the background and
or less than 10% of the RMSE, and thus the global NMAD
possible neighboring sources.
= 0.067 dex is representative of the outlier-insensitive scat-
In Figure 4, we show the effects of image resolution by
ter for the entire test data set.
measuring the global scatter in ∆Z using the RMSE and
This effect partly explains why the global NMAD
NMAD metrics (shown in red and violet circular markers, re-
(0.067 dex) is higher than the weighted average of the binned
spectively). Also shown is the scatter in ∆Z if we always pre-
NMAD (∼ 0.05 dex). Also, each binned NMAD is computed
dict the mean value of Ztrue over the data set (shown using a
using its local scatter, such that the outlier rejection crite-
square marker). This constant prediction effectively delivers
rion varies with Ztrue . To illustrate this effect with an ex-
the worst possible scatter, and the Tremonti et al. (2004)
ample: ∆Z ≈ 0.2 dex would be treated as an 3 σ outlier
systematic uncertainty in Ztrue of ∼ 0.03 dex yields the best
at Ztrue = 9.0, where the CNN is generally accurate, but
possible scatter. We find that both RMSE and NMAD de-
the same residual would not be rejected as an outlier using
crease with increasing resolution, as expected if morphology
NMAD for Ztrue = 8.5. Since the binned average NMAD
or color gradients are instrumental to predicting metallicity.
depends on choice of bin size, we do not include those re-
sults in our analysis and only focus on the global NMAD. There appears to be little improvement in scatter going
RMSE is a robust measure of both local and global scatter from 1 × 1 to 2 × 2 pixel images. 1 × 1 three-color images
(although it becomes biased high by outliers). contain similar information to three photometric data points
(although because background and neighboring pixels are
averaged in, they are less information-dense than photome-
try), which can be used to perform a crude spectral energy
4.4 Resolution effects
distribution (SED) fit. Therefore it unsurprising that the
Because our methodology is so computationally light, we can 1 × 1 CNN predictions perform so much better than the
run the same CNN training and test procedure on images baseline mean prediction. A 2 × 2 three-color image contains
scaled to different sizes in order to understand the effects four times as many pixels as a 1 × 1 image, but because
of image resolution. Our initial results use SDSS 3800 × 3800 the object is centered between all four pixels, information
cutouts resized to 128 × 128 pixels, and we now downsample is still averaged among all available pixels. Therefore, the
The physical interpretation of the MZR is that a 5.2 An unexpectedly strong CNN-predicted
galaxy’s stellar mass strongly correlates with its chemical en- mass-metallicity relation
richment. Proposed explanations of this relationship’s origin
include metal loss through blowout (see, e.g., Garnett 2002; The RMSE = 0.085 dex difference between the true and
Tremonti et al. 2004; Brooks et al. 2007; Davé et al. 2012), CNN-predicted metallicities can be interpreted in one of two
inflow of pristine gas Dalcanton et al. (2004), or a combi- ways: (1) the CNN is inaccurate, and Zpred deviates ran-
nation of the two (e.g., Lilly et al. 2013); however, see also domly from Ztrue , or (2) the CNN is labeling Zpred according
Sánchez et al. (2013). Although the exact physical process to some other hidden variable, and ∆Z residuals represent
responsible for the low (0.10 dex) scatter in the MZR is not non-random shifts in predictions based on this variable. If
known, its link to SFR via the FMR is clear, as star for- the first scenario were true, we would expect the random
mation leads to both metal enrichment of the interstellar residuals to increase the scatter of other known correlations
medium and stellar mass assembly. such as the MZR when predicted by the CNN. If the second
The FMR connects the instantaneous (∼ 10 Myr) SFR were true, we would expect the scatter of such correlations
with the gas-phase metallicity (∼ 1 Gyr timescales; see, e.g., to remain unchanged or shrink. We can therefore contrast
Leitner & Kravtsov 2011) and M? (i.e., the ∼ 13 Gyr inte- the MZR constructed from Zpred and from Ztrue in order to
grated SFR). Our CNN is better suited for predicting M? test these interpretations.
rather than SFR, using the gribands, which can only weakly In the main panel of Figure 6, we plot CNN-predicted
probe the blue light from young, massive stars. Therefore, metallicity versus true stellar mass. For comparison, we also
we expect the scatter in CNN predictions to be limited by overlay the Tremonti et al. (2004) MZR median relation
the MZR (with scatter σ ∼ 0.10 dex) rather than the FMR and its ±1 σ scatter (which is ∼ 0.10 dex). Their empirical
(σ ∼ 0.05 dex). It is possible that galaxy color and mor- median relation (solid black) matches our predicted MZR
phology, in tandem with CNN-predicted stellar mass, can median (solid red), and the lines marking observed scat-
be used to roughly estimate the SFR, but in this paper we ter (dashed black) match Zpred scatter as well (dashed red
will focus on only the MZR. and violet). Over the range 9.5 ≤ log(M?,true /M ) ≤ 10.5,
the RMSE scatter in Zpred appears to be even tighter than
the observed ±1 σ (dashed black). The same is true for the
NMAD, which is even lower over the same interval.
5.1 Predicting stellar mass
In the upper panel of Figure 6, we present the scatter
Since galaxy stellar mass is known to strongly correlate with in both predicted and Tremonti et al. (2004) MZR binned
metallicity, and is easier to predict (than, e.g., SFR) from gri by mass. We confirm that the CNN predicts a MZR that
imaging, we consider the possibility that the CNN is simply is at most equal in scatter than one constructed using the
Scatter
NMAD = 0.1985 NMAD = 0.1008
0.2 0.1
0.0 100 0.0 100
11.0 Predicting stellar mass 9.2 Using the empirical MZR
to predict metallicity
10.5 9.0
Number of galaxies
Number of galaxies
(M )
10.0 8.8
, pred
ZMZR
10 10
log M
9.5
8.6
true metallicity. The stellar mass bins for which our CNN We thus find more evidence that the CNN has learned
found tighter scatter than the empirical MZR are the same something from the SDSS gri imaging that is different from,
bins that happen to contain the greatest number of exam- but at least as powerful as, the MZR. One possible explina-
ples (9.5 ≤ log(M?,true /M ) ≤ 10.5); thus, the strong per- tion is that the CNN is measuring some version of metallicity
formance of our network at those masses may be due to a that is more fundamentally linked to the stellar mass, rather
wealth of training examples. If our data set were augmented than Zpred as derived from oxygen spectral lines. Another
to include additional low- and high-M?,true galaxies, then the possibility is that the MZR is a projection of a correlation
scatter in the predicted MZR could be even lower overall. between stellar mass, metallicity, and a third parameter, per-
The fact that a CNN trained on only gri imaging is haps one that is morphological in nature. If this is the case,
able to predict metallicity accurately enough to reproduce then the Tremonti et al. (2004) MZR represents a relation-
the MZR in terms of median and scatter is not trivial. The ship that is randomly distributed in the yet unknown third
error budget is very small: σ = 0.10 dex affords only, e.g., parameter, while our CNN would be able to stratify the
0.05 dex of scatter when SFR is a controlled parameter plus MZR according to this parameter (much as the FMR does
a 0.03 dex systematic scatter in Ztrue measurements, leav- with the SFR). We are unfortunately not able to identify
ing only ∼ 0.08 dex remaining for CNN systematics, as- any hidden parameter using the current CNN methodology,
suming that these errors are not correlated and are added but we plan to explore this topic in a future work.
in quadrature. This remaining error budget may barely be
able to accommodate our result of RMSE(∆Z) = 0.085. In-
terpreting the MZR scatter as the combination of intrinsic
FMR scatter, Ztrue systematics, and ∆Z systematics cannot 6 SUMMARY
be correct since it assumes that the CNN is recovering the We have trained a deep convolutional neural network (CNN)
FMR perfectly. As we have discussed previously, it is highly to predict galaxy gas-phase metallicity using only 128 × 128-
unlikely that the CNN is sensitive to the SFR, and therefore pixel, three-band (gri), JPG images obtained from SDSS.
cannot probe the MZR at individual values of the SFR. We characterize CNN performance by measuring scatter
If we assume that the error budget for the MZR is not in the residuals between predicted (Zpred ) and true (Ztrue )
determined by the FMR, then the error “floor” should be metallicities. Our conclusions are as follows:
0.10 dex. This is immediately exceeded, as we have found
RMSE ≈ 0.10 dex for the predicted MZR without ac- (i) By training for a half-hour on a GPU, the CNN can
counting for the fact that Zpred and Ztrue differ by RMSE predict metallicity well enough to achieve residuals charac-
= 0.085 dex! Consider the case in which all Ztrue values terized by RMSE = 0.085 dex (or outlier-insensitive NMAD
are shifted randomly by a Gaussian noise distribution with = 0.067 dex). These findings may be promising for future
σ = 0.085 dex. These shifted values should not be able to re- large spectroscopy-limited surveys such as LSST.
construct a correlation without introducing additional scat- (ii) We find that the residual scatter decreases in an ex-
ter unless the shifts were not arbitrary to begin with. pected way as resolution is increased, suggesting that the
Number of galaxies
partment of Energy Office of Science, and the Participating
Institutions. SDSS-IV acknowledges support and resources
8.8 from the Center for High-Performance Computing at the
Zpred
10
University of Utah. The SDSS web site is www.sdss.org.
SDSS-IV is managed by the Astrophysical Research
8.6 Consortium for the Participating Institutions of the SDSS
Median Collaboration including the Brazilian Participation Group,
±1 RMSE the Carnegie Institution for Science, Carnegie Mellon Uni-
8.4 ±1 NMAD versity, the Chilean Participation Group, the French Par-
Ztrue (Tremonti+04) ticipation Group, Harvard-Smithsonian Center for Astro-
±1 physics, Instituto de Astrofı́sica de Canarias, The Johns
8.2 1
8.5 9.0 9.5 10.0 10.5 11.0 Hopkins University, Kavli Institute for the Physics and
log M , true (M ) Mathematics of the Universe (IPMU) / University of Tokyo,
the Korean Participation Group, Lawrence Berkeley Na-
Figure 6. In the main panel, the predicted MZR comparing tional Laboratory, Leibniz Institut für Astrophysik Potsdam
true M? against CNN-predicted Zpred is shown in grayscale. The (AIP), Max-Planck-Institut für Astronomie (MPIA Heidel-
running median (solid red) and scatter (dashed red and violet) berg), Max-Planck-Institut für Astrophysik (MPA Garch-
are shown in 0.2 dex mass bins. For comparison, we also show the ing), Max-Planck-Institut für Extraterrestrische Physik
Tremonti et al. (2004) observed median and scatter (solid and (MPE), National Astronomical Observatories of China, New
dashed black lines, respectively), which are binned by 0.1 dex in Mexico State University, New York University, University
mass. In the upper panel, we show the scatter in the predicted of Notre Dame, Observatário Nacional / MCTI, The Ohio
and empirical MZR. The standard deviation of the scatter for State University, Pennsylvania State University, Shanghai
the empirical MZR is shown as a dashed black line, while the
Astronomical Observatory, United Kingdom Participation
red and violet circles respectively show RMSE and NMAD for
Group, Universidad Nacional Autónoma de México, Univer-
the predicted MZR. Marker sizes are proportional to the number
of galaxies in each stellar mass bin for the test data set. Global sity of Arizona, University of Colorado Boulder, University
scatter in the CNN-predicted MZR appears to be comparable to, of Oxford, University of Portsmouth, University of Utah,
or even lower than, scatter in the empirical MZR. University of Virginia, University of Washington, University
of Wisconsin, Vanderbilt University, and Yale University.
A5 Data augmentation
Nearly all neural networks benefit from larger training sam-
ples because they help prevent overfitting. Beyond the lo-
cal Universe, galaxies are seen at nearly random orienta-
tion; such invariance permits synthetic data to be generated
from rotations and flips of the training images (see, e.g.,
Simonyan & Zisserman 2014). Each image is fed into the
network along with four augmented versions, thus increas-
ing the total training sample by a factor of five.
This technique is called data augmentation, and is par-
ticularly helpful for the network to learn uncommon truth
values (e.g., in our case, very metal-poor or metal-rich galax-
ies). Each augmented image is fed-forward through the net-
work and gradient contributions are computed together as
part of the same batch. A similar process is applied to the
network during predictions, which is known as test-time aug-
mentation (TTA), whereby synthetic images are generated
according to the same rules applied to the training data
set. The CNN predicts an ensemble average over the aug-
mented images, which tends to further improve RMSE by
a few percent. We use the default hyperparameters in the
fastai library.