A&A 617, A70 (2018)
https://doi.org/10.1051/0004-6361/201832784
© ESO 2018
Astronomy & Astrophysics

The VIMOS Public Extragalactic Redshift Survey (VIPERS)
The complexity of galaxy populations at 0.4 < z < 1.3 revealed with unsupervised machine-learning algorithms*
M. Siudek1,2, K. Małek2, A. Pollo2,3, T. Krakowski2, A. Iovino4, M. Scodeggio5, T. Moutard6,7, G. Zamorani8, L. Guzzo9,4, B. Garilli5, B. R. Granett4,9, M. Bolzonella8, S. de la Torre7, U. Abbas10, C. Adami7, D. Bottini5, A. Cappi8,11, O. Cucciati8, I. Davidzon7,8, P. Franzetti5, A. Fritz5, J. Krywult13, V. Le Brun7, O. Le Fèvre7, D. Maccagni5, F. Marulli12,14,8, M. Polletta5,15,16, L. A. M. Tasca7, R. Tojeiro17, D. Vergani8, A. Zanichelli18, S. Arnouts7,19, J. Bel20, E. Branchini21,22,23, J. Coupon24, G. De Lucia25, O. Ilbert7, C. P. Haines4, L. Moscardini12,14,8, and T. T. Takeuchi26
(Affiliations can be found after the references)

Received 7 February 2018 / Accepted 30 May 2018

ABSTRACT

Aims. Various galaxy classification schemes have been developed so far to constrain the main physical processes regulating evolution of different galaxy types. In the era of a deluge of astrophysical information and recent progress in machine learning, a new approach to galaxy classification has become imperative.

Methods. In this paper, we employ a Fisher Expectation-Maximization (FEM) unsupervised algorithm working in a parameter space of 12 rest-frame magnitudes and spectroscopic redshift. The model (DBk) and the number of classes (12) were established based on the joint analysis of standard statistical criteria and confirmed by the analysis of the galaxy distribution with respect to a number of classes and their properties. This new approach allows us to classify galaxies based on only their redshifts and ultraviolet to near-infrared (UV–NIR) spectral energy distributions.

Results. The FEM unsupervised algorithm has automatically distinguished 12 classes: 11 classes of VIPERS galaxies and an additional class of broad-line active galactic nuclei (AGNs). After a first broad division into blue, green, and red categories, we obtained a further sub-division into: three red, three green, and five blue galaxy classes.
The FEM classes follow the galaxy sequence from the earliest to the latest types, which is reflected in their colours (which are constructed from rest-frame magnitudes used in the classification procedure) but also their morphological, physical, and spectroscopic properties (not included in the classification scheme). We demonstrate that the members of each class share similar physical and spectral properties. In particular, we are able to find three different classes of red passive galaxy populations. Thus, we demonstrate the potential of an unsupervised approach to galaxy classification and we retrieve the complexity of galaxy populations at z ∼ 0.7, a task that usual, simpler, colour-based approaches cannot fulfil.

Keywords. galaxies: evolution – galaxies: star formation – galaxies: stellar content

* Based on observations collected at the European Southern Observatory, Cerro Paranal, Chile, using the Very Large Telescope under programs 182.A–0886 and partly 070.A–9007. Also based on observations obtained with MegaPrime/MegaCam, a joint project of CFHT and CEA/DAPNIA, at the Canada–France–Hawaii Telescope (CFHT), which is operated by the National Research Council (NRC) of Canada, the Institut National des Sciences de l’Univers of the Centre National de la Recherche Scientifique (CNRS) of France, and the University of Hawaii. This work is based in part on data products produced at TERAPIX and the Canadian Astronomy Data Centre as part of the Canada–France–Hawaii Telescope Legacy Survey, a collaborative project of NRC and CNRS. The VIPERS web site is http://www.vipers.inaf.it/

Article published by EDP Sciences

1. Introduction

The problem of classification of galaxies and dividing them into different types is as old as the notion of “extragalactic nebulae” (Hubble 1926). As Sandage et al. (1975), and more recently Buta (2011) and Buta & Zhang (2011) point out, classification of objects is the first step in the development of most sciences, and applies to galaxy studies no less than to any field of research. Only once we find common features of studied objects and use them to sort them into categories, do we obtain a starting point for the further analysis. Identifying similarities and differences between the selected groups allows us to then build theoretical models, which can ultimately lead us to the global picture of physical mechanisms at the origin of their properties.

Galaxies in the local Universe display a variety of shapes and structural properties. The main classification system still in use is the Hubble tuning fork diagram (Hubble 1926, 1936), with all the refinements introduced by Sandage (1961) and de Vaucouleurs (1959), based on the morphological properties of galaxies (see van den Bergh 1998; Buta 2011, for a detailed discussion). In the modern context, we alternatively refer to continuity of types in the morphological parameter space, where numerous morphological features are taken into account (Lintott et al. 2008, 2011; Buta et al. 2010; Kartaltepe et al. 2015). The basic Hubble classification of galaxies into “early” and “late” types (and their subtypes) has survived because, among other reasons, these types correlate well with other properties of galaxies, such as colours, stellar content, neutral hydrogen content and so on (Kennicutt 1992; Roberts & Haynes 1994; Buta et al. 1994; Strateva et al. 2001; Deng 2010; Moutard et al. 2016a).

Indeed, many types of galaxy properties display bi-modal distributions: photometric parameters, such as colours (e.g. Bell et al. 2004; Balogh et al. 2004b; Baldry et al. 2006; Franzetti et al. 2007; Taylor et al. 2015), morphological parameters like the Sérsic index (e.g. Sérsic 1963; Strateva et al. 2001; Driver et al. 2006; Krywult et al. 2017), the strength of spectral features (e.g. Balogh et al. 2004a; Kauffmann et al. 2003; Siudek et al. 2017) and so on. Therefore, these properties are often used as the basis for galaxy classification, especially at higher redshifts, z, where detailed galaxy morphologies are difficult to observe.
In particular, colour–colour diagrams (e.g. the (NUV−r)–(r−K) diagram (hereafter NUVrK), NURrJ, BzK, NUViB, introduced/used by Arnouts et al. 2013; Bundy et al. 2010; Daddi et al. 2004; Cibinel et al. 2013, respectively) are often used for the purpose of galaxy classification. More refined selection processes can be based on the multi-modality criterion, which selects red passive galaxies, intermediate “green valley” objects, and blue star-forming galaxies based on their rest-frame colours, spectral parameters, or colour and colour-Sérsic index distributions simultaneously (e.g. Bell et al. 2004; Baldry et al. 2006; Franzetti et al. 2007; Bruce et al. 2014; Lange et al. 2015; Krywult et al. 2017; Haines et al. 2017). The bimodality criterion can be enriched by a variable cut in galaxy colours that evolves with redshift (Peng et al. 2010; Fritz et al. 2014; Moutard et al. 2016b; Siudek et al. 2017), as a non-evolving cut applied for high-redshift galaxies can result in the selection of the reddest and most luminous red-type galaxies in one group and a mixture of star-forming and less massive red galaxies in the second group.

The methods presented above are powerful tools, but they are sensitive only to a few specific properties. A further disadvantage is the small number of groups which can typically be obtained: selection based on bimodality of the distribution of a certain property or a set of correlated properties usually allows for selection of only two or three groups (blue star-forming galaxies – intermediate types – red passive galaxies). Some two-dimensional (2D) colour–colour diagrams, like the NUVrK, are used for a more detailed classification (e.g. Arnouts et al. 2013; Moutard et al. 2016a,b; Davidzon et al. 2016) but are still limited to a relatively small number of groups.
Moreover, classifications based on the standard 2D cuts suffer from multi-fold selection-effect problems. For example, the properties of red passive galaxies selected using different criteria (photometry, morphology, and spectroscopy) differ from one selection to another (e.g. Renzini 2006; Moresco et al. 2013). Red passive galaxy samples are mostly affected by some level of contamination from dust-reddened galaxies with relatively low levels of star formation activity that may strongly affect their mean properties. Moresco et al. (2013) showed that the selection of the purest sample of red passive galaxies demands the combination of different criteria (in this case, morphological, spectroscopic and photometric information) confirming the necessity of multidimensional approaches in order to avoid obtaining a biased sample of different galaxy types.

Two-dimensional diagrams based on the flux ratios (or equivalent widths) of spectral lines can also be a powerful tool, for example for AGN diagnostic and classification (e.g. Baldwin, Phillips & Terlevich “BPT” diagrams based on the ratios of “blue” and “red” lines: Baldwin et al. 1981; Lamareille 2010). The BPT diagram allows for separation of: (1) star-forming galaxies, (2) Seyferts, (3) low-ionisation nuclear emission-line regions (LINERS), and the two composite groups, which consist of: (4) star-forming galaxies and Seyferts, and (5) star-forming galaxies and LINERS. However, it becomes clear that any classification based on a small number of parameters, even carefully chosen, is far too simple to reflect the huge range of different cosmic objects.

While classical methods of classification are still common and very useful, recent advancements in automatic machine learning have opened up new possibilities for the classification of distant sources. In principle, they allow us to operate in a multi-parameter space, combining all the available pieces of information: photometric measurements, redshifts, spectral lines, and morphologies.
In principle, such an approach can immensely improve the galaxy classification across a wide redshift range. However, there is also a risk of including too much redundant or indiscriminative information which would blur the final result or lead to the unjustified subdivision of types.

Ball & Brunner (2010) and Fraix-Burnet et al. (2015) gave a comprehensive review of different methods for clustering objects into synthetic groups in astrophysics, showing that classification in multi-dimensional parameter space, backed by sophisticated multivariate statistical tools, leads to a selection of sources that is more accurate than, for example, the colour–colour method. In general, we can distinguish two main groups of algorithms: supervised and unsupervised learning algorithms. Briefly, supervised algorithms classify data into classes that have previously been defined and anticipated. The disadvantage of this method is the requirement to create a training sample a priori and, at the same time, no possibility to define new classes of objects.

Unsupervised learning algorithms (such as those used in our analysis) search for clusters of objects characterised by some pattern in the data and try to discover new classification schemes without any prior assumptions. The unsupervised algorithm fits the input vector data to a statistical model. The algorithm then tries to optimise the parameters of the model in iterative cycles to find the best fit to the data with an optimised number of classes. Once the defined satisfactory criteria are fulfilled, the iterations are stopped. The best known unsupervised learning algorithms include: (a) expectation-maximisation (Bilmes 1998; hereafter EM) algorithms – used to deal with complex data structures, for example, clusters; (b) k-means (Salman et al.
2011) – whose aim is to assign observations to clusters in which each observation belongs to the nearest mean; and (c) hierarchical clustering (Balcan et al. 2014) – treating each point as a cluster and successively merging pairs of clusters until all clusters are merged into one single group that contains all of the points. An overview of unsupervised approaches used in astronomy can be found in D’Abrusco et al. (2012).

Supervised algorithms have already yielded clear achievements in the selection of different astronomical sources. However, this approach only allows us to reproduce standard classifications, mostly based on optical colours, which is not optimal to extract all the relevant information from the data. Therefore, it is necessary to adopt unsupervised methods to efficiently extract all the information encoded in the data.

Unsupervised machine-learning algorithms for galaxy classification have until now mainly been applied to galaxy spectra. In particular, Sánchez Almeida et al. (2010) used an unsupervised k-means cluster analysis algorithm to classify all spectra in the final Sloan Digital Sky Survey data release 7 (SDSS/DR7). They identified as many as 17 different classes of galaxies. This would have been extremely challenging using classical methods due to the huge number of spectra (∼174k) to process. The classification was based on the multidimensional cuts in the space of a mixture of features (emission/absorption lines, continuum, fluxes and errors) making use of 3849 measurements for each object. The selected classes are well separated in the colour sequence and morphological groups. The spectroscopic templates obtained for each class can be used for redshift measurements (z < 0.25) as well as to trace morphological and spectroscopic changes in cosmic time.
Principal component analysis (PCA) has been used to classify astronomical data based on broadband measurements or as a tool to clean spectra (e.g. Marchetti et al. 2013, 2017; Wild et al. 2014). Marchetti et al. (2013) used a PCA algorithm to classify 27 350 optical spectra in the redshift range 0.4 < z < 1.0 collected by the VIPERS survey (Public Data Release 1, hereafter PDR1). The algorithm repaired parts of VIPERS spectra affected by noise or sky residuals and reconstructed gaps in the spectra. A classification into four main classes (early, intermediate, late and starburst galaxies) was carried out, based on a set of orthogonal spectral templates and the three most significant components (eigen-coefficients) obtained for each galaxy.

In this paper, we introduce a new method of galaxy classification via an unsupervised learning algorithm applied to the galaxies observed by the VIMOS Public Extragalactic Redshift Survey (VIPERS). The VIPERS survey acquired spectra for ∼10⁵ galaxies. For each galaxy, both spectroscopic measurements (redshift, lines, fluxes) and photometric data are provided. This makes VIPERS a perfect dataset for unsupervised classification; it is large enough to separate many different classes on a statistically sound level, and, at the same time, all the wealth of the spectroscopic and photometric information can be used to construct the feature space, and later for the validation process. Moreover, previous analyses made on the VIPERS data provide us with additional parameters such as Sérsic indices and physical properties, obtained by fitting the spectral energy distributions (SEDs) (stellar mass, star formation rate (SFR), etc.). All these additional measurements, even when not used for the classification itself, can serve for an a posteriori interpretation of physical properties of different classes.

Our method is based on the multidimensional space defined by the rest-frame luminosities measured in 12 bands and, additionally, spectroscopic redshift information.
The availability of spectroscopic data for VIPERS galaxies allows us to verify how the classes obtained using the broadband rest-frame photometry are reflected in the spectral properties of galaxies. We demonstrate that the classification based on our automatic algorithm and confirmed by spectroscopic features provides a homogeneous view of different classes of galaxies which may be used as the starting point to analyse their evolutionary tracks leading to the formation of today’s galaxy types.

The paper is organised as follows. In Sect. 2, we describe the sample selection. Section 3 gives an overview of the FisherEM methodology. In Sect. 4, we present the main results and discuss their physical meaning. A summary is presented in Sect. 5. We validate the model and the number of classes in Appendix A, and discuss the class membership probabilities in Appendix B. We compare the FEM classification to a principal component analysis (PCA) scheme in Appendix C, and relate FEM classes to Hubble types given by Kennicutt (1992) in Appendix D. In our analysis, we used the free statistical environment software R¹ with the FisherEM package (Bouveyron & Brunet 2012). Throughout the paper we use a cosmological framework assuming Ωm = 0.30, ΩΛ = 0.70, and H0 = 70 km s⁻¹ Mpc⁻¹.

¹ R Core Team (2013). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, http://www.R-project.org/

2. Data

In this paper, we make use of the final galaxy sample from the VIMOS Public Extragalactic Redshift Survey² (VIPERS, Scodeggio et al. 2018). VIPERS is a spectroscopic survey carried out with the VIMOS spectrograph (Le Fèvre et al. 2003) on the 8.2 m ESO Very Large Telescope (VLT) aimed at measuring redshifts for ∼100 000 galaxies in the redshift range 0.5–1.2. VIPERS covered an area of ∼23.5 deg² on the sky, observing galaxies brighter than i_AB = 22.5 at redshifts higher than 0.5 (a pre-selection in the (u−g) and (r−i) colour–colour plane was used to remove galaxies at lower redshifts). A detailed description of the survey can be found in Guzzo et al. (2014). The galaxy target sample was selected from optical photometric catalogues of the Canada–France–Hawaii Telescope Legacy Survey Wide (CFHTLS-Wide: Mellier et al. 2008; Goranova et al. 2009). The data reduction pipeline and redshift quality system are described by Garilli et al. (2014).

2.1. The VIPERS dataset

The final data release provides spectroscopic measurements and photometric properties for 86 775 galaxies (Scodeggio et al. 2018). The associated photometric catalogue consists of magnitudes from the VIPERS Multi-Lambda Survey (Moutard et al. 2016a), combining CFHTLS T0007-based u, g, r, i, z photometry with GALEX FUV/NUV and WIRCam Ks-band observations, complemented where available by VISTA Z, Y, J, H, K photometry from the VIDEO survey (Jarvis et al. 2013).

Physical parameters including absolute magnitudes, stellar masses, and SFRs for the VIPERS sample were obtained via SED fitting with the code LePhare (Arnouts et al. 2002; Ilbert et al. 2006). The whole multi-wavelength information available in the VIPERS fields (from UV to NIR) was used, applying the Bruzual & Charlot (2003) models and three extinction laws.
In addition, absolute magnitudes were computed using the nearest observed-frame band in order to minimise the dependence on models. The detailed description of the VIPERS data SED-fitting scheme that we adopted in the present analysis can be found in Moutard et al. (2016b).

In this work, we make use of the subset of galaxies with highly secure redshift measurements (with a confidence level higher than 99%, i.e. with redshift flag 3–4 and 13–14, see Garilli et al. 2014, for details). This subset contains 52 114 objects (51 522 galaxies and 592 broad-line AGNs³). They are observed in the redshift range 0.4 < z < 1.3⁴ with a mean (median) redshift of 0.7.

² See http://vipers.inaf.it
³ Broad-line AGNs were classified by VIPERS team members according to visual inspection of spectra. In the following analysis, we refer to broad-line AGNs as sources attributed with a redshift flag 13–14.
⁴ The 1 and 99 percentile range of redshift is given. The broad-line AGNs are observed up to the redshift z ∼ 4.5.

2.2. The multidimensional feature data

Data preparation is a key issue in working with learning algorithms, both supervised and unsupervised. In order to minimise any biases, maximise homogeneity in the input data, and use all of the available information, 12 rest-frame magnitudes are chosen: FUV, NUV, u, g, r, i, z, B, V, J, H, and Ks derived from the SED fitting (see Sect. 2, and Moutard et al. 2016b), as well as the spectroscopic redshift (Scodeggio et al. 2018). To avoid grouping galaxies based on differences in their luminosities instead of differences in their SEDs, the data were standardised. We normalised the i-band to unity and transformed each absolute magnitude by the normalisation factor (the redshifts were not transformed as their values are already around unity). This allowed us to code the data into a common numerical range, preventing the algorithm from splitting our sample along any direction with extended amplitudes. The normalised parameters, together with the spectroscopic redshift, are then used to create a multi-parameter space for the FEM algorithm.

The spectroscopic redshift is included in the parameter space to make the classification sensitive to possible evolutionary changes with cosmic time. The algorithm could identify an evolving population in different cosmic epochs as belonging to physically different classes. Although this is not the case for the VIPERS galaxies, where all FEM classes seem to be preserved throughout the redshift range probed by the survey (see Sect. 4), we did not want to exclude this option a priori. However, we verified that if the spectroscopic redshift is not included in the parameter space, the FEM classification remains practically the same. The global picture of the classification does not suffer significantly if we reduce the feature space by one parameter (e.g. spectroscopic redshift). However, excluding each single feature has an impact on the ability of the algorithm to distinguish individual classes.
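The construction of the 13-dimensional feature space described in this subsection can be illustrated with a short sketch. The text does not spell out the exact operation behind "normalised the i-band to unity", so the code below assumes one possible reading: each galaxy's absolute magnitudes are divided by its own i-band absolute magnitude, and the spectroscopic redshift is appended unchanged. The function name `build_feature_matrix`, the band ordering in `BANDS`, and the sample values are hypothetical and are not taken from the VIPERS pipeline.

```python
import numpy as np

# Band order is an assumption made for this illustration only.
BANDS = ["FUV", "NUV", "u", "g", "r", "i", "z", "B", "V", "J", "H", "Ks"]


def build_feature_matrix(abs_mags, z_spec, ref_band="i"):
    """Return an (n_galaxies, 13) array: 12 scaled magnitudes + redshift.

    Assumed reading of the text: every absolute magnitude of a galaxy is
    divided by that galaxy's reference-band (i) absolute magnitude, so the
    i-band column becomes exactly 1. The redshift is left untransformed.
    """
    abs_mags = np.asarray(abs_mags, dtype=float)
    z_spec = np.asarray(z_spec, dtype=float)
    if abs_mags.shape[1] != len(BANDS):
        raise ValueError("expected one column per band")
    ref = abs_mags[:, BANDS.index(ref_band)]
    scaled = abs_mags / ref[:, None]          # i-band column -> 1.0
    return np.column_stack([scaled, z_spec])  # redshift appended as 13th feature


# Two made-up galaxies (absolute magnitudes in the BANDS order above).
example_mags = [
    [-17.0, -17.5, -18.5, -19.5, -20.3, -20.8, -21.0, -19.3, -20.0, -21.4, -21.6, -21.5],
    [-19.0, -19.2, -19.6, -20.0, -20.3, -20.5, -20.6, -19.9, -20.2, -20.8, -20.9, -20.8],
]
features = build_feature_matrix(example_mags, z_spec=[0.65, 0.82])
```

With this reading, all magnitude columns take values close to unity, comparable in scale to the redshift column, which is the stated purpose of the normalisation. A different convention (for example an additive offset in magnitudes) would change the numbers but not the structure of the feature matrix.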
Feature importance may be statistically determined by the analysis of the orientation of the discriminative subspace. The x-axis of a hyperplane separating classes in a latent subspace is constructed with an 11-degree polynomial and each coefficient describes how important each feature is for the distinction of each group. For example, high coefficients of the hyperplane between red passive classes for FUV and NUV reveal their importance in distinguishing those groups. Therefore, excluding FUV and NUV will result in discriminating ten classes with only one large red passive class leaving the remaining classes unchanged. The redundancy of selected features and their importance to distinguish each group will be further discussed by Krakowski et al. (in prep.).

We note that the redundancy of the spectroscopic redshift reveals a great potential for future photometric missions such as Euclid and LSST. In Siudek et al. (2018), we explore the potential use of photometric information solely to classify galaxies and estimate their properties. Reliable photometric redshifts and 12 rest-frame magnitudes obtained by the SED-fitting with the photo-z scatter σ ∼ 0.03, and the outlier rate µ ∼ 2% obtained for the VIPERS sample, were used to verify how precisely the detailed classification could be reproduced if only photometric data were available. The confirmed accuracy in recreating galaxy classes: 92%, 84%, 96% for red, green, and blue classes, respectively, together with the ability to efficiently separate outliers (stars and broad-line AGNs) based only on photometric data, demonstrates the potential of our approach in future large cosmology missions to distinguish different galaxy classes at z > 0.5.

3. Method – Fisher EM

Unsupervised learning algorithms are used to divide the data of a priori unknown properties into clusters. In this paper, we use the FEM (Bouveyron & Brunet 2012) algorithm, which is an extension of the EM algorithm.
The main goal of both the EM and the FEM classifiers is to find the best fit of the chosen statistical model to the data by optimising the parameters of this model. In the case of the FEM algorithm, the main assumption is that the data can be grouped into a common discriminative latent subspace which is modelled by the discriminant latent mixture (hereafter DLM) model (Bouveyron & Brunet 2012). This discriminative latent subspace is defined by linear combinations of the input data (latent variables; Bouveyron & Brunet-Saumard 2014). It is then optimised to maximise the separation between groups and minimise their variance at the same time. The second assumption of the FEM algorithm is that our data can be separated into an a priori unknown number of groups, each described by a Gaussian profile in the multidimensional parameter space. The role of the FEM algorithm is to find the best fit of these multi-Gaussian profiles to the data, optimising both the number of the groups and their location in the parameter space.

3.1. The performance of the FEM algorithm

Unsupervised learning algorithms start by assigning initial cluster (class) centres, that is, galaxies representative of a given class. To select the optimal centre points, they are iteratively changed by assigning either (1) random values, or (2) pre-defined values obtained from another simpler and faster clustering algorithm. This is an essential step as classification algorithms yield different classes with each random initialisation, while we want to obtain final classification results that are as stable as possible. The randomised initialisation is fraught with the risk of converging to a local, rather than the global, optimum of the likelihood, which results in the erroneous assignment of objects to groups. In order to avoid such a situation, a random procedure for assigning initial values of function parameters can be repeated several times, and then the model with the highest log-likelihood is selected.
However, to achieve optimal cluster centres, the number of random values needs to be equal to the number of galaxies.

The second approach described above is the one applied in our analysis; in particular, for the choice of the initial values, the k-means++ algorithm is used (Arthur & Vassilvitskii 2007) to obtain the optimal cluster centres. This algorithm starts from a random choice of cluster centres among the data points. It then estimates the distances of all data points from these centres, and based on a weighted probability proportional to these squared distances, it selects new centres. This procedure is repeated until the choice of centres does not change with the next realisation, i.e. the optimal centres are found. Each initialisation gives a different classification, and each run groups similar galaxies into clusters, and so, in principle, all of them provide valuable classifications. The problem is then to select which classification is the best, i.e. which one should be chosen as the final classification. To overcome this issue, we run the k-means algorithm 15 times to find the optimal initial parameters. Moreover, this ensures that we obtain a representative classification, as we are able to recreate the divisions.

As in Sánchez Almeida et al. (2010), the k-means algorithm could be used for classification purposes itself. However, it is not as sophisticated as FEM, as it demands a pre-defined number of clusters (classes) and it also suffers from the initialisation problem. Therefore, we used k-means as the first step to optimise the starting points for a more advanced tool.

Once the starting points of the algorithm have been selected, the FEM algorithm is executed assuming that: (1) the input parameters, magnitudes and redshift values, can be projected onto a latent discriminative subspace with a dimension lower than the dimension (K) of the observed data, and (2) this subspace (K−1) is sufficient to discriminate K classes. The algorithm then performs the E (expectation), F (Fisher criterion), and M (maximisation) steps described below that are repeated in each cycle.

In step E, the algorithm calculates the complete log-likelihood, conditionally to the current value of the Gaussian mixture model. In practice, this means the calculation of the probability of each considered object belonging to the groups predefined by the k-means++ algorithm.

In step F, the DLM model chooses the subspace f in which the distances between groups are maximised and their internal scatter is minimised:

$$ f = \frac{(\eta_1 - \eta_2)^2}{\sigma_1^2 + \sigma_2^2}, \qquad (1) $$

where $\eta_1$ and $\eta_2$ are the mean values of the centres of the analysed groups, and $\sigma_1^2$ and $\sigma_2^2$ are their variances (Fukunaga 1990). The mean and variance are measured for each group in the observation space. The algorithm searches for a linear transformation U, which projects the observation into a discriminative and low-dimensional subspace d, such that the linear transformation U of dimension p × d (where p is the dimension of the original space) aims to maximise a criterion that is large when the between-class covariance matrix (S_B) is large and when the within-covariance matrix (S_W) is small. Since the rank of S_B is at most equal to K−1, where K is the number of classes, the dimension d of the discriminative subspace is therefore at most equal to K−1 as well. For details, we refer to Sects. 2.4 and 3.1 in Bouveyron & Brunet (2012).

Subsequently, in step M, the parameters of the multivariate Gaussian functions are optimised, by maximising the conditional expectations of the complete log-likelihood, based on the values obtained in the previous steps (E+F). The algorithm then comes back to step E, now computing the probabilities for each object to belong to groups modified in the last step M.
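The idea behind step F can be sketched with the classical Fisher discriminant construction. The code below is a simplified illustration rather than the FisherEM implementation: it assumes hard class labels (the FEM algorithm works with membership probabilities and additional constraints on U), builds the within-class and between-class covariance matrices, and takes the leading K−1 eigenvectors of S_W⁻¹ S_B as the projection. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def scatter_matrices(X, labels):
    """Within-class (S_W) and between-class (S_B) covariance matrices."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n, p = X.shape
    overall_mean = X.mean(axis=0)
    S_W = np.zeros((p, p))
    S_B = np.zeros((p, p))
    for k in np.unique(labels):
        X_k = X[labels == k]
        mean_k = X_k.mean(axis=0)
        centred = X_k - mean_k
        S_W += centred.T @ centred / n                  # scatter around class mean
        shift = (mean_k - overall_mean)[:, None]
        S_B += (len(X_k) / n) * (shift @ shift.T)       # scatter of class means
    return S_W, S_B


def fisher_projection(X, labels):
    """Return U (p x d, with d = K - 1) and the sorted eigenvalues of S_W^-1 S_B."""
    S_W, S_B = scatter_matrices(X, labels)
    d = len(np.unique(labels)) - 1
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_W, S_B))
    order = np.argsort(eigvals.real)[::-1]              # largest separation first
    return eigvecs.real[:, order[:d]], eigvals.real[order]


def two_class_fisher(a, b):
    """Eq. (1) for two one-dimensional samples."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return (a.mean() - b.mean()) ** 2 / (a.var() + b.var())


# Synthetic demo: three Gaussian classes in five dimensions.
rng = np.random.default_rng(0)
means = np.array([[0, 0, 0, 0, 0], [6, 0, 0, 0, 0], [0, 6, 0, 0, 0]], dtype=float)
X_demo = np.vstack([rng.normal(m, 1.0, size=(200, 5)) for m in means])
labels_demo = np.repeat([0, 1, 2], 200)
U_demo, eigvals_demo = fisher_projection(X_demo, labels_demo)
```

In this demo only K−1 = 2 eigenvalues are significantly different from zero, which reflects the statement above that the rank of S_B, and hence the dimension of the discriminative subspace, is at most K−1.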
This procedure is repeated until the algorithm converges according to the stopping criterion which is based on the difference between the likelihoods calculated in the last two steps.

3.2. DLM models for the FEM algorithm

To perform the FEM analysis, it is necessary to choose a model and the number of groups. There exist different DLM models that have been created for different applications. Specific models differ in the numbers of components and their parameters. The variety of these models then allows them to fit into various situations. The 12 different DLM models are considered: DkBk, DkB, DBk, DB, AkjBk, AkjB, AkBk, AkB, AjBk, AjB, ABk and AB. The main difference between them is in the number of free parameters left to be estimated (Bouveyron & Brunet 2012). In the primary model, DkBk, two components can be distinguished: Dk and Bk, where Dk is responsible for modelling the variance of the actual data (by parametrizing the variance of each class within the latent subspace), and Bk which models the variance of the noise (i.e. it parametrizes the variance of the class outside the latent subspace). The other models are in fact submodels of DkBk in which certain parameters of the Dk and Bk components are assumed to be common between and/or within classes. For example, the DBk model assumes that the variance in a latent subspace is common to all classes, whereas the DkB model assumes that the variance outside the latent subspace is common across classes. The combination of these two constraints (common variance inside and outside the latent subspace to all classes) results in the DB model. Therefore, these submodels are characterised by a lower number of parameters: if our thirteen-dimensional dataset is divided into 12 groups, the “main” DkBk model would be characterised by 1024 free parameters, while the DkB model would be characterised by 1013 parameters, the DBk model by 298 parameters, the DB model by 287 parameters, down to the simplest AB model with 222 free parameters. The number of free parameters needed is dictated by the complexity of the input data and the mathematical equations given in Bouveyron & Brunet (2012). A highly parametrised model requiring the estimation of a large number of free parameters is preferred for clustering of high-dimensional data. We refer the reader to Bouveyron & Brunet (2012) for a detailed description of the DLM family. Comparing the performance and convergence of different models, we find that the VIPERS data are best parametrised by the DBk model with 298 free parameters.

Fig. 1. NUVrK diagrams of FEM classes 9–12. The optimal number of classes was found to be 12. The error bars correspond to the first and the third quartile of the galaxy colour distribution, while the two half axes of the ellipses correspond to the median absolute deviation.

3.3. The selection of the optimal model and number of classes

The number of classes is not known a priori, which is one of the major difficulties in applying unsupervised clustering algorithms to classify astronomical sources. Defining the optimal number and model is not trivial. We do not make any a priori assumptions about galaxy separation, that is, if the data could not be described by the DLM models, for example because of the non-Gaussian nature of the datasets, the FEM algorithm simply would not converge.
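The free-parameter counts quoted in Sect. 3.2 can be reproduced with a few lines of arithmetic, which also makes clear why the submodels are so much cheaper than DkBk. The decomposition used below (mixture proportions, class means in the latent subspace, orientation of the subspace, variances inside and outside the subspace) is our reading of the parametrisation in Bouveyron & Brunet (2012) and should be treated as an assumption; it is included here because it matches the five numbers given in the text for K = 12 classes, p = 13 features and d = K − 1 = 11. The function name is hypothetical.

```python
def dlm_free_parameters(model, K, p, d=None):
    """Count free parameters of selected DLM models (assumed decomposition).

    K: number of classes, p: number of features,
    d: dimension of the latent subspace (defaults to K - 1).
    """
    if d is None:
        d = K - 1
    proportions = K - 1                       # mixture weights sum to one
    latent_means = K * d                      # one d-dimensional mean per class
    orientation = d * p - d * (d + 1) // 2    # p x d projection with orthonormal columns
    latent_cov = {
        "Dk": K * d * (d + 1) // 2,           # full covariance per class
        "D": d * (d + 1) // 2,                # one full covariance shared by all classes
        "A": 1,                               # single shared variance
    }
    noise = {"Bk": K, "B": 1}                 # variance outside the subspace
    inside, outside = {
        "DkBk": ("Dk", "Bk"),
        "DkB": ("Dk", "B"),
        "DBk": ("D", "Bk"),
        "DB": ("D", "B"),
        "AB": ("A", "B"),
    }[model]
    return proportions + latent_means + orientation + latent_cov[inside] + noise[outside]


counts = {m: dlm_free_parameters(m, K=12, p=13) for m in ["DkBk", "DkB", "DBk", "DB", "AB"]}
```

Most of the difference between DkBk (1024) and DBk (298) comes from replacing twelve class-specific latent covariance matrices by a single shared one. Counts of this kind enter the penalty terms of information criteria such as AIC and BIC.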
In our work, the best DLM model and the range of possible class numbers are chosen based on three statistical model-based criteria: the Akaike Information Criterion (AIC; Akaike 1974), the Bayesian Information Criterion (BIC; Schwarz 1978) and the Integrated Complete Likelihood (ICL; Baudry 2012, see Appendix A). These are typical criteria used to evaluate statistical models (e.g. de Souza et al. 2017), which allow us to select the best model (DBk) and the approximate number of classes (9–12; see Appendix A). However, in order to pinpoint the diversity of physical properties among VIPERS galaxies, the final optimal number of classes is based on the flow of the galaxy distribution among a different number of classes (see Fig. A.2) and their physical properties (see Fig. 1).

The analysis of the positions and properties of different classes on the NUVrK diagrams allows us to verify if the classes do indeed reveal distinct physical properties. Figure 1 shows the NUVrK diagrams for the classifiers consisting of 9, 10, 11 and 12 groups. As we can see in the figure, the division into groups for a different number of classes differs, especially in the region of dusty galaxies indicated by the shaded box. We can see the emergence of three new classes (classes 5, 6, 8) in the twelve-group division that were not distinguished by a lower number of clusters. The physical analysis of these classes (see Sect. 4) demonstrates that the classifier's grasp of subtle differences between groups reveals these classes of dusty star-forming galaxies. Therefore, we find that division into 12 classes is physically motivated and this is also confirmed by analysis of the flow chart, as all 12 classes are naturally separated from bigger groups, including separating broad-line AGNs from class 9 in the tenth iteration (see Fig. A.2). We also found that with 13 classes, we obtain a worse classification, as the 13th class emerges from class 11 but does not represent different physical properties with respect to the 11th class (see Appendix A).

To summarise, using three statistical criteria (AIC, BIC, and ICL), we originally restricted the optimal number of classes to be between 9 and 12. After that, we checked the flow of galaxy distributions for realisations with different numbers of classes and their physical properties. We concluded that the optimal solution for classification of the VIPERS dataset is a DBk model with 12 classes (see Appendix A).

4. Results

In this section, the FEM classification of z ∼ 0.4–1.3 galaxies is presented. We demonstrate that the 12 classes correspond to physically different and separate galaxy categories. In the following analysis, different properties of our classes are investigated to show that our classes actually mirror the sequence of galaxy types from the earliest (class 1) to the latest types (class 11) in the redshift range 0.4 < z < 1.3. Classes 1–11 all have very similar redshift distributions (see Table 1), centred at z ∼ 0.7, suggesting that these classes are persistent at least over the redshift range 0.4 < z < 1.3. A different median redshift is measured within the 12th class. This class cannot be placed along the same sequence as the other classes. Class 12 mainly groups high-redshift VIPERS sources (with median redshift zmed ∼ 2; see Table 1). Members of this group are mostly identified as broad-line AGNs according to their redshift flag (see Sect. 2; ∼95%, and Table 1). Therefore, class 12 is not part of the galaxy population at z ∼ 0.7 that is the focus of this paper, and from now onwards only the first 11 galaxy classes will be discussed. The global properties of class 12 are presented in Table 1 and the composite spectrum is shown in Fig. D.2, but it is not included in the remaining plots. The SED fitting procedure used for VIPERS sources does not include AGN templates.
Therefore, the AGN host properties (stellar mass and SFRs, r−K colour, as K significantly depends on models) might be wrong. The classification was performed on the whole sample (i.e. including broad-line AGNs, even if they are not the focus of this paper) to demonstrate the global usefulness of the FEM algorithm and its ability to separate broad-line AGNs and galaxies. Although the algorithm was able to separate a class of broad-line AGNs, only ∼50% of broad-line AGNs at z > 1.3 were assigned to this separate class, while the other half were spread among the star-forming classes 9–11. The fraction of broad-line AGNs in these classes is however negligible (<5% galaxies in a given class). This approach allows us to reproduce common classification schemes, which do not explicitly exclude any groups of sources. It should be noted that although class 12 can be expected to be separated based on the use of spectroscopic redshift as an input parameter, even when the redshift is not included in the parameter space (i.e. classification is based only on rest-frame colours) class 12 is reproduced with an accuracy of the order of ∼80%.

As mentioned in Sect. 1, standard selection methods are powerful tools, but are however sensitive only to a few specific properties. We explored how such a refined classification compares with more standard two- or three-class division of galaxy population. The FEM classification separates VIPERS galaxies into eleven classes, which may be assigned to three wider galaxy categories: (1) red, passive, (2) green, intermediate, and (3) blue, star-forming. Since our classification was based on colours, the conventional nomenclature of red (classes 1–3), green (classes 4–6), and blue (classes 7–11) galaxies is applied (see Figs. 2a–c). As the subsequent analysis demonstrates (Sects. 4.1 and 4.2), the division between red (passive), green (intermediate), and blue (star-forming) galaxies is not sharp, as the intermediate groups (classes 3 and 7) are not purely passive or star-forming in terms of their global properties.

Moreover, we note that a FEM classification into two or three main groups is not entirely unequivocal. We compared our final eleven-class classification with a two-class FEM separation. The simple separation into two main clouds (red and blue) is able to distinguish a separate group of blue star-forming galaxies: 97% of galaxies from classes 7–11 are assigned to the blue cloud and only 3% of green galaxies (classes 4–6) were found in the blue cloud. At the same time, red and green galaxies are indistinguishable in the red cloud: 100% of red galaxies (classes 1–3) were assigned to the red cloud, as were 97% of green galaxies (classes 4–6).

In the subsequent step, the standard three-class (red/green/blue) division is compared with the FEM 11-class classification. As in the case of the two-class division, we are also not able to separate a red passive population from green galaxies. Almost all red galaxies assigned to classes 1–3 (99%) were found in a red group. However, this group is strongly contaminated by green galaxies: 43% of intermediate galaxies (classes 4–6) were found in the red cloud. The distinction between green and blue galaxies is also not obvious. Only 67% of blue star-forming galaxies (classes 7–11) were assigned to the blue cloud, while the remaining 33% were found within the green population.

For the two-class separation, red and green galaxies go together to form one group only, while for the three-class division, the green population is split between red and blue galaxies. This implies that the borderlines between green/blue and red/green populations are much less sharp than that for the eleven-class division. Only a more detailed classification can appropriately yield the division between red, green, and blue populations.
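The percentages in the comparison above are row-normalised cross-tabulations: for each colour category of the eleven-class solution, they give the fraction of its members that a coarser FEM run places in each cloud. A minimal sketch of that bookkeeping follows. The function name and the toy labels are hypothetical and are not VIPERS data; only the grouping of classes 1–3, 4–6 and 7–11 into red, green and blue is taken from the text.

```python
from collections import Counter

# Colour category of each galaxy class, following the grouping used in the text.
CATEGORY = {c: "red" for c in (1, 2, 3)}
CATEGORY.update({c: "green" for c in (4, 5, 6)})
CATEGORY.update({c: "blue" for c in (7, 8, 9, 10, 11)})


def cloud_fractions(fine_classes, coarse_clouds):
    """Return {category: {cloud: fraction of that category found in the cloud}}."""
    pairs = Counter((CATEGORY[f], c) for f, c in zip(fine_classes, coarse_clouds))
    totals = Counter(CATEGORY[f] for f in fine_classes)
    fractions = {cat: {} for cat in totals}
    for (cat, cloud), n in pairs.items():
        fractions[cat][cloud] = n / totals[cat]
    return fractions


# Hypothetical toy labels for ten objects (illustration only, not survey data).
fine = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
coarse = ["red", "red", "red", "red", "red", "blue", "blue", "blue", "blue", "blue"]
result = cloud_fractions(fine, coarse)
print(result)
```

In this toy case two of the three green objects fall in the red cloud, which mimics, at a much smaller scale, the way green galaxies are absorbed by the red cloud in the two-class separation.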
The FEM classification yielded distinct clusters in the thirteen-dimensional space, although the separation between classes is smooth. Some galaxies are close to the borders of different classes, and this is reflected in their lower posterior probabilities of being members of the class to which they are assigned. The posterior probability is correlated with the distance of the sources from the centre of the group in multidimensional space. There is no correlation of probabilities with the properties of the input data, that is, no dependence of the probability on the redshift measurement accuracy or luminosity was found. We assume that the classification, which assigns a probability of being a member of the class instead of a single class membership, should be a better approximation of the galaxy evolution, as a continuous transition between different groups (even if they are well separated in the feature space) is expected. Therefore, each group contains its core representative population and a (usually small) number of galaxies that are more loosely mapped to them. A detailed description of the class membership probabilities is given in Appendix B. In the following analysis, we focus on the representative galaxies, for which the class membership is not questionable. Our initial sample of 52 114 galaxies was therefore cleaned by excluding objects located in between adjacent classes, and outliers, based on their probability. In particular, 2947 galaxies with low probabilities (<50%) of being class members, and 1038 objects with high (>45%) probabilities of belonging to a second group were removed. However, it is worth noting that this leads to the rejection of only 8% of the sample, therefore demonstrating the robustness of the clustering process performed with the FEM algorithm. The resultant final catalogue consists of 47 556 galaxies (and 573 broad-line AGNs). The number of sources in each class, as well as the basic properties of the FEM classes, are summarised in Table 1.

Table 1. Main physical properties of the FEM classes.

| Class | N | frac [%] | z | n | NUV−r | r−K | U−V | D4000n | EW(OII) | log(Mstar/M☉) | log(sSFR) [yr^−1] | NAGNs | fracAGNs [%] |
| (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) | (9) | (10) | (11) | (12) | (13) | (14) |
| Elliptical galaxies | | | | | | | | | | | | | |
| 1 | 4476 | 9.11 | 0.67 +0.11/−0.11 | 3.33 +0.96/−1.20 | 5.43 +0.37/−0.24 | 0.95 +0.10/−0.10 | 1.99 +0.09/−0.08 | 1.76 +0.11/−0.11 | – | 10.77 +0.24/−0.23 | −16.88 +2.16/−3.55 | 0 | 0.00 |
| 2 | 2399 | 4.88 | 0.67 +0.10/−0.11 | 3.32 +1.03/−1.30 | 5.04 +0.19/−0.18 | 0.99 +0.08/−0.09 | 1.98 +0.09/−0.08 | 1.75 +0.12/−0.10 | – | 10.83 +0.21/−0.22 | −11.85 +0.14/−0.29 | 0 | 0.00 |
| 3 | 3558 | 7.24 | 0.68 +0.10/−0.10 | 3.04 +0.99/−1.42 | 4.46 +0.22/−0.21 | 0.98 +0.12/−0.11 | 1.91 +0.10/−0.09 | 1.68 +0.14/−0.12 | – | 10.83 +0.23/−0.23 | −11.28 +0.16/−0.17 | 0 | 0.00 |
| Intermediate galaxies | | | | | | | | | | | | | |
| 4 | 4274 | 8.70 | 0.70 +0.11/−0.11 | 1.78 +0.68/−1.11 | 3.47 +0.24/−0.25 | 1.03 +0.12/−0.13 | 1.64 +0.14/−0.12 | 1.41 +0.11/−0.14 | – | 10.69 +0.25/−0.23 | −9.79 +0.60/−0.47 | 4 | 0.09 |
| 5 | 3375 | 6.87 | 0.63 +0.08/−0.08 | 2.07 +0.77/−1.17 | 3.93 +0.51/−0.41 | 1.17 +0.12/−0.13 | 1.82 +0.21/−0.15 | 1.50 +0.15/−0.16 | – | 10.50 +0.27/−0.23 | −9.57 +0.17/−0.33 | 0 | 0.00 |
| 6 | 964 | 1.96 | 0.67 +0.09/−0.09 | 1.35 +0.51/−0.81 | 3.29 +0.29/−0.38 | 1.41 +0.16/−0.14 | 1.58 +0.14/−0.20 | 1.36 +0.10/−0.11 | −16 +7/−5 | 10.50 +0.21/−0.22 | −9.21 +0.35/−0.42 | 2 | 0.21 |
| Star-forming galaxies | | | | | | | | | | | | | |
| 7 | 5099 | 10.38 | 0.67 +0.11/−0.12 | 1.15 +0.40/−0.75 | 2.71 +0.19/−0.19 | 0.88 +0.12/−0.12 | 1.35 +0.14/−0.12 | 1.28 +0.07/−0.09 | −16 +8/−5 | 10.36 +0.30/−0.31 | −9.29 +0.45/−0.47 | 17 | 0.33 |
| 8 | 1755 | 3.57 | 0.72 +0.09/−0.10 | 0.91 +0.33/−0.70 | 2.38 +0.16/−0.15 | 1.00 +0.12/−0.14 | 1.15 +0.09/−0.08 | 1.21 +0.05/−0.06 | −21 +9/−7 | 10.12 +0.23/−0.19 | −8.76 +0.26/−0.30 | 14 | 0.80 |
| 9 | 5378 | 10.95 | 0.67 +0.11/−0.13 | 0.92 +0.30/−0.64 | 2.13 +0.16/−0.15 | 0.63 +0.12/−0.13 | 1.07 +0.13/−0.12 | 1.21 +0.06/−0.07 | −24 +8/−7 | 9.91 +0.25/−0.28 | −8.95 +0.43/−0.39 | 31 | 0.58 |
| 10 | 13978 | 28.45 | 0.66 +0.10/−0.12 | 0.94 +0.31/−0.63 | 1.60 +0.21/−0.19 | 0.36 +0.16/−0.18 | 0.86 +0.12/−0.11 | 1.16 +0.06/−0.06 | −36 +10/−8 | 9.56 +0.22/−0.23 | −8.84 +0.21/−0.28 | 123 | 0.88 |
| 11 | 2699 | 5.49 | 0.71 +0.12/−0.18 | 1.11 +0.45/−0.83 | 1.01 +0.21/−0.20 | 0.03 +0.20/−0.22 | 0.59 +0.14/−0.12 | 1.07 +0.07/−0.07 | −54 +14/−12 | 9.28 +0.21/−0.25 | −8.61 +0.33/−0.32 | 216 | 8.00 |
| Broad-line AGNs | | | | | | | | | | | | | |
| 12 | 174 | 0.35 | 2.24 +0.25/−0.56 | 2.80 +1.17/−0.89 | −0.08 +0.31/−0.39 | (a) | 0.49 +0.49/−0.31 | 1.00 +0.14/−0.16 | – | (a) | (a) | 166 | 95.40 |

Notes. The number of members (N) and fraction of whole sample (frac [%]) in each class correspond to the number and fraction in the final sample, i.e. 48 129 galaxies with high 1st-best (>50%) and low 2nd-best (<45%) class membership probabilities. For each class, the median values of: redshift (4), Sérsic index from Krywult et al. (2017) (5), rest-frame colours (6–8), spectral features (9–10), and physical properties derived from SED fitting ((11–12); Moutard et al. 2016b) are provided. Errors correspond to the differences between median and 1st, and 3rd quartile, respectively. The number and fraction of broad-line AGNs (as classified by VIPERS team members) in each class are given in Cols. 13, and 14, respectively. EW([OII]λ3727) was not detected for the majority of galaxies (96, 91, 85, 59, 72, 98%) within classes 1–5, and 12, respectively. (a) Stellar mass, SFR, sSFR, r−K colour derived from SED-fitting are expected to be wrong, as they are estimated through the fitting of galaxy models (BC03), not suited for broad-line AGNs. Colours are given in AB system.

Fig. 2. Colour–colour(–colour) diagrams of the VIPERS galaxies classified into 11 classes with the FEM algorithm. The error bars correspond to the first and the third quartile of the galaxy colour distribution, while the two half axes of the ellipses correspond to the median absolute deviation. Panel a: UVJ diagram. The solid line corresponds to the standard separation between quiescent and star-forming galaxies. The area occupied by CANDELS transition galaxies is shown as a grey shaded area (Pandya et al. 2017). Panel b: NUVrK diagram. The black solid line corresponds to the separation of red passive galaxies, and the dotted line additionally separates galaxies located in the green valley. Galaxies photometrically classified by their SED type as: (1) early-type (red E and Sa; ETGs), (2) early spiral (ESGs), (3) late spiral (LSGs), and (4) irregular or starburst (SBGs) following the prescription given in Fritz et al. (2014) are marked with light salmon, gold, violet, and blue, respectively. Panel c: NUVrK diagram. The black dashed lines correspond to the division of CFHTLS galaxies into seven groups proposed by Moutard et al. (2016a). The grey solid line corresponds to the separation of red passive galaxies. The grey dash-dotted lines correspond to upper (lower) limits of the green valley galaxies proposed by Moutard et al. (2016b). Panel d: 3D diagram. The dotted lines indicate the projection of FEM classes on the bottom plane (z−K vs. NUV−r).

4.1. Multidimensional galaxy separation versus standard methods

The FEM classification allows for a more sophisticated galaxy separation than the standard two-dimensional (2D) colour–colour diagrams. The typical classification schemes are mostly based on tight and linear cuts in the 2D space, while an unsupervised approach associates each object to the group based on its location in the multidimensional space describing galaxy properties. Colour–colour diagrams, including NUVrK (Arnouts et al. 2013) and U−V versus V−J (UVJ; Williams et al. 2009), are coarser classifications than the ones obtained with the multidimensional approach, even if the trends are continuous. At the same time, the FEM classes correspond well to the classification schemes based on all these colour–colour diagrams. FEM classes are able to reproduce the standard colour–colour separation into passive and star-forming galaxies. Unsupervised classification further introduces the division into subclasses, which monotonically change their physical, spectroscopic, and morphological properties from class to class.
This reveals the differences within passive, intermediate, and star-forming galaxy populations. The FEM classification creates a multidimensional separation cut. The advantage of this approach is that it is sensitive to a larger number of galaxy properties with respect to standard classification techniques. For example, as shown in a subsequent analysis, the three red passive classes are indistinguishable in the r−K, U−V, and V−J colours, but have different FUV and NUV properties. Figure 2 presents colour–colour diagrams: FUV−NUV versus z−K versus NUV−r, NUVrK, and UVJ, where the median colours for the 11 FEM classes are shown. The error bars correspond to the first and third quartiles of the galaxy colour distribution, while the semi-major and minor axes of the ellipses correspond to the normalised median absolute deviation (NMAD) defined by Hoaglin et al. (1983), as NMAD = 1.4826 · median(|P − median(P)|), where P corresponds to the measured colour reported on each axis. Classes are labelled according to their NUVrK colour, from the reddest, class 1, to the bluest, class 11. We note that green galaxies (classes 4–6) are labelled to follow their r−K colour change rather than NUV−r, which is more sensitive to dust obscuration. The FEM classes may overlap with each other on 2D diagrams, and the clear separation may only be revealed when an additional parameter is added. This is especially relevant for red passive galaxies (classes 1–3), which are not distinguishable in the UVJ diagram (see Fig. 2a), and are only partially separated in the NUVrK diagram (classes 1–2 overlap, see Figs. 2b and c). Only when an additional parameter is added to the diagram (see Fig. 2d) is the clear separation between three classes of red passive galaxies achieved, and the inhomogeneity of red galaxies becomes visible.

4.1.1. The UVJ diagram

As proposed by Williams et al. (2009) and confirmed by many others (e.g. Whitaker et al. 2011; Patel et al. 2012; van Dokkum et al. 2015), passive and star-forming galaxies occupy two distinct regions on the UVJ diagram. Figure 2a shows the distribution of the 11 FEM classes on the UVJ diagram, with the standard division between passive and star-forming galaxies marked with a black solid line. Passive galaxies (classes 1–3) are redder in U−V and bluer in V−J relative to galaxies that are young and dusty, which are red in both U−V and V−J colours (class 6). Galaxies classified as green intermediate (classes 4–6) are not as red in U−V, which may indicate that they still have some active star formation. Galaxies within classes 4 and 5 reproduce remarkably well the CANDELS sample of 1745 massive (>10^10 M☉) transition galaxies observed at 0.5 < z < 1.0 on the UVJ diagram (see Fig. A1 in Pandya et al. 2017). Star-forming and transition CANDELS galaxies are not well separated on the UVJ plane; the region occupied by the FEM classes 6 and 7 is already strongly occupied by the star-forming sample; therefore, we do not connect them with the CANDELS transition population. Moreover, class 4 is placed in the region of dust-free CANDELS transition galaxies, whereas class 5 corresponds to the more dusty galaxies (see the distribution of the optical attenuation in Fig. A1 in Pandya et al. 2017). These galaxies tend to occupy a transition region populated by galaxies with a variety of morphologies (Moutard et al. 2016a). Therefore, we conclude that classes 4 and 5 consist of green intermediate galaxies, representing a mixed population in the transition phase between passive and star-forming categories.

Intermediate galaxies are located in the green valley, a wide region in the ultraviolet-optical colour magnitude diagram between the blue and red peaks, and usually they are hard to distinguish, as the classical selection criteria are not well defined (e.g. Salim 2014, and references therein). However, Schawinski et al. (2014) have already shown the existence of two different populations of green galaxies with respect to their gas content, separating intermediate galaxies into green spirals and green elliptical populations. The three intermediate FEM classes (4–6) confirm that the green valley population is not a homogeneous category of galaxies. Star-forming galaxies (classes 7–11) are well separated on the UVJ diagram, showing bluer U−V and V−J colours with increasing class number. The median U−V and V−J colours for 11 FEM classes are given in Fig. 4 and Table 1.

4.1.2. The NUVrK diagram

Figures 2b,c present the distribution of the 11 FEM classes in the NUVrK diagram (Arnouts et al. 2013). The NUVrK diagram is similar to the UVJ plane (see Fig. 2a and Sect. 4.1.1), but allows for a better separation between passive and active galaxies. The NUVrK diagram is also a better indicator of dust obscuration and current versus past star formation activity. Old, quiescent galaxies exhibit redder NUV−r colours, while galaxies with a younger stellar content are bluer. However, the NUV−r colour is highly sensitive to dust attenuation, meaning that dusty star-forming galaxies may also show reddened NUV−r colours (Arnouts et al. 2007; Martin et al. 2007). The vector for increasing dust reddening acts perpendicularly to the vector of decreasing specific SFR (defined as the SFR per stellar mass unit, hereafter sSFR), enabling the degeneracy to be broken. Therefore, the NUVrK diagram is extensively used to separate different galaxy types (e.g. Arnouts et al. 2013; Fritz et al. 2014; Moutard et al. 2016b; Davidzon et al. 2016).

Davidzon et al. (2016) proposed criteria for the selection of passive and intermediate objects in the NUVrK diagram (black solid and black dashed lines in Fig. 2b, respectively) based on the VIPERS PDR1 galaxy sample. Moutard et al. (2016b) defined a slightly different division between quiescent and star-forming galaxies (black solid line in Fig. 2c), as absolute magnitudes were derived through SED-fitting with other assumptions. In particular, the slope of the line separating active and passive galaxies in the NUVrK diagram found by Davidzon et al. (2016) is flatter than the one presented in Moutard et al. (2016b) (slopes are S = 1.37, S = 2.25, respectively). Both criteria show a similar behaviour with respect to the FEM classes. Classes 1–2 perfectly match the area occupied by red passive galaxies, while class 3 is close to the separation line between red passive and the green valley region as defined by Moutard et al. (2016b). As previously mentioned, class 3 is not purely passive and may represent the population of red galaxies that have just joined the passive evolutionary path.

There is a clear path in the NUVrK diagram along which the FEM classes are distributed. Figures 2b and c show that classes 1–3 are placed at the top of the diagram, while classes 7–11 occupy its bottom part with the intermediate area reserved for classes 4–6. The FEM classification also very closely follows the photometric selection based on the SED fitting by Fritz et al. (2014; see points in Fig. 2b, colour-coded according to SED type).
Almost all FEM red passive galaxies (classes 1–3; ∼98%) are defined as ETGs (red E/Sa) by the SED classification (ETGs are marked with salmon circles in Fig. 2b), and most star-forming galaxies (classes 10–11) are classified as irregular or starburst types (∼97%; SBGs marked with blue triangles in Fig. 2b). The intermediate (4–6) and star-forming (7–9) classes match reasonably well (∼70%) with the early- and late-type spiral galaxies classified based on their SEDs (ESGs and LSGs are marked with yellow stars and purple pentagons, respectively).

The FEM classes (Fig. 2c) also follow very well the classification of CFHTLS galaxies proposed by Moutard et al. (2016a). The region of dusty star-forming galaxies mainly corresponds to classes 5–6, whereas classes 7–11 are found in the star-forming area (Moutard et al. 2016a). Galaxies become bluer (both in NUV−r and r−K; except the r−K colour for intermediate galaxies) with increasing class number, that is, classes 7–11 contain the bluest galaxies. When the stellar populations become older or the amount of dust in galaxies increases, the r−K colour becomes redder. The green galaxies, members of classes 4–6, are characterised by redder r−K and NUV−r colours relative to the star-forming cloud (classes 7–11). Only edge-on galaxies may have the reddest r−K colours (Arnouts et al. 2013; Moutard et al. 2016a). Therefore, as FEM class 6 shows the reddest r−K colours, we conclude that its colours may be a consequence of dust within the disks or their high inclinations. The area of the NUVrK diagram occupied by classes 4 and 5 is placed in the region where Moutard et al. (2016a) located a morphologically inhomogeneous class of galaxies, which in our classification may be divided into more homogeneous classes. Moutard et al. (2016b) found these galaxies to be most likely transiting from the star-forming to the passive population. Class 4 has similar r−K colours to classes 1–3, showing that this class, as already mentioned, is close to passive galaxies. The top of the diagram is reserved for classes 1–3, which show the reddest NUV−r colours in the FEM classification.

Besides the clear differences between the three main classes (red/green/blue) on the NUVrK diagram, the difference is visible also within subclasses. The red subclasses show the progressive reddening in NUV−r colour starting from class 3, and ending in class 1, as shown in Figs. 2b and c. The separation of the three red passive classes is clearly visible in the FUV−NUV colour (see Fig. 2d). At the same time, there is no significant change in their r−K colour.
Red passive galaxies are populated by old stellar populations and have little dust, and therefore we do not expect to distinguish different red passive populations in r−K colour, which is sensitive to dust obscuration. At the same time, these subclasses show only small differences in the strengths of their D4000n (see Fig. 5, and Table 1), suggesting only small differences in their stellar ages. However, classes 1–3 show significant changes in sSFR (see Fig. 6, and Table 1), which may indicate that star formation contributes more to class 3 than to the first and second classes.

Figure 3 shows the NUVrK diagram in six redshift bins spanning the redshift range 0.4 < z < 1.0. The colour evolution of the galaxy populations with redshift is clearly visible. Madau & Dickinson (2014, and references therein) have already shown that galaxy properties such as SFR and colour change significantly within a galaxy population as a function of redshift. Figure 3 shows that properties of galaxy types indeed vary with cosmic time. Red passive galaxies (classes 1–3) form three different, well-separated clusters in the NUVrK diagram at z ∼ 0.4. When we move back with cosmic time, classes 1 and 2 tend to progressively merge up to z ∼ 1. At z ∼ 1, the separation between classes 1 and 2 is less evident. This could be a consequence of the colour–colour pre-selection sample bias, as at z ∼ 1 VIPERS observed only the most massive and the brightest galaxies, but may also imply that the population of red passive galaxies was more homogeneous at earlier epochs. Red passive galaxies achieve their final morphology at z ∼ 1, whereas at higher redshifts (1 < z < 2) the peak of their evolution is expected (e.g. Bundy et al. 2010). The homogeneity of classes 1 and 2 at z ∼ 1, at least in NUV−r and r−K colours, may therefore indicate that these groups of red galaxies were inseparable at that epoch with respect to some of their physical properties, when they still attain their final form (e.g. Cimatti et al. 2004; Glazebrook et al. 2004). The detailed analysis of the physical processes leading to the separation of three different red passive galaxy classes will be presented in a forthcoming paper.

Fig. 3. NUVrK diagrams of the 11 FEM classes in six different redshift bins spanning the redshift range 0.4 < z < 1.0. The error bars correspond to the first and the third quartiles of the galaxy colour distribution, while the two axes of the ellipses correspond to the median absolute deviations. The fraction of galaxies in each class is given in the legend.

4.2. Global properties of FEM classes

A visible separation of 11 classes in the 3D and 2D colour–colour diagrams may be expected, as the FEM classification is based on the normalised absolute magnitudes and, therefore, colours. In this section, we examine properties that were not included in the parameter space used for the automatic classification. Below we investigate morphological, spectral, mass, and star formation properties of the different FEM classes to examine whether or not there is a correspondence between our classification and these properties. The distributions of main properties along the 11 FEM classes are shown in Figs. 5 and 6, and summarised in Table 1. In particular, the following features were derived for VIPERS galaxies: Sérsic index (n; calculated for the VIPERS sample by Krywult et al. 2017), equivalent widths of [OII]λ3727, the strength of the 4000 Å break (D4000n, as defined by Balogh et al. 1999), and physical properties derived from SED fitting: stellar masses, and sSFR (calculated by Moutard et al. 2016b). The following analysis is based on the median values of these parameters derived for each class. The error bars correspond to the first and third quartiles of the galaxy property distribution.

To trace the change of spectral properties along the FEM classes, the strength of the 4000 Å break and equivalent width of [OII]λ3727 of individual galaxies in each FEM class is measured. Figure 5 shows the weakening of the median 4000 Å break, and the increasing of the median EW([OII]λ3727) with increasing class number. Galaxies within classes 1–3 have D4000n greater than 1.5 (dashed line in Fig. 5), and simultaneously display negligible emission in [OII]λ3727, while galaxies within classes 7–11 have strong emission in the [OII]λ3727 line, and a 4000 Å break lower than 1.5. The threshold for D4000n at 1.5, dividing actively star-forming and passive galaxy populations, has been found by Kauffmann et al. (2003) for the local Universe and extended to higher redshift by Vergani et al. (2008). This cut allows us to associate galaxies hosting old stellar populations with no sign of star formation activity to classes 1–3, and younger objects with stronger on-going star formation to classes 7–11. The more detailed description of the spectral properties of the 11 FEM classes is presented in Sect. 4.4.

The reflection of our classification on different galaxy properties indicates the robustness of our approach and the fact that the proposed classification may be able to trace the evolutionary stages from blue and active to red passive types.

Fig. 5. Spectral and morphological properties of the 11 FEM classes: D4000n, EW([OII]λ3727), and Sérsic index as a function of class number. The division between red passive and blue active based on D4000n according to Kauffmann et al. (2003) is marked with a black dashed line. The range of mean values of Sérsic index for VIPERS red passive and blue star-forming galaxies obtained by Krywult et al. (2017) are marked with black solid and dashed lines, respectively. The [OII]λ3727 line has not been detected in the majority of galaxies within classes 1–5 (for 96, 91, 85, 59, and 72%, respectively).

4.2.1. Morphological properties

One way to define the type of a galaxy is to analyse its structure. In the local Universe, passive galaxies are usually spheroidal, while star-forming galaxies are irregular or disk shaped (e.g. Bell et al. 2012). Krywult et al. (2017) showed that this is also the case for the whole mass distribution (8 ≲ log(Mstar/M☉) ≲ 12) and redshift range (0.4 ≲ z ≲ 1.3) of VIPERS galaxies. To describe the shapes of the light profiles of VIPERS galaxies, the Sérsic index is used (n, Sérsic 1963). The index has low values (n ∼ 1) for spiral galaxies whose disks have surface brightnesses with a shallow inner profile, and high values (n ∼ 3–4) for elliptical galaxies which have surface brightnesses with a steep inner profile (e.g. Simard et al. 2011; Bell et al. 2012; Krywult et al. 2017). Krywult et al. (2017) showed that VIPERS disk-shaped galaxies have Sérsic index mean values in the range n ∼ 0.81–1.11, whereas spheroid galaxies are characterised with average Sérsic indices in the range n ∼ 2.42–3.69. As shown in the lower right panel of Fig. 5, there is a very good correlation between the FEM galaxy class and Sérsic index. FEM red passive galaxies (classes 1–3) have a median Sérsic index n > 3, indicating a spheroidal shape, while classes 7–11 show a significantly lower median Sérsic index n ≲ 1, typical for disk galaxies. For intermediate classes, the median Sérsic index is n ∼ 1.7, confirming that classes 4–6 are mainly composed of intermediate galaxies also in terms of this structure. Krywult et al. (2017) demonstrated the strong correlation between morphology and galaxy colour, which is also reflected in our studies.

Fig. 6. SED-dependent properties of the 11 FEM classes: stellar mass (log(Mstar/M☉)), and log(sSFR) [yr^−1] as a function of class number. The transition mass found for VIPERS galaxies at z ∼ 0.7 by Davidzon et al. (2013) is shown with a dashed line.

4.2.2. Physical properties

The top panel of Fig. 6 shows the median stellar masses obtained for the 11 FEM groups. The stellar mass decreases with class number. Galaxies assigned to classes 7–11 are less massive (with median stellar mass ∼10^(9.7±0.3) M☉) than galaxies within classes 1–3 (median stellar mass ∼10^(10.8±0.2) M☉). The stellar mass change is much more rapid for star-forming classes (0.3 dex per class), whereas for red passive classes the median stellar mass is almost constant (0.05 dex). Our classification follows well the location of passive and active galaxy types with respect to the transition mass. The transition mass separates blue star-forming and red passive populations, since above the transition mass, red passive galaxies dominate, and below that mass, star-forming galaxies are the most numerous population (e.g. Kauffmann et al. 2003; Vergani et al. 2008; Pannella et al. 2009; Davidzon et al. 2013). Based on the VIPERS dataset, Davidzon et al. (2013) determined the transition mass to be log(Mstar/M☉) = 10.6 for galaxies at z ∼ 0.7. Our classification is consistent with this result. Median stellar masses of galaxies within classes 1–3 are above the transition mass (marked with the dashed black line in Fig. 6), while classes 7–11 are located below the transition mass, consistent with the fact that these galaxies are still forming stars. The intermediate galaxies within class 4 have the median stellar mass which matches the transition mass perfectly. This confirms that this is the group of sources that are just entering the passive evolutionary path.
Classes 5–7 have stellar masses just below the transition mass (10^10.5 M⊙) between the red and blue populations.

Finally, the bottom panel of Fig. 6 shows the change of sSFR as a function of class number. The FEM classes are well separated in sSFR, with red passive galaxies (classes 1–3) showing the lowest star formation activity, whereas sources from the blue classes (7–11) have the highest sSFRs. At the same time, from Fig. 5 we can see that classes 7–11 have high EW([OII]λ3727), which is typical for blue star-forming galaxies (e.g. Cimatti et al. 2002). The sSFR obtained for the intermediate galaxies (log(sSFR) ∼ −9 [yr⁻¹]) is in agreement with the results derived for 1745 CANDELS transition galaxies observed at 0.5 < z < 1.0 (log(sSFR) ∼ −9 [yr⁻¹], Pandya et al. 2017).

Summarising, the distributions of the physical properties (see Figs. 4–6, and Table 1) show the trends of global and systematic changes along the FEM classes. The main spectral, morphological and physical properties correlate well within and among the groups, that is, the most massive spheroidal galaxies populated by old stellar populations are the reddest in comparison to the disk-shaped bluer galaxies hosting younger stellar contents. This demonstrates that our classification traces the evolutionary phases and galaxy types.

Fig. 7. SFR-stellar mass relation for FEM classification. The median log(SFR) vs. median log(Mstar/M⊙) for classes 2–11 are shown. The error bars correspond to the first and third quartile of the galaxy SFR-stellar mass distribution, while the area of ellipses correspond to the median absolute deviations. The colours are given as in Fig. 2. The first class is not plotted due to its very low median SFR. The black solid line corresponds to the MS trend at z = 0.7 found by Whitaker et al. (2012), while dashed and dashed-dotted lines correspond to 4 × MS and 10 × MS to represent active star-forming and starburst galaxies, respectively, following Rodighiero et al. (2011).

4.3. The SFR − M∗ relation

Galaxies show a correlation between their SFR and stellar mass at redshifts at least up to z ∼ 6 (e.g. Brinchmann et al. 2004; Noeske et al. 2007; Whitaker et al. 2012; Speagle et al. 2014; Salmon et al. 2015). This correlation, often called the galaxy main sequence (MS), is likely connected with the physical mechanisms responsible for galaxy growth, regulated by the accretion of gas from the cosmic web and gas feedback (e.g. Bouché et al. 2010).

The SFR dependence on stellar mass for the different FEM classes is shown in Fig. 7. The black solid line corresponds to the MS at z = 0.7 according to Whitaker et al. (2012). Whitaker et al. (2012) have established the slope and the normalisation of the SFR(M∗) as a function of redshift, allowing us to reproduce the MS trend at z = 0.7, the median redshift of VIPERS galaxies. Passive galaxies within classes 2–3 (class 1 is not presented in Fig. 7 due to its very low SFR; log(SFR) = −6.1 [M⊙ yr⁻¹]) occupy an area well below the MS line.
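The MS reference lines drawn in Fig. 7 can be sketched numerically. The snippet below is only an illustration: it assumes the redshift-dependent parametrisation log SFR = α(z)(log M∗ − 10.5) + β(z), with α(z) = 0.70 − 0.13z and β(z) = 0.38 + 1.14z − 0.19z², which is our recollection of the form given by Whitaker et al. (2012) and should be checked against that paper. The function names are hypothetical.

```python
import numpy as np

def ms_log_sfr(log_mstar, z):
    """log10(SFR / [Msun/yr]) on the assumed main sequence at redshift z."""
    alpha = 0.70 - 0.13 * z                  # assumed MS slope
    beta = 0.38 + 1.14 * z - 0.19 * z ** 2   # assumed value at log(M) = 10.5
    return alpha * (np.asarray(log_mstar, dtype=float) - 10.5) + beta

def ms_offset_class(log_mstar, log_sfr, z=0.7):
    """Label one point relative to the MS, 4 x MS and 10 x MS lines."""
    delta = log_sfr - ms_log_sfr(log_mstar, z)  # offset from the MS in dex
    if delta >= np.log10(10.0):
        return "starburst (>= 10 x MS)"
    if delta >= np.log10(4.0):
        return "enhanced star formation (>= 4 x MS)"
    return "below 4 x MS"
```

In this sketch the 4 × MS and 10 × MS thresholds are constant offsets of log10(4) ≈ 0.6 dex and 1 dex above the MS line, matching the dashed and dashed-dotted lines described in the caption of Fig. 7.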
The star-forming galaxies assigned to classes 7–11 instead follow the tight MS trend, showing a steady increase in SFR with stellar mass as expected for the MS at this redshift. This confirms that classes 7–11 are representative clusters of star-forming MS galaxies. However, we note that most of these median values are above the solid line. The global offset for star-forming galaxies could be due to the extinction law and SFH used for SED fitting. The Calzetti et al. (2000) extinction law is characterised by larger attenuations at longer wavelengths, which results in lower stellar masses compared to other recipes such as Charlot & Fall (2000) or Lo Faro et al. (2017; for more detailed discussions we refer to Lo Faro et al. 2017 and Małek et al. in prep.). Therefore, we relate the offset in the SFR to the method used to calculate SFR. Whitaker et al. (2012) used the Kennicutt (1998) relation, which assumes a constant SFR. This assumption leads to the overestimation of the SFR with respect to the other SFHs in the literature (and with respect to the delayed SFH used for the SED fitting; e.g. Lo Faro et al. 2017). To summarise, the different models used for VIPERS SED fitting and to obtain the MS relation have an influence on the observed offset in Fig. 7.

Galaxies assigned to class 8 show a SFR−M∗ relation slightly above the MS. However, we stress that within uncertainties this class is still consistent with the trend defined by the other classes. The median SFR of galaxies in class 8 is located at 4 × MS (dashed line), which is attributed to galaxies with enhanced star formation (Rodighiero et al. 2011). This class is also characterised by redder r−K colours than, for example, class 7, and a strong Hβ line, but not one stronger than the Hβ line for class 10 (see Fig. 9).

4.4. Spectral properties

In this section, the spectral properties of the photometrically motivated classes are presented. To compare the spectral properties to the classification scheme, the stacked spectra for each of the 11 FEM classes were derived. The spectra were co-added in narrow redshift bins (δz = 0.1 from 0.4 to 1.0) in the same way as described in Siudek et al. (2017). Firstly, the rest-frame spectra were re-sampled to a common wavelength grid. Individual spectra were normalised by dividing the flux at all wavelengths by the scaling factor derived using the median flux computed in the wavelength region 4010 < λ(Å) < 4600. The stacked spectra were then obtained by computing the mean flux from all individual spectra at all wavelengths in the common wavelength grid, and rescaled by multiplying the flux at all wavelengths by the average value of the scaling factors of the individual spectra. Given the large sample of VIPERS galaxies, the constructed stacked spectra are characterised by a signal-to-noise ratio (S/N) high enough to detect absorption lines that are undetectable on typical, individual spectra (e.g. the Hδ line; see details in Siudek et al. 2017).

Figures 8 and 9 show the stacked spectra of the 11 FEM classes in six redshift bins spanning the redshift range 0.4 < z < 1.0. The stacked spectra show that there is a gradual change as a function of class number. The lines go from absorption (in the first class) to strong emission (in the eleventh class).
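The stacking steps described above (re-sampling, median normalisation in 4010 < λ(Å) < 4600, averaging, and rescaling) can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline of Siudek et al. (2017): linear interpolation with `np.interp` is our choice, the function and argument names are hypothetical, and every input spectrum is assumed to cover the common grid.

```python
import numpy as np

def stack_spectra(wavelengths, fluxes, grid, norm_range=(4010.0, 4600.0)):
    """Mean-stack rest-frame spectra after median normalisation.

    wavelengths, fluxes: sequences of 1-D arrays, one pair per spectrum
    grid: common rest-frame wavelength grid in Angstrom
    """
    grid = np.asarray(grid, dtype=float)
    in_norm = (grid >= norm_range[0]) & (grid <= norm_range[1])
    normalised, scales = [], []
    for wave, flux in zip(wavelengths, fluxes):
        # Re-sample onto the common grid (linear interpolation is assumed;
        # np.interp holds the edge values outside the input range).
        resampled = np.interp(grid, wave, flux)
        scale = np.median(resampled[in_norm])  # per-spectrum scaling factor
        normalised.append(resampled / scale)
        scales.append(scale)
    mean_flux = np.mean(normalised, axis=0)    # mean of normalised spectra
    return mean_flux * np.mean(scales)         # rescale by the mean factor
```

Normalising before averaging keeps bright galaxies from dominating the stack, while the final multiplication restores a representative flux level.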
All composite spectra of galaxies assigned to classes 1–3 are dominated by absorption lines and show weak emission lines. We can clearly see the strong 4000 Å break, G-band (4304 Å), and Balmer lines over most of the redshift range, even if some of these features are not observed at z > 0.8 because of the wavelength range 5500–9500 Å of VIPERS spectra; see Scodeggio et al. (2018) for details. The strong absorption lines for these features are typical for early-type galaxies (e.g. Worthey et al. 1994; Worthey & Ottaviani 1997; Gallazzi et al. 2014; Siudek et al. 2017). Therefore, we conclude that the spectral properties indicate that galaxies in classes 1–3 consist of old stellar populations. From Fig. 8, we can see that the Hδ line is getting stronger with redshift for all three red passive galaxy classes, which may be simply indicating that stellar populations are getting older as time passes. There is also a change in the relative strength of the Ca II H (3969 Å) and Ca II K (3934 Å) lines, as the Ca II K line dominates at z ∼ 1, while the Ca II H line dominates at lower redshifts, especially for galaxies in class 3. The Ca II K line dominates in galaxies with old stellar populations, whereas Ca II H dominates when the younger stars appear.

Fig. 8. Stacked spectra of VIPERS galaxies among FEM classes 1–6 in different redshift bins. Rest-frame composite spectra were normalised in the region 3600 < λ < 4500 Å. The most prominent spectral lines are marked with vertical solid lines with labels.

Spectra of the green group (classes 4–6) show properties in-between the red and blue populations (see also Vergani et al. 2017). The representative stacked spectra of classes 4 and 5 are characterised by strong emission in the [OIII]λλ4959, 5007 doublet with no or little sign of the recombination line Hβ at redshift range 0.4 < z < 0.7. Since a high ratio of [OIII]λλ4959, 5007 to Hβ lines is an indication of AGN photo-ionisation, this suggests that a non-negligible fraction of galaxies in these classes may host a Seyfert nucleus. However, this is not confirmed by the localisation of classes 4 and 5 on the BPT diagram (see Sect. 4.4.3), even if only galaxies within redshift range 0.4 < z < 0.7 are considered. Therefore, we are not able to conclude whether those galaxies host a Seyfert nucleus or not. The stacked spectra of intermediate galaxies within class 6 show diagnostic lines (e.g. [OII]λ3727, [NeIII]λ3869, Hβ) in emission. There is also a hint of star formation activity in the intermediate classes (4–6) revealed by detectable emission in the [NeIII]λ3869 line in all redshift ranges (Ho & Keto 2007).

The stacked spectra of galaxies in classes 7–11 show that they are undergoing a significant level of star formation, indicated by prominent emission lines, like the [OII]λ3727 or Hβ lines, and a weak 4000 Å break (e.g. Mignoli et al. 2009; Haines et al. 2017). The emission lines are getting stronger with increasing class number of star-forming galaxies. The possibility of AGNs is further discussed in Sect. 4.4.3. In this paper, we focus on general properties of the whole classification scheme. The detailed properties and evolutionary trends of the FEM classes will be discussed in future papers.

4.4.1. The comparison of FEM classes with Kennicutt’s Atlas

To better define the morphological and spectral types of each of the 11 FEM classes, we compare their representative stacked spectra with those of galaxies of different Hubble types as given by Kennicutt (1992). Kennicutt’s Atlas consists of 55 integrated spectra of nearby galaxies, covering the wavelength range 3650 < λ[Å] < 7100 with a resolution of 5–8 Å, grouped according to their morphological and spectral types. Kennicutt (1992) provides a set of individual normal and peculiar galaxies following the Hubble sequence, from giant ellipticals (NGC 1275) to dwarf irregulars (Mrk 35). We compared the 11 FEM classes with the Atlas by assigning to each FEM class the best spectrum in the Atlas based on the χ² minimisation. The 11 FEM classes tend to follow the Hubble sequence, as classes 1–3 show morphologically earlier types than the other classes. Stacked spectra of galaxies in classes 7–11 are quite well reproduced by the spiral, irregular, and emission-line galaxies (Sc, Im), whereas spectra of Sb galaxies best fit the stacked spectra of intermediate galaxies (classes 4–6), and the template spectrum of an Sab galaxy fits the representative stacked spectra of classes 1–3. The detailed comparison of spectral properties of the 11 FEM classes to the spectral Atlas of Kennicutt (1992) is discussed in Appendix D.

Fig. 9. Stacked spectra of VIPERS galaxies among FEM classes 7–11 in different redshift bins. Rest-frame composite spectra were normalised in the region 3600 < λ < 4500 Å. The most prominent spectral lines are marked with vertical solid lines with labels. The last panel shows stacked spectra of FEM classes 1–11 in redshift bin 0.7 < z < 0.8.

4.4.2. Comparison to principal component analysis (PCA) classification of VIPERS galaxies

In this section, we compare the FEM classification to a classification scheme used within the VIPERS survey by Marchetti et al. (2013), based on the PCA technique applied to spectra of VIPERS galaxies. The PCA-based algorithm divided VIPERS galaxies into 15 different clusters based on the first three eigen coefficients (θ−φ diagram). The PCA classification distinguished eight groups among the red and intermediate galaxy types from E to Sc, and seven classes of more active starburst galaxies.

We find that our classification follows the track found by Marchetti et al. (2013), since the reddest, early-type galaxies fall in the region of the bottom left edge of the φ−θ diagram, and with increasing θ and φ, the number of the FEM class is increasing, which implies that galaxies are bluer (see Fig. C.1). We find that ∼70% of early-type galaxies selected with PCA (PCA classes 1–2 contain E and Sa galaxies) are distributed in the FEM classes 1–3. This indicates the similarities in the capability of separation of ETGs, especially the oldest ones, in the VIPERS dataset by both methods. The dusty spiral galaxies, Sb4,6 (with E(B−V) > 0.4; Kinney et al. 1996), assigned to PCA classes 3–6 are spread among various FEM classes, with the majority of them (∼70%) being located in the FEM classes 7–11. Almost all Sc galaxies (∼95%) selected by PCA (PCA classes 7–8) are assigned to the FEM classes 9–11. The spiral galaxies with smaller amounts of dust, Sb1,2 (with E(B−V) < 0.2; Kinney et al. 1996), within PCA classes 9–13, are also found among FEM classes 10–11 (∼80% of them).

This shows that there is a global agreement between the FEM and PCA classification schemes. However, it should be noted that these two classification schemes, being based on different input data (photometric data for FEM, and spectroscopic for PCA), are not fully coherent with each other and therefore do not show precisely the same patterns. In Appendix C it is shown how well, using the derived eigenvalues for VIPERS PDR1, the FEM classes are separated in the θ−φ PCA diagram.

4.4.3. The Baldwin, Phillips & Terlevich diagram

To differentiate star-forming galaxies from AGNs, we checked the distributions of the intermediate and star-forming galaxies (classes 4–11) on the diagnostic diagram for emission-line galaxies. The distribution of VIPERS galaxies in the BPT (Baldwin et al. 1981) diagram is shown in Fig. 10. We are able to separate LINERs and Seyferts based on their emission line ratios. We measured emission lines on individual spectra within the redshift range 0.4 < z < 1.3 assigned to classes 4–11, and the Hβ measurements were corrected for an average absorption component. The distribution of VIPERS galaxies assigned to classes 4–10 indicates that those galaxies are star-forming galaxies. Class 11 is placed in the composite area (SF/Sy2 in Fig. 10), which may indicate that it contains AGNs. The contamination by broad-line AGNs has no influence on our result, as only line measurements of galaxies within redshift range 0.4 < z < 1.0 and redshift flag 3–4 are included (i.e. excluding the flags corresponding to broad-line AGN). However, AGNs are very rare among low-mass galaxies (the median stellar mass of galaxies within class 11 is ∼10^9 M⊙), therefore, we suspect that these might be low-metallicity galaxies. However, both these options (AGN contributions and low-metallicity galaxies) are consistent with the spectroscopic properties of this group (see Sect. 4.4), as the spectra show strong emission lines.

Fig. 10. The distributions of FEM classes 4–11 on the “blue” BPT diagram introduced by Lamareille (2010). The number of spectra in each class for which lines were measured in the redshift range 0.4 < z < 1.3 are given in the legend. The error bars correspond to the first and third quartile of the line measurements distribution, while the area of ellipses correspond to the median absolute deviations.

5. Summary

In this paper, a new approach to galaxy classification is introduced, based on the thirteen-dimensional parameter space built from 12 absolute magnitudes and the spectroscopic redshift. An unsupervised classifier based on the FEM algorithm blindly separated 52 114 VIPERS galaxies into 12 classes.
The model selection (DBk) and the determination of the optimal number of classes were based on statistical criteria (BIC, AIC and ICL; see Appendix B), which placed the optimal number of classes in the range 9–12. Subsequently, the final class number (12) was decided based on the analysis of the galaxy flow with a changing number of groups (see Fig. A.2), and the interpretation of physical properties of classes in different realisations (see Fig. 1). All these techniques resulted in the same model and an optimal number of 12 classes in the VIPERS dataset. These classes follow a well-defined sequence from the earliest to the latest types, separating galaxies into three major groups: red, green, and blue. The FEM classification automatically finds groups that share physical and spectral properties, beyond the features used for classification purposes.

Galaxies are not unequivocally assigned to a single class, but the probability of belonging to each group is given. Such an approach is more realistic as the transition between classes can be continuous. In spite of this, a majority of galaxies (92%) in the sample have high (>50%, with <45% second best probability) probabilities of belonging to the selected group. We obtain three main classes: red, green, and blue, which can be further separated into subclasses: three red, three green, and five blue, and an additional class 12, which consists of outliers. For class 12, 95% of its members are broad-line AGNs according to the visual classifications by the VIPERS team (Garilli et al. 2014). Their median redshift is zmed ∼ 2, which removes this class from the global picture of VIPERS galaxy types observed up to z ∼ 1.

We demonstrated that our approach leads to a new classification scheme allowing us to track galaxy evolutionary paths. The main advantage of this approach is the ability to distinguish 11 galaxy types, which share physical and spectral properties not used in the classification procedure.
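The membership criterion quoted above (best probability above 50% with a second-best probability below 45%) can be checked on any table of class probabilities. The sketch below assumes a hypothetical array `posteriors` with one row per galaxy and one column per class; it illustrates the bookkeeping only and is not part of the FEM algorithm itself.

```python
import numpy as np

def summarise_membership(posteriors, p_best=0.50, p_second=0.45):
    """Return 1-based class labels and the fraction of secure assignments."""
    post = np.asarray(posteriors, dtype=float)
    ordered = np.sort(post, axis=1)              # ascending within each row
    best, second = ordered[:, -1], ordered[:, -2]
    labels = post.argmax(axis=1) + 1             # most probable class
    secure = (best > p_best) & (second < p_second)
    return labels, float(secure.mean())

# Toy example with three galaxies and three classes (made-up numbers).
toy = [[0.90, 0.05, 0.05], [0.50, 0.48, 0.02], [0.20, 0.70, 0.10]]
labels, secure_fraction = summarise_membership(toy)
```

Keeping the full probability vector, rather than only the label, makes it possible to flag galaxies that sit between two neighbouring classes.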
The presented separation between different galaxy types differs from traditional selection methods based mainly on the bi-modal distribution in colours (e.g. Bell et al. 2004; Balogh et al. 2004b; Franzetti et al. 2007), spectral properties (e.g. Hα; Balogh et al. 2004a), [OII]λ3727 emission (Mignoli et al. 2009), 4000 Å break (Kauffmann et al. 2003; Vergani et al. 2008), and SFH (Brinchmann et al. 2004).

Our main results are as follows:
- We present a new unsupervised approach to galaxy classification based on the multidimensional space of absolute magnitudes and the spectroscopic redshift, which does not introduce any a priori defined cuts.
- We find three red, three green, and five blue classes which are distributed along a well-defined path in multidimensional space. The borders between classes are not sharp; the probability of belonging to a given class is associated to each galaxy. However, the probabilities of belonging to a given class are high (∼80%) and, in spite of the presence of outliers, the classes are well separated in the feature space and are therefore more faithfully representative of the full complexity of the galaxy population at these redshifts.
- We show the evolution of the 11 classes over the redshift range 0.4 < z < 1.0.
- We demonstrate that there are significant differences in physical and spectral properties between galaxies classified as red/green/blue FEM classes and their subclasses.
- We find a very good correlation between the FEM classes and spectroscopic classes in the Atlas of Kennicutt (1992). The 11 FEM groups follow the path from the earliest to the latest galaxy types.

In particular, the following FEM class properties were found:
- Classes 1–3 host the reddest spheroidal-shape galaxies showing no sign of star formation activity and dominated by old stellar populations (as testified by their strong 4000 Å breaks).
- Classes 4–6 host intermediate galaxies whose physical properties, such as colours, sSFR, stellar masses, and shapes, are intermediate relative to red, passive, and blue, active galaxies. These intermediate galaxies have more concentrated light profiles and lower gas contents than star-forming galaxies (as indicated by the Sérsic index, and EW(OII)). This tendency is also observed for intermediate galaxies observed in the local Universe (Schiminovich et al. 2007; Schawinski et al. 2014).
- Classes 7–11 contain the star-forming galaxies. The blue cloud of disk-shaped galaxies is actively forming new stars and is populated by young stellar populations (as indicated by the weak 4000 Å break).
- Class 11 may consist of low-metallicity galaxies, or AGNs according to its localisation on the BPT diagram.

Automatic unsupervised classifications are becoming an invaluable tool in the current era of information deluge. The FEM algorithm can also be applied to photometric samples with comparable efficiency in distinguishing a full panoply of galaxy types (Siudek et al. 2018). With the increasing number of deep surveys, such as Euclid and LSST, such algorithms may allow us to study galaxy formation and evolution across the lifetime of the Universe. The presented classification scheme has great potential, as we can ascertain the class to which a galaxy or a galaxy region belongs. Based on defined classes, different stellar populations can be traced and galaxies within structures can be classified.

Acknowledgements. The authors wish to thank the referee for useful and constructive comments. The authors wish to thank Didier Fraix-Burnet and Charles Bouveyron for useful and constructive discussion. We acknowledge the crucial contribution of the ESO staff for the management of service observations. In particular, we are deeply grateful to M. Hilker for his constant help and support of this program. Italian participation in VIPERS has been funded by INAF through PRIN 2008, 2010, and 2014 programs. LG and BRG acknowledge support of the European Research Council through the Darklight ERC Advanced Research Grant (# 291521).
OLF acknowledges support of the European Research Council through the EARLY ERC Advanced Research Grant (# 268107). KM, TK, JK, MS have been supported by the National Science Centre (grant UMO-2013/09/D/ST9/04030). MS also acknowledges financial support from UMO-2016/23/N/ST9/02963 by the National Science Centre. RT acknowledges financial support from the European Research Council under the European Community’s Seventh Framework Programme (FP7/2007–2013)/ERC grant agreement n. 202686. EB, FM, and LM acknowledge the support from grants ASI-INAF I/023/12/0 and PRIN MIUR 2010–2011. LM also acknowledges financial support from PRIN INAF 2012.

References

Akaike, H. 1974, IEEE Trans. Autom. Control, 19, 716
Arnouts, S., Moscardini, L., Vanzella, E., et al. 2002, MNRAS, 329, 355
Arnouts, S., Walcher, C. J., Le Fèvre, O., et al. 2007, A&A, 476, 137
Arnouts, S., Le Floc’h, E., Chevallard, J., et al. 2013, A&A, 558, A67
Arthur, D., & Vassilvitskii, S. 2007, in Proc. of the Eighteenth Annual ACM-SIAM Symp. on Discrete Algorithms, SODA ’07 (Philadelphia, PA, USA: Society for Industrial and Applied Mathematics), 1027
Balcan, M., Liang, Y., & Gupta, P. 2014, ArXiv e-prints [arXiv:1401.0247]
Baldry, I. K., Balogh, M. L., Bower, R. G., et al. 2006, MNRAS, 373, 469
Baldwin, J. A., Phillips, M. M., & Terlevich, R. 1981, PASP, 93, 5
Ball, N. M., & Brunner, R. J. 2010, Int. J. Mod. Phys. D, 19, 1049
Balogh, M. L., Morris, S. L., Yee, H. K. C., Carlberg, R. G., & Ellingson, E. 1999, ApJ, 527, 54
Balogh, M., Eke, V., Miller, C., et al. 2004a, MNRAS, 348, 1355
Balogh, M. L., Baldry, I. K., Nichol, R., et al. 2004b, ApJ, 615, L101
Baudry, J.-P. 2012, ArXiv e-prints [arXiv:1205.4123]
Bell, E. F., Wolf, C., Meisenheimer, K., et al. 2004, ApJ, 608, 752
Bell, E. F., van der Wel, A., Papovich, C., et al. 2012, ApJ, 753, 167
Bilmes, J. 1998, A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models (Berkeley, CA: International Computer Science Institute)
Bouché, N., Dekel, A., Genzel, R., et al.
2010, ApJ, 718, 1001
Bouveyron, C., & Brunet, C. 2012, Stat. Comput., 22, 301
Bouveyron, C., & Brunet-Saumard, C. 2014, Comput. Stat., 29, 489
Brinchmann, J., Charlot, S., White, S. D. M., et al. 2004, MNRAS, 351, 1151
Bruce, V. A., Dunlop, J. S., McLure, R. J., et al. 2014, MNRAS, 444, 1660
Bruzual, G., & Charlot, S. 2003, MNRAS, 344, 1000
Bundy, K., Scarlata, C., Carollo, C. M., et al. 2010, ApJ, 719, 1969
Buta, R. J. 2011, Planets, Stars, and Stellar Systems, 6
Buta, R., & Zhang, X. 2011, Mem. Soc. Astron. It. Supp., 18, 13
Buta, R., Mitra, S., de Vaucouleurs, G., & Corwin, Jr., H. G. 1994, AJ, 107,
Buta, R. J., Sheth, K., Regan, M., et al. 2010, ApJ, 190, 147
Buta, R. J., Sheth, K., Athanassoula, E., et al. 2015, ApJ, 217, 32
Calzetti, D., Kinney, A. L., & Storchi-Bergmann, T. 1994, ApJ, 429, 582
Calzetti, D., Armus, L., Bohlin, R. C., et al. 2000, ApJ, 533, 682
Charlot, S., & Fall, S. M. 2000, ApJ, 539, 718
Cibinel, A., Carollo, C. M., Lilly, S. J., et al. 2013, ApJ, 777, 116
Cimatti, A., Mignoli, M., Daddi, E., et al. 2002, A&A, 392, 395
Cimatti, A., Daddi, E., Renzini, A., et al. 2004, Nature, 430, 184
Connolly, A. J., Szalay, A. S., Bershady, M. A., Kinney, A. L., & Calzetti, D. 1995, AJ, 110, 1071
Conselice, C. J., Bluck, A. F. L., Ravindranath, S., et al. 2011, MNRAS, 417, 2770
D’Abrusco, R., Fabbiano, G., Djorgovski, G., et al. 2012, ApJ, 755, 92
Daddi, E., Cimatti, A., Renzini, A., et al. 2004, ApJ, 617, 746
Davidzon, I., Bolzonella, M., Coupon, J., et al. 2013, A&A, 558, A23
Davidzon, I., Cucciati, O., Bolzonella, M., et al. 2016, A&A, 586, A23
Deng, X.-F. 2010, ApJ, 721, 809
de Souza, R. S., Dantas, M. L. L., Costa-Duarte, M. V., et al. 2017, MNRAS, 472, 2808
de Vaucouleurs, G. 1959, Handbuch der Physik, 53, 275
de Vaucouleurs, G., de Vaucouleurs, A., Corwin, Jr., H. G., et al. 1991, Third Reference Catalogue of Bright Galaxies (New York: Springer)
Driver, S. P., Allen, P. D., Graham, A. W., et al. 2006, MNRAS, 368, 414
Fraix-Burnet, D., Thuillard, M., & Chattopadhyay, A. K. 2015, Front. Astron.
Space Sci., 2, 3
Franzetti, P., Scodeggio, M., Garilli, B., et al. 2007, A&A, 465, 711
Fritz, A., Scodeggio, M., Ilbert, O., et al. 2014, A&A, 563, A92
Fukunaga, K. 1990, Introduction to Statistical Pattern Recognition, 2nd Ed. (San Diego, CA, USA: Academic Press Professional, Inc.)
Gallazzi, A., Bell, E. F., Zibetti, S., Brinchmann, J., & Kelson, D. D. 2014, ApJ, 788, 72
Garilli, B., Guzzo, L., Scodeggio, M., et al. 2014, A&A, 562, A23
Glazebrook, K., Abraham, R. G., McCarthy, P. J., et al. 2004, Nature, 430, 181
Goranova, Y., Hudelot, P., Contini, T., et al. 2009, The CFHTLS T0006 Release, http://terapix.iap.fr/cplt/table_syn_T0006.html
Guzzo, L., Scodeggio, M., Garilli, B., et al. 2014, A&A, 566, A108
Haines, C. P., Iovino, A., Krywult, J., et al. 2017, A&A, 605, A4
Ho, L. C., & Keto, E. 2007, ApJ, 658, 314
Hoaglin, D. C., Mosteller, F., & Tukey, J. W. 1983, in Understanding robust and exploratory data analysis, eds. D. C. Hoaglin, F. Mosteller, & J. W. Tukey (New York: Wiley)
Hubble, E. P. 1926, ApJ, 64, 321
Hubble, E. P. 1936, in Realm of the Nebulae (New Haven: Yale University Press), 288
Ilbert, O., Arnouts, S., McCracken, H. J., et al. 2006, A&A, 457, 841
Jarvis, M. J., Bonfield, D. G., Bruce, V. A., et al. 2013, MNRAS, 428, 1281
Karhunen, K. 1947, Ann. Acad. Sci. Fennicae: Ser. Al. Math.-Phys., 37, 1
Kartaltepe, J. S., Mozena, M., Kocevski, D., et al. 2015, ApJ, 221, 11
Kauffmann, G., Heckman, T. M., White, S. D. M., et al. 2003, MNRAS, 341, 33
Kennicutt, Jr., R. C. 1992, ApJ, 79, 255
Kennicutt, Jr., R. C. 1998, ApJ, 498, 541
Kinney, A. L., Calzetti, D., Bohlin, R. C., et al. 1996, ApJ, 467, 38
Kormendy, J., & Kennicutt, Jr., R. C. 2004, ARA&A, 42, 603
Krakowski, T., Małek, K., Bilicki, M., et al. 2016, A&A, 596, A39
Krywult, J., Tasca, L. A. M., Pollo, A., et al. 2017, A&A, 598, A120
Kurcz, A., Bilicki, M., Solarz, A., et al. 2016, A&A, 592, A25
Lamareille, F. 2010, A&A, 509, A53
Lange, R., Driver, S. P., Robotham, A. S. G., et al. 2015, MNRAS, 447, 2603
Le Fèvre, O., Saisse, M., Mancini, D., et al. 2003, SPIE Conf. Ser., 4841, 1670
Lintott, C. J., Schawinski, K., Slosar, A., et al. 2008, MNRAS, 389, 1179
Lintott, C., Schawinski, K., Bamford, S., et al. 2011, MNRAS, 410, 166
Lo Faro, B., Buat, V., Roehlly, Y., et al. 2017, MNRAS, 472, 1372
Madau, P., & Dickinson, M. 2014, ARA&A, 52, 415
Marchetti, A., Granett, B. R., Guzzo, L., et al. 2013, MNRAS, 428, 1424
Marchetti, A., Garilli, B., Granett, B. R., et al. 2017, A&A, 600, A54
Martin, D. C., Wyder, T. K., Schiminovich, D., et al. 2007, ApJ, 173, 342
Mellier, Y., Bertin, E., Hudelot, P., et al. 2008, The CFHTLS T0005 Release, http://terapix.iap.fr/cplt/oldSite/Descart/CFHTLS
T0005-Release.pdf
Mignoli, M., Zamorani, G., Scodeggio, M., et al. 2009, A&A, 493, 39
Moresco, M., Pozzetti, L., Cimatti, A., et al. 2013, A&A, 558, A61
Moutard, T., Arnouts, S., Ilbert, O., et al. 2016a, A&A, 590, A102
Moutard, T., Arnouts, S., Ilbert, O., et al. 2016b, A&A, 590, A103
Noeske, K. G., Weiner, B. J., Faber, S. M., et al. 2007, ApJ, 660, L43
Pandya, V., Brennan, R., Somerville, R. S., et al. 2017, MNRAS, 472, 2054
Pannella, M., Gabasch, A., Goranova, Y., et al. 2009, ApJ, 701, 787
Patel, S. G., Holden, B. P., Kelson, D. D., et al. 2012, ApJ, 748, L27
Peng, Y.-j., Lilly, S. J., Kovač, K., et al. 2010, ApJ, 721, 193
Renzini, A. 2006, ARA&A, 44, 141
Roberts, M. S., & Haynes, M. P. 1994, ARA&A, 32, 115
Rodighiero, G., Daddi, E., Baronchelli, I., et al. 2011, ApJ, 739, L40
Salim, S. 2014, Serbian Astron. J., 189, 1
Salman, R., Kecman, V., Li, Q., Strack, R., & Test, E. 2011, Int. J. Comput. Networks Commun. (IJCNC), 3, 4
Salmon, B., Papovich, C., Finkelstein, S. L., et al. 2015, ApJ, 799, 183
Sánchez Almeida, J., & Allende Prieto, C. 2013, ApJ, 763, 50
Sánchez Almeida, J., Aguerri, J. A. L., Muñoz-Tuñón, C., & de Vicente, A. 2010, ApJ, 714, 487
Sandage, A. 1961, The Hubble Atlas of Galaxies (Washington: Carnegie Institution)
Sandage, A., Sandage, M., & Kristian, J. 1975, Galaxies and the Universe (Chicago University Press)
Schawinski, K., Urry, C. M., Simmons, B. D., et al. 2014, MNRAS, 440, 889
Schiminovich, D., Wyder, T. K., Martin, D. C., et al. 2007, ApJ, 173, 315
Schwarz, G. 1978, The Annals of Statistics, 6, 461
Scodeggio, M., Guzzo, L., Garilli, B., et al. 2018, A&A, 609, A84
Sérsic, J. L. 1963, Boletín de la Asociación Argentina de Astronomía La Plata Argentina, 6, 41
Simard, L., Mendel, J. T., Patton, D. R., Ellison, S. L., & McConnachie, A. W. 2011, ApJ, 196, 11
Siudek, M., Małek, K., Scodeggio, M., et al. 2017, A&A, 597, A107
Siudek, M., Małek, K., Pollo, A., et al. 2018, A&A, submitted [arXiv:1805.09905]
Speagle, J. S., Steinhardt, C. L., Capak, P. L., & Silverman, J. D. 2014, ApJ, 214, 15
Strateva, I., Ivezić, Ž., Knapp, G. R., et al.
2001, AJ, 122, 1861
Takeuchi, T. T. 2000, Ap&SS, 271, 213
Taylor, E. N., Hopkins, A. M., Baldry, I. K., et al. 2015, MNRAS, 446, 2144
van den Bergh, S. 1998, Galaxy Morphology and Classification (Cambridge, NY: Cambridge University Press)
van Dokkum, P. G., Nelson, E. J., Franx, M., et al. 2015, ApJ, 813, 23
Vergani, D., Scodeggio, M., Pozzetti, L., et al. 2008, A&A, 487, 89
Vergani, D., Garilli, B., Polletta, M., et al. 2017, A&A, submitted [arXiv:1712.08168]
Whitaker, K. E., Labbé, I., van Dokkum, P. G., et al. 2011, ApJ, 735, 86
Whitaker, K. E., van Dokkum, P. G., Brammer, G., & Franx, M. 2012, ApJ, 754, L29
Wild, V., Almaini, O., Cirasuolo, M., et al. 2014, MNRAS, 440, 1880
Williams, R. J., Quadri, R. F., Franx, M., van Dokkum, P., & Labbé, I. 2009, ApJ, 691, 1879
Worthey, G., & Ottaviani, D. L. 1997, ApJ, 111, 377
Worthey, G., Faber, S. M., Gonzalez, J. J., & Burstein, D. 1994, ApJ, 94, 687

1 Center for Theoretical Physics, Al. Lotnikow 32/46, 02-668 Warsaw, Poland; e-mail: gsiudek@cft.edu.pl
2 National Centre for Nuclear Research, ul. Hoza 69, 00-681 Warszawa, Poland
3 Astronomical Observatory of the Jagiellonian University, Orla 171, 30-001 Cracow, Poland
4 INAF – Osservatorio Astronomico di Brera, Via Brera 28, 20122 Milano – via E. Bianchi 46, 23807 Merate, Italy
5 INAF – Istituto di Astrofisica Spaziale e Fisica Cosmica Milano, via Bassini 15, 20133 Milano, Italy
6 Department of Astronomy & Physics, Saint Mary’s University, 923 Robie Street, Halifax, Nova Scotia B3H 3C3, Canada
7 Aix-Marseille Université, CNRS, LAM, Laboratoire d’Astrophysique de Marseille, Marseille, France
8 INAF – Osservatorio di Astrofisica e Scienza dello Spazio di Bologna, via Gobetti 93/3, 40129 Bologna, Italy
9 Università degli Studi di Milano, via G.
Celoria 16, 20133 Milano, Italy
10 INAF – Osservatorio Astrofisico di Torino, 10025 Pino Torinese, Italy
11 Laboratoire Lagrange, UMR7293, Université de Nice Sophia Antipolis, CNRS, Observatoire de la Côte d’Azur, 06300 Nice, France
12 Dipartimento di Fisica e Astronomia – Alma Mater Studiorum Università di Bologna, via Gobetti 93/2, 40129 Bologna, Italy
13 Institute of Physics, Jan Kochanowski University, ul. Swietokrzyska 15, 25-406 Kielce, Poland
14 INFN, Sezione di Bologna, viale Berti Pichat 6/2, 40127 Bologna, Italy
15 IRAP, Université de Toulouse, CNRS, UPS, Toulouse, France
16 IRAP, 9 av. du colonel Roche, BP 44346, 31028 Toulouse Cedex 4, France
17 School of Physics and Astronomy, University of St Andrews, St Andrews KY16 9SS, UK
18 INAF – Istituto di Radioastronomia, via Gobetti 101, 40129 Bologna, Italy
19 Canada–France–Hawaii Telescope, 65–1238 Mamalahoa Highway, Kamuela, HI 96743, USA
20 Aix-Marseille Univ., Univ. Toulon CNRS, CPT, Marseille, France
21 Dipartimento di Matematica e Fisica, Università degli Studi Roma Tre, via della Vasca Navale 84, 00146 Roma, Italy
22 INFN, Sezione di Roma Tre, via della Vasca Navale 84, 00146 Roma, Italy
23 INAF – Osservatorio Astronomico di Roma, via Frascati 33, 00040 Monte Porzio Catone (RM), Italy
24 Department of Astronomy, University of Geneva, Ch. d’Ecogia 16, 1290 Versoix, Switzerland
25 INAF – Osservatorio Astronomico di Trieste, via G. B. Tiepolo 11, 34143 Trieste, Italy
26 Division of Particle and Astrophysical Science, Nagoya University, Furo-cho, Chikusa-ku, 464-8602 Nagoya, Japan