A&A 592, A25 (2016)
DOI: 10.1051/0004-6361/201628142
© ESO 2016
Astronomy & Astrophysics

Towards automatic classification of all WISE sources

A. Kurcz1,2, M. Bilicki3,2,4, A. Solarz5,2, M. Krupa1,2, A. Pollo1,5,2, and K. Małek5,2

1 Astronomical Observatory of the Jagiellonian University, ul. Orla 171, 30-244 Cracow, Poland
  e-mail: kurcz.agnieszka@gmail.com
2 Janusz Gil Institute of Astronomy, University of Zielona Góra, ul. Szafrana 2, 65-516 Zielona Góra, Poland
3 Leiden Observatory, Leiden University, Niels Bohrweg 2, 2333 CA Leiden, The Netherlands
4 Astrophysics, Cosmology and Gravity Centre, Department of Astronomy, University of Cape Town, Rondebosch, South Africa
5 National Centre for Nuclear Research, ul. Hoża 69, 00-681 Warszawa, Poland

Received 15 January 2016 / Accepted 11 April 2016

ABSTRACT

Context. The Wide-field Infrared Survey Explorer (WISE) has detected hundreds of millions of sources over the entire sky. Classifying them reliably is, however, a challenging task owing to degeneracies in WISE multicolour space and low levels of detection in its two longest-wavelength bandpasses. Simple colour cuts are often not sufficient; for satisfactory levels of completeness and purity, more sophisticated classification methods are needed.

Aims. Here we aim to obtain comprehensive and reliable star, galaxy, and quasar catalogues based on automatic source classification in full-sky WISE data. This means that the final classification will employ only parameters available from WISE itself, in particular those which are reliably measured for the majority of sources.

Methods. For the automatic classification we applied a supervised machine learning algorithm, support vector machines (SVM).
It requires a training sample with relevant classes already identified, and we chose to use the SDSS spectroscopic dataset (DR10) for that purpose. We tested the performance of two kernels used by the classifier, and determined the minimum number of sources in the training set required to achieve stable classification, as well as the minimum dimension of the parameter space. We also tested SVM classification accuracy as a function of extinction and apparent magnitude. The classifier calibrated in this way was finally applied to all-sky WISE data, flux-limited to 16 mag (Vega) in the 3.4 µm channel.

Results. By calibrating on the test data drawn from SDSS, we first established that a polynomial kernel is preferred over a radial one for this particular dataset. Next, using three classification parameters (W1 magnitude, W1 − W2 colour, and a differential aperture magnitude) we obtained very good classification efficiency in all the tests. At the bright end, the completeness for stars and galaxies reaches ∼95%, deteriorating to ∼80% at W1 = 16 mag, while for quasars it stays at a level of ∼95% independently of magnitude. Similar numbers are obtained for purity. Application of the classifier to full-sky WISE data and appropriate a posteriori cleaning allowed us to obtain catalogues of star and galaxy candidates that appear reliable. However, the sources flagged by the classifier as "quasars" are in fact dominated by dusty galaxies; they also exhibit contamination from sources located mainly at low ecliptic latitudes, consistent with solar system objects.

Key words. methods: data analysis – methods: statistical – astronomical databases: miscellaneous – catalogs – infrared: general – surveys

1. Introduction

The Wide-field Infrared Survey Explorer (WISE, Wright et al. 2010) is a space-borne telescope that has scanned the entire sky in four infrared (IR) bands (3.4–23 µm) and has delivered one of the largest catalogues of astronomical objects to date. It has detected almost 750 million sources, which are compiled in the publicly released AllWISE Source Catalogue (Cutri et al. 2013). WISE provides at present the most comprehensive census of the entire sky in the IR, and offers a large advancement in comparison to earlier all-sky IR surveys, such as IRAS (Neugebauer et al. 1984), 2MASS (Skrutskie et al. 2006), or AKARI (Murakami et al. 2007).

Such a vast amount of data, which gives access to all-sky information in unprecedented volumes, has found multiple astronomical applications starting from our closest neighbourhood (near-Earth objects, nearby stars, and brown dwarfs), through the galaxies in the local volume, and up to the largest possible distances of high-redshift quasars (Wright et al. 2010). All these types of sources are present in the WISE database (within its sensitivity limits), but reliably extracting them in large numbers is challenging. Briefly, the WISE data release does not provide separate catalogues of different objects. What is more, at present there are no separate point- and extended-source catalogues extracted from this survey, although efforts towards the latter are underway (Cluver et al. 2014).

There are several reasons for the lack of comprehensive object identification in WISE. Firstly, this survey is mostly a near-IR selected one. Practically all the sources listed in the WISE catalogue have S/N > 2 detections in the W1 band (3.4 µm), and 83% of them have W2 (4.6 µm) measured with this accuracy; the detection rates in W3 (12 µm) and W4 (23 µm) are much lower¹. The light emitted at 3.4 and 4.6 µm comes mainly from the photospheres of evolved stars. This means that in the low-redshift Universe, where a large part of the extragalactic WISE sources are located, the W1 − W2 colour of galaxies will be very similar to that of Galactic stars and cannot in general provide a comprehensive criterion for distinguishing one from the other. This is readily seen in WISE colour-colour diagrams such as those provided by Jarrett et al. (2011). These diagrams also show that even adding the W3 band, which traces dust such as silicates and PAHs, is not sufficient to unambiguously distinguish stars from elliptical galaxies. Last but not least, such colour-colour plots assume negligible photometric errors, while in reality significant scatter will occur, as is the case for the W3 and W4 bands, for which most of the WISE sources have only upper limits or are not even detected.

¹ http://wise2.ipac.caltech.edu/docs/release/allwise/expsup/sec2_1.html#stats

Article published by EDP Sciences. A25, page 1 of 18

Another way to separate galaxies from stars in WISE could be identifying extended sources in the sample. Similarly to the case of the 2MASS Extended Source Catalogue (XSC, Jarrett et al. 2000), such sources in WISE are expected to be mainly extragalactic, especially at sufficiently high Galactic latitudes. However, despite the much better sensitivity of WISE with respect to 2MASS (e.g. 0.054 mJy in W1 vs. 2.7 mJy in Ks) and the lack of atmospheric nuisances in the former, the lower angular resolution of WISE and higher background levels mean that the eventual all-sky WISE XSC is expected to contain a similar number of sources to the 2MASS catalogue (∼30 per square degree; Jarrett et al. 2016). This will be a very small percentage of all WISE galaxies: Cluver et al. (2014) showed that only 3–4% of WISE sources matched with the Galaxy And Mass Assembly (GAMA, Driver et al. 2011) survey are resolved in W1, while the WISE × GAMA sample itself is already much shallower than expected from the full WISE galaxy catalogue (Bilicki et al. 2016; Jarrett et al. 2016). One possible avenue leading to a WISE-based all-sky catalogue of galaxies of a similar depth to those in GAMA is to cross-match WISE sources with SuperCOSMOS data (Hambly et al. 2001), as was first discussed by Bilicki et al. (2014). Such a catalogue has been recently compiled (Bilicki et al. 2016; Krakowski et al., in prep.), but it includes only a part of the WISE galaxies owing to limitations of the SuperCOSMOS scans of photographic plates (both in depth and in the colour space).

Until now, most of the studies dealing with WISE source classification were based on cross-matching this catalogue with other samples and using multiband magnitudes and colours as discriminants. Stern et al. (2012), who paired up WISE with COSMOS, have proposed using W1 − W2 ≥ 0.8 mag (Vega) to identify WISE active galactic nuclei (AGNs). Assef et al. (2013) have extended this work to the larger and deeper NOAO Deep Wide-Field Survey Boötes field and showed that this criterion is no longer optimal at fainter magnitudes. A more comprehensive effort has been undertaken by Yan et al. (2013), where WISE sources were cross-matched with SDSS to derive colour cuts for object selection. The Stern et al. (2012) AGN identification has been confirmed and WISE colours (especially W1 − W2 vs. W2 − W3) have been shown to be sufficient to separate star-forming galaxies from AGNs and stars from some galaxies, although this was not the case for early-type, low-redshift galaxies, which occupy practically the same region in the W1–W2–W3 colour-colour space as stars. More recently, Ferraro et al. (2015) have defined their own colour cuts to identify galaxies and quasars from the WISE database; however, this left position-dependent contamination visible in all-sky maps. Nikutta et al. (2014) have explored WISE colours of Galactic and other nearby sources, while Mateos et al. (2012) have used the XMM-Newton survey to define a WISE colour-based selection of luminous AGN. The latter criterion has recently been applied to the all-sky WISE data by Secrest et al. (2015) to select a sample of 1.4 million AGN candidates. We note, however, that their criterion of w1,2,3snr ≥ 5 (signal-to-noise ratios in W1, W2, and W3) eliminates about 95% of AllWISE sources from the parent catalogue, mostly owing to a very low level of WISE detection in the W3 channel. Last but not least, Jarrett et al. (2016) have identified various source types in the G12 equatorial field by calibrating WISE magnitude and colour cuts on GAMA and SDSS spectroscopic data. Some other WISE source classification studies where additional colours from external surveys were used include Edelson & Malkan (2012) and Wu et al. (2012) for QSOs/AGNs, Tu & Wang (2013) for asymptotic giant branch stars, and Kovács & Szapudi (2015) for general star-galaxy separation in a WISE – 2MASS PSC cross-match. Finally, Anderson et al. (2014) compiled a catalogue of Galactic H II regions from WISE, based on their mid-IR morphology.

In the present paper, we go beyond simple colour and magnitude cuts and explore a more sophisticated classification of WISE sources. Our approach is based on automated machine learning procedures, and we use a specific algorithm, support vector machines (SVM), which has proven its aptitude for similar tasks within the AKARI (Solarz et al. 2012) and VIPERS (Małek et al. 2013) surveys. A similar idea was explored in the independent study by Kovács & Szapudi (2015), where multiband photometry of those WISE sources that had cross-matches with the 2MASS Point Source Catalogue (PSC) was used for SVM-based source classification. Our analysis is more general, as we do not limit the final source selection to a cross-match with an external catalogue. In addition, for the classification we use only the two shortest WISE bands in order to retain the highest all-sky completeness possible. By training the classifier on WISE data cross-matched with the tenth spectroscopic release of the Sloan Digital Sky Survey (SDSS, Ahn et al.
2014), we make the first step towards building reliable and comprehensive WISE catalogues of stars, galaxies, and AGNs/QSOs. At present, this classification is limited by the spectroscopic data that we used for training, but the methodology can be extended with forthcoming data from different star, galaxy, and quasar catalogues. Solar system bodies should probably be included, as we find hints of them contaminating especially our final quasar candidate dataset.

The paper is organised as follows. Section 2 describes the data used in our analysis: the photometric sample extracted from WISE (Sect. 2.1) and the spectroscopic sample from SDSS (Sect. 2.2) used by the classification algorithm. In Sect. 3, we present the support vector machines classifier and how to apply it to imbalanced datasets (Sect. 3.1) such as ours. Various tests of the SVM method on WISE data are shown in Sect. 4. Section 5 presents the application of the SVM classifier to a full-sky sample drawn from WISE data. We summarise our work in Sect. 6.

All the WISE magnitudes in this paper will be given in the Vega system. For transformation to AB see Jarrett et al. (2011).

2. Data selection

2.1. WISE

WISE is a Medium Class Explorer mission funded by NASA and launched in December 2009. With the use of a 40 cm space-based telescope, WISE has mapped the whole sky (with a total 47 × 47 arcmin field of view) in four infrared bands W1−W4, centred respectively at 3.4, 4.6, 12, and 23 µm², with an angular resolution of 6.1″, 6.4″, 6.5″, and 12.0″, respectively. Its 5σ point source sensitivities exceeded 0.054, 0.071, 0.73, and 5 mJy in the four respective bands; in the W3 channel, for instance, this is more than a hundred times better than that of IRAS at similar wavelengths. The publicly available AllWISE catalogue contains positional, photometric, quality, and reliability information and motion fit parameters for over 747 million sources (Cutri et al. 2013).
The goal of the present study is to obtain a comprehensive source classification for as many WISE objects as possible, and so for object selection we decided to rely uniquely on the parameters provided in the WISE database as there is no other all-sky survey of comparable depth currently available. The basic dataset we employ is the "AllWISE" data release³ (Cutri et al. 2013), which was made publicly available in 2013 and combines data from the WISE cryogenic and NEOWISE (Mainzer et al. 2014) post-cryogenic survey stages. This dataset offers enhanced photometry and astrometry in comparison to the earlier WISE "All-Sky" release and includes estimates of source apparent motions.

We wanted to have the greatest sky coverage and depth possible, and we thus chose to be flexible in the preliminary source selection for our catalogue. Regarding the photometry, we use only the two shortest WISE bands, W1 and W2, and we employ additional quality parameters to ensure reliability of the sources. Our catalogue includes the sources that match the following criteria in the WISE database: w1snr ≥ 5; w2snr ≥ 2; w?sat ≤ 0.1 (no more than 10% of saturated pixels in the respective bands, where ? stands for 1 or 2); cc_flags[?] ≠ 'DPHO' (no severe artefacts). These criteria ensure that the sources are detected in the two bands with reliable photometry (most of the objects have S/N much higher than the limits used for the preselection; see below). There are over 606 million such sources in WISE; however, a large number of these are concentrated in the Galactic Plane, where the WISE data suffer from severe blending and saturation due to the enhanced source density. Our ability to classify data at low Galactic latitudes is thus very much compromised, and practically impossible within the Galactic Plane and Bulge.
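The preselection criteria listed above can be illustrated with a short sketch. It assumes that each catalogue row is available as a Python mapping keyed by the AllWISE column names quoted in the text; the function name `passes_preselection` is hypothetical, and missing (null) values are not handled.

```python
from typing import Mapping

# Upper-case characters in cc_flags mark severe artefacts
# (diffraction spike, persistence, halo, optical ghost).
SEVERE_FLAGS = set("DPHO")


def passes_preselection(src: Mapping) -> bool:
    """Return True if a row satisfies the W1/W2 preselection quoted in the text.

    Assumption: `src` holds non-null w1snr, w2snr, w1sat, w2sat and cc_flags.
    """
    if src["w1snr"] < 5 or src["w2snr"] < 2:
        return False
    if src["w1sat"] > 0.1 or src["w2sat"] > 0.1:
        return False
    # cc_flags has one character per band (W1..W4); only W1 and W2 are checked.
    return not any(flag in SEVERE_FLAGS for flag in src["cc_flags"][:2])
```

Only the first two characters of `cc_flags` are inspected, so artefacts flagged in W3 or W4 alone do not remove a source, in line with the use of the two shortest bands.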
An important caveat here is that the AllWISE catalogue is not complete at the very bright end (W1 < 8 mag and W2 < 7 mag) owing to the saturation of such sources and also in two strips at ecliptic longitudes of 45° < λ < 55° and 231° < λ < 239°. Both these issues are related to instrumental limitations⁴, while the survey strategy causes additional patterns related to Moon avoidance manoeuvres⁵. This will be reflected in the all-sky maps that we are producing.

² Recalibration of the W4 effective wavelength from 22 µm was carried out by Brown et al. (2014a).
³ Available from the NASA/IPAC Infrared Science Archive at http://irsa.ipac.caltech.edu/.
⁴ For details see http://wise2.ipac.caltech.edu/docs/release/allwise/expsup/sec2_2.html#cat_phot and http://wise2.ipac.caltech.edu/docs/release/allwise/expsup/sec2_2.html#w1sat.
⁵ http://wise2.ipac.caltech.edu/docs/release/allwise/expsup/sec1_2.html#survey

Fig. 1. Aitoff projection in Galactic coordinates of 314 million sources in the AllWISE catalogue, flux-limited to W1 < 16 mag.

Here we are not able to comprehensively classify all the WISE sources preselected as discussed above owing to the limitations brought about by the training set from the spectroscopic SDSS DR10 that we use. Namely, that dataset cross-matched with WISE provides practically no galaxies fainter than W1 = 16 mag (Vega)⁶. This is one magnitude brighter than the average all-sky photometric completeness of WISE, so the present analysis will need to be extended once deeper training data become available. This should be possible in the coming years thanks to the plethora of spectroscopic surveys currently underway.

The all-sky sample of preselected W1 < 16 mag AllWISE sources includes 314 million sources and is illustrated in the Aitoff projection in Fig. 1. We note the logarithmic scaling of the counts and the one order of magnitude larger source density in the Galactic Plane than at high latitudes. We also note that at this flux limit most of our sources have very reliable photometry, especially in the W1 channel. The median signal-to-noise ratios in W1 and W2 are respectively 31.3 and 16, and more than 99% of the sources have magnitude errors smaller than 0.08 mag for W1 and 0.28 mag for W2.

In the machine learning procedure of source classification described later in the text we use the following parameters provided by WISE:

1. magnitude w1mpro measured with profile-fitting photometry in the W1 band (hereafter W1);
2. colour W1 − W2 defined as the difference in the w1mpro and w2mpro (hereafter W2) profile-fitting magnitudes;
3. a concentration parameter defined as the difference of two circular aperture magnitudes in the W1 channel, w1mag_1 − w1mag_3, measured respectively in radii 5.5″ and 11″ centred on the source; we note that these apertures were fixed, independent of the actual size or shape of the sources, and were not corrected for contamination or bad pixels, thus they cannot be used on their own as reliable measurements of fluxes for resolved sources.

As already mentioned, measurements in the W1 band in our flux-limited sample have typically very high signal-to-noise ratios; the other two parameters used for the classification are somewhat noisier.
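As an illustration of the three classification parameters listed above, the following sketch builds the feature vector for one source. The column names are those quoted in the text; the function name `classification_features` and the dictionary-style row are assumptions made for the example.

```python
def classification_features(row):
    """Return (W1, W1 - W2, concentration) for one AllWISE row.

    Assumption: `row` is a mapping with non-null w1mpro, w2mpro,
    w1mag_1 and w1mag_3 values, all in Vega magnitudes.
    """
    w1 = row["w1mpro"]                              # profile-fitting W1 magnitude
    colour = row["w1mpro"] - row["w2mpro"]          # W1 - W2 colour
    concentration = row["w1mag_1"] - row["w1mag_3"] # 5.5" minus 11" aperture magnitude
    return (w1, colour, concentration)


def in_flux_limited_sample(row, w1_limit=16.0):
    """Flux limit of the all-sky sample described in the text (W1 < 16 mag)."""
    return row["w1mpro"] < w1_limit
```

A point-like source has most of its flux inside the smaller aperture, so its concentration parameter is expected to be smaller than that of a resolved galaxy of the same W1 magnitude.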
The error in the W1 − W2 colour is mostly driven by the less accurate W2 channel, and respectively 90% (99%) of the sources have δ(W1 − W2) < 0.16 mag (<0.29 mag). For the concentration parameter, the same percentiles are δ(w1mag_1 − w1mag_3) < 0.17 mag (<0.41 mag).

⁶ The situation has not improved in the final SDSS-III Data Release 12 (Alam et al. 2015).

We also tested the usefulness of apparent motions for source classification. These motions, as provided in the AllWISE database (pmra and pmdec), are composed of source proper motions and those due to the parallax, and are expected to be different for various source types. We note, however, different caveats related to their measurements by WISE, discussed by Kirkpatrick et al. (2014) and in the data description⁷. The accuracy and signal-to-noise of AllWISE proper motions are strongly correlated with the source's flux, which means they cannot be reliably used for the whole catalogue (about 2% of WISE objects have no motion measurements at all). We consider using them as a proof of concept for future datasets and surveys that will bring much more precise and comprehensive motion measurements, such as the MaxWISE proposal (Faherty et al. 2015), Gaia (Perryman et al. 2001), or LSST (Ivezić et al. 2008). The fourth parameter used in such a case was

4. apparent motion defined as pm = (pmra² + pmdec²)^{1/2}, where pmra and pmdec are the apparent motion in right ascension and declination, respectively.

All the tests involving the apparent motions were carried out with the imposed condition on their signal-to-noise being larger than 1: pm > sigpm, where sigpm = (sigpmra² + sigpmdec²)^{1/2} is the motion accuracy as provided in the database. This condition introduces a selection effect as it removes mostly faint, point-like sources. This constraint is avoided in the final classification procedure when proper motions are not used.

One can further increase the number of parameters used for machine learning classification; however, it should first be noted that every new parameter considerably extends computation time. In addition, many of the WISE database parameters are available or are sufficiently reliable only for a subset of all the sources. For instance, the two longer WISE bands, W3 and W4, which are often also used for source classification (Ferraro et al. 2015; Kovács & Szapudi 2015), have much worse sensitivity than W1 and W2. Most of the WISE sources are not detected at the longer wavelengths or have only upper limits of S/N < 2. In view of classifying a considerable number of WISE objects (selected as discussed above), we have thus decided to limit ourselves to the basic information from the W1 and W2 bands.
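The apparent-motion parameter and its signal-to-noise condition, as defined earlier in this section, can be written out as a minimal sketch. The function names are hypothetical; the arguments correspond to the AllWISE columns pmra, pmdec, sigpmra, and sigpmdec, which are assumed to be non-null.

```python
import math


def apparent_motion(pmra, pmdec):
    """Total apparent motion, pm = (pmra^2 + pmdec^2)^(1/2)."""
    return math.hypot(pmra, pmdec)


def motion_is_significant(pmra, pmdec, sigpmra, sigpmdec):
    """Condition pm > sigpm, with sigpm = (sigpmra^2 + sigpmdec^2)^(1/2)."""
    return apparent_motion(pmra, pmdec) > math.hypot(sigpmra, sigpmdec)
```

For example, a source with pmra = 30 and pmdec = 40 (in the catalogue units) has pm = 50; it is kept if the combined uncertainty is below 50 and rejected otherwise.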
A possible extension employing more WISE parameters would first require determining which of them are optimal, for instance through a principal component analysis (Soumagnac et al. 2015). In addition, similarly to other applications of SVM in astronomy, our study does not take the observational errors explicitly into account. For the present sample this should be a good approximation, as the noise level of the parameters we use for classification is relatively low (or even very low for the W1 magnitude). In Sect. 6 we discuss how this issue can be dealt with in future work by using what is known as fuzzy logic.

⁷ http://wise2.ipac.caltech.edu/docs/release/allwise/expsup/sec2_6.html

2.2. Training sample: WISE × SDSS DR10

Machine learning methods for source classification, like the one we employ here, rely on the availability of a training sample that has relevant classes already identified. Ideally, this dataset should be as typical of the whole sample as possible. At present, however, such samples drawn from WISE are not available, and the only solution is to cross-match the WISE data with an external dataset that has relevant source types listed. Such an auxiliary dataset is provided by the Sloan Digital Sky Survey (SDSS, York et al. 2000), which in its third phase (SDSS III, Eisenstein et al. 2011) comprises several dedicated star, galaxy, and quasar surveys; however, these three classes are available only for the spectroscopic part of the SDSS (the photometric classes are "stars", i.e. point-like, and "galaxies", i.e. resolved). For this reason we chose to use only those SDSS sources that have spectra (Bolton et al. 2012).

Here we use the spectroscopic sample in the SDSS Data Release 10 (DR10, Ahn et al. 2014), which includes almost 3.4 million sources, 26% of which are classified by the SDSS pipeline as stars, 59% are galaxies, and the remaining 15% are quasars/AGNs (class "QSO" in SDSS). We have cross-matched this sample with the WISE catalogue selected as described above, using a 1″ matching radius, which gave us 2.1 million sources (18% stars, 72% galaxies, and 10% QSOs). However, not all of them had SDSS spectra of sufficient quality, so in order to maintain the reliability of the training sample, we filtered the sources according to redshift (velocity) quality, keeping only those with zWarning = 0. Additional visual inspection of redshift and error distributions of WISE × SDSS sources led us to eliminate the following outliers: zErr > 0.001 for stars, zErr > 0.001 or zErr/z > 0.1 for galaxies, and zErr > 0.01 for QSOs. This filtering left us with about 390 000 stars, 1.5 million galaxies, and 190 000 quasars in our training sample (i.e. present both in SDSS and in WISE), which reduces further to 120 000 stars, 620 000 galaxies, and 55 000 QSOs if the condition on the apparent motions to have S/N > 1 is applied (Sect. 2.1).

Fig. 2. Distributions of the observed W1 − W2 colour for stars, galaxies, and quasars in the cross-matched WISE × SDSS DR10 sample.

Figure 2 presents the W1 − W2 colour histogram of our training sources (as observed, i.e. without extinction- or k-corrections applied). It clearly shows that while a simple selection in this colour may allow a large fraction of WISE quasars to be identified (although the W1 − W2 > 0.8 mag cut proposed by Stern et al. 2012 will miss some of them), it is not sufficient to reliably separate stars from galaxies. For instance, a constant W1 − W2 colour cut will not produce samples that are both complete and pure at the same time. Maps presented in Ferraro et al. (2015) also indicate that applying fixed colour cuts to WISE data may leave position-dependent contamination. In the WISE × SuperCOSMOS dataset (Bilicki et al. 2016) this issue was partly alleviated by varying the star-galaxy colour separation as a function of distance from the Galactic Centre. Here we move beyond this simple methodology.

As already mentioned in Sect. 2.1, using the SDSS as the training sample imposes restrictions on the depth up to which we can classify WISE sources in the present application. As shown in Fig. 3, presenting normalised W1 counts for the three source types in the WISE × SDSS cross-match, there are hardly any galaxies fainter than W1 = 16 mag, so we are not able to create reliable training samples beyond this magnitude, although both stars and quasars are present in the WISE × SDSS sample at considerably fainter fluxes. The histogram for galaxies also displays two clear peaks. This shape for the galaxy counts can be attributed to the combination of two effects: one results from the heterogeneity of the SDSS dataset itself (due to preselections of the Main, CMASS, and BOSS samples) and the other is related to the selection of the WISE × SDSS catalogue, which is also based on the detections in the W1 filter. First, as demonstrated in Figs. 4 and 5, the SDSS-based galaxy selection does not sample all the regions of the redshift-magnitude space equally. In particular, as illustrated in the left panel of Fig. 4, which shows the redshift-magnitude diagram for the z band (the longest wavelength measured by SDSS), in the pure SDSS data we observe a clear depletion of faint red galaxies at z ≤ 0.3 in comparison to higher redshifts. This effect persists after adding the W1-based selection of the WISE × SDSS sample, as shown in the right panel of Fig. 4. The enhancement of this effect can be attributed to the properties of the W1 filter, which at low redshifts probes the part of the galaxy spectrum where strong PAH features are very prominent. Second, in Fig. 6 the convolution of the W1 filter with a spectrum of a typical spiral galaxy at various redshifts (taken from the Brown et al. 2014b library) demonstrates that observing in W1 we can expect a selection function which does not change monotonically with redshift. In particular, it has a minimum at z ∼ 0.15, and then rises again until z ∼ 0.28. The combination of these effects results in a relatively complex sampling of galaxies in the redshift – W1 magnitude space in the WISE × SDSS data, as demonstrated in Fig. 5.

Fig. 3. Normalised apparent W1 magnitude counts for galaxies, quasars, and stars in the WISE × SDSS DR10 sample.

Finally, we cannot hope for reliable classification at the very bright end either. The WISE W1 < 8 mag or W2 < 7 mag sources are saturated; in addition, the WISE × SDSS sample does not include galaxies or quasars brighter than W1 ∼ 9.5 mag. These brightest objects are thus removed from our final samples, which has a minor influence on the results because the W1 < 9.5 mag WISE sources are concentrated mostly in the Galactic Bulge (i.e. they are stars and blends thereof).

We are aware of all these biases, and we would like to note that accepting them is a trade-off for using a training sample that provides star, galaxy, and quasar spectral classifications. At the depths we are interested in, these spectral classifications are currently available only from the SDSS.
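The spectroscopic quality cuts applied to the training sample earlier in this section (zWarning = 0 and the class-dependent zErr limits) can be sketched as follows. The function name `keep_for_training` is hypothetical, the class labels are assumed to follow the SDSS strings "STAR", "GALAXY", and "QSO", and the treatment of galaxies with z ≤ 0 is an assumption of this example, since the text does not specify it.

```python
def keep_for_training(sdss_class, z, z_err, z_warning):
    """Return True if an SDSS spectrum passes the quality cuts quoted in the text."""
    if z_warning != 0:
        return False
    if sdss_class == "STAR":
        return z_err <= 0.001
    if sdss_class == "GALAXY":
        # Text: eliminate zErr > 0.001 or zErr/z > 0.1.
        # Assumption: the ratio cut is written as z_err > 0.1 * |z| to avoid
        # dividing by zero; this rejects galaxies with z = 0 and z_err > 0.
        return z_err <= 0.001 and z_err <= 0.1 * abs(z)
    if sdss_class == "QSO":
        return z_err <= 0.01
    raise ValueError(f"unknown SDSS class: {sdss_class!r}")
```

The cuts are looser for quasars than for stars and galaxies, reflecting the larger redshift uncertainties tolerated for that class in the text.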
As a final caveat, the training sample applied in this study does not include solar system bodies; however, they are present in the WISE database. Our final classification thus ignores this contamination, which affects mostly the sources flagged as quasars by our classifier.

3. Classification method: support vector machines

In general, classification is a process that uses pattern recognition. A classifier is a function that maps a feature vector of a given object's characteristics into a discriminant vector containing likelihoods that the object belongs to the different considered classes. Classification schemes rely on choosing a feature space where different classes occupy different volumes with minimal overlapping. This approach has been used to develop machine learning algorithms: statistical methods which constitute a branch of artificial intelligence and are based on creating and exploiting systems which learn from data.

In this work for the task of identifying object types we adopt the support vector machines (SVM) algorithm. This supervised method based on kernel algorithms (Shawe-Taylor & Cristianini 2004) was designed to extract structures from data, and thanks to its excellent ability to deal with multidimensional samples combined with its high accuracy, it has been extensively applied to many diverse astronomical problems. To name a few, SVMs have been used to solve problems like classifying different structures in the interstellar medium (Beaumont et al. 2011), pinpointing active galactic nucleus (AGN) candidates (Cavuoti et al. 2014), or distinguishing different subclasses of specific spectral type stars (Bu et al. 2014). Last but not least, and of particular relevance here, SVM has been proven efficient in classifying different objects, such as stars, quasars, and galaxies (e.g. Saglia et al. 2012; Solarz et al. 2012; Małek et al. 2013; Kovács & Szapudi 2015).
In what follows, we draw the general outline of the algorithm; for an in-depth discussion we refer the reader to Vapnik (1999), Cristianini & Shawe-Taylor (2000), and Hsu et al. (2003).

Each training object can be described by a number of quantities, N, which determine its discriminating properties. The SVM regards the values of the quantities as a position of a given object in an N-dimensional parameter space; in other words, the algorithm maps the feature vector from the input space X to a feature space H using a non-linear function φ : X → H. In the feature space H, the discriminant function, which will determine the boundary, takes the form of

$$ f(x) = \sum_{i=1}^{n} \alpha_i \, k(x, x') + b. \tag{1} $$

Here k(x, x′) represents the kernel function, which returns the inner product of the mapped vectors; α_i is a linear coefficient; and b is a perpendicular distance called bias, which translates the discriminant function into a given direction. With a substantial number of feature vectors representing different classes of objects, the algorithm searches for boundaries segregating those classes with the biggest possible distance from each data point (a margin). The objects lying closest to the boundary are called support vectors. In other words, the SVM algorithm searches for a decision boundary B that will maximise a fitness function F

$$ F = M - C \sum_{i} \xi_i(B, M), \tag{2} $$

where M denotes the margin of the boundary. How strongly training example i violates this criterion is quantified by ξ_i(B, M). If point i lies at a distance larger than M from B, then ξ_i = 0. In the opposite case, ξ_i is equal to the distance by which point i should be shifted so that the condition is satisfied. To set a trade-off between large margins M and misclassifications ξ_i, an adjustable cost parameter C is used (more details in Beaumont et al. 2011).
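The fitness function of Eq. (2) can be illustrated for the simplest case of a linear boundary. The sketch below is not the libsvm optimisation; it only evaluates F for a given boundary, under the assumptions that B is the hyperplane w · x + b = 0, that class labels are +1 and −1, and that the distance of each point is measured on the side of its own class. The names `slack` and `fitness` are hypothetical.

```python
import math


def slack(signed_distance, margin):
    """xi_i: zero beyond the margin, otherwise the shift needed to reach it."""
    return max(0.0, margin - signed_distance)


def fitness(points, labels, normal, offset, margin, cost):
    """Evaluate F = M - C * sum_i xi_i(B, M) for a linear boundary B.

    Assumptions: B is the hyperplane normal . x + offset = 0 and labels are
    +1 or -1, so label * (normal . x + offset) / |normal| is the distance of
    a point from B, counted as positive on the correct side.
    """
    norm = math.sqrt(sum(w * w for w in normal))
    total_slack = 0.0
    for x, y in zip(points, labels):
        distance = y * (sum(w * xi for w, xi in zip(normal, x)) + offset) / norm
        total_slack += slack(distance, margin)
    return margin - cost * total_slack
```

With three points at distances 2, 2, and 0.5 from the boundary, a margin M = 1, and C = 10, only the third point contributes (ξ = 0.5), giving F = 1 − 10 × 0.5 = −4; shrinking the margin to 0.5 removes the penalty and gives F = 0.5. A larger C therefore favours narrower margins with fewer violations.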
Using kernel functions allows a shift into a higher dimensional parameter space, where the data are usually much more simply separated than in lower dimensional space. There are many possible kernel functions that can be used (such as polynomials, exponential radial basis functions, or multilayer perceptrons; Cristianini & Shawe-Taylor 2000). The choice of the proper kernel suitable for a given problem is crucial; the usual procedure is to try several, beginning from the simplest cases (to avoid overfitting and to save on parameter tuning time) and then to move towards more complex ones in order to gain accuracy. For this particular dataset we tested two kernel functions to obtain the most reliable classification outcome: the Gaussian radial basis function (GRB) and the polynomial function. The GRB is given by

$$ k(x, x') = \exp\left(-\gamma \, \|x - x'\|^2\right), \tag{3} $$

where x and x′ represent feature vectors in the input space, ||·|| denotes the Euclidean distance, and γ is the adjustable kernel width parameter, which is responsible for the curvature of the decision surface. The polynomial kernel is defined as

$$ k(x, x') = \left(\gamma \, (x \cdot x') + c_0\right)^{d}, \tag{4} $$

where x and x′ represent feature vectors in the input space, x · x′ is their inner product, d stands for the degree of the polynomial function, and c_0 is a constant coefficient.

Fig. 4. Redshift – apparent magnitude diagrams for the SDSS z band: SDSS-only galaxies (left panel) and galaxies from the WISE × SDSS cross-match (right panel).

Fig. 6. Convolution of the W1 filter with a template spectrum of a typical spiral galaxy as a function of redshift.

Therefore, for the GRB kernel there are two adjustable parameters, γ and C, which will determine the separation boundary and complete the training of the SVM classifier. In the case of the polynomial kernel, the number of adjustable parameters increases to four: in addition to γ and C, the degree d and the coefficient c_0 have to be known.

Then, after the most efficient kernel function is chosen, a classification of the new data points depends on their position relative to the boundary: SVM will assign a type to unknown objects based on which side of the separation hyperplane they fall. Furthermore, instead of assigning discrete class labels, it is possible to determine the class probability for a given object.
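The two kernels of Eqs. (3) and (4) can be written out directly. The following sketch uses plain Python sequences for the feature vectors; the function names and the argument name `coef0` (standing for c_0) are choices made for this example and are not taken from the software used in the paper.

```python
import math


def grb_kernel(x, x_prime, gamma):
    """Gaussian radial basis kernel, Eq. (3): exp(-gamma * ||x - x'||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, x_prime))
    return math.exp(-gamma * squared_distance)


def polynomial_kernel(x, x_prime, gamma, coef0, degree):
    """Polynomial kernel, Eq. (4): (gamma * (x . x') + c_0)^d."""
    inner_product = sum(a * b for a, b in zip(x, x_prime))
    return (gamma * inner_product + coef0) ** degree
```

For instance, with x = (1, 2), x′ = (3, 4), γ = 0.5, c_0 = 1, and d = 2, the inner product is 11 and the polynomial kernel equals (0.5 × 11 + 1)² = 42.25, while the GRB kernel of any vector with itself is 1 for every γ.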
In the case of a binary SVM this can be done by implementing Platt's a posteriori probabilities (Platt 1999; Lin et al. 2007): once the decision values f of the SVM classifiers are computed, a sigmoid function

$$P(i \mid i \text{ or } j, x) = \left(1 + e^{A f + B}\right)^{-1} \qquad (5)$$

is fitted (where i and j represent two classes). The parameters A and B are then estimated by minimising the negative log-likelihood function. In order to extend the class probabilities to a three-class problem, all class probabilities from the output of the binary classifiers are combined (FanWu et al. 2003). The probabilities calculated this way are used in the final classification to eliminate sources that have low probabilities of belonging to any class (meaning that each class has p < 0.5, and in some cases the three probabilities are p ∼ 0.33).

Support vector machines are currently available in a variety of software packages; the most widely used is libsvm (Chang & Lin 2011)[8], which provides a robust implementation of SVM for both classification and regression. In this work we use the R (R Development Core Team 2005)[9] implementation of SVM included in the e1071 package (Dimitriadou et al. 2005), which provides an interface to libsvm.

[8] http://www.csie.ntu.edu.tw//~cjlin/libsvm/

Table 1. Numbers of objects before and after oversampling in the bins for which oversampling was applied.

Magnitude limit          W1 < 14   W1 < 14   W1 < 14   W1 < 14   W1 < 15
Extinction I100          ⟨0; 1)    ⟨1; 2)    ⟨2; 3)    ⟨3; 10)   ⟨3; 10)
Galaxies, before         36801     54417     22916     10732     33720
Galaxies, after          36801     54417     22916     10732     33720
Stars, before             6113     10612      4757      6409     13822
Stars, after              6113     10612      4757      6409     13822
QSOs, before              2161      2556      1141       525      1498
QSOs, after              29598     43998     18398      8798     27198
σW [mag]                 0.025     0.025     0.026     0.026     0.028
σpm [mas/yr]                84        98        99       113       165

Notes. For a sample with W1 < 14 we applied oversampling in all the extinction bins, while for W1 < 15 only for extinction in the range I100 ∈ ⟨3; 10). Oversampling was implemented for quasars only. Parameters σW and σpm denote the specific σ values of the Gaussian distributions for magnitudes and proper motions, respectively.

3.1. SVM on imbalanced datasets: oversampling

Our training dataset is characterised by low numbers of bright objects in the QSO sample. It was therefore necessary to address the loss of accuracy that results from this imbalance. The problem is common in many classification schemes, especially those which aim to maximise accuracy, like SVMs (Akbani et al. 2004). If not accounted for properly, it can lead the classifier to the trivial decision that maintains the highest success rate: assigning the most common class to the test objects. There are two ways of addressing this problem: it can be solved either through rebalancing the dataset or by altering the algorithm itself.
The first solution works at the level of manipulating the data, where the underrepresented population(s) can be oversampled, or the dominant class can be undersampled. In the latter case, when reducing the number of objects contained within the majority class, distributional assumptions on the data must be made: some crucial information may be lost or additional noise may be introduced. On the other hand, changing the algorithm mainly relies on cost-sensitive learning, where a higher penalty is assigned to the misclassifications, resulting in a shift of the classifier towards the minority class, which improves the detection accuracy.

Since SVM decision making relies solely on the support vectors, it is robust against noise in the data and against slight imbalance. However, if the distribution of the training sets is very skewed, the support vectors of the majority class outweigh those of the minority class. In the case of the WISE data we decided to oversample the underrepresented QSO class, i.e. additional artificial objects were created. The number of missing objects (X_missing) that needed to be added to the QSO training samples was calculated using the equation (Małek et al. 2013)

$$\lceil X_{\rm missing} \rceil_{10} = N_G \times 0.8 - X, \qquad (6)$$

where ⌈ ⌉10 stands for rounding the value up to the nearest ten, X corresponds to the number of original QSOs in the sample, and N_G is the number of galaxies. This strategy provides fully balanced training samples, which are essential for building an effective classifier.

[9] http://www.R-project.org
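As an illustration of Eq. (6), the following minimal sketch computes the number of artificial quasars to add. The function name and the example counts are hypothetical, and returning zero when the sample already exceeds the target is an assumption not stated in the text. Integer arithmetic is used so that the rounding up to the nearest ten is not affected by floating-point error.

```python
def missing_quasars(n_galaxies: int, n_quasars: int) -> int:
    """Number of artificial QSOs to add, following Eq. (6) as written.

    Eq. (6): ceil_10(X_missing) = 0.8 * N_G - X, where ceil_10 rounds up
    to the nearest ten. The function name is illustrative only.
    """
    # 0.8 * N_G - X equals (4 * N_G - 5 * X) / 5, so rounding up to the
    # nearest ten is 10 * ceil((4 * N_G - 5 * X) / 50) in exact integers.
    numerator = 4 * n_galaxies - 5 * n_quasars
    if numerator <= 0:
        # Assumption: nothing is added if QSOs already reach 0.8 * N_G.
        return 0
    return 10 * (-(-numerator // 50))  # integer ceiling division


# Illustrative counts (not taken from Table 1).
n_to_add = missing_quasars(n_galaxies=1000, n_quasars=95)
```

With these illustrative counts, 0.8 × 1000 − 95 = 705, which is rounded up to 710 artificial objects.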
Fig. 7. Representative colour-magnitude diagram before and after oversampling for quasars.

We created mock samples of missing objects by slight changes in the real parameters. In the first step a real QSO was randomly chosen, and then its parameters were reassigned by shifting the real ones by an amount drawn from a Gaussian distribution with a specific standard deviation σ. Different values of σ were used, depending on the type of parameter: σpm for proper motions and σW for magnitudes. They were calculated as the median values of the proper motion and magnitude uncertainties, respectively. These values are given in Table 1, which lists the cases in which the oversampling was applied, providing the numbers of objects before and after the oversampling (see Sect. 4 for details on the magnitude and extinction divisions). Figure 7 presents an example of a colour-magnitude diagram for quasars before and after oversampling. The distribution of objects after the oversampling closely mimics the real one, as designed.

4. Calibrating the SVM classifier for the WISE data

In this section we present various tests of the SVM algorithm performed on the WISE × SDSS training data to verify and optimise the performance of the classifier. In particular, we tested the algorithm's efficiency as a function of the following: i) choice of the kernel; ii) number of sources in the training samples; iii) number of parameters used for the classification; iv) Galactic extinction; v) limiting magnitude of the sample; and vi) use of source apparent motions. This information allowed us to prepare the SVM for the application to the all-sky WISE dataset (Sect. 5).

Table 2. Comparison of SVM performances for two kernels, polynomial and radial, for the self-check and cross-test.

SELF-CHECK
             Polynomial kernel                       Radial kernel
             Completeness  Purity  Contamination     Completeness  Purity  Contamination
Galaxy       82.9          79.4    20.6              86.3          79.9    20.1
Stars        81.0          84.1    15.9              81.0          87.1    12.9
QSO          96.9          97.8     2.2              96.9          97.9     2.1

CROSS-TEST
             Polynomial kernel                       Radial kernel
             Completeness  Purity  Contamination     Completeness  Purity  Contamination
Galaxy       81.0          77.9    22.1              84.0          75.7    24.3
Stars        78.0          83.9    16.1              77.0          86.5    13.5
QSO          97.0          94.2     5.8              96.0          96.0     4.0

In each of the tests, we used a ten-fold cross-validation technique: we divided the training set into ten subsets of equal size, selected nine of the subsets to train the classification model, and then tested it on the remaining subset; this was repeated ten times, leaving out a different subset each time. We then counted the training objects whose nature was correctly identified by SVM: TS (true star), TG (true galaxy), TQ (true QSO), and those misclassified by the algorithm: FG (false galaxy), FS (false star), and FQ (false QSO). Then we define completeness c, contamination f, and purity p for the three classes of objects (e.g. Soumagnac et al. 2015). For galaxies we have

$$c_G = \frac{TG}{TG + FGS + FGQ}, \qquad (7)$$

$$f_G = \frac{FSG + FQG}{TG + FSG + FQG}, \qquad (8)$$

$$p_G = 1 - f_G = \frac{TG}{TG + FSG + FQG}, \qquad (9)$$

where TG, FGS, and FGQ stand for galaxies classified respectively as galaxies, stars, and quasars, and FSG (FQG) denote stars (quasars) misclassified as galaxies. Analogous definitions are used for completeness, contamination, and purity of the stellar (cS, fS, pS) and quasar (cQ, fQ, pQ) samples.

In each case we measure completeness, purity, and contamination for two variants: a self-check and a cross-test. For the self-check we classified the same objects used in the given training sample. In the cross-test we classified objects from the subsamples that were not in the current training set.

The tests described below were performed for different combinations of magnitude limits and Galactic extinction levels. Three flux limits were adopted: W1 < 14 mag, W1 < 15 mag, and W1 < 16 mag.
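The completeness, contamination, and purity of Eqs. (7)-(9) can be sketched as follows for any of the three classes. This is only an illustration: the function name, the nested-dictionary layout of the confusion counts, and the counts themselves are assumptions made here and are not taken from the paper's tables.

```python
def class_metrics(confusion, target):
    """Completeness, purity and contamination of one class (Eqs. 7-9).

    `confusion[true_class][assigned_class]` holds object counts; this
    layout is a hypothetical choice made for the sketch.
    """
    true_pos = confusion[target][target]                    # e.g. TG
    # All objects truly of this class: TG + FGS + FGQ.
    n_true = sum(confusion[target].values())
    # All objects assigned to this class: TG + FSG + FQG.
    n_assigned = sum(row[target] for row in confusion.values())
    completeness = true_pos / n_true                        # Eq. (7)
    contamination = (n_assigned - true_pos) / n_assigned    # Eq. (8)
    purity = true_pos / n_assigned                          # Eq. (9)
    return completeness, purity, contamination


# Illustrative counts only (rows: true class, columns: assigned class).
example = {
    "galaxy": {"galaxy": 80, "star": 15, "qso": 5},
    "star":   {"galaxy": 18, "star": 80, "qso": 2},
    "qso":    {"galaxy": 2,  "star": 1,  "qso": 97},
}
c_gal, p_gal, f_gal = class_metrics(example, "galaxy")
```

Purity and contamination share the same denominator, which is why they always sum to unity, as is also visible in each row of Table 2.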
For each flux limit, the data were also binned according to the measured Galactic dust emission. Here we chose the 100-micron intensity (I100) sky map made from a combination of COBE/DIRBE and IRAS 100 µm measurements (Schlegel et al. 1998). We preferred the I100 parameter over the commonly applied E(B − V) because the former was directly measured from data, while the latter was derived. We adopted four extinction bins: I100 < 1, 1 ≤ I100 < 2, 2 ≤ I100 < 3, and 3 ≤ I100 < 10 [MJy/sr]. Above I100 = 10 MJy/sr, which constitutes about 1% of the WISE × SDSS catalogue, there are practically no galaxies or quasars in the training set. In general, the I100 ≥ 10 MJy/sr areas cover about 17% of the full sky, practically only in the Galactic Plane and regions of high dust obscuration where our classification is not expected to be reliable. In some cases, the above splitting of the full sample left us with very small numbers of quasars in the relevant training sets, and the oversampling methodology had to be applied (see Sect. 3.1).

4.1. Kernel performance comparison

The first test served to determine the optimal kernel for our application. We compared the performance of the two kernel functions described in Sect. 3, polynomial and radial (see Table 2), and analysed the so-called univariate histograms of projections from the self-check and cross-test of the known data. A univariate histogram of projections is a graphical representation of the training data for a given binary classification (in the case of a three-class classifier we have three two-class classifiers) and of the decision boundary that SVM provides given the data. To obtain the best efficiency of the classification, it is standard practice to divide the training set into two subsets: one is used for actual training and the other is a validation subset used to verify the accuracy of the created hyperplane against other known objects that were not used for training.
In this test, the training set contains 99% of the total number of sources with known classification, while the validation subset is composed of the remaining 1% of known objects.

For non-linear SVM kernels, the projected values f(x) (Eq. (1)) are obtained through the kernel representation in a dual space. This means that there are three support vector machines (in the case of WISE × SDSS data), each of which has its own decision function. The projection of an object x_k from a training set onto the normal direction of a non-linear SVM boundary can be written as

$$f(x_k) = \sum_{i \in {\rm s.v.}} \alpha_i y_i K(x_i, x_k) + b, \qquad (10)$$

where x_i denotes a support vector, and the classification of each example is determined by the sign of this function. The soft margin of a classifier can be written as

$$y_{f(x_k)} = {\rm sign}(f(x_k)) \left(1 - \exp(-|f(x_k)|)\right), \qquad (11)$$

and the boundaries of the soft margin are then y_f(x_k) ∈ [−1; 1]. Then, y_f(x_k) describes two aspects of each example. The first comes from the sign: it encodes a “hard” decision whether the example x_k belongs to a given class or not. The second comes from its absolute value: it represents how strong the decision is. This means that the farther a given example x_k falls from the decision boundary, the more certain the decision is.

Fig. 8. Comparison of the histograms of projection values of the training data samples onto the normal direction of the SVM decision boundary for a radial (left column) and polynomial kernel (right column). The top row represents the division between galaxies and stars (red and blue, respectively), the middle row between galaxies and quasars (red and green), the bottom row between stars and quasars. Vertical lines represent the boundaries of the decision hyperplane margins for the two classes (+1 and −1), and 0 marks the position of the hyperplane itself.

As can be seen from Table 2, the differences between completeness and purity of the training samples and validation datasets are small for the two kernels considered. However, when we compare the histograms of projections of each point with respect to the boundary we see clear differences (Figs. 8 and 9). For the radial kernel we observe an effect of data piling on the margins, which is typical for high-dimensional data (Cherkassky & Mulier 2006). Separability of the set on which the classifier was trained does not imply that the validation or test sets will be equally well separated (see Figs. 8 and 9, left columns). As the SVM optimisation aims at high separability of the training data, it penalises the data that fall into the soft margin of the separation boundary. However, the aim is also to have a good separation of the validation set (which in turn should improve the separability of the test sample), which is why the preferred model should allow data points to fall into the soft margin.

Fig. 9. Same as Fig. 8, but for the validation dataset.

Moreover, it is desirable to have decision values that are as strong as possible; therefore, the data piling effect should be avoided, which is why the kernel that displays a clearer division of the validation set (the polynomial kernel in this case) is preferable, as is shown in Fig. 9.

4.2. Optimal number of objects in the training samples

As our WISE × SDSS training set is much larger than in earlier SVM classification applications (e.g. in AKARI by Solarz et al. 2012 or in VIPERS by Małek et al. 2013), in the first step, after deciding which kernel function to use, we performed a series of tests to check whether we were able to calibrate our SVM method on smaller subsamples without deteriorating the results. We conducted four tests for each of the three flux-limited samples (in this case there was no extra division according to the extinction), where we randomly chose 100, 1000, 3000, and 5000 objects for each class (i.e. 100 galaxies, 100 stars, 100 quasars, etc.), and we used these training sets to compute relevant statistics as defined above. Each test was repeated ten times, and in all the cases the error bars provided represent the standard deviation from the mean of the ten tests.

Figure 10 shows, as an example, the dependence of the completeness and purity on the number of training objects for a bin W1 < 14 mag for the self-check and cross-test. The results for other flux limits are similar. As seen in this figure, our results stabilise for subsamples with 3000 randomly chosen objects from each class. Based on these results, the following tests were applied for these numbers of objects. This allowed us to significantly save on computation time in the tests, as it scales highly non-linearly with the size of the training set.
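The repeated subsampling test of Sect. 4.2 can be sketched as below, under stated assumptions. The two function names are hypothetical, and `evaluate` is a placeholder callback standing in for training the SVM on the selected objects and returning one statistic (e.g. completeness); no SVM library call is shown. The sample standard deviation is used for the error bar, which is also an assumption, since the text does not specify the estimator.

```python
import random
import statistics


def draw_balanced_subsample(labels, n_per_class, rng):
    """Indices of n_per_class randomly chosen objects from each class."""
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    chosen = []
    for label in sorted(by_class):
        # rng.sample draws without replacement; it raises ValueError if a
        # class holds fewer than n_per_class objects.
        chosen.extend(rng.sample(by_class[label], n_per_class))
    return chosen


def repeat_test(labels, n_per_class, evaluate, n_repeats=10, seed=0):
    """Mean and standard deviation of a statistic over repeated draws.

    `evaluate(indices)` is a hypothetical callback that would train the
    classifier on the chosen objects and return a single number.
    """
    rng = random.Random(seed)
    scores = [
        evaluate(draw_balanced_subsample(labels, n_per_class, rng))
        for _ in range(n_repeats)
    ]
    return statistics.mean(scores), statistics.stdev(scores)


# Toy usage: the "statistic" here is simply the training-set size.
toy_labels = ["galaxy"] * 50 + ["star"] * 50 + ["qso"] * 50
toy_mean, toy_std = repeat_test(toy_labels, 10, evaluate=len)
```

Running `repeat_test` for 100, 1000, 3000, and 5000 objects per class and comparing the resulting means and scatters would correspond to the stability check described above.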
4.3. Parameter space for classification

After establishing the optimal size of the training samples, we performed a series of tests to determine the minimum number of parameters sufficient for optimal classification. Each additional parameter significantly extends computation time, while it does not necessarily improve overall accuracy, for instance in cases when it is noise-dominated and/or not related to the source type. We thus started by using just two, the W1 magnitude and the W1 − W2 colour, and then extended the parameter space by first adding the differential aperture magnitude w1mag_1 − w1mag_3, and then the apparent motions pm (cf. Sect. 2.1 for parameter descriptions). For a proper comparison, these tests were applied on sources with “detected” proper motions, i.e. pm > sigpm. In addition, they were employed in various combinations of magnitude cuts and extinction bins, as described earlier.

Figure 11 shows how completeness and purity change when the number of implemented parameters increases. This particular example is for W1 < 14 mag and I100 < 1 [MJy/sr], but the results were qualitatively the same for each of the magnitude cut – extinction combinations. While quasars were already very accurately classified for the two parameters used (W1 and W1 − W2), for stars and galaxies both completeness and purity significantly increased after the differential aperture magnitude was used as the third parameter. On the other hand, the proper motions did not bring any improvement, and sometimes even a slight deterioration in accuracy was observed once they were applied. This is hardly surprising given all the caveats associated with WISE apparent motion measurements (Kirkpatrick et al. 2014): they are not accurate enough for our type of analysis, and from now on we will thus focus on tests not using proper motions. We can expect, however, that longer time baselines and/or better photometric accuracy possible with the NEOWISE data (Mainzer et al.
2014), once combined into the MaxWISE data product (Faherty et al. 2015)[10], and with future surveys (such as the LSST) should allow proper motions to become a useful parameter for the classification of sources, including extragalactic ones. This would also identify a fourth type of source, one that we have not considered here as we do not have them in the training samples, although they are certainly present in the WISE database, namely minor bodies of the solar system. As is shown in Sect. 5, they most likely contaminate the quasar candidate sample in the final all-sky classification.

[10] See also http://wise5.ipac.caltech.edu/posters/Eisenhardt.pdf

4.4. Dependence of classification accuracy on extinction and limiting magnitude

Two final tests of the SVM algorithm applied to WISE × SDSS data were to check its performance against varying extinction levels and increased magnitude cut. Figures 12 and 13 summarise the results, and show how completeness and purity change with varying magnitude for four I100 bins, for the three classes of sources. Here we used smaller increments (0.5 mag) than in the other tests, where the increment was 1 mag.

At the bright end, both completeness and purity retain very high levels of greater than 90% irrespective of extinction. These numbers for stars and galaxies gradually deteriorate for fainter sources, and some dependence on the extinction starts to appear as larger magnitudes are reached. The statistics are relatively stable and are at very good levels for quasars (where a slight increase in purity is actually observed at the faint end). We note, however, that even for fainter sources, star and galaxy samples exhibit completeness of over ∼80% and purity of over 77%. Detailed results regarding completeness, purity, and contamination for all magnitude-extinction bins for the self-check and cross-test in the three-dimensional parameter space are presented in Tables A.1 and A.2 in the Appendix.

We note, however, that such statistics may be partly misleading as they refer to the test sample, which is statistically consistent with the training set. The full-sky catalogue will differ, and in particular may (and does) contain sources not represented in the training, such as asteroids. The classifier tuned to the training set will underperform in such cases, which in particular will be reflected in low probabilities of such objects belonging to any of the three classes used in the analysis. We discuss this further in the following section.

5. Application of the SVM classifier to all-sky WISE

Having verified in various ways the performance of our classifier, we finally applied it to the all-sky WISE data limited to W1 < 16 mag. In this case, to tune the classifier we used the most comprehensive and general training data; we randomly selected 10^4 galaxies, 10^4 stars, and 10^4 quasars from the cross-matched WISE × SDSS dataset with W1 < 16 mag. The classifier trained in this way flagged 70%/27%/3% of our WISE sources respectively as stars/galaxies/QSOs on the full sky. These numbers are consistent with the fact that stars dominate the source counts at the bright end of WISE (Jarrett et al. 2011, 2016); however, they should not be taken at face value. At low Galactic latitudes and in other highly crowded areas (Magellanic Clouds, Galactic extended sources such as dust clouds) the classification is highly unreliable. In addition, these numbers refer to the sources for which the probability of being of a given type was higher than that of the other two (e.g. p(star) > p(galaxy) and p(star) > p(QSO) for stars). However, for a considerable number of objects, especially in the galaxy and QSO classes, the three probabilities were comparable, which means a very low level of confidence for a class assignment. Thus, to obtain galaxy and quasar candidate catalogues based on our data, the masking of problematic areas was necessary, as was additional cleanup of low-probability sources. In the case of stars this was not needed: first, because we do expect them to be present in the highly crowded areas, and second, because their class assignment was the most robust: 99.4% of the sources classified as stars had p(star) > 0.5. A map of the 220 million star candidates is shown in Fig. 14. A decrease in counts at b ∼ 0° is caused by saturation and blending.

Fig. 13. Dependence of the purity on the magnitude for four I100 bins when using three classification parameters (W1, W1 − W2, and w1mag_1 − w1mag_3) for galaxies (upper panel), stars (middle panel), and quasars (lower panel) from the WISE × SDSS sample. These results are for the cross-test.

The catalogue of candidate galaxies, unlike stars, needed considerable purification. First of all, we had to cut out the most confused areas of the Galactic Plane and Bulge, using a longitude-dependent mask ranging from |b| < 6.5° at ℓ = 180° up to |b| < 20° near the Galactic Centre. This removed almost 30 million objects out of the 84.5 × 10^6 preassigned to the galaxy class all-sky. As seen in Fig. 15, this mask could have been wider, but we leave it in this form to emphasise classification issues at low Galactic latitudes where blends become a significant problem for star/galaxy separation in WISE. In addition to the bright-end cut already mentioned in Sect. 2.2 (due to saturation and lack of bright sources in the training set), which affected a very small number of the objects, we also eliminated over 400 000 outliers at the faint end in the W2 band, W2 > 16.1 mag. These were located mostly near the Ecliptic Poles where WISE coverage was the highest owing to the scanning strategy. This cutout also automatically removed the sources with W1 − W2 < −0.1 mag, which are most certainly stellar (Wright et al. 2010). In the final stage of the galaxy catalogue cleanup, we used the probabilities assigned by SVM as described in Sect. 3, and examined the sky and colour distribution of the galaxy candidates after applying different thresholds in p(gal). More aggressive cuts in this probability lead to a more uniform distribution of the sources as a function of latitude. However, even for p(gal) > 0.7 or more, differences of over 50% in the source density remain between the Galactic Caps and |b| ∼ 20°. As the W1 − W2 colours of the objects with high galaxy probability were consistent with those of genuine galaxies, the effect of gradually increasing number counts from high to low Galactic latitudes must be related to both the stellar contamination (blending) and galaxy incompleteness going up with decreasing |b|.

Figure 15 shows an example of the all-sky source distribution of galaxy candidates. Included are 45 million sources obtained after the masking, bright- and faint-magnitude cutouts, and placement of a threshold of p(gal) > 0.6. There is clear contamination at low Galactic latitudes, but the classifier seems to be working well at least for half of the sky down to |b| = 30°. The missing data in stripes on the left above the Plane and just below it on the right are due to AllWISE instrumental artefacts (saturation at the beginning of the post-cryogenic phase) as already discussed in Sect. 2.1.

Fig. 16. Sources flagged by our classifier as quasar candidates in the WISE W1 < 16 mag catalogue. This sample shows 6 million objects with an SVM probability p(QSO) > 0.5 after appropriate cleanup and masking (see text for details).

The quasar candidates underwent similar purification to the galaxy candidates. The same cutout of the Galactic Plane was first applied, which removed almost 30% of the 9.4 million all-sky sources flagged by SVM as QSO. In addition, more aggressive bright-end cuts than in the galaxy case were necessary to avoid dangerous extrapolation from the training sample. We removed quasar candidates with W1 < 10.4 mag or W2 < 10.1 mag, as such bright QSOs are practically non-existent in WISE × SDSS. In the SVM output, they were mostly misclassified stars or blends of stars, localised chiefly at low Galactic latitudes and in the Magellanic Clouds. A further cleanup to keep only the p(QSO) > 0.5 sources resulted in 6 million objects as pictured in Fig. 16. This number is most certainly an overestimate for the true WISE quasar population at W1 < 16 mag. In addition to the saturation-related artefacts, we note some interesting features in the map which are not seen for the galaxy candidates. First, there is a lack of sources at low Galactic latitudes, qualitatively similar to the WISE AGN distribution presented in Ferraro et al. (2015), where sources were classified based on colour cuts. Second, various WISE scanning issues are imprinted here, the most important being overdensity stripes perpendicular to the ecliptic, resulting from Moon avoidance manoeuvres (cf.
the mask applied in Ferraro et al. 2015). There is, however, an additional spurious overdensity which seems to roughly follow the ecliptic, visible at the top right of the map and below the Bulge, to the left. This suggests some very local contamination, such as from asteroids or maybe zodiacal light, and most likely reflects the presence of a fourth type of source in addition to the three types in the training set from SDSS.

Unlike in the test phase described in Sect. 4, here we do not know the “truth” to which we could compare the classifier's performance; however, some indirect a posteriori tests are possible. The first is the all-sky distribution, which for star and galaxy candidates (at high latitudes) is consistent with expectations, but much less so for the quasars. The second test is to verify source properties, such as colours. For identified galaxies, the W1 − W2 colour is very consistent with the colour found in the WISE × SDSS training set (cf. Fig. 2) when the higher p(gal) cut is applied. For quasar candidates the situation is different. Even for very high thresholds of p(QSO), the peak in this colour is at W1 − W2 ∼ 0.6 mag rather than ∼1 mag as in the training set (Fig. 17). In fact, the sources of expected quasar nature with W1 − W2 > 0.8 mag (Stern et al. 2012) are only 1/5 of our QSO candidate sample. As already seen from their all-sky map, some of the contamination may come from solar system objects. As an additional verification, we checked the location of the SVM QSO in the W2 − W3 vs. W1 − W2 diagram for those sources that had a W3 detection. The bulk of the QSO candidates are located at 3 < W2 − W3 < 4 [mag] and 0.4 < W1 − W2 < 0.7 [mag], which does not give a clear characterisation of their nature. Indeed, following Wright et al.
(2010), this is where various galaxy types overlap in this parameter space (“normal” spirals, Seyferts, starbursts, LIRGs, etc.). This also indicates that even if we had used the W3 parameter (which, as we emphasise, is robustly measured only for a small subset of our sources), this degeneracy between quasars and non-AGN galaxies would most likely remain.

6. Summary and future prospects

In this paper we presented an application of a machine learning algorithm, the support vector machines, to classify sources in an all-sky catalogue drawn from WISE. The algorithm was trained and tested on a sample of WISE objects cross-matched with SDSS spectroscopic data, where three main types of astrophysical sources (stars, galaxies, and quasars) had been independently identified. To optimise the performance of SVM, we first determined that a polynomial kernel of the third degree is preferred over the traditionally used radial one. We next verified that a training sample of fewer than 10 000 randomly chosen sources was sufficient to obtain stable results; in addition, the algorithm had already performed satisfactorily for a three-dimensional parameter space (W1 magnitude; W1 − W2 colour; differential aperture magnitude in the W1 channel).

Having established the optimal set-up of the SVM method for our purposes, we performed several tests of its performance on WISE data. Here we focused on completeness and purity as a function of the limiting magnitude of the test sample, and on their dependence on Galactic extinction. For stars and galaxies both these statistics deteriorate for increasing magnitudes, but even at the faint end they rarely fall below ∼80%. On the other hand, no obvious dependence of the SVM performance on magnitude is observed for quasars. Finally, Galactic extinction does not seem to influence the results, although we note that the tests were limited to regions of E(B − V) ≲ 0.3, outside of which there is practically no calibration data.
We finally applied the SVM algorithm, trained on the WISE × SDSS sample, to the full-sky WISE data flux-limited to W1 < 16 mag. About 220 million sources preselected in this way were flagged by SVM as star candidates; the remaining objects required significant cleanup to obtain galaxy-candidate and QSO-candidate samples. This cleanup consisted in removing the brightest sources, as well as those located in the Galactic Plane and Bulge areas, for which the classification is not expected to be reliable. We also used source type probabilities provided by SVM to remove the objects of insecure classification. As a result, we obtained catalogues of 45 million galaxy candidates and of 6 million QSO candidates. In the latter case, however, we observe significant contamination by sources consistent with dusty (non-AGN) galaxies and by probable solar system objects.

These shortcomings of our classification are related to the limitations of the training sample and to the lack of additional classification parameters that could be reliably used for the full sample together with the three basic ones employed here. It is possible to mitigate the former drawback thanks to forthcoming spectroscopic data, for example from SDSS-IV; however, it is not clear how much it is possible to improve the latter if only WISE data are to be used for the classification on the full sky while keeping a deep and uniform sample. Measurements in the WISE W3 (12 µm) and W4 (23 µm) channels would certainly help to break degeneracies that result in unreliable identification of quasars in the present approach; however, these two bands offer much shallower and very inhomogeneous coverage compared to W1 and W2. Some improvement in classification could also be expected if there were reliable proper motions for a much larger sample of WISE sources than presently available, as these data would help identify at least some of the minor bodies of the solar system which most likely contaminate our current QSO sample.
In general, a natural next step in the process of classification of WISE sources is to expand the current scheme to a larger number of object classes, which will allow the creation of more robust catalogues or at least the purification of the current ones. For this to be accomplished, more classification parameters and more comprehensive training sets will be necessary. We plan to explore this in forthcoming studies (Wypych et al., in prep.).

Last but not least, it is possible to work on improving the training scheme itself by implementing the so-called fuzzy logic (e.g. Klir & Yuan 1995) into the SVM algorithm. While in the classical SVM approach all training examples are treated equally, the fuzzy logic procedure handles the uncertainties of the classification data by weighting the training examples (e.g. Abe & Inoue 2002; Tsujinishi & Abe 2003). Each training point may belong to no more than one class, but by weighting the training points Fuzzy-SVM (FSVM) can ensure that the meaningful data points will be classified correctly, while the noisier ones will have more freedom to be misclassified in order to ensure the maximum margin benefit. This in turn expands the classification regions in the parameter space. However, this approach heavily extends the computational time owing to the introduction of the additional free parameter, which, like other SVM parameters (e.g. the misclassification parameter C or the kernel parameters), must be tuned for best performance. While this method could help to improve the current classification, the uncertainties of the measurements of the objects considered in this work are relatively small. In view of the largely extended computational time, the FSVM was not favourable for the purpose of the current analysis, but it will be considered in our future studies of classification in noisier WISE or other data.

Acknowledgements. We thank the referee for the helpful review. Special thanks to Mark Taylor for the TOPCAT (Taylor 2005) and STILTS (Taylor 2006) software[11].
Some of the results in this paper have been derived using the HEALPix package (Górski et al. 2005).¹² This work was supported by the Polish National Science Center under contract no. UMO-2012/07/D/ST9/02785. M.B. was supported by the Netherlands Organization for Scientific Research, NWO, through grant No. 614.001.451, by the European Research Council through FP7 grant No. 279396 and by the South African National Research Foundation (NRF). A.P. was partially supported by the Polish-Swiss Astro Project, co-financed by a grant from Switzerland, through the Swiss Contribution to the enlarged European Union. This publication makes use of data products from the Wide-field Infrared Survey Explorer, which is a joint project of the University of California, Los Angeles, and the Jet Propulsion Laboratory/California Institute of Technology, funded by the National Aeronautics and Space Administration. Funding for SDSS-III has been provided by the Alfred P. Sloan Foundation, the Participating Institutions, the National Science Foundation, and the US Department of Energy Office of Science. The SDSS-III web site is http://www.sdss3.org/.
SDSS-III is managed by the Astrophysical Research Consortium for the Participating Institutions of the SDSS-III Collaboration including the University of Arizona, the Brazilian Participation Group, Brookhaven National Laboratory, Carnegie Mellon University, University of Florida, the French Participation Group, the German Participation Group, Harvard University, the Instituto de Astrofísica de Canarias, the Michigan State/Notre Dame/JINA Participation Group, Johns Hopkins University, Lawrence Berkeley National Laboratory, Max Planck Institute for Astrophysics, Max Planck Institute for Extraterrestrial Physics, New Mexico State University, New York University, Ohio State University, Pennsylvania State University, University of Portsmouth, Princeton University, the Spanish Participation Group, University of Tokyo, University of Utah, Vanderbilt University, University of Virginia, University of Washington, and Yale University.

References

Abe, S., & Inoue, T. 2002, in European Symposium on Artificial Neural Networks, 113
Ahn, C. P., Alexandroff, R., Allende Prieto, C., et al. 2014, ApJS, 211, 17
Akbani, R., Kwek, S., & Japkowicz, N. 2004, in Proc. of the 15th European Conference on Machine Learning (ECML), 39
Alam, S., Albareti, F. D., Allende Prieto, C., et al. 2015, ApJS, 219, 12
Anderson, L. D., Bania, T. M., Balser, D. S., et al. 2014, ApJS, 212, 1
Assef, R. J., Stern, D., Kochanek, C. S., et al. 2013, ApJ, 772, 26
Beaumont, C. N., Williams, J. P., & Goodman, A. A. 2011, ApJ, 741, 14
Bilicki, M., Jarrett, T. H., Peacock, J. A., Cluver, M. E., & Steward, L. 2014, ApJS, 210, 9
Bilicki, M., Peacock, J. A., Jarrett, T. H., et al. 2016, ApJS, in press
Bolton, A. S., Schlegel, D. J., Aubourg, É., et al. 2012, AJ, 144, 144
Brown, M. J. I., Jarrett, T. H., & Cluver, M. E. 2014a, PASA, 31, 49
Brown, M. J. I., Moustakas, J., Smith, J.-D. T., et al. 2014b, ApJS, 212, 18
Bu, Y., Chen, F., & Pan, J. 2014, New A, 28, 35
Cavuoti, S., Brescia, M., D'Abrusco, R., Longo, G., & Paolillo, M. 2014, MNRAS, 437, 968
Chang, C.-C., & Lin, C.-J. 2011, ACM Trans. Intell. Syst. Technol., 2, 27:1
Cherkassky, V., & Mulier, F. 2006, Learning from Data: Concepts, Theory, and Methods, Second Edition (Wiley Online Library)
Cluver, M. E., Jarrett, T. H., Hopkins, A. M., et al. 2014, ApJ, 782, 90
Cristianini, N., & Shawe-Taylor, J. 2000, An introduction to Support Vector Machines (Cambridge University Press)
Cutri, R. M., Wright, E. L., Conrow, T., et al. 2013, Explanatory Supplement to the AllWISE Data Release Products, Tech. rep.
Dimitriadou, E., Hornik, K., Leisch, F., Meyer, D., & Weingessel, A. 2005, e1071: Misc Functions of the Department of Statistics (e1071), TU Wien, Version 1.5-11
Driver, S. P., Hill, D. T., Kelvin, L. S., et al. 2011, MNRAS, 413, 971
Edelson, R., & Malkan, M. 2012, ApJ, 751, 52
Eisenstein, D. J., Weinberg, D. H., Agol, E., et al. 2011, AJ, 142, 72
Faherty, J. K., Alatalo, K., Anderson, L. D., et al. 2015, ArXiv e-prints [arXiv:1505.01923]
Fan Wu, T., Lin, C.-J., & Weng, R. C. 2003, J. Machine Learning Research, 5, 975
Ferraro, S., Sherwin, B. D., & Spergel, D. N. 2015, Phys. Rev. D, 91, 083533
Górski, K. M., Hivon, E., Banday, A. J., et al. 2005, ApJ, 622, 759
Hambly, N. C., MacGillivray, H. T., Read, M. A., et al. 2001, MNRAS, 326, 1279
Hsu, C.-W., Chang, C.-C., & Lin, C.-J. 2003, Bioinformatics, 1, 1
Ivezić, Ž., Monet, D. G., Bond, N., et al. 2008, in IAU Symp. 248, eds. W. J. Jin, I. Platais, & M. A. C. Perryman, 537
Jarrett, T. H., Chester, T., Cutri, R., et al. 2000, AJ, 119, 2498
Jarrett, T. H., Cohen, M., Masci, F., et al. 2011, ApJ, 735, 112
Jarrett, T. H., Cluver, M. E., Magoulas, C., et al. 2016, ApJ, submitted
Kirkpatrick, J. D., Schneider, A., Fajardo-Acosta, S., et al. 2014, ApJ, 783, 122
Klir, G. J., & Yuan, B. 1995, Fuzzy Sets and Fuzzy Logic: Theory and Applications (Upper Saddle River, NJ, USA: Prentice-Hall, Inc.)
Kovács, A., & Szapudi, I. 2015, MNRAS, 448, 1305
Lin, H.-T., Lin, C.-J., & Weng, R. C. 2007, Mach. Learn., 68, 267
Mainzer, A., Bauer, J., Cutri, R. M., et al. 2014, ApJ, 792, 30
Małek, K., Solarz, A., Pollo, A., et al. 2013, A&A, 557, A16
Mateos, S., Alonso-Herrero, A., Carrera, F. J., et al. 2012, MNRAS, 426, 3271
Murakami, H., Baba, H., Barthel, P., et al. 2007, PASJ, 59, 369
Neugebauer, G., Habing, H. J., van Duinen, R., et al. 1984, ApJ, 278, L1
Nikutta, R., Hunt-Walker, N., Nenkova, M., Ivezić, Ž., & Elitzur, M. 2014, MNRAS, 442, 3361
Perryman, M. A. C., de Boer, K. S., Gilmore, G., et al. 2001, A&A, 369, 339
Platt, J. C. 1999, in Advances in Large Margin Classifiers (MIT Press), 61
R Development Core Team 2005, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria
Saglia, R. P., Tonry, J. L., Bender, R., et al. 2012, ApJ, 746, 128
Schlegel, D. J., Finkbeiner, D. P., & Davis, M. 1998, ApJ, 500, 525
Secrest, N. J., Dudik, R. P., Dorland, B. N., et al. 2015, ApJS, 221, 12
Shawe-Taylor, S., & Cristianini, N. 2004, Kernel Methods for Pattern Analysis (Cambridge, UK: Cambridge UP)
Skrutskie, M. F., Cutri, R. M., Stiening, R., et al. 2006, AJ, 131, 1163
Solarz, A., Pollo, A., Takeuchi, T. T., et al. 2012, A&A, 541, A50
Soumagnac, M. T., Abdalla, F. B., Lahav, O., et al. 2015, MNRAS, 450, 666
Stern, D., Assef, R. J., Benford, D. J., et al. 2012, ApJ, 753, 30
Taylor, M. B. 2005, in Astronomical Data Analysis Software and Systems XIV, eds. P. Shopbell, M. Britton, & R. Ebert, ASP Conf. Ser., 347, 29
Taylor, M. B. 2006, in Astronomical Data Analysis Software and Systems XV, eds. C. Gabriel, C. Arviset, D. Ponz, & S. Enrique, ASP Conf. Ser., 351, 666
Tsujinishi, D., & Abe, S. 2003, Neural Networks, 16, 785
Tu, X., & Wang, Z.-X. 2013, RA&A, 13, 323
Vapnik, V. 1999, IEEE Transactions on Neural Networks, 10, 988
Wright, E. L., Eisenhardt, P. R. M., Mainzer, A. K., et al. 2010, AJ, 140, 1868
Wu, X.-B., Hao, G., Jia, Z., Zhang, Y., & Peng, N. 2012, AJ, 144, 49
Yan, L., Donoso, E., Tsai, C.-W., et al. 2013, AJ, 145, 55
York, D. G., Adelman, J., Anderson, Jr., J. E., et al. 2000, AJ, 120, 1579

¹¹ http://www.star.bris.ac.uk/~mbt/
¹² http://healpix.jpl.nasa.gov/
Appendix A: Tables with detailed results of the tests

In this appendix we provide tables with detailed results of the tests described in Sect. 4. Tables A.1 and A.2 summarise the statistics of the completeness, purity, and contamination for various combinations of extinction bins and flux limits in the test sets, for the self-check and cross-test cases.

Table A.1. Overall classification statistics (in %) for various combinations of extinction bins and flux limits for the self-check case (classified objects were the same as in the training sample).

SELF-CHECK

Magnitude limit W1 < 14 mag
Extinction [MJy/sr]   Class    Completeness   Purity   Contamination
⟨0;1)                 Galaxy   95.0           91.8      8.2
⟨0;1)                 Stars    94.5           97.8      2.2
⟨0;1)                 QSO      96.9           97.1      2.9
⟨1;2)                 Galaxy   90.1           89.9     10.1
⟨1;2)                 Stars    97.5           97.5      2.5
⟨1;2)                 QSO      97.0           96.9      3.1
⟨2;3)                 Galaxy   94.6           91.3      8.7
⟨2;3)                 Stars    94.6           96.9      3.1
⟨2;3)                 QSO      95.8           97.0      3.0
⟨3;10)                Galaxy   94.5           92.2      7.8
⟨3;10)                Stars    94.9           97.1      2.9
⟨3;10)                QSO      97.0           97.3      2.7

Magnitude limit W1 < 15 mag
Extinction [MJy/sr]   Class    Completeness   Purity   Contamination
⟨0;1)                 Galaxy   93.3           89.1     10.9
⟨0;1)                 Stars    91.5           94.9      5.1
⟨0;1)                 QSO      97.0           98.2      1.8
⟨1;2)                 Galaxy   92.2           87.8     12.2
⟨1;2)                 Stars    90.5           93.8      6.2
⟨1;2)                 QSO      96.6           98.0      2.0
⟨2;3)                 Galaxy   92.3           88.4     11.6
⟨2;3)                 Stars    91.4           93.9      6.1
⟨2;3)                 QSO      96.1           97.9      2.1
⟨3;10)                Galaxy   92.5           88.5     11.5
⟨3;10)                Stars    92.6           94.2      5.8
⟨3;10)                QSO      95.3           98.1      1.9

Magnitude limit W1 < 16 mag
Extinction [MJy/sr]   Class    Completeness   Purity   Contamination
⟨0;1)                 Galaxy   87.7           82.7     17.3
⟨0;1)                 Stars    84.4           88.0     12.0
⟨0;1)                 QSO      96.6           98.5      1.5
⟨1;2)                 Galaxy   87.2           79.3     20.7
⟨1;2)                 Stars    81.4           87.1     12.9
⟨1;2)                 QSO      95.0           98.4      1.6
⟨2;3)                 Galaxy   86.6           78.5     21.5
⟨2;3)                 Stars    81.2           86.6     13.4
⟨2;3)                 QSO      94.1           98.1      1.9
⟨3;10)                Galaxy   87.6           79.0     21.0
⟨3;10)                Stars    82.9           88.1     11.9
⟨3;10)                QSO      93.3           98.1      1.9

Table A.2. Overall classification statistics (in %) for various combinations of extinction bins and flux limits for the cross-test case (classified objects were different from those in the training sample).

CROSS-TEST

Magnitude limit W1 < 14 mag
Extinction [MJy/sr]   Class    Completeness   Purity   Contamination
⟨0;1)                 Galaxy   94.7           91.7      8.3
⟨0;1)                 Stars    94.3           97.6      2.4
⟨0;1)                 QSO      97.0           96.9      3.1
⟨1;2)                 Galaxy   94.7           89.9     10.1
⟨1;2)                 Stars    93.3           97.5      2.5
⟨1;2)                 QSO      95.9           96.9      3.1
⟨2;3)                 Galaxy   94.2           90.7      9.3
⟨2;3)                 Stars    94.2           96.6      3.4
⟨2;3)                 QSO      95.4           96.8      3.2
⟨3;10)                Galaxy   93.8           91.3      8.7
⟨3;10)                Stars    94.5           96.6      3.4
⟨3;10)                QSO      96.5           97.0      3.0

Magnitude limit W1 < 15 mag
Extinction [MJy/sr]   Class    Completeness   Purity   Contamination
⟨0;1)                 Galaxy   92.9           88.6     11.4
⟨0;1)                 Stars    91.2           94.5      5.5
⟨0;1)                 QSO      96.6           98.0      2.0
⟨1;2)                 Galaxy   91.8           87.5     12.5
⟨1;2)                 Stars    90.2           93.5      6.5
⟨1;2)                 QSO      96.4           97.8      2.2
⟨2;3)                 Galaxy   91.8           87.5     12.5
⟨2;3)                 Stars    90.5           93.4      6.6
⟨2;3)                 QSO      96.0           97.8      2.2
⟨3;10)                Galaxy   92.2           87.9     12.1
⟨3;10)                Stars    92.0           94.2      5.8
⟨3;10)                QSO      95.2           97.7      2.3

Magnitude limit W1 < 16 mag
Extinction [MJy/sr]   Class    Completeness   Purity   Contamination
⟨0;1)                 Galaxy   85.5           81.7     18.3
⟨0;1)                 Stars    84.0           86.4     13.6
⟨0;1)                 QSO      96.1           98.0      2.0
⟨1;2)                 Galaxy   84.5           77.6     22.4
⟨1;2)                 Stars    80.3           85.0     15.0
⟨1;2)                 QSO      94.3           97.7      2.3
⟨2;3)                 Galaxy   84.9           77.3     22.7
⟨2;3)                 Stars    80.5           85.3     14.7
⟨2;3)                 QSO      93.6           97.7      2.3
⟨3;10)                Galaxy   86.0           77.4     22.6
⟨3;10)                Stars    81.5           87.1     12.9
⟨3;10)                QSO      92.9           97.5      2.5
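For readers who wish to reproduce statistics of the kind listed in Tables A.1 and A.2, the following minimal Python sketch computes completeness, purity, and contamination from a confusion matrix. It assumes the standard definitions (completeness as the fraction of true members of a class that are recovered, purity as the fraction of objects assigned to a class that truly belong to it) and takes contamination as the complement of purity, which is consistent with the tabulated values. The function name and the toy confusion matrix are hypothetical and are not taken from the actual test sets.

```python
from typing import Dict, List


def classification_stats(confusion: List[List[int]],
                         classes: List[str]) -> Dict[str, Dict[str, float]]:
    """Per-class completeness, purity and contamination in per cent.

    confusion[i][j] is the number of objects of true class i
    that the classifier assigned to class j.
    """
    stats = {}
    for k, name in enumerate(classes):
        correct = confusion[k][k]
        true_total = sum(confusion[k])                      # all true members of class k
        assigned_total = sum(row[k] for row in confusion)   # all objects labelled as class k
        completeness = 100.0 * correct / true_total if true_total else 0.0
        purity = 100.0 * correct / assigned_total if assigned_total else 0.0
        stats[name] = {
            "completeness": completeness,
            "purity": purity,
            "contamination": 100.0 - purity,  # complement of purity
        }
    return stats


# Toy confusion matrix (invented numbers, for illustration only).
classes = ["galaxy", "star", "qso"]
toy_confusion = [
    [90, 8, 2],   # true galaxies
    [5, 94, 1],   # true stars
    [5, 0, 95],   # true QSOs
]
toy_stats = classification_stats(toy_confusion, classes)
```

Because purity and contamination sum to 100% by construction in this sketch, only one of the two columns carries independent information.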