Characterising the optical properties of galaxy clusters with GMPhoRCC

We introduce the Gaussian Mixture full Photometric Red sequence Cluster Characteriser (GMPhoRCC), an algorithm for determining the redshift and richness of a galaxy cluster candidate. Using data from a multi-band sky survey with photometric redshifts, a red sequence colour magnitude relation (CMR) is isolated, modelled and used to characterise the optical properties of the candidate. GMPhoRCC provides significant advantages over existing methods, including treatment of multi-modal distributions, a variable width full CMR red sequence, richness extrapolation and quality control to algorithmically identify catastrophic failures. We present redshift comparisons for clusters from the GMBCG, NORAS, REFLEX and XCS catalogues, where the GMPhoRCC estimates are in excellent agreement with spectra, showing accurate, unbiased results with low scatter ($\sigma_{\delta z / (1+z)} \sim 0.014$). We conclude with an evaluation of GMPhoRCC performance using empirical Sloan Digital Sky Survey (SDSS) like mock galaxy clusters. GMPhoRCC is shown to produce highly pure characterisations, with very low probabilities ($<1\%$) of spurious, clean characterisations. In addition, GMPhoRCC is shown to achieve high rates of completeness with respect to recovering redshift and richness and correctly identifying the BCG.


INTRODUCTION
Galaxy clusters are excellent probes of cosmology. As the largest observable objects, they are good indicators of the large scale structure and the evolution of the mass distribution in the universe. As this is highly sensitive to the form of the expansion of the universe, their study gives valuable constraints on cosmological models (see Peebles 1980, Sheth et al. 2001, Jenkins et al. 2001, Rozo et al. 2010, Allen et al. 2011 and Tinker et al. 2012). In addition, clusters provide an excellent opportunity for studying galaxies themselves, particularly their formation, evolution and the impact of the environment (see e.g. Gladders et al. 1998 and Voit 2005). With the recent surge in cluster detections, from the Sunyaev-Zel'dovich (SZ) signal in the CMB (e.g. Planck Collaboration et al. 2014, Reichardt et al. 2013), X-ray emission of the intracluster medium (ICM) (e.g. Lloyd-Davies et al. 2011, Clerc et al. 2012), and spatial and optical cluster finding (e.g. Hao et al. 2010, Murphy et al. 2012, Rykoff et al. 2014), galaxy clusters are proving to be an ever more valuable area of research.
While the most useful cosmological analysis of galaxy clusters involves the study of mass and redshift, these are difficult and time consuming to determine directly, requiring gravitational lensing and spectroscopy. Optical characterisation offers quick estimates of cluster properties using multi-band optical photometry alone, and with the abundance of such data from the Sloan Digital Sky Survey (SDSS) (Ahn et al. 2014), Canada-France-Hawaii Telescope Lensing Survey (CFHTLenS) (Heymans et al. 2012), VLT Survey Telescope (VST) ATLAS (Shanks & Metcalfe 2012) and the Panoramic Survey Telescope and Rapid Response System (Pan-STARRS) 3π survey (Magnier et al. 2013), there is significant scope for such analysis.
⋆ E-mail: rgm@roe.ac.uk
The main focus of this research is the development of a new characterisation algorithm, the Gaussian Mixture full Photometric Red sequence Cluster Characteriser (GMPhoRCC), which aims to provide optical characterisation of potential clusters previously detected by other observations such as X-ray emission. While the specific motivation for this lies with the determination of cluster redshifts for forthcoming XMM Cluster Survey (XCS) (Romer et al. 2001, Mehrtens et al. 2012) data releases, GMPhoRCC is designed for general use, providing characterisations for any list of positions of cluster candidates and any multi-colour galaxy catalogue with photometric redshifts.
Figure 1. A histogram of the red sequence overdensity as a function of redshift showing a multi-modal distribution resulting from projection effects. These peaks correspond to spatially overlapping clusters at different redshifts and without any additional information it is difficult to determine which represents the target cluster.
This paper is structured as follows. Section 2 discusses existing characterisation methods, focusing on the motivation and key features desired in a new robust algorithm. Section 3 explores the details of GMPhoRCC, with evaluation using comparisons to known and mock clusters following in Section 4, including a detailed investigation of purity, completeness and the effectiveness of the quality control system. Finally, this paper concludes with a summary and discussion in Section 5. This paper assumes a flat ΛCDM cosmology with Ωm = 0.27, ΩΛ = 0.73 and h = 0.71.

CHARACTERISING THE OPTICAL PROPERTIES OF GALAXY CLUSTERS
Many cluster detection/classification algorithms have been developed in recent years: C4 (Miller et al. 2005), maxBCG (Koester et al. 2007), GMBCG (Hao et al. 2010) and redMaPPer, to name a few. While the exact details vary, the basis of these methods is to isolate the red sequence, the early type galaxies with similar metallicities and colours which dominate the members, and use these to infer the bulk properties of the cluster. While optical cluster finders search for additional spatial clustering, the simplest form of red sequence modelling is to find clustering in colour space. The simplest case of characterisation relates to a well defined, easily observed red sequence seen as an overdensity in colour or redshift space; however, many clusters do not conform to this due to projection effects and background fluctuations. Figure 1 demonstrates the redshift clustering of a field with two overlapping clusters. Due to the projection effect it is unclear which peak represents the target cluster, and methods looking for maximum overdensities, such as that from High et al. (2010), may fail to adequately describe the situation. With these multi-modal distributions common, found to be present in ∼ 40 percent of the GMBCG catalogue, it is clear any standalone characterisation algorithm must account for these ambiguous cases.
While a simple colour overdensity is a good approximation, the red sequence itself is described by a colour magnitude relation (CMR). The CMR relates to the physical properties of the cluster, where the slope encodes the mass-metallicity relation and the scatter encodes age variation (Gladders et al. 1998). Hence it is desirable to model the full CMR rather than simple overdensities, as is the case with GMBCG, XCS etc.
With the red sequence isolated it remains to determine cluster properties. Redshift can be determined using red sequence colour-redshift models, as in the cases of Koester et al. (2007) and Mehrtens et al. (2012); however, this introduces model dependence and additional complexity in the analysis. In the case of GMBCG, redshift is obtained from the photometric estimate of the BCG. Although this relies on correct identification of the BCG, the photometric estimate is more easily obtained for such a bright galaxy.
The following lists several initial features drawn from existing algorithms which have driven the development of GMPhoRCC.
(i) Red sequence detection: GMPhoRCC will isolate the red sequence and use these galaxies to infer the optical properties of the cluster.
(iii) Full red sequence CMR: The red sequence will be described by a full CMR determined by the GMPhoRCC.
(iv) Multiple red sequence bands: To maximise the efficiency of the GMPhoRCC and to cover a large range of cluster redshifts, multiple redshift dependent colour bands will be necessary as demonstrated in Hao et al. (2010).
(v) Multi-modal distributions: Without resorting to a full finder approach where overlapping clusters can be separately identified and analysed, GMPhoRCC must deal with multimodal distributions as several potential clusters.
(vi) Quality Control: Extending beyond simple error analysis it is necessary to assess the probability of catastrophic failure. By introducing quality control, several subsets can be produced where problem clusters and possible outliers can be identified and removed to produce a clean subset.

GMPHORCC
GMPhoRCC is designed as an optical follow-up tool to confirm and characterise galaxy cluster candidates. The main feature of GMPhoRCC is to identify the red sequence and use the properties of these galaxies to analyse the cluster. A basic outline of the procedure used to isolate the red sequence and ultimately determine redshift and richness is shown in Figure 2. While each of these steps is explored in detail in subsequent sections it is first noted that many of these require the modelling of cluster distributions such as colour and photometric redshifts which, as shown by Hao et al. (2009), can be approximated by a Gaussian mixture.

Modelling Cluster Distributions with Error-Corrected Gaussian Mixtures
Galaxy distributions, whether colour or redshift, are well modelled by Gaussian mixtures, which use a sum of several Gaussian components to describe any features or substructures. Fitting the mixture model proceeds using the error-corrected expectation maximisation procedure from Hao et al. (2009), which accounts for the associated measurement errors. Considering the distribution of galaxy colours as an example, the probability of galaxy n having true colour $\tilde{c}_n$ given the parameters θ is defined below:
$$p(\tilde{c}_n \mid \theta) = \sum_{k=1}^{K} \frac{P(k)}{\sqrt{2\pi\sigma_k^2}} \exp\left[-\frac{(\tilde{c}_n - \mu_k)^2}{2\sigma_k^2}\right]$$
where θ represents the collective parameters of the model, namely µk, σk and P(k), the component means, standard deviations and weights respectively. Combining these with Gaussian measurement errors for all the galaxies, Hao et al. (2009) have shown that this leads to the following form for the likelihood of the parameters given the data:
$$L(\theta) = \prod_{n=1}^{N} \sum_{k=1}^{K} \frac{P(k)}{\sqrt{2\pi(\sigma_k^2 + \delta_n^2)}} \exp\left[-\frac{(c_n - \mu_k)^2}{2(\sigma_k^2 + \delta_n^2)}\right]$$
where cn is the observed galaxy colour with Gaussian error δn, K is the total number of components and N is the number of galaxies. Maximising this likelihood using the expectation maximisation procedure of Hao et al. (2009) gives the optimised parameters.
In order to apply this modelling to the distributions of cluster members it is necessary to account for the effect of the background when analysing a candidate field. Rather than modelling the whole field and selecting components to separately describe the cluster and background, as explored by the GMBCG algorithm (Hao et al. 2010), a subtraction approach is used, in which the density model of the local background is subtracted from that of the cluster region:
$$\mathrm{GM}_{\rm cluster} = \mathrm{GM}_{\rm cluster\ region} - \mathrm{GM}_{\rm background}$$
where GM represents a Gaussian mixture density model. Modelling clusters in this manner with background subtraction helps to remove the ambiguity in component selection seen in GMBCG. Values of interest, such as an approximate red sequence colour from the colour distribution, are determined with an associated uncertainty from the width of the distributions around the peaks. While this section describes in detail the Gaussian mixture fitting procedure for a given area, the selection of the cluster area, background and field is explored in Section 3.2.
Applying this to the colour distribution of cluster GMBCG J197.87292-01.34109 indeed shows the Gaussian mixture to be a good representation of the cluster around a g − r colour of ∼ 1.2 magnitudes.
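As a rough illustration of the error-corrected fit described in this section, the sketch below maximises the likelihood of a one-dimensional Gaussian mixture with per-galaxy Gaussian errors by expectation maximisation. The update equations follow the standard "extreme deconvolution" form for heteroscedastic errors, which targets the same likelihood; this is an assumption made here and may differ in detail from the procedure of Hao et al. (2009). The function name, arguments and initialisation are hypothetical and are not taken from GMPhoRCC.

```python
import numpy as np

def fit_error_corrected_gmm(c, delta, k=2, n_iter=500):
    """EM fit of a 1D Gaussian mixture to values c with Gaussian errors delta.

    Returns component means, intrinsic standard deviations and weights,
    sorted by mean. Illustrative sketch only, not the GMPhoRCC code.
    """
    c = np.asarray(c, dtype=float)
    delta = np.asarray(delta, dtype=float)
    # Hypothetical initialisation: means at evenly spaced quantiles.
    mu = np.quantile(c, (np.arange(k) + 0.5) / k)
    var = np.full(k, np.var(c))
    w = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities using the error-broadened variances.
        t = var[None, :] + delta[:, None] ** 2
        diff = c[:, None] - mu[None, :]
        log_p = np.log(w)[None, :] - 0.5 * (np.log(2 * np.pi * t) + diff**2 / t)
        log_p -= log_p.max(axis=1, keepdims=True)
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)
        # Posterior mean and variance of each galaxy's true value.
        b = mu[None, :] + var[None, :] / t * diff
        big_b = var[None, :] - var[None, :] ** 2 / t
        # M-step: update means, intrinsic variances and weights.
        nk = np.maximum(r.sum(axis=0), 1e-12)
        mu = (r * b).sum(axis=0) / nk
        var = np.maximum((r * ((b - mu[None, :]) ** 2 + big_b)).sum(axis=0) / nk, 1e-8)
        w = nk / nk.sum()
    order = np.argsort(mu)
    return mu[order], np.sqrt(var[order]), w[order]
```

Because the measurement errors enter through σk² + δn², the fitted widths estimate the intrinsic spread of each component rather than the broader observed spread.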

Red sequence CMR and Redshift
With the framework in place to model the various cluster distributions from optical data the first main goal of GMPhoRCC is to identify the red sequence, modelling a full CMR with intrinsic scatter. As the previous section described, GMPhoRCC models cluster distributions with background subtraction, hence the first step is the extraction of the cluster region and background from the candidate field. The cluster region is taken as the cone with a 30 arc minute radius centred on the detection observation, for example, the peak of the X-ray emission. The local background is taken as the annulus around this cone up to a radius of 60 arc minutes.
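The region selection above can be sketched as follows, assuming galaxy positions are given as right ascension and declination in degrees. The function names are hypothetical; only the 30 and 60 arc minute radii come from the text.

```python
import numpy as np

def angular_separation_arcmin(ra, dec, ra0, dec0):
    """Great-circle separation (haversine formula); all inputs in degrees."""
    ra, dec, ra0, dec0 = map(np.radians, (ra, dec, ra0, dec0))
    h = (np.sin((dec - dec0) / 2) ** 2
         + np.cos(dec) * np.cos(dec0) * np.sin((ra - ra0) / 2) ** 2)
    return np.degrees(2 * np.arcsin(np.sqrt(h))) * 60.0

def split_field(ra, dec, ra0, dec0, r_cluster=30.0, r_background=60.0):
    """Boolean masks for the cluster cone and the surrounding annulus."""
    sep = angular_separation_arcmin(np.asarray(ra, float), np.asarray(dec, float), ra0, dec0)
    in_cluster = sep <= r_cluster
    in_background = (sep > r_cluster) & (sep <= r_background)
    return in_cluster, in_background
```

In the flat-sky approximation the annulus between 30 and 60 arc minutes covers three times the area of the 30 arc minute cone, so raw counts would need scaling by area before any subtraction.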

Initial Redshift Estimate
To aid the isolation of the red sequence, initial redshift and colour estimates for the cluster are determined, which allow the selection of an appropriate colour band and provide broad filtering to remove field galaxies. While this is perhaps best achieved by modelling the joint colour-redshift density distribution, extending the fitting procedure of Section 3.1 to higher dimensions is not trivial. Even without the error correction the fitting procedure often fails to converge, fails to recover fine structure and is highly sensitive to the initial estimate of the parameters. Hence separate, error-corrected modelling of redshift and colour proceeds. Starting with redshift, Figure 4 expands procedure 1 of Figure 2, showing in detail the procedure used to arrive at an initial estimate for an inner cluster region. The redshift distribution of the inner cluster is modelled by taking the mixture model from a series of cones across a range of radii, 1-4 arc minutes, and subtracting the background model. The inner cluster radius is then selected to produce the largest sum of the amplitudes of the peaks in the distribution. In addition to ensuring peaks can still be found in the case of mis-centring, this gives preference to regions producing multi-modal distributions where each peak can be subsequently analysed and used to assist with the characterisation. While this could be found with an investigation of the radial profile, this method is less sensitive to issues with overlapping clusters, which can indeed be common (∼ 20 percent of the GMBCG catalogue has a neighbour within 3 arc minutes).
Table 1. The most suitable red sequence colour bands for the initial redshift estimate. These values overlap to account for uncertainty in the initial redshift and those close to a transition. Columns: red sequence band, redshift range.
For a rigorous treatment of multi-modal redshift distributions, a secondary peak is investigated as a potential cluster provided the amplitude is at least 20 percent of the primary. This threshold allows the analysis of potential structure without exploring noise or low level fluctuations in the distribution. Considering multiple peaks in this way occurs throughout GMPhoRCC resulting in a potentially large number of possible candidates from which the cluster is selected.
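The peak selection described above can be sketched as follows: the background mixture density is subtracted from the cluster mixture density on a grid, local maxima of the excess are located, and secondary peaks are kept only if they reach 20 percent of the primary amplitude. The function names and the grid-based peak search are assumptions for illustration, not the GMPhoRCC implementation.

```python
import numpy as np

def mixture_density(x, mu, sigma, weight):
    """Evaluate a weighted sum of Gaussians on the points x."""
    x = np.asarray(x, dtype=float)[:, None]
    mu, sigma, weight = (np.asarray(a, dtype=float)[None, :] for a in (mu, sigma, weight))
    norm = weight / (np.sqrt(2 * np.pi) * sigma)
    return (norm * np.exp(-0.5 * ((x - mu) / sigma) ** 2)).sum(axis=1)

def candidate_peaks(grid, cluster_density, background_density, min_fraction=0.2):
    """Peaks of the background-subtracted density, strongest first.

    Returns (position, amplitude) pairs for the primary peak and any
    secondary peak with at least min_fraction of the primary amplitude.
    """
    excess = np.clip(cluster_density - background_density, 0.0, None)
    i = np.arange(1, len(grid) - 1)
    is_peak = (excess[i] > excess[i - 1]) & (excess[i] >= excess[i + 1]) & (excess[i] > 0)
    idx = i[is_peak]
    if idx.size == 0:
        return []
    idx = idx[np.argsort(excess[idx])[::-1]]
    primary = excess[idx[0]]
    return [(float(grid[j]), float(excess[j])) for j in idx if excess[j] >= min_fraction * primary]
```

Each returned peak would then be carried forward as a separate potential cluster.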

Initial Colour Estimate
With the initial redshift estimate, an initial colour estimate for the inner cluster region proceeds as shown in Figure 5. First an appropriate colour band is selected based on the initial redshift, following Hao et al. (2010), as shown in Table 1. These values ensure that the main spectral feature of red sequence galaxies, the 4000 Å break, remains in the band at a given redshift. This is important as it ensures the strongest colour clustering and contrast against the background. Additionally, the redshift values overlap to account for possible failures of the initial estimate and to help clusters where the 4000 Å break sits between bands.
Before estimating an initial colour, galaxies which do not conform with the initial redshift are removed from the background and the cluster region; specifically those where the galaxy redshift is more than 0.25 from the initial estimate and those fainter than would be considered in such a cluster. The faint end cut is taken as m*(z) + 2, where the redshift dependent m* is taken from Hao et al. (2010), which was derived from the luminosity function of field galaxies. Initial red sequence colour estimation proceeds in a similar manner to redshift, where the colour estimate is taken as the peak in the background-subtracted colour distribution of an inner cluster region. The inner cluster region is again determined by considering cones with a range of radii and selecting the cone which maximises the sum of the peak amplitudes. This region is used as the inner cluster region in subsequent analysis.

Red Sequence CMR
The red sequence CMR is determined using the initial estimates of redshift, colour and inner cluster radius as shown in Figure 6. First, galaxies from the inner region are filtered in a manner similar to that used to identify red sequence galaxies: all galaxies within 2σRS of the initial colour estimate are kept for further analysis. A broad initial red sequence width, σRS = 0.1, is used to ensure only field contamination is removed.
With the red sequence dominating the remaining galaxies from the inner region, fitting a CMR proceeds using the bivariate correlated errors and intrinsic scatter (BCES) method (Akritas & Bershady 1996). This extends the standard least-squares method to account for intrinsic scatter and potentially correlated errors in both the dependent and independent variables. Additionally, the intrinsic width of the red sequence is determined from the distribution of colours around the CMR.
In practice the distribution of galaxies around the CMR found using BCES takes the form of a Gaussian. By correcting for the slope and using a single component, the width of the error-corrected Gaussian mixture gives a good estimate of the intrinsic scatter of the red sequence.
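A minimal sketch of the CMR fit and the 2σ red sequence selection is given below. The slope uses the BCES(Y|X) estimator of Akritas & Bershady (1996); the intrinsic scatter is estimated here by a simple moment subtraction of the mean colour error variance, which is a stand-in assumed for illustration and not the single-component error-corrected mixture used by GMPhoRCC. All function names are hypothetical.

```python
import numpy as np

def bces_y_on_x(x, y, x_err, xy_cov=None):
    """BCES(Y|X) slope and intercept with measurement errors on x.

    xy_cov is the per-point covariance between x and y errors (zero if None).
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    x_err = np.asarray(x_err, dtype=float)
    xy_cov = np.zeros_like(x) if xy_cov is None else np.asarray(xy_cov, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    slope = (np.sum(dx * dy) - np.sum(xy_cov)) / (np.sum(dx**2) - np.sum(x_err**2))
    intercept = y.mean() - slope * x.mean()
    return slope, intercept

def intrinsic_scatter(mag, colour, colour_err, slope, intercept):
    """Moment estimate: residual variance minus mean colour error variance."""
    residual = np.asarray(colour) - (slope * np.asarray(mag) + intercept)
    excess = np.var(residual) - np.mean(np.asarray(colour_err) ** 2)
    return float(np.sqrt(max(excess, 0.0)))

def red_sequence_members(mag, colour, slope, intercept, sigma_rs, n_sigma=2.0):
    """Mask of galaxies within n_sigma * sigma_rs of the fitted CMR."""
    residual = np.asarray(colour) - (slope * np.asarray(mag) + intercept)
    return np.abs(residual) <= n_sigma * sigma_rs
```

The same mask can be applied to the inner region and to the background before the red sequence redshift distribution is modelled.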

Red Sequence Redshift
Having isolated the red sequence in both the background and the inner cluster region, using the 2σ approach with the determined CMR and its measured intrinsic width, the peaks in the background-subtracted photometric redshift distribution provide the potential cluster redshifts. Additional estimates, such as spectroscopic or colour model redshifts, can be added at this point if desired by the user. If spectroscopic redshifts are available, averaging these for the red sequence galaxies can provide a more reliable redshift estimator. Finally, the BCG is identified as the brightest galaxy on the red sequence in the inner region.

Candidate Selection
With the possibility of multiple candidates and the frequent ambiguity of cluster selection, the final step, shown in Figure 8, filters the results to produce a primary as the most likely cluster candidate and a secondary as the next likely possibility. To help this process the potential clusters are first filtered to ensure the initial redshift, red sequence redshift and BCG redshift are all appropriate for the colour band used. Reducing the redshift overlap in Table 1 helps to remove the same candidates analysed in multiple bands. If this removes all the potential clusters the filter is not applied and the selection process continues. The remaining candidates are then ranked, based first on the consistency of the three main redshift estimators, initial, red sequence and BCG. Four cleanness bands are introduced, shown in Table 3, where the most desirable candidates have the highest value.
    The red sequence and BCG redshift disagree by more than 0.1
3   All three redshift estimators are not consistent with the colour band
4   All remaining candidates
These consistency checks help to remove candidates which may not represent clusters but rather random enhancements in the background or foreground. Finally, to further rank clusters and break degeneracies, candidates where the red sequence redshift best matches the initial estimate are preferred, reducing the chance that the best candidates are spurious. The primary cluster is simply the cleanest candidate which best matches the primary initial redshift estimate. A secondary cluster is assigned as the cleanest candidate associated with the earliest secondary peak (i.e. initial secondary redshifts considered first etc.) which best matches the initial estimate. This selection procedure is shown in detail in Figure 7.
While comparing the GMPhoRCC estimates to previously characterised spectroscopic clusters, it is found that on average 35 percent of the targets characterised have an associated secondary cluster, and of these only 13 percent (5 percent of the total) better match spectra than the primary. This shows not only that dealing with multi-modal distributions is necessary but also that the selection process of GMPhoRCC is able to reliably select the most appropriate characterisation from the potential candidates for the cluster.
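The ranking step of the candidate selection can be sketched as a two-key sort: highest cleanness first, then the smallest difference between the red sequence redshift and the initial estimate. The `Candidate` class and `rank_candidates` function are hypothetical names, and the sketch deliberately omits the colour band filter and the association of secondaries with specific initial peaks.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    z_initial: float        # initial redshift estimate
    z_red_sequence: float   # red sequence redshift
    z_bcg: float            # BCG photometric redshift
    cleanness: int          # cleanness band; higher is more desirable

def rank_candidates(candidates):
    """Order candidates by cleanness (descending), then by agreement
    between the red sequence and initial redshifts (ascending)."""
    return sorted(
        candidates,
        key=lambda c: (-c.cleanness, abs(c.z_red_sequence - c.z_initial)),
    )
```

The first element of the ranked list would play the role of the primary candidate in this simplified picture.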

Richness
GMPhoRCC measures richness as the number of red sequence galaxies, defined by the 2σ filtering, within a given radius that are fainter than the BCG and brighter than a redshift dependent cut-off, m*(z) + 1. This takes the form of the maxBCG (Koester et al. 2007) and GMBCG (Hao et al. 2010) richness, which ensures the red sequence i-band magnitude range is consistent as a function of redshift. In agreement with maxBCG, GMBCG and redMaPPer, m* is taken from the luminosity function of field galaxies determined by Blanton et al. (2003).
For consistency across a range of cluster sizes, GMPhoRCC considers n200, the richness inside the characteristic radius r200. As measuring r200 directly is only possible with gravitational lensing, an intermediate $0.5\,h^{-1}$ Mpc fixed aperture richness, $n_{\rm gals}$, is used following the analysis of maxBCG and GMBCG, from which r200 is found. Using the maxBCG clusters and the weak lensing derived $r_{200}$ - $n_{200}^{\rm maxBCG}$ scaling relation from Hansen et al. (2009), a scaling relation between r200 and $n_{\rm gals}$ was found by binning clusters by $n_{\rm gals}$ and fitting r200. Direct derivation of this relation using weak lensing r200 will reduce the scatter in this relation and is left for future work.
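To make the fixed aperture concrete, the sketch below converts a physical radius in h⁻¹ Mpc to an angular radius for the flat ΛCDM cosmology assumed in this paper (Ωm = 0.27, ΩΛ = 0.73), then counts red sequence galaxies within it. Working in h⁻¹ Mpc means the value of h cancels. The function names and the tuple layout of the galaxy list are hypothetical, and m* must be supplied by the caller.

```python
import math

C_OVER_H0 = 2997.92458  # Hubble distance c/H0 in h^-1 Mpc

def angular_diameter_distance(z, omega_m=0.27, omega_l=0.73, steps=200):
    """Angular diameter distance in h^-1 Mpc for flat LCDM.

    Uses Simpson's rule on 1/E(z); steps must be even.
    """
    def inv_e(zp):
        return 1.0 / math.sqrt(omega_m * (1.0 + zp) ** 3 + omega_l)
    h = z / steps
    total = inv_e(0.0) + inv_e(z)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * inv_e(i * h)
    return C_OVER_H0 * total * h / 3.0 / (1.0 + z)

def aperture_arcmin(z, radius_mpc_h=0.5):
    """Angular radius in arc minutes of a physical radius in h^-1 Mpc."""
    return math.degrees(radius_mpc_h / angular_diameter_distance(z)) * 60.0

def count_richness(galaxies, bcg_mag, m_star, radius_arcmin):
    """Count red sequence galaxies inside the aperture that are fainter
    than the BCG and brighter than m_star + 1.

    galaxies: iterable of (separation_arcmin, i_mag, on_red_sequence).
    """
    return sum(
        1 for sep, mag, on_rs in galaxies
        if on_rs and sep <= radius_arcmin and bcg_mag < mag < m_star + 1.0
    )
```

For example, `aperture_arcmin(0.2)` gives the angular size of the 0.5 h⁻¹ Mpc aperture at z = 0.2, which can then be passed to `count_richness` as `radius_arcmin`.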
In addition to counting galaxies, richness is also estimated using the luminosity function method of High et al. (2010). This involves fitting a luminosity function within a magnitude range where the photometry is believed to be complete then integrating up to m * + 1. Finding this range is simply done by inspecting a magnitude histogram where a limit can be assigned after which the density drops with increasing magnitude. Rather than using the binning approach to fit a Schechter function a new probabilistic approach has been developed which gives more reliable fits across a range of cluster richnesses. With appropriate normalisation the luminosity function, Equation 7, gives the number of galaxies within the magnitude range m → m + dm and hence the probability that a galaxy has a particular magnitude given the parameters of the Schechter function can be approximated by Equation 8.
Here θ represents the parameters of the Schechter function, namely φ*, m* and α. Using Bayes' theorem, the likelihood of the parameters, L, is given by combining the probabilities from all galaxies.
The cluster luminosity function is then defined by the parameters which maximise this likelihood. By assuming flat priors this is equivalent to minimising the negative log-likelihood shown in Equation 10.
In addition to using the High et al. (2010) fixed faint end (α = −1), constraining the parameters to satisfy Equation 11 greatly increases the reliability of the fit:
$$\int_{m_{\rm min}}^{m_{\rm max}} \phi(m, \theta)\,dm = \text{Total Number of Galaxies} \qquad (11)$$
This ensures a reasonable luminosity function is recovered which, when integrated across the previously determined magnitude range, $m_{\rm min}$ to $m_{\rm max}$, returns the total number of galaxies observed.
Minimising Equation 10 subject to the constraint shown in Equation 11 proceeds using the standard sequential least squares method described by Kraft (1988).
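A minimal sketch of such a constrained fit is given below, using scipy's SLSQP implementation. It assumes the standard Schechter function in magnitudes, φ(m) = 0.4 ln(10) φ* x^(α+1) exp(−x) with x = 10^(−0.4(m − m*)), takes the negative log-likelihood as −Σ ln φ(mᵢ, θ), and fits ln φ* rather than φ* for better numerical scaling. These choices, and all function names, are assumptions for illustration; the exact forms of Equations 8 to 10 are not reproduced here.

```python
import numpy as np
from scipy.optimize import minimize

LN10 = np.log(10.0)

def log_schechter(m, ln_phi_star, m_star, alpha=-1.0):
    """Natural log of the Schechter function per unit magnitude."""
    ln_x = -0.4 * LN10 * (m - m_star)  # x = L / L*
    return np.log(0.4 * LN10) + ln_phi_star + (alpha + 1.0) * ln_x - np.exp(ln_x)

def expected_count(ln_phi_star, m_star, m_lo, m_hi, alpha=-1.0, n_grid=501):
    """Trapezoidal integral of the luminosity function over [m_lo, m_hi]."""
    grid = np.linspace(m_lo, m_hi, n_grid)
    phi = np.exp(log_schechter(grid, ln_phi_star, m_star, alpha))
    return float(np.sum(0.5 * (phi[1:] + phi[:-1]) * np.diff(grid)))

def fit_schechter(mags, m_lo, m_hi, alpha=-1.0):
    """Fit phi* and m* with alpha fixed, subject to the integrated
    luminosity function equalling the number of observed galaxies."""
    mags = np.asarray(mags, dtype=float)
    n_gal = mags.size

    def mean_neg_log_like(theta):
        return -np.mean(log_schechter(mags, theta[0], theta[1], alpha))

    def count_constraint(theta):
        return expected_count(theta[0], theta[1], m_lo, m_hi, alpha) / n_gal - 1.0

    # Hypothetical starting point: flat density and the median magnitude.
    start = np.array([np.log(n_gal / (m_hi - m_lo)), np.median(mags)])
    result = minimize(
        mean_neg_log_like, start, method="SLSQP",
        bounds=[(None, None), (m_lo - 3.0, m_hi + 3.0)],
        constraints=[{"type": "eq", "fun": count_constraint}],
    )
    return float(np.exp(result.x[0])), float(result.x[1])
```

With the fitted parameters, a richness estimate in the spirit of the text would follow from `expected_count` evaluated with an upper limit of m* + 1.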
Although an improvement, this method still produces unreliable results for very low numbers of galaxies. Hence, when fitting 5 or fewer data points, m* is fixed based on the cluster redshift and the luminosity function of field galaxies determined by Blanton et al. (2003).
Combining these methods, Figure 9 expands procedure 5 from Figure 2 in more detail, showing the steps taken to estimate cluster richness. By using an input radius of $0.5\,h^{-1}$ Mpc the intermediate richness, $n_{\rm gals}$, is determined, with n200 found by using r200.

Optical Data
Although designed for use with any optical data, initial calibration and development of GMPhoRCC has been driven by the use of optical data from the Sloan Digital Sky Survey. The tenth data release, presented in Ahn et al. (2014), provides coverage of 14,555 square degrees in the northern hemisphere with 95 percent completeness down to 21.3 magnitudes in the i-band, giving ∼ 90 million suitable galaxies, of which ∼ 1.9 million have spectra.
The input optical data were selected from the Galaxy view of the PhotoObjAll table using the following query to ensure cleanness and completeness in photometry.
SELECT * from GALAXY WHERE
    (dered_i) < 21.0 AND
    (modelMagErr_g / dered_g) < 0.1 AND
The 10 percent cut on colour errors with extinction and masking constraints, as used by Hao et al. (2010), ensures the optical data are clean, which greatly improves the Gaussian mixture fitting procedure. In addition, the colour cuts and i-band constraint help to remove extreme objects with likely erroneous photometry which adversely bias the Gaussian Mixture models of the cluster candidates. While the i-band cut is specific to the SDSS, the colour and error constraints are recommended for GMPhoRCC regardless of the source of the optical data to ensure clean photometry.
In addition to multi-band photometry GMPhoRCC makes use of photometric redshifts rather than using assumed colour-redshift relations. Within the SDSS DR10, the PhotozRF table provides the most suitable redshifts, calculated using the random forest regression technique of Carliles et al. (2010). While not essential, these provide well understood Gaussian errors which are ideal for use with the error-corrected Gaussian Mixture models of Section 3.1.

Quality Control
One of the main goals of GMPhoRCC is to provide a means of quality control to help identify possible catastrophic failures. As part of this many flags have been introduced to trace how clusters propagate through the algorithm. These flags trace potential issues with fits, multi-modal distributions and inconsistent redshifts with a full list given in Appendix A.
Although a large source of ambiguity and potential failure results from the presence of multi-modal distributions, given their prevalence (seen in ∼ 70 percent of clusters) and the success of the candidate selection shown in Figure 7, these alone are not sufficient to identify catastrophic failures. The strongest indicators of failure, however, are the presence of inconsistent redshifts or low richnesses.
Considering low richness, firstly this indicates that the distribution modelling may be unreliable, fitting many parameters to only a few data points. More importantly this could indicate an issue with the red sequence, either the candidate cannot be optically confirmed as a cluster or the red sequence has been missed altogether. In either case this is the strongest indication of catastrophic failure.
Large discrepancies between the red sequence and the BCG are also a strong indicator of catastrophic failure, particularly with regard to redshift. Large discrepancies in redshift (larger than expected considering measurement error) can indicate a problem with red sequence modelling, BCG selection or cluster redshift.
By combining these flags, quality markers are assigned to clusters as an indicator of the reliability of the optical characterisation, shown in Table 4. The redshift consistency bounds used to assign these markers (Table A4) must be calibrated for specific sources of the photometric redshifts. Here ∆zcp and ∆zcs are respectively the photometric and spectroscopic redshift consistency bounds where, for the SDSS DR10, ∆zcp = 0.035 and ∆zcs = 0.025. With less reliable photometric redshifts these should be relaxed to larger bounds. With these quality markers the characterisations can be separated into various quality subsets, as shown in Table A5: 'clean' with q ≥ 3, representing the cleanest set with most problem clusters removed; 'mid' with q ≥ 2, a middle subset with only the worst clusters removed; and 'detection' with q ≥ 1, the full list of clusters considered to have been detected.
Table 4. A list of the quality markers, q, assigned to clusters based on the GMPhoRCC flags.
q    Description
−1   no optical coverage
0    no characterisation found
1    n200 < 1, large redshift inconsistencies, masking issues
2    n200 < 3, small redshift inconsistencies
3    clean
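The assignment of quality markers and subsets can be sketched as below. The richness thresholds and the subset definitions follow Table 4 and the text; the use of the red sequence to BCG redshift difference as the consistency measure, the "small" bound of 0.035 (taken as ∆zcp) and especially the "large" bound of 0.1 are assumptions made for illustration, since the actual flag definitions are given in Appendix A. The function names are hypothetical.

```python
def quality_marker(has_coverage, characterised, n200, dz_rs_bcg,
                   masked=False, dz_small=0.035, dz_large=0.1):
    """Assign a quality marker q following the layout of Table 4.

    dz_rs_bcg is |z_red_sequence - z_BCG|; dz_small and dz_large are
    illustrative consistency bounds, not the calibrated GMPhoRCC values.
    """
    if not has_coverage:
        return -1
    if not characterised:
        return 0
    if n200 < 1 or dz_rs_bcg > dz_large or masked:
        return 1
    if n200 < 3 or dz_rs_bcg > dz_small:
        return 2
    return 3

def quality_subsets(markers):
    """Indices of the 'clean', 'mid' and 'detection' subsets."""
    return {
        "clean": [i for i, q in enumerate(markers) if q >= 3],
        "mid": [i for i, q in enumerate(markers) if q >= 2],
        "detection": [i for i, q in enumerate(markers) if q >= 1],
    }
```

Because the subsets are nested, every clean cluster also appears in the mid and detection subsets.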

Computational Performance
GMPhoRCC is aimed primarily at use on standard desktop computers and as such does not require substantial computational resources. Development has proceeded using Python 2.7.3 with the scipy module providing many of the mathematical routines, particularly the sequential least squares method used to fit luminosity functions. GMPhoRCC experiences two main bottlenecks: first from the retrieval of the optical data, either from a database or local files, and secondly from fitting Gaussian mixtures. While little can be done about the data retrieval, the Gaussian mixture fitting is implemented in Fortran 90, which provides a factor of 10 speed improvement over native Python and is twice as fast as the C++ version employed by Hao et al. (2010). The final performance improvement comes from the utilisation of the multiple threads available in even the most basic computers. While GMPhoRCC does not implement full parallelisation at the Fortran level, the Parallel Python module allows several cluster candidates to be analysed simultaneously. Although more threads were available, little improvement was found beyond six due to restrictions in the retrieval of the optical data.
As an example of typical performance, 6 threads from an Intel 3770k 4.2GHz processor with 16GB of PC3-19200 RAM, accessing the optical data locally from a hard disk has a characterisation time of 42 seconds per cluster per thread allowing the full characterisation of the XCS catalogue, 503 clusters, within 59 minutes.

EVALUATION
Evaluation of GMPhoRCC proceeds with a two-prong approach, first by comparing characterisations with other algorithms using spectroscopic clusters and secondly by investigating mock galaxy clusters. In addition to driving the development process, particularly the calibration of the quality control system, these comparisons allow for detailed understanding of the GMPhoRCC optical selection function.

Comparison with existing catalogues
Comparisons with existing catalogues proceeded using spectroscopic clusters selected from the GMBCG (Hao et al. 2010), NORAS (Böhringer et al. 2000), REFLEX (Böhringer et al. 2004) and XCS (Mehrtens et al. 2012) catalogues. As richness measures are specific to the exact form of the algorithm and optical data, evaluation of the GMPhoRCC richness is deferred to analysis with mock clusters where comparisons with 'true' cluster values are possible.
With a total of 706 X-ray and 3795 optically detected clusters with spectra, direct evaluation of the GMPhoRCC redshift estimate is possible. Of the 4501 clusters, redshift estimates were found for 97.3 percent; these are compared to spectra in Figure 10. Although some discrepancies are present, the quality markers are shown to identify and remove the worst outliers. Additionally, while the majority of all estimates are within |zRS − zspec|/(1 + zspec) < 0.01, the clean subset attains a larger fraction within this bound and less contamination from outliers. It is noted, however, that at low redshifts, z < 0.1, many cluster estimates are erroneous: limitations in field area and poor contrast against the background result in cases where field galaxies dominate the cluster distributions, making it difficult to isolate the red sequence. Incompleteness and increasing measurement errors in the photometry at high redshift again cause issues with the red sequence detection. In addition to these redshift limitations, it is expected that low richness clusters produce the most outliers, as it is more difficult to isolate and model the red sequence with a sparse number of galaxies.
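The comparison statistics used here can be sketched as follows: the normalised residual (zRS − zspec)/(1 + zspec) is computed per cluster, and its mean, standard deviation and the fraction within the 0.01 bound are reported. Taking the scatter as a plain standard deviation is an assumption; the paper may use a more robust estimator. The function name is hypothetical.

```python
import numpy as np

def redshift_residual_stats(z_est, z_spec, bound=0.01):
    """Bias, scatter and fraction within bound of (z_est - z_spec)/(1 + z_spec)."""
    z_est = np.asarray(z_est, dtype=float)
    z_spec = np.asarray(z_spec, dtype=float)
    dz = (z_est - z_spec) / (1.0 + z_spec)
    return {
        "bias": float(np.mean(dz)),
        "scatter": float(np.std(dz)),
        "fraction_within": float(np.mean(np.abs(dz) < bound)),
    }
```

Applying the function separately to the detection, mid and clean subsets would reproduce the kind of comparison described for Figure 10.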
In addition to comparisons with spectra, a subset of 131 XCS clusters with both spectroscopic and photometric redshifts provides an excellent resource to compare the performance of GMPhoRCC and XCS. Figure 11 shows the substantial improvement offered by GMPhoRCC, providing more accurate estimates with lower scatter around the spectroscopic redshifts. In addition to providing more accurate redshifts, the estimates are independent of any colour-redshift model as employed by XCS.
Figure 10. A comparison of GMPhoRCC photometric red sequence redshifts to spectra using 4501 clusters with DR10 coverage from GMBCG, NORAS, REFLEX and XCS. Left panel: A scatter plot highlighting the quality control, showing from left to right, the detection, mid and clean subsets. While some discrepancies remain, the majority of outliers have been removed in the clean subset. Although they have been correctly identified as problems, very low redshift clusters (z < 0.05) are not characterised well by GMPhoRCC due to poor contrast against the background and limitations in the field area. In addition, high redshift clusters (z > 0.5) are subject to large discrepancies due to incompleteness and increasing photometric errors. Right panel: The distribution of redshift comparisons where the results have been normalised and split into the separate quality subsets, where the legend shows the fraction of the total clusters in each set. While the majority of all estimates are within |zRS − zspec|/(1 + zspec) < 0.01, the clean subset can again be seen to have removed the worst estimates with a greater fraction attaining this bound.

Richness Scaling
To assess the validity of the GMPhoRCC richness as a mass proxy, an initial investigation of richness scaling is explored for the X-ray clusters from the XCS catalogue. Of particular interest is the determination of X-ray-optical scaling relations, which, in work similar to Kloster et al. (2011) and Rykoff et al. (2008), relies on the tight correlation between X-ray observables, such as temperature, and the cluster mass, in order to calibrate the GMPhoRCC richness as an optical mass proxy. While this paper illustrates the validity of the GMPhoRCC richness as a proxy for cluster properties, a complete analysis of such richness scaling is left for future work.
The previous subset of 131 clean spectroscopic clusters from XCS is analysed to determine the correlation between GMPhoRCC richness and X-ray temperature, modelled as a power-law scaling relation similar to those used by Rykoff et al. (2008), where α and β are constants. Determination of this relation proceeds by stacking the clusters in richness bins and using the BCES method of Section 3.2 and Akritas & Bershady (1996). While this is a rather simplistic approach for illustration, future analysis is intended using more sophisticated techniques, such as the Bayesian method of Rykoff et al. (2008) and Rozo & Rykoff (2014). Figure 12 demonstrates the Tx-n200 scaling relation, finding α = −0.08 ± 0.09 and β = 0.43 ± 0.03 with a scatter σ_{ln Tx | ln n200} = 0.14. Again, as an illustration only, a clear correlation can be observed, highlighting the validity of the GMPhoRCC richness as an optical proxy for cluster properties.
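As a rough illustration of fitting such a power law, the sketch below performs an ordinary least-squares fit in log space. This is a deliberately simpler stand-in for the BCES method used in the paper, and it skips the stacking in richness bins. The assumed form ln Tx = α + β ln n200 (with no pivot richness), the function name and the synthetic data are all assumptions made for illustration only.

```python
import numpy as np

def fit_power_law(n200, t_x):
    """Ordinary least-squares fit of ln(T_X) = alpha + beta * ln(n200).

    Assumed functional form; NOT the BCES estimator used in the paper.
    """
    x = np.log(np.asarray(n200, dtype=float))
    y = np.log(np.asarray(t_x, dtype=float))
    beta, alpha = np.polyfit(x, y, 1)      # polyfit returns the slope first
    residuals = y - (alpha + beta * x)
    scatter = residuals.std(ddof=2)        # two fitted parameters
    return alpha, beta, scatter

# Synthetic, noise-free illustration (not data from the paper).
n200 = np.array([5.0, 10.0, 20.0, 40.0, 80.0])
t_x = np.exp(-0.1) * n200 ** 0.4
alpha, beta, scatter = fit_power_law(n200, t_x)
```

Unlike BCES, this fit ignores measurement errors on both axes, so it should only be read as a picture of the log-space regression, not as a way to reproduce the quoted α, β or scatter.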

SDSS-like Mocks
While comparisons with existing catalogues are a useful tool to evaluate a characterisation method, these can only take us so far. Existing methods are subject to their own strengths, weaknesses and selection functions, hence comparisons with a controlled 'truth' are considered with the use of mock galaxy clusters. Mock clusters can be constructed with known redshifts, richnesses and CMRs either through simulations (Cai et al. 2009, Murphy et al. 2012, Song et al. 2012) or empirically (Koester et al. 2007, Hao et al. 2010).
Figure 11. The distribution of the redshift comparisons for the clean subset, highlighting the substantial improvement offered by GMPhoRCC over XCS, providing more accurate estimates with a lower scatter around the spectroscopic redshift. Although the detection subset has a few more extreme outliers, a much greater fraction than from XCS agree within |zphot − zspec|/(1 + zspec) < 0.01, with the clean subset attaining the highest fraction in this band. In addition to providing more accurate redshifts, the estimates are independent of any colour-redshift model as employed by XCS.
SDSS-like empirical mocks are constructed for use with GMPhoRCC by adding artificial clusters to field galaxies, derived from existing cluster detections and SDSS optical data. This has the advantage of producing mocks tailored to match the available photometry, allowing the specific evaluation of GMPhoRCC for the optical data.
Artificial clusters were generated by resampling galaxies from existing red sequences and BCGs to reproduce five main aspects of real clusters.
(i) A suitable BCG
(ii) Radial profile
(iii) Redshift distribution
(iv) Luminosity function
(v) CMRs / Colour distributions
As these are dependent on the properties of the cluster it is necessary to resample from red sequences which best match the target mock. Rather than using a small number of well observed seed clusters, red sequences were identified and stacked in redshift/richness space in order to provide a source of galaxies suitable for a range of mock properties.
Using GMPhoRCC, 10,000 very clean red sequences were identified from the C4, GMBCG, REFLEX, NORAS and XCS catalogues with very good agreement between spectroscopic and GMPhoRCC redshifts, |zRS − zspec| < 0.005. By separately stacking the BCG and red sequence galaxies of these clusters in redshift/richness bins, a larger source is produced to sample cluster properties than from considering these individually. Stacking many clusters in this way ensures that each bin is dominated by the red sequence, where the bulk properties are representative of a cluster with the bin redshift and richness.
While the available richnesses were fixed by the original clusters, extrapolation by adding a fixed ∆z to each galaxy allowed a larger redshift range to be sampled. Photometry was then adjusted with K+e corrections to account for evolution and observations at different redshifts. K-corrections were performed using KCORRECT v4.2 from Blanton & Roweis (2007) with evolutionary corrections taken from Koester et al. (2007). Colour evolution models from Tojeiro et al. (2011) were also considered but provided no significant deviation from the main results of this section.
When extrapolating to much higher redshifts, care is needed to reproduce appropriate errors. This mainly affects high redshift artificial clusters, which should possess larger errors than the low redshift seed due to the fainter photometry. To reproduce appropriate errors, a sample of ∼500,000 red sequence galaxies is used to model the various error distributions. Photometric errors are modelled as a function of band magnitude, while redshift errors are modelled as a function of i-band magnitude and redshift. For high redshift extrapolation, a new error is drawn from this distribution, with magnitudes and redshifts updated by randomly shuffling about this error. Although these errors depend on a number of things including seeing, this method reproduces sensible results, providing good agreement with existing high redshift clusters as shown in Figure 13.
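The error resampling can be pictured with the following minimal sketch, which covers only the one-dimensional case of photometric errors binned by magnitude. The function names, the bin edges, the toy reference sample and the choice of a Gaussian perturbation to represent "shuffling about this error" are all assumptions made for illustration; the paper does not specify these details.

```python
import numpy as np

def draw_errors(mags, ref_mags, ref_errs, bin_edges, rng):
    """For each magnitude, draw an error from reference galaxies in the same magnitude bin."""
    mags = np.asarray(mags, dtype=float)
    ref_errs = np.asarray(ref_errs, dtype=float)
    ref_bins = np.digitize(ref_mags, bin_edges)
    new_errs = np.empty_like(mags)
    for k, b in enumerate(np.digitize(mags, bin_edges)):
        pool = ref_errs[ref_bins == b]
        if pool.size == 0:
            raise ValueError(f"no reference galaxies in magnitude bin {b}")
        new_errs[k] = rng.choice(pool)
    return new_errs

def perturb(values, errs, rng):
    """Scatter values with Gaussian noise of the given 1-sigma errors (assumed form)."""
    return rng.normal(loc=values, scale=errs)

# Toy reference sample (hypothetical values, not SDSS data).
rng = np.random.default_rng(0)
ref_mags = np.array([18.2, 18.7, 20.1, 20.6, 20.9])
ref_errs = np.array([0.02, 0.03, 0.10, 0.12, 0.15])
edges = np.array([18.0, 19.0, 20.0, 21.0])
errs = draw_errors([18.5, 20.5], ref_mags, ref_errs, edges, rng)
new_mags = perturb(np.array([18.5, 20.5]), errs, rng)
```

Redshift errors, which the text models as a function of both i-band magnitude and redshift, would need a two-dimensional binning along the same lines.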
Generation of the artificial clusters now proceeds as follows.
(i) Randomly select a redshift and richness.
(ii) Select the closest redshift/richness bin.
(iii) Resample, with replacement, a BCG and red sequence.
(iv) Apply a fixed ∆z to extrapolate from bin to mock cluster redshift.
(v) K+e correct photometry.
(vi) Sample suitable errors and reshuffle redshift and photometry.
Finally, to simulate SDSS completeness levels, members with i-band magnitudes fainter than 21 were removed.
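The generation steps above can be sketched as follows. This is a minimal illustration under several assumptions: the data layout (one dictionary of arrays per stacked bin), the function names and the range-scaled distance used to pick the closest bin are hypothetical, the BCG is not treated separately, and the K+e corrections and error resampling of steps (v) and (vi) are omitted.

```python
import numpy as np

def nearest_bin(z, n200, bin_centres):
    """Index of the stacked bin closest to (z, n200); bin_centres has shape (n_bins, 2)."""
    centres = np.asarray(bin_centres, dtype=float)
    # Scale each axis by its range so redshift and richness contribute comparably
    # (an assumed metric; the paper does not define "closest").
    span = np.ptp(centres, axis=0)
    span[span == 0] = 1.0
    d = (centres - np.array([z, n200])) / span
    return int(np.argmin((d ** 2).sum(axis=1)))

def make_mock(z, n200, bin_centres, stacks, rng, i_limit=21.0):
    """Resample n200 members from the closest stack and shift them to redshift z."""
    b = nearest_bin(z, n200, bin_centres)
    stack = stacks[b]                                   # dict with 'z' and 'i_mag' arrays
    idx = rng.integers(0, stack["z"].size, size=n200)   # sampling with replacement
    dz = z - bin_centres[b][0]                          # fixed shift from bin to mock redshift
    members = {"z": stack["z"][idx] + dz, "i_mag": stack["i_mag"][idx].copy()}
    # K+e corrections and error resampling would be applied here (omitted).
    keep = members["i_mag"] <= i_limit                  # completeness cut
    return {k: v[keep] for k, v in members.items()}

# Toy stacks (hypothetical values).
rng = np.random.default_rng(1)
bin_centres = np.array([[0.1, 10.0], [0.3, 10.0], [0.3, 40.0]])
stacks = [
    {"z": np.full(50, 0.1), "i_mag": np.linspace(16.0, 20.0, 50)},
    {"z": np.full(50, 0.3), "i_mag": np.linspace(18.0, 22.0, 50)},
    {"z": np.full(50, 0.3), "i_mag": np.linspace(18.0, 22.0, 50)},
]
mock = make_mock(z=0.35, n200=12, bin_centres=bin_centres, stacks=stacks, rng=rng)
```

Because the photometric corrections are left out, the magnitudes in this toy example do not dim with redshift as they would in the real procedure.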
Appropriate backgrounds for these artificial clusters are constructed by removing the red sequence from the original 10,000 fields considered by GMPhoRCC. The list of backgrounds is binned in the same redshift/richness space as used previously, according to the properties of the cluster. Real backgrounds are assigned by randomly selecting from the bin which best matches the properties of the artificial cluster. A total of 8745 mocks were prepared with 0.05 < z < 1.1 and 5 ≤ n200 < 75 by randomly inserting the artificial cluster within 3 arc minutes of the centre of the assigned backgrounds. This method has the advantage of modelling the background as a function of the local neighbourhood, where the background densities encountered would be typical for the given properties of the artificial cluster.

Richness Consistency
With the use of multiple red sequence bands it is necessary to ensure the GMPhoRCC richness estimate is consistent across the large redshift ranges considered, which is indeed confirmed by analysis of the artificial clusters. Figure 14 shows how the GMPhoRCC estimated richness of artificial clusters, generated at z = 0.1, evolves as these are extrapolated across 0.05 < z < 1.1. While incompleteness results in a loss of richness above z = 0.45, the GMPhoRCC estimate is consistent at lower redshifts. In addition, the luminosity estimate has been shown to extrapolate into regions with incomplete photometry.

Comparison with Mocks
With the SDSS mocks, direct comparisons of GMPhoRCC estimates to 'true' cluster values are possible. In order to assess the accuracy and bias of GMPhoRCC, the i-band < 21 cut on mock members is not used; the analysis of the full effect of incomplete photometry is deferred to a study of completeness in Section 4.6. Of the 7050 mocks, estimates were found for 99.2 percent; these are compared to the cluster values in the left panel of Figure 15. Redshift comparisons agree with those from real spectroscopic clusters, where the GMPhoRCC estimates are unbiased with the majority achieving |zRS − zmock|/(1 + zmock) < 0.01. In addition, the clean subset attains a larger fraction within this bound and less contamination from outliers, again highlighting the value of the quality control system.
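The redshift comparison statistic used here can be illustrated with a short sketch. The function names and the toy values are assumptions; in particular, taking the scatter as the plain standard deviation of the normalised residuals is an assumption, since the paper may use a different estimator.

```python
import numpy as np

def redshift_residuals(z_est, z_true):
    """Normalised residuals (z_est - z_true) / (1 + z_true)."""
    z_est = np.asarray(z_est, dtype=float)
    z_true = np.asarray(z_true, dtype=float)
    return (z_est - z_true) / (1.0 + z_true)

def summarise(z_est, z_true, bound=0.01):
    """Mean residual (bias), standard deviation (scatter) and fraction within the bound."""
    r = redshift_residuals(z_est, z_true)
    return {
        "bias": float(r.mean()),
        "scatter": float(r.std()),
        "fraction_within": float(np.mean(np.abs(r) < bound)),
    }

# Toy values (not results from the paper): three good estimates and one outlier.
z_true = [0.1, 0.2, 0.3, 0.4]
z_est = [0.105, 0.2, 0.29, 0.47]
stats = summarise(z_est, z_true)
```

In this toy case three of the four estimates fall inside the 0.01 bound, so `stats["fraction_within"]` is 0.75, while the single outlier dominates both the bias and the scatter.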
Richness comparisons, presented in the right panel of Figure 15, confirm that the GMPhoRCC estimate is unbiased with |n200−count − n200−mock| = 0.01 ± 0.005 and |n200−lum − n200−mock| = −0.03 ± 0.02. In addition, it is clear that both the counting and luminosity function methods are able to adequately recover cluster richness. An accurate richness estimate is far more challenging to determine than redshift, as is evident from the larger scatter. These difficulties arise due to the discreteness of n200 and the sensitivity to discrepancies in redshift, r200, ngals, BCG identification, CMR modelling and projection effects. In addition, the luminosity method is subject to a larger scatter as a result of the extra complexity and uncertainty introduced by fitting and integrating a luminosity function.

Purity
Although the target clusters for GMPhoRCC have already been detected in other wavebands (e.g. X-ray), it is important to understand purity when using the code to optically confirm a candidate or in cases where the candidate list may be contaminated. By using random real backgrounds only, purity is estimated as the fraction of fields where no cluster was detected, i.e. detections in this case are impurities. This only tackles the issue of false detections; the validity of the various GMPhoRCC estimates, which may be incorrect for a number of reasons including projection effects, is assessed further in Section 4.6. Table 6 presents GMPhoRCC purity results, which represent the probability that a candidate is in fact a cluster given that it was assigned a particular quality marker and richness. Very few spurious characterisations are found with high quality or richness, i.e. these have the highest probability of representing real clusters. Of particular note is the fact that candidates belonging to the clean subset, q ≥ 3, have a negligible probability of resulting from a false detection. GMPhoRCC attains extremely high levels of purity compared with maxBCG, which attains ∼93 percent for clusters with n200 = 10 and ∼99 percent for n200 = 15, and with GMBCG, which attains purity levels of ∼75 percent for n200 > 10 and ∼97 percent for n200 > 25.

Completeness
One of the most important properties to evaluate is completeness; this gives a measure of how well clusters are characterised across a range of redshifts and richnesses. Completeness is measured as the fraction of mock clusters where the estimated properties agree with the actual value within a given bound. In order to estimate the optical selection function, completeness is considered with respect to redshift, richness and BCG matching.
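As a concrete picture of this definition, the sketch below computes redshift completeness in bins of mock redshift. The function name, the bin edges and the toy values are assumptions; failed characterisations are represented by −1, following the default output value described in Table A6, so they automatically fall outside the bound.

```python
import numpy as np

def completeness(z_est, z_mock, z_edges, bound=0.03):
    """Fraction of mocks per z_mock bin with |z_est - z_mock| < bound (NaN for empty bins)."""
    z_est = np.asarray(z_est, dtype=float)
    z_mock = np.asarray(z_mock, dtype=float)
    ok = np.abs(z_est - z_mock) < bound
    which = np.digitize(z_mock, z_edges) - 1   # bin index of each mock
    out = np.full(len(z_edges) - 1, np.nan)
    for b in range(len(z_edges) - 1):
        in_bin = which == b
        if in_bin.any():
            out[b] = ok[in_bin].mean()
    return out

# Toy values (not results from the paper); -1 marks a failed characterisation.
z_mock = [0.10, 0.15, 0.30, 0.35, 0.60]
z_est = [0.11, 0.25, 0.31, 0.33, -1.0]
frac = completeness(z_est, z_mock, z_edges=[0.0, 0.2, 0.4, 0.8])
```

Restricting the inputs to mocks in a given richness band, or to those with a given quality marker, gives the other fractions discussed in the following subsections.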

Redshift Recovery
Using the fraction of the clean subset which attains the bound |zRS − zmock| < 0.03, comparable to typical SDSS photometric redshift errors, Figure 16 highlights completeness as a function of both richness and redshift, with full results shown in Table 7. For zmock < 0.5, the majority of the GMPhoRCC estimates are in very good agreement with the mock value, with high levels of completeness attained. Above this point, photometry incompleteness results in difficulties in modelling the red sequence, resulting in lower completeness. In addition to this, limitations in field area and poor contrast against the background for low redshift clusters, z < 0.1, make the red sequence more difficult to isolate and model, resulting in a lower fraction of clusters with good redshift estimates. As expected, low richness clusters suffer from lower completeness due to the difficulties in modelling sparse data sets. In addition, these are seen to be more susceptible to photometry cuts, resulting in an earlier reduction in completeness.
Extending this completeness analysis by considering the subset of clusters with a given q and zRS, the accuracy of the GMPhoRCC redshift is estimated. Shown in the right panel of Figure 16 is the fraction of these subsets attaining the redshift bound, which represents the probability that a cluster with a specific q and zRS achieves |zRS − zmock| < 0.03.
Figure 16. Left panel: The fraction of the clean subset of mock clusters for different richness bands where the redshift estimate is within |zRS − zmock| < 0.03. Low richness clusters are more sensitive to incomplete photometry due to the already low number of galaxies. Isolating the red sequence and estimating redshift is more challenging for low richness clusters than their high richness counterparts at the same redshift. Hence the ability to reliably estimate cluster redshift drops more quickly with redshift for groups than rich clusters. Due to this difficulty the clean set is also subject to an earlier reduction in completeness and a lower fraction of low richness clusters across all redshifts. Right panel: The fraction of mock clusters with a given q which achieve the |zRS − zmock| < 0.03 bound. In addition to the clean subset, q ≥ 3, achieving a very high probability (> 97 percent) that the redshift estimate is within 0.03 of the mock value, those with lower quality for zRS < 0.45 have low probabilities (< 25 percent), again showing the ability of the quality marker to identify and remove potential outliers. The sparse number of galaxies above z = 0.45 in the SDSS DR10 and the mock background results in a low chance of spurious high redshift estimates, hence given zRS > 0.45 there is a larger probability the redshift is associated with the cluster. In addition, those with good high redshift estimates are more likely to be flagged as low richness due to the incomplete photometry. This leads to higher probabilities the estimate is associated with the cluster than expected for q < 3.
In addition to the clean subset, q ≥ 3, achieving a very high probability (> 97 percent) that the redshift estimate is within 0.03 of the mock value, those with lower quality for z < 0.45 have low probabilities (< 25 percent), again showing the ability of the quality subsets to identify and remove potential outliers. The sparse number of galaxies above z = 0.45 in the SDSS DR10 and the mock background results in a low chance of spurious high redshift estimates, hence given zRS > 0.45 there is a larger probability the redshift is associated with the cluster. In addition, those with good high redshift estimates are more likely to be flagged as low richness due to the incomplete photometry. This leads to higher probabilities the estimate is associated with the cluster than expected for q < 3. While adjustments could be made to the quality subsets to take advantage of this increased probability, it is noted that for z > 0.45 the lower quality mainly results from low numbers of galaxies due to incompleteness, and hence the current quality subsets are necessary to maintain cleanliness for both redshift and richness estimates. A full set of probabilities for each quality marker and several bounds is presented in Table 8.
Table 8. A list of the probabilities that a redshift estimate is within various bounds of the actual value given the zRS estimate and q value of the cluster, where ∆z = |zRS − zmock|. The increase in probability for low quality high redshift clusters is clear. The sparse number of galaxies above z = 0.45 in the SDSS DR10 and the mock background results in a low chance of spurious high redshift estimates, hence given zRS > 0.45 there is a larger probability the redshift is associated with the cluster than expected for those with q < 3.

Richness Recovery
While two richnesses are investigated by GMPhoRCC, n200 best represents an optical mass proxy, considering galaxies within a characteristic radius rather than the fixed aperture of ngals, and hence is the subject of this section. With the extra ngals step, additional sources of error are introduced, and with n200 highly sensitive to correct CMR modelling, BCG selection and redshift, richness attains a much larger spread about the mock value; thus relatively large completeness bounds are considered. Figure 17 highlights richness completeness as the fraction of the clean subset where the counting richness is within 25 percent of the mock value. Completeness in both the counting and luminosity estimates tails off above z = 0.45 due to incomplete photometry, where cluster galaxies become too faint for reliable detection. It is noted that the luminosity method is able to extrapolate richness, resulting in a slower reduction with redshift and higher completeness than the counting method for z > 0.45.
As stated in the previous sections, low richness clusters are difficult to model and analyse due to difficulties in fitting distributions to a small number of galaxies, and this is reflected in the lower completeness rates. In addition to this, background fluctuations, which in this case can cause as much as an 80% discrepancy due to discreteness, further reduce completeness. Table 9 summarises and extends these results to the luminosity richness, n200−lum.
Again completeness is considered with respect to subsets with a given q and zRS to estimate the accuracy of the GMPhoRCC richness. Shown in the right panel of Figure 17 is the fraction of these subsets which attain the richness bound, representing the probability that a cluster with a specific q and zRS has a richness estimate within 25 percent of the mock value. A full set of probabilities for each quality marker and several bounds is presented in Table 10. In addition to the clean subset, q ≥ 3, achieving a very high probability (> 80 percent) that the richness estimate is within 25 percent of the mock value, those with lower quality markers have low probabilities (< 15 percent), again showing the ability of the quality subsets to identify and remove potential outliers. In addition, while the counting richness has negligible probability of matching the mock at high redshift (0.5 < zRS < 0.8), the luminosity method is clearly able to extrapolate, achieving a 30 percent probability that the richness is within 25 percent of the mock value.

BCG Identification
Identifying the correct BCG is not only hugely important for subsequent cosmology but also for calculating cluster richness. This analysis considers two scenarios, one where the BCG is correctly identified and one where any cluster member is selected as the BCG. While correctly matching the BCG shows the strongest evidence GMPhoRCC has suitably modelled the red sequence, even matching to a cluster member suggests the CMR is a reasonable representation of the cluster.
Mismatching the BCG results from two main issues, background interlopers and poor red sequence modelling. While mismatching to a background galaxy is easier to find with the quality markers due to inconsistencies in redshift, matching to another cluster member can be more challenging to identify. Figure 18 shows the fraction of the clean subset of mocks where the BCG has been correctly matched. As photometry becomes incomplete, issues with fitting the red sequence due to the lower number of galaxies give rise to the lower fraction matched above z = 0.5. In addition to this, the difficulty in modelling the red sequence at low redshift, z < 0.1, due to poor contrast against the background and limitations in the field area, results in a smaller fraction of these mocks with a correctly matched BCG. Again it is expected that a smaller fraction of low richness clusters have suitably determined CMRs due to the difficulty in modelling a sparse number of galaxies, and this is reflected in the lower BCG match rates. Table 11 summarises and extends these results to correct BCG matching and cluster member BCG matching for each of the quality subsets.
Again considering the BCG matching fractions with respect to subsets with a given q and zRS gives an estimate of the probability that the BCG has been correctly matched given the cluster is consistent with the subset. The right panel of Figure 18 shows these fractions for clusters where the BCG has been correctly identified. A full set of probabilities for each quality marker and the different BCG sources is presented in Table 12. In addition to the clean subset, q ≥ 3, achieving a very high probability (> 90 percent) that the BCG has been correctly identified, those with lower quality for z < 0.45 have low probabilities (< 10 percent), again showing the ability of the quality subsets to identify and remove cases where the red sequence has not been well modelled. The sparse number of galaxies above z = 0.45 in the SDSS DR10 and the mock background results in a low chance of spurious high redshift estimates, hence given zRS > 0.45 there is a larger probability the CMR and BCG are associated with the cluster. In addition, those with suitable CMRs at high redshifts are more likely to be flagged as low richness due to the incomplete photometry. This leads to higher probabilities the CMR and BCG are associated with the cluster than expected for q < 3. Again no adjustments are made to the quality subsets, since incomplete photometry becomes an issue for zRS > 0.45, with the current subsets necessary to maintain cleanliness for both redshift and richness estimates.
Figure 17. [...] In addition, low richness clusters are more susceptible to this decline due to modelling the already low number of galaxies. In addition to this, n200 is highly sensitive to errors in redshift, BCG selection and ngals, resulting in lower completion rates than with redshift. Right panel: The fraction of mock clusters with a given q where the n200 estimate was within 25 percent of the original value. In addition to the clean subset, q ≥ 3, achieving a high probability (> 80 percent) that the richness estimate is within 25 percent of the mock value, those with lower quality have very low probabilities (< 15 percent), again showing the ability of the quality subsets to identify and remove potential outliers.
Figure 18. Left panel: The fraction of the clean subset of mock clusters for different richness bands where the BCG has been correctly identified. Above z = 0.5, incompleteness in photometry results in difficulties fitting the red sequence, resulting in the reduction of matching rates. In addition, with the difficulty in modelling low richness clusters, these suffer from earlier declines and lower matching rates. Right panel: The fraction of mock clusters with a given q where the BCG has been correctly identified. In addition to the clean subset, q ≥ 3, achieving a very high probability (> 90 percent) that the BCG has been correctly identified, those with lower quality for z < 0.45 have low probabilities (< 10 percent), again showing the ability of the quality subsets to identify and remove cases where the red sequence has not been well modelled. Again the sparse number of galaxies above z = 0.45 in the SDSS DR10 and the mock background results in a low chance of spurious high redshift estimates, hence given zRS > 0.45 there is a larger probability the CMR and BCG are associated with the cluster. In addition, those with suitable CMRs at high redshifts are more likely to be flagged as low richness due to the incomplete photometry. This leads to higher probabilities the CMR and BCG are associated with the cluster than expected for q < 3.

DISCUSSION AND CONCLUSIONS
Presented in this paper is the Gaussian Mixture full Photometric Red sequence Cluster Characteriser (GMPhoRCC), which is designed to take previously detected cluster candidates and provide an optical confirmation and characterisation based on the red sequence. GMPhoRCC has been designed specifically to attain estimates of redshift, richness and the red sequence CMR, and offers many advantages over existing algorithms including treatment of multi-modal distributions, treatment of a variable width full CMR red sequence, richness extrapolation and quality control. One of the most important features developed is the flag and quality control procedure. By flagging issues, particularly low richness and inconsistent red sequence and BCG redshifts, potential catastrophic failures can be identified and removed from cleaner subsets. Comparisons with other characterisation methods highlight the advantages of GMPhoRCC. Using a sample of 4501 clusters taken from the GMBCG (Hao et al. 2010), NORAS (Böhringer et al. 2000), REFLEX (Böhringer et al. 2004) and XCS (Mehrtens et al. 2012) catalogues, GMPhoRCC redshift estimates are compared to spectra, showing low scatter (σ_{δz/(1+z)} ∼ 0.026) around the actual value. In addition, applying the quality control to produce a clean subset removes most outliers, giving a much tighter agreement, σ_{δz/(1+z)} ∼ 0.017, and showing significant improvement over maxBCG, σ_{δz/(1+z)} ∼ 0.025, and XCS, σ_{δz/(1+z)} ∼ 0.050. The high accuracy of GMPhoRCC is also demonstrated with a significant percentage (∼75%) of all redshift estimates from the clean subset agreeing within |zRS − zspec| < 0.01.
While analysing known clusters provides useful feedback, comparisons with those with known properties are far more valuable, hence the remaining evaluation of GMPhoRCC proceeded with the use of empirical mock galaxy clusters. These mocks were produced by stacking red sequence galaxies from existing clusters, analysed using data from the Sloan Digital Sky Survey (SDSS), in redshift-richness bins from which new sequences are resampled. This extends the similar approach of maxBCG and GMBCG where only rich clusters are used as seeds to generate mocks with a range of properties. Assessment of the optical selection function proceeded with the consideration of completeness, the fraction of mocks with characterisations within given bounds of the actual value. First, incomplete photometry, simulated by an i-band < 21 cut, is shown to remove members for clusters with z > 0.45. Redshift completeness, the fraction of clusters within 0.03 of the mock value, is not immediately hindered by the photometry, attaining 93% for 0.05 < z < 0.62 for clusters with a richness greater than 20. With the large scatters in the estimates, richness attains lower completeness rates, mostly due to projection effects and background fluctuations as also noted by Hao et al. (2010). The fraction of clusters within 25% of the mock value, defining completeness, is measured as 91% for 0.07 < z < 0.45 for clusters with richness greater than 20, 78% for those with richness between 10 and 20, and 64% for those with richnesses less than 10.
Table 12. A list of the probabilities that the BCG has been matched to various sources given the cluster has a specific quality marker and redshift estimate. In both scenarios the clean set demonstrates the highest probability that the red sequence has been suitably modelled.
Additionally, evaluation with mocks has confirmed the value of the quality control system, showing a high probability that, given a cluster is in the clean set, the redshift and richness estimates are within a given bound of the mock value. Most importantly, it was shown that those with lower quality markers, indicating less confidence in the characterisation, show much smaller probabilities, confirming that the quality control is effective in identifying potential catastrophic failures.
Name                     Value          Description
[...]                    [...]          Multiple peaks in initial z distribution - relative heights < 5
MULTI INITIAL AMBIGUOUS  0x00000000020  Multiple peaks in initial z distribution - relative heights < 2
MULTI INITIAL CLOSE      0x00000000040  Primary and secondary peak within 0.1 of each other
MULTI COLOUR             0x00000000100  Multiple peaks in the colour distribution - relative heights < 5
MULTI COLOUR AMBIGUOUS   0x00000000200  Multiple peaks in the colour distribution - relative heights < 2
MULTI COLOUR CLOSE       0x00000000400  Primary and secondary peak within 0.2 mag of each other
MULTI ZRS                0x00000001000  Multiple peaks in the RS z fit - relative heights < 5
MULTI ZRS AMBIGUOUS      0x00000002000  Multiple peaks in the RS z fit - relative heights < 2
MULTI ZRS CLOSE          0x00000004000  Primary and secondary peak within 0.1 of each other
Table A2. A list of GMPhoRCC flags indicating issues with the redshift or richness estimates which give the strongest indication an estimate may be erroneous. Here ∆zcp and ∆zcs are respectively photometric and spectroscopic redshift consistency bounds where, for the SDSS DR10, ∆zcp = 0.035 and ∆zcs = 0.025.

Name                    Value          Description
SPARCE INITIAL          0x00000010000  < 5 Galaxies found in the cluster region for the initial z fit
SPARCE COLOUR           0x00000020000  < 5 Galaxies found in the cluster region for the colour fit
SPARCE ZRS              0x00000040000  < 5 Galaxies found in the cluster region for the RS z fit
LOW RICHNESS N200 3     0x00000100000  Low counting richness recovered, n200−count < 3
INCONSISTENT Z PHOT     0x00000200000  zRS and zBCG−phot are inconsistent with each other, |zRS − zBCG−phot| > ∆zcp
INCONSISTENT Z SPEC     0x00000400000  zRS and zBCG−spec are inconsistent with each other, |zRS − zBCG−spec| > ∆zcs
LOW RICHNESS N200 1     0x00001000000  Low counting richness recovered, n200−count < 1
INCONSISTENT Z PHOT 2X  0x00002000000  zRS and zBCG−phot are inconsistent with each other, |zRS − zBCG−phot| > 2∆zcp
INCONSISTENT Z SPEC 2X  0x00004000000  zRS and zBCG−spec are inconsistent with each other, |zRS − zBCG−spec| > 2∆zcs
Table A3. A list of GMPhoRCC flags relating to the non-detection of a cluster overdensity.

Name                         Value          Description
CLUSTER INSIDE MASK 0 5 MPC  0x00010000000  Empty apertures found inside r < 0.5 h^−1 Mpc of cluster centre
CLUSTER INSIDE MASK R200     0x00020000000  Empty apertures found inside r < r200 of cluster centre
CLUSTER INSIDE MASK 5 AM     0x00040000000  Empty apertures found inside r < 5′ of cluster centre
NO OVERDENSITY INITIAL       0x00100000000  No overdensity found in the cluster region for the initial z fit
NO OVERDENSITY COLOUR        0x00200000000  No overdensity found in the cluster region for the colour fit
NO OVERDENSITY ZRS           0x00400000000  No overdensity found in the cluster region for the RS z fit
NO CLUSTER INITIAL           0x01000000000  0 Galaxies found in the cluster region for the initial z fit
NO CLUSTER COLOUR            0x02000000000  0 Galaxies found in the cluster region for the colour fit
NO CLUSTER ZRS               0x04000000000  0 Galaxies found in the cluster region for the RS z fit
NO DETECTION REDSHIFT        0x10000000000  No detection in redshift module
NO DETECTION RICHNESS NGALS  0x20000000000  No detection in richness, ngals < 0 for both counting and luminosity
NO DETECTION RICHNESS N200   0x40000000000  No detection in richness, n200 < 0 for both counting and luminosity
NO COVERAGE                  0x80000000000  No optical coverage
This paper has been typeset from a TEX/LATEX file prepared by the author.
Table A4. A list of the quality markers assigned to clusters based on the GMPhoRCC flags.

Quality  Flags value                            Description
−1       0x80000000000 ≤ flags                  no optical coverage
0        0x01000000000 ≤ flags < 0x80000000000  no characterisation found
1        0x00001000000 ≤ flags < 0x01000000000  n200 < 1, large redshift inconsistencies, field masking issues
2        0x00000100000 ≤ flags < 0x00001000000  n200 < 3, small redshift inconsistencies
3        flags < 0x00000100000                  clean
The cleanest subset, removing the majority of outliers, i.e. removing clusters with low richness and discrepancies between redshift estimates.
Table A6. A list of outputs generated by GMPhoRCC using SDSS DR10 photometry. These properties are given for the primary and secondary cluster, with the latter denoted by a ' sec' suffix. In the case where GMPhoRCC was unable to determine a property, a default value of −1 is used. While the redshift labels are specific to SDSS DR10, these can be adjusted to match any optical input.

Name                           Description
band                           The red sequence colour used to detect the cluster. 0 = $g - r$, 1 = $r - i$, 2 = $i - z$.
size                           The angular radius in arc minutes of the initial aperture used to model the red sequence.
z initial                      The position of the peak of the initial redshift distribution.
z initial peak                 The size of the peak of the initial redshift distribution (galaxies arcmin$^{-2}$).
z initial errorm(p)            1-sigma error on the peak in the 'minus' ('positive') direction.
z initial info                 A flag based on how the error was determined. 0 = no issues, 1 = extrapolation needed due to multiple peaks.
rs colour (peak,error,info)    The position, amplitude and error of the peak in the initial red sequence colour distribution.
z rs (peak,error,info)         The position, amplitude and error of the peak in the red sequence photometric redshift distribution.
BCG objID                      The objID of the BCG.
BCG dis                        The angular distance in arc minutes of the BCG from the cluster centre.
z BCG best (err)               The best redshift of the BCG with error: spectroscopic if available, photometric otherwise.
z BCG phot (err)               The photometric redshift of the BCG.
z BCG spec (err)               The spectroscopic redshift of the BCG.
z gals spec (err)              A spectroscopic cluster redshift based on the spectra of the 5 brightest galaxies on the red sequence.
z gals spec no                 The number of galaxies available with spectra.
cmr grad (err)                 The gradient of the red sequence CMR.
cmr intercept (err)            The intercept of the red sequence CMR.
cmr width                      The intrinsic width of the red sequence CMR.
ngals count (err)              $n_{\rm gals-count}$, the background-subtracted number of galaxies inside $0.5h^{-1}$ Mpc on the red sequence, with Poissonian error.
ngals lum (err)                $n_{\rm gals-lum}$, the background-subtracted richness inside $0.5h^{-1}$ Mpc from integrating a LF, with Poissonian error.
r200 mpch-1                    $r_{200}$ in $h^{-1}$ Mpc.
n200 count (err)               $n_{200-{\rm count}}$, the background-subtracted number of galaxies inside $r_{200}$ on the red sequence, with error.
n200 lum (err)                 $n_{200-{\rm lum}}$, the background-subtracted richness inside $r_{200}$ from integrating a LF, with error.
flags                          A hexadecimal combination of the GMPhoRCC flags.
q                              The quality marker based on the GMPhoRCC flags.
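
As an illustration of how the flags and q outputs relate, the following minimal Python sketch decodes a combined flags value into the individual flags listed above and assigns a quality marker from the thresholds of Table A4. It is not part of GMPhoRCC: the function names (parse_flags, decode_flags, quality_marker) and the dictionary FLAGS are hypothetical, the dictionary holds only the flags tabulated in this part of the appendix, and it is assumed that flags is stored as a hexadecimal string or an integer.

```python
from typing import List

# Partial flag table: only the bits tabulated above (an assumption of this sketch).
# Lower-valued GMPhoRCC flags are not included here.
FLAGS = {
    0x00010000000: "CLUSTER INSIDE MASK 0 5 MPC",
    0x00020000000: "CLUSTER INSIDE MASK R200",
    0x00040000000: "CLUSTER INSIDE MASK 5 AM",
    0x00100000000: "NO OVERDENSITY INITIAL",
    0x00200000000: "NO OVERDENSITY COLOUR",
    0x00400000000: "NO OVERDENSITY ZRS",
    0x01000000000: "NO CLUSTER INITIAL",
    0x02000000000: "NO CLUSTER COLOUR",
    0x04000000000: "NO CLUSTER ZRS",
    0x10000000000: "NO DETECTION REDSHIFT",
    0x20000000000: "NO DETECTION RICHNESS NGALS",
    0x40000000000: "NO DETECTION RICHNESS N200",
    0x80000000000: "NO COVERAGE",
}


def parse_flags(text: str) -> int:
    """Convert a hexadecimal string such as '0x00210000000' to an integer."""
    return int(text, 16)


def decode_flags(flags: int) -> List[str]:
    """Return the names of all tabulated flags whose bit is set."""
    return [name for bit, name in FLAGS.items() if flags & bit]


def quality_marker(flags: int) -> int:
    """Assign the quality marker q using the flag ranges of Table A4."""
    if flags >= 0x80000000000:
        return -1  # no optical coverage
    if flags >= 0x01000000000:
        return 0   # no characterisation found
    if flags >= 0x00001000000:
        return 1   # n200 < 1, large redshift inconsistencies, masking issues
    if flags >= 0x00000100000:
        return 2   # n200 < 3, small redshift inconsistencies
    return 3       # clean


if __name__ == "__main__":
    value = parse_flags("0x00210000000")
    print(decode_flags(value))
    print(quality_marker(value))
```

Because the ranges in Table A4 are contiguous in flag value, the marker in this sketch is set by the most significant flag present, so a single severe flag lowers q regardless of which lower-valued flags accompany it.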