We present analyses of data augmentation for machine learning redshift estimation. Data augmentation makes a training sample more closely resemble a test sample, if the two base samples differ, in order to improve measured statistics of the test sample. We perform two sets of analyses by selecting 800 000 (1.7 million) Sloan Digital Sky Survey Data Release 8 (Data Release 10) galaxies with spectroscopic redshifts. We construct a base training set by imposing an artificial r-band apparent magnitude cut to select only bright galaxies, and then augment this base training set by using simulations and by applying the k-correct package to artificially place training set galaxies at a higher redshift. We obtain redshift estimates for the remaining faint galaxy sample, which is not used during training. We find that data augmentation reduces the error on the recovered redshifts by 40 per cent in both sets of analyses, when compared to the difference in error between the ideal case and the non-augmented case. The outlier fraction is also reduced by at least 10 per cent and up to 80 per cent using data augmentation. We finally quantify how the recovered redshifts degrade as one probes to deeper magnitudes past the artificial magnitude limit of the bright training sample. We find that at all apparent magnitudes explored, the use of data augmentation with tree-based methods provides an estimate of the galaxy redshift with a low value of bias, although the error on the recovered redshifts increases as we probe to deeper magnitudes. These results have applications for surveys in which the spectroscopic training set forms a biased sample of all photometric galaxies, for example if the spectroscopic detection magnitude limit is shallower than the photometric limit.

Catalogues, surveys, galaxies: distances and redshifts

INTRODUCTION

Photometric surveys can be maximally exploited for large-scale structure analyses once galaxies have been identified and their positions on the sky and in redshift space have been measured. Measuring accurate spectroscopic redshifts is costly and time intensive, and is typically only performed for a small subsample of all galaxies. In particular, the spectroscopic sample is often a biased sample of the full photometric galaxy catalogue, because the limiting magnitude at which a spectroscopic redshift can be measured is shallower than the limiting magnitude at which a galaxy can be identified photometrically. This paper examines how the spectroscopic training set can be augmented (or complemented) to span an input feature space that more closely resembles that of the full photometric galaxy sample, to improve redshift estimates using machine learning.

Photometric redshifts can also be estimated by parametric techniques, for example from galaxy spectral energy distribution (SED) templates. Some templates encode our knowledge of stellar population models, which result in predictions for the evolution of galaxy magnitudes and colours. The parametric encoding of the complex stellar physics, coupled with the uncertainty in the parameters of the stellar population models, combines to produce redshift estimates which are little better than many non-parametric techniques (see e.g. ... 2013, for an overview of different techniques).
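The bookkeeping described in the abstract (a bright training sample selected by an r-band magnitude cut, summary statistics on the recovered redshifts, and an improvement quoted relative to the gap between the non-augmented and ideal cases) can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the paper: the record fields `r_mag` and `z_spec`, the function names, and the outlier threshold of 0.15 are illustrative assumptions. The reading of the quoted 40 per cent as "fraction of the base-to-ideal gap that is closed" is one plausible interpretation of the abstract's wording, not a confirmed definition.

```python
from statistics import mean, pstdev


def split_by_magnitude(galaxies, r_cut):
    """Split records into a bright (training) and a faint (test) sample.

    Each record is assumed to be a dict with a hypothetical 'r_mag' key;
    smaller magnitudes are brighter.
    """
    bright = [g for g in galaxies if g["r_mag"] < r_cut]
    faint = [g for g in galaxies if g["r_mag"] >= r_cut]
    return bright, faint


def redshift_metrics(z_pred, z_spec, outlier_threshold=0.15):
    """Return (bias, scatter, outlier fraction) of the redshift residuals.

    The threshold value is an illustrative assumption, not the paper's.
    """
    residuals = [p - s for p, s in zip(z_pred, z_spec)]
    bias = mean(residuals)
    scatter = pstdev(residuals)
    n_outliers = sum(abs(r) > outlier_threshold for r in residuals)
    return bias, scatter, n_outliers / len(residuals)


def relative_improvement(err_base, err_augmented, err_ideal):
    """Fraction of the base-to-ideal error gap closed by augmentation."""
    gap = err_base - err_ideal
    if gap <= 0:
        raise ValueError("the base error must exceed the ideal error")
    return (err_base - err_augmented) / gap
```

Under this reading, a relative improvement of 0.4 would mean that augmentation recovers 40 per cent of the error difference between training on bright galaxies only and training on a sample that is representative of the faint galaxies.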