Area of Excellence
|
Academic and research excellence + scientific innovation (patent)
|
|
|
Published Research
|
|
Paper (1):
|
|
Paper title:
|
A sparse Gaussian process framework for photometric
redshift estimation
|
Link to paper:
|
Click
Here
|
Publication date:
|
21 November
2015
|
Abstract:
|
Accurate
photometric redshifts are a lynchpin for many future experiments to pin down
the cosmological model and for studies of galaxy evolution. In this study, a
novel sparse regression framework for photometric redshift estimation is
presented. A synthetic data set simulating the Euclid survey and real data from
SDSS DR12 are used to train and test the proposed models. We show that
approaches which include careful data preparation and model design offer a
significant improvement in comparison with several competing machine learning
algorithms. Standard implementations of most regression algorithms use the
minimization of the sum of squared errors as the objective function. For
redshift inference, this induces a bias in the posterior mean of the output
distribution, which can be problematic. In this paper, we directly minimize
the target metric Δz = (z_s − z_p)/(1 + z_s) and address the bias problem via a
distribution-based weighting scheme, incorporated as part of the optimization
objective. The results are compared with other machine learning algorithms in
the field such as artificial neural networks (ANN), Gaussian processes (GPs)
and sparse GPs. The proposed framework reaches a mean absolute Δz =
0.0026(1 + z_s), over the redshift range of 0 ≤ z_s ≤ 2 on
the simulated data, and Δz = 0.0178(1 + z_s) over the entire redshift range on
the SDSS DR12 survey, outperforming the standard ANNz used in the literature.
We also investigate how the relative size of the training sample affects the
photometric redshift accuracy. We find that a training sample of >30 per
cent of the total sample size provides little additional constraint on the
photometric redshifts, and note that our GP formalism strongly outperforms
ANNz in the sparse data regime for the simulated data set.
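As a rough illustration of the target metric quoted in the abstract, the following Python sketch computes the normalized residual Δz = (z_s − z_p)/(1 + z_s) and its mean absolute value. It assumes z_s is the reference (spectroscopic) redshift and z_p the photometric estimate; the function names and sample values are hypothetical and are not taken from the paper.

```python
def normalized_residuals(z_spec, z_phot):
    """Return dz = (z_s - z_p) / (1 + z_s) for each pair of redshifts."""
    if len(z_spec) != len(z_phot):
        raise ValueError("z_spec and z_phot must have the same length")
    return [(zs - zp) / (1.0 + zs) for zs, zp in zip(z_spec, z_phot)]


def mean_absolute_dz(z_spec, z_phot):
    """Mean of |dz| over the sample."""
    dz = normalized_residuals(z_spec, z_phot)
    if not dz:
        raise ValueError("at least one redshift pair is required")
    return sum(abs(d) for d in dz) / len(dz)


# Hypothetical values, for illustration only.
z_spec = [0.0, 1.0, 3.0]
z_phot = [0.1, 0.8, 3.4]
print(mean_absolute_dz(z_spec, z_phot))
```

Dividing by (1 + z_s) means the same absolute error counts for less at high redshift, which is why the abstract quotes accuracy in units of (1 + z_s).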
|
|
|
Paper (2):
|
|
Paper title:
|
GPz: non-stationary
sparse Gaussian processes for heteroscedastic uncertainty estimation in photometric
redshifts
|
Link to paper:
|
Click
Here
|
Publication date:
|
11 July
2016
|
Abstract:
|
The next generation of cosmology experiments will be
required to use photometric redshifts rather than spectroscopic redshifts.
Obtaining accurate and well-characterized photometric redshift distributions
is therefore critical for Euclid, the Large Synoptic Survey Telescope and the
Square Kilometre Array. However, determining accurate variance predictions
alongside single point estimates is crucial, as they can be used to optimize
the sample of galaxies for the specific experiment (e.g. weak lensing, baryon
acoustic oscillations, supernovae), trading off between completeness and
reliability in the galaxy sample. The various sources of uncertainty in
measurements of the photometry and redshifts put a lower bound on the
accuracy that any model can hope to achieve. The intrinsic uncertainty
associated with estimates is often non-uniform and input-dependent, commonly
known in statistics as heteroscedastic noise. However, existing approaches
are susceptible to outliers and do not take into account variance induced by
non-uniform data density and in most cases require manual tuning of many
parameters. In this paper, we present a Bayesian machine learning approach
that jointly optimizes the model with respect to both the predictive mean and
variance, which we refer to as Gaussian processes for photometric redshifts (GPZ).
The predictive variance of the model takes into account both the variance due
to data density and photometric noise. Using the Sloan Digital Sky Survey
(SDSS) DR12 data, we show that our approach substantially outperforms other
machine learning methods for photo-z estimation and their associated
variance, such as TPZ and ANNZ2. We provide MATLAB and Python
implementations that are available to download at
https://github.com/OxfordML/GPz.
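To make the idea of heteroscedastic (input-dependent) noise concrete, the sketch below evaluates a standard Gaussian negative log-likelihood in which every prediction carries its own variance. This is a generic textbook objective used here only for illustration; it is not the GPz objective, and the function names and numbers are hypothetical.

```python
import math


def gaussian_nll(y, mean, variance):
    """Negative log-likelihood of y under a normal N(mean, variance)."""
    if variance <= 0.0:
        raise ValueError("variance must be positive")
    return 0.5 * (math.log(2.0 * math.pi * variance) + (y - mean) ** 2 / variance)


def total_nll(targets, means, variances):
    """Sum of per-point negative log-likelihoods."""
    return sum(gaussian_nll(y, m, v) for y, m, v in zip(targets, means, variances))


# Hypothetical data: the last target is an outlier relative to the prediction.
targets = [0.50, 0.52, 2.00]
means = [0.50, 0.50, 0.50]

# Same variance everywhere: the outlier dominates the loss.
homoscedastic = total_nll(targets, means, [0.01, 0.01, 0.01])
# Per-point variance: a large predicted variance down-weights the outlier.
heteroscedastic = total_nll(targets, means, [0.01, 0.01, 2.25])

print(homoscedastic, heteroscedastic)
```

Because the squared error is divided by the predicted variance, a point assigned a large variance pulls less on the mean, while the logarithmic term penalizes inflating the variance everywhere.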
|
|
|
Scientific Conferences:
|
|
Conference (1):
|
|
Conference title:
|
The Euclid UK meeting
|
Date held:
|
17 – 18
December 2015
|
Location:
|
Edinburgh, UK
|
Type of participation:
|
Oral presentation
|
Presentation title:
|
Sparse
Gaussian Framework for Photometric Redshift Estimation with Input-Dependent
Variance Prediction
|
Presentation abstract:
|
It is
vital to produce reliable confidence intervals alongside single point
estimates of photometric redshift. The various sources of uncertainty (and noise)
in measurements and subsequent calculations of photometry and spectroscopy
put a lower bound on the accuracy that any model can hope to achieve. The
results of many machine learning approaches proposed in past years seem to
indicate that this limit has been reached. As in many complex problems, the
noise is non-uniform and input-dependent. For example, bright sources or
sources with low redshift are more reliably predicted and have lower noise.
Attempts have been made to address this issue, mainly as procedures which
supplement existing point estimates in a two-step process. These approaches,
however, are susceptible to outliers and do not take into account the
variance due to data density and in most cases require manual tuning of many
parameters. In this talk, we present a new Bayesian machine learning approach
that jointly optimizes the model with respect to the predictive mean and
variance. The predictive variance of the model takes into account both the
variance due to data density and noise. Jointly optimizing the predictive
variance has the advantage of dynamically weighing outliers less to minimize
their effect on the predictive mean function. Preliminary results show that
the approach substantially outperforms other machine learning methods for
photo-z pdf estimation, such as TPZ and ANNz2, on the SDSS DR12 survey.
|
|
|
Conference (2):
|
|
Conference title:
|
LSST Dark Energy Science Collaboration
Meeting
|
Date held:
|
18 – 22
July 2016
|
Location:
|
Oxford, UK
|
Type of participation:
|
Oral presentation
|
Presentation title:
|
Photo-z Estimation in the Presence of Input
and Output Uncertainties
|
Presentation abstract:
|
Photometric
magnitudes are often supplied with their associated uncertainties; however,
they are used as additional inputs to the models rather than as proper variances on
the inputs. In this talk, we will present the correct formulation of how to
incorporate these uncertainties in the sparse Gaussian process context and
how it affects the expected outputs and their uncertainties. The results are
demonstrated on the Buzzard data challenge and compared against a random
forest implementation (TPZ) and artificial neural networks (ANNz2).
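A toy example may help show what treating uncertainties as variances on the inputs means. The sketch below propagates independent Gaussian input noise through a linear model, for which the output mean and variance are exact. It is an illustration under those stated assumptions only, not the sparse Gaussian process formulation presented in the talk, and all names and values are hypothetical.

```python
def predict_with_input_noise(weights, bias, input_means, input_variances,
                             output_noise_variance=0.0):
    """Mean and variance of w.x + b when x_i ~ N(mean_i, var_i) independently.

    For a linear model the result is exact:
        mean     = b + sum_i w_i * mean_i
        variance = output_noise + sum_i w_i**2 * var_i
    """
    if not (len(weights) == len(input_means) == len(input_variances)):
        raise ValueError("weights, means and variances must have equal length")
    mean = bias + sum(w * m for w, m in zip(weights, input_means))
    variance = output_noise_variance + sum(
        w * w * v for w, v in zip(weights, input_variances)
    )
    return mean, variance


# Hypothetical two-band example with per-band measurement variances.
mean, variance = predict_with_input_noise(
    weights=[0.5, -0.25],
    bias=0.1,
    input_means=[2.0, 4.0],
    input_variances=[0.04, 0.16],
    output_noise_variance=0.01,
)
print(mean, variance)
```

Here the input variances widen the predictive variance without shifting the mean, whereas passing them in as extra features would leave the model free to use them in any way.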
|
|
|
Conference (3):
|
|
Conference title:
|
The Euclid UK meeting
|
Date held:
|
15 – 16
December 2016
|
Location:
|
London, UK
|
Type of participation:
|
Oral presentation
|
Presentation title:
|
Modeling photometric redshifts with
partially complete and noisy inputs
|
Presentation abstract:
|
Machine
learning methods for photometric redshifts have grown in popularity recently.
Typically, a model is trained on a certain set of features and it is assumed
that during testing all the features will be provided and are drawn from the
same distribution. While this assumption is reasonable for most machine
learning applications, it is not ideal for photometric redshift estimation
since in some cases not all measurements will be available or they might have
varying degrees of uncertainty associated with them. This is especially
the case when attempting to predict photometric redshifts using a model
trained on a different survey or when combining data from different surveys
for training. In this talk, a sparse Gaussian process model will be presented
that can train on inputs that are only partially available, that contain missing
variables or non-detections, or that have varying degrees of uncertainty
associated with them.
|
|
|
Patent:
|
|
Patent title:
|
Automated Text-Evaluation of User Generated
Text
|
Granting authority:
|
United States Patent and Trademark Office
(USPTO)
|
Patent registration date:
|
25 May 2017
|
Patent abstract:
|
A
method for an automated text-evaluation service, and more particularly a
method and apparatus for automatically evaluating text and returning a score
which represents a degree of inappropriate language. The method is
implemented in a computer infrastructure having computer executable code
tangibly embodied in a computer readable storage medium having programming
instructions. The programming instructions are configured to: receive an
input text which comprises an unstructured message at a first computing
device; process the input text according to a string-structure similarity
measure which compares each word of the input text to a predefined dictionary
to indicate whether there is similarity in meaning, and generate an
evaluation score for each word of the input text and send the evaluation
score to another computing device. The evaluation score for each input
message is based on the string-structure similarity measure between each word
of the input text and the predefined dictionary.
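The per-word scoring described above can be sketched in a few lines of Python. The patent abstract does not specify the string-structure similarity measure, so this sketch substitutes the similarity ratio from the standard library's `difflib.SequenceMatcher` as a stand-in; the dictionary, the sample message, and the function names are all hypothetical.

```python
from difflib import SequenceMatcher


def word_score(word, dictionary):
    """Highest similarity ratio (0.0 to 1.0) between word and any dictionary entry."""
    word = word.lower()
    return max(
        (SequenceMatcher(None, word, entry.lower()).ratio() for entry in dictionary),
        default=0.0,
    )


def evaluate_text(text, dictionary):
    """Return a (word, score) pair for each whitespace-separated word in text."""
    return [(w, word_score(w, dictionary)) for w in text.split()]


# Hypothetical dictionary of flagged terms and an obfuscated input message.
flagged_terms = ["idiot", "stupid"]
scores = evaluate_text("you are an idi0t", flagged_terms)
for word, score in scores:
    print(f"{word}: {score:.2f}")
```

A character-level measure of this kind still scores a deliberately misspelled word close to its dictionary entry, which an exact-match lookup would miss.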
|