Applications of multivariate statistical techniques in chemical analysis

Doctoral Dissertation uoadl:1308904

Unit:
Analytical Chemistry Specialization
Library of the School of Science
Deposit date:
2012-07-10
Year:
2012
Author:
Φαρμάκη Ελένη
Dissertation committee:
Κ. Ευσταθίου (Professor, NKUA), Ν. Θωμαΐδης (Assistant Professor, NKUA), Μ. Κουππάρης (Professor, NKUA)
Original Title:
Εφαρμογές πολυπαραμετρικών στατιστικών τεχνικών στη χημική ανάλυση
Languages:
Greek
Summary:
This thesis investigated the application of multivariate techniques to large
classification databases, aiming at their theoretical presentation and
comparison and at conclusions regarding their field of application, handling,
capabilities and limitations.
Unsupervised techniques such as Principal Component Analysis/Factor Analysis
(PCA/FA) and Cluster Analysis (CA), and supervised ones such as Discriminant
Analysis (DA), Classification Trees (CTs) and Artificial Neural Networks (ANNs),
were used. Emphasis was placed on CTs and ANNs (three methods of the former and
three architectures of the latter were studied). Their advantages,
disadvantages and particularities were exploited and the classification models
were optimized. All the techniques were compared with each other in terms of
their results (the percentage of samples correctly classified) on three
databases, which concerned the determination of a) metals and metalloids in
the three reservoirs used for the water supply of Athens (Iliki, Mornos and
Marathon), b) metals, metalloids and nutrients in marine sediments from large
aquaculture sites in the country, and c) rare earth elements (REE) in olive oil
samples from different regions.
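The comparison criterion named above, the percentage of samples correctly classified, can be computed from true and predicted class labels. The following is a minimal sketch: the reservoir names come from the summary, but the label lists are made-up values for illustration only (not thesis data), and `percent_correct` is a hypothetical helper name.

```python
from collections import Counter

def percent_correct(y_true, y_pred):
    """Return overall and per-class percentages of correctly classified samples."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    totals = Counter(y_true)                                   # samples per true class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class
    per_class = {c: 100.0 * hits[c] / n for c, n in totals.items()}
    overall = 100.0 * sum(hits.values()) / len(y_true)
    return overall, per_class

# Made-up labels for illustration only (not results from the thesis).
y_true = ["Iliki", "Iliki", "Mornos", "Mornos", "Marathon", "Marathon"]
y_pred = ["Iliki", "Mornos", "Mornos", "Mornos", "Marathon", "Iliki"]
overall, per_class = percent_correct(y_true, y_pred)
```

Reporting the per-class figures alongside the overall one shows whether a technique separates some classes much better than others.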
Although DA is a parametric multivariate technique with many restrictions on
its implementation, it responded to the needs of all the problems and always
provided an initial evaluation of them (whether linear or non-linear
discrimination was feasible, judged from the canonical plot of the analysis, and
an initial evaluation of the variables). The percentages of correct
classification it provided were frequently comparable to those of the most
sophisticated techniques. CTs, with three different methods and considerable
flexibility (they offered many parameters for trials and optimization),
achieved high percentages using few or many variables (usually more than
ANNs), constructing reproducible models that generalized. ANNs proved to be a
particularly flexible technique, capable of efficient variable evaluation
and applicable to both simple and complicated databases, approximating linear
and non-linear functions. Robust and flexible models were constructed. However,
over-training phenomena seemed to plague ANNs, and careful handling was needed
to avoid them.
The available samples were split into three sets: in addition to the usual
training set, validation and test sets were used. In this way, over-training
was identified immediately (so that training was automatically interrupted),
and, moreover, the models were tested on new “unknown” samples, so that their
generalization ability was checked. Sample sets were split randomly (as the
recent literature recommends) or on the basis of DA pre-treatment (a method
that had not been used before). Moreover, the simplest structures were used,
with few parameters (variables, weights) and few processing units (neurons).
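The three-set scheme described above can be sketched as follows. This is an illustration under stated assumptions, not the thesis's implementation: the summary does not name the software, network architectures or stopping rule, so the sketch uses a single logistic neuron, synthetic two-class data, a 60/20/20 random split and a hypothetical `patience` parameter. Training is interrupted once the validation loss stops improving, and the untouched test set gives the generalization estimate.

```python
import math
import random

def three_way_split(n_samples, frac_train=0.6, frac_val=0.2, seed=0):
    """Randomly partition sample indices into training, validation and test sets."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_train = round(frac_train * n_samples)
    n_val = round(frac_val * n_samples)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

def predict(w, x):
    """Single logistic neuron; the last entry of w is the bias."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(w, X, y, idx):
    """Mean cross-entropy of the neuron on the samples listed in idx."""
    eps = 1e-12
    total = 0.0
    for i in idx:
        p = predict(w, X[i])
        total -= y[i] * math.log(p + eps) + (1 - y[i]) * math.log(1 - p + eps)
    return total / len(idx)

def train_early_stopping(X, y, train, val, lr=0.1, max_epochs=200, patience=10):
    """Stochastic gradient descent, interrupted when validation loss stalls."""
    w = [0.0] * (len(X[0]) + 1)
    best_w, best_loss, best_epoch = list(w), float("inf"), 0
    for epoch in range(1, max_epochs + 1):
        for i in train:
            err = predict(w, X[i]) - y[i]
            for j, xj in enumerate(X[i]):
                w[j] -= lr * err * xj
            w[-1] -= lr * err
        val_loss = log_loss(w, X, y, val)
        if val_loss < best_loss:
            best_w, best_loss, best_epoch = list(w), val_loss, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs: stop training
    return best_w, best_epoch, epoch

def accuracy(w, X, y, idx):
    """Percentage of samples in idx that are correctly classified."""
    hits = sum((predict(w, X[i]) >= 0.5) == (y[i] == 1) for i in idx)
    return 100.0 * hits / len(idx)

# Synthetic, overlapping two-class data (illustrative only, not thesis data).
rng = random.Random(1)
X, y = [], []
for label, centre in ((0, -1.0), (1, 1.0)):
    for _ in range(50):
        X.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)])
        y.append(label)

train_idx, val_idx, test_idx = three_way_split(len(X))
weights, best_epoch, last_epoch = train_early_stopping(X, y, train_idx, val_idx)
test_acc = accuracy(weights, X, y, test_idx)  # estimate on "unknown" samples
```

The weights returned are those from the epoch with the lowest validation loss, so the test-set percentage is computed on samples that influenced neither the weights nor the stopping point.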
Keywords:
Multivariate techniques, Classification trees, Artificial Neural Networks, Overfitting phenomena, Classification models
Index:
Yes
Number of index pages:
15-23, 279-282, 363-366
Contains images:
Yes
Number of references:
406
Number of pages:
452
document.pdf (5 MB)
attachments.zip (2 MB)
File access is restricted.