Type: Thesis (PhD)
Dimensionality Reduction and Representation for Nearest Neighbour Learning.
PhD thesis, University of Aberdeen.
Full text not available from this archive.
An increasing number of intelligent information agents employ Nearest
Neighbour learning algorithms to provide personalised assistance to
the user. This assistance may be in the form of recognising or locating
documents that the user might find relevant or interesting. To achieve
this, documents must be mapped into a representation that can be
presented to the learning algorithm. Simple heuristic techniques are
generally used to identify relevant terms from the documents. These
terms are then used to construct large, sparse training vectors. The
work presented here investigates an alternative representation based on
sets of terms, called set-valued attributes, and proposes a new family
of Nearest Neighbour learning algorithms that utilise this set-based
representation. The importance of discarding irrelevant terms from
the documents is then addressed, and this is generalised to examine
the behaviour of the Nearest Neighbour learning algorithm on
high-dimensional data sets containing irrelevant values. A variety of selection
techniques used by other machine learning and information retrieval
systems are presented, and empirically evaluated within the context of
a Nearest Neighbour framework. The thesis concludes with a discussion
of ways in which attribute selection and dimensionality reduction
techniques may be used to improve the selection of relevant attributes,
and thus increase the reliability and predictive accuracy of the Nearest
Neighbour learning algorithm.
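
As a rough illustration of the set-based representation described above, the sketch below classifies a document, held as a set of terms, by majority vote among its most similar training documents. The Jaccard similarity, the function names (jaccard, knn_predict), and the toy data are all assumptions made for this example; the abstract does not state which similarity measure or algorithm the thesis actually uses.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity between two term sets (assumed measure, not the thesis's)."""
    if not a and not b:
        return 1.0  # two empty documents are treated as identical
    return len(a & b) / len(a | b)


def knn_predict(train, query, k=3):
    """Return the majority label among the k training examples most similar to query.

    train is a list of (set_of_terms, label) pairs; query is a set of terms.
    """
    # sorted() is stable, so examples with equal similarity keep their original order
    ranked = sorted(train, key=lambda ex: jaccard(ex[0], query), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: each document is a set-valued attribute of its terms
train = [
    ({"nearest", "neighbour", "learning"}, "relevant"),
    ({"attribute", "selection", "learning"}, "relevant"),
    ({"football", "results", "league"}, "irrelevant"),
    ({"league", "transfer", "news"}, "irrelevant"),
]
prediction = knn_predict(train, {"neighbour", "learning", "selection"}, k=3)
```

Representing each document as a set avoids building a long, mostly zero vector over the whole vocabulary: only the terms that actually occur are stored and compared.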