Abstract:
In Computational Learning Theory, one way to model a concept is as an element
of a Cartesian product of instance domains (sets), where each domain may
be binary or multi-valued. A concept class is a set of concepts,
and the goal of a learning algorithm is to identify the target concept in a concept
class from a small number of examples, i.e., labeled instances. This thesis studies
multi-label concept classes and three important learning complexity parameters for
these classes.
The first parameter examined in this work is the Vapnik-Chervonenkis-dimension
(VCD) and its previously studied analogues for multi-label concept classes such as
the Graph-dimension, Pollard's pseudo-dimension and the Natarajan dimension. This
thesis also extends the study of sample compression schemes to multi-label concept
classes. A sample compression scheme (SCS) for a concept class C compresses every
set S of labeled examples for some concept in C to a subset, which is decompressed
to some concept that is consistent with S. The size of the SCS is the cardinality of
its largest compressed set. The best possible size of an SCS is the second parameter
studied in this thesis. This work formulates a sufficient condition, which we call the reduction
property, for a notion of VC-dimension (VCD) to yield labeled compression schemes
for maximum classes of VCD d in which the compression sets have size at most
d. This compression scheme is in fact an extension of Floyd and Warmuth's binary scheme to the multi-label case. The same condition also yields a so-called tight SCS,
which we define to generalize Kuzmin and Warmuth's unlabeled binary scheme to the
multi-label case. The Graph dimension satisfies our sufficient condition, while neither
Pollard's pseudo-dimension nor the Natarajan dimension does. Moreover, this thesis
shows that any class of Graph-dimension 1 has an SCS of size 1.
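To make the compression/decompression interplay concrete, here is a minimal toy illustration (not a scheme from this thesis): for the binary class consisting of the empty concept plus all singletons over n instances, keeping at most the single positively labeled example gives an SCS of size 1. All names below are hypothetical.

```python
# Toy SCS of size 1 for the class {empty concept} U {all singletons}
# over n binary instances. A sample is a list of (instance_index, label) pairs
# labeled consistently by some concept in the class.

def compress(sample):
    """Keep the (at most one) positively labeled example; size <= 1."""
    return [(x, y) for (x, y) in sample if y == 1][:1]

def decompress(comp, n):
    """Reconstruct a concept (a full label vector) from the compressed set:
    all-zero by default, with the stored positive example restored."""
    concept = [0] * n
    for (x, y) in comp:
        concept[x] = y
    return concept

n = 5
sample = [(0, 0), (2, 1), (4, 0)]           # labeled by the singleton at index 2
comp = compress(sample)                     # [(2, 1)]
c = decompress(comp, n)                     # [0, 0, 1, 0, 0]
assert all(c[x] == y for (x, y) in sample)  # decompression is consistent with S
```

The decompressed concept need not equal the original target concept; it only has to agree with the given sample, which is exactly the consistency requirement in the SCS definition above.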
The third parameter studied in this thesis is the recursive teaching dimension
(RTD), a complexity parameter of the recently introduced recursive teaching model,
which is of interest in the study of binary concept classes because of its connection
to the VCD. In the recursive teaching model, teacher and learner agree to start
the learning process with the easiest concept in the class, i.e., a concept with the
minimum teaching dimension (the minimum number of labeled examples needed to identify
the concept among all other concepts in the class). This concept is then removed
from the concept class, a concept with the minimum teaching dimension in the
remaining class is taught and removed next, and the process continues until no concepts
are left in the class. The resulting ordering is called a recursive teaching plan, and the
recursive teaching dimension of the class is the largest minimum teaching dimension
encountered in the plan.
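The recursive procedure just described can be sketched in a few lines of brute-force code. This is an illustrative sketch under the definitions above, not an algorithm from the thesis; concepts are represented as label tuples over a fixed set of instances, and all function names are hypothetical.

```python
from itertools import combinations

def teaching_dim(c, cls):
    """Minimum number of labeled examples (here, instance indices together
    with c's labels) that distinguish c from every other concept in cls.
    Brute force over index sets of increasing size."""
    others = [d for d in cls if d != c]
    n = len(c)
    for k in range(n + 1):
        for idxs in combinations(range(n), k):
            if all(any(d[i] != c[i] for i in idxs) for d in others):
                return k
    return n  # unreachable when concepts are distinct

def recursive_teaching_dim(cls):
    """RTD: repeatedly remove a concept of minimum teaching dimension in the
    remaining class; return the largest such minimum encountered."""
    cls, rtd = list(cls), 0
    while cls:
        tds = [teaching_dim(c, cls) for c in cls]
        best = min(range(len(cls)), key=lambda i: tds[i])
        rtd = max(rtd, tds[best])
        cls.pop(best)
    return rtd

C = [(0, 0), (0, 1), (1, 0)]        # three binary concepts on two instances
print(recursive_teaching_dim(C))    # prints 1
```

In the example class, (0, 1) and (1, 0) each have teaching dimension 1 while (0, 0) has teaching dimension 2; once either of the first two is removed, every remaining concept can be taught with at most one example, so the RTD is 1.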
This thesis establishes a further connection between RTD and VCD in both the
multi-label and binary cases, by providing an upper bound on the size of classes
of a given RTD, analogous to Sauer's bound on the size of classes of a given VCD.
Maximum (largest possible) classes of a given VCD are proven to be RTD-maximum
as well. It is shown that for any VCD-maximum class C where VCD fulfills the
reduction property, there is a teaching plan for C in which the recursive teaching
sets coincide with the compression sets resulting from a tight compression scheme.
Methodologically, an algebraic approach turns out to be useful, for example to prove
an interesting graph-theoretic property of VCD-maximum classes.
Description:
A Thesis Submitted to the Faculty of Graduate Studies and Research In Partial Fulfillment of the Requirements For the Degree of Doctor of Philosophy In Computer Science, University of Regina. xi, 155 p.