"How to Use Expert Advice"
Nicoló Cesa-Bianchi, Yoav Freund, David P. Helmbold, David Haussler,
Robert E. Schapire, and Manfred K. Warmuth.
[Postscript]
Abstract:
We analyze algorithms that predict a binary value by combining the
predictions of several prediction strategies, called experts.
Our analysis is for worst-case situations, i.e., we make no assumptions
about the way the sequence of bits to be predicted is generated. We
measure the performance of the algorithm by the difference between
the expected number of mistakes it makes on the bit sequence and the
expected number of mistakes made by the best expert on this sequence,
where the expectation is taken with respect to the randomization in the
predictions. We show that the minimum achievable difference is on the
order of the square root of the number of mistakes of the best expert,
and we give efficient algorithms that achieve this. Our upper and lower
bounds have matching leading constants in most cases. We then show how
this leads to certain kinds of pattern recognition/learning algorithms
with performance bounds that improve on the best results currently
known in this context. We also extend our analysis to the case in
which log loss is used instead of the expected number of mistakes.