"Mutual Information, Metric Entropy, and Cumulative Relative Entropy Risk," (1996) (with Manfred Opper)
Submitted to Annals of Statistics.
[472k postscript]
Abstract:
Assume $\{P_\theta: \theta \in \Theta\}$ is a set of probability distributions
with a common dominating measure
on a complete separable metric space $Y$. A state $\theta^* \in \Theta$ is
chosen by Nature. A statistician gets $n$ independent observations
$Y_1, \ldots, Y_n$ from $Y$ distributed according to $P_{\theta^*}$.
For each time $t$ between 1 and $n$, based on the observations $Y_1, \ldots, Y_{t-1}$,
the statistician
produces an estimated distribution $\hat{P}_t$ for $P_{\theta^*}$, and
suffers a loss $L(P_{\theta^*},\hat{P}_t)$.
The cumulative risk for the statistician is the expected total loss up to time $n$.
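In symbols, one way to write this quantity is the following sketch, where the name $R_n$ is introduced here only for illustration and is not taken from the paper:
$$
R_n(\theta^*) \;=\; \sum_{t=1}^{n} \mathbb{E}_{\theta^*}\!\left[ L\big(P_{\theta^*}, \hat{P}_t\big) \right],
$$
with the expectation taken over $Y_1, \ldots, Y_{t-1}$ drawn independently from $P_{\theta^*}$, since $\hat{P}_t$ depends on those observations.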
Of particular interest in information theory, data compression, mathematical finance,
computational learning theory and statistical mechanics is the
case in which the loss $L(P_{\theta^*},\hat{P}_t)$ is the relative entropy between the true distribution $P_{\theta^*}$
and the estimated distribution $\hat{P}_t$.
Here the cumulative Bayes risk from time 1 to $n$ is the mutual information
between the random parameter $\Theta^*$ and the observations $Y_1, \ldots, Y_n$.
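A brief sketch of why this holds, assuming a prior distribution on $\Theta$ from which $\Theta^*$ is drawn and writing $D(\cdot\,\|\,\cdot)$ for relative entropy and $I(\cdot\,;\cdot)$ for mutual information (this notation is assumed here, not quoted from the paper): the expected relative entropy loss at time $t$ is minimized by the Bayes predictive distribution of $Y_t$ given $Y_1, \ldots, Y_{t-1}$, and the chain rule for mutual information then gives
$$
\inf_{\hat{P}_1, \ldots, \hat{P}_n} \; \sum_{t=1}^{n} \mathbb{E}\!\left[ D\big(P_{\Theta^*} \,\big\|\, \hat{P}_t\big) \right]
\;=\; \sum_{t=1}^{n} I\big(\Theta^*; Y_t \mid Y_1, \ldots, Y_{t-1}\big)
\;=\; I\big(\Theta^*; Y_1, \ldots, Y_n\big),
$$
where the expectation is over both $\Theta^*$ and the observations, and the first equality uses the conditional independence of the $Y_t$ given $\Theta^*$.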
New bounds on this mutual information are given in terms of the Laplace
transform of the Hellinger distance between pairs of distributions indexed by parameters in
$\Theta$.
From these, bounds on the cumulative minimax risk are given in terms of the metric entropy
of $\Theta$ with respect to the Hellinger distance. The assumptions required
for these bounds are very general and do not depend
on the choice of the dominating measure. They apply to both finite- and
infinite-dimensional $\Theta$. They also apply in some cases where $Y$ is
infinite-dimensional, where $Y$ is not compact, or where the distributions
are not smooth, and in some parametric cases where asymptotic normality
of the posterior distribution fails.