Neural Netw. 2001 Dec;14(10):1419-29.

Upper bound of the expected training error of neural network regression for a Gaussian noise sequence.

Author information

  • 1Faculty of Physics Engineering, Mie University, Tsu, Japan.


In neural network regression problems, often treated as additive noise models, the NIC (Network Information Criterion) has been proposed as a general model selection criterion for determining the optimal network size with high generalization performance. Although NIC is derived by asymptotic expansion, it has been pointed out that this technique cannot be applied when the target function lies in the assumed family of networks but the family is not minimal for representing the target function, i.e. the overrealizable case, in which NIC reduces to the well-known AIC (Akaike Information Criterion) and related criteria, depending on the loss function. Because NIC is an unbiased estimator of the generalization error based on the training error, the expectations of these errors must be derived for such cases. This paper gives upper bounds on the expectation of the training error with respect to the distribution of the training data, which we call the expected training error, for several types of networks under the squared error loss. In the overrealizable case the errors are determined by how well the network fits the noise component contained in the data, so the target data set is taken to be a Gaussian noise sequence. For radial basis function networks and 3-layered neural networks with a bell-shaped activation function in the hidden layer, the expected training error is bounded above by σ*² - 2nσ*² log T / T, where σ*² is the noise variance, n is the number of basis functions or hidden units, and T is the number of data. Furthermore, for 3-layered neural networks with a sigmoidal activation function in the hidden layer, we obtain an upper bound of σ*² - O(log T / T) when n > 2. If the number of data is large enough, these bounds on the expected training error are smaller than σ*² - N(n)σ*²/T as evaluated by NIC, where N(n) is the number of all network parameters.
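A minimal numerical sketch of the setting the abstract describes (an assumed illustration, not the paper's experiment): fit a radial basis function expansion to a pure Gaussian noise sequence of length T by least squares and compare the resulting training error with the abstract's bound σ*² - 2nσ*² log T / T. Note that the bound concerns the expected training error of a fully trained network; here the centers and width are fixed at random values for simplicity, so a single run need not fall below the bound.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 1000          # number of data points
n = 5             # number of basis functions (hidden units)
sigma2 = 1.0      # noise variance sigma*^2

# Target data: a pure Gaussian noise sequence, as in the overrealizable case.
x = np.linspace(0.0, 1.0, T)
y = rng.normal(0.0, np.sqrt(sigma2), T)

# Gaussian (bell-shaped) basis functions with random fixed centers,
# plus an intercept column; width 0.1 is an arbitrary choice.
centers = rng.uniform(0.0, 1.0, n)
width = 0.1
Phi = np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2 * width ** 2))
Phi = np.column_stack([np.ones(T), Phi])

# Least-squares fit of the output weights (the network is linear in these).
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
train_err = np.mean((y - Phi @ w) ** 2)

# Upper bound on the *expected* training error from the abstract.
bound = sigma2 - 2 * n * sigma2 * np.log(T) / T
print(f"training error = {train_err:.4f}, bound = {bound:.4f}")
```

With only the n output weights optimized, the training error stays close to σ*²(T - n)/T; the sharper σ*² - 2nσ*² log T / T behaviour in the paper comes from additionally adapting the basis functions (centers, widths) to the noise.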

[PubMed - indexed for MEDLINE]