Friday 16 June 2017

Robust model selection for Bootstrap-Aggregated Neural Network regression applied to small, noisy datasets



This is a small piece of work in the area of machine learning. Bootstrap aggregating (bagging) is known to be an effective approach to fitting regression models when the dataset is small and potentially polluted by noise.

The picture given below represents, in blue, a reference 2D function, polluted by white noise and cut by a plane for visualisation purposes. We want to reconstruct this surface from a small sample of function values. We do this by drawing repeatedly from the small dataset, with replacement, and fitting a standard Neural Network (NN) regression to each resample (all the thin red curves). Then all the NN replicates are averaged, which yields the thick black curve. Whilst each individual bootstrap replicate clearly overfits (we have only 86 2D data points), the bootstrap-aggregated regression behaves nicely.
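The bagging loop described above can be sketched as follows. This is a minimal illustration, not the exact setup used for the figure: the test function, noise level, network size and number of replicates are all assumptions chosen for the sketch.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Hypothetical small, noisy 2D dataset (86 points, as in the post;
# the underlying function and noise level are assumptions).
rng = np.random.default_rng(0)
n = 86
X = rng.uniform(-2, 2, size=(n, 2))
y = np.sin(X[:, 0]) * np.cos(X[:, 1]) + rng.normal(0.0, 0.1, size=n)

n_replicates = 10  # number of bootstrap replicates (illustrative)
predictions = []
for b in range(n_replicates):
    # Draw a bootstrap resample: n indices with replacement.
    idx = rng.integers(0, n, size=n)
    # Fit one small feed-forward NN per resample; each replicate
    # on its own tends to overfit the 86 points.
    net = MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000, random_state=b)
    net.fit(X[idx], y[idx])
    predictions.append(net.predict(X))

# Aggregated regression: the pointwise average over all replicates
# (the thick black curve in the figure).
y_bagged = np.mean(predictions, axis=0)
```

Averaging the replicates reduces the variance of the individual, overfitted fits, which is what makes the aggregated curve smooth.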



The 2D plot of the aggregated regression is displayed below.

The novelty of the work lies in the estimation of the prediction error. We rely on an Out-Of-Bag (OOB) approach to estimate the predictive coefficient of determination, and derive a frequentist confidence interval for this statistic through a non-parametric bootstrap. The confidence interval allows us to
- select the number of replicates that leads to a correct estimation of the predictive power of the regression;
- select the optimal number of neurons of the aggregated Neural Network in a robust manner, through early stopping.
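The OOB estimate and its bootstrap confidence interval can be sketched as below. Points not drawn in a given resample are predicted by that replicate; averaging these held-out predictions per point gives an OOB R², and resampling the (residual, target) pairs gives a non-parametric confidence interval. The data, network size and replicate counts are assumptions for illustration, not the paper's settings.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Hypothetical small, noisy dataset (assumption, as in the sketch above).
rng = np.random.default_rng(1)
n = 86
X = rng.uniform(-2, 2, size=(n, 2))
y = np.sin(X[:, 0]) * np.cos(X[:, 1]) + rng.normal(0.0, 0.1, size=n)

n_replicates = 20
oob_sum = np.zeros(n)    # accumulated OOB predictions per point
oob_count = np.zeros(n)  # how many replicates left each point out

for b in range(n_replicates):
    idx = rng.integers(0, n, size=n)           # bootstrap resample
    oob = np.setdiff1d(np.arange(n), idx)      # points NOT drawn this round
    net = MLPRegressor(hidden_layer_sizes=(8,), max_iter=1500, random_state=b)
    net.fit(X[idx], y[idx])
    oob_sum[oob] += net.predict(X[oob])        # held-out predictions only
    oob_count[oob] += 1

seen = oob_count > 0                            # points OOB at least once
residuals = y[seen] - oob_sum[seen] / oob_count[seen]
targets = y[seen]

def r2(res, tgt):
    # Coefficient of determination from residuals and targets.
    return 1.0 - np.sum(res ** 2) / np.sum((tgt - tgt.mean()) ** 2)

q2 = r2(residuals, targets)                     # OOB predictive R^2

# Non-parametric bootstrap CI: resample (residual, target) pairs
# and recompute the statistic many times.
m = residuals.size
boot = [r2(residuals[k], targets[k])
        for k in (rng.integers(0, m, size=m) for _ in range(1000))]
ci_low, ci_high = np.percentile(boot, [2.5, 97.5])
```

The width of `(ci_low, ci_high)` is what drives the two selection rules: add replicates until the interval stabilises, and stop growing the network when the interval no longer shows an improvement.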

