Overestimation of effect sizes and lack of reproducibility are problems endemic to medical and psychological research (Ioannidis, 2005a/b, 2008; Open Science Collaboration, 2015; Ware & Munafo, 2014), and have received increasing attention in the field of neuroimaging in recent years (see Pernet & Poline, 2015; Poldrack & Poline, 2015; Ioannidis et al., 2014). Given that small sample sizes are a source of low reproducibility in neuroimaging (Button et al., 2013), many neuroimaging researchers have called for a move toward big data and analysis approaches developed for data science.

Tools such as machine learning are now frequently used in neuroimaging, but they were not developed with neuroimaging data in mind, and their suitability for such data has not been established.
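One concrete reason for caution can be sketched with synthetic data (the setup below is purely illustrative and is not drawn from any of the studies listed here): neuroimaging datasets typically have far more features (voxels) than subjects, and in that regime a linear decoder can fit arbitrary labels perfectly on the training set while performing at chance on new data.

```python
import numpy as np

rng = np.random.default_rng(42)
n_train, n_test, p = 40, 200, 5000          # few subjects, many voxel-like features
X_train = rng.standard_normal((n_train, p))  # pure-noise "imaging" data
X_test = rng.standard_normal((n_test, p))
y_train = rng.choice([-1.0, 1.0], n_train)   # labels carry no signal at all
y_test = rng.choice([-1.0, 1.0], n_test)

# Minimum-norm least-squares "decoder": with p >> n the training system is
# underdetermined, so the solution interpolates the training labels exactly
# and training accuracy is perfect by construction.
w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
train_acc = (np.sign(X_train @ w) == y_train).mean()
test_acc = (np.sign(X_test @ w) == y_test).mean()
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

Training accuracy is 100% even though the features contain no information about the labels, while held-out accuracy hovers around 50% chance. This is why out-of-sample validation, rather than in-sample fit, is the relevant benchmark for neuroimaging prediction.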

Our Research

The following publications from our group evaluate how machine learning methods are being used with neuroimaging data, and what potential issues may arise:

When optimism hurts: inflated predictions in psychiatric neuroimaging.

The clinical added value of imaging: A perspective from outcome prediction.

Neuromarkers for mental disorders: Harnessing population neuroscience. (in press)


We have empirically evaluated the extent to which sample size, the number of neuroimaging variables, the choice of machine learning algorithm, feature selection, regularization, and bootstrap aggregation affect the accuracy of results. Our findings are reported in the following works:

A method for the optimisation of feature selection with imaging data.

Reproducible prediction for large neuroimaging data (in preparation)

Much of the code we use for our machine learning projects is shared on GitHub.
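To illustrate one of the issues we study, here is a minimal NumPy sketch of how feature selection performed outside cross-validation inflates accuracy estimates. All names, sample sizes, and the nearest-centroid classifier are illustrative choices for this sketch, not taken from our published analyses: the data are pure noise, so any accuracy above chance is optimism.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 60, 5000, 10           # subjects, voxel-like features, features kept
X = rng.standard_normal((n, p))  # pure-noise "imaging" data
y = np.repeat([1, -1], n // 2)   # balanced, arbitrary labels
rng.shuffle(y)

def top_k_features(Xtr, ytr, k):
    """Indices of the k features most correlated with the labels."""
    Xc = Xtr - Xtr.mean(axis=0)
    yc = ytr - ytr.mean()
    r = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12)
    return np.argsort(r)[-k:]

def nearest_centroid_acc(Xtr, ytr, Xte, yte):
    """Fit a nearest-centroid classifier on (Xtr, ytr); return accuracy on (Xte, yte)."""
    c_pos = Xtr[ytr == 1].mean(axis=0)
    c_neg = Xtr[ytr == -1].mean(axis=0)
    d = np.linalg.norm(Xte - c_pos, axis=1) - np.linalg.norm(Xte - c_neg, axis=1)
    pred = np.where(d < 0, 1, -1)
    return (pred == yte).mean()

folds = np.array_split(rng.permutation(n), 5)

def cv_accuracy(select_inside_cv):
    accs = []
    for te in folds:
        tr = np.setdiff1d(np.arange(n), te)
        if select_inside_cv:
            idx = top_k_features(X[tr], y[tr], k)  # selection sees training folds only
        else:
            idx = top_k_features(X, y, k)          # selection "peeks" at the test fold
        accs.append(nearest_centroid_acc(X[tr][:, idx], y[tr], X[te][:, idx], y[te]))
    return float(np.mean(accs))

leaky_acc = cv_accuracy(select_inside_cv=False)  # inflated: well above chance on noise
proper_acc = cv_accuracy(select_inside_cv=True)  # honest: near the 50% chance level
print(f"selection outside CV: {leaky_acc:.2f}  |  inside CV: {proper_acc:.2f}")
```

Running the sketch shows the leaky pipeline scoring well above chance on data with no signal, while nesting the selection inside the cross-validation loop brings the estimate back to roughly 50%, which is one way small samples and large feature sets combine to produce inflated predictions.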


Our aim is to establish the best tools for machine learning/prediction analyses with neuroimaging data, and to evaluate the impact of sample size and feature set size on accuracy.