Overestimation of effect sizes and lack of reproducibility are problems endemic to all medical and psychological research (Ioannidis, 2005a, 2005b, 2008; Open Science Collaboration, 2015; Ware & Munafò, 2014), and have received increasing attention in the field of neuroimaging in recent years (see Pernet & Poline, 2015; Poldrack & Poline, 2015; Ioannidis et al., 2014). Given that small sample sizes are a source of low reproducibility in neuroimaging (Button et al., 2013), many neuroimaging researchers have called for a move toward big data and analysis approaches developed for data science.
Machine learning methods are now used frequently in neuroimaging, but these methods were not developed with neuroimaging data in mind, and their suitability for such data has not been established.
The following publications from our group evaluate how machine learning methods are being used with neuroimaging data, and what potential issues may arise:
Neuromarkers for mental disorders: Harnessing population neuroscience. (in press)
We have empirically evaluated the extent to which sample size, the number of neuroimaging variables, the choice of machine learning algorithm, feature selection, regularization, and bootstrap aggregation affect the accuracy of results. Our findings are reported in the following works:
Reproducible prediction for large neuroimaging data (in preparation)
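The kind of evaluation described above — prediction accuracy as a function of sample size when most features carry no signal, as is typical of high-dimensional neuroimaging data — can be sketched on synthetic data. This is an illustrative simulation only, not our actual analysis pipeline; the `simulate` and `nearest_centroid_cv` helpers are hypothetical names introduced here for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(n, p, n_informative=10, effect=0.5):
    """Two-class data with p features, of which only n_informative carry signal."""
    y = rng.integers(0, 2, size=n)
    X = rng.standard_normal((n, p))
    X[:, :n_informative] += effect * y[:, None]  # shift informative features for class 1
    return X, y

def nearest_centroid_cv(X, y, folds=5):
    """Cross-validated accuracy of a simple nearest-centroid classifier."""
    idx = np.arange(len(y))
    accs = []
    for k in range(folds):
        test = idx[k::folds]
        train = np.setdiff1d(idx, test)
        mu0 = X[train][y[train] == 0].mean(axis=0)  # class centroids from training folds
        mu1 = X[train][y[train] == 1].mean(axis=0)
        d0 = ((X[test] - mu0) ** 2).sum(axis=1)
        d1 = ((X[test] - mu1) ** 2).sum(axis=1)
        pred = (d1 < d0).astype(int)  # assign each test point to the nearer centroid
        accs.append((pred == y[test]).mean())
    return float(np.mean(accs))

# Accuracy as a function of sample size, with the feature count fixed at 200:
for n in (40, 160, 640):
    X, y = simulate(n, p=200)
    print(n, round(nearest_centroid_cv(X, y), 2))
```

With the feature count held constant, cross-validated accuracy rises toward the signal ceiling as the sample size grows, because the class centroids are estimated more precisely relative to the noise contributed by the uninformative features.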
Much of the code we use for our machine learning projects is shared on GitHub.
Our goal is to establish the best tools for machine learning/prediction analyses with neuroimaging data and to evaluate the impact of sample size and feature set size on accuracy.