The interplay between sample size and replicability of results in fMRI studies

Han Bossier (UGent) , Ruth Seurinck (UGent) , Sanne Roels (UGent) , Simone Kühn and Beatrijs Moerkerke (UGent)
(2018)
Abstract
1 Introduction
Over the past decades, it has become clear that results from fMRI studies suffer from limited reproducibility (Button et al., 2013; Carp, 2012). Several explanations and solutions have been investigated. Important problems are the high amount of noise in fMRI data and the cost of scanning subjects, which together result in limited power to detect the realistic effect sizes observed in the literature (Poldrack et al., 2017). Large-scale collaborations and databases such as the Human Connectome Project (HCP, N = 900, Van Essen et al., 2013), the UK Biobank (N = 500,000, Sudlow et al., 2015) and the IMAGEN consortium (N = 1500, Schumann et al., 2010) provide opportunities for researchers to study the reproducibility of fMRI results.
Turner, Paul, Miller, & Barbey (2017) use data from the HCP to investigate the replicability of fMRI results. They assess image similarity between fMRI group analyses containing up to 121 subjects. Notably, they observe limited replicability even when the maximum number of subjects is used.
In this project, building on our own work in Bossier et al. (2016), we use a similar approach to investigate replicability. Replicability is defined as the ability to reproduce an entire experiment by gathering new data with the same materials and methods (Patil, Peng, & Leek, 2016). We use the IMAGEN database, which allows for a maximum of 700 subjects per group analysis. Our goal is to further study the influence of increasing sample size on fMRI replicability for voxelwise analyses.

2 Method
The IMAGEN project is a neuroimaging and genetic study on reinforcement-related behaviour in adolescents. The database contains fMRI data from 1500 adolescents. In an event-related design, participants perform a series of cognitive tasks. We use the contrast MATH > LANGUAGE.
The general sampling procedure to create independent images is straightforward. We start with N = 10 and randomly sample N subjects for each of two groups, X and Y. We combine the subjects of each group in a mixed-effects group analysis using FLAME1 from the FSL software library (Smith et al., 2004). We then calculate two measures of replicability (see below). Next, we increase the sample size of each group by 10 and repeat until we reach the maximum of N = 700 subjects per group. We run the entire sampling process for 50 iterations.
We consider two measures. First, we look at the similarity between unthresholded statistical parametric maps containing a test statistic in each voxel: we calculate the Pearson product-moment correlation coefficient between the voxel vectors of group analyses X and Y. Second, we look at the similarity of thresholded maps. These maps are obtained after significance testing in which we control the voxelwise false discovery rate at level 0.05. We then use the binary maps (active or non-active) to calculate the percent overlap of activation (Maitra, 2010).

3 Results
In Figure 1, we plot the average observed Pearson product-moment correlation coefficient between the unthresholded test statistic images of an fMRI experiment and a replication. At N = 60, we already observe a relatively high correlation (> 0.80).
In Figure 2, we plot the average percent overlap of activation for thresholded images. The overlap is limited: it does not exceed 0.4 (40%) at N = 60. Only at N = 200 do we reach an overlap of 0.6 (60%) between two thresholded images.

4 Conclusion
The similarity between a thresholded image and its replication can be limited, whereas the correlation between unthresholded test-statistic images is high. This is promising for applications focusing on prediction or decoding (Gorgolewski et al., 2015). However, large sample sizes are needed to achieve higher spatial overlap of activation between two fMRI replications. Using a different study and an independent database, we confirm the main conclusions of Turner et al. (2017). By studying larger sample sizes, we provide further insight into the interplay between sample size and reproducibility.
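The sampling scheme and the two replicability measures in the Method section can be illustrated with a small simulation. The Python sketch below is not the authors' pipeline and rests on several assumptions: subject-level MATH > LANGUAGE contrasts are replaced by simulated Gaussian values, the FLAME1 mixed-effects group analysis is replaced by a one-sample z-test that uses the known simulation noise level, voxelwise FDR control at 0.05 is implemented with the Benjamini-Hochberg procedure, and percent overlap is computed as the number of jointly active voxels divided by the number of voxels active in either map (one common form; Maitra (2010) should be consulted for the exact definition). The helper names, pool size, number of voxels, effect size and number of iterations are all hypothetical.

```python
import math

import numpy as np


def group_z_map(subject_maps: np.ndarray) -> np.ndarray:
    """Voxelwise one-sample z statistic for an (n_subjects, n_voxels) array.

    Hypothetical stand-in for FLAME1: assumes a known unit noise variance,
    which holds for the simulated data below but not for real fMRI data.
    """
    n = subject_maps.shape[0]
    return subject_maps.mean(axis=0) * math.sqrt(n)


def bh_threshold(z: np.ndarray, q: float = 0.05) -> np.ndarray:
    """Binary active/non-active map with Benjamini-Hochberg FDR control at level q."""
    # Two-sided p-value of a standard normal statistic: erfc(|z| / sqrt(2)).
    p = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])
    m = p.size
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, m + 1) / m
    active = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()  # 0-based index of the largest rank meeting the BH criterion
        active[order[: k + 1]] = True
    return active


def percent_overlap(a: np.ndarray, b: np.ndarray) -> float:
    """Jointly active voxels divided by voxels active in either map (assumed form)."""
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 0.0


def replicate(pool: np.ndarray, n: int, rng: np.random.Generator) -> tuple[float, float]:
    """Draw two disjoint groups X and Y of n subjects and compare their group maps."""
    idx = rng.permutation(pool.shape[0])
    z_x = group_z_map(pool[idx[:n]])
    z_y = group_z_map(pool[idx[n : 2 * n]])
    r = float(np.corrcoef(z_x, z_y)[0, 1])  # similarity of unthresholded maps
    return r, percent_overlap(bh_threshold(z_x), bh_threshold(z_y))


rng = np.random.default_rng(2018)
n_pool, n_vox = 400, 1000  # hypothetical sizes, much smaller than IMAGEN
effect = np.zeros(n_vox)
effect[:100] = 0.3  # hypothetical true effect in 10% of the voxels
pool = effect + rng.standard_normal((n_pool, n_vox))  # simulated subject-level contrasts

for n in range(10, n_pool // 2 + 1, 10):  # N = 10, 20, ..., 200
    stats = [replicate(pool, n, rng) for _ in range(20)]  # 20 iterations (the abstract uses 50)
    r_mean, overlap_mean = np.mean(stats, axis=0)
    print(f"N={n:3d}  mean r={r_mean:.2f}  mean overlap={overlap_mean:.2f}")
```

Each printed row averages both measures over the iterations for one sample size. The z-test is used only because the simulated noise level is known; with real data the between-subject variance has to be estimated, which is what a mixed-effects model such as FLAME1 does. With these hypothetical settings the output illustrates the mechanics only and does not reproduce the IMAGEN results reported above.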
Keywords
fMRI, replicability, statistical power

Citation

Chicago
Bossier, Han, Ruth Seurinck, Sanne Roels, Simone Kühn, and Beatrijs Moerkerke. 2018. “The Interplay Between Sample Size and Replicability of Results in fMRI Studies.” Presented at the 24th Annual Meeting of the OHBM, Singapore.
APA
Bossier, H., Seurinck, R., Roels, S., Kühn, S., & Moerkerke, B. (2018). The interplay between sample size and replicability of results in fMRI studies. Presented at the 24th Annual Meeting of the OHBM.
Vancouver
Bossier H, Seurinck R, Roels S, Kühn S, Moerkerke B. The interplay between sample size and replicability of results in fMRI studies. 2018.
MLA
Bossier, Han et al. “The Interplay Between Sample Size and Replicability of Results in fMRI Studies.” 2018. Print.
@inproceedings{8615450,
  author       = {Bossier, Han and Seurinck, Ruth and Roels, Sanne and Kühn, Simone and Moerkerke, Beatrijs},
  keywords     = {fMRI,replicability,statistical power},
  location     = {Singapore},
  title        = {The interplay between sample size and replicability of results in fMRI studies},
  year         = {2018},
}