STEP 7: Data analysis

Commonly, this step overlaps with Step 6. Initial data analysis refers to the process of data inspection and reorganization that needs to be carried out before formal statistical analyses (Huebner et al., 2016). This process includes metadata setup, data cleaning/screening/refining, updating the research analysis plan, version control, and the documentation of initial data analysis procedures (see Baillie et al., 2022). Ideally, the data analysis procedure for the current project has been thoroughly planned and fixed in advance during Step 3. But even then, many new decisions have to be made at this stage, and these may affect the next steps, e.g., how the data can best be shared with others, how collaborative data analysis can be supported, or how results are best visualized (Kroon et al., 2022).
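
As a minimal, hedged illustration of the screening part of initial data analysis, the Python sketch below checks a tabular dataset for missingness and implausible values before any formal analysis, and writes a versioned output file. The file name, column name, and cutoff values are hypothetical placeholders, not prescriptions.

```python
import pandas as pd

# Hypothetical file and column names, for illustration only
df = pd.read_csv("raw_data.csv")

# Screening: document sample size, missingness, and basic distributions
print(df.shape)
print(df.isna().sum())   # missing values per column
print(df.describe())     # summary statistics for numeric columns

# Refining: flag implausible values instead of silently dropping them,
# so every cleaning decision stays visible and documented
df["rt_flagged"] = (df["reaction_time_ms"] < 100) | (df["reaction_time_ms"] > 3000)
df.to_csv("screened_data_v1.csv", index=False)  # versioned output file
```

Keeping flagged rows in the dataset (rather than deleting them) makes the cleaning decisions reproducible and easy to document in the initial data analysis report.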

Choosing an analysis framework one feels comfortable with is just one of many challenges in this step (➜ RStudio, ➜ JASP, or ➜ Jupyter Notebook). If the study was not preregistered, a statistical approach suitable for the research question needs to be chosen (e.g., Bayesian versus frequentist statistics; Pek & Van Zandt, 2020; van Zyl, 2018). If applicable, correction methods for multiple comparisons should be considered (Alberton et al., 2020; Noble, 2009) to avoid an inflated Type I error rate (see glossary). Crucially, there has recently been a shift in focus from group-level analyses to analyses of individual trajectories, which has a significant impact on the required sample size and the expected effect size (Marek et al., 2022).
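
As a brief sketch of one common multiple-comparison correction, the snippet below applies the Benjamini-Hochberg false discovery rate procedure to a set of p-values via statsmodels; the p-values themselves are invented for illustration.

```python
from statsmodels.stats.multitest import multipletests

# Invented p-values from, e.g., several exploratory tests
p_values = [0.001, 0.012, 0.034, 0.046, 0.210]

# Benjamini-Hochberg FDR correction; method="bonferroni" would be stricter
reject, p_corrected, _, _ = multipletests(p_values, alpha=0.05, method="fdr_bh")
for p, p_adj, sig in zip(p_values, p_corrected, reject):
    print(f"raw p = {p:.3f}, adjusted p = {p_adj:.3f}, significant: {sig}")
```

Which correction is appropriate (FDR, Bonferroni, or field-specific methods such as those discussed by Alberton et al., 2020 for neuroimaging) depends on the data and the cost of false positives versus false negatives.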

To overcome inherent inaccuracies in estimating effect sizes, sequential analyses involve monitoring data collection as it progresses while controlling the Type I error rate (Lakens, 2014). During sequential analyses, at a predetermined stage in the project (e.g., defined in Step 2), an interim analysis can be conducted to determine whether the collected data provide sufficient evidence to conclude that an effect is present, whether more data should be gathered, or whether the study should be terminated because the predicted effect is unlikely to be observed (Lakens, 2014). Importantly, this analysis approach should ideally also be preregistered. Of note, data analysis is a critical step that has attracted much attention recently in light of the so-called “replicability crisis” (Anvari & Lakens, 2018), as it is a stage with high researcher degrees of freedom where questionable research practices (John et al., 2012) and biases may occur, even inadvertently (see, e.g., Ivimey-Cook et al., 2023, on improving data extraction in meta-analyses).
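
To make the interim-analysis logic concrete, the sketch below implements one simple variant described by Lakens (2014): at each preplanned look, the raw p-value is compared against a Pocock-adjusted significance level so that the overall Type I error rate stays at .05. The data, group labels, and number of looks are hypothetical; real designs should fix these in the preregistration.

```python
import numpy as np
from scipy import stats

# Pocock nominal alpha per look for an overall two-sided alpha of .05
# (standard table values; see Pocock, 1977, and Lakens, 2014)
POCOCK_ALPHA = {2: 0.0294, 3: 0.0221, 4: 0.0182, 5: 0.0158}

def interim_ttest(group_a, group_b, n_looks=2):
    """Compare two groups at an interim look; stop early if p < Pocock boundary."""
    t_stat, p = stats.ttest_ind(group_a, group_b)
    boundary = POCOCK_ALPHA[n_looks]
    decision = "stop: effect detected" if p < boundary else "continue sampling"
    return t_stat, p, decision

rng = np.random.default_rng(1)
a = rng.normal(0.5, 1.0, size=40)  # simulated interim data, group A
b = rng.normal(0.0, 1.0, size=40)  # simulated interim data, group B
print(interim_ttest(a, b))
```

Stopping for futility (the predicted effect being unlikely) requires additional boundaries not shown here; dedicated group-sequential packages handle the full design.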

Finally, in the process of analyzing results, it is also essential to consider the role of visualizations. Effective visual representations can enhance the comprehension of complex data sets and findings (➜ BioRender, ➜ Mermaid, or ➜ Nipype).
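
As a small example of transparent visualization, the matplotlib sketch below overlays individual data points on group means, so that summary statistics do not hide the underlying distribution; the data are simulated and the labels are placeholders.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
groups = {"Control": rng.normal(0.0, 1.0, 30), "Treatment": rng.normal(0.6, 1.0, 30)}

fig, ax = plt.subplots()
for i, (label, values) in enumerate(groups.items()):
    # Jittered raw observations keep the full distribution visible
    x = np.full_like(values, i) + rng.uniform(-0.08, 0.08, size=values.size)
    ax.scatter(x, values, alpha=0.4)
    # Group mean as a short horizontal line
    ax.hlines(values.mean(), i - 0.2, i + 0.2, color="black")

ax.set_xticks(range(len(groups)))
ax.set_xticklabels(groups.keys())
ax.set_ylabel("Outcome (a.u.)")
plt.savefig("group_comparison.png", dpi=300)  # export for sharing
```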

Key questions in this step

  • What are specific analysis pipelines and programs that can be used for specific types of data (e.g., EEG, (f)MRI, behavior)?

  • What open-source software is a good alternative to proprietary products?*

  • Which tools allow complete replicability of an analysis pipeline, independent of the specific operating system of a user or continuous software updates?

  • How are results visualized in a captivating, yet transparent and maximally inclusive way?

Questions marked with an asterisk should ideally already be explored before starting a project.