Improving reproducibility through AI-powered data extraction: A case study involving Statcheck
Verifying the analytical reproducibility of research findings requires access to the raw data, but this is often not possible for ethical or practical reasons. Even so, one can sometimes still check the internal consistency of a set of reported numbers. For example, from the test statistic and degrees of freedom, one can recalculate the corresponding p-value and check whether it matches the reported value. The Statcheck application (Nuijten & Epskamp, 2024) automates this procedure for any text uploaded by the user. However, Statcheck does not recognize results that do not exactly match the expected APA format, nor does it recognize corrections for multiple testing or assumption violations. The present study examined whether AI can improve the extraction and subsequent verification of such tests. Preliminary findings using the gpt-4o-mini model on a set of manually coded papers suggest marked improvements compared to Statcheck.
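
To make the consistency check concrete, the following minimal Python sketch recalculates a two-sided p-value from a reported t statistic and its degrees of freedom, then compares it with the reported p-value at the reported precision. It is an illustration under stated assumptions, not Statcheck's actual algorithm and not the extraction method used in this study: the regular expression `APA_T` and the functions `t_pdf`, `two_sided_p`, and `check_text` are hypothetical names, only t-tests written like "t(28) = 2.20, p = .036" are handled, the test is assumed to be two-sided, and rounding of the reported test statistic itself is ignored.

```python
import math
import re

# Hypothetical pattern for one APA-style t-test report,
# e.g. "t(28) = 2.20, p = .036". Real reports vary far more than this.
APA_T = re.compile(r"t\((\d+)\)\s*=\s*(-?\d*\.?\d+),\s*p\s*=\s*(\d?\.\d+)")


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value, integrating the density on [0, |t|] with Simpson's rule."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # probability mass between 0 and |t|
    return max(0.0, 1.0 - 2.0 * area)


def check_text(text):
    """Return (matched string, recalculated p, consistent?) for each t-test found."""
    results = []
    for match in APA_T.finditer(text):
        df = int(match.group(1))
        t_value = float(match.group(2))
        p_reported = match.group(3)
        decimals = len(p_reported.split(".")[1])  # precision used in the report
        p_calc = two_sided_p(t_value, df)
        consistent = round(p_calc, decimals) == round(float(p_reported), decimals)
        results.append((match.group(0), p_calc, consistent))
    return results
```

Calling `check_text` on a sentence returns one tuple per recognized test. A result phrased outside the assumed pattern, for instance "t-value of 2.20 with 28 degrees of freedom", yields no match at all, which mirrors the format limitation described above.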