Comparison of Human, AI-Assisted, and Quasi-Automated Approaches to Assessing Research Reproducibility in Quantitative Social Science
This paper is part of a series of replication studies coordinated by the Institute for Replication (I4R) at the University of Ottawa through its replication games. The study evaluates the effectiveness of varying levels of human and artificial intelligence (AI) integration in reproducibility assessments of quantitative social science research. With 288 researchers randomly assigned to 103 teams, we computationally reproduced quantitative results from published articles in the social sciences across three groups: human-only teams (“human”), human teams with AI assistance (“cyborg”), and AI-led teams (“machine”). Findings reveal that most “cyborg” and “human” teams completed their reproducibility checks (91% and 94%, respectively), while only 37% of “machine” teams successfully reproduced the original findings, highlighting the challenges of fully automated reproduction. The “human” teams detected significantly more coding errors than the other groups. Overall, 94% of teams proposed at least one high-quality robustness check, with “human” and “cyborg” teams outperforming “machine” teams. These results underscore both the strengths and the current limitations of AI assistance in reproducibility assessment.