Science is currently facing a reproducibility crisis. One proposed remedy is to incorporate data analysis replications into classrooms. However, it remains unclear whether this approach is feasible and what its stakeholders (students, educators, and scientists) can expect from it. This study addresses these questions by incorporating data analysis replications into the Applied Data Analysis course at EPFL, taught to a total of 354 students.

Based on pre-registered surveys administered throughout the course, the findings show that students are capable of replicating previously published scientific papers, both qualitatively and, in some cases, exactly. The study also highlights differences between students' expectations of data analysis replications and their actual experiences, indicating a shift in attitudes that fosters critical thinking.

For educators, the study provides insight into how much overhead is required to incorporate replications into the classroom and identifies concerns that may arise compared with more traditional assignments. For scientific communities, it demonstrates the tangible benefits of in-class data analysis replications, including the creation of replication reports and insights into replication barriers that future scientific work should avoid.

Overall, this study demonstrates that including replication tasks in a large data science class can enhance the reproducibility of scientific work as a by-product of data science instruction, an approach that benefits both science and students.