SIPS 2025 Budapest
Unleash your creativity and join us in the fight against questionable research practices (QRPs)! In this hackathon, we’ll transform cold, boring data into eye-catching info cards that expose QRPs for what they are: sneaky, harmful, and entirely preventable. These info cards will define each QRP, highlight its harms, reveal its telltale signs, and share tips to slay it—all wrapped up in engaging visuals.
The heavy lifting is done: the text is ready. Now, we need you to find the perfect illustrations and help craft cards that will shine in classrooms and social media feeds alike. No design experience? No problem. If you can google, drag, drop, and dream big, you’re ready. Let’s make QRPs infamous, one info card at a time!
Bring your sass, your style, and your laptop. Together, we’ll make research better—and look good doing it.
Researchers and decision-makers of tomorrow need training that is grounded in open science. However, in social psychology in particular, an area severely affected by the replication crisis, adapted teaching formats are rarely available. Textbooks are generally not available in open access formats, and the only OER textbook remains largely silent on the controversy surrounding the resilience of social psychological research.
In this hackathon, our goal is to provide updated learning materials on key findings in social psychology. In this knowledge co-production process, we will work on a textbook that documents shifts in the field and revisits classic studies in light of recent replication efforts. Hackathon participants will work together to draft or revise chapters of an open textbook for undergraduate education. Contributing to this hackathon by drafting or revising a chapter will be credited with co-authorship of the relevant chapter.
Registered Reports have now been around for over a decade and have stood the test of time, establishing themselves in mainstream publication processes. Over time, Registered Reports have grown from a handful of psychology and neuroscience journals to hundreds of journals spanning broad research domains. Numerous resources have been developed to support researchers in learning about and implementing Registered Reports, from Peer Community In Registered Reports, which provides two-stage peer review independent of a journal, to Registered Reports Now, which aims to encourage new journals to adopt Registered Reports. However, Registered Reports currently lack a key feature: discoverability. While teams are working on incorporating Registered Report status into metadata, this hackathon pursues an alternative goal: a well-maintained database of published Registered Reports. Join us to work as a collaborative group to brainstorm, design, and implement the first comprehensive and maintainable database of all published Registered Reports.
Research on neurodiversity has immense potential to improve our understanding of neurodevelopmental clinical conditions (e.g., autism, ADHD, dyslexia) while also making wider contributions to equality, diversity, and inclusion research and practices. However, research on neurodiversity, especially in the field of psychology, has considerable room for improvement. There is an urgent need to: 1) refine theoretical frameworks born out of the neurodiversity concept (e.g., Double Empathy; Livingston, Hargitai, & Shah, 2024, Psychol. Rev.), 2) promote the widespread adoption of open science practices (e.g., replications and extensions, Leung et al., 2024, Autism; open data and open code, Waldren et al., 2024, Cortex), and 3) increase interdisciplinary collaborations to drive meaningful progression in the field (e.g., Layinka et al., 2024, eLife). Drawing on our own research, and work submitted to the international journal, Neurodiversity, we will host a roundtable on the current state and potential future of neurodiversity research.
Theories are built on presumed causal relations, but these are often established on the basis of associational research, because explicit causal language is avoided outside of experiments. Using associations as proxies for causal effects rests on opaque assumptions that no one would endorse if they were made explicit. Recent tools make explicit causal analysis more attractive and invite debate about its assumptions. The aim of this unconference is to bring together such tools and to generate ideas on how to make addressing causality more common and convenient in psychological research. A very rough framework for a meta-tool can be discussed that navigates not only through existing tools but also through preceding qualitative decisions such as: Is a causal study needed? What are the arguments for and against a causal effect? How could the existence or non-existence of an effect have turned out to be wrong?
The field of cognitive neuroscience has accumulated a vast amount of EEG data, yet new studies often prioritize fresh data collection over utilizing these existing resources. Repurposing publicly available EEG datasets presents a cost-effective and efficient way to explore novel hypotheses. However, EEG analysis involves complex preprocessing and interpretation steps, where subjective decisions, such as artifact rejection, filtering techniques, and statistical methods, can significantly impact results. To enhance transparency and reliability, a collaborative framework in which more than one researcher independently analyses the same EEG dataset and identifies robust findings can be a significant step forward. This approach not only mitigates individual biases but also promotes best practices in EEG research within the open science movement. In this talk, we highlight the potential of existing EEG datasets, discuss strategies for fostering community-driven EEG analysis, and call for collective collaboration in this endeavor.
Psychology has a problem with the robustness of its evidence base. This leads to a lack of reproducibility of previously established effects, as well as a shaky base to build theories on. We therefore propose to introduce the phenomenon (Bogen & Woodward, 1988) as an empirical benchmark. Phenomena are robust and stable patterns, generally evidenced by multiple sets of data. Empirical efforts should strive to establish phenomena instead of testing and constructing theories. Additionally, theories should be constructed to explain and tested on phenomena instead of data. To facilitate this three-level picture we are constructing a database, PsychoFacts, where researchers can submit their phenomena and discuss previously submitted entries. Having a repository for phenomena that are robust and reliable will help researchers construct theories based on relevant empirical findings, as well as test their theories on whether they are able to explain these findings.
Factor analysis is a powerful tool for examining the dimensionality of psychological scales and other measurement instruments. While several R packages, such as lavaan, facilitate the estimation of latent variable models, they often lack essential diagnostic tools for assessing key assumptions of the measurement model. Ignoring these assumptions can lead to misleading conclusions. For instance, poor global model fit may result from nonlinear relationships between a factor and its indicators, even when the items are fundamentally unidimensional. To address this gap, we developed the lavaanDiag package, which streamlines the diagnostic process for factor models estimated in lavaan. This package provides functions to visualize residual correlations, examine relationships between latent variable estimates, and compare model-implied and empirical factor-indicator relationships. By offering these tools, lavaanDiag enhances model evaluation and improves the validity of factor analytic research.
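For illustration, here is a minimal sketch of the kind of local-fit diagnostics the package streamlines, using only base lavaan functions and its built-in example data (lavaanDiag's own interface is not shown here and may differ):

```r
# A minimal sketch of local-fit diagnostics using base lavaan only;
# lavaanDiag wraps and streamlines checks of this kind (its own
# interface is not shown here).
library(lavaan)

model <- 'visual  =~ x1 + x2 + x3
          textual =~ x4 + x5 + x6
          speed   =~ x7 + x8 + x9'
fit <- cfa(model, data = HolzingerSwineford1939)

# Residual correlations: large entries flag local misfit that global
# indices such as CFI or RMSEA can hide.
lavResiduals(fit, type = "cor")$cov

# Standardized loadings, useful when comparing model-implied and
# empirical factor-indicator relationships.
inspect(fit, "std")$lambda
```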
Researchers’ degrees of freedom in data analysis present significant challenges in the social sciences, where different analytical decisions can lead to different conclusions. In this work, we propose a framework for exploratory multiverse simulation to empirically compare decision pathways and identify how arbitrary analytical decisions affect the conclusions of a study. The framework is demonstrated on the Congruency Sequence Effect (CSE), a well-studied phenomenon in cognitive control research. We reviewed the existing literature to identify common non-theory-specific analytical decisions, such as outlier exclusion criteria and hypothesis testing methods, and incorporated these into our simulation framework. Using a large number of simulated datasets, we compared true positive rates (TPR), false positive rates (FPR), and observed effect sizes across decision pathways. We recommend this framework as a tool to quantify the impact of analytical decisions on study conclusions for notable, well-studied effects.
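To make the idea concrete, here is a minimal, hypothetical sketch of a multiverse-style comparison (not the authors' actual framework): two arbitrary decisions, the outlier rule and the test type, are crossed on simulated reaction-time data and the resulting p-values are collected.

```r
# A minimal, hypothetical multiverse-style sketch (not the authors'
# framework): cross two arbitrary decisions -- outlier rule and test
# type -- on simulated reaction-time data and collect the p-values.
set.seed(1)
n     <- 200
rt    <- rlnorm(n, meanlog = 6, sdlog = 0.3)                 # RTs in ms
group <- factor(rep(c("congruent", "incongruent"), each = n / 2))

outlier_rules <- list(
  none  = function(x) rep(TRUE, length(x)),
  sd2.5 = function(x) abs(scale(x)) < 2.5,
  sd3   = function(x) abs(scale(x)) < 3
)
tests <- list(
  t_test   = function(x, g) t.test(x ~ g)$p.value,
  wilcoxon = function(x, g) wilcox.test(x ~ g)$p.value
)

paths <- expand.grid(rule = names(outlier_rules), test = names(tests),
                     stringsAsFactors = FALSE)
paths$p <- mapply(function(r, t) {
  keep <- outlier_rules[[r]](rt)
  tests[[t]](rt[keep], group[keep])
}, paths$rule, paths$test)

paths  # one row (and one p-value) per decision pathway
```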
Questions abound about how to navigate academic psychology (e.g., publishing and reviewing, teaching and mentoring, securing a job). Although there are extensive resources available, they are not organized in a single, searchable location. The few collections that are available are books, which a) are solo-authored or are long chapters written by senior figures in the field, b) are written almost exclusively by authors from the U.S. writing about the U.S. context, and c) are not freely accessible. The purpose of this hackathon is to develop a new resource, The Open Academic: A Career Guide for Psychology, which will consist of numerous, brief entries on a large number of topics. The development of the guide has already begun, and all infrastructure will be in place prior to the conference. The hackathon session will be focused on adding and editing as much content for the Guide as possible.
Diversity, equity, and inclusion are essential principles that shape policies and codes of conduct aimed at fostering fairness and justice. However, conventional measures of diversity often rely on limited variables such as nationality, geographic location, race, or gender, which overlook hidden disparities and confounding factors. This hackathon serves as the first step in a broader BTS project aimed at comprehensively documenting diversity and inequity in academia.
Participants will begin by identifying and exploring variables beyond traditional measures, including income levels, access to facilities, research support, grant opportunities, teaching loads, and publication fee agreements. The group will then collaboratively discuss methodologies for data collection. The event will conclude with the formation of working teams and commitments to ensure progress. The final objective is to launch a global survey, with an ultimate publication goal, that would contribute to a more comprehensive framework for assessing diversity and inequity in academic settings.
Artificial intelligence is everywhere, but there are very few resources and guidance around how to use AI for good in psychological science. How are researchers using AI to power research and collaborations? Which tools and practices should we be implementing, and which ones are fraught with controversy and peril? How can (or should) AI be used in idea generation, paper writing, coding and analysis work, and dissemination? This session will not be led by an AI expert who will simply drop knowledge on everyone else, nor does the session require any previous AI experience. Instead, this session is for AI-curious individuals to talk through issues, ideas, controversies, and tips/tricks that we can all learn and use. The goal is for everyone to leave this session with one or two practical ideas for incorporating AI in our research and lab groups in the coming year.
To improve the generalizability of our inferences from human subjects research, we should randomly sample humans from all over the globe. This is difficult to do, and simple random sampling will never be possible. But there are other random sampling techniques we can explore to increase coverage. We propose probability-proportional-to-size sampling at the country level, so that more populous countries have more potential representation. With Big Team Science, this starts to become possible. Combining this sampling with country-level non-response bias weighting can help us see how indicators like GDP, inequality indices, and other nation-level indicators can quantitatively inform gaps in our understanding. Only through the work of distributed networks of researchers can we start working towards randomly sampling the globe. This is only a first step, and future steps will improve on this system. Discussion of feasibility and alternative ideas is welcome.
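A rough base-R illustration of the core idea follows; the population figures are approximate, and a real design would use a dedicated survey-sampling package for exact without-replacement PPS selection.

```r
# A rough illustration of probability-proportional-to-size (PPS)
# selection at the country level; population figures are approximate.
set.seed(42)
countries <- data.frame(
  country    = c("China", "India", "USA", "Indonesia", "Nigeria",
                 "Hungary", "Iceland"),
  population = c(1.41e9, 1.43e9, 3.4e8, 2.8e8, 2.2e8, 9.6e6, 3.8e5)
)

# Draw three countries, each with selection probability proportional
# to its population size.
sample(countries$country, size = 3, replace = FALSE,
       prob = countries$population)
```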
Typically, researchers hypothesize only that their expected effects are not null (i.e., significant), while neglecting to evaluate their actual hypotheses directly. Their tests are therefore weakly informative and do not represent the underlying theories. As a powerful, theory-based alternative to p-values, GORICA evaluates the relative evidence for multiple hypotheses simultaneously. A researcher can, for instance, directly evaluate the hypothesis that the number of children (βNrC) is a stronger predictor of happiness than income (βInc) and age (βAge): βNrC > {βInc, βAge}. This would be impossible with p-values. Additionally, GORICA naturally ties into the goals of open science, as it requires precisely and transparently stated (and preregistered) expectations.
We present a hands-on opportunity to apply GORICA to common statistical models (like ANOVA and/or SEM, using R functions/packages lm, lavaan, or lme4), accompanied by a theoretical introduction to GORICA.
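As a flavour of what such an evaluation looks like, here is a minimal sketch assuming the goric() interface of the restriktor R package (argument names may differ across versions); the data set and variable names are hypothetical, and predictors would need to be standardized before their coefficients are compared.

```r
# A minimal sketch, assuming the goric() interface of the restriktor
# package (argument names may differ across versions); the data set and
# variable names are hypothetical.
library(restriktor)

fit <- lm(happiness ~ NrC + Inc + Age, data = mydata)  # hypothetical data

# The informative hypothesis from the abstract: number of children is a
# stronger predictor of happiness than income and age.
H1 <- "NrC > Inc; NrC > Age"

# GORICA weights quantify the relative evidence for H1 vs. its complement.
goric(fit, hypotheses = list(H1 = H1), comparison = "complement",
      type = "gorica")
```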
Means and standard deviations of rating-scale data are not independent of each other: means around the scale’s center can coincide with larger standard deviations than means towards either limit of the scale. Heathers et al. (2018) plotted all combinations of means and standard deviations for a sample of ten on a 5-point scale and observed that the results scatter in the shape of an umbrella. Based on the Bernoulli distribution, we developed a simple formalization of these umbrella restrictions that generalizes across sample sizes. We introduce this formalization and explore its implications for statistical power and meta-analytic effect heterogeneity. Further, we describe how the formalization adds to the error-detection toolbox.
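One standard way to express a bound of this kind (not necessarily the authors' exact formalization): for scores bounded by the scale limits L and U, the maximum population standard deviation at a given mean m is attained by a two-point, Bernoulli-type distribution on the endpoints.

```latex
% Maximum (population) SD of scores bounded in [L, U] with mean m,
% attained by a two-point, Bernoulli-type distribution on the endpoints:
\sigma_{\max}(m) \;=\; \sqrt{(m - L)\,(U - m)}, \qquad L \le m \le U .
% Example: on a 1--5 scale, m = 3 permits SDs up to 2, whereas m = 4.5
% permits at most \sqrt{1.75} \approx 1.32 -- hence the umbrella shape.
```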
There have been calls for more replication since the early days of the replication crisis. However, replication studies are rarely conducted or published due to a lack of incentives. We aim to change this by providing an interdisciplinary, high-quality, diamond open access journal dedicated to replications, reproductions, and methods discussions. Our goal is to facilitate, disseminate, and reward replication research.
Research involving psychological constructs often requires operational definitions. As research areas evolve, different operational definitions tend to be adopted consistently within different “camps”. However, the psychological and neurobiological mechanisms that underpin constructs should be captured independently of the tools and definitions used.
One key pitfall of relying on a single operational definition is that it can be confused with the underlying construct. This can lead to an unintentional circularity where a construct is treated synonymously with whatever the measurement tool captures. Such circularity can stifle psychological research by considering the “meaning” of a construct as established by particular operationalizations.
Using data from the Canadian Longitudinal Study on Aging, this talk examines how varying operational definitions of “SuperAgers”—individuals with attenuated age-related cognitive decline—affect measurement. The findings highlight the consequences of relying on specific operational definitions and underscore the importance of grounding research in clearly defined constructs.
Replication studies, while crucial for scientific advancement, require substantial resources and careful planning. We introduce the concept of pre-replication—a systematic evaluation framework to be implemented before conducting replications. Pre-replication encompasses three fundamental assessments: epistemological goals, methodological quality, and axiological rationale for replication. This framework serves as both a practical tool and a critical thinking exercise, encouraging researchers to thoroughly evaluate potential replications before committing resources. Importantly, pre-replication can lead to the justified conclusion that certain findings are not worth replicating (which we label as “negative pre-replication”), thereby preventing unnecessary resource expenditure. While researchers can include pre-replication reports in their studies, the primary benefit lies in the preliminary critical reflection process, which may even invalidate findings without requiring actual replication. Pre-replication is thus intended to help researchers assess whether a particular study meets the minimum criteria for replication.
The aim of this study is to validate a personalized implicit measure of job satisfaction by employing semantically relevant stimuli tailored to this construct. The research addresses the need for tools that can reliably assess implicit attitudes toward job satisfaction, complementing traditional explicit measures.
Participants perform a computerized Implicit Association Test (IAT) adapted to focus on job satisfaction. The task involves categorizing stimuli related to participants’ work experience, paired with satisfaction/dissatisfaction valence.
The expected results will demonstrate the construct validity and reliability of the personalized IAT for job satisfaction. We will also examine correlations between implicit and explicit measures to explore their convergent validity.
This study is part of a larger project aimed at developing implicit tools tailored to organizational settings. Future research will investigate their application in diverse workplace environments and their potential to predict key organizational outcomes, such as employee performance and retention.
Perceived anonymity online (PA) plays a critical role in shaping online interactions, influencing communication and behavior in virtual environments. Despite its importance, research on perceived anonymity online remains fragmented, particularly in understanding differences between how individuals perceive their own anonymity and how they perceive that of others. This study aims to address these gaps by analyzing data from a newly developed scale measuring PA of self and of others. The scale was developed drawing on established psychological theories and was systematically tested across three separate data collections on a robust Czech sample of adults (18-54; N=1971 in the first round). This study specifically explores similarities and differences in the scale’s structure across responses about one’s own anonymity and that of others. The goal is to present the results of the analytical process and seek feedback on the procedure and interpretation from colleagues with experience in similar comparisons.
Clinical psychology is a field that has been slow to take up open science practices. The importance of rigour in this area is clear, with more transparent and replicable methods leading to greater trustworthiness of treatments. Metascience research in clinical psychology would help to establish whether research and patient outcomes are improved as a result of implementing these practices. However, it is unclear how best to assess the ways these practices could improve the quality of clinical psychological science. The uptake of open science practices may be limited by factors including competitive research funding and the challenges of recruitment, and measures of transparency may be hard to assess beyond other aspects of reporting on clinical trials. These practices have not yet been systematically measured in clinical psychology. This poster will highlight areas for metascientific study, as well as practices that may be promising for clinical psychologists to implement.
Many-analysts studies (Silberzahn et al., 2018) – whereby multiple analysis teams independently analyse single datasets to test predefined hypotheses – demonstrate the existence and effects of the so-called “garden of forking paths” (Gelman & Loken, 2014). Eye-tracking-based research could greatly benefit from a many-analysts approach, as validity in the field is threatened by the abundance of independent and dependent variables available for hypothesis testing (Orquin & Holmqvist, 2017). We therefore propose a many-analysts study using a visual attention and reading dataset to investigate the heterogeneity in pre-processing and analysing eye-tracking data. In this unconference, we will discuss the progress of the Stage 1 Registered Report being developed for the project, creating opportunities to receive feedback from the community, as well as invite further colleagues to join the project.
We recently developed a task battery comprising 12 tablet-based tasks to assess the development of executive and semantic control abilities in children (4–10 years) and explore the relationship between these processes. Preliminary data are promising: beyond addressing our research question, the battery shows potential to become a valuable tool for digital cognitive testing, which is currently lacking for cognitive control. To realize this potential, we aim to establish its validity and reliability.
We seek expert input on optimal approaches for assessing construct validity. Approaches under consideration include correlating task performance with the cross-culturally applicable Colored Progressive Matrices and using structural equation modeling. Additionally, reliability and test-retest stability will be assessed. In the long term, we aim for cross-linguistic validation.
We invite open discussion on best practices for validation and welcome collaborations (e.g., for multi-site replication) to ensure robust and generalizable findings.
In today's digital world, understanding how children filter out irrelevant information is crucial for cognitive science and education. We present a task battery designed to assess semantic and executive control in children aged 4-10 years, with initial data already collected in Italian. Semantic control tasks involve recognizing relations in meaning between stimuli, while executive tasks rely on non-semantic visuospatial features. Control demands are manipulated by pairing target stimuli with distractors that vary in their semantic or visuospatial similarity. Our next goal is to develop a multilingual, online app that integrates gamification techniques into these tasks. This platform will allow children to complete short, engaging game-based sessions from home using tablets or smartphones, enabling large-scale, cross-linguistic assessment and training. We seek expert feedback on methodological considerations in gamifying cognitive assessments for children, strategies for cross-linguistic adaptation, and ways of maintaining data quality in an online, gamified environment.
Despite the prevalence of open data, reusing existing datasets for new studies remains uncommon. Reasons include poor documentation, narrow datasets, and lower prestige compared to primary data, but greater uptake in secondary data analysis could save the field millions.
This hackathon aims to address this issue by creating a strategy document in three parts:
- Barriers: Identify obstacles inhibiting wider adoption of secondary data analysis, such as data access, licensing, documentation, training, and incentives.
- Solutions: Brainstorm practical solutions to overcome these challenges, including OSF features, training curricula, publishing and funding instruments, and team science for higher-value data.
- Actions: Translate solutions into actionable steps. What can researchers, lab groups, or institutions do to normalize secondary data analysis as both sharer and reuser?
Our output will be an action plan in Google Docs for making secondary data analysis more accessible and rewarding, improving the resource efficiency of psychological research.
Numerous studies confirm that researchers frequently misinterpret key statistics in published articles. A particularly prevalent issue identified previously is the tendency of researchers to misinterpret nonsignificance as representing no true effect. Accordingly, the present study aims to re-investigate this issue – to clarify the prevalence of nonsignificance misinterpretations in published psychology articles and its changes over time. To achieve this, we examined nonsignificance statements in the discussion sections of 599 articles across three time points (2009, 2015, 2021) from ten psychology journals of varying impact factors. We then coded each statement as correctly or incorrectly interpreting nonsignificance. Our results reveal a higher prevalence of these misinterpretations compared to prior studies (81% incorrect). Based on these findings, we urge researchers to reconsider how they report and interpret their results, with a focus on improving accuracy and transparency in the interpretation of statistically nonsignificant results.
This unconference will address two things regarding sensitive qualitative data: sharing such datasets for scientific (and other) reuse, and triangulating interpretations of such datasets by approaching them from different perspectives. The unconference starts with a short presentation about the challenges and solutions in our ongoing longitudinal project involving qualitative mental health data of minors and adults across multiple countries, including Finland, Korea, and Slovakia. The presentation is followed by examples through which participants can test and reflect on specific anonymisation decisions. We end with a discussion about how a chosen qualitative analytic approach always leaves some known information hidden and thus leads to an outcome paradox: unlike in many quantitative analyses, where each test can be shared via code, qualitative analyses necessarily leave some interpretations hidden and unreported. How do we deal with that?
Most empirical research articles present a single analysis conducted by the authors. However, many-analysts studies have shown that research teams often use distinct analytical approaches to the same data, which frequently leads to varying conclusions.
To promote robustness and encourage a more diverse statistical perspective in research, we launched the diamond open-access Journal of Robustness Reports. The journal publishes concise 500-word summaries of alternative analyses conducted by independent analysts, which enables a more comprehensive and balanced interpretation of empirical data.
In this hackathon, we invite participants to propose target articles for Robustness Reports, that is, influential and widely debated scientific studies where alternative analyses could complement the original findings and provide valuable new insights. Participants will then break out into groups to conduct the reanalyses, write submission-ready Robustness Reports, and present their findings in a plenary discussion.
Qualtrics is one of the most widely used data collection platforms. Due to its relatively easy-to-use point-and-click functionality, Qualtrics is used by both researchers and students. Some of these functionalities allow the owner of a Qualtrics survey project to modify previously collected answers and thus engage in actions that raise suspicions of data fabrication or falsification – two of “the clearest examples of research misconduct” (Netherlands Code of Conduct for Research Integrity, 2018, Chapter 5, section 5.2.A.1). Another way to fabricate data in Qualtrics studies is to take the survey multiple times while claiming to be a different respondent. In this workshop, I will show how to check for signs of (1) changes to previously collected answers, and (2) repeated entries from a single respondent. This has implications for the design and implementation of open data policies, and for procedures that verify compliance with these policies.
The next generation of researchers and consumers of science must be equipped with knowledge of open and reproducible research to maintain and further scientific standards. Thus, educators and mentors must be able to provide a strong foundation in Open Science training.
To facilitate this, a team at the Framework for Open and Reproducible Research Training (FORRT) is designing a pedagogically-informed, evidence-based, self-guided program to support the teaching of Open Science. Three modules for educators interested in teaching Open Science are being developed through a 1) positive, 2) participatory, and 3) inclusive lens.
Our goal is to develop this program in close consultation with the community involved in Open Science. In this session, we would like to i) present this program; ii) discuss areas for improvement; and iii) identify opportunities to amplify its reach to educators in different academic fields.
Confirmatory factor analysis (CFA) is one of the most central analytic techniques in social sciences. However, model fit assessment still heavily relies on arbitrary cutoffs (e.g., RMSEA ≤ .05, CFI ≥ .95), which lack universal validity. Misinterpreting these thresholds contributes to the accumulation of misspecified models in the literature. In our talk, we introduce an up-to-date CFA fit assessment framework based on conceptual understanding, local fit evaluation, and dynamic benchmarks. Our tutorial comes with all the resources needed to bring CFA-fitting into the new century: an R script, empirical examples, and a decision tree. By abandoning inflexible cutoffs and embracing a more nuanced model fit evaluation, we can shrink the number of misspecified models in the literature and, consequently, refine our theories.
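As one illustration of what a dynamic benchmark can look like (using lavaan's built-in example data rather than our own materials), a simulation-based sketch: generate data from the model-implied covariance matrix and see which CFI values a correctly specified model of this size and sample actually produces, instead of applying a fixed cutoff.

```r
# A minimal sketch of a simulation-based "dynamic benchmark" (using
# lavaan's built-in example data, not the authors' materials).
library(lavaan)
library(MASS)

model <- 'visual  =~ x1 + x2 + x3
          textual =~ x4 + x5 + x6'
fit <- cfa(model, data = HolzingerSwineford1939)

sigma_hat <- lavInspect(fit, "cov.ov")   # model-implied covariance
n         <- lavInspect(fit, "nobs")

sim_cfi <- replicate(200, {
  d <- as.data.frame(mvrnorm(n, mu = rep(0, ncol(sigma_hat)),
                             Sigma = sigma_hat))
  names(d) <- colnames(sigma_hat)
  fitMeasures(cfa(model, data = d), "cfi")
})

# Benchmark the observed CFI against its simulated distribution under
# correct specification rather than against a conventional cutoff.
quantile(sim_cfi, c(.05, .50))
fitMeasures(fit, "cfi")
```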
Existing methods to compare idiographic networks are global tests that indicate only the presence or absence of a difference. They typically release the equality constraints for all edges once the global test indicates heterogeneity, which leads to false positives for group differences in network structures. We therefore present the invariance partial pruning (IVPP) approach, which first evaluates the presence of heterogeneity with a network invariance test and then determines the exact locus of edge equality and difference with partial pruning. Simulation results indicated that the invariance test based on AIC and BIC performed better than the invariance test based on the LRT, and that partial pruning successfully uncovered specific edge differences with high sensitivity and specificity. IVPP is an essential supplement to existing network methodology, allowing the comparison of networks from time-series and panel data as well as tests of specific edge differences. We implemented the algorithm in the R package IVPP.
The psychology research climate reveals two groups with complementary strengths. Junior researchers often excel in open-source software but need research experience and broader networks. Senior researchers possess these but may struggle with technological advancements like open-source tools. To bridge this gap, I collaborated with the eScience Center Amsterdam to create a need-based online matching platform. This platform connects advanced software users with beginners for 1:1 teaching sessions, leveraging eScience Center resources. It enables software novices to start quickly without lengthy courses while providing juniors with teaching experience and networking opportunities with senior researchers who could foster collaborations. This initiative fosters team science, promotes open-source software in work and organizational psychology, and addresses the lack of transparency in research methods observed in many papers. Enhancing transparency can improve research quality, replicability, and trust in our field, making research more impactful for academia and industry.
Despite a myriad of resources on the topic, Open Science is advocated more than it is practiced. Researchers seeking guidance often encounter information-dense websites with extensive, unorganized lists of resources. This can be overwhelming and sometimes even discouraging. Alternatively, researchers may turn to AI platforms for support, but these lack formal scientific quality checks. To address this issue, we have developed JUST-OS, an AI-based chatbot designed to streamline Open Science resources. JUST-OS helps researchers navigate initiatives, tools, and best practices related to Open Science. It categorizes resources based on discipline, specific practice, and the type of knowledge researchers are looking for. The tool leverages retrieval-augmented generation, an advanced AI-based natural language processing method, operating on a curated and regularly updated Open Science database hosted by FORRT. This ensures that JUST-OS provides researchers with accurate, discipline-specific, and practical guidance on open science practices.
Despite existing research practices, a gap remains between the acknowledgment of open science (OS) principles and their implementation among Canadian biomedical researchers. This cross-sectional study evaluated the use of reporting guidelines (RGs) and OS practices in Canadian-funded articles from 2022. Data was extracted from multiple databases and analyzed descriptively. Out of the 307 articles examined, only 18 (6%) reported using RGs, 9 (3%) adhered to a completed checklist, and 3 (1%) registered their studies. Data sharing remained limited, with 60 (19%) making their data available. Transparency measures were scarce, with 14 (5%) using a study protocol. Regarding OS practices, 146 (48%) of articles were open access, yet preprint use (5.5%) and data management plans (0.3%) were rare. Only one replication study was identified. These findings reveal gaps in research transparency and OS adoption, emphasizing the need for awareness to improve research quality and reproducibility in Canada.
Artificial intelligence systems, particularly in social media, are under increasing scrutiny (Reviglio & Agosti, 2020). TikTok’s recommendation system, renowned for its high degree of personalization, exemplifies this trend (Bhandari & Bimo, 2022). This lightning talk will present insights from an ongoing preregistered field experiment involving 700 TikTok users. Participants were divided into two groups: a control group using TikTok as normal, and an experimental group that disabled personalization in their news feed for two weeks. Pre- and post-experiment measures include mental health, political polarization, and problematic TikTok use. By sharing our design, early findings, and challenges, we seek feedback from the scientific community to refine our approach. This talk also aims to spark discussion on innovative ways psychological researchers can study Human–AI Interaction, particularly in the context of highly adaptive systems like TikTok.
There exist strong theoretical and methodological ties between the disciplines of psychology and management. Yet almost none of the "top" journals in the field of management/organizational behavior accept registered reports, despite many espousing a commitment to open science and reforms aligned with the credibility revolution. In this lightning talk, I will provide information on the uptake of registered reports in the field of management, detail ongoing advocacy efforts for greater adoption (and invite others in attendance to join), and discuss personal experiences related to misperceptions of and objections to registered reports in the field of management/organizational behavior.
Verifying the analytical reproducibility of research findings requires access to the raw data; however, this is often not possible for ethical or practical reasons. That said, one can sometimes still check the consistency of a set of reported numbers. For example, based on the test statistic and degrees of freedom, one can calculate the corresponding p-value and check whether it matches the reported value. The Statcheck application (Nuijten & Epskamp, 2024) automates this procedure for any text uploaded by the user. However, results that do not exactly match the intended APA format are not recognized by Statcheck, and neither are corrections for multiple testing or assumption violations. The present study used AI to examine whether the extraction and subsequent verification of such tests can be improved. Preliminary findings using the gpt-4o-mini model on a set of manually coded papers suggest marked improvements compared to Statcheck.
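For readers unfamiliar with the procedure, a minimal sketch of the consistency check itself (the numbers are made up for illustration):

```r
# Recompute the p-value implied by a reported t statistic and its
# degrees of freedom, then compare it with the reported p-value
# (illustrative numbers only).
reported <- list(statistic = 2.31, df = 48, p = .025)

recomputed_p <- 2 * pt(abs(reported$statistic), df = reported$df,
                       lower.tail = FALSE)

round(recomputed_p, 3)
# Flag an inconsistency if the reported p deviates beyond rounding error.
abs(recomputed_p - reported$p) > 0.0005
```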
A fair and transparent attribution of authorship remains a pressing issue in academia. While established guidelines like APA’s and frameworks such as the Contributor Role Taxonomy (CRediT) offer structured approaches, students’ contributions to science often go unrecognized. Our research shows that 86.2% of German psychology students and 38.9% of researchers are unaware of existing authorship guidelines, and conflicts over authorship are widespread. To address this, a task force at Ruhr University Bochum (RUB) has developed a guideline to systematically acknowledge student contributions using CRediT. This initiative integrates authorship education into curricula and fosters a culture of transparency in collaboration between researchers and students. By implementing such criteria, we aim to promote fairness in publication practices and encourage student engagement in academia. Our talk presents insights from our survey, outlines the development process of the guideline, and discusses its implications for academic institutions striving for equitable recognition of research contributions.
The current SIPS president will open the conference with the story of how they became involved in SIPS.
Keynotes: Simine Vazire, Eiko Fried, Fiona Fidler, Moin Syed, and Eric-Jan Wagenmakers
Moderator: Balazs Aczel
This year, we are departing from tradition at SIPS. Rather than inviting individual keynote speakers to open and close the conference, we will host two open roundtable discussions.
The conference will begin with a discussion featuring Simine Vazire, Eiko Fried, Fiona Fidler, Moin Syed, and Eric-Jan Wagenmakers, reflecting on the past, present, and future of scientific reform. We are interested in hearing speakers' personal stories and perspectives. At some point, the conversation will be opened to all SIPSers present. We are confident that the audience will gain new ideas and inspiration from the discussion and carry these into the SIPS sessions.
Note: This is a follow-up hackathon from 2024 Nairobi.
Understanding statistics is essential for researchers at all levels to interpret and apply findings, and for the general population to critically evaluate information. While learning statistics can be challenging for those new to it, the experience can be improved through animation interactivity, at least for basic concepts (Wang et al., 2011) and Bayesian reasoning (Mosca et al., 2021). However, creating such interactive resources is time-consuming and requires specific skill sets. At SIPS 2024, the group decided to build a website with a directory of other resources. Following that, this hackathon aims to start creating vignettes for statistical concepts and to develop structured questions for better understanding.
Many metascience initiatives—such as reporting guidelines, checklists, and preregistration—have been introduced to enhance the rigor and replicability of scientific research. However, adherence to these good practices remains limited. Changing human behavior is challenging, and researchers are no exception. In this hackathon, we will explore the potential of lab manuals as a means to bridge the gap between methodological recommendations and everyday research practices. By establishing explicit norms within local research communities, lab manuals can serve as practical tools for fostering compliance with metascientific initiatives. Our goal is to develop a toolset and intervention protocol for metascientists who seek to facilitate this transition by supporting laboratories in creating their own tailored manuals.
A comprehensive meta-analysis was conducted to examine the effects and associations of apologies with forgiveness and unforgiveness (e.g., aggression, revenge) by synthesizing data from 87 journal articles, including both experimental and correlational studies, using Bayesian methods. The meta-analysis revealed that apologies were positively associated with forgiveness and were shown to promote forgiveness. Apologies were also negatively associated with unforgiveness, and were effective in reducing unforgiveness. Subgroup analyses revealed the influence of moderators such as relationship type, transgression context, and unforgiveness measures. The Bayesian framework, underutilized in social psychology, offers nuanced insights through direct probability statements and adjustments for potential bias. This research not only synthesized past research on the role of apologies in conflict resolution but also highlighted the value of Bayesian meta-analysis in psychological research.
At the in-person event, attendees will have the opportunity to submit an "on-the-fly" session. However, acceptance will depend on the number of competing sessions and the level of interest, so we cannot guarantee a slot.
Other attendees will be able to vote on these sessions, helping to shape the program in real-time. If you’re giving a lightning talk or presenting a poster, this could also be a great way to promote your session and attract interest. Stay tuned and keep an eye out for updates!
This unconference explores the potential of the Nix package manager to address the challenges of computational reproducibility in psychological research. Nix offers a powerful solution for creating fully reproducible software environments, allowing researchers to specify exact versions of R, packages, and system dependencies.
This approach goes beyond traditional package management, ensuring that analyses can be reproduced across different systems and time periods. Special attention will be given to Rix, an innovative R package that makes the power of Nix accessible to people unfamiliar with it.
The unconference will cover practical applications of Nix and Rix in psychological research, addressing challenges such as dependency management, version control, and long-term reproducibility. Participants will gain insights into how these tools can enhance the credibility and replicability of their research, potentially transforming the landscape of psychological science methodology.
Notes:
- Nix only runs on macOS and Linux
- Familiarity with programming and version control is recommended
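As a flavour of the workflow, a minimal sketch assuming the rix() interface of the rix package (argument names may differ across versions):

```r
# A minimal sketch, assuming the rix() interface of the rix package
# (argument names may differ across versions): generate a default.nix
# pinning R, packages, and an IDE for a reproducible environment.
library(rix)

rix(
  r_ver        = "4.3.1",               # pinned R version
  r_pkgs       = c("dplyr", "lavaan"),  # pinned CRAN packages
  ide          = "rstudio",
  project_path = ".",
  overwrite    = TRUE
)
# The environment is then built and entered with nix-build / nix-shell.
```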
Building custom web applications for psychological and behavioural research can seem daunting, but accessible Large Language Models (LLMs) and affordable front- and back-end solutions are making it easier than ever. This workshop provides a hands-on exploration of these advancements through two case studies: Port, a data donation platform for collecting app usage data, and O-ELiDDI, a web-based diary for time-use data. Participants will learn how to fork, modify, and deploy these open-source web applications with minimal coding, leveraging GitHub for version control, LLMs for feature development, and low-cost data storage services.
Whether you are interested in building open-source instruments, gathering user-data for psychological research, or conducting time-use studies, this workshop offers practical strategies for creating or tweaking your own customized solutions suited to your specific research needs.
Measurement is the foundation of any field of science. Many social scientists take for granted that survey instruments measure the intended construct of interest, such as depression, intelligence, or happiness. Social science research on COVID-19 has made fast progress in studying psychological attitudes and behaviors, but this speed has come at the expense of disregarding measurement recommendations that researchers have proposed to improve the field. Rating scales may fail to capture respondents' underlying attitudes because of differences in statement wording, the response options available, and whether items form a composite. Discussion regarding the origins of SARS-CoV-2 has evolved over time, and differences in measurement can obscure the magnitude of the supposed change in public attitudes and beliefs regarding the topic. This talk proposes an open-source database that compiles survey item statements relevant to measuring beliefs and attitudes about the origins of SARS-CoV-2.
The integration of refugees presents complex challenges, particularly in non-WEIRD contexts. This talk examines the practical and ethical challenges of researching Ukrainian refugee integration in secondary cities of Central and Eastern Europe (CEE), including Košice, Miskolc, Poznań, Kraków, Brno, and Uzhhorod. These cities are transitioning from historically migrant-sending regions to more permanent refugee destinations.
Researching in non-WEIRD settings, especially with vulnerable populations, requires a context-sensitive approach. In regions with limited institutional oversight, safeguarding welfare, obtaining informed consent, and ensuring confidentiality are critical. The study highlights the importance of considering local cultural norms and resource limitations.
The talk also emphasizes the need to involve refugees as active collaborators, ensuring their voices inform meaningful change.
Keywords: Ukrainian refugees, secondary cities, CEE, ethical and practical considerations
Funding: Supported by the International Visegrad Fund (#22330013) and the Slovak Academy of Sciences Postdoctoral Grant Program "PostdokGrant" No. APD0061.
Because many strong theories make clear predictions, in principle there may be no advantage to registering such a theory’s predictions in advance of data collection (Szollosi & Donkin, 2021). Some researchers have therefore argued that preregistration is not worthwhile (Szollosi et al., 2020). Yet psychologists have not prioritized the development of strong theories, or even the use of existing ones. Can we do something about this? I will argue that preregistration is, ironically, an important part of the answer. Strong theories are underappreciated in part because of the scarcity of both the skills and the time required to determine that a theory is strong and makes clear predictions. But because everyone recognizes the value of correct predictions, preregistration can help strong theories rise to the top. However, additional practices such as head-to-head comparisons of competing theories (Dutilh et al., 2018) may also be necessary.
This lightning talk will pitch how large language models (LLMs) can serve as intellectual partners in the classification of psychological phenomena in text (“psychological text classification”). Drawing on empirical work (Bunt et al., in press) where we developed and tested the validity of LLM-driven classifiers for phenomena such as reported speech and conversational repairs, I will argue that prompt-based interactions with LLMs can quickly generate insights on how to refine conceptualisations and operationalisations in the realm of text classification. Rather than replacing human coders, LLMs act as “collaborators” in an iterative cycle of classification and feedback, helping researchers spot ambiguities in definitions, catch errors, and challenge assumptions. This synergy can strengthen validity in psychological measurement, enabling both more robust conceptualisation of psychological phenomena in text and the efficient scaling of text-based research. By embracing LLMs as intellectual partners, we can advance methodological rigour and improve psychological science.
We describe how we use "repliCATS in the Classroom" as a tool for introducing first-year undergraduate students to the credibility crisis in Psychology and building their critical thinking skills. We present data from two iterations of this exercise, involving around 3,800 students at the University of Melbourne, demonstrating how structured deliberation process scaffolds—such as nudges to mitigate individual and social biases, along with methods for aggregating perspectives based on the "wisdom of crowds"—shape students' understanding of replicability and its significance within the scientific process. We also compare their evaluative judgments with those of field experts and actual replication outcomes for the studies they assessed. Overall, we argue that repliCATS in the Classroom offers an authentic and engaging opportunity for students to evaluate psychological research critically and subsequently responds to calls to foster critical thinking and organised scepticism within Psychology.
This study is part of a series of replication studies coordinated by the Institute for Replication (I4R) at the University of Ottawa through its replication games. It evaluates the effectiveness of varying levels of human and artificial intelligence (AI) integration in reproducibility assessments of quantitative social science research. We computationally reproduced quantitative results from published articles in the social sciences with 288 researchers, randomly assigned to 103 teams across three groups: human-only teams, human teams with AI assistance (“cyborg”), and AI-only teams ("machine"). Findings reveal that most “cyborg” and “human” teams completed reproducibility checks (91% and 94%, respectively), while only 37% of “machine” teams successfully reproduced findings, highlighting the challenges of fully automated reproduction. “Human” teams detected significantly more coding errors than the other groups. 94% of teams proposed at least one quality robustness check, with “human” and “cyborg” teams performing better. These results underscore the strengths and limitations of AI-assisted reproduction.
For decades, anonymity has been used in explanations for increased antisocial behaviors. Anonymity has been further problematized in an online environment that allows for diverse forms of self-presentation and privacy management. Yet, anonymity remains insufficiently defined and vaguely measured, limiting our understanding of its role in online communication and behaviors. In our study, we developed a new complex approach to anonymity as a subjective perception. In this contribution, we will elaborate on the process of identifying the diverse dimensions of perceived anonymity in online interactions and their operationalization. The new multidimensional measure of perceived anonymity online was tested on a Czech adult population (N = 1,971). We will focus on the issues linked to the identification of the (sub)dimensions and discuss their reflective and formative nature. We propose a new approach to the concept of online anonymity that helps in understanding its complex nature.
Existing psychological models often focus narrowly on individual mechanisms, such as motivation and cognition, or broader societal factors, such as culture and ideology, without integrating these perspectives. My PhD project seeks to address this gap by developing a comprehensive model of pro-environmental behavior using Doise’s (1982) levels of analysis. This approach combines internal psychological processes with social and cultural factors to better predict behavior.
Key challenges include identifying relevant models, isolating variables, integrating them, and ensuring robust statistical power. In this communication, we evaluate the strengths and limitations of various review methodologies, such as rapid reviews, systematic reviews, umbrella reviews, and scoping reviews (e.g., Grant et al., 2009). We highlight the PRISMA-S method (Rethlefsen et al., 2021), which ensures transparent and rigorous reporting of article selection. Finally, we explore data extraction and its application through structural equation modeling (e.g., Dash & Paul, 2021) to synthesize variables.
Naming practices such as mentioning the sample’s country of origin in the title or abstract have been shown to be skewed towards Global South countries. Scholars researching WEIRD populations routinely omit the geography of the samples their findings relate to. In this lightning talk, I will discuss the implications of this localization for key indicators of academic impact, such as social media mentions, policy references, and traditional citation-based metrics. Open-source databases like OpenAlex, with improved coverage of Global South research, allow us to investigate these bibliographic questions of meta-scientific interest. By assessing the relationship between localization in article titles and research impact, we can critically evaluate the persistent norm of broad generalisation in scientific findings. This discussion seeks to challenge overgeneralization and the inappropriate application of research conclusions across diverse global contexts.
Concerns about the construct validity of psychological measurements are increasing. In this lightning talk, I propose a distinction between the validity potential of psychological measurement procedures (e.g., personality tests) and the realised validity of the scores they produce in specific instances (e.g., personality test scores collected in a study), and describe how this distinction can facilitate better construct validation practices and encourage more frequent reporting of validity evidence.
We investigated the Picture-Word Interference Effect by parametrically manipulating semantic similarity while controlling for lexical association and orthographic/phonological similarity. We also aimed to control for lexical-semantic variables at both the picture and word levels, picture-specific visual properties, and stimulus-independent confounders. However, collinearity is a major challenge in psycholinguistics as linguistic variables are highly interrelated. Additionally, with predictors for both pictures and words (e.g., word frequency for the picture name versus the distractor word), it becomes unclear which are more critical for performance modulation, complicating predictor selection. This affects model building, interpretability, and robustness. We seek guidance on detecting and mitigating collinearity while ensuring theoretically relevant controls. Should we use dimensionality reduction, residualization, or penalized regression? Furthermore, we welcome insights on structuring mixed-effects models to balance complexity and convergence. Feedback from experts in statistical modeling and psycholinguistics would be invaluable in refining our approach.
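For concreteness, a minimal sketch (with hypothetical variable names) of two of the options we are weighing: variance inflation factors to detect collinearity, and residualization of a word-level predictor on a picture-level one before fitting a mixed-effects model.

```r
# A minimal sketch with hypothetical variable names: collinearity
# diagnostics via VIF, and residualization before a mixed-effects model.
library(car)    # vif()
library(lme4)

# items: one row per picture-word trial, with the frequency of the
# picture name (freq_pic), the frequency of the distractor word
# (freq_word), semantic similarity (sem_sim), and response time (RT).
m_lm <- lm(RT ~ freq_pic + freq_word + sem_sim, data = items)
vif(m_lm)    # values well above ~5 signal problematic collinearity

# Residualize word frequency on picture-name frequency so the model sees
# only the unique word-level variance.
items$freq_word_res <- resid(lm(freq_word ~ freq_pic, data = items))

m_lmm <- lmer(RT ~ freq_pic + freq_word_res + sem_sim +
                (1 | subject) + (1 | picture), data = items)
summary(m_lmm)
```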
Calls to improve the construct validity of psychological measurements are increasing. However, there is less information available about how to improve, or even evaluate, construct validity. One potential barrier to developing clear construct validation guidelines is the undifferentiated inclusion of measurement and non-measurement (e.g., prediction, diagnosis) uses of test scores under the umbrella of construct validity. This can be problematic because the characteristics that make a test a good measure of a construct can differ from the characteristics that make a test good for other uses like prediction or diagnosis. In this hackathon, we will develop a framework for evaluating psychological tests that distinguishes between key test uses and summarises the most relevant sources of validity evidence based on each test’s type and use.
The FORRT community has prepared 200+ summaries of Open and Reproducible Science literature. The purpose of these summaries is to reduce some of the burden on educators looking to incorporate open and reproducible research principles into their teaching as well as facilitate the edification of anyone wishing to learn or disseminate open and reproducible science tenets. In this hackathon, we invite you to review the summaries, i.e., checking that the content of the summary faithfully represents the original article and improving the text to your best capacity. Contributors will be acknowledged on the website and those who fulfil our requirements (i.e., 10 reviews) will be invited to co-author any resulting manuscripts. The summaries will serve as a valuable resource for those with limited time or access, promoting educational equity.
As academics, we often encounter situations where our ideals conflict with the incentive structures of academia. Navigating these challenges while staying true to our values—both as individuals and as scientists—can be daunting. These conflicts may range from career-defining decisions to moments of self-censorship on academic social media platforms and everything in between.
This unconference provides a space to openly and respectfully explore these issues. While individuals may differ on what constitutes the "right" course of action in specific cases, the goal is to focus on the "how" and the "why" of maintaining both personal and academic integrity, even when incentive structures seem misaligned.
Suggested discussion topics include, but are not limited to: university policy, manuscript review, conflict on academic social media, getting along with colleagues, unreasonable demands from high-prestige journals, publication of replications and null results, and dissertation or tenure requirements.
Critical thinking is essential for good science and a well-functioning society. It therefore holds a central position in academic education, as both a method and a goal. But what is critical thinking exactly? How do we best teach it to students? How do we deal with deviations from critical thinking in peers? In this unconference we will explore different perspectives on critical thinking and discuss concrete ways to foster it in both education and research. Possible topics include (but are not limited to): the nature of critical thinking; teaching critical thinking to students effectively; and applying critical thinking in research practice.
Whether you teach, mentor, or are making an effort to think critically yourself, join us to work together actively, exchange ideas and strategies, reflect, and learn.
Participants will practice creating a GitHub repository and adding and editing code and text in RStudio. A few tools will be provided to help them adopt these practices and apply them in their everyday workflows. As a result, participants can expect to leave with resources that help make their research more open and reproducible, covering the whole research workflow and not just its outcomes.
Outline:
- Creating a repository in GitHub.
- Cloning the repository locally with RStudio.
- Tracking changes.
- A few good practices for versioning research projects.
Intended audience: Participants should ideally be proficient in R (or Python); those unfamiliar with these languages can still participate but will not get the full benefit from the session. To make the best use of the workshop, a "BYOP" (Bring Your Own Project) format is encouraged: participants can use their own data, script(s), notebook(s), and/or computable documents.
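For reference, the steps in the outline above can also be scripted from the R console. This is a minimal, local-first sketch (not the workshop's exact GitHub-first procedure) that assumes the usethis and gert packages and a configured GitHub personal access token.

```r
# A minimal, local-first sketch of the same workflow scripted from the R console
# (assumes a configured GitHub personal access token; see usethis::gh_token_help()).
library(usethis)
library(gert)

create_project("my-reproducible-project", open = FALSE)
proj_set("my-reproducible-project")   # make it the active usethis project

use_git()      # initialise local version control
use_github()   # create and link a GitHub repository

# Track changes (run from within the project directory): stage, commit, push.
git_add("analysis.R")
git_commit("Add first analysis script")
git_push()
```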
In this unconference, we start with an informal debate about the role of financial profit in science. We make arguments for and against financial as well as other profits, and envision possible futures with and without publishers. We invite the participants to share their critical and positive views about the economic system of science — not in the spirit of teaming up against corporate academic stakeholders, but pragmatically looking at what is (and historically has been) im/possible. We don't know what will happen in the unconference, beyond this general outline. We expect people to share unexpected views, and we prepare a list of difficult questions, for example, related to volunteer labor that goes into current initiatives such as registered reports and non-profit platforms.
The FORRT Replication Hub is a pioneering open-science initiative that houses the largest database of replication studies in the world. Our mission is to increase the visibility, accessibility, and impact of replication research across disciplines, ensuring that replications are embedded in research, education, and policy. This unconference aims to engage the SIPS community in a dynamic discussion on how we can leverage our growing repository of replications, develop best practices for integrating replication work into mainstream science, and explore innovative uses of our interactive tools. We will highlight two core Shiny applications developed to enhance the usability of replication data, alongside related initiatives and partnerships:
- Replication Annotator – a tool to assess the replicability of research findings in reference lists and syllabi.
- Replication Explorer – an interactive platform for visualizing replication effects and conducting meta-scientific analyses.
- Replications and Reversals initiative
- Collaboration with COS's SCORE and SMART Projects (and other partnerships)
Has your experimental design ever been constrained by the tools you had available to present stimuli and collect responses? Have you ever had to re-create an experiment from scratch because the tools used for the original were expensive, proprietary, or obsolete? Open psychological science works best with open tools, but what tools exist, and where are there gaps? This session has two parts: First, what kinds of tools already exist that support open, rigorous, and replicable research in different subfields? This discussion may lead to a new resource to help researchers find the best tools for a given project. Second, what does it actually take to make a new open science tool for running experiments? I’ll talk from my own experience developing software tools for developmental psychology research, and the discussion will focus on how we as individuals and as a field can create the tools we need.
Errors are an inevitable part of research, yet academia often lacks a constructive and systematic approach to error management. Fear of reputational damage when errors are uncovered discourages data sharing and the adoption of open science practices. The stigmatization of errors in science undermines the sharing of research data whose availability is central to the reproducibility of results.
This unconference explores how systemic, group, and individual factors shape researchers' error-handling and data-sharing behaviors. Discussions focus on the role of disciplinary norms, research and publication infrastructures, and their influence on perceptions of errors. Different taxonomies of errors are discussed in relation to our everyday research experiences, highlighting gaps and inconsistencies in current approaches.
These discussions are intended to lay the foundation for a future empirical study among researchers from different disciplines on the handling and perception of errors in the scientific process.
It is increasingly common for editors to ask for high-powered, direct replications of studies when papers are invited for revision. However, norms regarding what to do when these replications fail do not exist, either for authors attempting to reframe their findings, or reviewers/editors reevaluating the work. A lack of clear expectations and norms risks not only confusion and consternation, but possibly reinforcing perverse publication incentives where only “statistically significant” results are valued rather than strong and valid tests of theoretical questions. This Unconference will begin with the organizer and participants discussing their experiences with failed replications during peer review. The group will then discuss the “ideal” way these are handled, from the perspective of both authors and reviewers/editors. Given interest, the group will then potentially organize to produce a short methodological manuscript that addresses these questions and provides guidance for editors and authors confronting these issues.
A pilot study typically refers to a smaller-scale, preliminary study conducted to refine methods and procedures ahead of planned data collection. Piloting is common in psychological research, and can significantly influence research outcomes, but little guidance exists on how to design, conduct and report such studies. To find out how to improve transparency of piloting in psychology, we conducted an international survey on piloting practices, attitudes toward reporting pilots, and perceived barriers to doing so across psychological subfields. Based on data from N = 135 researchers, we found that researchers do not consistently report their pilot studies, but they agree on the importance of including basic pilot study information. The survey also highlighted the diversity of piloting practices and their influence on the research process. In the current talk we will present the findings from the survey as well as future avenues to improve transparency of pilot reporting in psychology.
Construal Level Theory (CLT) posits a positive, reciprocal relationship between psychological distance and construal level (abstract vs. concrete). Despite a sizable literature, the evidence might not be as strong as it appears due to low power and publication bias. The goal of this empirical audit was to perform a large-scale replication of a randomly selected set of CLT studies with larger sample sizes (2.5 times the original sample sizes, total N = 6513; Simonsohn, 2015). We replicated 20 published studies, sampled through our comprehensive search and selection procedure. Nineteen of the 20 studies failed to replicate, with an average effect size below r = .10, significantly smaller than in the original studies. The results suggest that, if the originally reported effects do exist, the original studies were inadequately powered to detect them.
Bago et al. (2023) suggested that elaborate reasoning enhances motivated reasoning, where individuals process information in ways that align with their beliefs. To test and refine these findings, we plan a two-stage study. First, we will replicate the original study while reducing potential sampling biases, with a power analysis based on the original article. Second, we will proceed with a replication extension involving the manipulation of motivated reasoning. For this, we plan manipulation-check studies to assess its effectiveness, and we will then submit the replication extension as a Registered Report. This process raises challenges, particularly regarding research practices (e.g., accessibility of data and analyses, choice of statistical models) and methodological difficulties, especially in conducting a priori power analyses for linear mixed models. These challenges and the strategies employed to address them will be presented and discussed.
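To illustrate the power-analysis challenge, a simulation-based approach with the simr package might look like the following minimal sketch; the data are simulated and the variable names hypothetical, not the authors' planned analysis.

```r
# Minimal sketch of a simulation-based power analysis for a linear mixed model
# with simr (simulated data and hypothetical names, not the planned analysis).
library(lme4)
library(simr)

set.seed(1)
d <- expand.grid(subject = factor(1:40), item = factor(1:20))
d$condition <- rep(c(0, 1), length.out = nrow(d))
subj_int    <- rnorm(40, sd = 0.5)
d$y <- 0.3 * d$condition + subj_int[d$subject] + rnorm(nrow(d))

fit <- lmer(y ~ condition + (1 | subject) + (1 | item), data = d)
fixef(fit)["condition"] <- 0.2        # smallest effect size of interest

powerSim(fit, nsim = 200)                          # power at the current sample size
fit_ext <- extend(fit, along = "subject", n = 120) # pretend we had 120 subjects
powerCurve(fit_ext, along = "subject", nsim = 200) # power across subject counts
```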
We replicated Hsee (1998), which found that when evaluating choices jointly, people compare options and judge the one higher on desirable attributes as better (“more is better”). However, when people evaluate options separately, they rely on contextual cues and reference points, sometimes judging the option with less as better (“less is better”). Results support the (surprising) “less is better” effect across all studies (N = 403; Study 1 original d = .70, replication d = .99; Study 2 original d = .74, replication d = .32; Study 4 original d = .97, replication d = .76), with weaker support for the (obvious) “more is better” effect (Study 2 original d = .92, replication dz = .33; Study 4 original d = .37, replication dz = .09). I will discuss how to interpret a study when the “surprising” part replicates but the “obvious” part does not; this pattern suggests the purported mechanism behind the effect may not be supported.
Previous research has indicated that sleep promotes memory consolidation. Further, it has often been suggested that sleep mainly consolidates emotional memories, and that sleep reduces the emotional reactivity associated with aversive experiences. Recent meta-analyses have, however, revealed that previous studies on this topic have been strongly underpowered, and that selective publishing of positive findings is a major problem in the field. We aim to remedy this through a pre-registered, well-powered, multi-lab collaboration study in which we will examine A) whether sleep, compared to wake, increases memory consolidation, B) whether this putative sleep-dependent consolidation benefit is more pronounced for negative compared to neutral stimuli, C) whether sleep, to a larger degree than time spent awake, decreases emotional reactions to previously viewed negative images, and D) whether any sleep stage is particularly associated with memory consolidation or with decreased emotional responses.
Virtual reality (VR) offers innovative opportunities to research empathy-related processes in controlled, immersive environments (Riva et al., 2016). This presentation highlights how VR can be leveraged to examine responses to infant crying, a distress signal that elicits emotional and physiological processes in caregivers (Zeifman, 2003). By simulating infant-crying scenarios, VR permits assessing empathy responses in real time while minimizing the ethical and logistical challenges of observing such processes in naturalistic settings (Slater, 2017). The presentation will describe the design of a VR setup that simulates infant crying with integrated real-time self-report ratings of empathic concern and personal distress. Additionally, biometric correlates (e.g., heart rate variability and pupil dilation) will be measured to corroborate the self-report data. Implications for understanding empathy in caregiving will be discussed. This research bridges developmental science and technology to inform interventions that enhance caregiver sensitivity.
Causal claims are key to the evolution and consolidation of theoretical frameworks in social science. However, manually identifying causal claims, extracting cause-effect pairs, and synthesizing them into directed acyclic graphs (DAGs) remains an important challenge. The volume of literature, cognitive biases in interpretation, and the lack of transparency and reproducibility make manual synthesis ineffective and unreliable. To overcome these challenges, moving to a new paradigm based on natural language processing (NLP) techniques is imperative.
In this talk, I will present our NLP pipeline, which automates the extraction of causal claims from social science papers, identifies cause-effect pairs with polarity (positive, negative, neutral), and constructs DAGs representing these relationships. The pipeline uses large language models (LLMs) and other NLP techniques to improve precision in DAG construction. I will explain how the system works and highlight its potential applications in theoretical research and evidence synthesis.
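As a rough illustration of the final synthesis step (not the pipeline's actual code), extracted cause-effect pairs with polarity can be assembled into a graph and checked for acyclicity, for example with igraph in R. The claims below are made up for illustration.

```r
# Minimal sketch (hypothetical claims): turn extracted cause-effect pairs into a
# graph, keep polarity as an edge attribute, and check that the result is acyclic.
library(igraph)

claims <- data.frame(
  cause    = c("social_trust", "education", "income"),
  effect   = c("civic_participation", "income", "civic_participation"),
  polarity = c("positive", "positive", "negative")
)

g <- graph_from_data_frame(claims, directed = TRUE)
is_dag(g)            # TRUE if the synthesized graph is a DAG
E(g)$polarity        # edge attribute carried over from the claims table
plot(g, edge.label = E(g)$polarity)
```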
Psychological research faces many intertwined challenges concerning theory development, cognitive and behavioral modeling, measurement precision, replicability of results, and the ethics of experimentation. A pressing question is: are our approaches to studying psychology sufficient for (eventually) reaching goals such as predicting and explaining human behavior and consciousness? In this hackathon, we employ a simulation approach to start tackling this question. In particular, we create several “worlds” for researchers to investigate, each following rules that become progressively more complex. These rules are informed by challenges that researchers face when studying psychological phenomena. For each world, an agent-based model is created and its output is provided to teams of “scientists”. They are invited to figure out the rules, for which they can use face-value data (e.g., number of agents), perform measurements (e.g., movement speed of agents), and conduct experiments (e.g., manipulating the environment of the agents).
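To give a flavour of what such a world might look like, here is a minimal sketch with a made-up hidden rule; it is not one of the hackathon's actual worlds.

```r
# Minimal sketch of a toy "world": agents move with a speed that depends on a
# hidden trait; recovering this rule is the kind of task teams would face.
set.seed(1)
n_agents <- 50
steps    <- 100

agents <- data.frame(id = 1:n_agents,
                     x = runif(n_agents), y = runif(n_agents),
                     energy = runif(n_agents))            # hidden trait

world_log <- vector("list", steps)
for (t in seq_len(steps)) {
  speed    <- 0.01 + 0.05 * agents$energy                 # hidden rule: speed ~ energy
  theta    <- runif(n_agents, 0, 2 * pi)
  agents$x <- agents$x + speed * cos(theta)
  agents$y <- agents$y + speed * sin(theta)
  world_log[[t]] <- data.frame(step = t, id = agents$id, x = agents$x, y = agents$y)
}

world_output <- do.call(rbind, world_log)   # face-value data handed to the teams
head(world_output)
```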
Significant reforms have been made to ensure the quality of confirmatory research; however, little guidance exists on how to determine the quality of exploratory research. The ability to assess the quality of exploratory work is vital to increasing its publication and funding. During this hackathon we will look at the different types of exploratory research and interrogate the criteria required to assess their quality.
Co-production/public engagement (PE) practices are being implemented in research as a reckoning with academia’s longstanding history of exploitative processes, which have damaged the trust between researchers and the communities they research. When done well, PE can produce research that appropriately reflects the lives of those at the focal point of our research. However, many communities historically marginalised by academia still experience these endeavours as coming from a “tick box” mentality for individual career development; for example, tokenistic engagement of youth who use drugs in harm reduction research led to feelings of further marginalisation (Stowe et al., 2022; https://doi.org/10.1186/s12954-022-00663-z). Drawing from theories of community organising and mutual aid, this unconference is a space for sharing best (and worst) practice. Together we will discuss PE approaches that decentre the academy and re-imagine research as a tool for collective growth.
In this hackathon, we, researchers from developing countries, will share our experience adopting open science practices in resource-limited settings. Each speaker will share their perspective on the opportunities the open science movement presented and the macro- and micro-level barriers they faced, as well as their experiences and backstories in adopting open science practices over their careers. These experiences have been collectively summarized as a four-level guide to help researchers engage in open science at their own pace: (1) utilizing open resources to establish a solid foundation for rigorous research, (2) adopting low-cost and easily implementable open science practices, (3) contributing to open science communities through feasible actions, and (4) assuming leadership roles or initiating local communities to foster cultural shifts. We will also discuss the potential caveats of engaging in open science and propose concrete steps for future collective action.
Knowledge advances by theories and their iterative testing. Yet, psychology is facing a theory crisis, characterized by ambiguous theories and the lack of formalization (Muthukrishna & Henrich, 2019; Oberauer & Lewandowsky, 2019). In this unconference, we want to bring together theory enthusiasts from various fields and discuss measures to facilitate and incentivize more rigorous theorizing in psychological research. As a concrete example, we introduce a set of indicators for the assessment of theoretical rigor that are meant to establish hygiene standards for the development and testing of theories in psychological research. After a brief input, we invite all participants to gather in groups and discuss the utility of such indicators in practice or develop their own ideas to improve and incentivize more precise theorizing in psychological research. Feedback and ideas will be collected, and we will explore the potential for tangible initiatives that could follow from this unconference.
Join other SIPS attendees on a sightseeing boat on the Danube River! The boat will cruise between 6 pm and 8 pm, but the venue will be available until 9 pm. Finger food will be served, including vegan, vegetarian, gluten-free, and lactose-free options.
One ticket to this event is included with regular meeting registration. To purchase additional tickets for friends and family, go to https://sips.wildapricot.org/event-6177322
We at the Leibniz Institute for Psychology (ZPID), an open science institute for psychology in Germany, are revising our preregistration platform "PreReg". To tailor it even more closely to the needs of the psychological research community, we want to involve the community in all steps of the development process. We have already conducted a survey to find out which features are considered important, and we are currently working on a prototype that we want to scrutinize together with the hackathon participants. Specifically, we want to conduct a joint test of the platform to collect issues and ideas for improvements. Additionally, we want to consider with the participants which metadata preregistrations should contain to ensure they fulfill the FAIR principles.
With our hackathon, we want to allow participants to help shape our preregistration platform directly. To recognize the participants’ contributions, we will thank everyone on the "PreReg" website.
At the in-person event, attendees will have the opportunity to submit an "on-the-fly" session. However, acceptance will depend on the number of competing sessions and the level of interest, so we cannot guarantee a slot.
Other attendees will be able to vote on these sessions, helping to shape the program in real-time. If you’re giving a lightning talk or presenting a poster, this could also be a great way to promote your session and attract interest. Stay tuned and keep an eye out for updates!
Trust was at the core of the scientific system that led into the replication crisis: we trusted that individual researchers were well trained, had expertise in the methods they used, applied them as intended, and reported their results honestly. This trust was shown to be unwarranted in several cases. The Open Science movement's efforts to regain trust are predominantly centered on methodological rigor and extensive transparency. While these efforts are important and commendable, we argue that this focus on methods and openness is probably short-sighted when it comes to dealing with a profound crisis of trust, and may even lead to unwanted side effects. To broaden the perspective on the matter, we'd like to explore the concept of "trust" in IT security, draw parallels to the field of psychology, and propose a framework for the role of trust in the replication crisis.
Funders and institutions have put increasing emphasis on researchers disseminating their findings to wider audiences via social media, popular science books, public-facing talks and other platforms. However, public understanding of science is impeded by poor communication of study findings, leading to misinformation and confusion. With this confusion comes pressure on policymakers to be guided by public sentiment rather than rigorous research findings, which may diverge in their conclusions.
In this Unconference, we want to lead a discussion about how the psychological research community can adapt its public engagement to a new era. Topics may include:
- The risks posed by the increased use of preprints;
- How to combat public misinterpretation of study findings;
- Ways to ensure that expert researchers are centred;
- The promotion of pseudopsychological content by influencers on social media; and
- Direct threats against those who seek to highlight pseudoscientific misinformation in the public sphere.
“Preprints”—scholarly manuscripts not yet captured by the publication industry—have greatly facilitated science communication speed and accessibility. Yet, the “intellectual perestroika” of online prepublications (Harnad, 1990) hasn’t been realized: Preprints continue to be treated as less authoritative versions of their “published” counterparts. Moreover, the services that underlie this gap in perceived authoritativeness—editorship, peer-review, publicizing, discovery, etc.—can be provided for preprints but commonly aren’t, and are provided by academics but incorrectly credited to the publishing industry. Why? To answer, we will conduct thematic discussions to identify and examine factors that hinder the appeal and adoption of preprints as first-class science communication citizens. We will then develop methods for overcoming these obstacles. By doing so we hope to move towards Harnad’s (1998) vision of the “final state toward which the learned journal literature is evolving”: Preprints are all we need.
While the social sciences have adopted preregistration as a preferred method to prevent bias, astrophysics and cosmology have embraced analysis blinding to safeguard confirmatory research since the early 2000s. In this workshop, I will discuss the strengths and challenges of analysis blinding, a technique where data is temporarily altered before analysis. I will briefly discuss empirical findings comparing analysis blinding to preregistration and highlight the types of projects where this approach is particularly valuable. As a practical exercise, participants will have the opportunity to apply analysis blinding to an empirical dataset.
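As a rough illustration of the general idea, the sketch below shows one possible blinding scheme (my assumptions, not necessarily the procedure used in the workshop): mask which arm is which and perturb one arm by a hidden shift until the analysis pipeline is frozen.

```r
# Minimal sketch of one possible blinding scheme: mask which arm is which and
# perturb one arm by a hidden shift until the analysis pipeline is frozen.
set.seed(1)
dat <- data.frame(group = rep(c("control", "treatment"), each = 50),
                  score = rnorm(100))

relabel <- sample(c("A", "B"))                     # random arm labels
shift   <- runif(1, -1, 1)                         # hidden perturbation

blind_key <- list(mapping = setNames(relabel, c("control", "treatment")),
                  shift   = shift)

blinded <- dat
blinded$group <- factor(blinded$group, labels = relabel)
blinded$score[blinded$group == "A"] <-
  blinded$score[blinded$group == "A"] + shift

# Models are developed and frozen on `blinded`; `blind_key` is revealed only
# afterwards, and the final analysis is rerun on the unaltered data.
```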
Some publishers perpetuate practices that conflict with the interests of scholars. High publication-related fees are one example; an emphasis on crude metrics such as the impact factor, rather than quality evaluation of journal practices, is another. We will describe initiatives that provide alternatives and work on actions to support them. Peer Community In (PCI) communities perform all the intellectual work of a journal in a way that is free for authors and readers; relevant communities include PCI Registered Reports and PCI Psychology. In the metascience area, MetaROR.org provides peer reviews of preprints (which can optionally be submitted to journals). The Free Journal Network curates scholar-controlled diamond OA (free to publish in and free to read) journals, provides relevant advice, and can support editors interested in “flipping” to this model. In this hackathon, we will develop strategies to facilitate the movement of psychologists and psychology journals towards these publication practices.
Piloting, the pre-testing of a method ahead of planned data collection, plays a vital role in psychological research. Yet, information about the what, how, and why of pilot studies is rarely included in final publications.
To help motivate a culture where reporting pilot studies is normative and easy, our working group has developed templates for reporting pilot studies. This hackathon aims to bring together researchers from diverse backgrounds and fields to test and refine these templates, ensuring they are both useful and usable before promoting widespread use.
The session will begin with a short overview of the template development process. Participants will then engage in ‘user testing’ and apply the templates to their research. We will end with group discussion and feedback on suggested adaptations. Those interested in participating further will also be welcome to join the working group.
Pre-registration and Registered Reports can help diagnose the verifiability of science. However, they cannot tell us when or why research deviated from the intended plan. Moreover, questionable research practices (QRPs) can occur throughout any research process: even completely honest researchers will make mistakes.
Radical Transparency (RT) is the practice of not only making public research outcomes (research plans, protocols, code, data, results) but also the whole process of developing them, in a “collaborative open-source-like” fashion. While RT may help improve the openness and transparency of science, many questions remain about what it really means and how we can implement it:
- What is (not) RT?
- Can RT actually be achieved? If so, how?
- Is it worth pursuing RT?
- How do current and future technologies afford the requirements of RT?
- What are the uses (and misuses) of RT?
This unconference intends to address unknowns like these.
In this unconference, we want to explore the potential of the many-analysts approach in the context of exploratory research. To date, the many-analysts approach—where multiple research teams address the same research question using the same dataset—has been primarily applied to confirmatory research. In that domain, it has demonstrated a surprising diversity in how researchers preprocess data, operationalize key constructs, and select statistical models to test hypotheses of interest. However, its potential for exploratory research has received relatively little attention.
We believe that many-analysts approaches could provide valuable insights into intriguing patterns in the data and enable a systematic exploration of the variable space. These insights, in turn, can inform auxiliary assumptions, inform theory development, and guide the design of subsequent confirmatory studies.
Causal inference is vital in psychological science, but clearly defining causal questions and relationships remains challenging and rarely achieved—weakening the robustness and clarity of analyses in observational studies. A directed acyclic graph (DAG) is a tool to help researchers illustrate their understanding of causal relationships between variables. Addressing concerns about how the time spent using digital technologies affects young people’s wellbeing, we aim to use DAGs to identify bias and inform appropriate adjustment strategies for analysis of observational time-use data.
We will employ an adapted collaborative DAG development procedure and gather feedback for its refinement. Our goal is to create a DAG that transparently captures experts’ knowledge of sources of bias for the relationship between digital technology time-use and wellbeing. The hackathon offers a chance to develop a DAG relevant to a broader range of young people by encouraging cross-disciplinary discussion with researchers at SIPS.
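As a small illustration of how a DAG informs adjustment, the dagitty R package can derive sufficient adjustment sets from an encoded causal structure. The DAG below is an illustrative toy example, not the one the hackathon will build.

```r
# Illustrative toy DAG (not the hackathon's): encode assumed causal structure
# and derive a sufficient adjustment set for the exposure-outcome effect.
library(dagitty)

dag <- dagitty("dag {
  digital_time -> wellbeing
  sleep -> digital_time
  sleep -> wellbeing
  ses -> digital_time
  ses -> wellbeing
}")

adjustmentSets(dag, exposure = "digital_time", outcome = "wellbeing")
# For this toy DAG: { ses, sleep }, i.e., the covariates to condition on.
```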
In clinical psychology, as in almost every research field, the significance of collaborative efforts cannot be overestimated. Therefore, the question is not whether the field needs Big Team Science (BTS) to reveal the complexity of the social mind, but how researchers shall achieve this, particularly when studying vulnerable populations.
While a large sample size holds the promise of investigating key theoretical assumptions with sufficient statistical power, the complexity of a BTS collaboration poses challenges and limitations. I would like to engage in a discussion with the community regarding potential study designs and tangible strategies (e.g., networking, patient access, funding, etc.) for establishing a BTS group in the domain of clinical psychology.
2024 has again been the hottest year on record, with average global temperatures exceeding 1.5°C above pre-industrial levels. We know that climate change is human-made, and researchers are increasingly examining their own (professional) behaviour and its environmental impact.
In this session, we will briefly present estimates of CO2 emissions from a specific conference. Together with the audience, we will discuss:
1) personal reactions to data on academics' (travel) behaviour and its environmental impact;
2) concrete intervention approaches to raise scientists' awareness and prompt them to act on their green conscience; and
3) levers for scientific societies and future psychological conference organizers to decrease our collective environmental impact.
We welcome ideas on how to tackle academia's environmental impact. If you would like to pitch ideas with slides, just drop us an email a few days beforehand: lisa.warner@medicalschool-berlin.de
Let’s create a hive mind on how to make psychological academia more pro-environmental!
Researcher diversity remains a significant challenge in academia, with minority scholars in both the Global North and Global South continuing to be underrepresented. The exclusion of women and other marginalised groups within the Global South is further overlooked. Existing diversity metrics, such as using institutional affiliation as a proxy, fail to capture these disparities, while API-based predictive tools like Genderize, Ageify, and Nationalize can misclassify authors' gender or nationality, reinforcing existing biases. In this unconference, we will examine the limitations of current diversity measurement tools and methods. Are there alternative, more nuanced approaches for measuring researcher diversity in psychology? How can we develop metrics that truly reflect the diversity of the global research community without oversimplifying representation? By unpacking these questions, we aim to expand the conversation on authorship diversity and reimagine diversity data in ways that better support the academic pipeline for all scholars.
In this hackathon we will work on a benchmark for AI evaluations of preregistrations. For assessing preregistrations, human-labeled data scales poorly because it requires a great deal of expert labor. Instead, we will focus on synthetically generated preregistrations for which we know the ground truth: which components are described adequately and which are missing or inadequately described.
Possible project tasks:
- Gather or create flawless preregistrations.
- Find ways in which important components of a preregistration can be broken.
- Formulate prompts and AI workflows for generating versions of the flawless preregistrations that are wrong in specific ways.
- Create a database of synthetic preregistrations and ground truth about which components are adequate and which are not (see the sketch below).
- Write code that allows quick benchmarking of AI or human coders.
- Validate the synthetic preregistrations by evaluating their adequacy.
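As one possible starting point for the database and benchmarking tasks, the following minimal sketch uses a hypothetical schema with made-up identifiers and components.

```r
# Minimal sketch (hypothetical schema): a ground-truth table for synthetic
# preregistrations and a simple accuracy score for benchmarking coders.
ground_truth <- data.frame(
  prereg_id = c("synth_001", "synth_001", "synth_002"),
  component = c("hypotheses", "sampling_plan", "analysis_plan"),
  adequate  = c(TRUE, FALSE, TRUE)          # TRUE = described adequately
)

coder_ratings <- data.frame(
  prereg_id      = c("synth_001", "synth_001", "synth_002"),
  component      = c("hypotheses", "sampling_plan", "analysis_plan"),
  rated_adequate = c(TRUE, TRUE, TRUE)      # ratings from an AI or human coder
)

benchmark <- merge(ground_truth, coder_ratings, by = c("prereg_id", "component"))
mean(benchmark$adequate == benchmark$rated_adequate)   # proportion classified correctly
```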
Scientific conclusions often rely on generic statements (e.g., “X improves Y”), which may imply unwarranted generalisability. Prior research suggests that generic claims are perceived as more important than qualified statements (e.g., “X improves some of Y”), raising concerns about scientific communication. However, it remains unclear whether wording affects reproducibility—the extent to which different researchers reviewing the same data reach the same conclusion. To investigate this, I propose a vignette study where researchers evaluate study results and conclusions framed either generically or with a qualifier. If non-universal wording increases agreement, this suggests that qualifying claims improves robustness. This study builds on Multi100, a many-analyst project showing low analytical robustness in the social sciences. By exploring how language influences reproducibility, this research aims to improve best practices in scientific reporting and enhance the clarity and reliability of published findings.
In cognitive sciences, behavioral studies often assume that increasing the number of trials improves the precision and magnitude of observed effects, enhancing the likelihood of detecting the effect of interest. This study tests this assumption by examining the performance of N participants across three widely used experimental paradigms (Simon effect, SNARC effect, Task Switching) with a high number of trials (>400). Surprisingly, our findings reveal that precision is not directly proportional to the number of trials. Instead, each paradigm shows a task-specific suboptimal precision point, beyond which additional trials do not improve statistical estimates and effect sizes. These results challenge the conventional notion that "more trials yield stronger effects" and highlight that excessive trial counts can be counterproductive. We discuss the implications of these findings for experimental design and data analysis in cognitive research, emphasizing the importance of optimizing trial numbers for robust and efficient outcomes.
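For illustration, precision can be tracked as a function of trial count roughly as follows; this is a minimal sketch with simulated data, not the study's actual analysis.

```r
# Minimal sketch with simulated data: track the standard error of a congruency
# effect as a function of how many trials are included.
set.seed(1)
n_trials  <- 800
congruent <- rep(c(TRUE, FALSE), length.out = n_trials)
trials <- data.frame(
  congruent = congruent,
  rt = 500 + ifelse(congruent, 0, 25) + rnorm(n_trials, sd = 100)
)

ks <- seq(100, n_trials, by = 100)
se_by_k <- sapply(ks, function(k) {
  sub <- trials[1:k, ]
  sqrt(var(sub$rt[sub$congruent])  / sum(sub$congruent) +
       var(sub$rt[!sub$congruent]) / sum(!sub$congruent))
})
plot(ks, se_by_k, type = "b",
     xlab = "Number of trials", ylab = "SE of the congruency effect")
```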
Join SIPS Executive Board members in discussing the accomplishments and challenges of SIPS and psychological science, with the aim of establishing goals for SIPS's future.
Keynotes: Alexandra Sarafoglou, Marton Kovacs, Jeffrey Lees, Lisa Spitzer, and Agata Bochynska.
Moderator: Ekaterina Pronizius
This year, we are departing from tradition at SIPS. Rather than inviting individual keynote speakers to open and close the conference, we will host two open roundtable discussions.
For the final day, we would like to host a second roundtable—this time highlighting the rising stars of open science. We would love to hear their personal stories and perspectives, and at one point, we plan to open the conversation to all SIPS attendees. We are confident that this discussion will provide fresh ideas and inspiration that will carry into the SIPS sessions.