SIPS 2026 Online

oHA3: One Column Name to Rule Them All: Can we agree on how to Label Participant Identifier Columns in our Datasets?
2026-05-07 , Track 1

A major challenge with reusing shared datasets is the naming conventions used for columns in those datasets. Reuse, as part of the FAIR principles, is made more difficult by the paucity of metadata provided alongside datasets, often leaving researchers to guess what data is contained in different columns. Recent work has found substantial variability even for commonly used column identifiers. Here, we will work collaboratively, cataloguing datasets from the published literature in an effort to find a simple starting point to solve this issue. Following the Psych-DS project, which has developed a standard for organising data files, we will work together to develop recommendations for what researchers should use to refer to the most common column across all Psychology datasets: namely, the participant identifier column. From this starting point we can then, beyond the session, work together to create standards and recommendations for column naming conventions across the discipline.


Please classify your session as the theme it fits best in:

Pedagogy/Curriculum/Mentoring - Content related to educating students

What is your end product?:

We'll have a collection of datasets that can be used to create a standard for recommendations of column naming in psychology. At the very least, we'll have recommendations for what to name the participant identifier column, but hopefully we may have more than that at the end of the session. We'd like to publish the recommendations and the data examined to generate those recommendations in order to document and provide an evidence base for everything.

How will the session's content foster diversity & inclusion (e.g., who will present, who will it serve), and how will it improve psychological science?:

We'll be bringing in a range of speakers at different career stages to help support the session, and the session should be of value to researchers across the discipline. It should aid psychological science by increasing the reusability of our datasets when they are shared.

Please note any pre-requisite knowledge/expertise you will expect from attendees (i.e., is the session most appropriate for someone who already has experience with a topic?).:

No prior experience is needed. We'll be working on a range of different activities, so there will be something for everyone.

I am currently a Professor of Cognitive Analytics at Harrisburg University of Science and Technology - a STEM school in Pennsylvania. I teach computational linguistics courses in our Analytics and Data Science programs, such as Natural Language Processing, Sentiment Analysis, and Human Language. I also teach a bunch of statistics courses and you can learn more about me at: https://www.aggieerin.com.