AI as a field, and as a buzzword, has exploded in the past few years. But for many, AI seems like something conjured out of the ether: a computerized intelligence created with nothing but code. This is not the case. The explosion in AI is tied directly to an explosion in the data available for training: these algorithms are fed massive datasets and learn to imitate the human behaviors that went into creating them. In this workshop, we'll explore a quick history of the datasets behind AI and the human labor that made them, discussing the advent of web scraping, ImageNet, Mechanical Turk, and more. After that, we'll discuss the ethical considerations of dataset creation, then move into a hands-on portion where we create and label our own image and text datasets, giving participants firsthand experience of the messy, human process of making them.
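To give a flavor of the hands-on portion, here's a minimal sketch of what a labeling exercise might look like: a simple loop that walks a folder of images and records a human-chosen label for each one. The "images/" folder, the label set, and the "labels.csv" output are placeholders for illustration, not the actual workshop materials.

```python
import csv
from pathlib import Path

# Placeholder label set and paths for the sketch; the real workshop
# materials would supply their own images and categories.
LABELS = ["cat", "dog", "other"]
image_dir = Path("images")
out_path = Path("labels.csv")

with out_path.open("w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["filename", "label"])
    # Ask a human to label each image, one at a time, and save the answer.
    for image_path in sorted(image_dir.glob("*.jpg")):
        print(f"\nImage: {image_path.name}")
        print(f"Choices: {', '.join(LABELS)}")
        label = input("Label> ").strip().lower()
        while label not in LABELS:
            label = input(f"Please enter one of {LABELS}> ").strip().lower()
        writer.writerow([image_path.name, label])

print(f"Labels written to {out_path}")
```

Even a toy loop like this surfaces the questions the workshop is about: who chose the categories, what happens to images that don't fit them, and how tedious the work gets at scale.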
I've got a fair amount of experience running sessions of varying sizes, so if there end up being 30 participants, I'd run it much more as a "follow along and follow up with me later" session rather than working through every step with each participant. If there are 3, all the better - that allows time and space for in-depth discussion and hands-on technical help.
What is the goal and/or outcome of your session?: Hopefully, participants will come away with a better understanding of what, exactly, AI is trying to do. Rather than seeing AI as some external power conjured by programmers, I hope participants will realize it's designed to mimic human behavior, and that the data used to train it is created by people just like them.
I'd also like participants to come away with a greater awareness of the very human labor that goes into Big Data and AI, and of the often abusive labor practices behind its creation, documented expertly by Caroline Sinders in her own Mozilla project: https://foundation.mozilla.org/en/blog/examining-human-labor-behind-ai/. There's no better way to understand how data becomes biased, and how AI reflects human thought, good and bad, than experiencing firsthand the day-to-day work of Mechanical Turk data labelers across the world.
We're hoping that many efforts and discussions will continue after Mozfest. Share any ideas you already have for how to continue the work from your session.: I'm already working with Code.org on their dataset creation and data ethics process for their new AI unit for K-12 students, and I'd love to bring any ideas or thoughts that emerge from the session into my work there. In addition, I'm working on an article for ITP's Adjacent Magazine (https://itp.nyu.edu/adjacent/) on the myths and truths of artificial intelligence that I'd love to bring workshop findings into.
Beyond that, I open source all the resources for the workshops I teach on GitHub, and I plan to share those with session participants and encourage them to bring them to their own communities. I'll also be available in person and on Slack to discuss data, AI, and ethics anytime folks want to.
Brent Bailey is a researcher, programmer, and artist based in Brooklyn, New York. His work focuses on the use of data and AI for education, resistance, and subversion.