Most algorithms for detecting hate speech are built behind closed doors at social media platforms, with no clarity on definitions, rules, edge cases, or evidence that the AI actually works well enough.
In 2020, Factmata built a suite of algorithms to detect hate speech, sexism, racism, toxicity, obscenity, propaganda and threatening language. We then built a dashboard that ran a daily scan of Twitter to see if we could find content that was not being removed in a timely manner.
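To make the discussion concrete, here is a minimal sketch of what such a scan-and-flag pipeline can look like. This is not Factmata's or Bleepr's actual code: it assumes a publicly available Hugging Face text-classification model as a stand-in for the proprietary classifiers described above, and a hard-coded batch of posts in place of a live Twitter feed.

# Illustrative sketch only; model name and label scheme are assumptions,
# not Factmata's production system.
from transformers import pipeline

# Load an off-the-shelf toxicity classifier (label names depend on the model chosen).
toxicity = pipeline("text-classification", model="unitary/toxic-bert")

def scan_posts(posts, threshold=0.8):
    """Score a batch of posts and return those flagged above the threshold."""
    flagged = []
    for post, result in zip(posts, toxicity(posts)):
        if result["score"] >= threshold and result["label"].lower() == "toxic":
            flagged.append({"text": post, "score": result["score"]})
    return flagged

# A daily dashboard job would fetch recent tweets (e.g. via the Twitter API)
# and feed them in here; we use a fixed sample for illustration.
if __name__ == "__main__":
    sample = ["Have a great day everyone!", "You people are subhuman filth."]
    for hit in scan_posts(sample):
        print(f"{hit['score']:.2f}  {hit['text']}")

In practice, the interesting questions for the workshop are less about this plumbing and more about the threshold, the label definitions, and who gets to set them.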
In this session we will walk through what we found and gather feedback from workshop participants on where the algorithm performs well and where it fails. We will then spend some time discussing how to build unified, transparent definitions of harmful speech in a scalable way, and how to bring third-party startups, social groups and non-profits into the debate, not just big tech.
Further discussion:
- How to build unified definitions of hate speech and propaganda in the open
- Risks of openness in this space
- What platforms can do to be more open about their definitions
- How startups, non-profits and third parties, not just big tech, should be involved in tackling hate speech/propaganda
- How regulation should work in this space from an openness/transparency perspective, without "feeding the trolls" or allowing gaming of the system
We will only start the session if 5+ participants attend.
What is the goal and/or outcome of your session?: Our goal is to examine where machine learning products like Bleepr, which flag hate speech and racism on Twitter and, in the future, other platforms, succeed and fail. We will draw on participants' collective experience to discuss how the project can be improved.
Founder of Factmata, a company detecting fake news and misinformation online.