2023-04-19, B07-B08
We live in an era of big data, where large volumes of information are collected and analyzed for insights into human behavior, and data privacy has become a pressing concern. Because private information can be misused once leaked, not all data can be released for research. This talk discusses Differential Privacy, a cutting-edge cybersecurity technique that aims to preserve an individual’s privacy: how it is employed to minimize the risks of working with private data, its applications in various domains, and how Python eases the task of employing it in our models with PyDP.
Given how much private information can be misused once leaked, how should privacy be protected? One might think that simply anonymizing the personally identifiable fields in a dataset would be enough, but this can leave the dataset useless for analysis. Research has also shown that by statistically studying an anonymized dataset alongside a second, related dataset, private information can often be re-extracted!
The session will start with a brief overview of current privacy standards and the possible risks of handling customer data. This will lay the foundation for introducing Differential Privacy, a cutting-edge cybersecurity technique that aims to preserve an individual’s privacy by perturbing data in a way that does not render it useless for analysis. Developers will gain insight into the concept of Differential Privacy, how it is employed to minimize the risks associated with private data, its practical applications in various domains, and how Python eases the task of employing it in our models with PyDP. As the talk progresses, a walkthrough of a real-life practical example, along with a nifty visualization, will acquaint the audience with PyDP and show how differentially private results closely approximate what the unfiltered data would have provided.
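To give a feel for the idea before the session, here is a minimal sketch of the Laplace mechanism, one common way to obtain differentially private results. It is written in plain Python and does not use the PyDP API; the function names, the sample ages, the clamping bounds, and the choice of epsilon are all assumptions made for illustration. The mean also assumes the number of records is public.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng):
    """Noisy count; adding or removing one record changes a count by at most 1."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


def private_mean(values, lower, upper, epsilon, rng):
    """Noisy mean of clamped values, assuming the record count is public."""
    clamped = [min(max(v, lower), upper) for v in values]
    n = len(clamped)
    # Replacing one record moves the clamped mean by at most (upper - lower) / n.
    sensitivity = (upper - lower) / n
    return sum(clamped) / n + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical data; the RNG is seeded only so the demo is repeatable.
rng = random.Random(0)
ages = [23, 35, 41, 29, 52, 61, 38, 47, 33, 26]

true_mean = sum(ages) / len(ages)
true_over_40 = sum(1 for a in ages if a > 40)
noisy_mean = private_mean(ages, lower=18, upper=90, epsilon=1.0, rng=rng)
noisy_over_40 = private_count(ages, lambda a: a > 40, epsilon=1.0, rng=rng)

print(f"true mean = {true_mean:.1f}, private mean = {noisy_mean:.1f}")
print(f"true count over 40 = {true_over_40}, private count = {noisy_over_40:.1f}")
```

A smaller epsilon means more noise and stronger privacy, and with only ten records the noise on the mean is large; on bigger datasets the private answer tends to land much closer to the true one. This is an illustration only: a real deployment should rely on a vetted library such as PyDP rather than hand-rolled noise sampling.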
Novice
Abstract as a tweet: What if I told you I could answer everything about you without knowing you, using Differential Privacy?
Expected audience expertise: Domain: Novice
Vikram is a Computer Science master's student at Columbia University with a focus on Machine Learning. He completed his bachelor's in Computer Science and Mathematics at BITS Pilani in India. Before Columbia, he was part of the engineering and strategy team at Goldman Sachs, where he built scalable and efficient trading tools. He also has research experience from TU Leibniz, Germany, in the areas of Reinforcement Learning and Parallel Programming. He presented his research at the International Conference on Mining and Learning on Graphs in 2020 in Vienna, Austria. He was a teaching assistant for three courses during his academic career, which involved conducting seminars (NumPy, PyTorch, etc.), organizing technical meetings, and organizing research fairs. He has also tutored on the website 'Chegg' for more than two years, teaching Math and Programming to high school and university students.
Sarthika Dhawan is a Software Engineer at Microsoft, where she has worked with a variety of technologies and teams. She is actively involved in the software development and research community, and has authored and presented a conference paper at IJCAI 2019. She is an ACM-W and AICTE-INAE scholarship recipient and has attended conferences such as GHCI and IJCAI. She has given technical talks, mentored others, and volunteered as a tutor at NGOs, teaching various disciplines to children from economically disadvantaged backgrounds. She has participated in multiple hackathons, as she believes they are an amazing way to stay involved and up to date.