2022-05-27, PyData Room
In almost every data analysis, you will inevitably find unexpected or odd values in your data. Many statistical and machine learning algorithms struggle to converge or generalize on dirty data, so it is critical for the analyst to know how to identify and remove outliers.
In this talk, I will show you the most common techniques for eliminating outliers from your data using Python, and give you practical tips on how to spot them.
- What are outliers
- How outliers can affect your analysis
- Difference between noise and outliers
- Visualization tools
- Statistical methods
- Automatic methods
- Removing outliers
- Conclusions and Recommendations
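
The abstract does not say which statistical methods the talk covers. As one illustration of the kind of technique the outline points to, here is a minimal sketch of Tukey's interquartile range (IQR) rule, a widely used way to flag and remove outliers. The function names, the multiplier `k=1.5`, and the sample readings are all assumptions made for this example, not material from the talk.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences from Tukey's IQR rule.

    k=1.5 is the conventional multiplier; it is an assumption here.
    """
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def split_outliers(values, k=1.5):
    """Split values into (kept, outliers) using the IQR fences."""
    lower, upper = iqr_bounds(values, k)
    kept = [v for v in values if lower <= v <= upper]
    outliers = [v for v in values if v < lower or v > upper]
    return kept, outliers


# Hypothetical sample: one reading is far from the rest.
readings = [10, 12, 11, 13, 12, 11, 95, 12, 10, 13]
kept, outliers = split_outliers(readings)
print(outliers)  # [95]
```

Whether a flagged value should actually be dropped depends on the data: the rule only marks points that lie far from the bulk of the distribution, and some of those may be genuine observations.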
PyData, data science
Sara is a Business Intelligence analyst at the NHS and a machine learning enthusiast. She is active in the Python community and works to empower women in tech by leading and organizing events focused on increasing the visibility of women in STEM careers.