10-31, 16:30–16:55 (UTC), Room 2
People leave online communities after some time. However, the likelihood that a particular user leaves a project depends on how long they have already been part of it: people who have spent only a brief time in the project are more likely to leave than long-term members. This is similar to the so-called Lindy Effect: "... a theorized phenomenon by which the future life expectancy of some non-perishable things, like a technology or an idea, is proportional to their current age." (from Wikipedia).
The Lindy Effect holds for Wikidata user retention only if the observed account ages follow a power-law (Pareto) probability distribution. We have tested this assumption on ~400K Wikidata accounts, obtaining each account's full revision history and classifying every month as active (>=5 edits) or inactive. We have also developed a binary classifier with XGBoost to predict whether a user will continue to contribute to Wikidata in the immediate future (the next month), with satisfactory initial results. We share the datasets and the code repository with the community and briefly describe the data acquisition procedures.
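The two checks described above can be illustrated with a minimal sketch. It is not taken from the linked repository: the function names (`active_months`, `pareto_alpha_mle`, `mean_residual_life`) are hypothetical, the account ages are synthetic draws from a Pareto distribution rather than Wikidata data, and the XGBoost classifier is left out entirely. The sketch only shows how months could be flagged as active using the >=5 edits threshold, and how a Pareto shape parameter and the "remaining lifetime grows with age" pattern could be estimated from a list of account ages.

```python
import math
import random
from collections import Counter

ACTIVE_THRESHOLD = 5  # edits per month; the threshold stated in the abstract


def active_months(edit_months):
    """Return the sorted months with at least ACTIVE_THRESHOLD edits.

    edit_months: iterable of 'YYYY-MM' strings, one entry per revision
    (an assumed input format, chosen for illustration).
    """
    counts = Counter(edit_months)
    return sorted(m for m, n in counts.items() if n >= ACTIVE_THRESHOLD)


def pareto_alpha_mle(ages, x_min):
    """Maximum-likelihood estimate of the Pareto shape for ages >= x_min."""
    tail = [a for a in ages if a >= x_min]
    return len(tail) / sum(math.log(a / x_min) for a in tail)


def mean_residual_life(ages, t):
    """Average remaining lifetime among accounts that survived past age t."""
    remaining = [a - t for a in ages if a > t]
    return sum(remaining) / len(remaining)


# Synthetic account ages (arbitrary units), NOT real Wikidata data.
random.seed(0)
TRUE_ALPHA = 3.0
ages = [random.paretovariate(TRUE_ALPHA) for _ in range(50_000)]

alpha_hat = pareto_alpha_mle(ages, x_min=1.0)
mrl_young = mean_residual_life(ages, t=2.0)
mrl_old = mean_residual_life(ages, t=4.0)

print(f"estimated alpha: {alpha_hat:.2f}")
print(f"mean remaining lifetime at age 2: {mrl_young:.2f}")
print(f"mean remaining lifetime at age 4: {mrl_old:.2f}")
```

For a Pareto distribution with shape α > 1, the expected remaining lifetime of an account that has already reached age t is t / (α − 1), which is proportional to t. That is the Lindy property, so on Pareto-distributed ages the second printed value should be larger than the first.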
The participants will learn about a scientifically well-grounded approach to the problem of user retention in Wikidata.
Language –
Yes
Other links –
GitHub repo: https://github.com/wikimedia/analytics-wmde-WD-WikidataAdHocAnalytics/tree/master/WD_UserRetention
Link to slides –
https://github.com/wikimedia/analytics-wmde-WD-WikidataAdHocAnalytics/tree/master/WD_UserRetention/_presentation
Link to notes –
Cognitive and Data Scientist. An exact mind by training, a philosopher by birth. Data Scientist for Wikidata with WMDE.
Anthropology and mixed-methods Research