Goran S. Milovanovic
Cognitive and Data Scientist. An exact mind by training, a philosopher by birth. Data Scientist for Wikidata with WMDE.
People leave online communities after some time. However, the likelihood that a particular user leaves a project depends on how long they have already been part of it: people who have spent only a brief time in the project are more likely to leave than long-term members. This is similar to the so-called Lindy Effect: "... a theorized phenomenon by which the future life expectancy of some non-perishable things, like a technology or an idea, is proportional to their current age." (from Wikipedia).
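The proportionality in that definition can be made concrete with a short derivation. The sketch below assumes that account lifetime $T$ follows a Pareto law with minimum lifetime $t_{\min}$ and shape $\alpha > 1$; these symbols are introduced here for illustration and are not taken from the talk.

```latex
% Survival function of the assumed Pareto lifetime
P(T > t) = \left(\frac{t_{\min}}{t}\right)^{\alpha}, \qquad t \ge t_{\min},\ \alpha > 1 .

% Survival beyond s for an account that has already reached age t (s >= t)
P(T > s \mid T > t) = \left(\frac{t}{s}\right)^{\alpha} .

% Expected remaining lifetime at age t
E[\,T - t \mid T > t\,]
  = \int_{t}^{\infty} \left(\frac{t}{s}\right)^{\alpha} ds
  = \frac{t}{\alpha - 1} .
```

Under these assumptions the expected remaining lifetime is proportional to the current age $t$, with constant $1/(\alpha - 1)$, which is the behavior the Lindy Effect describes.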
The Lindy Effect in Wikidata user retention holds only if the observed account ages follow a power-law (Pareto) probability distribution. We have tested this assumption on ~400K Wikidata accounts, obtaining each account's full revision history and labeling every month as active (>=5 edits) or inactive. We have also developed a binary classifier with XGBoost to predict whether a user will continue to contribute to Wikidata in the immediate future (next month), with satisfactory initial results. We share the datasets and the code repository with the community and briefly describe the data acquisition procedures.
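As a rough illustration of two steps mentioned above, the following Python sketch labels active months from a list of edit timestamps and estimates the shape of a Pareto distribution from account ages. It is not the code from the shared repository: the function names `active_months` and `pareto_shape_mle`, the input format, and the choice of a known lower cutoff `x_min` are all assumptions made for this example. The estimator is the standard maximum-likelihood formula for a Pareto shape with a fixed cutoff; a full test of the power-law assumption would also compare the fit against alternative distributions.

```python
import math
from collections import Counter
from datetime import datetime

ACTIVE_THRESHOLD = 5  # edits per month, the cutoff stated in the text


def active_months(timestamps, threshold=ACTIVE_THRESHOLD):
    """Return the sorted (year, month) pairs that have at least `threshold` edits.

    `timestamps` is an iterable of datetime objects, one per revision
    (a hypothetical input format chosen for this sketch).
    """
    counts = Counter((ts.year, ts.month) for ts in timestamps)
    return sorted(ym for ym, n in counts.items() if n >= threshold)


def pareto_shape_mle(ages, x_min):
    """Maximum-likelihood estimate of the Pareto shape for ages >= x_min.

    Uses alpha_hat = n / sum(log(age / x_min)) over the ages at or above
    the cutoff `x_min`, which is assumed to be known in advance.
    """
    tail = [a for a in ages if a >= x_min]
    log_sum = sum(math.log(a / x_min) for a in tail)
    if log_sum == 0:
        raise ValueError("need at least one age strictly above x_min")
    return len(tail) / log_sum


if __name__ == "__main__":
    # Six edits in January 2020 (active), one edit in February 2020 (inactive).
    edits = [datetime(2020, 1, day) for day in range(1, 7)]
    edits.append(datetime(2020, 2, 1))
    print(active_months(edits))

    # Toy account ages in months; made-up values for demonstration only.
    toy_ages = [1.0, 2.0, 4.0, 8.0, 16.0]
    print(pareto_shape_mle(toy_ages, x_min=1.0))
```

The per-month activity labels produced this way could serve as input features or targets for a next-month classifier such as the XGBoost model described above, though the actual feature set used in the talk is not specified here.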