WikidataCon 2021

The Lindy Effect in Wikidata User Retention
31/10/2021, Room 2

People leave online communities after some time. However, the likelihood that a particular user leaves the project depends on how long they have already been part of it: people who have only spent a brief time in the project are more likely to leave than long-term members. This is similar to the so-called Lindy Effect: "... a theorized phenomenon by which the future life expectancy of some non-perishable things, like a technology or an idea, is proportional to their current age." (from Wikipedia).

The Lindy Effect in Wikidata user retention holds only if the observed account ages follow a power-law (Pareto) probability distribution. We have tested this assumption on ~400K Wikidata accounts, obtaining the full revision history of each account and distinguishing active months (>=5 edits) from inactive ones. We have also developed a binary classifier with XGBoost to predict whether a user will continue to contribute to Wikidata in the immediate future (the next month), with satisfactory initial results. We share the datasets and the code repository with the community and briefly describe the data acquisition procedures.
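To make the link between the Pareto assumption and the Lindy Effect concrete: for a Pareto distribution with shape parameter alpha > 1, the expected remaining lifetime of an account that has already survived to age t is t / (alpha - 1), i.e. proportional to its current age. The following is a minimal sketch on simulated data only, not the analysis code from the repository; the function names, the simulated ages, and the choice of alpha = 3 are assumptions made for illustration.

```python
import math
import random


def simulate_account_ages(n, alpha, x_min=1.0, seed=0):
    """Draw n hypothetical account ages from a Pareto(alpha) with minimum x_min."""
    rng = random.Random(seed)
    # random.paretovariate returns values >= 1, so scale by x_min.
    return [x_min * rng.paretovariate(alpha) for _ in range(n)]


def fit_pareto_alpha(ages, x_min):
    """Maximum-likelihood estimate of the Pareto shape for ages >= x_min."""
    tail = [a for a in ages if a >= x_min]
    log_sum = sum(math.log(a / x_min) for a in tail)
    return len(tail) / log_sum


def mean_residual_life(ages, t):
    """Average remaining lifetime of accounts that survived past age t."""
    remaining = [a - t for a in ages if a > t]
    return sum(remaining) / len(remaining)


if __name__ == "__main__":
    true_alpha = 3.0  # assumed value, chosen only for the simulation
    ages = simulate_account_ages(200_000, alpha=true_alpha, seed=42)
    alpha_hat = fit_pareto_alpha(ages, x_min=1.0)
    print(f"estimated alpha: {alpha_hat:.3f}")
    for t in (1.5, 2.0, 4.0):
        observed = mean_residual_life(ages, t)
        lindy = t / (alpha_hat - 1.0)  # remaining life proportional to age
        print(f"age {t}: observed {observed:.3f}, Lindy prediction {lindy:.3f}")
```

On real account ages, the same comparison (observed mean remaining lifetime against t / (alpha - 1)) would be one simple way to check whether the data behave as the Lindy Effect predicts; a formal test of the power-law fit would require more than this sketch.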


What will the participants take away from this session?:

The participants will learn about a scientifically well-grounded approach to the problem of user retention in Wikidata.

Language:

English

Link to slides:

https://github.com/wikimedia/analytics-wmde-WD-WikidataAdHocAnalytics/tree/master/WD_UserRetention/_presentation

Link to notes:

https://etherpad.wikimedia.org/p/WikidataCon2021-TheLindyEffectinWikidataUserRe

Other links:

GitHub repo: https://github.com/wikimedia/analytics-wmde-WD-WikidataAdHocAnalytics/tree/master/WD_UserRetention

Recording:

Yes

See also: The Lindy Effect in Wikidata User Retention (Slide Deck, PDF) (843.1 KB)

Cognitive and Data Scientist. An exact mind by training, a philosopher by birth. Data Scientist for Wikidata with WMDE.

Anthropology and mixed-methods Research