2025-06-05 –, 2306/2309
Language: English
Introduction
As automatic indexing of MeSH terms becomes better known, concerns are being raised in health sciences librarianship about how the change affects established searching practices. While other ongoing research investigates the accuracy and reliability of automatic indexing on a per-citation basis, this work analyzes overall trends and the performance of the algorithm for chemistry and genetics research.
Methods
Relevant fields for 4302 citations published between November 2020 and March 2023 were extracted with NLM’s efetch and xtract tools on July 25, 2024. These data were combined with MeSH data downloaded from the Ontology Lookup Service on September 4, 2024. All analyses were completed in R. To evaluate the potential impact of stemming on term overlap, Porter stemming was applied using the Tokenizer and SnowballC packages.
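As an illustration of the overlap comparison described above, the following minimal sketch computes the Jaccard overlap between two sets of terms before and after tokenizing and stemming. It is written in Python rather than R and is not the analysis code used in this work. The function names, the example terms, and the choice of Jaccard overlap as the measure are all assumptions, and `crude_stem` is a simplified suffix stripper standing in for Porter stemming, not an implementation of it.

```python
import re

# Hypothetical, deliberately small suffix list; a real Porter stemmer
# applies a much richer, multi-step rule set.
SUFFIXES = ("ations", "ation", "ings", "ing", "ed", "s")


def tokenize(text):
    """Lowercase the text and split it on runs of non-alphanumeric characters."""
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]


def crude_stem(token):
    """Strip at most one suffix, keeping at least three characters (NOT Porter)."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def term_set(terms, stem=False):
    """Collect the unique tokens across a list of terms, optionally stemmed."""
    tokens = set()
    for term in terms:
        for tok in tokenize(term):
            tokens.add(crude_stem(tok) if stem else tok)
    return tokens


def jaccard(a, b):
    """Jaccard overlap: size of the intersection divided by size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Illustrative inputs only; these are not drawn from the study data.
mesh_terms = ["Mutation", "Genes, Bacterial"]
keywords = ["bacterial gene mutations"]

raw_overlap = jaccard(term_set(mesh_terms), term_set(keywords))
stemmed_overlap = jaccard(term_set(mesh_terms, stem=True),
                          term_set(keywords, stem=True))
print(raw_overlap, stemmed_overlap)
```

In this toy case only one of five distinct tokens matches before stemming, while all tokens match afterward, which shows how stemming can raise measured overlap between fields.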
Results
Results generally support the claim that automatic indexing decreases the time required for a citation to receive indexing. The degree to which search fields overlap varies by indexing method, but overall topic harmonization increases as terms are tokenized and stemmed. Manually indexed citations tend to have a higher degree of field overlap, which aligns with the finding that the average number of MeSH terms is higher for manually indexed citations.
Discussion
This work builds on feedback from previous presentations and provides a more detailed, larger-scale investigation into the automatic indexing algorithm and its impact on health librarianship.