Evaluating the performance of broad and narrow search strategies when using machine learning-based software for title/abstract screening
Introduction: A number of machine learning tools have been developed to enhance the efficiency of title/abstract screening in review projects. While guidance articles suggest that the time these tools save makes broader, high-yield searches more feasible, the actual performance of sensitive (broad) versus precise (narrow) search strategies in this context remains underexplored.
Methods: Using ASReview, an open-source systematic review tool, I evaluated search strategy performance in a sample of completed reviews. For each review, one database search was selected and revised to broaden (n = 9) or narrow (n = 1) its scope. Search results were labeled as relevant or irrelevant based on each review’s included articles. These labeled sets were uploaded into ASReview’s simulation module. Performance was assessed using metrics such as the time required to reach a true recall of 0.95.
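To illustrate the underlying metric, the sketch below is plain Python rather than ASReview’s own API, and the screening orders are hypothetical: it computes how many records must be screened before true recall reaches 0.95, and the corresponding added time at one record per minute.

```python
# Minimal sketch of the time-to-recall metric (hypothetical data,
# not ASReview output): given the order in which a simulated screener
# surfaces records, count how many must be screened before the share
# of relevant records found reaches the target recall.

def records_to_recall(labels_in_screening_order, target_recall=0.95):
    """Number of records screened before true recall first reaches target."""
    total_relevant = sum(labels_in_screening_order)
    if total_relevant == 0:
        return 0
    found = 0
    for n_screened, label in enumerate(labels_in_screening_order, start=1):
        found += label
        if found / total_relevant >= target_recall:
            return n_screened
    return len(labels_in_screening_order)  # target never reached

# Hypothetical screening orders: 1 = relevant, 0 = irrelevant.
narrow_order = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
broad_order  = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0]

n_narrow = records_to_recall(narrow_order)
n_broad = records_to_recall(broad_order)

# At one record per minute, the broad strategy's extra screening time:
print(f"narrow: {n_narrow} records, broad: {n_broad} records, "
      f"added time: {n_broad - n_narrow} min")
```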
Results: To reach a true recall of ≥0.95, broader searches increased screening time by 10–875%, with a median increase of 25%. Systematic reviews had a smaller median increase (18%) than other review types (374%). At a rate of one record per minute, the added time to reach this level of recall ranged from 0.4 to 35.1 hours (median: 1.8 hours). In two cases, machine learning-assisted screening of the broader results took longer than manually screening the entire narrower result set.
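The conversion from extra records to hours is simple arithmetic; the snippet below uses hypothetical record counts, not study data, to show how the percentage increases and added hours relate at one record per minute.

```python
# Illustrative arithmetic only (hypothetical counts, not study data):
# convert extra records screened into added hours and a percentage
# increase over the narrow strategy, at one record per minute.
records_narrow = 1200  # records screened to reach 0.95 recall (hypothetical)
records_broad = 1500

added_minutes = records_broad - records_narrow  # 1 record = 1 minute
added_hours = added_minutes / 60
pct_increase = 100 * added_minutes / records_narrow

print(f"added time: {added_hours:.1f} h ({pct_increase:.0f}% increase)")
# -> added time: 5.0 h (25% increase)
```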
Discussion: These findings suggest that the efficiencies associated with machine learning tools do not always offset the extra screening time required by broader searches. In this case study, efficiency losses were most pronounced in non-systematic and non-quantitative reviews.