Automating Causal Claim Extraction and DAG Construction from Social Science Scholarly Papers Using an NLP Pipeline
Causal claims are key to the evolution and consolidation of theoretical frameworks in social science. However, manually identifying causal claims, extracting cause-effect pairs, and synthesizing them into directed acyclic graphs (DAGs) remains a major challenge. The sheer volume of literature, cognitive biases in interpretation, and a lack of transparency and reproducibility make manual synthesis ineffective and unreliable. Overcoming these challenges calls for a new paradigm based on natural language processing (NLP) techniques.
In this talk, I will present our NLP pipeline, which automatically extracts causal claims from social science papers, identifies cause-effect pairs along with their polarity (positive, negative, or neutral), and constructs DAGs representing these relationships. The pipeline combines large language models (LLMs) with other NLP techniques to improve precision in DAG construction. I will explain how the system works and highlight its potential applications to theoretical research and evidence synthesis.
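To make the final stage concrete, here is a minimal sketch of how extracted (cause, effect, polarity) triples might be assembled into a DAG. It is not the pipeline's actual implementation: the function names `build_dag` and `has_path`, the example claims, and the policy of setting aside any edge that would close a cycle are all assumptions made for illustration.

```python
# Illustrative sketch only: names, data, and the cycle policy are hypothetical.
POLARITIES = {"positive", "negative", "neutral"}


def has_path(graph, start, goal):
    """Return True if `goal` is reachable from `start` (iterative depth-first search)."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, {}))
    return False


def build_dag(claims):
    """Build {cause: {effect: polarity}} from triples, keeping the graph acyclic.

    An edge cause -> effect is set aside when it is a self-loop or when
    `cause` is already reachable from `effect`, since adding it would
    create a cycle. Set-aside claims are returned for manual review.
    """
    graph, rejected = {}, []
    for cause, effect, polarity in claims:
        if polarity not in POLARITIES:
            raise ValueError(f"unknown polarity: {polarity!r}")
        if cause == effect or has_path(graph, effect, cause):
            rejected.append((cause, effect, polarity))
            continue
        graph.setdefault(cause, {})[effect] = polarity
    return graph, rejected


# Hypothetical claims, as if extracted from three different papers.
example_claims = [
    ("education", "income", "positive"),
    ("income", "health", "positive"),
    ("health", "education", "positive"),  # would close a cycle
]
dag, rejected = build_dag(example_claims)
```

In this sketch the third claim is set aside rather than silently dropped, because conflicting causal directions across papers are themselves informative for evidence synthesis; a real system might resolve them differently.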