An Exploration of LLM-Guided Detection of Discursive Patterns in Dutch Social Media
Abstract
This paper presents a generative large language model (LLM)-guided approach to detect discursive patterns in
Dutch social media. Newsrooms of municipalities and public organizations follow public debate on social media
to be aware of and prepare for local and global issues and rumours. The onset of these issues and rumours is
currently detected by communication specialists in newsrooms. Using discourse analysis, we can ground their findings
in theory. Devices from discursive psychology such as emotional evaluations are the lowest level components
that can help spot and understand issues [1]. Thus, a rule-based NLP approach was developed to highlight these
devices in a learning environment. As a next step, we compare the rule-based approach to a large language
modeling approach in order to assess the risks and benefits of both methods. We analyze the detection of two
discursive patterns in Dutch tweets: magnifying (exaggerated) language use and assigning negative labels to
persons or organizations. We compare the generated responses from two conversational LLMs fine-tuned for
Dutch (Geitje-7B-Ultra and Fietje-2-Chat) and a rule-based NLP baseline, using a two-fold
evaluation process. The results show mixed performance, with the highest performing LLM setups yielding an
accuracy of 64% for the magnifying language category and 73% for the negative labels to organizations/persons
category. In comparison, the rule-based algorithm achieves an accuracy of 68% for both categories. Although the
LLMs perform well in precision, they frequently find patterns in examples where no discursive markers were
annotated. Moreover, the rationale analysis shows relatively poor results, attributable to multiple factors including
model size and interpretation of instructions. The results indicate that although there is merit in conducting
discursive analysis using generative language models, it carries the risks outlined above. Recommendations for future
work include combining the usage of language models with the rule-based setup for more robust detection, as
well as further refinement of the guidelines to improve the reasoning process.
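To illustrate the kind of rule-based baseline the abstract describes, the sketch below flags the two discursive devices (magnifying language and negative labels for persons/organizations) using small keyword lexicons. The lexicons and function names are hypothetical illustrations, not the authors' actual rules or vocabulary.

```python
import re

# Hypothetical mini-lexicons for the two discursive devices studied in the
# paper; the real rule-based system is more elaborate than this sketch.
MAGNIFIERS = {"enorm", "gigantisch", "extreem", "compleet", "totaal"}
NEGATIVE_LABELS = {"oplichters", "leugenaars", "tuig", "zakkenvullers"}

def detect_devices(tweet: str) -> dict:
    """Return the lexicon tokens that each discursive device matches in a tweet."""
    tokens = set(re.findall(r"\w+", tweet.lower(), flags=re.UNICODE))
    return {
        "magnifying": sorted(tokens & MAGNIFIERS),
        "negative_label": sorted(tokens & NEGATIVE_LABELS),
    }

print(detect_devices("Die zakkenvullers hebben het weer compleet verpest!"))
```

A lexicon lookup like this is transparent and easy to audit, which is the trade-off the paper weighs against the broader but less predictable coverage of generative LLMs.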
Organization | HAN University of Applied Sciences |
Department | Academie IT en Mediadesign |
Research groups | |
Research group | Data & Knowledge Engineering |
Date | 2024-11-26 |
Type | Conference contribution |
Language | Unknown |