The largest knowledge base of Dutch higher professional education (HBO)

Inspiration in your field

Freely accessible

Conceptualization of an authorship attribution pipeline for blog articles

Rights: All rights reserved

Abstract

The question of which author wrote a given text has existed ever since people began putting their ideas and thoughts into writing. Recent advances in statistics and the capabilities of powerful machine learning algorithms have shaped a research field known as authorship attribution. The field matters for many applications in the humanities, journalism, and law; for instance, authorship attribution can be used to detect fraudulent product reviews on popular online platforms.
As an innovator in agile software development, codecentric AG is continually interested in state-of-the-art technologies and best practices. This bachelor thesis conceptualizes an automated authorship attribution pipeline intended to contribute to codecentric AG's product portfolio.
Through a literature review, the project systematically analyzes the main components of machine-learning-based authorship attribution. It also defines comparison criteria for assessing how well machine learning models identify the author of a text. Two attribution approaches and a set of different stylistic markers are then compared empirically in experiments on a real-world dataset of blog articles.
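The abstract does not say which two attribution approaches were compared. Purely as an illustration, the Python sketch below shows one simple profile-based approach: each author's known texts are merged into a single character-trigram frequency profile, and an unknown text is attributed to the author whose profile is most similar by cosine similarity. The function names, the trigram size, and the toy training data are assumptions for this sketch, not the thesis's implementation.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in a lower-cased text (assumed marker)."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse frequency vectors."""
    # Counter returns 0 for missing keys, so absent n-grams contribute nothing.
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_profiles(training: dict[str, list[str]], n: int = 3) -> dict[str, Counter]:
    """Merge all known texts of each author into one n-gram profile."""
    profiles: dict[str, Counter] = {}
    for author, texts in training.items():
        profile: Counter = Counter()
        for text in texts:
            profile.update(char_ngrams(text, n))
        profiles[author] = profile
    return profiles


def attribute(text: str, profiles: dict[str, Counter], n: int = 3) -> str:
    """Return the candidate author whose profile is most similar to the text."""
    query = char_ngrams(text, n)
    return max(profiles, key=lambda author: cosine(query, profiles[author]))


if __name__ == "__main__":
    # Hypothetical toy data standing in for labelled blog articles.
    training = {
        "alice": ["kubernetes clusters and kubernetes deployments"],
        "bob": ["unit tests with mocks and more unit tests"],
    }
    profiles = build_profiles(training)
    print(attribute("deploying kubernetes clusters", profiles))
```

An instance-based alternative would instead keep one feature vector per text and train a classifier on them; which of these families the thesis actually evaluated is not stated in the abstract.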
Finally, the insights from the literature review and the experiments are integrated into a reusable authorship attribution library. The project specifies the library's functional and non-functional requirements, and it discusses several design issues affecting reusability and extensibility, resolving them with established software engineering design patterns. A prototype of the library is implemented that unifies modern natural language processing technologies from the Python ecosystem.
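The abstract states that design patterns were used to make the library reusable and extensible, but it does not name them. One plausible way to make stylistic markers interchangeable is the Strategy pattern, where every marker implements a common interface and a pipeline composes any number of them. The sketch below assumes that design; all class names and the three example markers are hypothetical, and the word regex is a deliberately simplified ASCII-only tokenizer.

```python
import re
from abc import ABC, abstractmethod

# Simplified tokenizer (assumption): ASCII letters and apostrophes only.
WORD = re.compile(r"[A-Za-z']+")


class StyleMarker(ABC):
    """Hypothetical interface: a stylistic marker maps a text to named features."""

    @abstractmethod
    def extract(self, text: str) -> dict[str, float]:
        ...


class AverageWordLength(StyleMarker):
    """Mean number of characters per word."""

    def extract(self, text: str) -> dict[str, float]:
        words = WORD.findall(text)
        avg = sum(len(w) for w in words) / len(words) if words else 0.0
        return {"avg_word_length": avg}


class FunctionWordRates(StyleMarker):
    """Relative frequency of each word in a fixed function-word list."""

    def __init__(self, function_words=("the", "and", "of", "to", "a", "in")):
        self.function_words = tuple(function_words)

    def extract(self, text: str) -> dict[str, float]:
        words = [w.lower() for w in WORD.findall(text)]
        total = len(words) or 1  # avoid division by zero on empty input
        return {f"fw:{fw}": words.count(fw) / total for fw in self.function_words}


class PunctuationRate(StyleMarker):
    """Share of characters that are common punctuation marks."""

    def extract(self, text: str) -> dict[str, float]:
        marks = sum(text.count(p) for p in ".,;:!?")
        return {"punct_rate": marks / len(text) if text else 0.0}


class FeaturePipeline:
    """Composes interchangeable markers into one feature dictionary."""

    def __init__(self, markers: list[StyleMarker]):
        self.markers = list(markers)

    def transform(self, text: str) -> dict[str, float]:
        features: dict[str, float] = {}
        for marker in self.markers:
            features.update(marker.extract(text))
        return features


if __name__ == "__main__":
    pipeline = FeaturePipeline([AverageWordLength(), FunctionWordRates(), PunctuationRate()])
    print(pipeline.transform("The cat and the dog."))
```

Adding a new stylistic marker only requires another `StyleMarker` subclass; `FeaturePipeline` stays unchanged, which is the kind of extensibility such patterns aim for.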

Organization: Fontys Hogescholen
Programme: Software Engineering en Business Informatics
Partners: codecentric AG, Solingen
Date: 2017-06-12
Type: Bachelor thesis
Language: English
