In this chapter, we have presented a first study of authorship attribution using style markers extracted with sequential data mining techniques. We considered linguistically motivated markers obtained by sequential rule mining over function words and part-of-speech tags. To evaluate the effectiveness of these markers, we conducted experiments on a classic French corpus. Our preliminary results show that sequential rules achieve high attribution performance, with F1 scores of up to 93%. However, they still do not outperform low-level features such as function-word frequencies.

Based on the current study, we have identified several directions for future research. First, we will explore whether probabilistic heuristics can find a minimal feature set that still yields good attribution performance, which would be very helpful for stylistic and literary analysis. Second, we will extend this study to include sequential patterns (n-grams with gaps) as style markers. Third, we intend to test this new type of style marker on other languages and text sizes, using standard corpora employed in the wider field.
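To make the notion of n-grams with gaps concrete, the following is a minimal sketch that counts ordered patterns of n tokens in which consecutive items may be separated by a bounded number of skipped tokens. It is an illustration only, not the mining procedure used in this study: the function name `gapped_ngrams`, the parameters `n` and `max_gap`, and the toy function-word sequence are all assumptions introduced here.

```python
from collections import Counter


def gapped_ngrams(tokens, n=2, max_gap=1):
    """Count ordered n-token patterns whose consecutive items are
    separated by at most `max_gap` skipped tokens (hypothetical helper)."""
    counts = Counter()

    def extend(indices):
        # A complete pattern: record the tokens at the chosen positions.
        if len(indices) == n:
            counts[tuple(tokens[i] for i in indices)] += 1
            return
        last = indices[-1]
        # The next item may sit 1 to (max_gap + 1) positions after the last one.
        for nxt in range(last + 1, min(last + 2 + max_gap, len(tokens))):
            extend(indices + [nxt])

    for start in range(len(tokens)):
        extend([start])
    return counts


# Toy sequence of French function words (illustrative data, not from the corpus).
sequence = ["le", "de", "la", "de"]
patterns = gapped_ngrams(sequence, n=2, max_gap=1)
```

With `max_gap=0` the function reduces to ordinary contiguous n-grams, so the gap bound is the only parameter that separates the two feature types in this sketch.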
