Title
CLiPS Stylometry Investigation (CSI) corpus: a Dutch corpus for the detection of age, gender, personality, sentiment and deception in text
Author
Faculty/Department
Faculty of Arts. Linguistics and Literature
Publication type
conferenceObject
Publication
Paris: European Language Resources Association (ELRA), 2014
Subject
Linguistics
Source (journal)
LREC 2014 - Ninth International Conference on Language Resources and Evaluation
Source (book)
9th International Conference on Language Resources and Evaluation (LREC), May 26-31, 2014, Reykjavik, Iceland
Volume/pages
(2014), p. 3081-3085
ISBN
978-2-9517408-8-4
ISI
000355611004115
Carrier
E
Target language
English (eng)
Affiliation
University of Antwerp
Abstract
We present the CLiPS Stylometry Investigation (CSI) corpus, a new Dutch corpus containing reviews and essays written by university students. It is designed to serve multiple purposes: detection of age, gender, authorship, personality, sentiment, deception, topic and genre. Another major advantage is its planned yearly expansion with each year's new students. The corpus currently contains about 305,000 tokens spread over 749 documents. The average review length is 128 tokens; the average essay length is 1126 tokens. The corpus will be made available on the CLiPS website (www.clips.uantwerpen.be/datasets) and can freely be used for academic research purposes. An initial deception detection experiment was performed on this data. Deception detection is the task of automatically classifying a text as being either truthful or deceptive, in our case by examining the writing style of the author. This task has never been investigated for Dutch before. We performed a supervised machine learning experiment using the SVM algorithm in a 10-fold cross-validation setup. The only features were the token unigrams present in the training data. Using this simple method, we reached a state-of-the-art F-score of 72.2%.
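The experimental setup described in the abstract (token unigram features, a linear SVM, 10-fold cross-validation, F-score) can be illustrated with a minimal sketch. This is not the authors' implementation: the whitespace tokenizer, the Pegasos-style training loop, the hyperparameters, the choice to score the deceptive class only, and the toy Dutch texts are all assumptions made for illustration, and none of the data comes from the CSI corpus.

```python
import random
from collections import Counter

def unigrams(text):
    """Token unigram counts; whitespace tokenization is an assumption."""
    return Counter(text.lower().split())

def train_linear_svm(X, y, epochs=20, lam=0.01, seed=0):
    """Pegasos-style subgradient descent on the hinge loss (labels are -1/+1).
    A stand-in for whatever SVM implementation the paper actually used."""
    rng = random.Random(seed)
    w, t = {}, 0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(w.get(f, 0.0) * v for f, v in X[i].items())
            scale = 1.0 - eta * lam          # L2 regularization shrink step
            for f in w:
                w[f] *= scale
            if margin < 1:                   # hinge loss is active: move toward x
                for f, v in X[i].items():
                    w[f] = w.get(f, 0.0) + eta * y[i] * v
    return w

def predict(w, x):
    """Tokens unseen in training have no weight, so only training unigrams count."""
    return 1 if sum(w.get(f, 0.0) * v for f, v in x.items()) >= 0 else -1

def f1(gold, pred, positive=1):
    """F-score for one class; the abstract does not state the averaging scheme."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def cross_validate(texts, labels, k=10, seed=0):
    """Pool predictions over k held-out folds, then compute one F-score."""
    order = list(range(len(texts)))
    random.Random(seed).shuffle(order)
    gold, pred = [], []
    for fold in range(k):
        test_idx = set(order[fold::k])
        train_idx = [i for i in order if i not in test_idx]
        w = train_linear_svm([unigrams(texts[i]) for i in train_idx],
                             [labels[i] for i in train_idx])
        for i in sorted(test_idx):
            gold.append(labels[i])
            pred.append(predict(w, unigrams(texts[i])))
    return f1(gold, pred)

# Hypothetical toy reviews (invented, NOT from the CSI corpus):
# -1 = truthful, +1 = deceptive.
truthful = [f"het eten was goed en de bediening was vriendelijk bezoek {i}"
            for i in range(10)]
deceptive = [f"absoluut fantastisch restaurant echt geweldig aanrader bezoek {i}"
             for i in range(10)]
texts = truthful + deceptive
labels = [-1] * 10 + [1] * 10

score = cross_validate(texts, labels)
print(f"F-score on toy data: {score:.3f}")
```

Because the weights are stored per token and learned only from the training folds, any token that occurs solely in a held-out document contributes nothing to the prediction, which mirrors the abstract's restriction to unigrams present in the training data.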
E-info
http://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcApp=PARTNER_APP&SrcAuth=LinksAMR&KeyUT=WOS:000355611004115&DestLinkType=RelatedRecords&DestApp=ALL_WOS&UsrCustomerID=ef845e08c439e550330acc77c7d2d848
http://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcApp=PARTNER_APP&SrcAuth=LinksAMR&KeyUT=WOS:000355611004115&DestLinkType=FullRecord&DestApp=ALL_WOS&UsrCustomerID=ef845e08c439e550330acc77c7d2d848
Full text (open access)
https://repository.uantwerpen.be/docman/irua/bb1be2/127791.pdf
Handle