Title: Data-Driven vs. Dictionary-Based Word n-Gram Feature Induction for Sentiment Analysis
Authors: Remus, Robert/Rill, Sven
Year: 2013
Type: Article
Publisher: Springer
Place: Heidelberg/Berlin
In: Gurevych, Iryna/Biemann, Chris/Zesch, Torsten: Language Processing and Knowledge in the Web. Proceedings of the 25th International Conference, GSCL 2013, Darmstadt, Germany, 25-27 September 2013
Pages: 176-183
Languages studied: German, English
Keywords: usage research
data modelling
specialised lexicography/LSP lexicography
Abstract: We address the question of which word n-gram feature induction approach yields the most accurate discriminative model for machine learning-based sentiment analysis within a specific domain: a purely data-driven word n-gram feature induction, or a word n-gram feature induction based on a domain-specific or domain-non-specific polarity dictionary. We evaluate both approaches in document-level polarity classification experiments in 2 languages, English and German, for 4 analogous domains each: user-written product reviews of books, DVDs, electronics and music. We conclude that while dictionary-based feature induction leads to large dimensionality reductions, purely data-driven feature induction yields more accurate discriminative models.