Title: |
Data-Driven vs. Dictionary-Based Word n-Gram Feature Induction for Sentiment Analysis |
Authors: | Remus, Robert/Rill, Sven |
Year: |
2013 |
Type: |
Article |
Publisher: |
Springer |
Place of publication: |
Heidelberg/Berlin |
In: |
Gurevych, Iryna/Biemann, Chris/Zesch, Torsten: Language Processing and Knowledge in the Web. Proceedings of the 25th International Conference, GSCL 2013, Darmstadt, Germany, 25 - 27 September 2013 |
Pages: |
176-183 |
Languages studied: |
German - English |
Keywords: |
usage research
data modelling
specialised lexicography/LSP lexicography
|
Abstract: |
We address the question of which word n-gram feature induction approach yields the most accurate discriminative model for machine learning-based sentiment analysis within a specific domain: purely data-driven word n-gram feature induction, or word n-gram feature induction based on a domain-specific or domain-non-specific polarity dictionary. We evaluate both approaches in document-level polarity classification experiments in two languages, English and German, for four analogous domains each: user-written product reviews of books, DVDs, electronics and music. We conclude that while dictionary-based feature induction leads to large dimensionality reductions, purely data-driven feature induction yields more accurate discriminative models. |
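
The contrast drawn in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the n-gram lengths, the feature weighting, or the classifier, so the choice of unigrams plus bigrams, binary presence features, the toy documents, the toy polarity dictionary, and all function names below are assumptions made for illustration. In particular, treating dictionary-based induction as "keep only observed n-grams that appear in the dictionary" is one plausible reading.

```python
from itertools import chain


def word_ngrams(tokens, n_max=2):
    """Return all word n-grams of length 1..n_max as space-joined strings."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def induce_data_driven(documents, n_max=2):
    """Data-driven induction: every n-gram observed in the data becomes a feature."""
    grams = chain.from_iterable(word_ngrams(d.split(), n_max) for d in documents)
    return sorted(set(grams))


def induce_dictionary_based(documents, polarity_dictionary, n_max=2):
    """Dictionary-based induction (assumed reading): keep only observed
    n-grams that are also entries of the polarity dictionary."""
    return [g for g in induce_data_driven(documents, n_max)
            if g in polarity_dictionary]


def vectorize(document, features, n_max=2):
    """Binary presence vector of a document over a fixed feature list."""
    present = set(word_ngrams(document.split(), n_max))
    return [1 if f in present else 0 for f in features]


# Hypothetical toy data, not taken from the paper.
docs = ["a great book", "not good at all"]
toy_dictionary = {"great", "good", "not good"}

data_driven = induce_data_driven(docs)
dictionary_based = induce_dictionary_based(docs, toy_dictionary)

print(len(data_driven), len(dictionary_based))
print(vectorize("not good at all", dictionary_based))
```

On this toy input the dictionary-based feature space is much smaller than the data-driven one, which mirrors the dimensionality reduction the abstract mentions; the sketch says nothing about the accuracy comparison, which depends on the classifier and data used in the paper.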