Title:	Data-Driven vs. Dictionary-Based Word n-Gram Feature Induction for Sentiment Analysis
Authors:	Remus, Robert/Rill, Sven
Year:	2013
Type:	Article
Publisher:	Springer
Place:	Heidelberg/Berlin
In:	Gurevych, Iryna/Biemann, Chris/Zesch, Torsten: Language Processing and Knowledge in the Web. Proceedings of the 25th International Conference, GSCL 2013, Darmstadt, Germany, 25 - 27 September 2013
Pages:	176-183
Languages studied:	German - English
Keywords:	usage research; data modelling; specialised lexicography/LSP lexicography
Abstract:	We address the question of which word n-gram feature induction approach yields the most accurate discriminative model for machine learning-based sentiment analysis within a specific domain: a purely data-driven word n-gram feature induction or a word n-gram feature induction based on a domain-specific or domain-non-specific polarity dictionary. We evaluate both approaches in document-level polarity classification experiments in 2 languages, English and German, for 4 analogous domains each: user-written product reviews on books, DVDs, electronics and music. We conclude that while dictionary-based feature induction leads to large dimensionality reductions, purely data-driven feature induction yields more accurate discriminative models.