
Constructing Maximum Entropy Language Models for Movie Review Subjectivity Analysis

Bo Chen, Hui He, and Jun Guo   

  1. Pattern Recognition and Intelligent System Laboratory, School of Information Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Revised: 2007-09-23  Online: 2008-03-15  Published: 2008-03-10

Document subjectivity analysis has become an important aspect of web text content mining. The problem resembles traditional text categorization, so many established classification techniques can be adapted to it. One significant difference, however, is that estimating the subjectivity of a document well requires more language or semantic information. This paper therefore focuses on two questions: how to extract useful and meaningful language features, and how to construct appropriate language models efficiently for this task. For the first, we apply a Global-Filtering and Local-Weighting strategy to select and evaluate language features drawn from n-grams of different orders and within various distance windows. For the second, we adopt Maximum Entropy (MaxEnt) modeling to construct our language model framework. Besides the classical MaxEnt models, we construct two kinds of improved models, with Gaussian and exponential priors respectively. Detailed experiments show that, with well-selected and well-weighted language features, MaxEnt models with exponential priors are significantly more suitable for the text subjectivity analysis task.
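
To make the difference between the classical and the prior-smoothed MaxEnt models concrete, the following is a minimal sketch and not the authors' implementation. It assumes the commonly used formulations: a Gaussian prior subtracts lambda^2 / (2 * sigma^2) per weight from the log-likelihood, and an exponential prior subtracts alpha * lambda per weight while constraining every weight to be non-negative. The names `predict`, `train`, `strength`, and `toy_data`, the two-feature toy data, and the plain projected gradient ascent trainer are all illustrative assumptions.

```python
import math


def predict(weights, x):
    """Return p(y | x) for every class y under a conditional MaxEnt model."""
    scores = [sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in weights]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def train(data, n_classes, prior="none", strength=1.0, lr=0.1, epochs=200):
    """Fit per-class weights by (projected) gradient ascent.

    prior="gaussian":    strength is the variance sigma^2 (assumed form).
    prior="exponential": strength is the rate alpha, weights kept >= 0
                         (assumed form).
    """
    n_feats = len(data[0][0])
    weights = [[0.0] * n_feats for _ in range(n_classes)]
    for _ in range(epochs):
        # Log-likelihood gradient: observed minus expected feature counts.
        grad = [[0.0] * n_feats for _ in range(n_classes)]
        for x, y in data:
            probs = predict(weights, x)
            for c in range(n_classes):
                err = (1.0 if c == y else 0.0) - probs[c]
                for j, x_j in enumerate(x):
                    grad[c][j] += err * x_j
        for c in range(n_classes):
            for j in range(n_feats):
                g = grad[c][j]
                if prior == "gaussian":
                    g -= weights[c][j] / strength  # d/dw of -w^2 / (2 sigma^2)
                elif prior == "exponential":
                    g -= strength  # d/dw of -alpha * w
                w = weights[c][j] + lr * g
                if prior == "exponential":
                    w = max(0.0, w)  # project back onto w >= 0
                weights[c][j] = w
    return weights


# Hypothetical toy data: x = [subjective-cue count, objective-cue count],
# y = 1 for a subjective document and 0 for an objective one.
toy_data = [([1, 0], 1), ([2, 0], 1), ([0, 1], 0), ([0, 2], 0)]

if __name__ == "__main__":
    for name in ("none", "gaussian", "exponential"):
        w = train(toy_data, n_classes=2, prior=name, strength=0.5)
        print(name, [[round(v, 3) for v in row] for row in w])
```

Under these assumptions, the constant penalty of the exponential prior together with the non-negativity constraint pushes weakly supported weights to exactly zero, whereas the Gaussian prior only shrinks them towards zero.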

Key words: protocols; self fault-tolerance; formal method; multimedia communications; protocol engineering; S-T protocol; semantics;

Journal of Computer Science and Technology