Bo Chen, Hui He, Jun Guo. Constructing Maximum Entropy Language Models for Movie Review Subjectivity Analysis[J]. Journal of Computer Science and Technology, 2008, 23(2): 231-239.

Constructing Maximum Entropy Language Models for Movie Review Subjectivity Analysis

More Information
  • Revised Date: September 22, 2007
  • Published Date: March 14, 2008
  • Document subjectivity analysis has become an important aspect of web text content mining. The problem is similar to traditional text categorization, so many related classification techniques can be adapted to it. There is one significant difference, however: more language or semantic information is required to better estimate the subjectivity of a document. In this paper we therefore focus mainly on two aspects. One is how to extract useful and meaningful language features, and the other is how to construct appropriate language models efficiently for this special task. For the first issue, we apply a Global-Filtering and Local-Weighting strategy to select and evaluate language features in a series of n-grams with different orders and within various distance-windows. For the second issue, we adopt Maximum Entropy (MaxEnt) modeling methods to construct our language model framework. Besides the classical MaxEnt models, we have also constructed two kinds of improved models, with Gaussian and exponential priors respectively. Detailed experiments given in this paper show that, with well-selected and well-weighted language features, MaxEnt models with exponential priors are significantly more suitable for the text subjectivity analysis task.
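
For orientation, the three model variants named in the abstract can be written as regularized log-likelihood objectives. The following is a sketch of the standard formulations from the maximum entropy literature, not the paper's own equations. The symbols $f_i$, $\lambda_i$, $\sigma_i$, and $\alpha_i$ are notation assumed here, and the abstract does not state which exact variants or hyperparameter settings the authors used.

```latex
% Conditional MaxEnt model over class y (e.g. subjective / objective) given document x;
% f_i are feature functions and \lambda_i their weights (assumed notation).
p_\lambda(y \mid x) = \frac{\exp\big(\sum_i \lambda_i f_i(x, y)\big)}
                           {\sum_{y'} \exp\big(\sum_i \lambda_i f_i(x, y')\big)}

% Classical MaxEnt: maximize the conditional log-likelihood of the training pairs (x_j, y_j).
L(\lambda) = \sum_j \log p_\lambda(y_j \mid x_j)

% Gaussian prior (zero mean, variance \sigma_i^2): quadratic penalty on each weight.
L_{\mathrm{G}}(\lambda) = L(\lambda) - \sum_i \frac{\lambda_i^2}{2\sigma_i^2}

% Exponential prior (rate \alpha_i > 0): linear penalty with non-negative weights.
L_{\mathrm{E}}(\lambda) = L(\lambda) - \sum_i \alpha_i \lambda_i, \qquad \lambda_i \ge 0
```

Under these assumptions, the Gaussian prior shrinks all weights smoothly toward zero, whereas the exponential prior applies a constant pull that can drive weights exactly to zero, which tends to produce sparser models.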