[1] Arasu A, Ganti V, Kaushik R. Efficient exact set-similarity joins. In Proc. the 32nd VLDB, September 2006, pp.918-929.[2] Hadjieleftheriou M, Yu X, Koudas N, Srivastava D. Hashed samples:Selectivity estimators for set similarity selection queries. PVLDB, 2008, 1(1):201-212.[3] Lee H, Ng R T, Shim K. Power-law based estimation of set similarity join size. PVLDB, 2009, 2(1):658-669.[4] White R W, Jose J M. A study of topic similarity measures. In Proc. the 27th SIGIR, July 2004, pp.520-521.[5] Zhu X, Song S, Lian X, Wang J, Zou L. Matching heterogeneous event data. In Proc. SIGMOD, June 2014, pp.1211-1222.[6] Zhu X, Song S, Wang J, Yu P S, Sun J. Matching heterogeneous events with patterns. In Proc. the 30th ICDE, March 31-April 4, 2014, pp.376-387.[7] Wang J, Song S, Zhu X, Lin X. Efficient recovery of missing events. PVLDB, 2013, 6(10):841-852.[8] Wang J, Song S, Lin X, Zhu X, Pei J. Cleaning structured event logs:A graph repair approach. In Proc. the 31st ICDE, April 2015, pp.30-41.[9] Song S, Chen L. Similarity joins of text with incomplete information formats. In Proc. the 12th DASFAA, April 2007, pp.313-324.[10] Chaudhuri S, Ganti V, Kaushik R. A primitive operator for similarity joins in data cleaning. In Proc. the 22nd ICDE, April 2006, p.5.[11] Beckmann J L, Halverson A, Krishnamurthy R, Naughton J F. Extending RDBMSs to support sparse datasets using an interpreted attribute storage format. In Proc. the 22nd ICDE, April 2006, p.58.[12] Jain A, Doan A, Gravano L. SQL queries over unstructured text databases. In Proc. the 23rd ICDE, April 2007, pp.1255-1257.[13] Dong X, Halevy A Y. Indexing dataspaces. In Proc. SIGMOD, June 2007, pp.43-54.[14] Song S, Chen L, Yuan M. Materialization and decomposition of dataspaces for efficient search. IEEE Trans. Knowl. Data Eng., 2011, 23(12):1872-1887.[15] Song S, Chen L, Yu P S. On data dependencies in dataspaces. In Proc. the 27th ICDE, April 2011, pp.470-481.[16] Dong X, Halevy A Y, Madhavan J, Nemes E, Zhang J. Similarity search for web services. In Proc. the 30th VLDB, August 29-September 3, 2004, pp.372-383.[17] Song S, Chen L. Probabilistic correlation-based similarity measure of unstructured records. In Proc. the 16th CIKM, November 2007, pp.967-970.[18] Song S, Zhu H, Chen L. Probabilistic correlation-based similarity measure on text records. Inf. Sci., 2014, 289:8-24.[19] Sahami M, Heilman T D. A web-based kernel function for measuring the similarity of short text snippets. In Proc. the 15th WWW, May 2006, pp.377-386.[20] Liu S, Liu F, Yu C, Meng W. An effective approach to document retrieval via utilizing WordNet and recognizing phrases. In Proc. the 27th SIGIR, July 2004, pp.266-272.[21] Jin R, Chai J Y, Si L. Learn to weight terms in information retrieval using category information. In Proc. the 22nd ICML, August 2005, pp.353-360.[22] Xiong H, Shekhar S, Tan P N, Kumar V. Exploiting a support-based upper bound of Pearson's correlation coefficient for efficiently identifying strongly correlated pairs. In Proc. the 10th KDD, August 2004, pp.334-343.[23] Song S, Chen L. Efficient set-correlation operator inside databases. In Proc. CIKM, October 2010, pp.139-148.[24] Gravano L, Ipeirotis P G, Jagadish H V, Koudas N, Muthukrishnan S, Srivastava D. Approximate string joins in a database (almost) for free. In Proc. the 27th VLDB, September 2001, pp.491-500.[25] Cohen W W. Integration of heterogeneous databases without common domains using queries based on textual similarity. In Proc. SIGMOD, June 1998, pp.201-212.[26] Gravano L, Ipeirotis P G, Koudas N, Srivastava D. Text joins in an RDBMS for web data integration. In Proc. the 12th WWW, May 2003, pp.90-101.[27] Salton G. Automatic Text Processing:The Transformation, Analysis, and Retrieval of Information by Computer. AddisonWesley, 1989.[28] Bilenko M, Mooney R J. Adaptive duplicate detection using learnable string similarity measures. In Proc. the 9th KDD, August 2003, pp.39-48.[29] Sarawagi S, Bhamidipaty A. Interactive deduplication using active learning. In Proc. the 8th KDD, July 2002, pp.269-278.[30] Hofmann T. Probabilistic latent semantic analysis. In Proc. UAI, July 1999, pp.289-296.[31] Hofmann T. Probabilistic latent semantic indexing. In Proc. the 22nd SIGIR, August 1999, pp.50-57.[32] Deerwester S C, Dumais S T, Landauer T K, Furnas G W, Harshman R A. Indexing by latent semantic analysis. JASIS, 1990, 41(6):391-407.[33] Brin S, Motwani R, Silverstein C. Beyond market baskets:Generalizing association rules to correlations. In Proc. SIGMOD, May 1997, pp.265-276.[34] Jermaine C. The computational complexity of high dimensional correlation search. In Proc. ICDM, November 2001, pp.249-256.[35] Xiong H, Shekhar S, Tan P N, Kumar V. TAPER:A two-step approach for all-strong-pairs correlation query in large databases. IEEE Trans. Knowl. Data Eng., 2006, 18(4):493-508.[36] Sparck Jones K. Index term weighting. Information Storage and Retrieval, 1973, 9(11):619-633.[37] Robertson S. Understanding inverse document frequency:On theoretical argument for IDF. Journal of Documentation, 2004, 60(5):503-520.[38] Chaudhuri S, Das G, Hristidis V, Weikum G. Probabilistic ranking of database query results. In Proc. the 30th VLDB, August 29-September 3, 2004, pp.888-899.[39] Chirita P A, Firan C S, Nejdl W. Personalized query expansion for the web. In Proc. the 30th SIGIR, July 2007, pp.7-14.[40] Theobald M, Schenkel R, Weikum G. Efficient and self-tuning incremental query expansion for top-k query processing. In Proc. the 28th SIGIR, August 2005, pp.242-249.[41] Metzler D, Dumais S T, Meek C. Similarity measures for short segments of text. In Proc. the 29th ECIR, April 2007, pp.16-27.[42] Allan J, Wade C, Bolivar A. Retrieval and novelty detection at the sentence level. In Proc. the 26th SIGIR, August 2003, pp.314-321.[43] Balasubramanian N, Allan J, Croft W B. A comparison of sentence retrieval techniques. In Proc. the 30th SIGIR, July 2007, pp.813-814.[44] Li X, Croft W B. Improving novelty detection for general topics using sentence level information patterns. In Proc. the 15th CIKM, November 2006, pp.238-247.[45] Li X, Croft W B. Novelty detection based on sentence level patterns. In Proc. the 14th CIKM, November 2005, pp.744-751.[46] Murdock V, Croft W B. A translation model for sentence retrieval. In Proc. HLT/EMNLP, October 2005, pp.684-691.[47] Fung P, Yee Lo Y. An IR approach for translating new words from nonparallel, comparable texts. In Proc. the 36th COLING-ACL, August 1998, pp.414-420.[48] Cao G, Nie J Y, Bai J. Integrating word relationships into language models. In Proc. the 28th SIGIR, July 2005, pp.298-305.[49] Arasu A, Ganti V, Kaushik R. Efficient exact set-similarity joins. In Proc. the 32nd VLDB, September 2006, pp.918-929.[50] Lewis D D, Yang Y, Rose T G, Li F. RCV1:A new benchmark collection for text categorization research. Journal of Machine Learning Research, 2004, 5:361-397.[51] Lang K. NewsWeeder:Learning to filter netnews. In Proc. the 12th ICML, June 1995, pp.331-339.[52] Van Rijsbergen C J. Information Retrieval. Butterworth-Heinemann, Newton, MA, USA, 1979. |