SCIE, EI, Scopus, INSPEC, DBLP, CSCD, etc.
Citation: | Wenfei Fan, Jin-Peng Huai. Querying Big Data: Bridging Theory and Practice[J]. Journal of Computer Science and Technology, 2014, 29(5): 849-869. DOI: 10.1007/s11390-014-1473-2 |
[1] |
Abiteboul S, Hull R, Vianu V. Foundations of Databases. Addison-Wesley, 1995.
|
[2] |
Dalvi N N, Machanavajjhala A, Pang B. An analysis of structured data on the Web. PVLDB, 2012, 5(7): 680-691.
|
[3] |
Bienvenu M, ten Cate B, Lutz C, Wolter F. Ontology-based data access: A study through disjunctive datalog, CSP, and MMSNP. In Proc. the 32nd PODS, June 2013, pp.213-224.
|
[4] |
Sellis T K. Personalization in web search and data management. In proc. the 1st Int. Conf. Model and Data Engineering, September 2011, p.1.
|
[5] |
Papadimitriou C H. Computational Complexity. AddisonWesley, 1994.
|
[6] |
Hartmanis J, Stearns R E. On the computational complexity of algorithms. Trans. American Mathematical Society, 1965, 117(5): 285-306.
|
[7] |
Santos G. SSD ranking: The fastest solid state drives. http: // www.fastestssd.com/featured/ssd-rankings-the-fastest-solid-state-drives/, Aug. 2014.
|
[8] |
Fan W, Geerts F, Neven F. Making queries tractable on big data with preprocessing. PVLDB, 2013, 6(9): 685-696.
|
[9] |
Fan W, Geerts F, Libkin L. On scale independence for querying big data. In Proc. the 33rd PODS, June 2014, pp.51-62.
|
[10] |
Fan W, Wang X, Wu Y. Performance guarantees for distributed reachability queries. PVLDB, 2012, 5(11): 1304-1315.
|
[11] |
Fan W, Li J, Wang X, Wu Y. Query preserving graph compression. In Proc. ACM SIGMOD, May 2012, pp.157-168.
|
[12] |
Fan W, Wang X, Wu Y. Answering graph pattern queries using views. In Proc. the 30th ICDE, March 31-April 4, 2014 pp.184-195.
|
[13] |
Fan W, Wang X, Wu Y. Incremental graph pattern matching. ACM Trans. Database Systems, 2013, 38(3): Article No. 18.
|
[14] |
Ramalingam G, Reps T. On the computational complexity of dynamic graph problems. TCS, 1996, 158(1/2): 233-277.
|
[15] |
Fan W, Li J, Ma S, Tang N, Wu Y. Adding regular expressions to graph reachability and pattern queries. In Proc. the 27th ICDE, April 2011, pp.39-50.
|
[16] |
Fan W, Li J, Ma S, Tang N, Wu Y, Wu Y. Graph pattern matching: From intractability to polynomial time. PVLDB, 2010, 3(1): 264-275.
|
[17] |
Ma S, Cao Y, Fan W, Huai J, Wo T. Strong simulation: Capturing topology in graph pattern matching. ACM Trans. Database Systems, 2014, 39(1): Article No. 4.
|
[18] |
Fan W, Wang X, Wu Y. Querying big graphs within bounded resources. In Proc. ACM SIGMOD, June 2014, pp.301-312.
|
[19] |
Fan W, Geerts F. Foundations of Data Quality Management. Morgan & Claypool Publishers, 2012.
|
[20] |
Fan W, Geerts F, Jia X, Kementsietsidis A. Conditional functional dependencies for capturing data inconsistencies. ACM Trans. Database Systems, 2008, 33(2): Article No. 6. 10 O Google. Knowledge Graph. http://www.google.co.uk/insidesearch/features/search/knowledge.html, Aug. 2014. 11 O Wikipedia. YAGO. http://en.wikipedia.org/wiki/YAGO (database), Aug. 2014. 12 O Wikipedia. Wiki. http://en.wikipedia.org/wiki/Wiki, Aug. 2014. 868 J. Comput. Sci. & Technol., Sept. 2014, Vol.29, No.5
|
[21] |
Cao Y, Fan W, Yu W. Determining the relative accuracy of attributes. In Proc. ACM SIGMOD, June 2013, pp.565-576.
|
[22] |
Fan W, Geerts F. Relative information completeness. ACM Trans. Database Systems, 2010, 35(4): Article No. 27.
|
[23] |
Fan W, Geerts F, Wijsen J. Determining the currency of data. ACM Trans. Database Systems, 2012, 37(4): Article No. 25.
|
[24] |
Fan W, Gao H, Jia X, Li J, Ma S. Dynamic constraints for record matching. VLDB J., 2011, 20(4): 495-520.
|
[25] |
Cong G, Fan W, Geerts F, Jia X, Ma S. Improving data quality: Consistency and accuracy. In Proc. the 33rd VLDB, Sept. 2007, pp.315-326.
|
[26] |
Fan W, Li J, Ma S, Tang N, Yu W. Towards certain fixes with editing rules and master data. VLDB J., 2012, 21(2): 213-238.
|
[27] |
Fan W, Ma S, Tang N, Yu W. Interaction between record matching and data repairing. ACM J. Data and Information Quality, 2014, 4(4): Article No. 16.
|
[28] |
Fan W, Geerts F, Tang N, Yu W. Inferring data currency and consistency for conflict resolution. In Proc. the 29th ICDE, April 2013, pp.470-481.
|
[29] |
Greenlaw R, Hoover H J, Ruzzo W L. Limits to Parallel Computation: P-Completeness Theory. New York, USA: Oxford University Press, 1995.
|
[30] |
Johnson D S. A catalog of complexity classes. In Handbook of Theoretical Computer Science, Volume A: Algorithms and Complexity (A), Cambridge, USA: The MIT Press, 1990, pp.67-161.
|
[31] |
Karloff H J, Suri S, Vassilvitskii S. A model of computation for MapReduce. In Proc. the 21st SODA, Jan. 2010, pp.938-948.
|
[32] |
Dorrigiv R, López-Ortiz A, Salinger A. Optimal speedup on a low-degree multi-core parallel architecture (LoPRAM). In Proc. the 20th SPAA, June 2008, pp.185-187.
|
[33] |
Suciu D, Tannen V. A query language for NC. J. Comput. Syst. Sci., 1997, 55(2): 299-321.
|
[34] |
Hellerstein J M. The declarative imperative: Experiences and conjectures in distributed logic. SIGMOD Record, 2010, 39(1): 5-19.
|
[35] |
Koutris P, Suciu D. Parallel evaluation of conjunctive queries. In Proc. the 30th PODS, June 2011, pp.223-234.
|
[36] |
Afrati F N, Ullman J D. Optimizing joins in a map-reduce environment. In Proc. the 13th EDBT, March 2010, pp.99-110.
|
[37] |
Dean J, Ghemawat S. MapReduce: Simplified data processing on large clusters. Commun. ACM, 2008, 51(1): 107-113.
|
[38] |
Armbrust M, Curtis K, Kraska T, Fox A, Franklin M J, Patterson D A. PIQL: Success-tolerant query processing in the cloud. PVLDB, 2011, 5(3): 181-192.
|
[39] |
Armbrust M, Fox A, Patterson D A, Lanham N, Trushkowsky B, Trutna J, Oh H. SCADS: Scale-independent storage for social computing applications. In Proc. the 4th CIDR, Jan. 2009.
|
[40] |
Armbrust M, Liang E, Kraska T, Fox A, Franklin M J, Patterson D. Generalized scale independence through incremental precomputation. In Proc. ACM SIGMOD, June 2013, pp.625-636.
|
[41] |
Fagin R, Lotem A, Naor M. Optimal aggregation algorithms for middleware. Journal of Computer and System Sciences, 2003, 66(4): 614-656.
|
[42] |
Cao Y, Fan W, Wu T, Yu W. Bounded conjunctive queries. PVLDB, 2014, 7(12): 1231-1242.
|
[43] |
Milner R. Communication and Concurrency. NJ, USA: Prentice Hall, 1989.
|
[44] |
Brynielsson J, Högberg J, Kaati L, Moartenson C, Svenson P. Detecting social positions using simulation. In Proc. ASONAM, August 2010, pp.48-55.
|
[45] |
Cho J, Shivakumar N, Garcia-Molina H. Finding replicated Web collections. SIGMOD Rec., 2000, 29(2): 355-366.
|
[46] |
Nardo L D, Ranzato F, Tapparo F. The subgraph similarity problem. TKDE, 2009, 21(5): 748-749.
|
[47] |
Zou L, Chen L, Özsu M T. Distance-Join: Pattern match query in a large graph database. PVLDB, 2009, 2(1): 886-897.
|
[48] |
Henzinger M R, Henzinger T, Kopke P. Computing simulations on finite and infinite graphs. In Proc. the 36th FOCS, October 1995, pp.453-462.
|
[49] |
Jones N D. An introduction to partial evaluation. ACM Comput. Surv., 1996, 28(3): 480-503.
|
[50] |
Lenzerini M. Data integration: A theoretical perspective. In Proc. the 21st PODS, June 2002, pp.233-246.
|
[51] |
Halevy A Y. Answering queries using views: A survey. VLDB J., 2001, 10(4): 270-294.
|
[52] |
Ntoulas A, Cho J, Olston C. What's new on the Web? The evolution of the Web from a search engine perspective. In Proc. the 13th WWW, May 2004, pp.1-12.
|
[53] |
Fan W, Wang X, Wu Y, Deng D. Distributed graph simulation: Impossibility and possibility. PVLDB, 2014, 7(12): 1083-1094.
|
[54] |
Kruskal C P, Rudolph L, Snir M. A complexity theory of ef-ficient parallel algorithms. TCS, 1990, 71(1): 95-132.
|
[55] |
Tao Y, Lin W, Xiao X. Minimal MapReduce algorithms. In Proc. ACM SIGMOD, June 2013, pp.529-540.
|
[56] |
Bendersky M, Metzler D, Croft W. Learning concept importance using a weighted dependence model. In Proc. the 3rd WSDM, Feb. 2010, pp.31-40.
|
[57] |
Morris M, Teevan J, Panovich K. What do people ask their social networks, and why? A survey study of status message Q&A behavior. In Proc. the 28th CHI, April 2010, pp.1739-1748.
|
[58] |
Fan W, Wang X, Wu Y. Diversified top-k graph pattern matching. PVLDB, 2013, 6(13): 1510-1521.
|
[59] |
Gollapudi S, Sharma A. An axiomatic approach for result diversification. In Proc. the 18th WWW, April 2009, pp.381-390.
|
[60] |
Deng T, Fan W. On the complexity of query result diversifi-cation. PVLDB, 2013, 6(8): 577-588.
|
[61] |
Crescenzi P, Kann V, Halldórsson M. A compendium of NP optimization problems. http://www.nada.kth.se/?viggo/wwwcompendium/, Aug. 2014.
|
[62] |
Vazirani V V. Approximation Algorithms. Springer, 2003.
|
[63] |
Acharya S, Gibbons P B, Poosala V, Ramaswamy S. Join synopses for approximate query answering. In Proc. ACM SIGMOD, June 1999, pp.275-286.
|
[64] |
Babcock B, Chaudhuri S, Das G. Dynamic sample selection for approximate query processing. In Proc. ACM SIGMOD, June 2003, pp.539-550.
|
[65] |
Ester M, Kriegel H P, Sander J, Xu X. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proc. the 2nd KDD, August 1996, pp.226-231.
|
[66] |
Garofalakis M N, Gibbons P B. Wavelet synopses with error guarantees. In Proc. ACM SIGMOD, June 2002, pp.476-487.
|
[67] |
Gibbons P B, Matias Y. Synopsis data structures for massive data sets. In Proc. the 10th SODA, Jan. 1999, pp.909-910.
|
[68] |
Ioannidis Y E, Poosala V. Histogram-based approximation of set-valued query-answers. In Proc. the 25th VLDB, Sept. 1999, pp.174-185.
|
[69] |
Jagadish H V, Koudas N, Muthukrishnan S, Poosala V, Sevcik K C, Suel T. Optimal histograms with quality guarantees. In Proc. the 24th VLDB, August 2009, pp.275-286.
|
[70] |
Kaufman L, Rousseeuw P J. Finding Groups in Data: An Introduction to Cluster Analysis. New York, USA: John Wiley, 1990. Wenfei Fan et al.: Querying Big Data 869
|
[71] |
Rösch P, Lehner W. Sample synopses for approximate answering of group-by queries. In Proc. the 12th EDBT, March 2009, pp.403-414.
|
[72] |
Vitter J S, Wang M. Approximate computation of multidimensional aggregates of sparse data using wavelets. In Proc. ACM SIGMOD, June 1999, pp.193-204.
|
[73] |
Agarwal S, Mozafari B, Panda A, Milner H, Madden S, Stoica I. BlinkDB: Queries with bounded errors and bounded response times on very large data. In Proc. the 8th EuroSys, April 2013, pp.29-42.
|
[74] |
Redman T. The impact of poor data quality on the typical enterprise. Commun. ACM, 1998, 41(2): 79-82.
|
[75] |
Shilakes C C, Tylman J. Enterprise information portals. Technical Report, Merrill Lynch, Inc., 1998.
|
[76] |
Miller Jr D W, Yeast J D, Evans R L. Missing prenatal records at a birth center: A communication problem quantified. In AMIA Annu. Symp. Proc., 2005, pp.535-539.
|
[77] |
Eckerson W W. Data quality and the bottom line: Achieving business success through a commitment to high quality data. Technical Report, The Data Warehousing Institute, 2002.
|
[78] |
Ma S, Fan W, Bravo L. Extending inclusion dependencies with conditions. TCS, 1998, 515: 64-95.
|
[79] |
Herzog T N, Scheuren F J, Winkler W E. Data Quality and Record Linkage Techniques. Springer, 2009.
|
[80] |
Chiang F, Miller R J. Discovering data quality rules. PVLDB, 2008, 1(1): 1166-1177.
|
[81] |
Fan W, Geerts F, Li J, Xiong M. Discovering conditional functional dependencies. TKDE, 2011, 23(5): 683-698.
|
[82] |
Golab L, Karloff H, Korn F, Srivastava D, Yu B. On generating near-optimal tableaux for conditional functional dependencies. PVLDB, 2008, 1(1): 376-390.
|
[83] |
Arenas M, Bertossi L, Chomicki J. Consistent query answers in inconsistent databases. In Proc. the 18th PODS, May 31-June 2, 1999, pp.68-79.
|
[84] |
Bohannon P, Fan W, Flaster M, Rastogi R. A cost-based model and effective heuristic for repairing constraints by value modification. In Proc. ACM SIGMOD, June 2005, pp.143-154.
|
[85] |
Yakout M, Elmagarmid A K, Neville J, Ouzzani M, Ilyas I F. Guided data repair. PVLDB, 2011, 4(5): 279-289.
|
[86] |
Fan W, Geerts F, Ma S, Müller H. Detecting inconsistencies in distributed data. In Proc. the 26th ICDE, March 2010, pp.64-75.
|
[87] |
Fan W, Li J, Tang N, Yu W. Incremental detection of inconsistencies in distributed data. TKDE, 2014, 26(6): 1367-1383.
|
[88] |
Bleiholder J, Naumann F. Data fusion. ACM Comput. Surv., 2008, 41(1): Article No. 1.
|
[1] | Zhi-Xin Qi, Hong-Zhi Wang, An-Jie Wang. Impacts of Dirty Data on Classification and Clustering Models: An Experimental Evaluation[J]. Journal of Computer Science and Technology, 2021, 36(4): 806-821. DOI: 10.1007/s11390-021-1344-6 |
[2] | An-Zhen Zhang, Jian-Zhong Li, Hong Gao. Interval Estimation for Aggregate Queries on Incomplete Data[J]. Journal of Computer Science and Technology, 2019, 34(6): 1203-1216. DOI: 10.1007/s11390-019-1970-4 |
[3] | Yong-Nan Liu, Jian-Zhong Li, Zhao-Nian Zou. Determining the Real Data Completeness of a Relational Dataset[J]. Journal of Computer Science and Technology, 2016, 31(4): 720-740. DOI: 10.1007/s11390-016-1659-x |
[4] | Hui Li, Di Wu, Gao-Xiang Li, Yi-Hao Ke, Wen-Jie Liu, Yuan-Huan Zheng, Xiao-La Lin. Enhancing Telco Service Quality with Big Data Enabled Churn Analysis: Infrastructure, Model, and Deployment[J]. Journal of Computer Science and Technology, 2015, 30(6): 1201-1214. DOI: 10.1007/s11390-015-1594-2 |
[5] | Ai-Hua Wu, Zi-Jing Tan, Wei Wang. Annotation Based Query Answer over Inconsistent Database[J]. Journal of Computer Science and Technology, 2010, 25(3): 469-481. |
[6] | Taghi M. Khoshgoftaar, Pierre Rebours. Improving Software Quality Prediction by Noise Filtering Techniques[J]. Journal of Computer Science and Technology, 2007, 22(3): 387-396. |
[7] | Chao-Kun Wang, Jian-Min Wang, Jia-Guang Sun, Sheng-Fei Shi, Hong Gao. AbIx: An Approach to Content-Based Approximate Query Processing in Peer-to-Peer Data Systems[J]. Journal of Computer Science and Technology, 2007, 22(2): 280-286. |
[8] | Riadh Robbana. Verification of Duration Systems Using an Approximation Approach[J]. Journal of Computer Science and Technology, 2003, 18(2). |
[9] | YONG Junhai, HU Shimin, SUN Jiaguang. CIM Algorithm for Approximating Three-Dimensional Polygonal Curves[J]. Journal of Computer Science and Technology, 2001, 16(6). |
[10] | Zheng Weidong, Lin Zongkai, Guo Yuchai. A Data Manager for Engineering Applications[J]. Journal of Computer Science and Technology, 1993, 8(4): 21-30. |