[1] Abiteboul S, Hull R, Vianu V. Foundations of Databases. Addison-Wesley, 1995.[2] Dalvi N N, Machanavajjhala A, Pang B. An analysis of structured data on the Web. PVLDB, 2012, 5(7): 680-691.[3] Bienvenu M, ten Cate B, Lutz C, Wolter F. Ontology-based data access: A study through disjunctive datalog, CSP, and MMSNP. In Proc. the 32nd PODS, June 2013, pp.213-224.[4] Sellis T K. Personalization in web search and data management. In proc. the 1st Int. Conf. Model and Data Engineering, September 2011, p.1.[5] Papadimitriou C H. Computational Complexity. AddisonWesley, 1994.[6] Hartmanis J, Stearns R E. On the computational complexity of algorithms. Trans. American Mathematical Society, 1965, 117(5): 285-306.[7] Santos G. SSD ranking: The fastest solid state drives. http: // www.fastestssd.com/featured/ssd-rankings-the-fastest-solid-state-drives/, Aug. 2014.[8] Fan W, Geerts F, Neven F. Making queries tractable on big data with preprocessing. PVLDB, 2013, 6(9): 685-696.[9] Fan W, Geerts F, Libkin L. On scale independence for querying big data. In Proc. the 33rd PODS, June 2014, pp.51-62.[10] Fan W, Wang X, Wu Y. Performance guarantees for distributed reachability queries. PVLDB, 2012, 5(11): 1304-1315.[11] Fan W, Li J, Wang X, Wu Y. Query preserving graph compression. In Proc. ACM SIGMOD, May 2012, pp.157-168.[12] Fan W, Wang X, Wu Y. Answering graph pattern queries using views. In Proc. the 30th ICDE, March 31-April 4, 2014 pp.184-195.[13] Fan W, Wang X, Wu Y. Incremental graph pattern matching. ACM Trans. Database Systems, 2013, 38(3): Article No. 18.[14] Ramalingam G, Reps T. On the computational complexity of dynamic graph problems. TCS, 1996, 158(1/2): 233-277.[15] Fan W, Li J, Ma S, Tang N, Wu Y. Adding regular expressions to graph reachability and pattern queries. In Proc. the 27th ICDE, April 2011, pp.39-50.[16] Fan W, Li J, Ma S, Tang N, Wu Y, Wu Y. Graph pattern matching: From intractability to polynomial time. PVLDB, 2010, 3(1): 264-275.[17] Ma S, Cao Y, Fan W, Huai J, Wo T. Strong simulation: Capturing topology in graph pattern matching. ACM Trans. Database Systems, 2014, 39(1): Article No. 4.[18] Fan W, Wang X, Wu Y. Querying big graphs within bounded resources. In Proc. ACM SIGMOD, June 2014, pp.301-312.[19] Fan W, Geerts F. Foundations of Data Quality Management. Morgan & Claypool Publishers, 2012.[20] Fan W, Geerts F, Jia X, Kementsietsidis A. Conditional functional dependencies for capturing data inconsistencies. ACM Trans. Database Systems, 2008, 33(2): Article No. 6. 10 O Google. Knowledge Graph. http://www.google.co.uk/insidesearch/features/search/knowledge.html, Aug. 2014. 11 O Wikipedia. YAGO. http://en.wikipedia.org/wiki/YAGO (database), Aug. 2014. 12 O Wikipedia. Wiki. http://en.wikipedia.org/wiki/Wiki, Aug. 2014. 868 J. Comput. Sci. & Technol., Sept. 2014, Vol.29, No.5 [21] Cao Y, Fan W, Yu W. Determining the relative accuracy of attributes. In Proc. ACM SIGMOD, June 2013, pp.565-576.[22] Fan W, Geerts F. Relative information completeness. ACM Trans. Database Systems, 2010, 35(4): Article No. 27.[23] Fan W, Geerts F, Wijsen J. Determining the currency of data. ACM Trans. Database Systems, 2012, 37(4): Article No. 25.[24] Fan W, Gao H, Jia X, Li J, Ma S. Dynamic constraints for record matching. VLDB J., 2011, 20(4): 495-520.[25] Cong G, Fan W, Geerts F, Jia X, Ma S. Improving data quality: Consistency and accuracy. In Proc. the 33rd VLDB, Sept. 2007, pp.315-326.[26] Fan W, Li J, Ma S, Tang N, Yu W. Towards certain fixes with editing rules and master data. VLDB J., 2012, 21(2): 213-238.[27] Fan W, Ma S, Tang N, Yu W. Interaction between record matching and data repairing. ACM J. Data and Information Quality, 2014, 4(4): Article No. 16.[28] Fan W, Geerts F, Tang N, Yu W. Inferring data currency and consistency for conflict resolution. In Proc. the 29th ICDE, April 2013, pp.470-481.[29] Greenlaw R, Hoover H J, Ruzzo W L. Limits to Parallel Computation: P-Completeness Theory. New York, USA: Oxford University Press, 1995.[30] Johnson D S. A catalog of complexity classes. In Handbook of Theoretical Computer Science, Volume A: Algorithms and Complexity (A), Cambridge, USA: The MIT Press, 1990, pp.67-161.[31] Karloff H J, Suri S, Vassilvitskii S. A model of computation for MapReduce. In Proc. the 21st SODA, Jan. 2010, pp.938-948.[32] Dorrigiv R, López-Ortiz A, Salinger A. Optimal speedup on a low-degree multi-core parallel architecture (LoPRAM). In Proc. the 20th SPAA, June 2008, pp.185-187.[33] Suciu D, Tannen V. A query language for NC. J. Comput. Syst. Sci., 1997, 55(2): 299-321.[34] Hellerstein J M. The declarative imperative: Experiences and conjectures in distributed logic. SIGMOD Record, 2010, 39(1): 5-19.[35] Koutris P, Suciu D. Parallel evaluation of conjunctive queries. In Proc. the 30th PODS, June 2011, pp.223-234.[36] Afrati F N, Ullman J D. Optimizing joins in a map-reduce environment. In Proc. the 13th EDBT, March 2010, pp.99-110.[37] Dean J, Ghemawat S. MapReduce: Simplified data processing on large clusters. Commun. ACM, 2008, 51(1): 107-113.[38] Armbrust M, Curtis K, Kraska T, Fox A, Franklin M J, Patterson D A. PIQL: Success-tolerant query processing in the cloud. PVLDB, 2011, 5(3): 181-192.[39] Armbrust M, Fox A, Patterson D A, Lanham N, Trushkowsky B, Trutna J, Oh H. SCADS: Scale-independent storage for social computing applications. In Proc. the 4th CIDR, Jan. 2009.[40] Armbrust M, Liang E, Kraska T, Fox A, Franklin M J, Patterson D. Generalized scale independence through incremental precomputation. In Proc. ACM SIGMOD, June 2013, pp.625-636.[41] Fagin R, Lotem A, Naor M. Optimal aggregation algorithms for middleware. Journal of Computer and System Sciences, 2003, 66(4): 614-656.[42] Cao Y, Fan W, Wu T, Yu W. Bounded conjunctive queries. PVLDB, 2014, 7(12): 1231-1242.[43] Milner R. Communication and Concurrency. NJ, USA: Prentice Hall, 1989.[44] Brynielsson J, Högberg J, Kaati L, Moartenson C, Svenson P. Detecting social positions using simulation. In Proc. ASONAM, August 2010, pp.48-55.[45] Cho J, Shivakumar N, Garcia-Molina H. Finding replicated Web collections. SIGMOD Rec., 2000, 29(2): 355-366.[46] Nardo L D, Ranzato F, Tapparo F. The subgraph similarity problem. TKDE, 2009, 21(5): 748-749.[47] Zou L, Chen L, Özsu M T. Distance-Join: Pattern match query in a large graph database. PVLDB, 2009, 2(1): 886-897.[48] Henzinger M R, Henzinger T, Kopke P. Computing simulations on finite and infinite graphs. In Proc. the 36th FOCS, October 1995, pp.453-462.[49] Jones N D. An introduction to partial evaluation. ACM Comput. Surv., 1996, 28(3): 480-503.[50] Lenzerini M. Data integration: A theoretical perspective. In Proc. the 21st PODS, June 2002, pp.233-246.[51] Halevy A Y. Answering queries using views: A survey. VLDB J., 2001, 10(4): 270-294.[52] Ntoulas A, Cho J, Olston C. What's new on the Web? The evolution of the Web from a search engine perspective. In Proc. the 13th WWW, May 2004, pp.1-12.[53] Fan W, Wang X, Wu Y, Deng D. Distributed graph simulation: Impossibility and possibility. PVLDB, 2014, 7(12): 1083-1094.[54] Kruskal C P, Rudolph L, Snir M. A complexity theory of ef-ficient parallel algorithms. TCS, 1990, 71(1): 95-132.[55] Tao Y, Lin W, Xiao X. Minimal MapReduce algorithms. In Proc. ACM SIGMOD, June 2013, pp.529-540.[56] Bendersky M, Metzler D, Croft W. Learning concept importance using a weighted dependence model. In Proc. the 3rd WSDM, Feb. 2010, pp.31-40.[57] Morris M, Teevan J, Panovich K. What do people ask their social networks, and why? A survey study of status message Q&A behavior. In Proc. the 28th CHI, April 2010, pp.1739-1748.[58] Fan W, Wang X, Wu Y. Diversified top-k graph pattern matching. PVLDB, 2013, 6(13): 1510-1521.[59] Gollapudi S, Sharma A. An axiomatic approach for result diversification. In Proc. the 18th WWW, April 2009, pp.381-390.[60] Deng T, Fan W. On the complexity of query result diversifi-cation. PVLDB, 2013, 6(8): 577-588.[61] Crescenzi P, Kann V, Halldórsson M. A compendium of NP optimization problems. http://www.nada.kth.se/?viggo/wwwcompendium/, Aug. 2014.[62] Vazirani V V. Approximation Algorithms. Springer, 2003.[63] Acharya S, Gibbons P B, Poosala V, Ramaswamy S. Join synopses for approximate query answering. In Proc. ACM SIGMOD, June 1999, pp.275-286.[64] Babcock B, Chaudhuri S, Das G. Dynamic sample selection for approximate query processing. In Proc. ACM SIGMOD, June 2003, pp.539-550.[65] Ester M, Kriegel H P, Sander J, Xu X. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proc. the 2nd KDD, August 1996, pp.226-231.[66] Garofalakis M N, Gibbons P B. Wavelet synopses with error guarantees. In Proc. ACM SIGMOD, June 2002, pp.476-487.[67] Gibbons P B, Matias Y. Synopsis data structures for massive data sets. In Proc. the 10th SODA, Jan. 1999, pp.909-910.[68] Ioannidis Y E, Poosala V. Histogram-based approximation of set-valued query-answers. In Proc. the 25th VLDB, Sept. 1999, pp.174-185.[69] Jagadish H V, Koudas N, Muthukrishnan S, Poosala V, Sevcik K C, Suel T. Optimal histograms with quality guarantees. In Proc. the 24th VLDB, August 2009, pp.275-286.[70] Kaufman L, Rousseeuw P J. Finding Groups in Data: An Introduction to Cluster Analysis. New York, USA: John Wiley, 1990. Wenfei Fan et al.: Querying Big Data 869 [71] Rösch P, Lehner W. Sample synopses for approximate answering of group-by queries. In Proc. the 12th EDBT, March 2009, pp.403-414.[72] Vitter J S, Wang M. Approximate computation of multidimensional aggregates of sparse data using wavelets. In Proc. ACM SIGMOD, June 1999, pp.193-204.[73] Agarwal S, Mozafari B, Panda A, Milner H, Madden S, Stoica I. BlinkDB: Queries with bounded errors and bounded response times on very large data. In Proc. the 8th EuroSys, April 2013, pp.29-42.[74] Redman T. The impact of poor data quality on the typical enterprise. Commun. ACM, 1998, 41(2): 79-82.[75] Shilakes C C, Tylman J. Enterprise information portals. Technical Report, Merrill Lynch, Inc., 1998.[76] Miller Jr D W, Yeast J D, Evans R L. Missing prenatal records at a birth center: A communication problem quantified. In AMIA Annu. Symp. Proc., 2005, pp.535-539.[77] Eckerson W W. Data quality and the bottom line: Achieving business success through a commitment to high quality data. Technical Report, The Data Warehousing Institute, 2002.[78] Ma S, Fan W, Bravo L. Extending inclusion dependencies with conditions. TCS, 1998, 515: 64-95.[79] Herzog T N, Scheuren F J, Winkler W E. Data Quality and Record Linkage Techniques. Springer, 2009.[80] Chiang F, Miller R J. Discovering data quality rules. PVLDB, 2008, 1(1): 1166-1177.[81] Fan W, Geerts F, Li J, Xiong M. Discovering conditional functional dependencies. TKDE, 2011, 23(5): 683-698.[82] Golab L, Karloff H, Korn F, Srivastava D, Yu B. On generating near-optimal tableaux for conditional functional dependencies. PVLDB, 2008, 1(1): 376-390.[83] Arenas M, Bertossi L, Chomicki J. Consistent query answers in inconsistent databases. In Proc. the 18th PODS, May 31-June 2, 1999, pp.68-79.[84] Bohannon P, Fan W, Flaster M, Rastogi R. A cost-based model and effective heuristic for repairing constraints by value modification. In Proc. ACM SIGMOD, June 2005, pp.143-154.[85] Yakout M, Elmagarmid A K, Neville J, Ouzzani M, Ilyas I F. Guided data repair. PVLDB, 2011, 4(5): 279-289.[86] Fan W, Geerts F, Ma S, Müller H. Detecting inconsistencies in distributed data. In Proc. the 26th ICDE, March 2010, pp.64-75.[87] Fan W, Li J, Tang N, Yu W. Incremental detection of inconsistencies in distributed data. TKDE, 2014, 26(6): 1367-1383.[88] Bleiholder J, Naumann F. Data fusion. ACM Comput. Surv., 2008, 41(1): Article No. 1. |