[1] Herodotou H, Lim H, Luo G et al. Starfish: A self-tuning system for big data analytics. In Proc. the 15th CIDR, Apr. 2011, pp.261-272.
[2] Wu S, Ooi B C, Tan K L. Continuous sampling for online aggregation over multiple queries. In Proc. the 2010 International Conference on Management of Data (SIGMOD), June 2010, pp.651-662.
[3] Chaudhuri S, Das G, Datar M et al. Overcoming limitations of sampling for aggregation queries. In Proc. the 17th Int. Conf. Data Engineering, Apr. 2001, pp.534-544.
[4] Laptev N, Zeng K, Zaniolo C. Early accurate results for advanced analytics on MapReduce. PVLDB, 2012, 5(10): 1028-1039.
[5] Hellerstein J M, Haas P J, Wang H J. Online aggregation. ACM SIGMOD Record, 1997, 26(2): 171-182.
[6] Dean J, Ghemawat S. MapReduce: Simplified data processing on large clusters. Communications of the ACM, 2008, 51(1): 107-113.
[7] Borkar V, Carey M, Grover R et al. Hyracks: A flexible and extensible foundation for data-intensive computing. In Proc. the 27th International Conference on Data Engineering, Apr. 2011, pp.1151-1162.
[8] Pansare N, Borkar V R, Jermaine C et al. Online aggregation for large MapReduce jobs. PVLDB, 2011, 4(11): 1135-1145.
[9] Böse J H, Andrzejak A, Högqvist M. Beyond online aggregation: Parallel and incremental data mining with online MapReduce. In Proc. MDAC, Apr. 2010, Article No.3.
[10] Condie T, Conway N, Alvaro P et al. Online aggregation and continuous query support in MapReduce. In Proc. the 2010 International Conference on Management of Data, June 2010, pp.1115-1118.
[11] Shi Y, Meng X, Wang F et al. You can stop early with COLA: Online processing of aggregate queries in the cloud. In Proc. the 21st ACM International Conference on Information and Knowledge Management, Oct. 29-Nov. 2, 2012, pp.1223-1232.
[12] Grover R, Carey M J. Extending MapReduce for efficient predicate-based sampling. In Proc. the 28th International Conference on Data Engineering, Apr. 2012, pp.486-497.
[13] Wang Y, Luo J, Song A, Jin J H, Dong F. Improving online aggregation performance for skewed data distribution. In Proc. Database Systems for Advanced Applications, Apr. 2012, pp.18-32.
[14] Chaudhuri S, Das G, Srivastava U. Effective use of block-level sampling in statistics estimation. In Proc. the 2004 International Conference on Management of Data, June 2004, pp.287-298.
[15] Jacobs A. The pathologies of big data. Communications of the ACM, 2009, 52(8): 36-44.
[16] Soroush E, Balazinska M, Wang D. ArrayStore: A storage manager for complex parallel array processing. In Proc. the 2011 International Conference on Management of Data, June 2011, pp.253-264.
[17] Eltabakh M Y, Tian Y, Ozcan F et al. CoHadoop: Flexible data placement and its exploitation in Hadoop. PVLDB, 2011, 4(9): 575-585.
[18] Nykiel T, Potamias M, Mishra C et al. MRShare: Sharing across multiple queries in MapReduce. PVLDB, 2010, 3(1/2): 494-505.
[19] Wu S, Jiang S, Ooi B C et al. Distributed online aggregations. PVLDB, 2009, 2(1): 443-454.
[20] Zaharia M, Borthakur D, Sen Sarma J et al. Delay scheduling: A simple technique for achieving locality and fairness in cluster scheduling. In Proc. the 5th European Conference on Computer Systems, Apr. 2010, pp.265-278.
[21] Chaudhuri S, Narasayya V. Program for TPC-D data generation with skew. Technical Report, ftp://ftp.research.microsoft.com/pub/user./viveknar/tpcdskew, Dec. 2012.
[22] Haas P J. Large-sample and deterministic confidence intervals for online aggregation. In Proc. the 9th International Conference on Scientific and Statistical Database Management, Aug. 1997, pp.51-62.
[23] Haas P J, Hellerstein J M. Ripple joins for online aggregation. ACM SIGMOD Record, 1999, 28(2): 287-298.
[24] Luo G, Ellmann C J, Haas P J et al. A scalable hash ripple join algorithm. In Proc. the 2002 International Conference on Management of Data, June 2002, pp.252-262.