Journal of Computer Science and Technology ›› 2021, Vol. 36 ›› Issue (4): 792-805. DOI: 10.1007/s11390-021-1353-5
Special Issue: Data Management and Data Mining
• Special Section on AI4DB and DB4AI •
Jian-Wei Cui, Member, CCF, Wei Lu, Member, CCF, Xin Zhao, and Xiao-Yong Du*, Fellow, CCF