Citation: Jian-Wei Cui, Wei Lu, Xin Zhao, Xiao-Yong Du. Efficient Model Store and Reuse in an OLML Database System[J]. Journal of Computer Science and Technology, 2021, 36(4): 792-805. DOI: 10.1007/s11390-021-1353-5.
[1] Yosinski J, Clune J, Bengio Y, Lipson H. How transferable are features in deep neural networks? arXiv:1411.1792, 2014. https://arxiv.org/abs/1411.1792, Nov. 2020.
[2] Wang W, Wang S, Gao J, Zhang M, Chen G, Ng T K, Ooi B C. Rafiki: Machine learning as an analytics service system. arXiv:1804.06087, 2018. https://arxiv.org/abs/1804.06087, Apr. 2021.
[3] Zhang W, Jiang J, Shao Y, Cui B. Efficient diversity-driven ensemble for deep neural networks. In Proc. the 36th IEEE International Conference on Data Engineering, Apr. 2020, pp.73-84. DOI: 10.1109/ICDE48307.2020.00014.
[4] Derakhshan B, Mahdiraji A R, Abedjan Z, Rabl T, Markl V. Optimizing machine learning workloads in collaborative environments. In Proc. the 2020 ACM SIGMOD International Conference on Management of Data, Jun. 2020, pp.1701-1716. DOI: 10.1145/3318464.3389715.
[5] Schapire R E. Explaining AdaBoost. In Empirical Inference, Schölkopf B, Luo Z, Vovk V (eds.), Springer, 2013, pp.37-52. DOI: 10.1007/978-3-642-41136-6_5.
[6] Zhao Z, Chen H, Zhang J, Zhao X, Liu T, Lu W, Chen X, Deng H, Ju Q, Du X. UER: An open-source toolkit for pre-training models. arXiv:1909.05658, 2019. https://arxiv.org/abs/1909.05658, Apr. 2021.
[7] Mikolov T, Chen K, Corrado G, Dean J. Efficient estimation of word representations in vector space. arXiv:1301.3781, 2013. https://arxiv.org/abs/1301.3781, Jan. 2021.
[8] Pennington J, Socher R, Manning C D. GloVe: Global vectors for word representation. In Proc. the 2014 Conference on Empirical Methods in Natural Language Processing, Oct. 2014, pp.1532-1543. DOI: 10.3115/v1/D14-1162.
[9] Zhao Z, Liu T, Li S, Li B, Du X. Ngram2vec: Learning improved word representations from ngram co-occurrence statistics. In Proc. the 2017 Conference on Empirical Methods in Natural Language Processing, Sept. 2017, pp.244-253. DOI: 10.18653/v1/D17-1023.
[10] Dai A M, Olah C, Le Q V. Document embedding with paragraph vectors. arXiv:1507.07998, 2015. https://arxiv.org/abs/1507.07998, Apr. 2021.
[11] Axelrod A, He X, Gao J. Domain adaptation via pseudo in-domain data selection. In Proc. the 2011 Conference on Empirical Methods in Natural Language Processing, Jul. 2011, pp.355-362.
[12] Chen B, Huang F. Semi-supervised convolutional networks for translation adaptation with tiny amount of in-domain data. In Proc. the 20th SIGNLL Conference on Computational Natural Language Learning, Aug. 2016, pp.314-323. DOI: 10.18653/v1/K16-1031.
[13] Pan S J, Yang Q. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 2010, 22(10): 1345-1359. DOI: 10.1109/TKDE.2009.191.
[14] Freitag M, Al-Onaizan Y. Fast domain adaptation for neural machine translation. arXiv:1612.06897, 2016. https://arxiv.org/abs/1612.06897, Dec. 2020.
[15] Zhang C, Kumar A, Ré C. Materialization optimizations for feature selection workloads. ACM Transactions on Database Systems, 2016, 41(1): Article No. 2. DOI: 10.1145/2877204.
[16] Nguyen C, Hassner T, Seeger M, Archambeau C. LEEP: A new measure to evaluate transferability of learned representations. In Proc. the 37th International Conference on Machine Learning, Jul. 2020, pp.7294-7305.
[17] Dietterich T G. Ensemble methods in machine learning. In Proc. the 1st International Workshop on Multiple Classifier Systems, Jun. 2000, pp.1-15. DOI: 10.1007/3-540-45014-9_1.
[18] Fu F, Jiang J, Shao Y, Cui B. An experimental evaluation of large scale GBDT systems. Proceedings of the VLDB Endowment, 2019, 12(11): 1357-1370. DOI: 10.14778/3342263.3342273.
[19] Breiman L. Stacked regressions. Machine Learning, 1996, 24(1): 49-64. DOI: 10.1023/A:1018046112532.
[20] Ding Y X, Zhou Z H. Boosting-based reliable model reuse. In Proc. the 12th Asian Conference on Machine Learning, Nov. 2020, pp.145-160.
[21] Miao H, Li A, Davis L S, Deshpande A. ModelHub: Towards unified data and lifecycle management for deep learning. arXiv:1611.06224, 2016. https://arxiv.org/abs/1611.06224, Nov. 2020.
[22] Vartak M, Subramanyam H, Lee W E, Viswanathan S, Husnoo S, Madden S, Zaharia M. ModelDB: A system for machine learning model management. In Proc. the Workshop on Human-in-the-Loop Data Analytics, Jun. 2016, Article No. 14. DOI: 10.1145/2939502.2939516.
[23] Bhardwaj A, Bhattacherjee S, Chavan A, Deshpande A, Elmore A J, Madden S, Parameswaran A G. DataHub: Collaborative data science & dataset version management at scale. arXiv:1409.0798, 2014. https://arxiv.org/abs/1409.0798, Apr. 2021.
[24] Kraska T, Talwalkar A, Duchi J C, Griffith R, Franklin M J, Jordan M I. MLbase: A distributed machine-learning system. In Proc. the 6th Biennial Conference on Innovative Data Systems Research, Jan. 2013.
[25] Xin D, Ma L, Liu J, Macke S, Song S, Parameswaran A. HELIX: Accelerating human-in-the-loop machine learning. arXiv:1808.01095, 2018. https://arxiv.org/abs/1808.01095, Apr. 2021.
[26] Xu L, Dong Q, Liao Y, Yu C, Tian Y, Liu W, Li L, Liu C, Zhang X. CLUENER2020: Fine-grained named entity recognition dataset and benchmark for Chinese. arXiv:2001.04351, 2020. https://arxiv.org/abs/2001.04351, Jan. 2021.
[27] Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A N, Kaiser L, Polosukhin I. Attention is all you need. arXiv:1706.03762, 2017. https://arxiv.org/abs/1706.03762, Apr. 2021.
[28] Liu Y, Ott M, Goyal N, Du J, Joshi M, Chen D, Levy O, Lewis M, Zettlemoyer L, Stoyanov V. RoBERTa: A robustly optimized BERT pretraining approach. arXiv:1907.11692, 2019. https://arxiv.org/abs/1907.11692, Apr. 2021.