You Are How You Behave: Spatiotemporal Representation Learning for College Student Academic Achievement
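Abstract: Purpose
Scholarships reflect the academic achievement of college students. Traditional scholarship assignment is based mainly on final grades and cannot identify, partway through a semester, students whose academic performance is improving or declining. Research on the factors that influence academic achievement is therefore important. Previous studies have focused mostly on individual and social factors and have rarely inferred academic achievement from students' daily behavioral characteristics. This paper applies representation learning to the behavioral data generated by college students' daily on-campus activities in order to identify the factors that affect their academic achievement and to provide decision support for scholarship assignment and academic warning in universities.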
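Methods
This paper proposes the TMS (Trajectory Mining on Clustering for Scholarship Assignment and Academic Warning) approach. First, feature engineering is used to extract features from students' on-campus behavioral data that characterize their lifestyle patterns, learning patterns, and campus network usage patterns. Next, the paper proposes an objective and subjective combined weighted k-means (Wosk-means) algorithm for clustering analysis, which partitions 6701 undergraduates into 5 mutually disjoint groups. In addition, we extract the raw data that carry location information to obtain students' spatiotemporal trajectory data, quantify the direction of each student's trajectory deviation, and use the PrefixSpan algorithm to identify, within each student group, the students whose academic performance has declined or improved.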
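To make the clustering step concrete, the following is a minimal sketch of a feature-weighted k-means in plain Python. It is not the Wosk-means algorithm itself: the abstract does not say how the objective and subjective weights are derived or combined, so the convex blend in `combine_weights`, its `alpha` parameter, and all function names here are illustrative assumptions.

```python
from math import fsum

def combine_weights(objective, subjective, alpha=0.5):
    """Blend two per-feature weight vectors and renormalize to sum to 1.

    Assumption: a convex combination controlled by `alpha`. The paper's
    actual combination rule is not given in the abstract.
    """
    mixed = [alpha * o + (1 - alpha) * s for o, s in zip(objective, subjective)]
    total = fsum(mixed)
    return [m / total for m in mixed]

def weighted_sq_dist(x, c, w):
    """Squared Euclidean distance with one weight per feature."""
    return fsum(wj * (xj - cj) ** 2 for xj, cj, wj in zip(x, c, w))

def weighted_kmeans(points, init_centroids, weights, max_iter=100):
    """Lloyd-style k-means that uses the weighted distance for assignment."""
    centroids = [list(c) for c in init_centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)),
                key=lambda k: weighted_sq_dist(p, centroids[k], weights))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for k in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == k]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[k] = [fsum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Toy data (hypothetical): feature 0 separates the groups, feature 1 is noisy.
points = [(1.0, 10.0), (1.2, -10.0), (5.0, 9.0), (5.2, -9.0)]
weights = combine_weights(objective=[0.9, 0.1], subjective=[0.99, 0.01])
labels, centroids = weighted_kmeans(points, [points[0], points[2]], weights)
```

With a high weight on the first feature, the toy points are grouped by that feature; with uniform weights the same data are grouped by the second, larger-scale feature instead, which is the effect feature weighting is meant to control.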
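Results
The experimental results show that the silhouette coefficient and Calinski-Harabasz index of the Wosk-means algorithm are both approximately 1.5 times those of the best baseline algorithm, and its SSE (sum of squared errors) is only half that of the best baseline algorithm. The running time of the Wosk-means algorithm is the second best among the compared algorithms. A comparison with previous studies confirms that a good lifestyle has a positive effect on academic achievement. In contrast to earlier empirical findings, however, this paper finds that among low-achieving students, Internet use affects female students more than male students. Moreover, previous studies have held that female students' academic achievement is higher overall than that of male students, but the experimental results of this paper do not support that conclusion.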
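For readers unfamiliar with the three clustering quality measures named above, here is a small sketch of their standard definitions in plain Python, using unweighted Euclidean distance. Whether the paper evaluates them in the weighted or the unweighted feature space is not stated in the abstract, so this is an assumption, and the function names are illustrative.

```python
from math import dist, fsum

def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    return [fsum(col) / len(points) for col in zip(*points)]

def group(points, labels):
    """Map each cluster label to the list of its member points."""
    return {k: [p for p, l in zip(points, labels) if l == k] for k in set(labels)}

def sse(points, labels):
    """Sum of squared distances from each point to its cluster centroid."""
    return fsum(
        dist(p, centroid(members)) ** 2
        for members in group(points, labels).values()
        for p in members
    )

def silhouette(points, labels):
    """Mean silhouette coefficient; needs at least two clusters."""
    clusters = group(points, labels)
    scores = []
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = fsum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(fsum(dist(p, q) for q in other) / len(other)
                for k, other in clusters.items() if k != l)
        scores.append((b - a) / max(a, b))
    return fsum(scores) / len(scores)

def calinski_harabasz(points, labels):
    """Ratio of between-cluster to within-cluster dispersion."""
    clusters = group(points, labels)
    n, k = len(points), len(clusters)
    overall = centroid(points)
    between = fsum(len(m) * dist(centroid(m), overall) ** 2
                   for m in clusters.values())
    return (between / (k - 1)) / (sse(points, labels) / (n - k))

# Toy data (hypothetical): two tight, well-separated pairs of points.
pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good, bad = [0, 0, 1, 1], [0, 1, 0, 1]
```

Higher silhouette and Calinski-Harabasz values and a lower SSE indicate tighter, better separated clusters, which is the direction of improvement the results report; on the toy data, `good` scores better than `bad` on all three.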
Conclusions
The proposed Wosk-means algorithm divides students into 5 mutually disjoint groups that differ markedly in the distribution and amount of scholarships, which provides a basis for scholarship assignment in universities. In addition, through the analysis of trajectory deviation direction, this paper can identify the group a student currently belongs to and the target group toward which the student shows a tendency to deviate. The approach can therefore detect students whose academic performance improves or declines partway through the semester, so that academic warnings and incentive measures can be applied to them. The conclusions of this study provide strong support for university management decisions.
Abstract: Scholarships are a reflection of academic achievement for college students. Traditional scholarship assignment is based strictly on final grades and cannot recognize students whose performance improves or declines during the semester. This paper develops the Trajectory Mining on Clustering for Scholarship Assignment and Academic Warning (TMS) approach to identify the factors that affect the academic achievement of college students and to provide decision support that helps low-performing students attain better performance. Specifically, we first conduct feature engineering to generate a set of features that characterize the lifestyle patterns, learning patterns, and Internet usage patterns of students. We then apply the objective and subjective combined weighted k-means (Wosk-means) algorithm to perform clustering analysis and identify the characteristics of different student groups. Considering the difficulty of obtaining real global positioning system (GPS) records of students, we apply manually generated spatiotemporal trajectory data to quantify the direction of trajectory deviation, with the assistance of the PrefixSpan algorithm, to identify low-performing students. The experimental results show that the silhouette coefficient and Calinski-Harabasz index of the Wosk-means algorithm are both approximately 1.5 times those of the best baseline algorithm, and the sum of squared errors of the Wosk-means algorithm is only half that of the best baseline algorithm.
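As an illustration of the trajectory mining step, the sketch below implements a basic PrefixSpan for sequences of single items (here, visited locations) in plain Python. It only finds frequent sequential patterns. How the paper turns such patterns into a direction of trajectory deviation is not described in the abstract, and the location names and `min_support` value are made-up examples.

```python
def prefixspan(sequences, min_support):
    """Return (pattern, support) pairs for all frequent sequential patterns.

    Each sequence is a list of single items. A pattern is supported by a
    sequence if its items occur in that order, not necessarily adjacently.
    """
    results = []

    def mine(prefix, projected):
        # `projected` holds (sequence index, start position) pairs: the
        # suffix of each sequence that follows the current prefix.
        counts = {}
        for sid, start in projected:
            for item in set(sequences[sid][start:]):  # count once per sequence
                counts[item] = counts.get(item, 0) + 1
        for item, support in sorted(counts.items()):
            if support < min_support:
                continue
            pattern = prefix + (item,)
            results.append((pattern, support))
            # Project each suffix past the first occurrence of `item`.
            next_projected = []
            for sid, start in projected:
                seq = sequences[sid]
                for pos in range(start, len(seq)):
                    if seq[pos] == item:
                        next_projected.append((sid, pos + 1))
                        break
            mine(pattern, next_projected)

    mine((), [(i, 0) for i in range(len(sequences))])
    return results

# Hypothetical daily location sequences for four students.
trajectories = [
    ["dorm", "canteen", "library"],
    ["dorm", "library"],
    ["dorm", "canteen", "gym"],
    ["canteen", "library"],
]
patterns = dict(prefixspan(trajectories, min_support=2))
```

In this toy run, `patterns` maps each frequent pattern, such as `("dorm", "library")`, to the number of trajectories that contain it. Comparing a student's own patterns with those typical of each cluster is one plausible way to spot drift toward another group, though that comparison step is an assumption here.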