Clustering anxiety and depression among international student athletes during study abroad using transformer-based embeddings
DOI:
https://doi.org/10.15561/20755279.2025.0506
Keywords:
anxiety, depression, machine learning clustering, transformer embeddings, student-athlete
Abstract
Background and Study Aim. Student-athletes studying abroad face an increased risk of anxiety and depression as they balance academic and athletic responsibilities in cross-cultural environments. These psychological challenges may vary with individual adaptation, motivation, and environmental stressors. Although previous studies have applied network analysis to explore the structure of anxiety and depression symptoms, how well advanced semantic approaches identify subgroup heterogeneity remains an open practical question. As a follow-up study, this research applied a transformer-based semantic embedding approach to cluster the mental health profiles of international student-athletes and compared its performance with traditional clustering methods.
Materials and Methods. Data were collected from 219 Chinese international student-athletes who completed the GAD-7 and PHQ-9 questionnaires. Three models were compared: (1) K-means clustering on raw item scores, (2) K-means clustering after dimensionality reduction using Principal Component Analysis (PCA), and (3) pseudo-text representations of the responses embedded via a transformer model, followed by PCA and K-means clustering. Internal validity was assessed with silhouette scores. Between-cluster differences were analyzed using t-tests with Holm correction, effect sizes (Cohen’s d), and cluster profiles.
Results. Model 3 (transformer embeddings + PCA + K-means) outperformed Models 1 and 2, achieving the highest silhouette score (0.391). Visualization in 2D and 3D projections showed clearer separation. Three clusters were identified: Cluster 2 (high symptoms), Cluster 0 (intermediate), and Cluster 1 (low symptoms). Pairwise comparisons revealed significant differences across nearly all items.
Conclusions. Transformer-based semantic embeddings provide an effective approach to clustering psychological symptoms, outperforming traditional numerical methods.
The results indicate the heterogeneity of anxiety and depression subgroups among student-athletes during study abroad, offering valuable insights for targeted screening, early identification, and long-term monitoring.
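The abstract names three validation statistics: silhouette scores for internal validity, Holm-corrected t-tests, and Cohen's d for between-cluster differences. The sketch below illustrates how these quantities are commonly computed and is not the authors' code. It assumes cluster labels are already available from some clustering step, Euclidean distance for the silhouette, and the pooled-standard-deviation form of Cohen's d. The function names and the toy data are hypothetical, and the clustering itself (K-means, PCA, transformer embeddings) is not reproduced here.

```python
import math
from statistics import mean, variance


def silhouette_score(points, labels):
    """Mean silhouette width with Euclidean distance (needs >= 2 clusters)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    widths = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:  # singleton cluster: width is conventionally 0
            widths.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster
        a = mean(math.dist(points[i], points[j]) for j in own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            mean(math.dist(points[i], points[j]) for j in members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom else 0.0)
    return mean(widths)


def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Smallest p is multiplied by m, the next by m - 1, and so on;
        # the running maximum keeps the adjusted values monotone.
        running_max = max(running_max, (m - rank) * p_values[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


def cohens_d(x, y):
    """Cohen's d for two independent groups, using the pooled SD."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled_var)


if __name__ == "__main__":
    # Toy 2-D points and labels, invented purely for illustration.
    pts = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
    print("silhouette:", silhouette_score(pts, [0, 0, 1, 1]))
    # Hypothetical raw p-values from three item-level comparisons.
    print("Holm-adjusted:", holm_adjust([0.01, 0.04, 0.03]))
    # Hypothetical item scores for a high- and a low-symptom cluster.
    print("Cohen's d:", cohens_d([3.0, 3.0, 2.0, 3.0], [0.0, 1.0, 0.0, 1.0]))
```

A silhouette score closer to 1 indicates tighter, better-separated clusters, which is the sense in which the reported value of 0.391 for Model 3 is compared with the other two models.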
License
Copyright (c) 2025 Shuoyu Jing, Mohd Mahzan Awang, Wan Ahmad Munsif Wan Pa

This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright Holder - Author(s).