Clustering anxiety and depression among international student athletes during study abroad using transformer-based embeddings

DOI:

https://doi.org/10.15561/20755279.2025.0506

Keywords:

anxiety, depression, machine learning clustering, transformer embeddings, student-athlete

Abstract

Background and Study Aim. Student-athletes studying abroad face an elevated risk of anxiety and depression as they balance academic and athletic responsibilities in cross-cultural environments. These psychological challenges may vary with individual adaptation, motivation, and environmental stressors. Although previous studies have applied network analysis to explore the structure of anxiety and depression symptoms, how effectively advanced semantic approaches identify subgroup heterogeneity remains an open question of practical interest. As a follow-up study, this research applied a transformer-based semantic embedding approach to cluster the mental health profiles of international student-athletes and compared its performance with traditional clustering methods.

Materials and Methods. Data were collected from 219 Chinese international student-athletes who completed the GAD-7 and PHQ-9 questionnaires. Three models were compared: (1) K-means clustering on raw item scores; (2) K-means clustering after dimensionality reduction using Principal Component Analysis (PCA); and (3) pseudo-text representations of the responses embedded via a transformer model, followed by PCA and K-means clustering. Internal cluster validity was assessed with silhouette scores. Between-cluster differences were analyzed using t-tests with Holm correction, effect sizes (Cohen’s d), and cluster profiles.

Results. Model 3 (transformer embeddings + PCA + K-means) outperformed Models 1 and 2, achieving the highest silhouette score (0.391). Visualization in 2D and 3D projections confirmed clearer cluster separation. Three clusters were identified: Cluster 2 (high symptoms), Cluster 0 (intermediate symptoms), and Cluster 1 (low symptoms). Pairwise comparisons revealed significant differences across nearly all items.

Conclusions. Transformer-based semantic embeddings provide an effective approach to clustering psychological symptoms, outperforming traditional numerical methods. The results point to heterogeneous anxiety and depression subgroups among student-athletes during study abroad, offering valuable insights for targeted screening, early identification, and long-term monitoring.
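
The evaluation statistics named in the abstract (silhouette score, Holm correction, Cohen’s d) can be illustrated with a minimal pure-Python sketch. This is not the authors’ code: the function names, the Euclidean distance, the pooled-variance form of Cohen’s d, and the toy inputs are all assumptions made for illustration. The p-values passed to the Holm step would normally come from the pairwise t-tests (for example `scipy.stats.ttest_ind`), which are not reproduced here.

```python
import math
from statistics import mean, stdev


def silhouette_score(points, labels):
    """Mean silhouette coefficient with Euclidean distance (needs >= 2 clusters)."""
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        # Distances from point i to every other point, grouped by cluster label.
        dists = {}
        for j, (q, other) in enumerate(zip(points, labels)):
            if i != j:
                dists.setdefault(other, []).append(math.dist(p, q))
        if lab not in dists:  # singleton cluster: silhouette is defined as 0
            scores.append(0.0)
            continue
        a = mean(dists[lab])  # mean distance to the point's own cluster
        b = min(mean(d) for k, d in dists.items() if k != lab)  # nearest other cluster
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return mean(scores)


def cohens_d(x, y):
    """Cohen's d with the pooled sample standard deviation (an assumed variant)."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled_var)


def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Multiply the k-th smallest p-value by (m - k), then enforce monotonicity.
        running_max = max(running_max, (m - rank) * p_values[idx])
        adjusted[idx] = min(1.0, running_max)
    return adjusted


if __name__ == "__main__":
    # Hypothetical toy data, not values from the study.
    toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    toy_labels = [0, 0, 1, 1]
    print("silhouette:", round(silhouette_score(toy_points, toy_labels), 3))
    print("Cohen's d:", cohens_d([1, 2, 3], [2, 3, 4]))
    print("Holm-adjusted p:", holm_adjust([0.01, 0.04, 0.03]))
```

A higher mean silhouette indicates tighter, better-separated clusters, which is the sense in which the three models are ranked; Holm adjustment controls the family-wise error rate across the many item-level comparisons between clusters.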

Author Biographies

Shuoyu Jing, Universiti Kebangsaan Malaysia

p119216@siswa.ukm.edu.my; Faculty of Education; Bangi, Malaysia.

Mohd Mahzan Awang, Universiti Kebangsaan Malaysia

mahzan@ukm.edu.my; Faculty of Education; Bangi, Malaysia.

Wan Ahmad Munsif Wan Pa, Universiti Kebangsaan Malaysia

munsif@ukm.edu.my; Faculty of Education; Bangi, Malaysia.

Published

2025-10-30

How to Cite

Jing S, Awang MM, Wan Pa WAM. Clustering anxiety and depression among international student athletes during study abroad using transformer-based embeddings. Physical Education of Students. 2025;29(5):383-94. https://doi.org/10.15561/20755279.2025.0506