A review of sports data analytics field: A bibliometric and network analysis of the articles published from 1997 to 2020

Document Type: Research Paper


1 School of Industrial Engineering, Iran University of Science and Technology, Tehran, Iran

2 School of Architecture and Environmental Design, Iran University of Science and Technology, Tehran, Iran


Data analysis in competitive sports has grown significantly in recent years, and a large number of studies have been published over the last decades. In the field of sports data analysis, however, bibliometric analysis and maps have not yet been used to analyze research output or to visualize its evolution and trends. Therefore, the primary purpose of this article is to review data science analysis in sports activities using network embedding-based visualization on a large-scale dataset. A total of 805 articles published between 1997 and 2020, written by 3141 different authors from 1181 institutions in 60 countries, were extracted from Web of Science (WOS) and analyzed using R, CiteSpace, and VOSviewer. The articles, journals, authors, countries, and universities that have played a significant role in developing this field are identified. Following that, meaningful knowledge of the communication networks among articles, authors, and keywords is illustrated through scientific paper mining. Moreover, the articles are divided into six groups based on subject and methodology, which provides a comprehensive view for researchers in this emerging field of sports.

