Written with Christopher Brogden and Damian Harper
This post extends a previous one in which we discussed the risk of concluding that improving one variable (e.g., hamstring strength) results in improving another (e.g., sprint performance) based solely on parallel average changes observed in a small sample of athletes.
Thanks to the constructive collaboration of all the co-authors of a recent paper on the effects of a hamstring strength intervention on eccentric hamstring strength and sprint performance (please note: the authors of this paper set out to determine whether performing Nordics on a regular basis could be a time-efficient way to tackle both injury risk reduction and performance enhancement), this post will extend the discussion of the data visualization part. However, this post is not about the narrow “hamstring strength versus sprint performance” question; it is about a broader topic: how we do, and how we should, display and report the results of studies in which parallel changes in two variables are investigated. Our focus here is on research questions and study designs such as “does improving A with training result in improving B?”. It is about how the association between the observed effects of a training intervention on distinct variables is presented and discussed in our own studies and in the studies we read and review, about how the current standards are clearly misleading, and about why and how they should be improved.
To be clear, my point is not to criticize Siddle et al.’s specific study, cast doubt on their conclusions, or fuel the “hamstring strength versus sprint performance” debate. However, the thought process that led to this blog post started when I read this paper, and I think it provides a good example. I’m very happy that Chris (@broc03), Damian (@DHMov) and all co-authors contributed to this post, essentially by disclosing their subjects’ “code” so that the parallel data could be correctly linked and the “actual scenario” discussed (see Step 2 below).
Their paper is listed in the references below (Siddle et al. 2019).
Briefly, in this study, 7 amateur soccer or rugby players had their isokinetic knee flexor eccentric torque (KFT) and 10-m sprint time measured pre- and post-training. The training intervention was a 6-week program of Nordic hamstring exercise, with 2 sessions per week and 3 sets of 6 to 10 repetitions per session. Other points, such as detraining effects, are discussed in the study, but I will focus only on the pre-post training results of the intervention group to further develop my points on data presentation.
Step 1: friends don’t let friends use bar charts
As brilliantly discussed in a series of papers (most of them open access) by Tracey Weissgerber, the issue with “barbarplots” is now well acknowledged in the sports science community (or at least it should be…), and an increasing number of studies display individual data in addition to bar plots or equivalent “average ± SD” visuals. Although many papers still present only average values in tables, more and more display individual data plots, which give access to one more layer of the actual story. As shown below, similar bar graphs can easily mask very different stories in paired-data designs (typical pre-post training protocols), especially in the sports science context with small sample sizes.
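To make the bar-chart trap concrete, here is a minimal Python sketch with made-up numbers (not data from any study). Datasets A and B contain exactly the same pre and post values, so their means and SDs, and therefore their bar charts, are identical; only the assignment of post values to subjects differs. The `summarize` helper is purely illustrative.

```python
import statistics

# Hypothetical pre/post scores for 6 subjects (illustrative only, not study data)
pre = [100, 105, 110, 115, 120, 125]
post_a = [108, 113, 118, 123, 128, 133]  # every subject improves by +8
post_b = [133, 128, 123, 118, 113, 108]  # same post values, different subjects


def summarize(pre, post):
    """Group-level bar-chart numbers plus one individual-level check."""
    changes = [after - before for before, after in zip(pre, post)]
    mean_change = statistics.mean(changes)
    # Subjects whose individual change goes in the same direction as the mean
    followers = sum(1 for d in changes if d * mean_change > 0)
    return {
        "pre_mean": statistics.mean(pre),
        "pre_sd": statistics.stdev(pre),
        "post_mean": statistics.mean(post),
        "post_sd": statistics.stdev(post),
        "mean_change": mean_change,
        "followers": followers,
    }


for label, post in (("A", post_a), ("B", post_b)):
    print(label, summarize(pre, post))
```

Both datasets report a mean gain of +8, but in B two of the six subjects actually got worse, something the bars alone can never reveal.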
In Siddle et al.’s study, the authors correctly avoided this first trap and showed not only the group-average values pre and post with their SD error bars, but also the individual trends. Using the published data, digitized with this very useful online resource (https://automeris.io/WebPlotDigitizer/), this is what the pre-post bar plots plus individual trends look like. Note: I’ve plotted the results as bars ± SD with individual lines, whereas Siddle et al. used an average point ± SD bars.
So this is a strong point of their study: we can observe that ALL subjects followed the average trend. KFT improved for each of them and, in parallel, sprint time decreased for each of them. One key thought here: what does our brain intuitively see in front of a bar graph? That every single subject follows the average trend. However, this is not always the case (the average can be skewed by outliers, etc.), so a very important first step is to confirm the average trend with individual trends. In my roles as editor and reviewer (in addition to my work as an author and co-author), I do not accept a paper until these individual trends or values are displayed. I think that 95% of the time, authors don’t display individual data because they (we) are too lazy and/or do not use appropriate software. Have you ever tried to plot 40 pre-post lines in Excel? You see what I mean. But laziness is not a good excuse. The other 5% (I hope not more) are, I guess, the “dirty little secrets” in which individual data don’t confirm the “average big picture”.
One way to detail this individual-scale presentation further is to report the number of positive, neutral and negative changes in addition to the means and SDs: beyond the average trend, how many subjects actually responded to the intervention positively, how many negatively, and how many did not respond? In sports science, 0.2 times the overall pooled (pre + post) SD of each variable (or only the pre SD) is a well-accepted “smallest worthwhile change” (SWC) threshold for a practically meaningful change in many sports performance contexts. This matters because minor yet positive changes should be interpreted as neutral (trivial) when they are smaller than this threshold. For example, if your 100-m time changes from 12.332 s to 12.331 s, it is objectively an improvement… but an absolutely meaningless one, so this 0.2 SD bandwidth is very important. I always tell my students: if your boss says “I’m going to raise your salary”, you’re happy; if they add “by one cent per year”, you’re not. Magnitude matters!
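The SWC arithmetic can be sketched as follows, assuming the pooled SD is the root mean square of the pre and post SDs (other pooling choices exist, and the text also allows using the pre SD only). The 100-m times are hypothetical and only serve to show that the 0.001 s change from the example is far below any realistic SWC.

```python
import math
import statistics


def smallest_worthwhile_change(pre, post=None, factor=0.2):
    """SWC = factor x between-subject SD.

    With post values, the SD is pooled as sqrt((SD_pre^2 + SD_post^2) / 2),
    a common pooling for two equal-sized sets of scores (an assumption here);
    without them, only the pre SD is used.
    """
    sd_pre = statistics.stdev(pre)
    if post is None:
        return factor * sd_pre
    sd_post = statistics.stdev(post)
    return factor * math.sqrt((sd_pre ** 2 + sd_post ** 2) / 2)


# Hypothetical 100-m times (s) for five athletes, pre and post
pre = [11.8, 12.1, 12.3, 12.6, 12.9]
post = [11.7, 12.0, 12.3, 12.5, 12.8]

swc = smallest_worthwhile_change(pre, post)
change = 12.331 - 12.332  # the one-millisecond "improvement" from the text
print(f"SWC = {swc:.3f} s, change = {change:.3f} s, worthwhile: {abs(change) > swc}")
```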
For example, in one of our recent studies, we reported the individual responses to a hamstring strength, sprint or soccer training intervention (last column), which gives a more detailed picture of what actually happened beyond the average trend. Individual training responses were classified as a decrease (individual change < −1 SWC), trivial (from −1 SWC to +1 SWC) or an increase (individual change > +1 SWC) for each variable of interest.
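The three-band classification above can be written as a small helper. This is a sketch: the changes and the SWC value are hypothetical, and the labels refer to the direction of the raw change, so for a variable such as sprint time, where lower is better, a “decrease” is the favorable response.

```python
from collections import Counter


def classify_response(change, swc):
    """Label one individual change using the +/-1 SWC bands."""
    if change < -swc:
        return "decrease"
    if change > swc:
        return "increase"
    return "trivial"  # from -1 SWC to +1 SWC, bounds included


# Hypothetical individual changes in eccentric torque (N.m) and a hypothetical SWC
changes = [12.0, 4.5, -1.0, 0.8, 20.3, -6.2, 9.9]
swc = 2.5

labels = [classify_response(d, swc) for d in changes]
print(labels)
print(Counter(labels))  # how many decreased / were trivial / increased
```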
In our example with Siddle et al.’s study, this is what we have: an average improvement in both variables, with every one of the 7 subjects tested following this trend. But in order to link (conceptually and statistically) these two parallel improvements into a sort of causal relationship, and to interpret this link, we need to know “who is who”: specifically, which subject showed what magnitude of change in each variable. This is the missing information. As you can see in the table below, I ranked the values in order of pre-training eccentric torque, but I cannot tell from reading the paper whether the corresponding values for eccentric torque and sprint time are correctly matched across the rows.
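One way to keep “who is who” explicit is to store each variable keyed by a subject code and to pair the values by joining on that code, rather than by lining up two columns. In this hypothetical sketch (invented codes and values), pairing by code gives the true pairs, while zipping two independently sorted columns, one of many ways a table can silently scramble its rows, produces a different set of pairs with identical column averages.

```python
# Hypothetical % changes keyed by an anonymised subject code (invented values)
kft_change = {"S1": 10.0, "S2": 40.0, "S3": 25.0, "S4": 15.0}
sprint_change = {"S1": -3.0, "S2": -0.5, "S3": -6.0, "S4": -1.0}

# Correct pairing: join the two variables on the subject code
paired_by_id = [(kft_change[s], sprint_change[s]) for s in sorted(kft_change)]

# Silent mismatch: each column sorted on its own, then matched row by row
paired_by_rank = list(zip(sorted(kft_change.values()),
                          sorted(sprint_change.values())))

print("by subject code:", paired_by_id)
print("by row position:", paired_by_rank)
```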
The change in KFT ranges from +9% to +44%, while the change in sprint time ranges from −0.05% to −6.59%. The issue is that our brain reads these numbers as “parallel” and assumes that the greater the change in A, the greater the change in B, but the available dataset does not allow us to comment on that.
In the next part of this post, I will explain why displaying a plot with these changes for each individual is useful to better understand the story, based on “who is who”.
Step 2: who’s who? Same data may mask opposite conclusions
In this part, I will use the data from the table above (i.e., the data extracted from the published paper) and build several scenarios of correlation between the training-induced changes in one variable (KFT) and the other (10-m sprint time). The important thing to keep in mind is that I will use the very same changes but attribute them to different individuals: since the actual combination is unknown, all combinations are possible. So potentially every scenario discussed below could have been the real one, and each would have produced the same Figure (bar plot and individual trends) and the same final averages and SDs displayed in red in the table above. Same final “average is savage” output, very different stories, and opposite conclusions, as you will see. Only the final (fourth) scenario is the actual one, and I could discover it thanks to the authors of the study providing me with the “who is who” key. OK, let’s play with numbers. For maximal clarity, I’ve kept the numbers displayed on the correlation graphs.
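The number of possible “who is who” pairings grows quickly: for 7 subjects there are 7! = 5,040 of them, and every one gives the same means and SDs. The Python sketch below enumerates them all with `itertools.permutations` on hypothetical % changes (not the values from Siddle et al.) to show how far the correlation between the two sets of changes can swing. Because a lower sprint time is better, a negative r here means that larger KFT gains go with larger reductions in sprint time.

```python
import itertools
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical % changes for 7 subjects (NOT the Siddle et al. values)
kft = [10, 15, 20, 25, 30, 35, 40]                   # KFT gains
sprint = [-0.5, -1.0, -2.0, -2.5, -3.5, -5.0, -6.0]  # 10-m time changes

# Every permutation of the sprint values is one possible "who is who" key
rs = [pearson_r(kft, perm) for perm in itertools.permutations(sprint)]

print("possible pairings:", len(rs))
print(f"r ranges from {min(rs):.3f} to {max(rs):.3f}")
# The group summary is the same whatever the pairing
print("sprint mean/SD:", statistics.mean(sprint), statistics.stdev(sprint))
```

With these invented values, r spans roughly −0.99 to +0.99 across pairings, while the bar charts never change.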
Scenario 1: “ideal” case maximizing the association between individual changes
In this scenario, I’ve ranked both variables in such a way that the individuals showing the smallest training-induced changes in one variable (here knee flexor eccentric torque) were also those with the smallest changes in the other variable (sprint time) and vice versa.
Practical interpretation: this intervention resulted in improvements in both KFT and 10-m sprint performance at the group level, and both changes were strongly and significantly correlated. More than 80% (r² = 0.838) of the between-individual variance in the change in sprint performance is associated with the change in KFT. Practical conclusion: improving knee flexor torque capability is associated with better 10-m sprint performance in the amateur soccer and rugby players tested, and the greater the training-induced change in KFT, the greater the improvement in 10-m sprint performance.
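The phrase “more than 80% of the individual variance … is associated with” is the usual reading of r² in a simple linear regression: r² equals 1 − SS_res/SS_tot, the share of the between-subject variance in sprint changes accounted for by a straight-line fit on KFT changes. A minimal sketch with hypothetical paired values checks that identity numerically.

```python
import statistics

# Hypothetical paired % changes: x = KFT gain, y = 10-m sprint time change
x = [10, 15, 20, 25, 30, 35, 40]
y = [-1.0, -0.5, -2.5, -2.0, -5.0, -3.5, -6.0]

mx, my = statistics.mean(x), statistics.mean(y)
sxx = sum((a - mx) ** 2 for a in x)
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
ss_tot = sum((b - my) ** 2 for b in y)  # total between-subject variation in y

# Least-squares line y = intercept + slope * x
slope = sxy / sxx
intercept = my - slope * mx
ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))

r = sxy / (sxx * ss_tot) ** 0.5
r_squared = 1 - ss_res / ss_tot  # share of the variation "explained" by the fit
print(f"slope = {slope:.3f}, r = {r:.3f}, r^2 = {r_squared:.3f}, r*r = {r * r:.3f}")
```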
Scenario 2: “worst case” scenario maximizing the association…in the opposite direction
In this scenario, I’ve ranked both variables in the exact opposite way: the individuals showing the smallest training-induced changes in one variable (here KFT) were those with the greatest changes in the other variable (sprint time), and vice versa. Again, this leads to exactly the same average and SD values (bar graph above) and individual trends (black lines above).
Practical interpretation: here the conclusion is very different: the more you increase your KFT, the less you improve your sprint time, and this relationship is strong and significant. Of course, one may argue that all players except one actually improved their sprint performance, but in terms of a potential cause-effect relationship, this graph could be interpreted as a recommendation against improving your KFT capability too much. This is the exact opposite of the previous scenario… based on the same experimental data. Confusing.
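Scenarios 1 and 2 correspond to sorting the two sets of changes in the same or in the opposite rank order. Here is a minimal sketch on hypothetical values (not the study data), with a hand-written Pearson helper:

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical % changes, listed in an arbitrary subject order
kft = [25, 10, 40, 15, 35, 20, 30]                   # KFT gains
sprint = [-2.0, -6.0, -0.5, -3.5, -1.0, -5.0, -2.5]  # 10-m time changes (negative = faster)

ranked_kft = sorted(kft)
# Scenario 1: smallest KFT gain with the smallest sprint-time reduction, and so on
scenario_1 = sorted(sprint, reverse=True)  # -0.5 first ... -6.0 last
# Scenario 2: smallest KFT gain with the largest sprint-time reduction, and so on
scenario_2 = sorted(sprint)                # -6.0 first ... -0.5 last

r1 = pearson_r(ranked_kft, scenario_1)
r2 = pearson_r(ranked_kft, scenario_2)
print(f"Scenario 1: r = {r1:+.3f} (r^2 = {r1 ** 2:.3f})")
print(f"Scenario 2: r = {r2:+.3f} (r^2 = {r2 ** 2:.3f})")
```

Because lower sprint time is better, Scenario 1 shows up as a strongly negative r (bigger KFT gains, bigger reductions in time) and Scenario 2 as a strongly positive one. By the rearrangement inequality, with the means and SDs fixed, these two orderings give the largest and smallest r possible for a given set of values.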
Scenario 3: another possibility (among many)
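As you can easily imagine, there are many other possibilities: with 7 subjects, there are 7! = 5,040 ways to pair the changes in KFT with the changes in sprint time. In this case, I’ve shuffled the ranking of subjects and obtained an almost null correlation.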
Practical interpretation: here we clearly see that yes, all players improved both KFT and 10-m time, but there is no relationship at all (an almost “flat” correlation) between the gains. This means that, on an individual basis, the improvement in sprint performance is not associated with the gains in KFT. This training regimen works to improve sprint time in such a population, but the improvement does not seem to be associated with the gain in KFT, so it is likely due not to this but to other, unmeasured neuromuscular or tendon adaptations. From a training perspective, it is very important to differentiate average gains and behavior from individual behavior and the association between variables.
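A near-null pairing like Scenario 3 can be found by brute force: search all 7! pairings for the one whose |r| is smallest. The sketch below does this on hypothetical values; every subject still improves in both variables, yet the individual gains can be essentially unrelated.

```python
import itertools
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical % changes: every subject improves in both variables
kft = [10, 15, 20, 25, 30, 35, 40]
sprint = [-0.5, -1.0, -2.0, -2.5, -3.5, -5.0, -6.0]

# Search all 7! pairings for the one with the weakest linear association
flat = min(itertools.permutations(sprint), key=lambda p: abs(pearson_r(kft, p)))

print("pairing:", list(zip(kft, flat)))
print("r =", round(pearson_r(kft, flat), 3))
print("all improved:", all(k > 0 for k in kft) and all(s < 0 for s in flat))
```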
Enough simulations, what was the actual scenario in the study?
Scenario 4: the actual data in this example
Thanks to the collaboration of Chris, Damian, and the co-authors, we could rank the subjects in the actual order and correct the “who is who” issue.
Practical interpretation: this is the actual story within this study. There is a correlation (r = 0.664, p = 0.104, i.e., not statistically significant at the conventional 0.05 level with 7 subjects) between training-induced changes in KFT and training-induced changes in 10-m sprint performance in the amateur soccer and rugby players tested. The coefficient of determination (r²) is 0.441, which means that 44% of the between-individual variance in sprint-time gains can be “explained” by, or rather is associated with, the gains in KFT. This is not as high as in Scenario 1, but it is an interesting association, and it is thus relevant to recommend, in this population of amateur soccer and rugby players, strengthening the knee flexors, especially in eccentric mode, to target injury prevention and some gains in sprint performance. Note that other studies in trained soccer players show opposite results, which could be due to a myriad of factors, including the participants’ training status (see Mendiguchia et al. 2020 and Suarez-Arrones et al. 2019).
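The reported statistics can be checked from r and n alone: r² = 0.664² ≈ 0.441, and the usual test of H0: ρ = 0 uses t = r√(n − 2)/√(1 − r²) with n − 2 = 5 degrees of freedom. To stay dependency-free, this sketch evaluates the Student t distribution with its classic closed-form series for integer degrees of freedom rather than calling a statistics library.

```python
import math


def t_two_sided_p(t, df):
    """Two-sided p-value for Student's t with integer df, via the classic
    closed-form series for P(|T| < t) (separate forms for odd and even df)."""
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    coef, total = 1.0, 0.0
    if df % 2 == 1:
        # P(|T| < t) = (2/pi) * (theta + sin(theta) * sum_k a_k cos^(2k+1)(theta))
        for k in range((df - 1) // 2):
            if k > 0:
                coef *= (2 * k) / (2 * k + 1)
            total += coef * c ** (2 * k + 1)
        inside = (2 / math.pi) * (theta + s * total)
    else:
        # P(|T| < t) = sin(theta) * sum_k b_k cos^(2k)(theta)
        for k in range(df // 2):
            if k > 0:
                coef *= (2 * k - 1) / (2 * k)
            total += coef * c ** (2 * k)
        inside = s * total
    return 1.0 - inside


def correlation_test(r, n):
    """r^2, t statistic and two-sided p for H0: rho = 0 (n - 2 df)."""
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return r ** 2, t, t_two_sided_p(t, n - 2)


r_squared, t, p = correlation_test(0.664, 7)
print(f"r^2 = {r_squared:.3f}, t = {t:.2f}, p = {p:.3f}")
```

For r = 0.664 and n = 7 this gives t ≈ 1.99 and a two-sided p ≈ 0.104, consistent with the values above. With the raw paired data, `scipy.stats.pearsonr(x, y)` returns r and the same two-sided p-value directly.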
Conclusion: avoid “averagiarianism” and clarify the scenario
One way to better connect sports science to its end users and improve the transfer of knowledge into practice is to display research results that illustrate both group and individual responses to change. The idea, of course, is not to re-analyze the hundreds (thousands?) of training studies already published, but to encourage authors to detail their research outcomes better and, above all, to discuss the individual results more thoroughly in the future. As shown with the example taken here, similar raw data may lead to very different interpretations and thus recommendations. So, as Siddle et al. did, researchers should report individual responses, but they should also show the associated individual changes across the variables of interest. This will enable readers to understand exactly what was observed, “who is who”, how many subjects responded positively, neutrally or negatively to the intervention studied, and so on.
This applies not only to academic papers (which are the raw material of sports science) but also to infographics. This very popular way of disseminating sports science knowledge very often includes only bar graphs (sometimes without even SDs), even when more detailed data were available in the original paper, which adds to the confusion and blurs the message. As I have often said, infographics are OK, but nothing replaces a careful reading of the original papers.
The concluding word goes to elite coach and ALTIS CEO Stuart McMillan, who commented on Twitter on one of my blog posts discussing the (now published) individual variability in response to a heavy resisted sprint training program: “Coaches seek to improve individual performance not only group average, so research should focus on individual performances, not group average adaptations”.
Funny one: in my mother tongue (French), the word for the mathematical process of averaging data is “moyennage”, which is almost the same as “Moyen Âge”, which means… the Middle Ages. Let’s be modern and stop being average.
We thank Dr Pierre Samozino for reviewing this post.
Buchheit M (2016) Chasing the 0.2. Int J Sports Physiol Perform 11:417–418.
Buchheit M (2016) The numbers will love you back in return-I promise. Int J Sports Physiol Perform 11:551–554.
Buchheit M (2017) Houston, we still have a problem. Int J Sports Physiol Perform 12:1111–1114.
Mendiguchia J, Conceição F, Edouard P, et al (2020) Sprint versus isolated eccentric training: Comparative effects on hamstring architecture and performance in soccer players. PLoS One 15:e0228283.
Siddle J, Greig M, Weaver K, et al (2019) Acute adaptations and subsequent preservation of strength and speed measures following a Nordic hamstring curl intervention: a randomised controlled trial. J Sports Sci 37:911–920.
Suarez-Arrones L, Lara-Lopez P, Rodriguez-Sanchez P, et al (2019) Dissociation between changes in sprinting performance and Nordic hamstring strength in professional male football players. PLoS One 14:1–12.
Weissgerber TL, Garovic VD, Savic M, et al (2016) From Static to Interactive: Transforming Data Visualization to Improve Transparency. PLOS Biol 14:e1002484.
Weissgerber TL, Milic NM, Winham SJ, Garovic VD (2015) Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm. PLoS Biol 13:e1002128.