*By Gary C. Bird, Ph.D., American Academy of Family Physicians; and Melanie D. Bird, Ph.D., American Academy of Family Physicians*

One of the first things to consider as you review data obtained from a CE activity is the question, “Is the data reproducible?” In other words, could another person run the same study and obtain the same results? This is a primary concern for randomized controlled trials (RCTs), which tightly control factors that may influence results and, in doing so, allow robust data collection and statistical analysis that yield definitive answers to a defined research question. Reproducibility is also important for generalizing data across a population: if an RCT is reproducible (and the sample is random), the results should hold for a larger population. Unfortunately for us as providers of CE, the data we obtain from our educational activities can never be so tightly controlled. CE learners have different motivations for engaging in an activity and different levels of engagement during it, and each of these is a potential source of variability in the data. Even when grouped by profession or specialty, our learners often vary widely on these variables.

This is not to say that data from CE activities has no place in a truly scientific assessment of CE outcomes. There is considerable knowledge to be gained from collecting and analyzing activity data. The education we provide is an important element in the continuous professional development of medical professionals, and what we do has the potential to positively change the way they practice medicine and, ultimately, to improve the health of their patients. Therefore, drawing conclusions from our educational data that carry over to larger populations is key not only to improving the quality of education but also to improving the way we disseminate ideas to others in our field.

**Scope of this Article** Previous articles in the statistics series have discussed specific statistical methodologies that are pertinent to those trying to make sense of their data. In this article, we will go through some of the common mistakes and misconceptions that arise when individuals use these methodologies to analyze, view and interpret CE activity data. We will also highlight key areas that can cause problems when drawing conclusions at the population level.

**Inappropriately Graphing or Charting Your Data.** For most people, numbers alone do not “pop” or reveal trends in a way that provokes thought about an activity’s successes and failures. Presenting data as a graph or chart condenses a large amount of data and lets the viewer consider all of it in a simple way. When done well, a chart or graph may save pages of report space; however, it is important to keep it simple and clear. Figure 1 depicts two different graphs of the same data, one of which clearly misrepresents the findings. When graphs lack appropriate labels and context, it is easy to read more into the data than is actually there.

Key features to note when making a graph:

- Describe the chart’s data using consistent units.
- Clearly label each axis, including the value at which it starts, to avoid an unclear baseline.
- Label each bar or point so that it is clear what it represents.
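The “unclear baseline” problem is easy to quantify. The sketch below, using hypothetical pre- and post-test scores and illustrative function names (not part of any charting library), compares the true ratio of two values with the ratio of the bar heights a reader sees when the value axis starts at 65 instead of zero.

```python
def apparent_ratio(pre, post, axis_min=0.0):
    """Ratio of the drawn bar heights when the value axis starts at axis_min."""
    if axis_min >= min(pre, post):
        raise ValueError("axis_min must be below both values")
    return (post - axis_min) / (pre - axis_min)


def true_ratio(pre, post):
    """Ratio of the actual values (what an axis starting at zero shows)."""
    return post / pre


# Hypothetical pre-/post-test mean scores (percent correct).
pre_score, post_score = 70.0, 75.0

print(f"true ratio:          {true_ratio(pre_score, post_score):.2f}")
print(f"axis starting at 65: {apparent_ratio(pre_score, post_score, 65.0):.2f}")
```

With the axis starting at 65, a 5-point gain is drawn as a bar twice as tall as the pre-test bar, even though the post-score is only about 7 percent higher than the pre-score.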

**Errors from Using Percentage Values** Converting raw values into percentages is often a good way to present data, particularly to show changes. However, the same technique can also promote a serious error in interpreting the data: inconsistent use of terms. Although simple, this error still trips up even savvy data handlers. Two terms that are often confused are percentage point change and percentage change. Percentage points treat the percentage itself as the unit, so a score that rises from 40 percent to 41 percent has risen by 1 percentage point (but by 2.5 percent relative to its starting value). The percentage point change is the post-score minus the pre-score (or the reverse if the post-test score is lower). On the other hand, if you are describing a change as a fraction or percent of the earlier value, you are looking at a percentage change. Percentage increase is the post-score minus the pre-score (reversed for a percentage decrease), divided by the pre-score and multiplied by 100.
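Here is a minimal sketch of both calculations in Python, using a hypothetical group that scores 40 percent correct before an activity and 60 percent after. The function names are illustrative.

```python
def percentage_point_change(pre_pct, post_pct):
    """Post minus pre; the result is in percentage points."""
    return post_pct - pre_pct


def percentage_change(pre_pct, post_pct):
    """(post - pre) / pre * 100; negative values are percentage decreases."""
    if pre_pct == 0:
        raise ValueError("percentage change is undefined when the pre-score is 0")
    return (post_pct - pre_pct) / pre_pct * 100


# Hypothetical example: 40 percent correct before, 60 percent after.
pre, post = 40.0, 60.0
print(percentage_point_change(pre, post))  # 20.0 (percentage points)
print(percentage_change(pre, post))        # 50.0 (percent increase)
```

The same result is a 20-percentage-point gain and a 50 percent increase, so a report should state which of the two it means.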

**Overreliance on P Values as a Gold Standard** One of the biggest assumptions made in statistics is that reporting a P value for a comparison gives the data a “seal of approval.” This subject was important enough to be discussed in a 2014 news article in the prestigious science journal Nature.^{1} Although the P value is now treated as an absolute measure of the strength of evidence, the UK statistician Ronald Fisher, who introduced it in the 1920s, never meant it to be a definitive test. The original idea was to use it only to gauge how likely the observed results (or more extreme ones) would be if the “null hypothesis” were true, and therefore whether the data gave reason to reject that hypothesis. The conclusions a P value supports are therefore limited, and we may be making a mistake when we place too much weight on the phrase “statistically significant.” To emphasize this point, consider the pre-/post-test results testing knowledge before and after a CE activity in the data scenario on the following page.
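The sketch below illustrates one reason a P value alone can mislead: with a large enough sample, even a trivially small change becomes “statistically significant.” It assumes hypothetical pre/post score changes and uses a large-sample z-test (a normal approximation) rather than the paired t-test a real analysis would more likely use.

```python
import math


def two_sided_p_value(mean_diff, sd_diff, n):
    """Large-sample z-test of whether the mean pre/post difference is zero."""
    standard_error = sd_diff / math.sqrt(n)
    z = mean_diff / standard_error
    # Two-sided tail area of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical: learners improve by 1 point on average (SD of the change = 10),
# a standardized effect of only 0.1 no matter how many learners are tested.
for n in (25, 400, 10_000):
    p = two_sided_p_value(1.0, 10.0, n)
    print(f"n = {n:>6}: P = {p:.3g}")
```

The average change and its standardized size are identical in all three runs; only the sample size differs, yet P falls from well above 0.05 to far below it. Reporting the size of the change alongside P keeps that distinction visible.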

**Believing Non-significance Equals No Effect** Conversely, just because your results are non-significant does not mean there is no effect. There are several reasons you might obtain non-significant results. The sample size may have been too small (see the previous statistics series article on sampling), or the data may be too variable. The effect may also be small, but small does not mean unimportant: small changes can have value. It is important to evaluate the results in the context of your population of learners.

When interpreting studies with non-significant results, we can look at the power of the study and at the confidence intervals.^{2} Power analyses can calculate both the minimum sample size needed to detect an effect of a given size and the minimum effect size likely to be detected with a fixed sample size. Using a power analysis, we can then judge whether a non-significant result may simply reflect an underpowered study. In other words, when a study with low power produces a non-significant result, we cannot accurately state that the null hypothesis is true. However, when a high-powered study produces a non-significant result, we can feel more comfortable suggesting that the null hypothesis is true.
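A power analysis can be sketched with Python’s standard library. The version below is an approximation built on stated assumptions: it uses the normal distribution rather than the t distribution, a two-sided test at α = 0.05, and a standardized effect size (mean pre/post change divided by the standard deviation of the change). Dedicated tools based on the t distribution will give slightly larger sample sizes, especially for small studies. The function names are illustrative.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def required_n(effect_size, alpha=0.05, power=0.80):
    """Approximate n needed to detect a standardized effect (two-sided test)."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = _STD_NORMAL.inv_cdf(power)
    return math.ceil(((z_alpha + z_beta) / effect_size) ** 2)


def achieved_power(effect_size, n, alpha=0.05):
    """Approximate power for a given n (ignores the negligible opposite tail)."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _STD_NORMAL.cdf(effect_size * math.sqrt(n) - z_alpha)


def min_detectable_effect(n, alpha=0.05, power=0.80):
    """Smallest standardized effect detectable with the given n and power."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = _STD_NORMAL.inv_cdf(power)
    return (z_alpha + z_beta) / math.sqrt(n)
```

For example, `required_n(0.5)` returns 32 learners and `required_n(0.2)` returns 197, while `achieved_power(0.2, 30)` is below 0.2: a 30-learner study would usually miss a small effect, so a non-significant result from it says little about whether the null hypothesis is true.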

Confidence intervals can also be used to interpret non-significant results.^{2} As discussed previously in this series, a confidence interval gives a range of plausible values for the true population value; a 95 percent interval is constructed by a method that captures the true value in about 95 percent of repeated samples. For a non-significant result, the confidence interval for the effect will include zero, meaning the data are compatible with no effect. However, the range, or width, of the interval gives more information. If the interval is narrow, any true effect is likely to be small, which is consistent with the null hypothesis being true and a lack of meaningful effect. If the interval is wide, the data cannot rule out a sizable true effect, so the result is inconclusive rather than evidence of no effect.
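The contrast between narrow and wide intervals can be illustrated with two hypothetical non-significant results. The sketch below computes a normal-approximation 95 percent confidence interval for a mean pre/post change; for a sample as small as 20, a t-based interval would be somewhat wider still. The function name and all numbers are illustrative.

```python
import math
from statistics import NormalDist


def mean_diff_ci(mean_diff, sd_diff, n, level=0.95):
    """Normal-approximation confidence interval for a mean pre/post change."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sd_diff / math.sqrt(n)
    return mean_diff - half_width, mean_diff + half_width


# Two hypothetical non-significant results: both intervals include zero.
narrow = mean_diff_ci(0.3, 5.0, 400)  # large, consistent sample
wide = mean_diff_ci(3.0, 15.0, 20)    # small, variable sample

for label, (low, high) in (("narrow", narrow), ("wide", wide)):
    print(f"{label}: ({low:.2f}, {high:.2f}), includes zero: {low <= 0 <= high}")
```

Both intervals include zero, so neither result is significant. The narrow interval, roughly -0.19 to 0.79 points, rules out any large change; the wide one, roughly -3.6 to 9.6 points, is compatible with anything from a small loss to a sizable gain.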

**Correlation and Causation** “Correlation does not mean causation” is a common phrase in statistics. Correlation refers to a statistical relationship between two variables. Causation refers to instances where a change in one variable produces a change in another. Causation and correlation are examined every day in epidemiology, for example, in cases of food-borne illness. Epidemiologists investigate the exposures shared by people who became sick and attempt to determine the cause of the illness. They might find multiple variables in common in a group of people, but typically only one (e.g., a particular restaurant or food item) will be the cause.

Many people confuse this issue and are quick to assume that if something precedes an event, it caused it. Most superstitions are based on this assumption: how many black cats have been blamed for a string of bad luck? Another example of this fallacy occurs often in the world of dieting and fitness. A new diet requires taking a particular supplement, eating fewer calories and increasing physical activity. Person A follows this diet plan, loses weight and assumes the weight loss must be due to the supplement. Without an appropriately controlled study comparing Person A with people who simply dieted and exercised without taking the supplement, it is incorrect to assume causation. For us as CE providers, because of the variability of our learners and the difficulty of creating suitable control groups, we should be especially careful about making statements that imply causality.
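A small simulation shows how two variables can be strongly correlated even though neither causes the other. Everything here is hypothetical: a shared “confounder” (imagine overall learner motivation) drives both X and Y, and X never influences Y. Because we built the data, we know exactly how the confounder enters each variable and can subtract it; with real data, that relationship would have to be estimated, for example with regression.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(42)  # fixed seed so the simulation is repeatable
n = 1000

# Hypothetical confounder; X and Y each equal it plus independent noise,
# and neither X nor Y has any effect on the other.
confounder = [rng.gauss(0, 1) for _ in range(n)]
x = [c + rng.gauss(0, 0.3) for c in confounder]
y = [c + rng.gauss(0, 0.3) for c in confounder]

print(f"r(X, Y) = {pearson_r(x, y):.2f}")  # strong correlation, no causation

# Subtract the (known) confounder and the association disappears.
x_resid = [xi - c for xi, c in zip(x, confounder)]
y_resid = [yi - c for yi, c in zip(y, confounder)]
print(f"r after removing confounder = {pearson_r(x_resid, y_resid):.2f}")
```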

**Extrapolation and Interpolation** Extrapolation is drawing conclusions beyond the range of the data in a study (Figure 2), and it is another common source of error in statistics. For example, an inference drawn from a small sample of learners who share a particular trait is applied to all learners, who may vary greatly in that trait. Extrapolation errors can also arise from biased sampling. Commercials every day claim that four out of five doctors recommend Product X. Extrapolating this result to the whole population would mean that 80 percent of doctors recommend Product X, so it must be really good. However, what if only five doctors were sampled to generate this statistic, or what if the five doctors polled were involved in creating the product? That would be a biased sample. The problem becomes apparent when more doctors are polled: suppose that in a larger poll, only 15 out of 100 doctors (15 percent) recommend the product. Doesn’t sound as amazing, does it?
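One way to see how little five responses can tell us is to put a confidence interval around each poll result. The sketch below uses the Wilson score interval, one standard choice for a proportion, on the hypothetical polls from the text. Note what it cannot do: an interval reflects only random sampling error, so it cannot correct a biased sample such as doctors who helped create the product.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 gives about 95 percent)."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# The hypothetical polls from the text: 4 of 5 doctors vs 15 of 100 doctors.
for yes, total in ((4, 5), (15, 100)):
    low, high = wilson_interval(yes, total)
    print(f"{yes}/{total}: {low:.0%} to {high:.0%}")
```

For 4 of 5 doctors the approximate 95 percent interval runs from about 38 to 96 percent, while 15 of 100 narrows it to about 9 to 23 percent.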

Interpolation is a method for estimating a new data point within a set of known data points (Figure 2). Like extrapolation, it estimates a hypothetical value, but unlike extrapolation, we can feel safer in the estimate because we are staying within the experimental range. One of the simplest methods is linear interpolation: the unknown value is calculated from the two closest known values on either side of it by drawing a straight line between them and reading off the point on that line. In Figure 2, the green box represents an unknown value calculated using linear interpolation based on the known values. In addition to Excel and statistics packages, numerous online calculators can perform linear interpolation.
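Linear interpolation between neighboring points (x0, y0) and (x1, y1) uses y = y0 + (x - x0)(y1 - y0)/(x1 - x0). The sketch below implements that formula in plain Python and deliberately refuses to estimate outside the observed range, since that would be extrapolation. The function name, follow-up times and scores are hypothetical.

```python
from bisect import bisect_left


def linear_interpolate(xs, ys, x):
    """Estimate y at x from known points; xs must be strictly increasing."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired (x, y) points")
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x lies outside the known range; that would be extrapolation")
    i = bisect_left(xs, x)  # first index with xs[i] >= x
    if xs[i] == x:
        return float(ys[i])  # x is already a known point
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)


# Hypothetical mean knowledge scores at weeks after a CE activity.
weeks = [0, 4, 12]
scores = [62.0, 78.0, 70.0]

print(linear_interpolate(weeks, scores, 8))  # estimate between week 4 and week 12
```

With these points, `linear_interpolate(weeks, scores, 8)` returns 74.0, halfway between the week 4 and week 12 scores, while asking for week 20 raises an error instead of silently extrapolating.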

**Forecast of Next Article** In the final article of this series, Gary Bird, PhD, and Pesha Rubinstein, MPH, CHCP, will provide key takeaways from these articles and list resources for further exploration for the CE professional who is new to the study of statistics.

**References and Further Reading**

1. Nuzzo, R. 2014. Scientific method: Statistical errors. Nature 506: 150-3.
2. Colegrave, N. and Ruxton, G. D. 2002. Confidence intervals are a more useful complement to nonsignificant tests than are power calculations. Behavioral Ecology 14(3): 446-7.