
Much of psychological research reports population trends, often expressed as correlations. A simple tool for researchers, students, and the public, TACT, shows how to (not) use correlations to say something about individuals.

Correlations are often used to make statements about individuals.

TACT helps to intuitively assess the accuracy of these statements.

The accuracy can be expressed as a percentage, compared to a random guess.

Most correlations don’t allow for meaningful statements about individuals.

Phrases like “someone high in x is likely to be high in y” are usually incorrect.

Many psychological research findings represent statistical trends in the population, showing how two variables tend to vary together among people. The strengths of these trends are often expressed using correlation coefficients, the absolute value of which can range from 0 (no relation at all) to 1 (one variable is perfectly predictable from the other). Insofar as we assume psychology to be about individuals rather than populations, we expect these trends to tell us something meaningful about individual people. For example, from a correlation between income and happiness, we may conclude that an individual with high earnings (say, Kati) is probably happy, while someone with an average income (say, Mati) probably has about average happiness. This is how research findings are often interpreted in the (social) media and everyday conversations by researchers and the public alike. In clinical assessment, educational, or hiring settings, correlations can inform real-life decisions about individuals. Even psychologists themselves often admit to choosing their field—which advances by documenting correlations—to understand themselves better.

But how much can we trust statistical trends in the population to tell us something meaningful about actual individuals like Kati and Mati? Most people know that applying population trends to individuals entails uncertainty, so any conclusion is only correct to some degree. Here, I describe an intuitive way to think about and communicate this degree, based on grouping individuals into "low", "medium", and "high" groups in both variables and calculating the probabilities that similar values match. In contrast to clinical diagnostics with its tools for expressing binary outcomes’ probabilities, such simple trisecting and cross-tabulating (TACT) is particularly useful for variables that vary on arbitrarily defined continuous scales, as is very common in psychology. For example, given a .25 correlation between the personality trait of conscientiousness and supervisor-rated job performance, how likely is it that a highly conscientious individual performs highly at their job while a person with medium conscientiousness is a medium performer?

TACT makes research findings interpretable without specialist knowledge (e.g., what correlations are typical in the field?) or abstract statistical concepts (e.g., standard deviation). Characterising people as having low, medium, or high values makes continuous variables easily interpretable and aligns with how many people intuitively think of them. For example, like some of my colleagues, I am neither low nor high in talkativeness but somewhere in the middle, whereas some of our colleagues are distinctly more and others less talkative than us, the medium-talkativeness people. The distinction between medium and more extreme values is also important because the former are often less informative, as I demonstrate below.

I show how a range of commonly observed correlations can be interpreted using TACT, hoping that this helps researchers, students, and laypeople better grasp the implications of these and many other common research findings across psychology. Experimenting with TACT has reshaped my own intuition about the meaning of correlation coefficients and made me more careful about drawing conclusions from common research findings. I only wish this experimenting had happened as a part of my training, and I hope that this tutorial is useful to others, regardless of their career stage, as well as the public.^{1}

TACT means little more than paying attention to scatterplots that every psychologist is already trained to look at and that are familiar to most people with at least a high school education. A scatterplot is a cloud of points showing the relationship between two variables, with each point representing one individual's position in both variables (

With no relationship between the two variables, all nine grid slots contain the same number of individuals, at least with a sufficiently large sample (

To show how many individuals match or do not match in their levels of the two variables, we can calculate their proportions so that they add up to 100% in each grid column. This means estimating how many individuals with low x values are expected to have low, medium, or high values on y, and the same for the medium and high values of x. For example, we can then say: “

Almost all psychological variables have many causes and subcomponents, so they have bell-shaped distributions in the population: most people are around the average, whereas more extreme values are increasingly less common. The middle grid slots are narrower both vertically and horizontally for such variables, as shown in

In the following two sections, I apply TACT to several well-established correlations and provide rules of thumb for interpreting typical correlations. TACT probabilities for all other correlations can be seen in

In my calculations, I assume normal (bell-like) distributions for both variables (x and y), simulate a range of correlations between them among 10^{8} individuals, apply the 3-by-3 grid on the resulting scatterplots, and calculate the three proportions of y values for each of the three levels of x. For this, I apply a companion R package TACT (see

> TACT(r = .25, distribution = "normal")
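For readers without R, the same normal-theory probabilities can also be computed without simulation. The sketch below is not the TACT package: it is a hypothetical stdlib-only Python function, `tact_normal`, that swaps the simulation of 10^{8} individuals for numerical integration of the standard bivariate normal distribution, using the fact that y given x is normal with mean r·x and variance 1 − r². It assumes |r| < 1; the cut-offs are cumulative proportions, so `(1/3, 2/3)` gives equal thirds. Rows of the returned table are the x groups (low, medium, high) and each row gives the percentages of that group falling into the low, medium and high y groups, so each row sums to 100.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution: mean 0, SD 1


def _bands(cuts):
    """Turn cumulative cut-offs, e.g. (1/3, 2/3), into z-score intervals."""
    edges = [-math.inf] + [STD.inv_cdf(p) for p in cuts] + [math.inf]
    return list(zip(edges[:-1], edges[1:]))


def _joint(r, x_band, y_band, steps=2000, clip=8.0):
    """P(X in x_band and Y in y_band) for a standard bivariate normal with correlation r."""
    sd = math.sqrt(1.0 - r * r)  # SD of Y given X = x (requires |r| < 1)
    lo, hi = max(x_band[0], -clip), min(x_band[1], clip)  # tails beyond 8 SD are negligible
    step = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * step  # midpoint rule over x
        # P(y_lo < Y < y_hi | X = x), since Y | X = x ~ N(r * x, 1 - r^2)
        p_y = STD.cdf((y_band[1] - r * x) / sd) - STD.cdf((y_band[0] - r * x) / sd)
        total += STD.pdf(x) * p_y
    return total * step


def tact_normal(r, cuts_x=(1/3, 2/3), cuts_y=(1/3, 2/3)):
    """Percent of each x group (rows) that falls into each y group (columns)."""
    masses = [b - a for a, b in zip((0.0, *cuts_x), (*cuts_x, 1.0))]  # share of people per x group
    return [[100 * _joint(r, xb, yb) / m for yb in _bands(cuts_y)]
            for xb, m in zip(_bands(cuts_x), masses)]


if __name__ == "__main__":
    for label, row in zip(("low", "medium", "high"), tact_normal(0.25)):
        print(label, [round(p, 1) for p in row])
```

With r = .25, the matching extreme cells should come out close to the 43.6% reported for the conscientiousness and job performance example below. Integration gives essentially exact values for the normal case, whereas simulation generalises more easily to other distributions, which is presumably why the package simulates.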

The package can also TACT empirical correlations in raw data by being supplied with two variables rather than a correlation coefficient.
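TACTing raw data amounts to ranking each variable, splitting it into equal thirds and cross-tabulating the two sets of labels. The hypothetical Python function `tact_empirical` below sketches that idea under simple assumptions (complete pairs, ties broken by order of appearance); it is not the R package's implementation. As a check, it is applied to simulated normal data with a .25 correlation, where the matching extreme cells should land near the 43.6% figure reported for that correlation.

```python
import math
import random


def trisect(values):
    """Label each value 0 (low), 1 (medium) or 2 (high) by the rank-third it falls into."""
    n = len(values)
    order = sorted(range(n), key=values.__getitem__)  # stable sort: ties keep input order
    labels = [0] * n
    for rank, idx in enumerate(order):
        labels[idx] = 3 * rank // n
    return labels


def tact_empirical(x, y):
    """Row i: percent of the i-th x third that falls into the low/medium/high y third."""
    counts = [[0, 0, 0] for _ in range(3)]
    for gx, gy in zip(trisect(x), trisect(y)):
        counts[gx][gy] += 1
    return [[100 * c / sum(row) for c in row] for row in counts]


def simulate(r, n, seed=1):
    """Draw n pairs from a standard bivariate normal distribution with correlation r."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [r * xi + math.sqrt(1 - r * r) * rng.gauss(0, 1) for xi in x]
    return x, y


if __name__ == "__main__":
    table = tact_empirical(*simulate(0.25, 90_000))
    for label, row in zip(("low", "medium", "high"), table):
        print(label, [round(p, 1) for p in row])
```

With 90,000 simulated people the percentages carry sampling noise of a few tenths of a percentage point, which is why the article's own calculations use a far larger simulated population.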

Here, I only consider positive correlations, although the logic is identical for negative ones if we swap the low and high labels for either variable.

In a subsequent section, I discuss the implications of and alternatives to my choices (e.g., different cut-offs for low and high values and alternative variables’ distributions).

Given how conscientiousness correlates with multiple physical health markers such as body mass (

People differ widely in their general healthiness, but for simplicity, we can categorise them into three equally sized health status groups: low, medium, and high. If we know nothing about a person, we can only guess, and the probability of their being in each of the three groups is 33.3%. This is like throwing a fair three-sided die to guess how healthy that person is. But knowing whether that person belongs to the bottom, middle, or top third of the population in conscientiousness increases our accuracy in predicting their health status to 34.7%: it is as if the die were now slightly loaded. On closer inspection, however, how much we can learn about the person’s health depends on their particular level of conscientiousness. If that person has high conscientiousness, the probability that they also have high health status is 35.3%, and the same probability applies to low conscientiousness and low health (

Suppose a layperson completes a personality questionnaire and receives feedback that they have a medium conscientiousness level. This information cannot make them much wiser regarding their health, being almost equally likely to have any health status. But having received feedback that their conscientiousness is low, they are about two percentage points likelier to have a low rather than medium health status and about four percentage points likelier to have a low rather than high health status. This way of presenting the information can increase the chances that individuals meaningfully apprehend the implications of

Put differently, although the best guess is that someone's health status is similar to their conscientiousness, it still remains incorrect nearly two times out of three, which is quite similar to guessing randomly. One may ask, then, what do researchers mean when they say that conscientiousness is an important predictor of health (e.g.,

Hypothetically, suppose that the conscientiousness-health link is directly causal, and public health officials deploy a cheap yet at least modestly effective intervention to a million people with currently low conscientiousness (or to three million randomly selected people, of whom a million have a low level of the trait), raising the trait among 10% of them (100,000 people) to a level that is currently considered medium. Among these 100,000 people, 35.3% currently have poor health; after the intervention, this could decrease to 33.3%. This means that nearly 2,000 currently low-health people could end up with health that is presently considered medium. Moreover, almost 2,000 people will move from what is currently a medium health status to a level that is presently regarded as high. Although the lows, mediums, and highs will need to be recalculated after the intervention and, in comparison to others, most low-health people will remain low-health people, the absolute increase in their underlying health status may be substantial enough to mean that fewer people get sick and die. Although the intervention does not help the vast majority of people, it will help some, and public health officials can calculate whether the reduction in treatment costs and the increased productivity are sufficient to cover the intervention’s costs.
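Taking the scenario's numbers at face value (the 10% uptake and the 35.3% and 33.3% probabilities are the assumptions stated above, not data), the first headcount is simple arithmetic:

$$
\underbrace{10^{6} \times 0.10}_{\text{people moved to medium}} \times (0.353 - 0.333) = 10^{5} \times 0.020 \approx 2{,}000 \ \text{people}
$$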

Both the public and scientists know that parents and children tend to be similar in their traits, primarily due to their partly shared genes. Empirically, parent-child correlations in personality traits tend to be around .15 (

Given the .15 parent-child correlation, the overall probability that a parent in the bottom, middle or top third of the population in a personality trait has a child in the same third is 37.5%; for comparison, any two randomly compared people have a 33.3% probability of being in the same third. More specifically, a parent with a high or low value on the trait can expect their child to match their trait level with 39.4% probability, whereas the child has a medium or even the opposite trait level with 33.2% and 27.5% probabilities, respectively. But a child of a parent with a medium trait level is almost equally likely to have a low (33.2%), medium (33.7%), or high (33.2%) trait level (

Parent’s trait | Child’s trait: low (%) | Child’s trait: medium (%) | Child’s trait: high (%)
---|---|---|---
Low | 39.39 | 33.16 | 27.45
Medium | 33.16 | 33.68 | 33.16
High | 27.45 | 33.16 | 39.39
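The overall 37.5% figure is simply the average of the three matching (diagonal) cells, which is valid because each parent third contains the same share of the population:

$$
P(\text{same third}) = \tfrac{1}{3}\,(39.39\% + 33.68\% + 39.39\%) \approx 37.49\%
$$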

This means that over three in five sons (or daughters) are

The personality trait of conscientiousness is known to be among the predictors of various indicators of job performance and is often used for selecting suitable job applicants. Across many studies, the conscientiousness-job performance correlation is around .25 (e.g.,

Knowing whether a job applicant is in the bottom, middle, or top third of conscientiousness among other applicants, we can predict their job performance level (low, medium, or high) with 40.5% accuracy; for comparison, by just throwing a three-sided die, we could achieve 33.3% accuracy. More specifically, there is a 43.6% probability that a person in the top third in conscientiousness will also be in the top third in job performance, against a 32.9% probability of being a medium performer and a 23.6% probability of having a low-third performance level (

Job performance | Conscientiousness: low (%) | Conscientiousness: medium (%) | Conscientiousness: high (%)
---|---|---|---
Low | 43.58 | 32.85 | 23.57
Medium | 32.85 | 34.30 | 32.85
High | 23.57 | 32.85 | 43.58

In other words, by picking an applicant from the highest third in conscientiousness rather than randomly, we decrease the chances of

This information could be helpful for employers who want to quickly filter out a majority of applicants from a larger applicant pool while enriching the remaining pool with high performers (43.6%) and ensuring that among those removed from that pool, the proportion of high performers is lower (28.2%, the average of 32.9% and 23.6%). However, these employers have to be careful with providing feedback to the applicants. For example, a statement like "
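Because the bottom and middle conscientiousness thirds are equally large, the share of high performers among the removed applicants is the plain average of the two corresponding cells of the job performance table:

$$
\tfrac{1}{2}\,(32.85\% + 23.57\%) = 28.21\% \approx 28.2\%
$$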

When people (targets) complete well-established personality questionnaires about themselves and have others who know them well (informants) also complete that test about them, the scores often correlate around .50 (

If people were randomly filling out the questionnaire, 33.3% of the targets scoring low, medium or high in a self-reported personality trait would score similarly in the informant-reports. However, assuming the .50 correlation, 49.2% of targets are expected to have a similar trait level in both self- and informant-reports. More specifically, of those scoring high or low in self-reports, 54.9% would score similarly in informant-reports, whereas 31.1% would have a medium and 14% an opposite score (

Self-reports | Informant-reports: low (%) | Informant-reports: medium (%) | Informant-reports: high (%)
---|---|---|---
Low | 54.86 | 31.12 | 14.02
Medium | 31.12 | 37.76 | 31.12
High | 14.02 | 31.12 | 54.86

Suppose we conducted a study where people received feedback on their self-reported personality traits (low, medium, or high compared to other self-reports) and that each person was also rated by an informant who independently received similar feedback about the target (low, medium, or high compared to other informant-reports). If the participants and their informants could compare their feedback, about half of the target-informant pairs would find that they similarly described the target in any given trait.

Suppose a person completes a personality trait questionnaire and receives feedback that they are in the medium third in neuroticism compared to other people; they are unsure about this feedback and want to complete another questionnaire to get a “second opinion”. How likely are they to get a different result?

The correlations among different neuroticism scales vary but average somewhere around .70 (

Score in the first test | Second test: low (%) | Second test: medium (%) | Second test: high (%)
---|---|---|---
Low | 65.56 | 27.89 | 6.55
Medium | 27.89 | 44.21 | 27.90
High | 6.55 | 27.90 | 65.55

TACT offers a way to think about and explain what measurement (im)precision means for the measurements of individual people. As shown, about five in ten people will be in the same third in a personality trait according to two different sources of information—their own ratings and ratings by someone who knows them well—and nearly six in ten will be in the same third when completing two different tests of the same trait. To complement that, over seven out of ten people will be in the same third when completing the same test twice.

Specifically, scores of well-established psychometric tests taken twice over two weeks typically correlate close to .90 (e.g.,

Score in the first testing | Two weeks later: low (%) | Two weeks later: medium (%) | Two weeks later: high (%)
---|---|---|---
Low | 80.44 | 19.00 | 0.56
Medium | 19.00 | 61.98 | 19.02
High | 0.56 | 19.02 | 80.42

For psychologists, correlations between the scores from two testing occasions show how reliable the scores are, often abstractly defined as the proportion of "true score" variance in them (

When psychologists discuss personality with a lay audience, one question that often comes up is about personality stability:

A correlation of .80, for example, would mean that an individual with a low, medium or high level of the trait on the first testing occasion can expect the same result in a few years with a 64.9% probability. On closer inspection, someone with a high or low trait score has a 72.1% probability of scoring similarly, a 24.8% probability of having a medium score and only a 3.1% probability of having an opposite trait score after a few years (

Score in the first testing | A few years later: low (%) | A few years later: medium (%) | A few years later: high (%)
---|---|---|---
Low | 72.08 | 24.79 | 3.13
Medium | 24.79 | 50.43 | 24.78
High | 3.13 | 24.78 | 72.09

So, about two-thirds of people will retain their relative trait level in the population within a few years, while every third person changes, most often by moving to the adjacent trait level. Even with such strong population-level stability, then, personality trait change is still very common among individuals, especially among those with medium trait levels.

Variables’ medium values carry less predictive information about other variables than more extreme values—at least when the baseline probabilities of low, medium and high values are equal (see below for different examples)—because they have two equally probable adjacent levels on the other variable that they can deviate to. However, more extreme values have only one adjacent value to deviate to, whereas deviating to the other extreme is less likely.

In psychological research, correlations between .15 and .25 represent the most common statistical associations between variables (

around 40% of those with a low or high value on one variable are likely to have a similar value on the other variable, whereas 60% are likely to have different values on the two variables;

compared against the random-guess baselines of 33% and 67%, respectively, typical correlations mean a seven-percentage-point improvement in the accuracy of predicting low or high values in one variable from the other variable;

medium values on one variable carry virtually no information about the other variable.
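The rules of thumb above can be checked numerically. The sketch below is a hypothetical stdlib-only Python helper, not part of the TACT package; it assumes bivariate normality and equal thirds, integrates the upper-orthant probability P(X > a, Y > b) numerically, and obtains the medium-medium cell by inclusion-exclusion. For each r it prints the chance that an extreme value is matched, its gain over the 33.3% random guess, and the chance that a medium value is matched.

```python
import math
from statistics import NormalDist

STD = NormalDist()
C1, C2 = STD.inv_cdf(1 / 3), STD.inv_cdf(2 / 3)  # tertile cut-offs on the z scale


def upper(r, a, b, steps=2000, clip=8.0):
    """P(X > a and Y > b) for a standard bivariate normal with correlation r (|r| < 1)."""
    sd = math.sqrt(1.0 - r * r)  # SD of Y given X = x
    step = (clip - a) / steps
    return step * sum(
        STD.pdf(x) * (1.0 - STD.cdf((b - r * x) / sd))
        for x in (a + (i + 0.5) * step for i in range(steps))  # midpoint rule
    )


def hit_rates(r):
    """Percent matching among high x values and among medium x values, equal thirds."""
    high = 3 * upper(r, C2, C2)  # divide by P(X in top third) = 1/3
    # inclusion-exclusion gives P(C1 < X < C2 and C1 < Y < C2)
    middle = upper(r, C1, C1) - upper(r, C1, C2) - upper(r, C2, C1) + upper(r, C2, C2)
    return 100 * high, 100 * 3 * middle


if __name__ == "__main__":
    for r in (0.15, 0.20, 0.25):
        high, medium = hit_rates(r)
        print(f"r = {r:.2f}: extreme match {high:.1f}% "
              f"({high - 100 / 3:+.1f} points vs chance), medium match {medium:.1f}%")
```

By symmetry, the low-low probability equals the high-high one, so only the top corner needs to be computed.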

Innocent-looking statements like "

it takes a correlation over .40 for it to be even marginally correct, being valid for just over half of the people;

it takes a correlation over .80 for it to be accurate to a compelling extent, applying to at least two in three people.
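The two thresholds can be located by searching over r. The sketch below is a hypothetical stdlib-only Python function (not part of the TACT package) that bisects on r until the probability that a top-third x value is matched by a top-third y value reaches a target share, assuming bivariate normality and equal thirds. The bisection relies on that probability increasing with r.

```python
import math
from statistics import NormalDist

STD = NormalDist()
C2 = STD.inv_cdf(2 / 3)  # cut-off for the top third on the z scale


def p_high_given_high(r, steps=2000, clip=8.0):
    """P(Y in top third | X in top third) for a standard bivariate normal with correlation r."""
    sd = math.sqrt(1.0 - r * r)  # SD of Y given X = x
    step = (clip - C2) / steps
    joint = step * sum(
        STD.pdf(x) * (1.0 - STD.cdf((C2 - r * x) / sd))
        for x in (C2 + (i + 0.5) * step for i in range(steps))  # midpoint rule
    )
    return 3 * joint  # divide by P(X in top third) = 1/3


def min_r_for(target, lo=0.0, hi=0.99, tol=1e-4):
    """Smallest r (within tol) at which 'high in x, so high in y' holds for `target` of people."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if p_high_given_high(mid) < target:
            lo = mid  # statement still holds for too few people
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    print(f"more than half of people: r > {min_r_for(0.5):.2f}")
    print(f"two in three people:      r > {min_r_for(2 / 3):.2f}")
```

Under these assumptions, the first threshold should come out close to .40 and the second between .70 and .80, in line with the rules above.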

Such statements would typically be misleading at best or incorrect at worst because correlations that high are rare (

So, we should avoid categorical conclusions about any real person based on typical statistical trends among people. Instead, one solution is to confine our conclusions to vague population-level statements such as "

News articles often present research findings as having implications for the reader: unfortunately, this almost always means misleading the reader because the vast majority of correlations are nowhere near strong enough for that.

Primarily, I present TACT as a simple and general-purpose tool for understanding and communicating correlations’ magnitudes. Admittedly, it involves arbitrary assumptions and simplifications, some of which I address in this section. Yet it can be adapted to more specialised needs to meet different assumptions, also described here. I also discuss some of TACT’s overlaps with the rich but complex toolbox of clinical diagnostics. Readers interested in TACT only as a simple and general-purpose tool may find this section too technical and may wish to skip all or part of it.

For example, why not use two categories instead of three, thus

> TACT(r = .25, distribution = "normal", cutoffsx = c(.5,.5), cutoffsy = c(.5,.5))

Given the .25 correlation, those above the median in conscientiousness have a 58.1% probability of also being above the median in job performance, against the 50% random-guess rate.^{2}
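For median splits there is also a closed form: for a standard bivariate normal with correlation r, the probability that both variables exceed their medians is 1/4 + arcsin(r)/(2π), so the conditional probability is 1/2 + arcsin(r)/π. The hypothetical Python sketch below computes this quantity both by numerical integration and from the closed form, as a cross-check, under the normality assumption; for r = .25 both give about 58%, in line with the figure above.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def p_above_median_both(r, steps=2000, clip=8.0):
    """P(Y > median | X > median) for a standard bivariate normal, by numerical integration."""
    sd = math.sqrt(1.0 - r * r)  # SD of Y given X = x
    step = clip / steps
    joint = step * sum(
        STD.pdf(x) * (1.0 - STD.cdf(-r * x / sd))  # P(Y > 0 | X = x)
        for x in ((i + 0.5) * step for i in range(steps))
    )
    return 2 * joint  # divide by P(X > median) = 1/2


def p_above_median_closed_form(r):
    """Same quantity from the orthant formula P(X > 0, Y > 0) = 1/4 + asin(r) / (2 * pi)."""
    return 0.5 + math.asin(r) / math.pi


if __name__ == "__main__":
    print(f"{p_above_median_both(0.25):.4f} vs {p_above_median_closed_form(0.25):.4f}")
```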

Labelling people as low

Second, with continuous variables BESD masks the omnipresent phenomenon of regression to the mean: high (or low) values in one measurement are statistically expected to match relatively lower (or higher) values in another measurement, even when the measurements are correlated. Within high or low groups, people at the more extreme end in one variable are more likely to be closer to the middle of the distribution of the other variable. Moreover, the more extreme their value on one variable, the more they are expected to regress towards the mean on the other variable. For example, a highly conscientious person is likely to regress towards the average on any other measure correlated with conscientiousness, even if they still remain (just) above the average slightly more than half of the time. Because the high and low groups are so broad when using BESD for continuously distributed variables, this trend remains masked (e.g., someone can be at the 5th percentile on one variable but at the 49th percentile on the other, yet count as similarly low on both). In TACT, the groups are narrower, so regression to the mean is comparatively less likely to go unnoticed: people can actually regress from the high or low group to the medium group.

Third, BESD masks medium values’ tendency to be less informative about other variables than more extreme values. In fact, with typical correlations, medium values carry virtually no predictive information. Admittedly, I had never thought about it before experimenting with TACT. As the TACT examples showed, this can often have important implications for interpreting research findings at the level of single individuals (e.g., for feedback).

So, I consider TACT an improvement over BESD, because it is more consistent with how people think of continuous variables, better addresses regression to the mean and shows the different predictive values of the medium and more extreme scores. Of course, we could also categorise people into four or five groups, but this would inevitably make the interpretation of correlations more complex since there would be 16 or 25 slots on the scatterplot.

Choosing any cut-offs between low, medium and high values is arbitrary, but making the three groups equal in size is arguably the least arbitrary and most generally applicable and intuitive solution. It means that the random guess is equally accurate at all levels of the variables involved—that is, the die is not

However, in specific circumstances, other cut-offs may be more practical, for example categorising half of the individuals, or those within one standard deviation of the mean, as medium. The effects of any such cut-offs can be tested using the TACT R function by setting the cut-offs to, say, the 25th and 75th percentiles (half of the people now being medium) or the 16th and 84th percentiles (the band from one standard deviation below to one standard deviation above the mean now being medium, for normally distributed variables). As a general rule, the overall probability that low, medium and high values on both variables match remains similar for all cut-offs, but larger medium groups mean higher probabilities that the variables' medium values match, at the expense of lower probabilities that their low (or high) values match. Of course, when more people are set to have medium values, it is

In other words, the more precise a conclusion we want to draw about someone's level in one variable based on their score in another variable, the more likely we are to be incorrect because riskier predictions are always less probable, all else equal. (I already showed this in relation to bisecting variables.) Suppose we want to identify the top quarter of job performers by only keeping the applicants in the top quarter in conscientiousness instead of the top third. The top-quarter cut-off entails a lower accuracy in identifying a top performer (35.7% vs the 25% random-guess baseline) than the top-third cut-off (43.6% vs the 33.3% baseline), and the same is true for even higher cut-offs. That is, the increase in accuracy over the random-guess baseline remains comparable, but the absolute accuracy decreases because the random-guess baseline decreases.
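The point about comparable gains but lower absolute accuracy can be made explicit by subtracting each random-guess baseline from the corresponding hit rate:

$$
\begin{aligned}
\text{top third:}\quad & 43.6\% - 33.3\% = 10.3 \ \text{percentage points}\\
\text{top quarter:}\quad & 35.7\% - 25.0\% = 10.7 \ \text{percentage points}
\end{aligned}
$$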

To test the effect of assigning half of people to the medium category, this code can be used, producing

> TACT(r = .25, distribution = "normal", cutoffsx = c(.25,.75), cutoffsy = c(.25,.75))

Bisecting variables and varying the high-low cut-offs connects TACT with the literature on clinical testing and diagnostics (e.g.,

With a .25 correlation, an accurate prediction about an individual—that they are in the top quarter in performance because of their high conscientiousness—will always remain less likely than a wrong prediction, but it does increase somewhat with higher conscientiousness cut-offs. For example, with a 50% conscientiousness cut-off, the probability of being accurate is 31.4% (against the 25% random guess baseline), which increases to 35.7%, 40.4% and 43.5% with 75%, 90% and 95% cut-offs, respectively. However, the probability of correctly identifying those not in the top quarter of job performance, or NPV, ^{3}

Such scenarios, already discussed by

> TACT(.25, "normal", cutoffsx = c(.90,.90), cutoffsy = c(.75,.75))
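The precision (PPV) figures quoted above, and the corresponding NPVs, can be sketched as follows. This hypothetical Python helper (not the R package) assumes bivariate normality, treats being in the top quarter of job performance (above the 75th percentile of y) as the target, and selects applicants above a given conscientiousness percentile; it returns PPV = P(target | selected) and NPV = P(not target | not selected).

```python
import math
from statistics import NormalDist

STD = NormalDist()


def upper(r, a, b, steps=2000, clip=8.0):
    """P(X > a and Y > b) for a standard bivariate normal with correlation r (|r| < 1)."""
    sd = math.sqrt(1.0 - r * r)  # SD of Y given X = x
    step = (clip - a) / steps
    return step * sum(
        STD.pdf(x) * (1.0 - STD.cdf((b - r * x) / sd))
        for x in (a + (i + 0.5) * step for i in range(steps))  # midpoint rule
    )


def ppv_npv(r, cut_x, cut_y=0.75):
    """Select people above the cut_x quantile of x; target: above the cut_y quantile of y.

    PPV = P(y above cut_y | x above cut_x); NPV = P(y not above cut_y | x not above cut_x).
    """
    a, b = STD.inv_cdf(cut_x), STD.inv_cdf(cut_y)
    both_high = upper(r, a, b)
    both_low = 1 - (1 - cut_x) - (1 - cut_y) + both_high  # P(X <= a and Y <= b)
    return both_high / (1 - cut_x), both_low / cut_x


if __name__ == "__main__":
    for cut in (0.50, 0.75, 0.90, 0.95):
        ppv, npv = ppv_npv(0.25, cut)
        print(f"select above {cut:.0%} of conscientiousness: PPV {ppv:.1%}, NPV {npv:.1%}")
```

The PPVs should reproduce the 31.4%, 35.7%, 40.4% and 43.5% quoted for the 50%, 75%, 90% and 95% cut-offs, while the NPVs stay only modestly above the 75% base rate of not being a top-quarter performer.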

These scenarios can provide helpful solutions for specialised needs, but they also show the complexities of using correlations for diagnostic decisions about real individuals. Here, I emphasise again that the main idea of TACT is to introduce a simple general-purpose way of thinking about and communicating the meaning of correlations for individuals. For this, the default approach of trisecting variables probably works best.

As presented so far, TACT has been based on variables with bell-shaped population distributions. But the TACT R function can also be used to assess correlations between variables that have uniform (all values are equally likely) or skewed (values in one extreme are more likely than medium values and especially values in the other extreme) distributions.

Generally, TACT is relatively robust to how the variables are distributed. However, the accuracy of applying correlations to individuals (the probabilities of values in one variable matching similar values in another) tends to be somewhat smaller the more uniformly the variables are distributed. This can be explored with, for example:

> TACT(r = .25, distribution = "uniform")

Increasingly, researchers measure their participants at many time points, and the resulting time-series data allow for calculating correlations between variables’ values at different time points. These correlations describe how variables covary within, not between, individuals. They can be unique to each individual, although they are often aggregated into sample-level estimates that represent an average individual and thus become another population-level trend (“fixed effect”). TACT can also be applied to these correlations, with the only difference being that individuals are swapped for measurement occasions.

For example, using such a within-individual design

No variable is measured with perfect accuracy. Random measurement error biases correlations downwards, so our ability to say something about individuals based on correlations could be greater if we corrected the correlations for it. The best remedy, however, is to reduce measurement error in the first place.

However, in cases where one variable is clearly the predictor and the other is the outcome being predicted, it could make sense to adjust the correlation for measurement error in the outcome before TACTing it. This is because it is the hypothetical true value that is being predicted rather than its imperfect measurement. Also, when the association is interpreted in relation to variables’ hypothetical true values rather than their measured values, it may be useful to adjust the correlations for measurement error. For example, when we estimate the stability of individuals’ hypothetical true trait scores over time rather than the stability of the trait’s measurements, as discussed above, we may adjust the rank-order stability for measurement error before TACTing it.
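One standard way to make such an adjustment, assuming classical test theory, is Spearman's correction for attenuation, where r_{xx} and r_{yy} are the reliabilities of the predictor and the outcome. The reliability of .90 in the example below is a hypothetical value chosen for illustration, not an estimate from this article:

$$
r_{\text{adj}} = \frac{r_{xy}}{\sqrt{r_{yy}}}\ \ \text{(error in the outcome only)}, \qquad
r_{\text{adj}} = \frac{r_{xy}}{\sqrt{r_{xx}\, r_{yy}}}\ \ \text{(error in both)}
$$

$$
\text{e.g.}\quad \frac{.80}{\sqrt{.90 \times .90}} = \frac{.80}{.90} \approx .89
$$

That is, a rank-order stability of .80 between two occasions, each measured with a reliability of .90, would be TACTed as roughly .89.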

TACT is not suitable for interpreting linear correlation coefficients (incorrectly) calculated for relations that are actually non-linear. However, TACTing the scatterplot for these variables’ associations, when raw data are available, can be particularly useful for illustrating the non-linearity. This can be achieved with the TACT function of the TACT R package.

Typical correlations that emerge from psychological research can be useful for showing trends in the population. Provided that necessary conditions are met, these trends may inspire cost-effective population-level interventions that could benefit small but occasionally worthwhile proportions of these populations (

The editor-in-chief (who also handled this paper as the acting editor) encouraged the author to submit a paper based on their post at

The manuscript has been available as a pre-print at

I am grateful to Wendy Johnson, Michelle Luciano, Lisanne de Moor, Yavor Dragostinov, Sam Henry, Ross Stewart, Manuel Maldonado, Marco Del Giudice, Kadri Arumäe and Emma Mõttus for their comments on the drafts of this article.

I note that others have also thought about the meaning of statistical trends for individuals, such as in the context of clinical interventions (e.g.,

For continuous variables, BESD is sometimes incorrectly calculated, even in popular textbooks (e.g.,

The ratio of sensitivity to the complement of specificity (100% - specificity) is often called the

For this article, data is freely available (

The supplementary material contains TACT probabilities for a range of correlation magnitudes and statistical (R) software for calculating these probabilities in simulated and actual data (for access see

The author has no funding to report.

The author has declared that no competing interests exist.