Why Is It Important to Study Personality Judgment, and What Can We Learn?
The history of personality judgment research is nearly as long as the history of formal psychological research in general, and for good reason: personality judgments form quickly and naturally and have important consequences for both the person making the judgments and the person being judged. What we think about the personality of others plays an important role in how we behave toward others, including whether we attempt to communicate or establish a relationship, offer them a job, or vote for them in an election. As these are some of the most important domains in human existence (love and companionship, work and productivity), acquiring knowledge about these processes constitutes a potentially vital contribution to psychological science in both theory and practice.
As personality and social psychologists, we strive to understand how judgments of personality come about and the factors that influence these judgments. Personality judgment research seeks to answer questions such as: Which personality judgments are typically correlated and can be combined into overarching descriptive dimensions (i.e., what is the structure of these judgments)? Under what conditions do judgments of targets by others become more similar to one another (consensus)? How well do the targets’ judgments of themselves converge with judgments by other people (self-other agreement)? Do personality judgments agree with criteria reflecting what targets are really like (accuracy)?
In the present paper, we provide an overview of empirical research on personality judgment, with a focus on current findings of particularly high importance and consistency in relation to the preceding questions. Due to space constraints, we have limited our citations to review articles when possible, or to the first article that supported a specific finding. We review terminology used within the field, what researchers generally agree upon, and where there are disagreements or questions remaining to be answered.
Accuracy and Bias in Personality Judgments
A complete review of theoretical deliberations within personality judgment research is beyond the scope of this paper. Instead, we will briefly reiterate some core theoretical tenets that we assume are shared by most researchers in the field, and otherwise concentrate on the most robust empirical evidence. Popular theories about interpersonal judgment processes include the following: Brunswik’s (1956) lens model (Osterholz et al., 2021), the Social Relations Model (SRM) and PERSON Model (Kenny, 2004, 2020), the Realistic Accuracy Model (RAM; Letzring & Funder, 2021), the Social Accuracy Model (SAM; Biesanz, 2021), and the Truth and Bias model (TAB; West & Kenny, 2011). There is considerable convergence among these models, although an overarching integration is still lacking. For a review of interpersonal perception theories and models, see Biesanz (2018).
When perceivers (or judges) evaluate the personality of targets using words or phrases, their descriptions may reflect some “substantive” or “true” characteristics of the targets. For example, a target who uses more words per minute than the average target may be judged as “talkative.” To study personality judgment accuracy, perceiver judgments are compared with some standard, or accuracy criterion, that is assumed to reflect an approximation of “true” characteristics of targets, using some variant of correlational analyses.
Several accuracy criteria are used in judgment research, including targets’ descriptions of themselves, descriptions from acquaintances of the targets, and codings of targets’ behaviors that are assumed to relate to the traits of interest. When ratings from different perceivers, including the self, are compared to each other, the level of correspondence can be referred to as inter-rater agreement, and researchers use a variety of terms to refer to different combinations of types of perceivers and criteria (Letzring & Funder, 2018). Consensus applies to correspondence between judgments by two or more perceivers other than the target. These perceivers could all be strangers or all be acquaintances (also called informants), or strangers’ judgments could be compared to acquaintances’ judgments. Self-other agreement applies to correspondence between judgments by others and targets’ self-descriptions. Note that ratings by informants can be treated either as the judgments whose validity is to be determined, or as a criterion variable. There are several good reasons for using acquaintance-ratings as accuracy criteria. For example, close acquaintances of targets may reasonably be assumed to base judgments on a broad range of experiences that they had with the targets. Judgments by acquaintances also have the advantage that ratings can be aggregated across multiple acquaintances, which increases the reliability of measurement. On the other hand, the targets themselves presumably know the truth best regarding certain variables to which they have “privileged access” (e.g., their own feelings; Vazire, 2010, p. 283). In many studies, these types of inter-rater agreement are interpreted in terms of accuracy, on the assumption that the different perceivers agree because they all observed or inferred something “real” that sets a given target’s personality apart from other targets.
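The two agreement indices defined above can be illustrated with a minimal sketch for a single trait. The ratings below are invented for illustration only, and the variable names (self_ratings, informant_a, informant_b) are hypothetical. The sketch assumes a plain Pearson correlation across targets, which is only one of the correlational variants used in the literature.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical ratings of one trait for six targets on a 1-5 scale.
self_ratings = [4, 2, 5, 3, 1, 4]   # each target's self-description
informant_a = [5, 2, 4, 3, 2, 3]    # first acquaintance of each target
informant_b = [4, 1, 4, 2, 2, 4]    # second acquaintance of each target

# Consensus: correspondence between two perceivers other than the target.
consensus = pearson(informant_a, informant_b)

# Self-other agreement: self-descriptions versus the aggregated
# informant judgments (aggregation is assumed here to be a simple mean).
informant_mean = [(a + b) / 2 for a, b in zip(informant_a, informant_b)]
self_other_agreement = pearson(self_ratings, informant_mean)

print(f"consensus = {consensus:.2f}")
print(f"self-other agreement = {self_other_agreement:.2f}")
```

Whether either index may be read as accuracy depends on the assumptions about the criterion that are discussed in the text.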
Sometimes, composites of several variables (e.g., self and other-ratings) are used as accuracy criteria, based on the assumption that the outcome will reflect the variance that is shared among all sources of information, and thus most likely reflect something real (the Realistic Accuracy Model approach; Letzring & Funder, 2021). Another strategy is to use accuracy criteria that are (mostly) unfiltered by human perception, such as behavior tracking using mobile sensing and digital footprints, recordings of ambient sound (e.g., Tackman et al., 2020), or direct observation of behaviors in a laboratory setting. However, one downside is that generating these criteria requires much time, effort, and resources. Another consideration is that the conceptual relationship between judgment and criterion variables will have to be justified. For example, if judgments of extraversion correlate with speaking time, should that be interpreted in terms of (partial) accuracy? Also, the decision as to which of the many available (“hard”) variables are to be deemed important will still have to be made by human judges (i.e., researchers).
Personality judgments can also be influenced by variables other than the “substantive” characteristics of the target, and such influences are often called perceiver biases. For example, some perceivers tend to endorse all descriptor items, irrespective of content (known as acquiescence, see Section 2.4) and a perceiver’s fondness for a target plays a role in what terms the perceiver will use to describe the target. Inter-rater agreement or accuracy also may be driven by shared biases such as stereotypes (e.g., perceivers sharing certain racial stereotypes may agree in their judgments of targets of different races, irrespective of targets’ actual behaviors). While a “kernel of truth” may exist in some stereotypes and the variance associated with stereotypes can thus contribute to accuracy (Jussim et al., 2009; Kenny, 2004), an appropriate handling of the accuracy issue requires careful consideration regarding alternative, substance-unrelated reasons for measures to correlate with one another (e.g., Podsakoff et al., 2003). In summary, it is important to consider that personality judgments contain systematic variance between targets, between perceivers, between individual perceiver-target dyads (after controlling for differences between perceivers and between targets), and error (Kenny, 2020). So, personality judgments are not only informative regarding differences between targets - they also tell us something about perceivers and the specifics of each perceiver-target dyad.
What Do Personality Judgment Researchers Know and Agree On?
After considerable discussion, we have chosen topics that we agree are especially important in understanding personality judgments and judgment accuracy. Despite our attempt to aim for a collection of robust and representative findings, our selections are still somewhat subjective. In the following sections, we first focus on descriptor terms and phrases related to personality that people may use to describe targets (Section 1). These descriptors are used in most personality research and are thus relevant to personality judgments. Then, we address issues of inter-rater agreement and accuracy in personality judgments, describing two commonly used strategies: the variable-centered or trait-wise approach (Section 2) and the profile or person-centered approach (Section 3).
1. Core Findings Regarding Descriptors of Personality
1.1. The natural person-descriptive language contains hundreds of terms that people use to characterize personality traits (Anderson, 1968), such as “outspoken” or “impulsive.” The psycholexical hypothesis (Allport & Odbert, 1936) posits that these terms exist because they denote personality characteristics that people found important to communicate about throughout time. Understanding the properties of such terms, and ways in which perceivers use them, is relevant to personality judgments because they constitute the material with which data are collected (e.g., questionnaires and rating items).
1.2. For most person-descriptive terms, the optimal level is not found at one of the poles of the response continuum, but at a point that is less extreme. Thus, many person-descriptions incorporate the idea that one may have “too much” of a good trait, or “too little” of a bad trait (Jung & Kenny, 2005).
1.3. Research shows that personality-descriptive terms differ in a variety of ways that may impact judgments, including how positive or negative a light they shed on a target (social desirability; Edwards, 1953), the stability of substantive characteristics to which they refer (traitness), how easily the substantive differences between targets that they refer to may be observed from outside (observability or visibility), and how common those characteristics are in the population (base rate). Terms also differ in how broad a range of substantive characteristics they refer to (abstractness), and how important it is for others to know that someone has been described this particular way (importance). All of these term properties can be reliably rated by a small number of (~10-15) raters.
1.4. The distributions of most of these term-properties are unimodal and centrally-peaked, with one clear exception: social desirability ratings have a bimodal distribution, with most terms clearly implying either a positive or a negative evaluation of a target. There seem to be more negative descriptors than positive ones (Anderson, 1968).
1.5. The rated social desirability of a term is also a strong predictor of how its use will depend on attitude differences, such as liking, between perceiver-target dyads. Perceivers rate liked targets higher on positive terms, and disliked targets higher on negative terms, and they do so in at least partial independence of the targets’ actual characteristics (Bäckström et al., 2009; Zimmermann et al., 2018).
1.6. The covariance structure of personality judgments may be captured at different levels of abstraction. At the highest level, a strong, evaluative (“general”) factor distinguishes more positive from more negative descriptions (Biderman et al., 2018). This factor is most appropriately interpreted as reflecting the perceivers’ attitudes toward the targets (which may be shared among perceivers to varying degrees). In self-ratings, these attitudes largely equal the targets’ self-esteem (Anusic et al., 2009). The general factor aligns closely with Neuroticism, Depressivity, and other constructs saturated with self-evaluation. Beyond that, research has yielded five (John & Srivastava, 1999) or six (Ashton et al., 2014) factors that are more content-specific. These factors have been named Conscientiousness, Agreeableness, Neuroticism or Emotional Stability, Openness to Experience, Extraversion, and Honesty/Humility. However, the universality of these personality judgment factors across cultures is still the subject of considerable debate, with many proponents strongly advocating it, and others strongly opposing it (cf. Allik, Realo, Mõttus, Borkenau, et al., 2010; Thalmayer et al., 2020). Extraversion and Agreeableness largely capture the same item covariance that is also captured by the “big” two interpersonal dimensions: Dominance (or Status, Power, Agency, or Competence) and Affiliation (or Love, Warmth, or Communion; Wiggins, 1991). At an even lower level of abstraction, individual aspects, facets, and nuances of the broader personality factors may be distinguished from one another (John & Srivastava, 1999).
2. Core Findings Using the Variable-Centered Approach
The variable-centered approach provides information about how a specific trait may be perceived and is featured in several interpersonal perception models (e.g., Brunswik, 1956; Kenny, 2004). In this approach, several targets are rated by perceivers, either on a single term or phrase, or using several items from the same domain that are then aggregated. Importantly, in this approach agreement or accuracy is determined for individual variables (i.e., items or scale scores). This is useful when researchers are interested in studying judgments on a particular trait. It is also the only feasible approach in many cases where accuracy criteria other than (aggregated) judgments (e.g., visible behavior or objective test results) are used, because measuring such variables for more than just a few traits likely exceeds the resources available to most labs.
2.1. Scale scores are more reliable measures than scores of individual items, as long as the scale’s items are intercorrelated. This higher reliability also makes it more likely for scales, rather than individual items, to correlate with other variables, and is probably the main reason why the use of personality scales is very common in contemporary personality psychology. However, because aggregation will accentuate the items’ common variance at the cost of their unique variances, scale scores may sometimes be highly reliable measures of things other than the targets’ personalities (e.g., perceivers’ attitudes toward the targets; Anusic et al., 2009; Leising et al., 2021).
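The reliability gain from aggregation can be made concrete with the Spearman-Brown formula. This is a minimal sketch under the classical assumption of parallel items (equal variances and equal intercorrelations), which real scales meet only approximately. The function name and the example values are hypothetical.

```python
def spearman_brown(mean_inter_item_r, n_items):
    """Reliability of a composite of n_items parallel items.

    Assumes all items correlate equally with one another
    (mean_inter_item_r); this is an idealization.
    """
    if n_items < 1:
        raise ValueError("n_items must be at least 1")
    return (n_items * mean_inter_item_r) / (1 + (n_items - 1) * mean_inter_item_r)


# Hypothetical example: items that intercorrelate at r = .30.
for k in (1, 5, 10):
    print(k, round(spearman_brown(0.30, k), 2))
```

The formula also shows why intercorrelation matters: with a mean inter-item correlation of zero, the composite reliability stays at zero no matter how many items are added. It says nothing, however, about what the shared variance reflects, which is the caveat raised in the text.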
2.2. Scales seem to be more strongly correlated with one another in other-ratings than in self-ratings, especially at lower levels of acquaintance (e.g., Beer & Watson, 2008a). This is likely due to attitudes contributing more to the overall variance in other-ratings (Leising et al., 2021), and/or to self-rating factors capturing more actual between-target variation in more internal experiences such as thoughts and feelings.
2.3. Some relatively small disparities in mean levels exist between judgments of the self versus others. For example, people rate themselves as more neurotic and less conscientious than external observers rate them, and these differences are cross-culturally replicable (Allik, Realo, Mõttus, Borkenau, et al., 2010).
2.4. Perceivers differ from one another in how positively they judge the average target, in what trait levels they attribute to the average target on specific traits, and in acquiescence (Rau et al., 2021). Estimates of the total influence of different types of perceiver variance on personality judgments often range around 30% (Kenny, 2020; Rau et al., 2021). However, when studies do not permit interaction between perceivers and targets, estimates can be considerably lower (e.g., Heynicke et al., 2021). One reason for this could be that, in studies with previous interactions, the variation between perceivers’ judgment styles may be increased by variation in the average target’s behavior toward particular perceivers (e.g., one perceiver actually being treated more nicely by everyone than another perceiver).
2.5. Perceivers differ systematically from one another in how variable their responses are across items, regardless of content (Baird et al., 2017). This is important because it may lead to erroneous conclusions about variability in targets (e.g., in a person’s behavior) when, in reality, this variability is content-independent and rooted entirely in the perceiver. Notably, this individual difference is relatively stable over time (Wetzel et al., 2016).
2.6. People sometimes judge other people similarly to how they judge themselves (assumed similarity). The average degree of assumed similarity is moderately sized and varies across traits; meta-analytic results show the strongest and most consistent correlations for honesty-humility (r = .48), followed by openness (r = .23-.35) and agreeableness (r = .11-.25; Thielmann et al., 2020, Table 1). Assumed similarity also tends to be stronger for traits that are less visible and among people who have closer relationships (Kenny, 2020; Kenny & West, 2010).
2.7. Both consensus and self-other agreement are consistently significantly different from zero, but are often not very large. When not correcting for unreliability, consensus between two individual raters (on average, across traits) is ~r = .30-.40, whereas self-other agreement tends to be lower (~r = .20-.30; Connelly & Ones, 2010; Hall et al., 2008; Kenny & West, 2010). The discrepancy is usually explained in terms of the types of information (feelings and thoughts vs. visible behavior) that self- versus other-raters have access to, and that are completely shared in the case of consensus but only partly shared in the case of self-other agreement (Kenny, 2020; Vazire, 2010).
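The phrase “not correcting for unreliability” refers to the fact that measurement error attenuates observed correlations. A minimal sketch of the classical correction for attenuation follows. The reliability values are invented for illustration and are not taken from the cited meta-analyses, and the function name is hypothetical.

```python
from math import sqrt


def disattenuate(r_observed, reliability_x, reliability_y):
    """Classical correction for attenuation.

    Divides the observed correlation by the square root of the
    product of the two measures' reliabilities.
    """
    if not (0 < reliability_x <= 1 and 0 < reliability_y <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_observed / sqrt(reliability_x * reliability_y)


# Hypothetical example: an observed agreement of r = .30 between two
# ratings with assumed reliabilities of .70 and .80.
observed = 0.30
corrected = disattenuate(observed, 0.70, 0.80)
print(f"observed r = {observed:.2f}, corrected r = {corrected:.2f}")
```

Because corrected values are always at least as large as observed ones, uncorrected and corrected estimates should not be compared directly across studies.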
2.8. Research has also looked into the extent to which people are aware of how others see them (meta-accuracy) and whether they notice the differences between others’ views of them and their own self-perceptions (meta-insight; Carlson et al., 2011). Both types of agreement between judgments have small to moderate effect sizes, depending on the trait.
2.9. Levels of self-other agreement and accuracy vary across traits (Krzyzaniak & Letzring, 2021), which supports the notion that some traits tend to be judged more accurately than others. Specifically, extraversion is most consistently judged with moderately high levels of self-other agreement and accuracy, and neuroticism is usually associated with the lowest levels of self-other agreement and accuracy (Beer & Watson, 2008b). These differences are especially robust when perceivers are not well-acquainted with targets and are negligible or even absent when perceivers are well-acquainted (Connelly & Ones, 2010). Additionally, there is evidence that variance is an important predictor of self-other agreement such that traits with more variance tend to have higher agreement (Allik, Realo, Mõttus, Esko, et al., 2010).
2.10. There is substantial “cross-situational consistency” of personality judgments. When statistically controlling for “non-shared meaning” (i.e., between-perceiver differences in how the same terms are used to describe the same behaviors; Kenny, 2020), judgments by different observers who observed the same targets for a few minutes correlate ~r = .40-.60 with one another (e.g., Borkenau et al., 2004), which is the approximate proportion of variance in such ratings that is attributable to stable differences between targets. Judgments based on minimal exposure to targets in the lab also correlate with self and acquaintance ratings (~r = .10-.30, depending on trait and level of aggregation), both of which are based on more information from other situations (Borkenau et al., 2004).
2.11. Theoretically, some of the existing agreement between perceivers in judging the same targets (see Section 2.7) and of the existing cross-situational consistency (see Section 2.10) may be explained in terms of shared stereotypes based on static target characteristics (e.g., age, gender) rather than the targets’ actual behavior (Kenny, 2004). For example, if different perceivers attributed higher levels of some traits to targets merely based on the targets’ gender (i.e., exhibiting a shared bias), their consensus would improve. Note, however, that interpreting this in terms of “bias” is only justified to the extent that static and behavioral information are uncorrelated. Sometimes, static information may actually be informative regarding differences in the targets’ experiences and behavior, which has been called the “kernel-of-truth” (Jussim et al., 2009; Kenny, 2004). There is some evidence, however, that inter-rater agreement barely changes when accounting for gender and age (Borkenau et al., 2004), suggesting that static information may not actually introduce much bias in personality judgments.
2.12. Many studies have examined whether consensus and accuracy of judgments increase with the amount of personality-relevant information that perceivers have about targets (Allik et al., 2016; Beer, 2021). This is referred to as the acquaintanceship effect because longer acquaintance is assumed to be related to having more information. Higher levels of acquaintance typically involve more opportunities to communicate with targets and to observe targets’ behavior in a larger variety of situations, which should improve the representativeness of cues that are used to make judgments. Judgments by personal acquaintances, especially ones who have higher levels of intimacy with targets, tend to be more accurate than judgments by strangers or less intimate acquaintances (with moderate vs. small effect sizes), especially for less visible traits such as emotional stability (Connelly & Ones, 2010). However, studies that experimentally manipulate the amount of information that is available to perceivers usually yield small to negligible effects, and most gains occur within the first minutes and hours (Beer, 2021). Notably, self-other agreement is often above chance levels even when judgments are made by strangers and people with relatively low levels of acquaintance, especially for traits with more visible cues (Connelly & Ones, 2010).
2.13. Personality judgments are valid in that they connect to behavior in various contexts. Self- and informant-ratings of behavior frequencies and personality traits predict measured frequencies of some behaviors (Tackman et al., 2020; Vazire & Mehl, 2008). For example, judgments of targets’ intelligence predict performance on intelligence tests and GPA at moderate to large effect sizes (Borkenau et al., 2004; Murphy, 2007), and judgments of conscientiousness and emotional stability predict academic and job performance (Connelly & Ones, 2010). Furthermore, judgments of personality traits predict important life outcomes: Extraversion and Conscientiousness predict longevity and Neuroticism predicts romantic relationship quality (Ozer & Benet-Martínez, 2006).
In laboratory settings, personality judgments by both observers and targets are related to targets’ expressed verbal and nonverbal behavior. For example, both observer and self-judgments of Extraversion are positively associated with speech rate (Breil et al., 2021). However, the relation between observer judgments and measured target behavior (i.e., cue utilization) tends to exceed the relation between self or informant-rated personality traits and measured target behavior (i.e., cue validity) in magnitude. When cue validities correspond with cue utilization, judgment accuracy should be enhanced (Brunswik, 1956; Osterholz et al., 2021).
2.14. Both self- and other-judgments of personality do have some incremental predictive validity when compared to each other (Connelly & Ones, 2010; Vazire & Mehl, 2008), indicating that each perspective contains some valid information that the other perspective does not. For example, although self-ratings have some predictive validity by themselves, other-judgments of Conscientiousness, Emotional Stability, and Extraversion are better predictors of academic achievement than self-ratings, as are other-judgments of Conscientiousness for predicting job performance (Connelly & Ones, 2010).
3. Core Findings Using Analyses of Profiles of Personality Judgments
The second major approach to examining personality judgments is the profile or person-centered approach, in which agreement/accuracy is based on correspondence between judgments of multiple items and traits for a given perceiver-target pair. The profile approach uses multilevel modeling and profile correlations (Furr, 2008), and is featured in person perception models such as the Social Accuracy Model (Biesanz, 2021). The profile approach is useful when investigating personality judgment more broadly (i.e., not limited to a particular trait). Advantages of this approach include higher statistical power and the possibility to simultaneously include predictors at the level of perceivers, targets, dyads, and items. One disadvantage of this approach is being limited to using self- or other-judgments as validation criteria, because obtaining other types of (e.g., observational) data is often not feasible for more than just a few personality dimensions.
Here, we present some core findings based on the profile approach. Given that most of the relevant studies use the same type of data that could also be analyzed with a variable-centered approach, and that several parts of multilevel profile analyses may be directly translated into variable-centered analyses (Allik et al., 2015; Biesanz, 2021), we will focus on findings that are most specific to profile analyses. Also, given that there are currently some discrepancies in how certain technical terms (e.g., “normative accuracy”) are used, we decided to avoid such terms altogether.
3.1. The judgment profile of the average target is strongly, but not perfectly (~r = .80-.90), correlated with ratings of the items’ social desirability (Edwards, 1953). This is probably a direct reflection of the average perceiver’s positive attitude toward the targets, as most perceivers are quite fond of their targets and thus describe them positively across items, regardless of those targets’ actual personality characteristics. As a consequence, randomly paired judgment profiles are likely to correlate positively with one another. Not accounting for this fact may lead to erroneous conclusions regarding the “substantive similarity” of two profiles, when that similarity is actually just rooted in their shared positivity (Wood & Furr, 2016).
3.2. Apart from social desirability, the average judgment profile also contains another kind of systematic variance which is best interpreted in terms of the average target’s actual, substantive characteristics (Rogers & Biesanz, 2015). People, on average, show higher levels of some substance variables (e.g., working) than other substance variables (e.g., relaxing). This is another, independent reason why two randomly selected profiles of personality items should correlate positively with one another.
3.3. The two aforementioned profile components (social desirability and the average target’s personality) may be disentangled from one another, and from a third component, which reflects the ways in which each target differs from the average target (Wessels et al., 2020). Studies using the profile approach show that perceivers agree to a moderate extent in which characteristics they attribute to particular targets - as opposed to the average target (~r = .20; Human et al., 2019; Rogers & Biesanz, 2019; Wessels et al., 2020).
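The separation of the average profile from what sets each target apart can be sketched in simplified form. The code below is not the multilevel approach used in the cited studies. It is a descriptive illustration with invented ratings in which each perceiver’s average profile across targets is subtracted before profiles are correlated. All names and data are hypothetical, and the sketch does not separate social desirability from the average target’s personality.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_profile(profiles):
    """Item-wise mean across targets (the 'average target' profile)."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]


def distinctive(profile, average):
    """Deviation of one target's profile from the average profile."""
    return [p - a for p, a in zip(profile, average)]


# Hypothetical data: rows are three targets, columns are five items.
# Items 1, 2, and 4 are desirable; items 3 and 5 are undesirable.
ratings_a = [[5, 4, 2, 4, 1], [4, 5, 1, 3, 2], [5, 3, 2, 5, 1]]  # perceiver A
ratings_b = [[4, 4, 1, 5, 2], [5, 5, 2, 3, 1], [4, 3, 1, 5, 2]]  # perceiver B

avg_a, avg_b = mean_profile(ratings_a), mean_profile(ratings_b)

for t in range(len(ratings_a)):
    raw = pearson(ratings_a[t], ratings_b[t])
    dist = pearson(distinctive(ratings_a[t], avg_a),
                   distinctive(ratings_b[t], avg_b))
    print(f"target {t}: raw r = {raw:.2f}, distinctive r = {dist:.2f}")

# Raw correlation for a mismatched pair (A's view of target 0 with
# B's view of target 1); it is inflated by the shared average profile.
raw_mismatch = pearson(ratings_a[0], ratings_b[1])
print(f"mismatched pair: raw r = {raw_mismatch:.2f}")
```

In these made-up data the raw correlation for the mismatched pair is high even though the two profiles describe different people, which is the pattern noted in Section 3.1. With so few targets and items, the distinctive correlations are unstable and should not be interpreted.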
3.4. The three aforementioned profile components respond differently to differences between dyads in “knowing” (how well perceivers know targets) and attitudes such as how much perceivers like targets. The more a perceiver knows a target, the more accurately he or she will describe both the characteristics that this target has in common with most other targets, and the characteristics that set this target apart from other targets. At the same time, greater knowing is associated with lower positivity of personality judgments overall. For liking, a largely reverse pattern of associations with the three profile components is found (Wessels et al., 2020). This is important because shared liking may by itself account for profile similarity (Leising et al., 2013), and differences in liking across dyads may account for correlations between profile similarity and positive outcomes reported by the same perceivers. For example, the more two partners like each other, the more similar their ratings of each other will be, due to shared positivity. Also, this greater similarity will correlate, across dyads, with other measures that reflect the same perceivers’ attitudes (e.g., relationship satisfaction).
3.5. The good target is someone who tends to be judged with higher accuracy than other targets. Research suggests that targets do vary substantially in this regard (Rogers & Biesanz, 2019). Some of this variance may be rooted in good targets’ behaving more consistently across time and situations (Human et al., 2019).
3.6. The good judge is someone who perceives others with higher accuracy than other perceivers. Research demonstrates that perceivers do differ from one another in this regard (Biesanz, 2021; Rogers & Biesanz, 2019), but there is typically less variability in accuracy across perceivers than variability in accuracy across targets; and this lower variability may explain why it is difficult to find characteristics of perceivers that are consistently related to accuracy (Allik et al., 2016; Colman, 2021). Also, being a good perceiver requires having a good target (i.e., perceivers need valid information on which to base judgments; Rogers & Biesanz, 2019).
4. What Important Questions Remain Unanswered in Personality Judgment Research?
There are still many important unanswered questions in the field, some of which remain debated among researchers of personality judgments, including amongst ourselves. In this section, we briefly touch upon some questions we think are important to address in future personality judgment research.
4.1. What exactly are the substantive characteristics underlying the personality judgments that people make using natural language terms? For example, what is it about a target that makes a perceiver say the person is “dominant” or “kind”? We know some of these indicators (e.g., judgments of intelligence are predicted by eye gaze at an interaction partner; Murphy, 2007; see also Breil et al., 2021), but the picture is far from complete. Perhaps even more important are questions about the factor structure of these substantive differences. Empirical findings (see Section 1.6) all reflect the structure of personality judgments, but this is not informative regarding the covariation among the underlying substance variables themselves. In fact, the Big Two/Five/Six may be rooted partly or even completely in how the average perceiver evaluates different behaviors by comparing them with the same set of goals or ideals (Borkenau, 1990). The question of how “substantive” personality judgment factors really are is highly relevant because only if they are substantive (i.e., there is co-variation among the target’s actual behaviors) will it be reasonable to look for their between-target (e.g., biological) correlates or even origins.
4.2. What characteristics of perceivers are consistently related to accuracy levels? Much of the existing good-judge research demonstrates meaningful relationships between accuracy and perceiver characteristics such as agreeableness, emotional stability, psychological adjustment, dispositional intelligence, and gender, with women being more accurate when a difference between genders is found (Colman, 2021; Letzring & Funder, 2021). These findings overlap with other interpersonal accuracy research (which is not specific to personality judgments per se), which indicates that accuracy is associated with many positive psychosocial characteristics (Murphy, 2016). On the other hand, there is research demonstrating a lack of relationships between perceiver characteristics and self-other agreement (Allik et al., 2016). Research indicates that there may be good judges of particular traits, rather than good judges of a profile of traits (Hall et al., 2018; Schlegel et al., 2017). Some of the inconsistencies in findings may be due to differences in acquaintanceship between perceiver-target dyads, the type of accuracy measure being used, and/or characteristics of the targets (e.g., Hall et al., 2018; Rogers & Biesanz, 2019).
4.3. Can the ability to be a good judge be trained, and if so, how? Several researchers have speculated that it is possible to instruct people in how to become better judges of others, yet the empirical findings are mixed. Most of the relevant empirical research has focused on judging domains other than personality traits, such as emotion and deception (Blanch-Hartigan & Cummings, 2021). In these domains, the most effective type of training tends to include practice with feedback, with improvements in the small to moderate range. Future research could apply a similar paradigm to the study of personality judgment. If accuracy for judgments of personality can in fact be trained, that would be highly consequential for many fields such as personnel selection and forensic psychiatry.
4.4. Researchers have examined how perceptions of personality made by people are related to technology-based cues such as Facebook likes, language use in emails, and features of pictures that people post (Wall & Campbell, 2021). Although studies with very large samples have already demonstrated that some technology-based cues are related to personality with low to moderate effect sizes (e.g., Youyou et al., 2015), further research is needed. Open issues include the replicability of these findings, particularly with respect to the importance of individual cues in multivariate models for predicting personality, as well as how judges use technology-based cues in their judgments and how accurate these judgments are. As technology continues to evolve, future research will have to be responsive to new technologies in relation to personality judgments.
4.5. To what extent do our lab-based findings regarding judgments and accuracy have practical relevance for people’s everyday lives? Do the mechanisms of accuracy work the same way both inside and outside of the lab? There is some work in natural settings, and much work has examined ratings of acquaintances and family members (Beer, 2021), yet more can be done to examine the ecological validity of personality judgment findings outside the lab. Furthermore, there seems to be a tacit assumption that better-than-chance results (i.e., statistical significance) indicate useful levels of accuracy, but statistical significance and common effect size metrics provide little insight into the practical relevance of such effects outside the lab (Murphy, 2016). For instance, a variable-centered accuracy finding of r = .22 translates to a 61% accuracy rate, which is 11 percentage points better than a 50% chance base rate (see Hall et al., 2008). Does an 11-percentage-point increase in accuracy translate to meaningful differences in perceiver or target experiences?
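As a brief worked illustration of the conversion mentioned in 4.5: the figures given there (r = .22, 61%, and a 50% chance base rate) are consistent with a binomial-effect-size-display style of translation, in which half of the correlation is added to the chance rate. We assume here, without having verified it against Hall et al. (2008), that this is the conversion intended:

```latex
\[
\text{accuracy rate} \;=\; 0.50 + \frac{r}{2} \;=\; 0.50 + \frac{0.22}{2} \;=\; 0.61
\]
```

Under this assumption the gain over chance is simply r/2, so the correlation would have to reach r = .40 before the implied accuracy rate reached 70%.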
4.6. There is considerable criticism that human behavioral studies, including those in psychology, mostly investigate WEIRD people (Western, Educated, Industrialized, Rich, and Democratic; Henrich et al., 2010), which limits generalizability. This pertains directly to the issue of the generalizability of certain structural models (see Section 2.2), and to many other important issues such as how common it is to view and/or describe oneself positively. There is relatively little cross-cultural research investigating personality judgments (particularly in non-WEIRD populations and zero-acquaintance judgments beyond the Big Five). Investigating personality judgments in diverse populations is an important objective for future research in understanding human perception processes.
Research on personality judgment is a vibrant and active area of study. As demonstrated in this review, much has already been learned about the words and phrases used to describe personality, the structure of personality judgments, and some factors that are related to agreement and accuracy. But there is still much to be learned, and we encourage researchers to continue to address a variety of important issues to further increase our understanding of how accurate personality judgments are made.