Computational methods have increased the objectivity of measures of human behavior and positioned personality science to benefit from the ongoing digital revolution. In this review, we define and discuss computational personality assessment (CPA), a measurement process that uses computational technologies to obtain estimates of personality. We briefly review some of the most promising sources of data currently used for CPA: mobile sensing, digital footprints from social media, images, language, and experience sampling. We present a concise overview of key findings, discuss the promise and opportunities of CPA (e.g., moving towards objective measures of personality, obtaining new insights from big data), and highlight important limitations and challenges in the development and application of CPA (e.g., establishing reliability and validity, selecting appropriate ground truth criteria, assessing affect and cognition, implications for ethics and privacy). We conclude with our perspective on how CPA could change our understanding of individual differences.
Computational personality assessment will drastically impact personality science.
Objective measures of behavior, thought, and feelings will be essential.
Summary of key data sources, results, opportunities, and challenges.
Computational personality assessment based on digital footprints and high-frequency behavioral data from in-vivo sensing studies could drastically change the concept of personality and its assessment.
It has been said that the quality of a scientific discipline flows from its tools of measurement (
In personality psychology, the phenomena of interest are individuals and their characteristic patterns of thinking, feeling, and behaving (
A core challenge for personality assessment is that the phenomena of interest are complex patterns in individual differences that partially manifest over time. These patterns are typically represented as latent constructs (e.g., traits and states, values, goals, identity) that are mostly assessed using self-report methods (
Over the past decade, advances in computing technologies for data collection (e.g., social media platforms, smartphones) and analysis (e.g., machine learning) have made behavioral observation studies increasingly viable (and valuable), enabling more and better use of objective measurements in addition to subjective questionnaire items (
Research at the intersection of personality psychology and computer science has already begun to 1) harness computational technologies to collect and store new sources of personality-relevant data, and 2) model such data to describe, understand, and predict individual differences. Given that computational technologies have and will continue to change measurement in psychological science, here we provide an overview of the most promising sources of data and current approaches being used in computational personality assessment.
We define computational personality assessment (CPA) as a measurement process that uses computational technologies to obtain estimates of personality. Specifically, CPA uses digital indicators and computing technologies for data processing and analysis to identify patterns in personality-relevant data (e.g., phone logs, digital footprints) to estimate personality at various levels (e.g., traits, states, processes). In
In this paper, we focus on five sources of data that are essential to CPA: 1) mobile sensing, 2) digital footprints from social media platforms, 3) images, 4) language, and 5) experience sampling. The data from these sources are not completely independent from one another, but we find the different categories provide a useful way of organizing a state-of-the-art review of the research in this burgeoning area. We then highlight some core opportunities for research on personality assessment through a computational lens, and critically discuss the limitations and challenges that need to be addressed for CPAs to become standardized psychometric tools. Finally, we conclude with our perspective on how CPA could fundamentally change our understanding of individual differences in thoughts, feelings, and behaviors.
Consumer electronics that are equipped with mobile sensing technologies (e.g., wearables, smartphones, IoT) are a pervasive feature of daily life for many people around the world. As of 2017, an estimated 5% of people in developed nations (11% in developing nations) own smartwatches that they wear on their wrists, an estimated median of 91% of the developed world’s population (80% in developing nations) owns smartphones that they carry around as they go about their day, and an estimated 3% of people living in developed countries (6-10% in developing countries) own smart home devices that reside in their homes
These technologies rely on mobile sensors (e.g., accelerometer, microphone, GPS) and metadata (e.g., call and SMS logs, app use logs) to provide services central to the functioning of the device, such as activity recognition, voice detection, information retrieval, or location tracking (
To date, studies have primarily focused on demonstrating how self-reported personality trait levels can be estimated or inferred from sensing data, evaluating machine learning models that classify or predict a person’s self-reported Big Five trait scores. Taken together, past work on the prediction of personality traits from smartphone data suggests that individual levels of personality traits can be predicted from sensing data above chance for most traits (e.g.,
Social media (SM) has become an integral part of the lives of millions of people around the world (
The information captured via digital footprints from SM platforms includes both carefully curated identity claims as well as inadvertently generated behavioral residue (
With the growing interest among psychologists in Big Data and machine learning, researchers have increasingly started to investigate the feasibility and validity of automated, computer-based predictions of personality from online digital footprints. For example, computational modeling has been used to successfully predict self-reported Big Five personality trait levels from Facebook Likes (
The unprecedented ability to automatically retrieve information on people’s personality at scale has paved the way for SM platforms and third parties to use insights into people’s personality to shape their experience on the platforms themselves. For example, SM platforms and marketers can utilize predictions of personality traits from digital footprints to tailor the content a user is exposed to within a platform to their specific psychological needs and preferences (
Images, including those shared on SM platforms, personal and professional websites, or offline settings, are rich sources of information for personality assessment. Cameras are embedded in many everyday devices (e.g., smartphones, cars) and are used extensively in both private and public settings (e.g., security systems in homes, CCTV systems), providing further visual information about individuals and their behavior that may be recorded and analyzed automatically (e.g.,
Though the human brain is carefully attuned to make sense of the vast stream of visual data our eyes perceive, this has historically proven much harder for computers. Nonetheless, with the advent of modern computing resources and improved algorithms, researchers have made great strides in the field of computer vision. For example, in recent years, scientists have greatly increased our ability to computationally detect objects, track motion, recognize actions, and estimate poses from image data (
Early work in this space has shown promising results. In one study, participants were fitted with a wearable camera, which automatically took pictures of their environment throughout the day. Objects in the captured images were automatically classified via computer vision algorithms and were shown to predict both self-reported personality traits and situational characteristics (
Moreover, faces can reflect a person’s emotional expressions and transient states, but could they offer clues to more enduring psychological dispositions? The face is one of the most central aspects of a person's image, yet it remains relatively underexplored as a personality-relevant feature of image data. Alongside the progress of the big data revolution, facial images and videos are becoming ever more prevalent, making appearances in profile pictures, SM posts, and online meetings. Even beyond the digital realm, faces are both easily accessible and extremely salient stimuli for human beings. Initial investigations in this area show some evidence that this may be possible. Researchers have demonstrated that machine learning models can be trained to predict psychological dispositions, including personality, from images, such as those on SM (
Nonetheless, it is important to note some limitations present in research using images for personality assessment. First, it is unknown whether such predictions rely on natural facial appearance or other low-level image cues, such as ambient lighting in the photograph. For example, extroverts are more likely to take pictures in bright environments (
Language data (e.g., data captured from verbal behavior, broadly defined, such as written text or audio recordings) is a central type of personality-relevant information available via computing technologies, such as SM and smartphone apps. Moreover, after standardized self-report questionnaires, the use of language data as a source of diagnostic information has perhaps one of the oldest traditions in the field of personality assessment. Psychologists have privileged verbal behavior, particularly written and spoken words, as something of a direct pipeline to personality itself. Dating back even to before psychology’s formative days, a person’s words were treated as a distillate of their underlying drives, emotions, and thought patterns: the swirling miasma of the preconscious, or the very essence of what ultimately makes someone who they are as a person (
Early on, language data within personality science often consisted of conspicuous responding to explicit prompts — such as dream reports, self-description tasks, or projective tests — or cultural artifacts, such as books, newspaper articles, and so on (e.g.,
Today, the theoretical underpinnings of analysis of language data are relatively simple when compared to those of the early-to-mid 20th century. From a modern psychometric perspective, language data is most often (though not always) treated as a reflection of one’s attentional habits, loosely defined (e.g.,
Language data has been used extensively for the passive, non-invasive assessment of personality across a variety of study paradigms. More specifically, personality research with verbal behavior commonly takes language data in its raw, “unstructured” form; words are coded as belonging to different domains: emotions (e.g., “happy”, “sad”, “nervous”), sociality (“friend”, “family”, “togetherness”), metacognitive (“think”, “understand”, “wonder”), and so on (e.g.,
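The word-coding step described above can be illustrated with a minimal sketch. It assumes a tiny hand-written dictionary built only from the example words in this section; the category names, the `category_rates` function, and the exact-match tokenization are illustrative assumptions and do not reproduce any particular published dictionary or tool.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary using only the example words from the text.
CATEGORIES = {
    "emotion": {"happy", "sad", "nervous"},
    "social": {"friend", "family", "togetherness"},
    "metacognitive": {"think", "understand", "wonder"},
}


def category_rates(text):
    """Return the share of tokens in `text` that fall into each category.

    Simplification: exact lowercase word matching, with no stemming or
    wildcard handling as a full dictionary approach would likely use.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        for name, words in CATEGORIES.items():
            if token in words:
                counts[name] += 1
    total = len(tokens)
    # Normalizing by text length makes samples of different sizes comparable.
    return {name: (counts[name] / total if total else 0.0) for name in CATEGORIES}


rates = category_rates("I think my friend is happy, but I wonder why.")
```

The resulting per-category rates are the kind of feature that could then be related to personality scores; how best to do so is a modeling choice that this sketch leaves open.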
Experience sampling methodology (ESM; also known as Ecological Momentary Assessment [EMA]) describes the repeated assessment of a person’s thoughts, feelings, and behaviors through active self-reporting (
Current ESM approaches that allow for a somewhat standardized (yet adaptive) computational interaction with participants, for example via artificial conversational agents (i.e., chatbots), could allow for the investigation of more complex and dynamic aspects of personality (e.g., communication processes) and apply interventions (e.g., personality change;
Having provided a review of research centered on the most promising types of data for CPA, we next turn to a discussion of the key opportunities, challenges, and directions for this area. In doing so, our aim is to outline how these sources of data might catalyze new directions of personality research and assessment, while critically discussing the current limitations of these methods and how they might be addressed moving forward.
Computational approaches to personality assessment provide new ways of quantifying individual differences in thoughts, feelings, and behavior and of estimating those differences in an increasingly automated fashion. The field of personality psychology has begun to recognize the added value of CPA to traditional forms of personality assessment (e.g., lower cost, greater accuracy, coverage of different aspects of personality). The degree to which CPA continues to expand within the discipline will depend, in part, on its ability to help solve pressing real-world problems (e.g., behavior change for climate action, recognition and prevention of psychopathology, better matching of romantic partners and of jobs with applicants). Perhaps the most promising aspects of CPA lie in 1) its increased reliance on more objective sources of data (in comparison with self-reported personality inventories) and 2) its ability to accurately model more complex and dynamic patterns of individual differences in large-scale datasets. Both aspects could help to improve the assessment and the conceptualization of personality at state, trait, and process levels.
In self-report personality questionnaires, people are asked about their typical behaviors, thoughts, and feelings (e.g., their tendency to socialize or be honest in relationships), which can trigger socially desirable response biases. Digital footprints and mobile sensing data, in contrast, could potentially allow for the automated, objective quantification of a person’s behaviors, thoughts, and feelings (e.g., their tendency to communicate with others or use dating apps while in a relationship). In that sense, objective behavioral data for CPA could help to overcome the well-known problems of questionnaire-based assessments (e.g., memory, social desirability, response styles, faking) and reduce participant burden by enabling passive and automated personality assessment. Moreover, objective behavioral data can be combined with self-reported assessments to provide complementary information on personality traits and states. Consider, for example, a person who spends little time communicating with others (based on sensed data) but reports being a talkative person: this discrepancy may itself reveal key personality information that would not otherwise be available through either source of data by itself.
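One simple way to make the talkativeness discrepancy concrete is to standardize both measures within a sample and take their difference. The sketch below is only an illustration under that assumption; the function names, the variable names, and all numbers are hypothetical, and other discrepancy indices would be equally defensible.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values to mean 0 and standard deviation 1 within the sample."""
    m = mean(values)
    s = pstdev(values)
    if s == 0:
        raise ValueError("values must vary to be standardized")
    return [(v - m) / s for v in values]


def self_vs_sensed_gap(self_report, sensed):
    """Per-person difference between standardized self-report and sensed scores.

    Positive values mean a person reports more of the behavior than the
    sensed data suggest, relative to the rest of the sample.
    """
    if len(self_report) != len(sensed):
        raise ValueError("both measures need one value per person")
    return [a - b for a, b in zip(z_scores(self_report), z_scores(sensed))]


# Hypothetical data for five people (not taken from any study).
talkativeness_item = [2, 3, 5, 4, 1]     # self-rated, 1-5 scale
daily_call_minutes = [10, 25, 5, 40, 8]  # sensed communication time

gaps = self_vs_sensed_gap(talkativeness_item, daily_call_minutes)
```

In this made-up sample, the third person rates themselves as the most talkative but has the least sensed communication time, so their gap is the largest.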
Behavioral observations from digital data sources can reflect personality-relevant information, but do not necessarily represent a person’s standing on personality dimensions (e.g., sociability, honesty) on their own. However, when these and other indicators are considered
It is also known that some aspects of human personality are better assessed by the self, some are better assessed by others, and some aspects are neither assessed well by the self nor others (
CPA will enable the quantification of personality expressions in a more fine-grained and holistic way that better accounts for the dynamic components of personality. Specifically, it will allow us to simultaneously model behavioral, situational, cognitive, and emotional processes in a dynamic fashion over time, as described in contemporary personality theories (e.g., Whole Trait Theory;
Despite these promising opportunities, there are several challenges that limit the ability to use these new technologies for personality assessment. One frequent argument against using digital footprints, mobile sensing, and other digital sources of data for assessment purposes is that these new techniques still lack basic evaluations of their psychometric properties (e.g., reliability, validity). For CPAs to be a viable complementary approach to more traditional self-report assessments in applied settings, these new assessments must demonstrate adequate reliability (e.g., over time) and validity (e.g., by predicting relevant life outcomes). For example, a starting point could be to evaluate the extent to which the new assessment’s properties compare to the lowest thresholds for reliable and valid self-reported personality assessments (e.g., single-item trait measures).
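As a rough illustration of what a basic reliability check over time might look like, the sketch below correlates trait estimates from the same people at two time points (a test-retest correlation). The data and the names `week1`, `week5`, and `pearson_r` are hypothetical, and the Pearson correlation is only one of several possible reliability indices.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return sxy / sqrt(sxx * syy)


# Hypothetical model-based Extraversion estimates for five people,
# computed from data collected four weeks apart (not from any study).
week1 = [3.1, 2.4, 4.0, 3.6, 2.9]
week5 = [3.0, 2.6, 3.8, 3.9, 2.7]

retest_r = pearson_r(week1, week5)
```

The same function could be applied to the retest scores of a short self-report measure from the same sample, giving the kind of benchmark comparison suggested above.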
CPAs from digital behavioral data are also hampered by a lack of understanding of how reliable the resulting personality inferences might be. Several phenomena that are particularly relevant to these new approaches must be considered. For example, concept drifts (cf. longitudinal measurement invariance in psychometrics) refer to a change in the informativeness of digital behavioral indicators that are used as predictors with regard to a criterion (
Another big challenge in CPA is to test the validity of newly developed models (
For example, the item “I am a talkative person” is used solely as an indicator of Extraversion in personality trait inventories (unidimensional), but the average number of calls a person makes at night could be an indicator for levels of sociability and self-consciousness (e.g.,
Central to the challenge of establishing validity is a more foundational issue regarding the appropriate “ground truth” criteria for evaluating CPAs. The ground truth problem can be boiled down to a conceptual question: What is the most representative and unbiased conceptualization of systematic patterns in human feelings, thoughts, and behavior (i.e., personality)?
Virtually all of the latest computational approaches to personality assessment rely on self-reported personality trait scores as the ground truth criteria for evaluating the performance of the newly derived assessment models (e.g.,
However, it is questionable how useful the concept of ground truth really is in CPA in general. It might be more useful to think of self- and other-reported personality information as complementary to purely behavioral measures (
Putting issues of validity aside, the question of how to best conceptualize affective and cognitive components of personality in an objective manner remains unsolved. The inherent subjectivity of thoughts and feelings makes it very difficult to directly measure them in objective ways (
Some recent studies hint at the possibility of measuring cognitive (
All research that has the potential to inform or influence people’s lived experiences, particularly work related to people’s personal thoughts, feelings, and behaviors, comes with inherent ethical challenges; CPAs are no exception. One of the most salient ethical implications of automated CPAs is the potential loss in agency and control of the assessed over the diagnostic process. In contrast to self-report questionnaires that are widely used for personality assessment, computational assessments do not require people to actively provide data for assessment. This is the case when the methods that are used for CPA reveal patterns that would otherwise have remained hidden in the data (e.g., in faces;
Another ethically challenging aspect of CPAs is the potentially persistent accountability of people for their past actions that is created by the comprehensiveness of the data collection process involved and the online-persistence of digital footprints and created models (
Finally, it is possible that different conceptualizations of individual privacy and related ethical positions will influence the use of CPAs under different cultural and political norms. Whereas China has taken a more aggressive approach towards automated computational assessments and algorithmic scoring mechanisms
Scientific progress is often characterized by improvements in the methods that it uses to investigate and conceptualize phenomena of interest. In personality science, we are on the path from personality prediction towards CPA, but remaining issues with regard to the objectivity, reliability, validity, and ethical aspects of CPAs must first be addressed. While CPA is still in its infancy, it is poised to both drastically change how we assess personality and how we think about it conceptually.
For this article an Open Peer-Review is available via PsychArchives (for access see
This research was supported in part by a Stanford HAI Seed Grant, the National Science Foundation (SES-1758835), the Federal Bureau of Investigation (15F06718C0002523), and the Swiss National Science Foundation (196255).
Sandra Matz is a member of the editorial board of the journal.
We want to thank the reviewers, the scientific community, and the organizational behavior reading group at the University of Zurich, CH for their helpful input on an earlier version of the manuscript.
No ethical issues and/or ethics approvals need to be disclosed.
Before submission of the manuscript to