Wide-band personality inventories: comparing Hogan HPI, SHL OPQ32r, Facet Five, and Trait-Map


As the lead author of Trait-Map, a wide-band personality inventory, I am often asked to compare our product with similar products on the market. This is not an easy task, because the considerations behind model creation and questionnaire development are not easily understood by people who are not deeply immersed in the topic. It also takes time to understand a personality assessment tool beyond its marketing materials: some key information is hidden in the technical manuals, where only the initiated can make sense of it. Here I attempt to compare these assessments in an objective manner, and I hope to help readers better understand these mysterious products.

I am comparing four instruments here: Hogan HPI, SHL OPQ32r, Facet Five, and Trait-Map. They are all wide-band personality inventories. Each has its own questionnaire format: the True/False statements of Hogan, the ipsative triads of OPQ32r, the opposing semantic pair items of Facet Five, and the five-item ranking blocks of Trait-Map all provide a distinct test experience. They all measure similar characteristics, and they all report similar reliability and validity figures. Reliability is evaluated by scale consistency and test-retest correlation. Validity comes from reliability, scale definitions, item content and implementation methods. From the publishers' validity studies and comparative studies like Project Epsom, we can assume that these wide-band personality tests have similar validity figures. These assessments are truly comparable.
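For readers who have not met "scale consistency" before, here is a minimal sketch of how it is commonly quantified, using Cronbach's alpha. The responses are made up for illustration, the function name is my own, and none of the four publishers' actual data or procedures are shown here.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """item_scores: one list per respondent, holding that person's
    scores on the items of a single scale."""
    k = len(item_scores[0])                       # number of items in the scale
    columns = list(zip(*item_scores))             # transpose: one tuple per item
    item_variance_sum = sum(variance(col) for col in columns)
    total_variance = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)

# Made-up answers of five people to a four-item scale (ratings from 1 to 5).
responses = [
    [4, 5, 4, 5],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 2))
```

The closer alpha is to 1, the more the items of a scale move together. In this toy example the items agree strongly, so the value comes out high.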

As introduced above, the major difference between these tools is not their reliability or validity, but the user experience for the test takers and the applicability for the test users. I am comparing the assessments in three areas: psychological model, questionnaire format, and reports. Let’s start with a basic overview:

Inventory (development time) | Model type | Model structure | Questionnaire format
HPI Hogan Personality Inventory (since the late 1970s) | Factor analytic model since 1992 | 41 Sub-scales in 7 Dimensions | Normative: 206 True/False statements
SHL OPQ (first published in 1984) | Theoretical model, 32 traits since 2005 | 32 Primary Scales in 3 Areas (OPQ32r) | Ipsative: 104 blocks of 3 items (312 items in total)
Facet Five (since the late 1980s) | Factor analytic model | 13 Sub-Scales in 5 Dimensions | Normative: 106 items with semantic opposite description pairs
Trait-Map (since 2002) | Big Five informed theoretical model | 25 Primary Scales in 5 Dimensions | Ipsative: 25 blocks of 5 items (125 items in total)

Main Difference 1: Theoretical model versus Factor analytic model

The model is a central and crucial feature of every personality test. Personality models are simplified representations of individuals that enable predictions. They originate from observing, analyzing and classifying human behavior, and very often they build on the previous work of other researchers. At the beginning, the models are usually created intuitively by the authors, and they are later enhanced with quantitative methods. All four models compared here have been through some data-based optimization. If a model has been enhanced by factor analysis, it is classified as a factor analytic model. A theoretical model, in this context, is one that has not been optimized with factor analytic methodology.

Why does this matter? Factor analysis is an objective method, and it enables a deeper understanding of the structure behind the data. It helps in reducing the number of scales. After reducing the number of scales, most researchers discover the same five dimensions emerging from the data: the Big Five. Many scholars believe the Big Five is the common denominator of all wide-band personality models. Unfortunately, factor analysis and the resulting reduction in scales and item content also have disadvantages. The optimized structure may be rather abstract, hard to capture in words and impossible to name accurately. The other disadvantage is that the constrained item set does not cover all facets of job-related traits.

So what about the models of our four assessments? HPI started from an existing model, the CPI. HPI used factor analysis to establish its seven dimensions but kept the 41 sub-scales to provide a rich, high-resolution picture of the personality, resisting the idea of reducing the model to only seven scales. The OPQ32r includes the 32 traits that its authors found important for predicting competencies at work, and they have not optimized the structure with factor analysis. Facet Five is a typical factor analytic model, developed by the book, although the five dimensions emerging from its data are not fully aligned with the traditional Big Five.

The model of Trait-Map was developed the other way around. The authors started from an existing academic factor analytic test, the IPIP-NEO, which has a picture-perfect Big Five structure (30 traits in 5 dimensions), and reworked part of the scale definitions and items to make them more relevant to work competencies. In a sense, they did the opposite of academic development, which starts from “messy” real-world items and purifies them. They started with a crystallized structure and embedded some “real-world content” into it. The resulting inter-scale correlation table is one where you can still clearly recognize the Big Five, but there are a few outliers.

As we can see, these four tools have taken very different approaches in creating their models. Which model is the best? I would say the one that the practitioner is the most familiar with or the one that feels the most intuitive.

Main Difference 2: “ipsative” versus “normative” questionnaires

This is an important concept, and it is worth the effort to understand it. Let’s start with the definitions from Wikipedia and see what the words “ipsative” and “normative” mean:

Ipsative: “Ipsative (/ˈɪpsətɪv/; Latin: ipse, "of the self") is a descriptor used in psychology to indicate a specific type of measure in which respondents compare two or more desirable options and pick the one that is most preferred (sometimes called a "forced choice" scale).”

Normative: “A norm-referenced test (NRT) is a type of test, assessment, or evaluation which yields an estimate of the position of the tested individual in a predefined population, with respect to the trait being measured.”

In short, the ipsative format forces people to make “either-or” choices between items, while the normative format gives people more freedom and compares their responses to the responses of a sample of people. To use a metaphor, imagine you run a study on people's shopping habits. The ipsative method would say: “We give you 50 Euro. Please go to this supermarket, spend 50 Euro there, and please buy the items you typically buy. You need to spend 50 Euro, no more, no less. You can keep the purchased items after the shopping.”

The normative version would be: “We give you 100 Euro. Please go to this supermarket, and please buy the items you typically buy. You can spend any amount of the 100 Euro, but you need to return us the remaining money. You can keep the purchased items after the shopping.”

Both methods would provide some valid data about the buying behavior of people, but both methods would bring in some distortions too.

Let’s use another metaphor, closer to the world of questionnaires. Imagine we want to know which subjects people preferred back in their primary school days, let's say among Math, History, Drawing, Music, and Physical Education. The ipsative questionnaire would say: “Please rank the following subjects in the order of how much you liked them. Write “1” in the bracket next to your favorite subject, “2” next to the second favorite, … and “5” next to the subject you liked the least.” The normative questionnaire would say: “Please rate from 1 to 5 how much you liked each of the following subjects. “1” means hated it, “2” means somewhat liked, … “5” means loved it.”
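The school-subject example can be sketched in a few lines of Python. The two respondents and their answers are invented for illustration, and I convert ranks to points (the favorite gets 5, the least liked gets 1) so that higher always means "liked more". This is a toy scoring rule, not the scoring of any of the four inventories.

```python
SUBJECTS = ["Math", "History", "Drawing", "Music", "Physical Education"]

def ipsative_scores(ranking):
    """ranking: subjects ordered from most liked to least liked.
    The favorite gets 5 points, the least liked gets 1."""
    n = len(ranking)
    return {subject: n - position for position, subject in enumerate(ranking)}

# Two hypothetical respondents: Anna loved school, Ben mostly disliked it.
anna_rank = ["Music", "Drawing", "History", "Math", "Physical Education"]
ben_rank = ["Physical Education", "Math", "Music", "History", "Drawing"]
anna_rate = {"Math": 4, "History": 5, "Drawing": 5, "Music": 5, "Physical Education": 4}
ben_rate = {"Math": 2, "History": 1, "Drawing": 1, "Music": 2, "Physical Education": 3}

anna_ips = ipsative_scores(anna_rank)
ben_ips = ipsative_scores(ben_rank)

# Ranking: both totals are 15, so only the order of preferences is captured.
ipsative_totals = (sum(anna_ips.values()), sum(ben_ips.values()))

# Rating: the totals differ (23 versus 9), so the overall level is captured,
# but it also depends on how generously each person uses the scale.
normative_totals = (sum(anna_rate.values()), sum(ben_rate.values()))

print(ipsative_totals, normative_totals)
```

The ranking tells us that Ben liked Physical Education best, but not that he disliked school overall. The rating shows the overall difference, but we cannot tell how much of it is real and how much is response style.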

Again, we can see that both methods can provide some useful data, and both methods suffer from some distortions and weaknesses. This is very similar in using ipsative and normative questionnaires in personality testing. Having this understanding, now we can sink our teeth into the actual comparisons, going through the strengths and weaknesses of each test.

HPI Hogan Personality Inventory

The HPI began in the late 1970s as a university project, using the California Psychological Inventory model as the starting point. The idea was that self-presentation of one’s reputation is a good predictor of behavior. The authors created a set of unique True/False normative questionnaire items with creative content. The unusual format and content take people by surprise and reduce the risk of faking. The authors gathered data and conducted correlation analysis to explore the structure of the construct, and they organized their items into 7 Dimensions and 41 sub-scales, which they call Homogeneous Item Composites (HICs).

Strength: a wealth of research data, publications, and materials. The HPI has well-documented norms with large sample sizes.

Weakness: the HPI started as a university project, and even though it was normed with huge numbers of working adults in all sorts of professions, the questionnaire items show their roots in general psychology, and some of the items seem irrelevant to the world of work. Some HPI sub-scale names also sound strange in the workplace (e.g. No Guilt, Identity, Likes Parties, Exhibitionistic, or Easy to Live With).

SHL OPQ32r

Personality testing before the 1980s was the realm of psychologists, and usage by organizations was not widespread. The California Psychological Inventory and the 16PF were the most often used assessments of the time. If you look at a CPI or 16PF sample report, you will be confronted with terminology and graphs that challenge the uninitiated. The HPI was a step towards today's practice, but it was not the revolution people needed. There was a huge gap between the available personality tests and the demand from the workplace for more applicable tests that HR folks (personnel was the name back then) could also understand. This was the scene in which SHL published the OPQ in 1984, a questionnaire that could be used by trained HR people, effectively breaking the monopoly of psychologists. OPQ was the revolution people needed. The OPQ was actually a family of personality questionnaires, not just one test. The OPQ Pentagon measured the Big Five; the Octagon, Factor, and Concept versions of the OPQ measured 8, 16 and 30 scales respectively. Interestingly, the multi-scale versions became more successful, and practitioners favored the ipsative versions. This led to the development of the OPQ32i, which became very popular in large corporations worldwide and dominated the assessment scene for many years.

It seemed odd that the OPQ32i, the preferred tool of practitioners, was an ipsative questionnaire based on a theoretical model. This went very much against what is taught in psychometrics classes at universities, and SHL, founded and led by psychologists, got a lot of heat from their colleagues. SHL tried to promote the factor analytic OPQ model and the OPQ32n (a version with normative questions), but practitioners did not pick up these “scientific” versions, and SHL finally abandoned development in these directions. SHL found a way out of the awkward fallout with academics by switching to an Item Response Theory-based (probabilistic) scoring of their forced-choice questionnaire, creating the current OPQ32r version. The new version also made the marathon-length OPQ 25% shorter.

Strengths: ipsative questionnaire format that is somewhat more resistant to faking. The comprehensive profile helps to generate competency, team role, leadership, follower style and sales style reports, and there are many resources to support application, for example integration with 360 to create competency/potential reports. The OPQ also has global reach and support, and it has been translated into more than 30 languages.

Weakness: The questionnaire is long (the longest among these four).

Facet Five

Assessment practitioners tend to prefer ipsative (forced-choice) questionnaires, while academics dislike ipsative data and prefer normative data. Academics can’t do their favorite calculations with ipsative data. Normative questionnaires also have problems, but those issues don’t disturb the calculations used in academic publications. Therefore, the academic world uses normative questionnaires. The ipsative format, in general, is better at preventing distortion caused by social desirability (faking) and shows bigger differences in profiles, and these are features that real-world practitioners like. Facet Five was an attempt to prove that normative personality questionnaires can outperform ipsative ones in the real world too. The authors used a creative “semantic opposite description pairs” item format. The unusual format and content take people by surprise and reduce the risk of faking. The personality model is also creative. The authors reduced the number of scales using factor analysis, achieving a relatively lean personality model with 13 Sub-Scales in 5 Dimensions.

Strengths: a relatively simple profile among wide-band personality inventories, thanks to the low number of scales. Individual profiles are sorted into one of 16 types called “families”. The authors created comprehensive narrative reports for each family, which enables Facet Five to provide superb narrative reports that are easy to read and understand.

Weakness: the content of the scales is somewhat limited: the “Openness” Big Five dimension is missing. Because you need to read two descriptions for every item, the questionnaire is relatively long, especially considering the low number of scales. The “family” approach introduces some error into the narrative reports.


Trait-Map

Trait-Map takes a different approach in the ipsative-normative debate. The authors noticed an important feature of personality that is seldom emphasized, but that everybody intuitively agrees with: while we can rank people from many perspectives, for example by age, height, IQ or skill level, we cannot reasonably claim that some people have “higher personality” or “more personality” than others. Translated into the terminology of test development, this common-sense observation means that people differ in their trait composition, but the total sum of personality (the total sum of their trait scores) is the same for everyone. Therefore, ipsative questionnaires may be better measures of personality than normative ones. Unlike SHL, the authors of Trait-Map saw the challenge of ipsative questionnaires not in the scoring method, but in the fact that items in the same block interfere with each other, and that this interaction is a source of distortion. Another problem is length: ipsative questionnaires tend to be much longer. The Trait-Map questionnaire design is an answer to these problems, using mathematics (combinatorial optimization) to minimize this distortion by spreading it evenly among the 25 traits.
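To give a feeling for what "spreading the interference evenly" can mean, here is a small illustrative sketch. It is not the actual Trait-Map block layout or optimization procedure. Instead of running an optimizer, it uses a simple cyclic construction with a hypothetical base block, chosen so that every trait appears in exactly 5 of the 25 blocks and no two traits ever share a block more than once. The ranking-based scoring (5, 4, 3, 2, 1 points per block) is likewise an assumption for the demonstration.

```python
from collections import Counter
from itertools import combinations

N_TRAITS = 25
# Hypothetical base block: all pairwise differences modulo 25 are distinct,
# so shifting it cyclically never puts the same two traits together twice.
BASE_BLOCK = (0, 1, 3, 8, 12)

def build_blocks(n_traits=N_TRAITS, base=BASE_BLOCK):
    """Return n_traits blocks; block b holds traits (b + offset) mod n_traits."""
    return [tuple((b + offset) % n_traits for offset in base)
            for b in range(n_traits)]

def score(rankings):
    """rankings: one tuple per block, traits ordered from 'most like me'
    to 'least like me'. Each block hands out 5, 4, 3, 2 and 1 points."""
    totals = Counter()
    for ranked in rankings:
        for position, trait in enumerate(ranked):
            totals[trait] += len(ranked) - position
    return totals

blocks = build_blocks()

# How often each trait appears across the whole questionnaire.
appearances = Counter(trait for block in blocks for trait in block)
# How often each pair of traits competes inside the same block.
pair_counts = Counter(
    tuple(sorted(pair)) for block in blocks for pair in combinations(block, 2)
)

# Two opposite ways of answering: the total sum of trait scores is identical.
as_listed = score(blocks)
reversed_order = score([block[::-1] for block in blocks])

print(set(appearances.values()), max(pair_counts.values()))
print(sum(as_listed.values()), sum(reversed_order.values()))
```

Whatever order a respondent chooses within the blocks, the trait scores always add up to 25 × 15 = 375 points. This is the "same total sum of personality for everyone" property described above, and the balanced layout keeps any single pair of traits from competing against each other repeatedly.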

Strengths: ipsative questionnaire format that is somewhat more resistant to faking. Comprehensive, yet relatively easy to use. As the youngest test in the group, Trait-Map has items that are the closest to the language of contemporary business. The questionnaire is the shortest, and the scale names and charts are the most user-friendly among these four assessments.

Weakness: fewer publications and available translations.

Written by: Gabor Nagy

More information about Trait-Map: Personality Assessment: Trait-Map®