Six domains of personality

A Spectral Clustering Approach to the Structure of Personality: Contrasting the FFM and HEXACO Models

Alternative analytic methods may help resolve the dimensionality of personality and the content of those dimensions. Here we tested the structure of personality using spectral clustering and conventional factor analysis. Study 1 analysed responses from 20,993 subjects taking the 300-item IPIP NEO personality questionnaire. In the factor analysis, a five-factor solution recovered the FFM domains, while the six-factor solution yielded only a small, hard-to-interpret sixth factor. By contrast, spectral clustering yielded six-cluster solutions congruent with the HEXACO model. Study 2 analysed data from 1,128 subjects taking the 100-item HEXACO-PI-R. Unambiguous support was found for a six-cluster solution. The psychological content of the six clusters and their relationship to the FFM domains are discussed.


Introduction
Taxonomy is basic to any science, often being referred to as the "facts" of a field, which theories then compete to account for (McCrae & John, 1992). The end of the 20th century saw the emergence of a broad consensus that the basic dimensions of personality consist of five orthogonal, broad-bandwidth domains, with numerous facets clustered beneath these core domains. Variants such as the Five Factor Model (FFM) (McCrae & Costa, 1997, 2003) or the Big Five (Goldberg, 1990) have often been seen as opportunities to refine and redefine the orientation of domains within this space, but both support five basic domains of personality. Considering whether there are additional basic factors of personality not already included in the FFM, McCrae and John (1992, p. 190) concluded that this "appears increasingly unlikely, given the wealth of data in support of the comprehensiveness of the FFM". However, while the item space of large FFM inventories may be comprehensive, it remains possible that the contents of personality may usefully be arranged in terms of other than five basic domains.
One actively researched alternative to the FFM, the HEXACO model, suggests that personality consists of six rather than five basic domains, the additional domain being Honesty-Humility (Ashton & Lee, 2007). Models with fewer domains, for instance Eysenck's PEN structure, also retain some support (e.g. Tiliopoulos, Pallier, & Coxon, 2010). A second area of research focuses on psychological content: the characteristics of the high and low poles of each basic domain of personality (e.g. Costa & McCrae, 1998). Alternative methods for analysing the structure contained in personality data may yield differential support for these competing models or elucidate the differences between them.
Measuring the structure of personality requires both that the questionnaire or other data adequately sample the universe of personality variation, and that the analytic methods applied to these data are sensitive to this information. The Five Factor Model emerged from a research programme designed, above all, to generate a comprehensive sampling of human personality (McCrae & John, 1992). If the FFM item universe is comprehensive, then adequate analytic tools should reveal the basic dimensions of personality, even if these number fewer or more than five. In Studies 1 and 2, we apply one such tool, spectral clustering, to two large data sets (n = 20,993 and n = 1,128), one sampling the FFM domains and one explicitly designed to sample the HEXACO domains.
Before proceeding to these two empirical studies, we first outline the logic of spectral clustering, drawing on the reader's existing knowledge of factor analysis.

Spectral Clustering
Factor analysis and confirmatory factor analysis (Joreskog, 1969) remain the primary techniques for exploring structure in questionnaire data and are the foundation of the FFM. A set of well-known methods has emerged for selecting the number of factors in exploratory factor analysis, both visually (Cattell, 1966) and analytically (Horn, 1965; Revelle & Rocklin, 1979). A large body of research has also contrasted distinct solutions for personality structure via exploratory and confirmatory techniques (e.g. Church & Burke, 1994) and sought criteria for evaluating the structure of personality. Confirmatory factor analytic studies of the FFM have suggested poor fit to the theorised model (Gignac, Bates, & Jang, 2007).
Newer exploratory SEM approaches, however, which relax the strict criteria of CFA by allowing an EFA structure at the item level, indicate that well-fitting five-factor models can be constructed within this framework (Marsh et al., 2010). Interest in exploratory SEM is growing, as is interest in criteria for evaluating structure more generally (Hopwood & Donnellan, 2010).
Alongside conventional factor-analytic approaches, alternative analytic strategies widely used outside personality research have also been applied to the question of the basic structure of personality (Tiliopoulos et al., 2010). Among clustering techniques in particular, spectral clustering (used in the present report) has emerged as a valuable tool (Von Luxburg, 2007).
While spectral clustering shares with factor analysis the basic objective of creating a low-dimensional representation of data, it differs substantially in its optimisation target (see e.g., Braun, Leibon, Pauls, & Rockmore, 2011; Leibon, Pauls, Rockmore, & Savell, 2008; Ng, Jordan, & Weiss, 2001). Factor analysis operates on the correlation matrix and tries to reproduce it as closely as possible in a smaller number of dimensions (Cattell, 1978; Spearman, 1927).
Spectral clustering also summarises the data in fewer dimensions, but differs from factor analysis in two ways. First, it operates on a spatial transformation of the correlation matrix, in which each item becomes a point in space. Second, it attempts to segregate these items into discrete clusters, so that similar items are kept in the same cluster while dissimilar items are put into different clusters (see e.g., Braun et al., 2011; Leibon et al., 2008; Ng et al., 2001).
Some aspects of spectral clustering have close analogues in factor analysis. For instance, spectral clustering has a cluster-number parameter (k) corresponding to the number of factors requested for extraction in conventional factor analysis. Other aspects, however, are quite distinct. In particular, spectral clustering translates correlations among items into distances (sometimes described as adjacencies, the inverse of distances). It can then transform these distances to emphasise particular kinds of relationships by varying what is known as the scale parameter (sigma).
The explanation below is designed to give the casual reader a good intuition for spectral clustering. Online supplementary material provides a technical description, a full list of references, and a runnable Matlab implementation for readers wishing to implement the procedure or learn more about it. Readers interested in the formal mathematical detail should consult the source references.

Mapping correlations as points in space
The algorithm underlying spectral clustering converts the correlation matrix into a spatial representation: points in space (see Figure 1). Doing this requires converting each correlation into a measure of distance (i.e. how far apart is each pair of items?). The distance measure has two important properties. (i) It cannot fall below zero, since distances cannot be negative. (ii) Distances are smallest when correlations are largest, so that strongly correlated items are close together in space. They are larger for independent items, and largest for negatively correlated items.
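The paper does not state which correlation-to-distance transform its implementation uses. The sketch below assumes the common choice d = sqrt(2(1 - r)). This is the Euclidean distance between two standardized item-response vectors scaled to unit length. The sketch shows both properties listed above: distances are never negative, and they grow steadily as the correlation falls from +1 to -1. The function name `correlation_to_distance` is illustrative, not taken from the authors' Matlab code.

```python
import numpy as np

def correlation_to_distance(r):
    """Map correlations in [-1, 1] to non-negative distances.

    Assumption: d = sqrt(2 * (1 - r)), i.e. the Euclidean distance between two
    standardized, unit-length item vectors. d = 0 at r = 1, sqrt(2) at r = 0,
    and 2 at r = -1.
    """
    r = np.clip(np.asarray(r, dtype=float), -1.0, 1.0)
    return np.sqrt(2.0 * (1.0 - r))

# A strongly positive, an independent, and a strongly negative item pair.
print(correlation_to_distance([0.9, 0.0, -0.9]))
```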
Figure 1 shows the conversion of a conventional n x n correlation matrix of variables into spatial distances between these variables. Items that correlate strongly positively are placed near each other, and items with a strong negative correlation are placed farthest apart. Intermediate correlations translate into intermediate distances. The 3-item case depicted in Figure 1 is chosen so that every item distance can be realised in the 2-D plane of the page. In the general case, mapping n items can require up to n-1 spatial dimensions. With the translation from correlations into distances achieved, Figure 2 shows how a spatial representation of items is split into clusters.
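The sketch below makes the dimensionality claim concrete. It recovers point coordinates from a distance matrix with classical (Torgerson) multidimensional scaling. This is a standard technique and not necessarily what the authors used to draw Figure 1. The sketch again assumes the hypothetical transform d = sqrt(2(1 - r)). Under that choice, any valid 3 x 3 correlation matrix embeds exactly in two dimensions, matching the 3-item case described above. The correlation values are made-up illustrations.

```python
import numpy as np

def classical_mds(D):
    """Torgerson classical MDS: coordinates whose pairwise distances reproduce D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                   # double-centred squared distances
    evals, evecs = np.linalg.eigh(B)              # ascending eigenvalues
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]
    keep = evals > 1e-9                           # keep only dimensions actually needed
    return evecs[:, keep] * np.sqrt(evals[keep])

# Hypothetical 3-item correlation matrix (positive definite).
R = np.array([[1.0, 0.7, -0.5],
              [0.7, 1.0, -0.3],
              [-0.5, -0.3, 1.0]])
D = np.sqrt(2.0 * (1.0 - R))                      # assumed distance transform
X = classical_mds(D)
print(X.shape)  # (3, 2): three items fit exactly in a 2-D plane
```

With n items, the centering step leaves at most n - 1 non-zero eigenvalues. This is why up to n - 1 dimensions can be required.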

The clustering operation (see Figure 2) creates solutions with k clusters of items by cutting the paths connecting item pairs. This generates sets of items that are connected to each other within a cluster but not to any item outside the cluster. The procedure is iterative and can take considerable time. Multiple alternative cutting solutions are generated and compared against a criterion: minimising the total length of remaining connections while forming k clusters of items. As Figure 2 shows, this criterion produces clusters consisting of items that are close to one another but distant from items in other clusters. We next discuss a feature of spectral clustering that allows it to re-scale the translation of correlations into distances. This feature plays an important role in allowing the method to detect structure in data.
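Ng, Jordan, and Weiss (2001), cited above, describe a widely used spectral clustering recipe that approximates this kind of minimum-cut partition:

1. Build a Gaussian affinity matrix from the distances.
2. Normalise it.
3. Take its k leading eigenvectors.
4. Run k-means on the normalised rows.

The sketch below follows that recipe under stated assumptions: the distance transform d = sqrt(2(1 - r)), the affinity exp(-d^2 / (2 sigma^2)), and a simple deterministic k-means. It is an illustration, not the authors' supplementary Matlab implementation, and all function names are hypothetical.

```python
import numpy as np

def gaussian_affinity(R, sigma):
    """Item affinities; assumes d = sqrt(2(1 - r)) and a Gaussian kernel."""
    sq_dist = 2.0 * (1.0 - R)
    A = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)                      # no self-connections
    return A

def kmeans(X, k, n_iter=100):
    """Plain Lloyd k-means with deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(1, k):
        gap = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(gap)])         # farthest point from chosen centers
    centers = np.array(centers)
    for _ in range(n_iter):
        sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq, axis=1)
        updated = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
        if np.allclose(updated, centers):
            break
        centers = updated
    return labels

def spectral_cluster(R, k, sigma):
    """NJW-style spectral clustering of items from their correlation matrix R."""
    A = gaussian_affinity(R, sigma)
    deg = A.sum(axis=1)
    M = A / np.sqrt(np.outer(deg, deg))           # D^{-1/2} A D^{-1/2}
    _, vecs = np.linalg.eigh(M)                   # eigenvalues in ascending order
    U = vecs[:, -k:]                              # k leading eigenvectors
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    return kmeans(U, k)

# Three hypothetical domains of four items each.
truth = np.repeat([0, 1, 2], 4)
R = np.where(truth[:, None] == truth[None, :], 0.6, 0.1)
np.fill_diagonal(R, 1.0)
print(spectral_cluster(R, k=3, sigma=0.5))
```

With within-block r = .6 and between-block r = .1, the items in each block share a label, and the three blocks receive distinct labels. Production code would typically run k-means from several random starts rather than the single deterministic initialisation used here.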

Scale Parameter: Sigma
A specific advantage of the spectral clustering algorithm is its inclusion of a scale parameter, sigma (see Figure 3). By adjusting this parameter, it is possible to vary the relative weight placed on the weakest versus the strongest correlations when performing the optimisation. The scale parameter has some analogues in processes widely used to re-weight correlation matrices in other fields (Sammon, 1969), but has not been deployed in factor analysis of personality. The ability to re-weight correlations is valuable for discovering and understanding structure within data. In particular, setting the scale parameter to a low value emphasises strong correlations among pairs of items. For personality data, scale matters most when a questionnaire is thought to contain only a small number of items strongly targeting a domain. In the present case, for instance, we expect a questionnaire designed primarily to assess the FFM domains to contain relatively few Honesty-Humility items. If the HEXACO model is correct, these items will nevertheless show strong correlations with each other and relatively weak correlations with items in the FFM domains. Figure 4 shows how a low value of sigma can correctly identify clusters represented by only a sparse set of valid items. Low values of sigma increase all inter-item distances but magnify large distances disproportionately. This increases the distance between valid clusters so that they can be more readily identified. This effect of sigma is material for Study 1 below, where we examine personality structure in a questionnaire believed to sample one domain only sparsely.
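The effect of sigma can be seen directly in the affinity function. Assume the same Gaussian kernel on d = sqrt(2(1 - r)) as above; this is an assumption about the implementation. The affinity then simplifies to exp(-(1 - r) / sigma^2). So the ratio between the affinity of a strongly correlated pair and that of a weakly correlated pair grows rapidly as sigma shrinks. The correlations .6 and .2 are arbitrary illustrative values.

```python
import numpy as np

def affinity(r, sigma):
    # Assumed kernel: exp(-d^2 / (2 sigma^2)) with d^2 = 2(1 - r),
    # which simplifies to exp(-(1 - r) / sigma^2).
    return np.exp(-(1.0 - r) / sigma ** 2)

for sigma in (1.0, 0.7, 0.4):
    ratio = affinity(0.6, sigma) / affinity(0.2, sigma)
    print(f"sigma={sigma:.1f}: strong/weak affinity ratio = {ratio:.2f}")
```

For these two pairs, the ratio is exp(0.4), about 1.5, at sigma = 1, but exp(2.5), about 12, at sigma = 0.4. At low sigma, the few strong correlations therefore dominate the optimisation. This is the behaviour the text relies on for detecting sparsely sampled clusters.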
----------Insert Figure 4 about here ----------
When all domains are well represented in the original set of items, changing sigma will not necessarily have any effect on the best-fitting value for cluster number (k) (see Figure 5).
With adequate numbers of items measuring each domain, the corresponding cluster can readily be identified at all values of sigma (this effect is material for our hypotheses in Study 2 below).
----------Insert Figure 5 about here ----------
Thus spectral clustering affords the possibility, but not the necessity, of discovering additional meaningful clusters. It can provide a robustness check when compared to factor analysis. Because of the control it offers over sensitivity to item relationships, it can also detect clusters sampled by relatively few items. Next, we apply this method to a large data set of IPIP NEO items.

Introduction
In Study 1 we conducted factor-analytic and spectral clustering analyses of responses to a battery of 300 items selected to represent the dimensions of the Five Factor Model (Johnson, 2005). This allows us to contrast support for the major competing personality models: FFM models, six-domain HEXACO models, and models arguing for other numbers of basic domains, such as Eysenck's three-factor PEN model (S. B. Eysenck, Eysenck, & Barrett, 1985). To the best of our knowledge, this is the first application of spectral clustering to personality data.
If five domains correctly describe personality, then we expect spectral clustering to recover them, as factor analysis does. If, however, the six-domain HEXACO model (Ashton & Lee, 2005) is correct and the NEO items richly sample the sixth domain, then spectral clustering should recover a six-cluster solution at all values of the scale parameter sigma.
Finally, suppose six clusters correctly characterise personality structure but the IPIP NEO items only sparsely sample the Honesty-Humility domain. Then the six-cluster solution should fit well at lower values of the scale parameter. This follows because small values of sigma optimise the detection of clusters that contain relatively few items sampling a domain, when those few items are weakly related to the bulk of the items. In a questionnaire optimised to sample the domains of the five-factor model, relatively few questions are likely to sample Honesty-Humility primarily. We therefore deemed this last outcome the most likely.

Method
Participants. For this analysis, we used data originally collected by John A. Johnson (2005). Johnson's study included over twenty thousand online responses to a five-factor personality inventory. Subjects were not actively recruited; they either discovered the web site on their own or heard about it via word of mouth. In total, 23,994 subjects participated between August 6, 1999 and March 18, 2000. "Reported ages ranged from 10 to 99, with a mean age of 26.2 and SD of 10.8 years." (Johnson, 2005, p. 113). One of Johnson's aims was to discover effective strategies for detecting and excluding invalid data in online data acquisition. In the end, 20,993 of the original 23,994 participants' submissions were retained. Exclusions removed subjects with long strings of identical or missing responses, and duplicate submissions detected via an algorithm comparing duplicate IP addresses, time intervals, and nickname overlap. The final sample consisted of 13,249 females (mean age 26.1 years) and 7,744 males (mean age 26.2 years).
Measures. Subjects completed 300 items from the International Personality Item Pool (IPIP) NEO questionnaire, which was developed to measure the same constructs as the NEO-PI-R (Goldberg et al., 2006). The test includes five domain-level constructs: Neuroticism (N), Extraversion (E), Openness (O), Agreeableness (A), and Conscientiousness (C). Both the IPIP NEO and the NEO-PI-R measure six facets per domain. The mean correlation of facets on the IPIP proxy with the corresponding NEO-PI-R facets is 0.94 after correcting for unreliability (Goldberg, 1999).

Analyses
Prior to analysis, reverse-scored items were re-coded. Spectral clustering was run across a range of cluster numbers (k = 2 to 10) and a range of values of sigma (0.4 to 1 in steps of 0.05). We did not consider values of sigma below 0.4, because below this threshold the resulting network of items becomes disconnected (many of the distances tend towards infinity). Best-fitting solutions were chosen by cluster consistency evaluation. This method first generates solutions for all studied levels of k and sigma in the full data set. Additional solutions are then generated from random samples of half the items, repeated 1000 times for each combination of k and sigma. Inconsistency is the proportion of sampled items whose cluster classification differs from their classification in the full-item clustering. The best-fitting values of k and sigma are those whose solutions show the lowest proportion of reclassified items across the 1000 repetitions. The analysis took several days of computer time.
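The consistency criterion can be sketched as follows. Cluster labels are arbitrary, so before counting reclassified items the sketch matches subset labels to full-data labels. It uses whichever relabelling minimises disagreement. This matching is an assumption: the text does not say how the authors aligned labels. `cluster_fn` stands for any clustering routine taking a correlation matrix and k, for example a spectral clustering function. The seed-based clusterer in the usage example is a hypothetical stand-in that keeps the snippet self-contained.

```python
import itertools
import numpy as np

def misclassification(ref, new, k):
    """Fraction of items whose labels disagree, minimised over relabelings of `new`."""
    ref, new = np.asarray(ref), np.asarray(new)
    best = 1.0
    for perm in itertools.permutations(range(k)):
        relabeled = np.array(perm)[new]
        best = min(best, float(np.mean(relabeled != ref)))
    return best

def consistency_error(R, k, cluster_fn, n_reps=1000, seed=0):
    """Mean and SE of misclassification when random halves of the items are re-clustered."""
    rng = np.random.default_rng(seed)
    n = R.shape[0]
    full = cluster_fn(R, k)                       # labels from the full item set
    errors = np.empty(n_reps)
    for rep in range(n_reps):
        idx = np.sort(rng.choice(n, size=n // 2, replace=False))
        sub = cluster_fn(R[np.ix_(idx, idx)], k)  # re-cluster the half sample
        errors[rep] = misclassification(full[idx], sub, k)
    return errors.mean(), errors.std(ddof=1) / np.sqrt(n_reps)

def seed_cluster(R, k):
    """Hypothetical stand-in clusterer: choose k mutually dissimilar seed items,
    then give each item the label of the seed it correlates with most."""
    seeds = [0]
    while len(seeds) < k:
        closeness = R[:, seeds].max(axis=1)
        closeness[seeds] = np.inf                 # never re-pick an existing seed
        seeds.append(int(np.argmin(closeness)))
    return np.argmax(R[:, seeds], axis=1)

truth = np.repeat([0, 1, 2], 4)
R = np.where(truth[:, None] == truth[None, :], 0.6, 0.1)
np.fill_diagonal(R, 1.0)
mean_err, se = consistency_error(R, 3, seed_cluster, n_reps=200)
print(f"mean misclassification = {mean_err:.3f} (SE {se:.3f})")
```

Brute-force relabelling is fine for small k, but it grows factorially; k = 10 already means 3.6 million permutations. The Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment`) is the usual replacement.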

Results
The goodness-of-fit measure used for parameter selection is the average level of item misclassification in the resampling analyses. Figure 6 shows the minimum percentage classification error across 100 runs for each value of sigma and k (the number of clusters, analogous to the number of factors in factor analysis). Darker shading indicates greater misclassification (poor fit); lighter shading indicates less misclassification (good fit). The minimum error in each row is marked with a star. More than one cell in a row may be starred if the minima are not significantly different from one another. Table 1 gives numerical values of the misclassification rate and its standard error for various values of sigma and k for the IPIP data.
----------Insert Table 1 about here ----------
In spectral clustering, the value(s) of k that minimise misclassification error may vary with sigma. For the IPIP data, the optimal cluster number (k) was five for all values of sigma greater than 0.7. For values of sigma smaller than 0.7, however, several values of k were not statistically significantly different from one another. For sigma between 0.4 and 0.7, minima emerged for both five- and six-cluster solutions, except at sigma = 0.45, where five was not a minimum. A three-cluster solution emerged as a minimum in only one instance, as did a seven-cluster solution.
Thus five-cluster solutions were preferred at high values of sigma. At low values of sigma, which emphasise sparse but strong item clusters, six-cluster solutions fit the data as well or better. This pattern is in line with the hypothesised sparse inclusion of valid Honesty-Humility items in the IPIP NEO inventory.

Comparison of domains from spectral clustering and factor analysis
For comparison with the clustering results, exploratory factor analyses extracting five and six factors were conducted using varimax rotation. In both cases, each item was assigned to the factor on which it had its highest loading after rotation. This allowed item classifications to be compared between the factor-analytic and spectral clustering solutions.
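The item-assignment rule is simple to state in code. The sketch below applies an orthogonal varimax rotation to an unrotated loading matrix. It uses Kaiser's criterion with the widely used SVD-based iteration, without Kaiser row normalisation. Each item is then assigned to the factor with its largest absolute loading. Using the absolute value is an assumption, since the text says only "highest loading". The toy loadings are illustrative and not from the IPIP data.

```python
import numpy as np

def varimax(loadings, max_iter=500, tol=1e-6):
    """Orthogonal varimax rotation (Kaiser's criterion), SVD-based iteration."""
    L = np.asarray(loadings, dtype=float)
    p, m = L.shape
    rotation = np.eye(m)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = L @ rotation
        # Gradient-like target of the varimax criterion.
        target = rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(L.T @ target)
        rotation = u @ vt                         # nearest orthogonal matrix
        new_criterion = s.sum()
        if criterion != 0.0 and new_criterion < criterion * (1.0 + tol):
            break
        criterion = new_criterion
    return L @ rotation

def assign_items(loadings):
    """Assign each item to the factor with its largest absolute loading."""
    return np.argmax(np.abs(loadings), axis=1)

# Illustrative simple structure: items 0-2 on factor 0, items 3-5 on factor 1,
# then rotated 30 degrees to mimic an unrotated extraction.
true_factor = np.array([0, 0, 0, 1, 1, 1])
L0 = np.zeros((6, 2))
L0[np.arange(6), true_factor] = [0.8, 0.7, 0.6, 0.8, 0.7, 0.6]
theta = np.deg2rad(30.0)
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]])
unrotated = L0 @ T
rotated = varimax(unrotated)
print(np.round(rotated, 3))
print(assign_items(rotated))
```

Rotation preserves each item's communality (the row sum of squared loadings), so only the assignment of items to factors changes. Counting agreements with cluster labels, as in the 244-of-300 comparison, would first require matching factor and cluster labels, since both are arbitrary.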
-------Insert Table 2 about here -------------
For the five-cluster and five-factor models, item classifications were very similar: 244 of the 300 items were assigned identically in the factor and cluster solutions. Both solutions closely resembled the conventional Five Factor Model domains of Neuroticism, Extraversion, Openness, Agreeableness, and Conscientiousness. By contrast, the six-cluster and six-factor solutions differed sharply. In the factor-analytic solution, just seven of the 300 items had their highest loading on the sixth factor. Moreover, these were drawn from disparate facets unrelated to theoretical models. Four items were originally located in the Self-Consciousness facet (N4) of Neuroticism, and one item came from each of three additional facets of Agreeableness or Neuroticism. The six-cluster solution (see Table 2) yielded a substantial 38-item sixth cluster with a coherent pattern of item sources: 12 items originally from the IPIP NEO Conscientiousness domain, 20 from Agreeableness, and six from Neuroticism. Thus the sixth factor was a small nuisance factor drawing most of its members from a single facet, whereas the sixth cluster was both substantial and meaningful. This held across multiple levels of sigma: all six-cluster solutions from Study 1 were similar in item content, number of items, and resemblance to the HEXACO model.
Finally, a significant change from the FFM alignment of facets was seen for Extraversion, with 14 items shifting out of Extraversion and into Conscientiousness. All 10 items from the Activity Level facet (E4; e.g. "Like to take it easy" (reversed) and "Am always busy") shifted to Conscientiousness. Four items from the Extraversion Assertiveness facet (E3; e.g. "Take charge" and "Take control of things") were also absorbed into the Conscientiousness cluster.
To test whether (and how) the Honesty-Humility and other clusters of the six-cluster solution map onto the five-factor solution in this sample, we derived scores on each cluster and on each factor for each subject. We then computed the correlations among these scores (Table 3). As can be seen, scores on the Honesty-Humility cluster correlated with five-factor domain scores for C (.61) and A (.52), but also with O (-.32), E (-.38), and (weakly) N (.1).
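The text does not say how cluster and factor scores were computed. The minimal sketch below assumes unit-weighted scores, meaning the mean of each subject's responses to the items assigned to a cluster. It then computes Pearson correlations between the two sets of scores. The simulated responses and both item partitions are hypothetical.

```python
import numpy as np

def cluster_scores(responses, labels):
    """Unit-weighted score: mean of each subject's responses to a cluster's items."""
    labels = np.asarray(labels)
    return np.column_stack([responses[:, labels == c].mean(axis=1)
                            for c in np.unique(labels)])

def cross_correlations(scores_a, scores_b):
    """Pearson correlations between every column of scores_a and every column of scores_b."""
    ka = scores_a.shape[1]
    full = np.corrcoef(scores_a, scores_b, rowvar=False)
    return full[:ka, ka:]

# Simulated data: 500 subjects, 6 items, items 0-2 driven by trait 0, items 3-5 by trait 1.
rng = np.random.default_rng(1)
traits = rng.standard_normal((500, 2))
item_loadings = np.repeat(np.eye(2), 3, axis=0)
responses = traits @ item_loadings.T + 0.5 * rng.standard_normal((500, 6))

coarse = np.array([0, 0, 0, 1, 1, 1])             # hypothetical 2-cluster partition
fine = np.array([0, 0, 1, 1, 2, 2])               # hypothetical 3-cluster partition
C = cross_correlations(cluster_scores(responses, fine), cluster_scores(responses, coarse))
print(np.round(C, 2))
```

Each row of `C` corresponds to a cluster in the finer partition and each column to a cluster in the coarser one. This mirrors the layout of a cluster-by-factor correlation table.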

Discussion
Study 1 tested whether spectral clustering would extract an interpretable and well-fitting six-cluster solution from IPIP NEO item data when the scale parameter was small enough to enhance sensitivity to thinly sampled but strong item associations. The results supported this hypothesis: five clusters were extracted at larger values of the scale parameter, but six clusters fitted better at smaller values. It is worth reiterating that changing the scale parameter will not necessarily alter the results of spectral clustering (e.g. in Study 2 below, the scale parameter turns out not to affect the results). When results do differ across scales, this can provide insight into the structure of the data. In this case, the results suggest that the IPIP NEO does sample, albeit weakly, behaviours comprising a sixth domain of personality.
In terms of psychological content, the sixth cluster closely followed the Honesty-Humility domain described in the HEXACO model. This is perhaps surprising, given that we worked from an item bank chosen to model the NEO-PI-R, which may not adequately sample the facets of Honesty-Humility (Ashton & Lee, 2005). The sixth cluster was populated with Agreeableness and Conscientiousness items. This supports the observation by Lee and Ashton (2004) that their Honesty-Humility factor loads primarily on NEO facets falling within the Agreeableness and Conscientiousness domains when only five factors are extracted. Ashton and Lee (2005) found that the NEO-PI-R facets of Straightforwardness and Modesty were those most closely related to Honesty-Humility. These map onto the IPIP facets of Morality and Modesty.
Thus the prediction that IPIP Morality (A2) and Modesty (A5) items would constitute the Honesty-Humility cluster was supported. In addition, the Honesty-Humility cluster drew strongly on Cautiousness (eight items) and also included four Dutifulness items. The present data, then, accord with the idea that facets usually incorporated into Conscientiousness are part of Honesty-Humility, and thus inform HEXACO theory to some degree. Finally, Ashton and Lee (2005) reported a small negative correlation (-.12) of Honesty-Humility with FFM Neuroticism. This too was supported, with Honesty-Humility being associated with gluttony (overindulgence) and splurge-spending items linked to N in five-factor models.
The six-cluster versions of Neuroticism, Extraversion, Openness, Agreeableness, and Conscientiousness remained similar to their FFM representations, with some notable and informative changes. In particular, most of the Openness to Emotions facet (O3) moved out of Openness in the six-cluster solution. O3 items tapping compassion versus callousness moved to Agreeableness, leaving Openness more focused on curiosity about, and understanding of, emotion. For Extraversion, activity and assertiveness moved to Conscientiousness. Activity is often viewed, along with sociability, as a core aspect of Extraversion (H. J. Eysenck, 1998), so this is a significant change. It leaves Extraversion much more focused on sociability and warmth (Friendliness, Gregariousness, Cheerfulness) and on Excitement-Seeking (for instance, items about enjoyment of travel and concerts moved from Openness to Extraversion).
In summary, spectral clustering at smaller values of the scale parameter in Study 1 revealed six clusters. The sixth cluster was substantive and clearly followed the characteristics of HEXACO Honesty-Humility, suggesting that the IPIP NEO does sample this domain. This suggests that, in a battery containing more items designed to assess Honesty-Humility, this sixth cluster should emerge at all values of sigma. If this interpretation is correct, and the five-cluster solutions found at higher values of the scale parameter in Study 1 reflect the paucity with which the Honesty-Humility domain is sampled in the IPIP NEO, then, in a dataset containing items which load strongly on the Honesty-Humility domain, these five-cluster solutions should never emerge. Instead, six-cluster solutions should be ubiquitously obtained independent of scale. We test this prediction in Study 2 using a large sample of responses to the HEXACO instrument of Ashton and Lee (2010).

Introduction
Study 1 indicated that IPIP NEO items include Honesty-Humility content, which was reliably identified at smaller values of the scale parameter that de-emphasize weaker correlations. In Study 2, we tested the prediction that, in a data set designed to sample Honesty-Humility strongly (i.e., responses to the HEXACO questionnaire), six clusters would be obtained at all values of the scale parameter sigma. Moreover, if the Honesty-Humility items are removed, support for six clusters should be lost and only five clusters should emerge. We tested these predictions in an independent sample of subjects who had completed the HEXACO, as reported by Ashton and Lee (2010).

Participants.
A total of 1,128 subjects, mainly college students, were combined from various samples (see Ashton & Lee, 2010). The final sample consisted of 693 females (mean age 21.0 years), 429 males (mean age 21.6 years), and six subjects who did not provide their sex (mean age 28.0 years). All data were used.

Results
Study 2 used the same spectral clustering algorithms as Study 1. As before, we selected the value of k (the number of clusters) that minimized misclassification error. Figure 7 shows the percentage classification error across 1000 runs for each value of sigma and k. As before, cells marked with stars highlight the minimum misclassification error for each value of sigma. In the HEXACO data, the optimal number of clusters (k), i.e., the one with minimum item misclassification, was six for all values of sigma explored. Table 4 gives numerical values of the misclassification rate and its standard error across sigma and k.
----------Insert Table 4 about here ----------
In Study 1, we hypothesized that the IPIP NEO data contained weak traces of Honesty-Humility, adequately detected only when this information was amplified by appropriately low values of sigma. We further hypothesized that a measure such as the HEXACO questionnaire, replete with items marking Honesty-Humility, would reveal this sixth cluster independent of sigma. As Figure 7 shows, and in line with this hypothesis, support for six clusters was found at all tested values of sigma (the best-fitting solutions are starred and all favor six clusters).

An implicit corollary prediction follows: if the HEXACO data are analysed with all Honesty-Humility items removed, evidence for the sixth cluster should also disappear, and five clusters should be optimal at all values of sigma. Alternative outcomes, such as one of the other domains splitting into two clusters, would imply a bias in the clustering algorithm. It would then be extracting additional domains at low values of sigma, independent of evidence for such clusters.
To test this prediction, we re-ran the spectral clustering analyses on the HEXACO item data after first removing all Honesty-Humility items. The results accorded with prediction: for all values of sigma tested (0.4 to 1), fit was best at k = 5. Thus, with the Honesty-Humility items removed, five clusters were returned at all scales (see Figure 8).

Discussion
Study 2 indicated that HEXACO data contain six clusters, and this solution fitted best at all usable values of sigma (0.4 to 1). In addition, removing the Honesty-Humility items prior to analysis also removed support for the sixth cluster. Jointly, these findings support the hypothesis that six domains of personality generate a well-fitting model. Six domains are also required at all values of sigma when the domain relating to honesty and humility is strongly sampled. The second set of analyses in Study 2 removed the Honesty-Humility items and analysed the remaining subset of HEXACO data. These analyses confirmed the prediction that six-cluster solutions fit poorly once Honesty-Humility is no longer sampled, in a questionnaire whose remaining items were selected to avoid this domain. This indicates that the analyses are not simply splitting factors at small values of sigma.
We next discuss the results of both studies, and their implications.

General Discussion
Four main results emerged. First, spectral clustering, even on IPIP NEO data, yielded a six-cluster solution corresponding to the HEXACO model. The correspondence lay not only in having a sixth cluster, but also in the Honesty-Humility-like nature of the items comprising it. It extended to the effects on the remaining five clusters representing Extraversion, Neuroticism/Emotionality, Openness, Agreeableness, and Conscientiousness. By contrast, factor analysis yielded a six-factor solution containing a small and hard-to-interpret sixth factor. Despite its very different approach, spectral clustering produced a five-cluster solution highly similar to the five-factor solution from factor analysis, both reflecting the FFM. The preference for six-cluster solutions depended on the weight given to weaker connections among the data. More weight favoured a five-cluster FFM solution; less weight favoured either a six-cluster HEXACO solution or a five-cluster FFM solution. This dependence casts light on the distinction between these two competing models of personality structure. It suggests that the IPIP NEO contains traces of an Honesty-Humility dimension that conventional factor analysis misses because it is too thinly sampled.
Second, spectral clustering suggested modifications to the psychological content of the FFM domains, particularly in the six-cluster solution. Third, Honesty-Humility emerged at all scales in HEXACO data. Fourth, after Honesty-Humility items were removed from the HEXACO, a sixth cluster could not be extracted at any value of the scale parameter. This suggests that the HEXACO items assessing the core FFM domains are restricted in content to these domains.

Five Cluster Solutions
The five-cluster solutions reflect the domain structure postulated by the classic FFM. This is significant because the two statistical techniques, factor analysis and spectral clustering, are based on different transformations of the raw data and have different optimisation targets. The present results, then, represent a robustness check on the FFM, which the FFM passes.

Psychological Content of the Six Domains of Personality
As might be expected of competing models of the same data, the FFM and six-cluster HEXACO solutions are similar in many important ways (Table 2). Each of the FFM domains, however, was also modified to a greater or lesser extent; the more substantial of these modifications are discussed next. Neuroticism retained most of its items, losing content from N4 (Self-Consciousness) to Extraversion and from N5 (Immoderation) to Honesty-Humility and, for two items, to Conscientiousness. Openness retained most of its original items, but lost half the content of O3 (Openness to Emotions), in meaningful ways, to Neuroticism/Emotionality: O3 items referring to the direct experience of emotion moved to Neuroticism. The affective content of Openness to Experience, then, was refocused on understanding emotion, with Openness retaining more intellectual items such as "[I] try to understand myself". Extraversion also changed: it lost four items from E3 (Assertiveness) and the entire Activity Level facet (E4) to Conscientiousness. Extraversion was thus refocused on warm sociability and excitement seeking, while Conscientiousness gained an emphasis on capacity for sustained activity and on social leadership. The implications of this change for Conscientiousness are discussed below.
Agreeableness and Conscientiousness were clearly the domains whose focus shifted most when Honesty-Humility was extracted as a separate domain. Agreeableness changed in ways consistent with the theoretical predictions of the HEXACO model (Ashton & Lee, 2005). Six-cluster Agreeableness lost all items relating to Modesty (A5), as well as half the items relating to Morality (A) and Cooperation (A4); these all shifted to Honesty-Humility. Together with its gain of empathy-related content from facet O3, this refocused Agreeableness tightly on prosociality, cooperation and fellow-feeling. The movement of hostility items from Neuroticism to Agreeableness predicted by Ashton and Lee (2007) did not occur. Hostility and Neuroticism, then, appear robustly linked, while Anger more accurately reflects the affect associated with low Agreeableness.
As noted above, Conscientiousness gained the E3 (Assertiveness) and E4 (Activity Level) items from Extraversion, but lost around half of the C3 (Dutifulness) items and most (8 of 10) of the items assessing Caution (C6). These moved to Honesty-Humility, refocusing Conscientiousness away from caution and integrity and towards work capacity and task engagement, as predicted by Ashton and Lee (2007). This modified Conscientiousness domain may more effectively absorb work-capacity-related scales such as Need for Achievement (Duckworth, Peterson, Matthews, & Kelly, 2007; Tellegen, 2000). FFM Conscientiousness correlates highly with measures of effortful persistence (> .65; Duckworth & Quinn, 2009), but it would be valuable to test whether scores on the Conscientiousness cluster extracted in Study 1 are more strongly linked to persistence and achievement. Finally, the relationship of the sixth cluster to the conventional five-factor domains (shown in Table 3) sheds light on how Honesty-Humility largely sits within the space delineated by the FFM: Honesty-Humility emerges not only as high Conscientiousness and Agreeableness but also, more weakly, as higher Neuroticism and lower Openness to Experience and Extraversion.

Strengths and Weaknesses
Both studies used large samples and comprehensive item batteries. In addition, the batteries differed in ways that allowed us to test the robustness of the six-cluster model in item pools chosen to represent five- and six-factor models. While this dataset is very large, and we were able to recover and validate solutions using resampling techniques within this sample, it would be valuable to examine the results of spectral clustering in data derived from other cultures, subject pools and test formats, and particularly on the NEO-PI-R (Costa & McCrae, 1992) itself. External validation, in terms of the differential validity of the five- and six-cluster solutions in genetic and experimental studies, would also be valuable.
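The resampling validation mentioned above, and the mean misclassification errors (with SE) reported in Table 1, can be illustrated with a small Python/NumPy sketch. It assumes, as an illustration only, that the subjects are split at random into two disjoint halves, the items are clustered separately in each half, and the error is the fraction of items assigned differently after the best one-to-one matching of cluster labels. The exact resampling scheme used in these studies is not described in this section. `cluster_items` is a hypothetical placeholder for whatever item-clustering procedure is being validated (for instance, spectral clustering at a fixed σ).

```python
import itertools

import numpy as np


def misclassification_error(a, b, k):
    """Fraction of items labelled differently by two partitions (labels 0..k-1),
    after the best one-to-one relabelling of b's clusters onto a's.

    Brute force over all k! relabellings; fine for k <= 7.
    """
    a = np.asarray(a)
    b = np.asarray(b)
    confusion = np.zeros((k, k), dtype=int)
    for i, j in zip(a, b):
        confusion[i, j] += 1
    best = max(
        sum(confusion[i, p[i]] for i in range(k))
        for p in itertools.permutations(range(k))
    )
    return 1.0 - best / len(a)


def split_half_error(data, k, cluster_items, n_resamples=20, seed=0):
    """Mean and standard error of the misclassification error between item
    partitions computed on random disjoint halves of the subjects.

    data: subjects x items matrix.
    cluster_items: hypothetical callable (data_half, k) -> item labels 0..k-1.
    n_resamples must be at least 2 for the standard error.
    """
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    errors = []
    for _ in range(n_resamples):
        perm = rng.permutation(n)
        half1, half2 = perm[: n // 2], perm[n // 2:]
        labels1 = cluster_items(data[half1], k)
        labels2 = cluster_items(data[half2], k)
        errors.append(misclassification_error(labels1, labels2, k))
    errors = np.array(errors)
    return errors.mean(), errors.std(ddof=1) / np.sqrt(len(errors))
```

Computing this for each k from 3 to 7 at each σ, and taking the k with the lowest mean error, gives a selection rule of the kind summarised in Table 1. For larger k, the brute-force matching could be replaced by the Hungarian algorithm (e.g. SciPy's `linear_sum_assignment`).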

Conclusions
Applying spectral clustering to two large personality datasets yielded six-cluster solutions in both IPIP NEO and HEXACO data. Scale effects indicated that the IPIP NEO questionnaire is comprehensive in that it samples Honesty-Humility, but only weakly, so that this domain is not detected by factor analysis yet is reliably recovered by spectral clustering. The analyses also shed light on the psychological content of the personality domains, and suggest new associations based on these reformulations of their psychological focus.

[Figure caption, beginning lost in extraction: … σ (the scale parameter). As in Figures 6 and 7, the stars indicate where the minimum occurs for each value of σ. In this study, minima are realized for five clusters over all σ.]

Tables

Table 1: Classification error data from Study 1, showing mean misclassification errors (and SE) as a function of σ for each of the 3- to 7-cluster solutions. The standard errors are small enough to ensure that all the differences between the minimum rate and the other values within each pane are statistically significant at the 95% confidence level.