Edited by: Holmes Finch, Ball State University, USA
Reviewed by: Yoon Soo Park, University of Illinois at Chicago, USA; Prathiba Natesan, University of North Texas, USA
*Correspondence: Ingrid Koller
This article was submitted to Quantitative Psychology and Measurement, a section of the journal Frontiers in Psychology
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
The valid measurement of latent constructs is crucial for psychological research. Here, we present a mixed-methods procedure for improving the precision of construct definitions, determining the content validity of items, evaluating the representativeness of items for the target construct, generating test items, and analyzing items on a theoretical basis. To illustrate the mixed-methods content-scaling-structure (CSS) procedure, we analyze the Adult Self-Transcendence Inventory, a self-report measure of wisdom (ASTI, Levenson et al.,
Construct validity is an important criterion of measurement validity. Broadly put, a scale or test is valid if it exhibits good psychometric properties (e.g., unidimensionality) and measures what it is intended to measure (e.g., Haynes et al.,
The development and evaluation of tests and questionnaires is a complex and lengthy process. The phases of this process have been described, for example, in the Standards for Educational and Psychological Testing by the American Educational Research Association, American Psychological Association, and National Council on Measurement in Education (
Content validity (Rossiter,
In a seminal paper, Haynes et al. (
Several approaches to evaluating content validity have been described in the literature. One of the first procedures was probably the Delphi method, which has been used since 1940 by the National Aeronautics and Space Administration (NASA) as a systematic method for technical predictions (see Sackman,
Most procedures currently used to investigate content validity are based on the quantitative methods described by Lawshe (
There is, however, no systematic procedure that could be used as a general guideline for the evaluation of content validity (cf. Newman et al.,
Scale items are often developed on the basis of theoretical definitions of the construct, and sometimes they are even analyzed for content validity in similar ways as described above, but after this step, item selection is usually purely empirical. A set of items is completed by a sample of participants, and response frequencies and indicators of reliability such as item-total correlations are used to select the best-functioning items. Rossiter (
The main aim of this paper is to introduce the
1. Development of the expert questionnaire | Define clear instructions and working definitions for the subdimensions of the target construct; construct an item booklet. |
2. Selection of experts | Select a minimum of five experts from different fields (including experts from within and outside the respective content domain and experts in psychometrics). |
3. Individual data collection with each expert | Face to face interview or survey study (paper-pencil, online); no time limit. |
4. Summary of the results based on predefined rules | Summarize the results: mean percentages of the assignments, relevant dimensions for each item. Content-analyze responses to open-ended questions or think-aloud responses. |
5. Meeting of the experts, discussion of the results | A minimum of two experts from different fields discuss the results (optimally, all experts in a focus group setting). Possibly: second round of individual assignment of the items to dimensions. |
6a. Final assignment of the items to the dimensions | In a second discussion with the experts, finalize the assignment of items to dimensions, modify the original dimension definitions, taking into account the theoretical and empirical literature. |
6b. Definition of possible psychometric hypotheses | Define psychometric hypotheses (e.g., dimensionality) and psychometric problems (e.g., DIF, comprehension problems). |
6c. Definition of possible associations between dimensions | If possible/desirable, define different structural models for the instrument (e.g., unidimensional vs. multidimensional). |
7. Validation study | Investigate the validity of the instrument in a representative sample using an appropriate psychometric model (item-response models, factor-analytic approaches). |
8. Final definition of the latent construct. If necessary, go back to point 1 or to point 5 | Based on all results, refine the operational definition of the target construct measured by the instrument, and identify other latent constructs that influence the response process. Depending on the research interest, address further questions on topics such as discriminant and congruent validity or the representativeness of the items for the target construct, or integrate the results into the current state of research on the target construct. |
To demonstrate our procedure here, we use the Adult Self-Transcendence Inventory (ASTI), a self-report scale measuring the complex target construct of wisdom. In the first part of the study, an expert panel analyzed the content of the ASTI items with respect to the underlying constructs in order to investigate dimensionality, identify potential predictors of differential item functioning, and analyze the appropriateness of the definition of the construct for the questionnaire. In the second part, data from a sample of 1215 participants were used to evaluate the items using multidimensional item response theory models, building upon the results of the first part. It is not at all mandatory to use item response modeling for such analyses; other psychometric methods, such as exploratory or confirmatory factor analyses, may also be employed, although our impression is that item response models are particularly well-suited to test specific hypotheses about item functioning. At the end, the results and the proposed procedure are discussed and a new definition of the target construct that the ASTI measures is given.
Before we describe the CSS-procedure in detail, we introduce the topic of measuring wisdom and the research questions for the presented study. After that, the CSS procedure is described and illustrated using the ASTI as an example.
Wisdom is a complex and multifaceted construct, and after 30 years of psychological wisdom research, measuring it in a reliable and valid way is still a major challenge (Glück et al.,
Individuals high in
Levenson et al. (
In a study including wisdom nominees as well as control participants, Glück et al. (
In the current study, we used the CSS procedure to gain insights about the structure of the scale, with the goal of identifying possible subscales. We only analyzed the 24 items measuring self-transcendence, excluding the 10 alienation items. Tables
As the study design is intended to be a template for other studies, we first present a point-by-point description of the steps in Table
The research questions for the current study were as follows.
Regardless of whether the research question concerns the construction of a new measurement instrument, the evaluation of an existing instrument, or the evaluation of the representativeness of the items for the target construct, the first step is to lay out the definition of the target construct in a sufficiently comprehensive way (e.g., by means of a systematic review). That is, all relevant dimensions of the target construct should be defined in such a way that there is no conceptual overlap between them. Because the goal of the current study was to investigate the items and definitions of the ASTI and not the representativeness of the items for the construct of wisdom, we used the four levels of self-transcendence (self-knowledge, non-attachment, integration, and self-transcendence) as defined above.
The second step is the generation of the questionnaire for the expert ratings based on the definitions of the target dimensions and the items. It includes a clear written instruction, the definitions of each subdimension, and an item booklet (see Figure
In the present study, the experts were first instructed to carefully read the definitions of the four dimensions underlying the ASTI shown above. They were then asked to read each item and use percentages to assign it to the four dimensions: if an item tapped just one dimension, they were to write “100%” in the respective column; if an item tapped more than one dimension, they were to split the 100% accordingly. For example, an item might be judged as measuring 80% self-transcendence and 20% integration. It is also possible to “force” experts to assign each item to only one dimension. This might be useful for re-evaluating an item assignment produced in earlier CSS rounds. As a first step, however, we believe that this kind of forced assignment could lead to a loss of valuable psychometric information about the items and increase the possibility of assignment errors.
If the experts felt that an item was largely measuring something other than the four dimensions, they should not fill out the percentages but make a note in an empty space below each item. They were also asked to note down any other thoughts or comments in that space. In addition, they were asked two specific questions: (1) “Do you think the item could be difficult to understand? If yes, why?,” and (2) “Do you think the item might have a different meaning for certain groups of people (e.g., men vs. women, younger vs. older participants, participants from different professional fields, or levels of education)? If yes, why?” The responses to these last questions allowed us to formulate a-priori hypotheses about differential item functioning (DIF; e.g., Holland and Wainer,
We recommend recruiting at least five experts (cf. Haynes et al.,
In the present study, nine experts (seven women and two men) participated. Because the interest of the study was to validate the items of the ASTI, all experts were psychologists; five were mainly experts in the field of wisdom research and related fields (including the second and third author), and four were mainly experts of test psychology and assessment in different research fields (including the first author). All experts worked with the German version of the ASTI, except for the second author who used the original English version. In the present study, the experts were invited by email and the questionnaire was sent to them as an RTF document.
The experts filled out the questionnaire individually and without time limits.
Next, the responses of the experts were summarized according to pre-defined rules. As Table
E1 | 70 | 30 | 0 | 1 | Notes | … | … |
E2 | 50 | 50 | 0 | 0 | Notes | … | … |
… | 100 | 0 | 0 | 1 | … | … | … |
… | … | … | … | … | Notes | … | … |
E |
… | … | … | … | Notes | … | … |
80 | 20 | 5 |
Of course, these three categories are only examples, other categories are also possible. This qualitative part of the study can offer theoretical insights about the target construct as well as the individual items.
The last row of Table
In the current case, as the goal was not to select items but to gain information about an existing scale, we used a much lower cutoff criterion of 0.30 to determine which dimension(s) experts considered as most fitting for each item. We then presented the results, in an aggregated form, to a subgroup of the experts with the goal of redefining and perhaps differentiating the dimensions to allow for a clearer assignment of items. Thus, the number of experts who assigned the item to the most prominent dimension with a percentage of at least the cutoff value was an indicator of homogeneity of the experts' views (see Tables
I10 | I have a good sense of humor about myself. | IN = 5 | In its earlier form (I don't take myself too seriously) the item didn't work. | |
I19 | I feel that I know myself. | SK = 9 | No comments. | |
I20 | I am accepting of myself, including my faults. | IN = 6 | Different understanding of self-acceptance across different cultures. | |
SK = 4 | ||||
I21 | I am able to integrate the different aspects of my life. | IN = 8 | Dependent on age and life situation. | |
I01 | I often engage in quiet contemplation. | NA = 4 | It is more a component of emotion regulation; difficulties in comprehension. | |
I05 | My peace of mind is not easily upset. | NA = 4 | Not possible to assign it to one dimension; the definition may be too imprecise; what does peace of mind mean?; life events could play a role. | |
I09 | I do not become angry easily. | NA = 5 | Not possible to assign it to one dimension; the definition may be too imprecise; it is more a component of emotion regulation. | |
SK = 4 | ||||
I22 | I can accept the impermanence of things. | NA = 8 | If a participant encountered a loss recently the item may be biased (emotion). |
I03 | I don't worry about other people's opinions of me. | NA = 5 | Extraversion and egoisms can also play an important role. | |
ST = 3 | ||||
I06 | My sense of well-being does not depend on a busy social life. | NA = 5 | Extraversion and egoisms can also play an important role; it could be easier if “social life” were replaced by “people.” | |
ST = 4 | ||||
I08 | My happiness is not dependent on other people and things. | NA = 5 | Egoisms can also play an important role, difficult for so many postmodern people for whom “relationships” and possessions are paramount. | |
ST = 5 | ||||
I12 | Material possessions don't mean much to me. | NA = 7 | Meaning depends on participant's material possessions. | |
ST = 5 | ||||
I02 | I feel that my individual life is a part of a greater whole. | ST = 8 | It is dependent on the personal life situation, e.g., soldier. | |
I04 | I feel a sense of belonging with both earlier and future generations. | ST = 8 | Dependent on age. | |
I07 | I feel part of something greater than myself. | ST = 9 | Religiosity can play an important role. | |
I13 | I feel compassionate even toward people who have been unkind to me. | ST = 7 | Empathy is an important component; the sentence is jolty; time lag can play a role (When was a person unfriendly to me?). | |
I16 | I often have a sense of oneness with nature. | ST = 7 | Dependent on age; the absence of this sense is one of the most problematic issues in postmodern society. | |
I24 | Whatever [good] I do for others, I do for myself. | ST = 7 | The understanding could be too Christian; dualists don't get it, it might be the only item the scale needs. | |
I25 | Whatever [bad] I do to others, I do to myself. | ST = 5 | (German item) the understanding could be too Christian. |
I11 | I find much joy in life. | ST = 3 | In all four dimensions it is possible to have fun; not easy to assign it to one dimension; it is more a consequence of self-transcendence. | |
SK = 3 | ||||
I14 | I am not often fearful. | ST = 3 | Not really possible to assign it to one dimension; it is a negatively formulated item; fearful about what?; the item works differently for women and men. | |
SK = 3 | ||||
I15 | I can learn a lot from others. | ST = 5 | It is more a consequence of self-transcendence; dependent on situation. | |
NA = 4 | ||||
I17 | I am able to accept my mortality. | ST = 8 | Dependent on situation (e.g., illness); based on the definition, it is not possible to assign it to one dimension, problematic for young and healthy people. | |
I18 | I often “lose myself” in what I am doing. | ST = 2 | Flow item; it is more a consequence of self-transcendence. | |
SK = 4 | ||||
NA = 2 | ||||
I23 | I have grown as a result of losses I have suffered. | ST = 4 | Dependent on age; it is possible to grow in each dimension; it could be the pathway to all dimensions. | |
SK = 2 | ||||
NA = 3 |
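The summary step described above (mean percentages per dimension and the number of experts who assigned an item to a dimension with at least the cutoff percentage) can be sketched as follows. This is a minimal illustration, not the authors' actual script: the function name `summarize_item`, the dictionary layout, and the example ratings are hypothetical, and experts who wrote a note instead of percentages are assumed to have been excluded for that item beforehand.

```python
from statistics import mean

DIMENSIONS = ("SK", "NA", "IN", "ST")  # the four original ASTI dimensions
CUTOFF = 30  # percent; corresponds to the cutoff criterion of 0.30


def summarize_item(assignments, cutoff=CUTOFF):
    """Summarize the expert ratings for one item.

    assignments: one dict per expert, mapping dimension label -> percentage.
    Returns (mean percentage per dimension,
             number of experts at or above the cutoff per dimension).
    """
    for idx, rating in enumerate(assignments):
        total = sum(rating.get(d, 0) for d in DIMENSIONS)
        if total != 100:
            raise ValueError(f"expert {idx}: percentages sum to {total}, not 100")
    means = {d: mean([r.get(d, 0) for r in assignments]) for d in DIMENSIONS}
    counts = {d: sum(r.get(d, 0) >= cutoff for r in assignments)
              for d in DIMENSIONS}
    return means, counts


# Hypothetical ratings of four experts for a single item
example_ratings = [
    {"IN": 70, "SK": 30},
    {"IN": 50, "SK": 50},
    {"IN": 100},
    {"ST": 80, "NA": 20},
]
example_means, example_counts = summarize_item(example_ratings)
```

Because an expert may split the 100% across dimensions, the same expert can count toward more than one dimension, which is why the counts for an item can add up to more than the number of experts.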
Next, the experts are invited to discuss the assignments and comments as a group. This discussion is particularly fruitful if the assignments were relatively heterogeneous. It can lead to clarifications and possible modifications of the definitions of the dimensions, removal of items that clearly do not fit the construct, and even generation of additional items. If the original assignments were very heterogeneous, it makes sense to repeat the individual assignment and collective discussion in order to achieve a sufficient level of agreement among the experts. However, this iterative process can become very complex and is not always feasible. In any case, a minimum of two experts from different fields (for example, one content expert and one psychometrician) should make the decisions together.
The results of the analysis and discussion of the experts' assignments can take various forms. Usually, some items are clearly assigned to a specific dimension, whereas others turn out to be so equivocal that they are eliminated. In some cases, however, the conceptualization of the dimensions needs to be reconsidered. For example, as mentioned above, if a number of items are assigned to two dimensions with about equal weight, this may mean that the two dimensions need to be collapsed or that an additional dimension is required that is conceptually located between the two. If the comments of experts provide new insights for possible dimension definitions or labels, these comments can also be included in the formulation of new definitions.
In the present study, it was not possible to discuss the results with all experts. Thus, the third author, an expert on the topic of wisdom and psychometrics, and the first author, a psychometrician not familiar with the concept of wisdom, discussed the results, performed the final assignment of the items, and formulated new names and definitions for the resulting dimensions where they differed from the original ones.
The results based on the assignments and the final discussion of the two experts are given in Tables
The four items in this scale all describe aspects of knowing and accepting oneself, including possibly diverging aspects and positive and negative sides (see Table
All items of this scale are about valuing and maintaining one's tranquility even in the face of reasons to get angry or upset (see Table
This scale comprises items concerning the individual's independence of external things, namely, other people's opinions, a busy social life, or material possessions, and of other people and things in general (see Table
All items in this scale were unanimously assigned to the self-transcendence dimension; they refer to individuals experiencing themselves as part of or closely related to something larger than themselves—“a greater whole,” “earlier and future generations,” or nature (see Table
The fifth dimension was labeled “presence in the here-and-now and growth”: its items describe individuals who are able to live in the moment: they find joy in their life and in what they are doing in a given moment, without being fearful of the future or preoccupied with the finitude of life (see Table
A goal of the analyses was to test whether the hypotheses gained from the expert judgments could be used to improve the psychometric functioning of the ASTI. Specifically, we wanted to test whether the ASTI as a whole formed a unidimensional scale, and if not, whether the five subdimensions derived from the expert assignments of the items would form unidimensional scales. Also, we wanted to test whether single items within each scale diverged from the others. For the theory-based item analysis, we summarized the comments from Tables
Test fairness (e.g., differential item functioning; see Section Testing Model Fit): For eight items (I02, I05, I12, I15, I17, I20, I21, I22), the experts noted possible context dependencies (e.g., the response may be dependent on life situation, life events, material possessions, health, or culture) and for five items (I04, I16, I17, I21, I23), possible influences of respondent age. For one item (I14), the experts suspected differences between men and women.
Influences of other constructs: The expert judgments generally suggested that the items of the ASTI are good indicators for the target construct. But for some items, other constructs, such as emotion regulation (I01, I09), extraversion (I03, I06), egoism (I03, I06, I08), empathy (I13), or spirituality (I07, I24, I25) may influence responses.
Linguistic factors: Linguistic problems (e.g., difficulties in comprehension) were suspected for only five items (I01, I05, I06, I13, I14).
Sometimes researchers have theoretical assumptions about relationships between the various dimensions. Item response models can be used to test such hypotheses, e.g., to test predictions about correlations between dimensions or the structural relationships among them. In the current example, we only explored the latent correlations between the dimensions.
In the current study, we used item response models to test the psychometric functioning of the ASTI based on the results of the expert assignments.
Data were collected individually from 1215 participants in Austria and Germany by trained students as part of their class work. A total of 666 participants were students (431 women,
Participants filled out a set of paper-and-pencil scales and answered demographic questions. Overall, participation took about 25 min on average. The questionnaire included the ASTI and additional scales outside the scope of this paper.
To test the unidimensionality of the new subscales, we used an approach from the family of Rasch models. Rasch models (Rasch,
Specifically, we used the multidimensional random coefficient multinomial logit model (MRCML model; Adams et al.,
The item parameters were estimated using marginal maximum likelihood estimation (MML) and the person parameters using weighted maximum likelihood estimation (WLEs). The item analysis procedure follows Pohl and Carstensen (
As explained earlier, a main goal of the study was to integrate the proposed dimensions and the experts' hypotheses concerning item fit with a psychometric investigation of the items. As shown in the section “Definition of Psychometric Hypotheses,” strong hypotheses concerning dimensionality and candidate predictors for possible item misfit were identified. Accordingly, in the psychometric analysis these predictors will be used to test for significant item misfit.
Before starting with the actual analyses, the category frequencies for each item were assessed because low frequencies can cause estimation problems. If the frequency of a response category was below 100, it was collapsed with the next category (see Pohl and Carstensen,
Person-item-maps display the distribution of the person parameters and the range of item parameters. These plots show whether any participants showed extreme response tendencies, which might lead to particularly high or low raw scores, and how the item parameters are distributed over the latent dimension. Thus, it can be examined whether the items cover the whole spectrum of the latent dimension or cluster in one part of it. If there are few items in a segment of the spectrum, the latent trait cannot be measured well in that segment.
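The category-collapsing rule applied before estimation (response categories with fewer than 100 observations are merged with the next category) can be sketched as follows. This is a minimal sketch under stated assumptions: "next" is read here as the next higher category, a sparse highest category is merged downward, and the function name and example frequencies are hypothetical.

```python
from collections import Counter


def collapse_sparse_categories(responses, min_count=100):
    """Merge sparse response categories and recode to 0, 1, 2, ...

    A category (or run of categories) with fewer than `min_count`
    observations is merged with the next higher category. A sparse run at
    the top of the scale is merged into the category group below it.
    Returns (recoded responses, mapping from old to new category).
    """
    counts = Counter(responses)
    groups = []                      # each group becomes one new category
    pending, pending_n = [], 0
    for category in sorted(counts):
        pending.append(category)
        pending_n += counts[category]
        if pending_n >= min_count:   # group is large enough, close it
            groups.append(pending)
            pending, pending_n = [], 0
    if pending:                      # sparse categories left at the top
        if groups:
            groups[-1].extend(pending)
        else:
            groups.append(pending)
    mapping = {old: new for new, group in enumerate(groups) for old in group}
    return [mapping[r] for r in responses], mapping


# Hypothetical item with sparse lowest and highest categories
raw = [0] * 40 + [1] * 300 + [2] * 500 + [3] * 60
recoded, mapping = collapse_sparse_categories(raw)
```

In this hypothetical example the four original categories are reduced to two, so the item would enter the model with a single threshold.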
Up to now, the ASTI was scored as a unidimensional instrument, although the items were constructed so as to represent the subdimensions described earlier. Based on this theoretical background and the expert judgments, the five-dimensional model in Tables
Once the dimensionality of the ASTI is established, we can test the fit of the Rasch model within each subscale, analyzing several indicators of fit for each individual item. First, the assumption of Rasch homogeneity was tested by comparing the PCM against the generalized partial credit model (GPCM, Muraki,
Additionally, the expected score curves of each item were examined.
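To make the notion of an expected score curve concrete, the following sketch computes category probabilities and the expected item score under the partial credit model for a given person parameter. The `discrimination` argument is fixed at 1 for the PCM; other values give the generalized partial credit model. The function names and threshold values are hypothetical and serve only as an illustration; they are not the estimates reported in this study.

```python
import math


def category_probabilities(theta, thresholds, discrimination=1.0):
    """Category probabilities of a polytomous item under the (G)PCM.

    theta: person parameter; thresholds: step parameters delta_1..delta_m.
    Category k has log-numerator sum_{j<=k} a * (theta - delta_j),
    with an empty sum (0) for category 0.
    """
    log_numerators = [0.0]
    for delta in thresholds:
        log_numerators.append(log_numerators[-1] + discrimination * (theta - delta))
    shift = max(log_numerators)          # subtract the maximum for numerical stability
    weights = [math.exp(v - shift) for v in log_numerators]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(theta, thresholds, discrimination=1.0):
    """Model-implied mean item score at a given theta."""
    probs = category_probabilities(theta, thresholds, discrimination)
    return sum(k * p for k, p in enumerate(probs))


# Hypothetical three-category item with thresholds at -1 and 1
curve = [(theta, expected_score(theta, [-1.0, 1.0])) for theta in (-3, -1, 0, 1, 3)]
```

Plotting such model-implied values against the observed mean scores of respondents grouped by ability is one way to inspect whether an item's empirical curve is steeper or flatter than the model expects.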
The fit of individual items was assessed using infit and outfit statistics, i.e., the weighted and unweighted mean square statistics (MNSQ; Wright and Masters,
Differential item functioning means that the pattern of response probabilities for some items differs between groups of participants. For example, gender-related DIF would mean that men are more likely to agree to some items of a scale than women. If that were the case, the scale as a whole would be measuring a somewhat different construct for men than for women. Here, DIF was tested with respect to gender (
The experts' suggestions concerning possible DIF from Section Definition of Psychometric Hypotheses were taken into account in interpreting the results of the DIF analyses.
Descriptive statistics (
The person-item-map in Figure
Next, four different models were estimated and compared by means of the BIC. As Table
1DIM_1PL | −33300.89 | 63 | 67049.24 |
1DIM_2PL | −33062.81 | 87 | 66744.00 |
5DIM_1PL | −32871.86 | 77 | 66290.61 |
5DIM_2PL | −32524.25 | 97 | 65737.44 |
SI_1PL | −4442.59 | 9 | 8949.10 |
SI_2PL | −4408.00 | 12 | 8901.22 |
PM_1PL | −5380.55 | 11 | 10839.24 |
PM_2PL | −5370.76 | 14 | 10840.96 |
NA_1PL | −5751.28 | 12 | 11587.79 |
NA_2PL | −5735.62 | 15 | 11577.78 |
ST_1PL | −9883.43 | 20 | 19908.91 |
ST_2PL | −9715.91 | 26 | 19616.49 |
PG_1PL | −7763.56 | 15 | 15633.66 |
PG_2PL | −7721.62 | 20 | 15585.29 |
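The BIC values in the table above can be recomputed from the log-likelihoods and the numbers of parameters using the standard definition BIC = −2 log L + k ln(N). The sketch below assumes that the full sample of N = 1215 enters the penalty term; under that assumption it reproduces the tabled values for the four full-scale models to within about half a point.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: -2 * logL + k * ln(N)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


N_OBS = 1215  # assumed to be the sample size used in the penalty term

# (log-likelihood, number of parameters) as reported for the full-scale models
models = {
    "1DIM_1PL": (-33300.89, 63),
    "1DIM_2PL": (-33062.81, 87),
    "5DIM_1PL": (-32871.86, 77),
    "5DIM_2PL": (-32524.25, 97),
}

bic_values = {name: bic(ll, k, N_OBS) for name, (ll, k) in models.items()}
best_model = min(bic_values, key=bic_values.get)  # lowest BIC is preferred
```

A lower BIC indicates a better trade-off between fit and model complexity, so `best_model` identifies the model preferred by this criterion among those compared.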
Table
SI | 1 | 0.657 | 0.323 | 0.227 | 0.739 |
PM | 1 | 0.591 | 0.323 | 0.721 | |
NA | 1 | 0.274 | 0.365 | ||
ST | 1 | 0.552 | |||
PG | 1 | ||||
EAP-Rel. | 0.692 | 0.626 | 0.508 | 0.668 | 0.660 |
Cronbach's α | 0.642 | 0.449 | 0.426 | 0.636 | 0.384 |
95% Confidence | 0.607; | 0.396; | 0.370; | 0.603; | 0.328; |
Interval for α | 0.674 | 0.499 | 0.477 | 0.666 | 0.437 |
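For readers who want to recompute the classical reliability coefficient on their own data, Cronbach's α for a subscale can be obtained from the item variances and the variance of the sum score. The following is a minimal sketch; the function name and the toy response matrix are hypothetical, and complete data (no missing responses) are assumed.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a respondents x items matrix (list of lists).

    alpha = k / (k - 1) * (1 - sum of item variances / variance of sum score)
    """
    n_items = len(rows[0])
    if n_items < 2:
        raise ValueError("alpha requires at least two items")
    item_variances = [variance(column) for column in zip(*rows)]
    total_variance = variance([sum(row) for row in rows])
    return n_items / (n_items - 1) * (1.0 - sum(item_variances) / total_variance)


# Toy data: three respondents, two items
toy_alpha = cronbach_alpha([[1, 2], [2, 1], [3, 3]])
```

With only a handful of items per subscale, α tends to be low even when the items are homogeneous, which is worth keeping in mind when comparing it with model-based reliability estimates.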
Next, we assessed the items of each dimension separately. In general, the infit and outfit statistics showed no misfit of items (see Table
SI | 10 | 3 | 1.29 | 0.69 | −0.85 | 0.05 | −1.98 | 0.09 | 0.27 | 0.07 | – | – | 1.108 | 1.097 | −0.042 | −0.618 | −0.530 |
19 | 3 | 1.28 | 0.65 | −0.90 | 0.05 | −2.33 | 0.10 | 0.52 | 0.07 | – | – | 0.943 | 0.950 | −0.040 | 0.052 | 0.086 | |
20 | 3 | 1.24 | 0.69 | −0.72 | 0.05 | −1.92 | 0.09 | 0.48 | 0.07 | – | – | 0.910 | 0.922 | 0.156 | 0.352 | 0.244 | |
21 | 3 | 1.13 | 0.63 | −0.43 | 0.05 | −2.01 | 0.09 | 1.16 | 0.07 | – | – | 1.016 | 1.019 | −0.074 | 0.212 | 0.198 | |
PM | 01 | 3 | 0.93 | 0.69 | 0.20 | 0.05 | −0.76 | 0.07 | 1.16 | 0.08 | – | – | 1.017 | 1.016 | −0.04 | 0.046 | −0.012 |
05 | 4 | 1.60 | 0.83 | −0.13 | 0.04 | −1.67 | 0.10 | −0.21 | 0.06 | 1.47 | 0.09 | 0.968 | 0.967 | −0.024 | 0.104 | 0.190 | |
09 | 4 | 1.66 | 0.91 | −0.21 | 0.04 | −1.37 | 0.10 | −0.29 | 0.06 | 1.03 | 0.08 | 1.006 | 1.003 | 0.042 | −0.316 | −0.140 | |
22 | 3 | 1.04 | 0.67 | −0.11 | 0.05 | −1.21 | 0.08 | 1.00 | 0.07 | – | – | 1.008 | 1.007 | 0.022 | 0.164 | −0.038 | |
NA | 03 | 4 | 1.45 | 0.89 | 0.08 | 0.04 | −1.15 | 0.08 | 0.10 | 0.06 | 1.29 | 0.09 | 1.013 | 1.011 | 0.226 | −0.076 | −0.070 |
06 | 4 | 1.41 | 0.90 | 0.14 | 0.04 | −1.09 | 0.08 | 0.20 | 0.06 | 1.32 | 0.09 | 1.008 | 1.008 | 0.002 | −0.042 | 0.002 | |
08 | 3 | 0.99 | 0.72 | 0.02 | 0.04 | −0.72 | 0.07 | 0.76 | 0.07 | – | – | 0.938 | 0.940 | 0.026 | 0.17 | 0.158 | |
12 | 4 | 1.56 | 0.81 | −0.11 | 0.04 | −1.78 | 0.10 | −0.04 | 0.06 | 1.50 | 0.09 | 1.047 | 1.047 | −0.254 | −0.052 | −0.090 | |
ST | 02 | 4 | 1.69 | 0.91 | −0.23 | 0.04 | −1.26 | 0.10 | −0.50 | 0.06 | 1.05 | 0.08 | 0.911 | 0.911 | −0.140 | 0.07 | 0.042 |
04 | 4 | 1.72 | 0.85 | −0.31 | 0.04 | −1.63 | 0.11 | −0.58 | 0.06 | 1.26 | 0.08 | 1.044 | 1.036 | −0.054 | 0.06 | 0.058 | |
07 | 4 | 1.47 | 0.95 | 0.08 | 0.04 | −0.90 | 0.08 | −0.05 | 0.06 | 1.21 | 0.09 | 0.883 | 0.889 | −0.102 | −0.108 | −0.130 | |
13 | 3 | 1.27 | 0.73 | −0.58 | 0.04 | −1.11 | 0.08 | −0.06 | 0.06 | – | – | 1.121 | 1.092 | 0.144 | −0.29 | −0.292 | |
16 | 4 | 1.58 | 0.90 | −0.12 | 0.04 | −1.34 | 0.09 | −0.20 | 0.06 | 1.20 | 0.08 | 1.024 | 1.023 | 0.106 | 0.252 | 0.210 | |
24 | 3 | 0.84 | 0.70 | 0.38 | 0.05 | −0.51 | 0.06 | 1.27 | 0.08 | – | – | 1.059 | 1.052 | 0.098 | −0.012 | 0.032 | |
25 | 4 | 1.50 | 0.90 | 0.04 | 0.04 | −1.25 | 0.09 | −0.05 | 0.06 | 1.41 | 0.09 | 0.997 | 0.995 | −0.054 | 0.028 | 0.080 | |
PG | 11 | 3 | 1.42 | 0.65 | −0.92 | 0.05 | −1.64 | 0.10 | −0.20 | 0.06 | – | – | 0.984 | 0.987 | −0.246 | 0.056 | 0.056 |
14 | 4 | 1.65 | 0.88 | −0.13 | 0.04 | −1.04 | 0.09 | −0.53 | 0.06 | 1.17 | 0.08 | 1.043 | 1.037 | 0.224 | 0.324 | 0.324 | |
15 | 3 | 1.38 | 0.64 | −0.90 | 0.05 | −1.78 | 0.10 | −0.03 | 0.06 | – | – | 0.963 | 0.966 | −0.062 | −0.126 | −0.126 | |
17 | 3 | 1.17 | 0.76 | −0.31 | 0.04 | −0.74 | 0.07 | 0.12 | 0.06 | – | – | 0.988 | 0.988 | 0.242 | 0.060 | 0.060 | |
18 | 4 | 1.56 | 0.87 | −0.08 | 0.04 | −1.33 | 0.09 | −0.12 | 0.06 | 1.20 | 0.09 | 1.054 | 1.052 | 0.088 | −0.368 | −0.368 | |
23 | 3 | 1.20 | 0.72 | −0.41 | 0.04 | −1.04 | 0.08 | 0.21 | 0.06 | – | – | 0.961 | 0.964 | −0.246 | 0.052 | 0.052 |
Table
SI | Gender | −4438.3 | 10 | 8947.62 | −4436.87 | 13 | 8966.08 |
Age | −4442.48 | 10 | 8955.99 | −4420.46 | 13 | 8933.25 | |
Group | −4442.59 | 10 | 8956.20 | −4424.60 | 13 | 8941.53 | |
EM | Gender | −5362.98 | 13 | 10818.30 | −5362.68 | 16 | 10839.00 |
Age | −5378.62 | 13 | 10849.56 | −5368.48 | 16 | 10850.61 | |
Group | −5378.51 | 13 | 10849.34 | −5373.51 | 16 | 10860.66 | |
NA | Gender | −5749.97 | 13 | 11592.27 | −5740.28 | 16 | 11594.20 |
Age | −5742.59 | 13 | 11577.52 | −5740.19 | 16 | 11594.01 | |
Group | −5742.27 | 13 | 11576.87 | −5739.46 | 16 | 11592.57 | |
ST | Gender | −9864.67 | 21 | 19878.50 | −9858.77 | 27 | 19909.31 |
Age | −9871.33 | 21 | 19891.81 | −9860.06 | 27 | 19911.89 | |
Group | −9880.66 | 21 | 19910.47 | −9868.11 | 27 | 19927.99 | |
CO | Gender | −7762.27 | 17 | 15645.29 | −7745.62 | 22 | 15647.49 |
Age | −7762.64 | 17 | 15646.03 | −7741.21 | 22 | 15638.67 | |
Group | −7756.91 | 17 | 15634.55 | −7743.28 | 22 | 15642.81 |
Table
The GPCM fit the scale better than the PCM, but again, the difference in BIC was small and the score curves showed good agreement between expected and observed response curves and slopes (see Table
Table
Table
In the following, we first discuss the methodological implications of our research, and then, its substantive implications concerning the use of the ASTI to measure wisdom.
This paper introduced the CSS procedure for evaluating content validity and discussed its advantages for the theory-based evaluation of scale items. In our experience, the method provides highly interesting practical and theoretical insights into target constructs. It not only allows for evaluating and validating existing instruments and for improving the operationalization of a target construct, but also offers advantages for constructing new items for existing instruments or even for developing entirely new instruments. The procedure can be applied in all subdisciplines of psychology and other fields, wherever the goal is to measure specific constructs. In addition, it does not matter which kinds of items (e.g., questions, vignettes) and response formats (e.g., dichotomous, graded, open-ended) are used. The in-depth examination of the target construct is likely to increase the validity of any assessment.
We propose to follow certain quality criteria in studies using our approach. First, to optimize replicability, all steps should be carefully documented. A detailed documentation of procedures increases the validity of the study, irrespective of whether the data collection is more quantitative (as in the present study) or more qualitative (e.g., focus group discussions as in Castel et al.,
In addition to utilizing the ASTI to demonstrate our approach, we believe that we have gained important insights about the ASTI, as well as about self-transcendence in general, from this study. Through the exercise of assigning and reassigning the items to the dimensions of the construct and discussing the contradictions and difficulties we encountered, we gained a far deeper understanding of the measured construct itself.
In general, the analyses demonstrated the importance of constructing more “difficult” items, i.e., items with a lower level of agreement. This is a general issue with self-report wisdom scales (see Glück et al.,
For now, we have identified five subdimensions that include the 24 positive items (in German, 25) of the ASTI. The 10 negative items measuring alienation were not included in this analysis, as negative items tend to be difficult to assign to the same dimension as positive items. We recommend leaving them in the questionnaire in order to increase the range of item content but excluding them from score computations. In further applications of the ASTI, should the five subdimensions be scored separately or should the total score be used? Strong advocates of the Rasch model would certainly argue that using the total score across the subdimensions amounts to mixing apples and oranges. However, other self-report scales of wisdom such as the 3D-WS (Ardelt,
In the following, we describe the final subdimensions of the ASTI that resulted from our analysis and relate them to the theoretical background, thus completing point (8) “Final Definition of the Latent Construct” in the process. The subdimensions are ordered so as to represent a possible developmental order as suggested by Levenson et al. (
The first subdimension includes items that were originally intended to measure Curnow's (
Individuals high in this dimension of the ASTI are aware of the different, sometimes contradictory, facets of their self and their life, and they are able to accept all sides of their personality and integrate the different facets of their life. If the item “I have a good sense of humor about myself,” which was somewhat equivocal among the experts and showed differential item functioning in the quantitative analyses, is excluded, the subdimension includes only three items. Therefore, it seems advisable to add new items that refer to self-knowledge as well as items that differentiate between different kinds of integration (e.g., integration of self aspects, life contexts, and feelings). With a higher number of items, the distinction between knowing and accepting aspects of one's self might also receive more empirical support.
Non-attachment describes an individual's awareness of the fundamental independence of his or her internal self from external possessions or evaluations: non-attached individuals' self-esteem is not dependent on how others think about them or how many friends they have. The scale comprises four items concerning the individual's independence from external things, such as other people's opinions, a busy social life, or material possessions. It is important to note that non-attachment does not mean that people are not committed to others or to important issues in their current world; the main point is that they do not depend on external sources for self-enhancement. The fact that they are not affected by other people's judgments enables them to lead the life that is right for them and accept others non-judgmentally. Like other ideas originating from Buddhism, non-attachment as a path to mental health is currently receiving some attention in clinical psychology (Shonin et al.,
Individuals high in this dimension, which was not part of Curnow's (
Tranquility is a characteristic that many laypeople associate with wisdom (Bluck and Glück,
Highly self-transcendent individuals feel that the boundaries between them and others, even humanity at large, are permeable. They feel related to past and future generations, all human beings, and nature. As they do not need to utilize social relationships to enhance their sense of self, they are able to love and accept other individuals as they are. As Levenson et al. (
There were relatively high latent correlations (around 0.70) among three subdimensions (self-knowledge and integration, peace of mind, and presence in the here-and-now and growth), all of which seem to describe an accepting and appreciative stance toward oneself and one's life. For some purposes, it may make sense to average across these three subdimensions, as their discriminant validity may be limited. At the same time, the manifest correlations between these three subscales are markedly lower than the latent ones
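Why manifest correlations fall below latent ones can be illustrated with the classical correction for attenuation: measurement error in short subscales lowers the observed correlation relative to the correlation between the underlying traits. The sketch below is only an illustration of that principle. It is not the estimation method used in this study (latent correlations are typically estimated directly within a multidimensional model), and the reliabilities and manifest correlation are hypothetical values.

```python
import math


def disattenuate(r_manifest: float, rel_x: float, rel_y: float) -> float:
    """Classical correction for attenuation: r_xy / sqrt(rel_x * rel_y)."""
    if not (0.0 < rel_x <= 1.0 and 0.0 < rel_y <= 1.0):
        raise ValueError("reliabilities must lie in the interval (0, 1]")
    return r_manifest / math.sqrt(rel_x * rel_y)


# Hypothetical values for illustration only (not taken from the study):
# two short subscales with modest reliability and a moderate observed correlation.
r_observed = 0.42
reliability_a = 0.60
reliability_b = 0.60

r_corrected = disattenuate(r_observed, reliability_a, reliability_b)
print(f"manifest r = {r_observed:.2f}, disattenuated r = {r_corrected:.2f}")
```

With these assumed values, a manifest correlation of 0.42 corresponds to a disattenuated correlation of about 0.70, which shows how subscales with few items can appear more distinct at the manifest level than the latent traits they measure.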
The individual differences in our data (see “Differential Item Functioning”) were largely consistent with the literature. It is important to first note that our sample is not well-suited for analyzing older age: we were able to compare only two age groups roughly corresponding to adolescence and young adulthood on the one hand (15–31 years,
Gender differences were found, interestingly, for four of the five subdimensions. Men had higher scores than women in self-knowledge and integration. This finding may suggest that men indeed know and accept themselves more than women do or that women actually tend to be more self-reflective and self-critical. In any case, the effect was small and needs further investigation. In the subdimensions peace of mind, non-attachment, and self-transcendence, women scored higher than men. These findings may, however, be partly determined by societal expectations for women to be less self-centered and more caring than men, which does not necessarily imply true self-transcendence. Thus, the limitations of self-report measures remain somewhat present even in carefully constructed scales like the ASTI.
In sum, we suggest that researchers using the ASTI may gain significant information if they use separate scores for the subdimensions we have identified in addition to, or instead of, the total score. The self-transcendence subdimension may be the purest indicator of actual self-transcendence. Whether the other subdimensions represent important preconditions, correlates, or even outcomes of self-transcendence is largely an empirical issue to be addressed in the future, which may tell us more about the development of wisdom.
No formal approval was applied for, as the guidelines of the local Ethics Committee specify that the type of survey study we performed does not require such approval. All participants filled out an informed-consent form and agreed to the use of their data for scientific purposes. No vulnerable populations were involved in this study.
All three authors meet the four criteria for authorship required in the author guidelines. Each author's main tasks were as follows. IK: Development and application of the CSS procedure, expert in the first part of study, data analyses, writing the paper. JG: Discussion partner for the CSS procedure, expert in the first part of study, writing the parts concerning the topic of wisdom (background and results), editing of the manuscript. ML: Construction and provision of the revised (and as yet unpublished) ASTI, discussion of the translation of the items, expert in the first part of the study.
This research was partly funded by the Austrian Science Fund FWF (grant no. P25425, PI: JG).
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
The Supplementary Material for this article can be found online at: