The Imagery Questionnaire: An Investigation of its Validity


Paul J. Chara, Jr. and William S. Verplanck

Oral Roberts University     University of Tennessee

Summary–Research investigated from three perspectives the construct validity of one of the most frequently used imagery measures, Marks’ (1973) Vividness of Visual Imagery Questionnaire. First, the performance of six self-reported “good” and six self-reported “poor” imagers derived from 63 initial respondents to the questionnaire was compared on a test of the recall of projected slides. Second, subject idiosyncratic and interpersonal uses of the imagery rating scale were examined through a scaling of projected slides procedure and an alternate forms test-retest paradigm. Finally, interviews about the nature of each subject’s imaging experiences were conducted. Data indicated a lack of support for the questionnaire’s validity.

Ever since Galton (1907) first published his “breakfast table” imagery questionnaire in 1883, the self-report questionnaire has been the staple of imagery researchers. Currently, as recent volumes of Social Sciences Citation Index attest, Marks’ (1973) Vividness of Visual Imagery Questionnaire is one of the most frequently used questionnaires. Not only is Marks’ questionnaire quite popular, it also clearly represents the tradition of imagery questionnaire development. Items on the questionnaire are either identical or very similar to items on the questionnaires developed by Betts (1909), Sheehan (1967), and Shepard (1983). Marks’ (1973) questionnaire is therefore an ideal instrument to examine when investigating the validity of imagery measures.

In recent years, a sharp controversy has developed between supporters (Marks, 1983a, 1983b) and detractors (Kaufman, 1981, 1983) over the validity of Marks’ questionnaire. Some researchers (White, Sheehan, & Ashton, 1977) have concluded that: “Basically, there is no direct way of knowing factually that the inventories we have reviewed measure imagery and not some other process that is related in function” (p. 163). This research was directed toward addressing this problem, namely that of assessing the construct validity of the questionnaire.

The construct validity of Marks’ questionnaire was examined from both empirical and semantical perspectives. Empirically, because the questionnaire is designed to probe memory imagery–defined by Hilgard (1981) as “…a recall with imaginal content deriving from earlier perceptual experience” (p. 8)–the performance of self-reported good and poor imagers was compared on a test of picture recall. This performance measure predominantly utilized a recall paradigm as opposed to the recognition paradigms used by Marks (1973, 1983b) to be more in keeping with the conceptual definition provided by Hilgard (1981).

Semantically, research focused on both interpersonal and idiosyncratic uses of the word imagery by subjects. This was accomplished through interviews and by giving subjects the questionnaire a second time in a variant form. The second administration added visual analogs of varying clarity (slides developed from a scaling of slides procedure) to the questionnaire rating scale. The visual analogs were added to determine whether subjects’ use of the rating scale would remain consistent when that scale was made more objective by the presence of slides. In the second experiment reported by Marks (1973), the mean numerical rating of questionnaire-derived “poor imagers” changed very little when they were asked to rate their images of pictorial rather than imaginal experiences, while “good imagers” demonstrated a significant change (one and a half intervals on a five-point scale) in mean rating toward poorer imagery.

If the construct validity is to be supported, we should expect not only idiosyncratic and interpersonal conceptual consistency by questionnaire respondents but also better performance by self-reported “good imagers” than “poor imagers” on a test of pictorial recall.


Forty-six female and seventeen male undergraduate volunteers randomly selected from an introductory psychology class were administered Marks’ questionnaire. Summed questionnaire scores for the three men and three women reporting the best visual imagery (Mdns=25 for men, 20 for women) and equivalent numbers of both genders reporting the poorest visual imagery (Mdns=44 for men, 48 for women) led to the creation of “good” and “poor” imagers groups, respectively. These 12 subjects participated in further research 10 to 14 days later.
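The extreme-groups selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the respondent records, the function name `select_extreme_groups`, and every score are hypothetical. The only points taken from the text are that three subjects per gender were drawn from each end of the distribution and that lower summed scores indicate more vivid reported imagery.

```python
from statistics import median

def select_extreme_groups(respondents, per_gender=3):
    """Return (good, poor): per gender, the lowest and highest summed scores."""
    good, poor = [], []
    for gender in sorted({r["gender"] for r in respondents}):
        ranked = sorted(
            (r for r in respondents if r["gender"] == gender),
            key=lambda r: r["score"],
        )
        good.extend(ranked[:per_gender])    # lowest totals = most vivid imagery
        poor.extend(ranked[-per_gender:])   # highest totals = least vivid imagery
    return good, poor

# Hypothetical respondents (NOT the study's data).
raw = [
    ("F", 20), ("F", 18), ("F", 22), ("F", 31), ("F", 35),
    ("F", 47), ("F", 48), ("F", 50),
    ("M", 24), ("M", 25), ("M", 27), ("M", 33),
    ("M", 42), ("M", 44), ("M", 46),
]
respondents = [{"id": i, "gender": g, "score": s} for i, (g, s) in enumerate(raw)]

good, poor = select_extreme_groups(respondents)
for label, group in (("good", good), ("poor", poor)):
    for gender in ("F", "M"):
        scores = [r["score"] for r in group if r["gender"] == gender]
        print(label, gender, scores, "Mdn =", median(scores))
```

The sketch ignores tied scores at the cut-off; a real selection would need an explicit tie-breaking rule.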


Fifteen photographic slides of a mountain cove, in clarity varying from very blurry to very sharp, and in illumination ranging from very dark to normal to too bright, were taken during 1 min. in the early afternoon on a bright and cloudless day.

Imagery Measures

Three questionnaires were used. In addition to Marks’ (1973) questionnaire, a 30-item variation was used. Six questions concerning the recall of information in the slides constituted the final questionnaire. A maximum of 60 points, 10 per question, could be obtained on this latter questionnaire.2

Experimental Setting and Apparatus

Sessions were conducted in a roomy, one-window office that could be darkened to ensure similar lighting conditions for all subjects viewing the slides. A 35-mm projector placed 6 m from a screen on a wall was used to project the slides, which subjects viewed from a distance of 5 m from the screen.

Design and Procedure

Each participant took part in an hour-long session that consisted of four stages.

Stage 1: Scale establishment.–Subjects’ initial task was to scale the 15 slides by assigning numbers from best to worst according to how closely each slide provided a detailed and vivid representation of the mountain cove scene. After three runs of the 15 slides, subjects were asked, through a modified method of fractionation (see Stevens, 1975), to select five slides that best represented the following places on their 5-point scale: (1) the “best” slide, (2) the “worst” slide, (3) the midpoint between the “best” and “worst” slides (“middle” slide), (4) the midpoint between the “best” and “middle” slides (“good” slide), and (5) the midpoint between the “worst” and “middle” slides (“poor” slide). There was unanimous agreement among subjects as to the “best” and “worst” slides. Five, six, and seven different slides were chosen for the “good”, “middle”, and “poor” slides, respectively.

Stage 2: Picture recall.–Each participant was presented once more the five slides selected in order from the “best” to the “worst” slide before the projector was turned off and the room lights were turned on. Subjects were administered the 6-item recall questionnaire after 60 sec. of conversation (to guard against possible verbal rehearsal by participants) regarding the ease or difficulty of the scaling procedure.

Stage 3: Administration of the 30-item questionnaire.–Subjects were next asked to fill out the 30-item questionnaire using a rating scale that paired each written definition of Marks’ (1973) 5-point rating scale with the corresponding one of the five slides selected by the subject during the scaling procedure (e.g., rating 1 with “best” slide; rating 2 with “good” slide, etc.). After task instructions were given, the sequence of five subject-selected slides was shown in order from the “best” to the “worst”, with the corresponding rating definition read aloud as each slide was shown. The procedure was repeated after subjects indicated finishing 10 and 20 items on the questionnaire to ensure familiarity with both the verbal and visual components of the rating scale.

Stage 4: Discussions with subjects.–After completion of the 30-item questionnaire, participants were asked two questions: 1. “What does a rating of one mean to you?” 2. “Please look out the window for 30 seconds and then turn around, face the door and describe your present imaging experience of the scene outside the window. Using Marks’ rating scale, how would you describe this experience?” Subjects were dismissed after a short debriefing period in which they were also asked not to talk to anyone about the experiment for a week.

Correlation Between Scores on Recall and Imagery Questionnaires

The Spearman rho between scores on Marks’ questionnaire (good imagers M=22.83 ± 2.34; poor imagers M=44.5 ± 3.55) and the recall questionnaire (good imagers M=25.33 ± 6.57; poor imagers M=17.09 ± 17.09) was nonsignificant (n=12, ρ=.028, p>.05). Partitioning subjects into men only, women only, “good” imagers only, or “poor” imagers only yielded similar results. In addition, the median score for the “good” imagers (Mdn=25). Furthermore, the top three scorers on the recall questionnaire were members of the “poor” imagers group.
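For readers who want to see how a rank-order coefficient of this kind is obtained, the following is a minimal sketch in plain Python. The two score lists are hypothetical and are not the study's data; the function names are illustrative. The significance check uses the common t approximation with n - 2 degrees of freedom, which is an assumption here, since the text does not say how significance was assessed.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)

def t_from_rho(rho, n):
    """t approximation for testing rho against zero (n - 2 degrees of freedom)."""
    return rho * math.sqrt((n - 2) / (1 - rho ** 2))

# Hypothetical scores for 12 subjects (NOT the study's data).
vviq = [20, 22, 23, 24, 25, 26, 40, 42, 44, 46, 47, 48]
recall = [25, 18, 30, 22, 35, 20, 45, 10, 50, 12, 48, 15]
print("rho (hypothetical data):", round(spearman_rho(vviq, recall), 3))

# Coefficient reported in the text: .028 with n = 12.
print("t for reported rho:", round(t_from_rho(0.028, 12), 3))
```

With 10 degrees of freedom, the two-tailed critical t at the .05 level is 2.228, so a coefficient of .028 falls far short of significance under this approximation.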

Imagery Conceptual Variation

Several instances of subjects’ inconsistency in the use of the imagery concept were observed.

Initial responses on Marks’ questionnaire.–About two-thirds of the 12 subjects’ responses to the questionnaire items were ratings of one or two on the five-point scale. The same pattern was observed in the original 63 respondents and in previous research (Chara, 1977). Given that the over-all median on the recall task was only 26.5 out of a possible 60 points, subjects’ confidence in their imagery ability may be somewhat inflated.

Correlation between the two imagery questionnaires.–The Spearman rank-order correlation between the 12 common items in each imagery questionnaire was low (n=12, ρ=.290, p>.05). Most interesting was the fact that the greatest change in scores occurred with the “good” imagers. While the mean total scores on the 12 items for the “poor” imagers changed very little from Marks’ questionnaire (M=35.2) to the 30-item questionnaire (M=34.8), a radical change was observed for the “good” imagers (Marks’ M=16.8; 30-item M=24.2).
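The size of the shift can be worked out directly from the four means reported above. The short sketch below does only that arithmetic; the per-item figure assumes the totals are sums over the 12 common items, as the text states, and the dictionary keys and function name are illustrative.

```python
# Mean total scores on the 12 common items, as reported in the text.
means = {
    "good": {"marks": 16.8, "variant": 24.2},
    "poor": {"marks": 35.2, "variant": 34.8},
}
N_ITEMS = 12

def mean_change(group):
    """Return (total shift, per-item shift) from Marks' form to the 30-item form."""
    total = means[group]["variant"] - means[group]["marks"]
    return round(total, 1), round(total / N_ITEMS, 2)

for group in means:
    total, per_item = mean_change(group)
    print(f"{group} imagers: total shift {total:+}, per item {per_item:+}")
```

A positive shift means higher ratings, that is, poorer reported imagery. On these figures the “good” imagers moved by 7.4 points in total (about 0.62 of a scale interval per item), while the “poor” imagers moved by only 0.4 points in the opposite direction.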

Interviews.–In response to the question “What does a rating of one mean to you?”, “poor” imagers were more likely (50%) to mention the word imagery in their explanations than “good” imagers (17%). Words such as familiar, frequent, and recent were used more often by subjects in their answers than words such as image, picture, or see. Furthermore, when asked to describe their imaging experience after looking out the office window, not one subject reported an experience resembling “normal vision,” a questionnaire rating of one. In fact, the more participants were probed about the imagery items on Marks’ questionnaire that they had rated one, the less likely they were to maintain that they had even an imaging experience that was “clear and reasonably vivid” (rating two).

Little evidence in this study supports the construct validity of Marks’ questionnaire. Empirically, scores on the questionnaire were largely unrelated to scores on a recall test. Semantically, individual inconsistency in the use of the imagery concept, as reported by Kaufman (1981), was observed. Furthermore, it appears likely that interpersonal differences in the use of Marks’ (1973) rating scale may be significant. Respondents demonstrated a significant diversity of opinion in selecting the three middle slides during the scaling of slides procedure. If there is such a diversity of opinion with projected images, how much less consensual agreement might be expected among those dealing with the imaginal images of Marks’ rating scale?

Due to the limited number of subjects involved in the study, stronger statements regarding the validity of the questionnaire must be withheld until further research is accomplished. This study suggests two areas for further research. First, it appears that most “imagery” studies dealing with pictorial “recall” involve recognition rather than recall paradigms (Marks, 1973, 1983b). A recall task, however, appears to be much more demanding than a recognition task and more in keeping with Hilgard’s (1981) definition of the phenomenon the questionnaire purportedly measures (visual recall). Studies examining concordance between performance on recall tasks and questionnaire scores are therefore warranted. Second, this research suggests that subjects’ use of the questionnaire rating scale may be strongly influenced by such “cognitive deceit” phenomena as the “overconfidence effect” (Fischhoff, 1982) and the “better than average effect” (Felson, 1981). To paraphrase an ancient Chinese proverb: “Two-thirds of what we see (image) may be behind our (mind’s) eye.” This possibility bears closer attention.

1 This research was partially based on a dissertation submitted to the University of Tennessee, Knoxville, in partial fulfillment of the requirements for the Ph.D. degree.

2 The 30-item variation of the VVIQ, the “visual recall questionnaire” and a detailed description of the scaling procedures employed in this study are on deposit with Microfiche Publications, POB 3513, Grand Central Station, New York, NY 10017. Ask for document NAPS-04446.

