Approved For Release 2000/08/08 : CIA-RDP96-00789R003200110001-4
Psychological Bulletin Copyright 1994 by the American Psychological Association, Inc.
909/94/$3.00
1994, Vol. 115, No. 1, 4-18
Does Psi Exist? Replicable Evidence for an Anomalous Process of
Information Transfer
Daryl J. Bem and Charles Honorton
Most academic psychologists do not yet accept the existence of psi, anomalous processes of informa-
tion or energy transfer (such as telepathy or other forms of extrasensory perception) that are cur-
rently unexplained in terms of known physical or biological mechanisms. We believe that the repli-
cation rates and effect sizes achieved by one particular experimental method, the ganzfeld procedure,
are now sufficient to warrant bringing this body of data to the attention of the wider psychological
community. Competing meta-analyses of the ganzfeld database are reviewed, one by R. Hyman (1985),
a skeptical critic of psi research, and the other by C. Honorton (1985), a parapsychologist and major
contributor to the ganzfeld database. Next the results of 11 new ganzfeld studies that comply with
guidelines jointly authored by R. Hyman and C. Honorton (1986) are summarized. Finally, issues
of replication and theoretical explanation are discussed.
The term psi denotes anomalous processes of information or
energy transfer, processes such as telepathy or other forms of
extrasensory perception that are currently unexplained in
terms of known physical or biological mechanisms. The term is
purely descriptive: It neither implies that such anomalous phe-
nomena are paranormal nor connotes anything about their un-
derlying mechanisms.
Does psi exist? Most academic psychologists don't think so.
A survey of more than 1,100 college professors in the United
States found that 55% of natural scientists, 66% of social scien-
tists (excluding psychologists), and 77% of academics in the arts,
humanities, and education believed that ESP is either an established
fact or a likely possibility. The comparable figure for psychologists
was only 34%. Moreover, an equal number of psychologists declared ESP
to be an impossibility, a view expressed by only 2% of all other
respondents (Wagner & Monnet, 1979).
Daryl J. Bem, Department of Psychology, Cornell University; Charles
Honorton, Department of Psychology, University of Edinburgh,
Edinburgh, Scotland.
Sadly, Charles Honorton died of a heart attack on November 4, 1992,
9 days before this article was accepted for publication. He was 46.
Parapsychology has lost one of its most valued contributors. I have
lost a valued friend.
This collaboration had its origins in a 1983 visit I made to Honorton's
Psychophysical Research Laboratories (PRL) in Princeton, New Jersey,
as one of several outside consultants brought in to examine the design
and implementation of the experimental protocols.
Preparation of this article was supported, in part, by grants to Charles
Honorton from the American Society for Psychical Research and the
Parapsychology Foundation, both of New York City. The work at PRL
summarized in the second half of this article was supported by the
James S. McDonnell Foundation of St. Louis, Missouri, and by the
John E. Fetzer Foundation of Kalamazoo, Michigan.
Helpful comments on drafts of this article were received from Deborah
Delanoy, Edwin May, Donald McCarthy, Robert Morris, John Palmer,
Robert Rosenthal, Lee Ross, Jessica Utts, Philip Zimbardo, and
two anonymous reviewers.
Correspondence concerning this article should be addressed to Daryl
J. Bem, Department of Psychology, Uris Hall, Cornell University,
Ithaca, New York 14853. Electronic mail may be sent to
d.bem@cornell.edu.
We psychologists are probably more skeptical about psi for
several reasons. First, we believe that extraordinary claims re-
quire extraordinary proof. And although our colleagues from
other disciplines would probably agree with this dictum, we are
more likely to be familiar with the methodological and statisti-
cal requirements for sustaining such claims, as well as with pre-
vious claims that failed either to meet those requirements or
to survive the test of successful replication. Even for ordinary
claims, our conventional statistical criteria are conservative.
The sacred p = .05 threshold is a constant reminder that it is far
more sinful to assert that an effect exists when it does not (the
Type I error) than to assert that an effect does not exist when it
does (the Type II error).
Second, most of us distinguish sharply between phenomena
whose explanations are merely obscure or controversial (e.g.,
hypnosis) and phenomena such as psi that appear to fall outside
our current explanatory framework altogether. (Some would
characterize this as the difference between the unexplained and
the inexplicable.) In contrast, many laypersons treat all exotic
psychological phenomena as epistemologically equivalent;
many even consider déjà vu to be a psychic phenomenon. The
blurring of this critical distinction is aided and abetted by the
mass media, "new age" books and mind-power courses, and
"psychic" entertainers who present both genuine hypnosis and
fake "mind reading" in the course of a single performance. Accordingly,
most laypersons would not have to revise their conceptual model of
reality as radically as we would in order to assimilate the existence
of psi. For us, psi is simply more extraordinary.
Finally, research in cognitive and social psychology has sensitized
us to the errors and biases that plague intuitive attempts to draw
valid inferences from the data of everyday experience (Gilovich, 1991;
Nisbett & Ross, 1980; Tversky & Kahneman, 1971). This leads us to give
virtually no probative weight to anecdotal or journalistic reports of
psi, the main source cited by
our academic colleagues as evidence for their beliefs about psi
(Wagner & Monnet, 1979).
[...] familiar than others with recent experimental research on psi.
Like most psychological research, parapsychological research is [...]
psychological research, however, contemporary parapsychological
research is not usually reviewed or summarized in psychology's
textbooks, handbooks, or mainstream journals. For example, [...]
surveyed even mentions the experimental procedure reviewed in this
article, a procedure that has been in widespread use since the early
1970s (Roig, Icochea, & Cuzzucoli, 1991). Other secondary sources for
nonspecialists are frequently inaccurate in their descriptions of
parapsychological research. (For discussions of this problem, see
Child, 1985, and Palmer, Honorton, & Utts, 1989.)
This situation may be changing. Discussions of modern psi
research have recently appeared in a widely used introductory
textbook (Atkinson, Atkinson, Smith, & Bem, 1990, 1993), two
mainstream psychology journals (Child, 1985; Rao & Palmer,
1987), and a scholarly but accessible book for nonspecialists
(Broughton, 1991). The purpose of the present article is to sup-
plement these broader treatments with a more detailed, meta-
analytic presentation of evidence issuing from a single experi-
mental method: the ganzfeld procedure. We believe that the
replication rates and effect sizes achieved with this procedure
are now sufficient to warrant bringing this body of data to the
attention of the wider psychological community.
The Ganzfeld Procedure
By the 1960s, a number of parapsychologists had become dis-
satisfied with the familiar ESP testing methods pioneered by
J. B. Rhine at Duke University in the 1930s. In particular, they
believed that the repetitive forced-choice procedure in which a
subject repeatedly attempts to select the correct "target" symbol
from a set of fixed alternatives failed to capture the circumstances
that characterize reported instances of psi in everyday
life.
Historically, psi has often been associated with meditation,
hypnosis, dreaming, and other naturally occurring or deliber-
ately induced altered states of consciousness. For example, the
view that psi phenomena can occur during meditation is ex-
pressed in most classical texts on meditative techniques; the be-
lief that hypnosis is a psi-conducive state dates all the way back
to the days of early mesmerism (Dingwall, 1968); and cross-
cultural surveys indicate that most reported "real-life" psi ex-
periences are mediated through dreams (Green, 1960; Prasad
& Stevenson, 1968; L. E. Rhine, 1962; Sannwald, 1959).
There are now reports of experimental evidence consistent
with these anecdotal observations. For example, several labora-
tory investigators have reported that meditation facilitates psi
performance (Honorton, 1977). A meta-analysis of 25 experi-
ments on hypnosis and psi conducted between 1945 and 1981
in 10 different laboratories suggests that hypnotic induction
may also facilitate psi performance (Schechter, 1984). And
dream-mediated psi was reported in a series of experiments
conducted at Maimonides Medical Center in New York and
published between 1966 and 1972 (Child, 1985; Ullman,
Krippner, & Vaughan, 1973).
In the Maimonides dream studies, two subjects, a "receiver" and a
"sender," spent the night in a sleep laboratory. The receiver's brain
waves and eye movements were monitored as he or she slept in an
isolated room. When the receiver entered a period of REM sleep, the
experimenter pressed a buzzer that signaled the sender (under the
supervision of a second experimenter) to begin a sending period. The
sender would then concentrate on a randomly chosen picture (the
"target") with the goal of influencing the content of the receiver's
dream.
Toward the end of the REM period, the receiver was awak-
ened and asked to describe any dream just experienced. This
procedure was repeated throughout the night with the same
target. A transcription of the receiver's dream reports was given
to outside judges who blindly rated the similarity of the night's
dreams to several pictures, including the target. In some studies,
similarity ratings were also obtained from the receivers them-
selves. Across several variations of the procedure, dreams were
judged to be significantly more similar to the target pictures
than to the control pictures in the judging sets (failures to repli-
cate the Maimonides results were also reviewed by Child, 1985).
These several lines of evidence suggested a working model of
psi in which psi-mediated information is conceptualized as a
weak signal that is normally masked by internal somatic and
external sensory "noise." By reducing ordinary sensory input,
these diverse psi-conducive states are presumed to raise the sig-
nal-to-noise ratio, thereby enhancing a person's ability to detect
the psi-mediated information (Honorton, 1969, 1977). To test
the hypothesis that a reduction of sensory input itself facilitates
psi performance, investigators turned to the ganzfeld procedure
(Braud, Wood, & Braud, 1975; Honorton & Harper, 1974; Parker, 1975),
a procedure originally introduced into experimental
psychology during the 1930s to test propositions derived from
gestalt theory (Avant, 1965; Metzger, 1930).
Like the dream studies, the psi ganzfeld procedure has most
often been used to test for telepathic communication between a
sender and a receiver. The receiver is placed in a reclining chair
in an acoustically isolated room. Translucent ping-pong ball
halves are taped over the eyes and headphones are placed over
the ears; a red floodlight directed toward the eyes produces an
undifferentiated visual field, and white noise played through the
headphones produces an analogous auditory field. It is this ho-
mogeneous perceptual environment that is called the Ganzfeld
("total field"). To reduce internal somatic "noise," the receiver
typically also undergoes a series of progressive relaxation exer-
cises at the beginning of the ganzfeld period.
The sender is sequestered in a separate acoustically isolated
room, and a visual stimulus (art print, photograph, or brief vid-
eotaped sequence) is randomly selected from a large pool of
such stimuli to serve as the target for the session. While the
sender concentrates on the target, the receiver provides a con-
tinuous verbal report of his or her ongoing imagery and menta-
tion, usually for about 30 minutes. At the completion of the
ganzfeld period, the receiver is presented with several stimuli
(usually four) and, without knowing which stimulus was the
target, is asked to rate the degree to which each matches the
imagery and mentation experienced during the ganzfeld period.
If the receiver assigns the highest rating to the target stimulus, it
is scored as a "hit." Thus, if the experiment uses judging sets
containing four stimuli (the target and three decoys or control
stimuli), the hit rate expected by chance is .25. The ratings can
also be analyzed in other ways; for example, they can be con-
verted to ranks or standardized scores within each set and ana-
lyzed parametrically across sessions. And, as with the dream
studies, the similarity ratings can also be made by outside judges
using transcripts of the receiver's mentation report.
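The scoring rules just described can be illustrated with a minimal sketch. The function name, the rating scale, and the treatment of tied ratings are assumptions made for illustration; the text does not specify how ties were handled in the original studies.

```python
from statistics import mean, pstdev

def score_session(ratings, target_index):
    """Score one session from the ratings given to each stimulus in the judging set.

    Returns (hit, rank, z): whether the target was a direct hit, the rank of
    the target (1 = highest rated), and the target's standardized rating
    within its own judging set.
    """
    target_rating = ratings[target_index]
    decoys = [r for i, r in enumerate(ratings) if i != target_index]
    # Assumption: a hit requires the target to be rated strictly above every decoy.
    hit = all(target_rating > r for r in decoys)
    # Assumption: a tie with a decoy gives the target the worse (larger) rank.
    rank = 1 + sum(1 for r in decoys if r >= target_rating)
    spread = pstdev(ratings)
    z = (target_rating - mean(ratings)) / spread if spread > 0 else 0.0
    return hit, rank, z

def chance_hit_rate(set_size):
    """Hit rate expected by chance when one of set_size stimuli is the target."""
    return 1 / set_size
```

With four stimuli, `chance_hit_rate(4)` gives the .25 baseline mentioned above, and the rank and standardized score retain the "near miss" information that the hit/miss index discards.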
Meta-Analyses of the Ganzfeld Database
In 1985 and 1986, the Journal of Parapsychology devoted two
entire issues to a critical examination of the ganzfeld database.
The 1985 issue comprised two contributions: (a) a meta-analy-
sis and critique by Ray Hyman (1985), a cognitive psychologist
and skeptical critic of parapsychological research, and (b) a
competing meta-analysis and rejoinder by Charles Honorton
(1985), a parapsychologist and major contributor to the ganz-
feld database. The 1986 issue contained four commentaries on
the Hyman-Honorton exchange, a joint communique by Hy-
man and Honorton, and six additional commentaries on the
joint communique itself. We summarize the major issues and
conclusions here.
Replication Rates
Rates by study. Hyman's meta-analysis covered 42 psi ganzfeld
studies reported in 34 separate reports written or published
from 1974 through 1981. One of the first problems he discov-
ered in the database was multiple analysis. As noted earlier, it
is possible to calculate several indexes of psi performance in a
ganzfeld experiment and, furthermore, to subject those indexes
to several kinds of statistical treatment. Many investigators re-
ported multiple indexes or applied multiple statistical tests
without adjusting the criterion significance level for the number
of tests conducted. Worse, some may have "shopped" among
the alternatives until finding one that yielded a significantly suc-
cessful outcome. Honorton agreed that this was a problem.
Accordingly, Honorton applied a uniform test on a common
index across all studies from which the pertinent datum could
be extracted, regardless of how the investigators had analyzed
the data in the original reports. He selected the proportion of
hits as the common index because it could be calculated for the
largest subset of studies: 28 of the 42 studies. The hit rate is
also a conservative index because it discards most of the rating
information; a second place ranking (a "near miss") receives
no more credit than a last place ranking. Honorton then calcu-
lated the exact binomial probability and its associated z score
for each study.
Of the 28 studies, 23 (82%) had positive z scores (p = 4.6 × 10⁻⁴,
exact binomial test with p = q = .5). Twelve of the studies (43%) had
z scores that were independently significant at the 5% level
(p = 3.5 × 10⁻⁹, binomial test with 28 studies, p = .05, and q = .95),
and 7 of the studies (25%) were independently significant at the 1%
level (p = 9.8 × 10⁻⁹). The composite Stouffer z score across the 28
studies was 6.60 (p = 2.1 × 10⁻¹¹).¹
A more conservative estimate of significance can be obtained
by including 10 additional studies that also used the relevant
judging procedure but did not report hit rates. If these studies
are assigned a mean z score of zero, the Stouffer z across all 38
studies becomes 5.67 (p = 7.3 × 10⁻⁹).
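The effect of padding the database with null studies follows directly from the definition of Stouffer's z (the sum of the study z scores divided by the square root of the number of studies). The sketch below uses only the combined figures quoted in the text; the individual study z scores are not reproduced here, and the function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def stouffer_z(z_scores):
    """Combine study z scores: sum divided by the square root of their number."""
    return sum(z_scores) / sqrt(len(z_scores))

def pad_with_nulls(combined_z, n_studies, n_extra):
    """Stouffer z after adding n_extra studies that each have z = 0."""
    total = combined_z * sqrt(n_studies)  # recover the sum of the z scores
    return total / sqrt(n_studies + n_extra)

def upper_tail(z):
    """One-tailed probability associated with a z score."""
    return NormalDist().cdf(-z)

# 28 studies with a combined z of 6.60, plus 10 studies assigned z = 0.
padded = pad_with_nulls(6.60, 28, 10)
```

Because the added studies contribute nothing to the numerator while enlarging the denominator, the combined z shrinks by a factor of sqrt(28/38), which reproduces the drop from 6.60 to approximately 5.67.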
Thus, whether one considers only the studies for which the
relevant information is available or includes a null estimate for
the additional studies for which the information is not available,
the aggregate results cannot reasonably be attributed to chance.
And, by design, the cumulative outcome reported here cannot
be attributed to the inflation of significance levels through
multiple analysis.
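The replication counts reported in this section can be checked with an exact binomial tail sum. This is a sketch of the arithmetic only; it assumes the quoted probabilities are one-tailed probabilities of obtaining at least the observed number of studies.

```python
from math import comb

def at_least(successes, n, rate):
    """Probability of `successes` or more out of n when each succeeds at `rate`."""
    return sum(
        comb(n, k) * rate**k * (1 - rate) ** (n - k)
        for k in range(successes, n + 1)
    )

# 23 of 28 studies with positive z scores, when a positive sign has probability .5
p_positive = at_least(23, 28, 0.5)
# 12 of 28 studies significant at the 5% level, when each has a .05 chance
p_five_percent = at_least(12, 28, 0.05)
# 7 of 28 studies significant at the 1% level, when each has a .01 chance
p_one_percent = at_least(7, 28, 0.01)
```

Under that assumption the three values come out close to the 4.6 × 10⁻⁴, 3.5 × 10⁻⁹, and 9.8 × 10⁻⁹ figures quoted in the text.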
Rates by laboratory. One objection to estimates such as
those just described is that studies from a common laboratory
are not independent of one another (Parker, 1978). Thus, it is
possible for one or two investigators to be disproportionately
responsible for a high replication rate, whereas other, indepen-
dent investigators are unable to obtain the effect.
The ganzfeld database is vulnerable to this possibility. The
28 studies providing hit rate information were conducted by
investigators in 10 different laboratories. One laboratory con-
tributed 9 of the studies, Honorton's own laboratory contrib-
uted 5, 2 other laboratories contributed 3 each, 2 contributed 2
each, and the remaining 4 laboratories each contributed 1.
Thus, half of the studies were conducted by only 2 laboratories,
1 of them Honorton's own.
Accordingly, Honorton calculated a separate Stouffer z score
for each laboratory. Significantly positive outcomes were re-
ported by 6 of the 10 laboratories, and the combined z score
across laboratories was 6.16 (p = 3.6 × 10⁻¹⁰). Even if all of
the studies conducted by the 2 most prolific laboratories are
discarded from the analysis, the Stouffer z across the 8 other
laboratories remains significant (z = 3.67, p = 1.2 × 10⁻⁴). Four
of these studies are significant at the 1% level (p = 9.2 × 10⁻⁶,
binomial test with 14 studies, p = .01, and q = .99), and each
was contributed by a different laboratory. Thus, even though
the total number of laboratories in this database is small, most
of them have reported significant studies, and the significance
of the overall effect does not depend on just one or two of them.
Selective Reporting
In recent years, behavioral scientists have become increas-
ingly aware of the "file-drawer" problem: the likelihood that
successful studies are more likely to be published than unsuc-
cessful studies, which are more likely to be consigned to the file
drawers of their disappointed investigators (Bozarth & Roberts,
1972; Sterling, 1959). Parapsychologists were among the first to
become sensitive to the problem, and, in 1975, the Parapsycho-
logical Association Council adopted a policy opposing the selec-
tive reporting of positive outcomes. As a consequence, negative
findings have been routinely reported at the association's meet-
ings and in its affiliated publications for almost two decades. As
has already been shown, more than half of the ganzfeld studies
included in the meta-analysis yielded outcomes whose signifi-
cance falls short of the conventional .05 level.
¹ Stouffer's z is computed by dividing the sum of the z scores for the
individual studies by the square root of the number of studies
(Rosenthal, 1978).
A variant of the selective reporting problem arises from what Hyman
(1985) has termed the "retrospective study." An investigator conducts
a small set of exploratory trials. If they yield null results, they
remain exploratory and never become part of the official record; if
they yield positive results, they are defined as a study after the
fact and are submitted for publication. In support of this
possibility, Hyman noted that there are more significant studies in
the database with fewer than 20 trials than one would expect under the
assumption that, all other things being equal, statistical power
should increase with the square root of the sample size. Although
Honorton questioned the assumption that "all other things" are in fact
equal across the studies and disagreed with Hyman's particular
statistical analysis, he agreed that there is an apparent clustering
of significant studies with fewer than 20 trials. (Of the complete
ganzfeld database of 42 studies, 8 involved fewer than 20 trials, and
6 of those studies reported statistically significant results.)
Because it is impossible, by definition, to know how many unknown
studies, exploratory or otherwise, are languishing in file drawers,
the major tool for estimating the seriousness of selective reporting
problems has become some variant of Rosenthal's file-drawer statistic,
an estimate of how many unreported studies with z scores of zero would
be required to exactly cancel out the significance of the known
database (Rosenthal, 1979). For the 28 direct-hit ganzfeld studies
alone, this estimate is 423 fugitive studies, a ratio of
unreported-to-reported studies of approximately 15:1. When it is
recalled that a single ganzfeld session takes over an hour to conduct,
it is not surprising that, despite his concern with the retrospective
study problem, Hyman concurred with Honorton and other participants in
the published debate that selective reporting cannot plausibly account
for the overall statistical significance of the psi ganzfeld database
(Hyman & Honorton, 1986).²
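The file-drawer estimate can be sketched as follows. The text says only that "some variant" of Rosenthal's statistic was used, so the one-tailed .05 criterion (z = 1.645) is an assumption; the sum of the z scores is recovered from the combined Stouffer z of 6.60 reported for the 28 studies.

```python
from math import ceil, sqrt

def fail_safe_n(sum_z, n_studies, z_crit=1.645):
    """Number of unreported z = 0 studies needed to pull the combined z down to z_crit.

    Solves sum_z / sqrt(n_studies + x) = z_crit for x.
    """
    return (sum_z / z_crit) ** 2 - n_studies

n_reported = 28
sum_z = 6.60 * sqrt(n_reported)            # sum of z scores implied by Stouffer z
needed = ceil(fail_safe_n(sum_z, n_reported))
ratio = needed / n_reported                # unreported studies per reported study
```

Under these assumptions the estimate comes to 423 studies, about 15 for every study in the database, matching the figures given in the text.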
Methodological Flaws
If the most frequent criticism of parapsychology is that it has
not produced a replicable psi effect, the second most frequent
criticism is that many, if not most, psi experiments have inade-
quate controls and procedural safeguards. A frequent charge is
that positive results emerge primarily from initial, poorly con-
trolled studies and then vanish as better controls and safeguards
are introduced.
Fortunately, meta-analysis provides a vehicle for empirically
evaluating the extent to which methodological flaws may have
contributed to artifactual positive outcomes across a set of studies.
First, ratings are assigned to each study that index the degree to
which particular methodological flaws are or are not present; these
ratings are then correlated with the studies' outcomes. Large positive
correlations constitute evidence that the observed effect may be
artifactual.
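A minimal sketch of this flaw analysis is a correlation between a flaw rating and a study outcome measure. The flaw codes and z scores below are invented solely to make the example runnable; they are not taken from the ganzfeld database, and a plain Pearson correlation is an assumed choice of statistic.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: 1 = flaw present, 0 = flaw absent, paired with study z scores.
flaw_present = [1, 0, 1, 0, 0, 1, 0, 0]
study_z = [1.2, 0.9, 0.4, 1.8, -0.3, 1.1, 2.0, 0.6]

r_flaw_outcome = pearson_r(flaw_present, study_z)
```

A value of r near zero would indicate that flawed and unflawed studies produced similar outcomes, whereas a large positive r would suggest that the effect depends on the flaw.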
In psi research, the most fatal flaws are those that might per-
mit a subject to obtain the target information in normal sensory
fashion, either inadvertently or through deliberate cheating.
This is called the problem of sensory leakage. Another potentially
serious flaw is inadequate randomization of target selection.
Sensory leakage. Because the ganzfeld is itself a perceptual
isolation procedure, it goes a long way toward eliminating potential
sensory leakage during the ganzfeld portion of the session. There are,
however, potential channels of sensory leakage
after the ganzfeld period. For example, if the experimenter who
interacts with the receiver knows the identity of the target, he or
she could bias the receiver's similarity ratings in favor of correct
identification. Only one study in the database contained this
flaw, a study in which subjects actually performed slightly below
chance expectation. Second, if the stimulus set given to the re-
ceiver for judging contains the actual physical target handled by
the sender during the sending period, there might be cues (e.g.,
fingerprints, smudges, or temperature differences) that could
differentiate the target from the decoys. Moreover, the process of
transferring the stimulus materials to the receiver's room itself
opens up other potential channels of sensory leakage. Although
contemporary ganzfeld studies have eliminated both of these
possibilities by using duplicate stimulus sets, some of the earlier
studies did not.
Independent analyses by Hyman and Honorton agreed that
there was no correlation between inadequacies of security
against sensory leakage and study outcome. Honorton further
reported that if studies that failed to use duplicate stimulus sets
were discarded from the analysis, the remaining studies are still
highly significant (Stouffer z = 4.35, p = 6.8 × 10⁻⁶).
Randomization. In many psi experiments, the issue of
target randomization is critical because systematic patterns in
inadequately randomized target sequences might be detected by
subjects during a session or might match subjects' preexisting
response biases. In a ganzfeld study, however, randomization is
a much less critical issue because only one target is selected dur-
ing the session and most subjects serve in only one session. The
primary concern is simply that all the stimuli within each judg-
ing set be sampled uniformly over the course of the study. Sim-
ilar considerations govern the second randomization, which
takes place after the ganzfeld period and determines the se-
quence in which the target and decoys are presented to the re-
ceiver (or external judge) for judging.
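The two randomizations described above can be sketched in a few lines. The use of a software pseudorandom generator, the stimulus labels, and the function names are assumptions for illustration; the text does not say how the reviewed studies actually generated their random selections.

```python
import random
from collections import Counter

def draw_target_and_order(judging_set, rng):
    """Pick a target uniformly, then independently shuffle the judging order."""
    target = rng.choice(judging_set)      # first randomization: target selection
    presentation = list(judging_set)
    rng.shuffle(presentation)             # second randomization: presentation order
    return target, presentation

def target_frequencies(judging_set, n_sessions, rng):
    """Count how often each stimulus serves as target across simulated sessions."""
    return Counter(draw_target_and_order(judging_set, rng)[0] for _ in range(n_sessions))

rng = random.Random(1994)                 # fixed seed only so the sketch is repeatable
stimuli = ["A", "B", "C", "D"]            # hypothetical four-stimulus judging set
counts = target_frequencies(stimuli, 4000, rng)
```

With uniform sampling each stimulus should serve as target in roughly a quarter of the sessions, which is the property the text identifies as the primary concern.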
Nevertheless, Hyman and Honorton disagreed over the find-
ings here. Hyman claimed there was a correlation between flaws
of randomization and study outcome; Honorton claimed there
was not. The sources of this disagreement were in conflicting
definitions of flaw categories, in the coding and assignment of
flaw ratings to individual studies, and in the subsequent statisti-
cal treatment of those ratings.
Unfortunately, there have been no ratings of flaws by inde-
pendent raters who were unaware of the studies' outcomes
(Morris, 1991). Nevertheless, none of the contributors to the
subsequent debate concurred with Hyman's conclusion,
whereas four nonparapsychologists (two statisticians and two
psychologists) explicitly concurred with Honorton's conclusion (Harris
& Rosenthal, 1988b; Saunders, 1985; Utts, 1991a). For example, Harris
and Rosenthal (the latter one of the pioneers in the use of
meta-analysis in psychology) used Hyman's own flaw ratings and failed
to find any significant relationships between flaws and study outcomes
in each of two separate analyses: "Our analysis of the effects of
flaws on study outcome lends no support to the hypothesis that
Ganzfeld research results are a significant function of the set of
flaw variables" (1988b, p. 3; for a more recent exchange regarding
Hyman's analysis, see Hyman, 1991; Utts, 1991a, 1991b).
² A 1980 survey of parapsychologists uncovered only 19 completed but
unreported ganzfeld studies. Seven of these had achieved significantly
positive results, a proportion (.37) very similar to the proportion of
independently significant studies in the meta-analysis (.43)
(Blackmore, 1980).
Effect Size
Some critics of parapsychology have argued that even if cur-
rent laboratory-produced psi effects turn out to be replicable
and nonartifactual, they are too small to be of theoretical inter-
est or practical importance. We do not believe this to be the case
for the psi ganzfeld effect.
In psi ganzfeld studies, the hit rate itself provides a straight-
forward descriptive measure of effect size, but this measure can-
not be compared directly across studies because they do not all
use a four-stimulus judging set and, hence, do not all have a
chance baseline of .25. The next most obvious candidate, the
difference in each study between the hit rate observed and the
hit rate expected under the null hypothesis, is also intuitively
descriptive but is not appropriate for statistical analysis because
not all differences between proportions that are equal are
equally detectable (e.g., the power to detect the difference be-
tween .55 and .25 is different from the power to detect the
difference between .50 and .20).
To provide a scale of equal detectability, Cohen (1988) de-
vised the effect size index h, which involves an arcsine transfor-
mation on the proportions before calculation of their difference.
Cohen's h is quite general and can assess the difference between
any two proportions drawn from independent samples or be-
tween a single proportion and any specified hypothetical value.
For the 28 studies examined in the meta-analyses, h was .28,
with a 95% confidence interval from .11 to .45.
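The arcsine transformation behind h can be illustrated with the two pairs of proportions mentioned earlier in this section. This is a sketch of the standard definition, h = 2 arcsin(√p₁) − 2 arcsin(√p₂); the function name is illustrative.

```python
from math import asin, sqrt

def cohens_h(p1, p2):
    """Cohen's effect size h for the difference between two proportions."""
    return 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))

# Both pairs differ by .30 in raw proportion, yet they differ in h.
h_upper = cohens_h(0.55, 0.25)
h_lower = cohens_h(0.50, 0.20)
```

The two values are about .62 and .64, which shows why equal raw differences between proportions are not equally detectable and why the transformed scale is preferred for analysis.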
But because values of h do not provide an intuitively descrip-
tive scale, Rosenthal and Rubin (1989; Rosenthal, 1991) have
recently suggested a new index, π, which applies specifically to
one-sample, multiple-choice data of the kind obtained in ganzfeld
experiments. In particular, π expresses all hit rates as the
proportion of hits that would have been obtained if there had been
only two equally likely alternatives, essentially a coin flip. Thus,
it ranges from 0 to 1, with .5 expected under the null hypothesis. The
formula is
$$\pi = \frac{P(k-1)}{P(k-2)+1},$$
where P is the raw proportion of hits and k is the number of
alternative choices available. Because π has such a straightforward
intuitive interpretation, we use it (or its conversion back
to an equivalent four-alternative hit rate) throughout this article
whenever it is applicable.
For the 28 studies examined in the meta-analyses, the mean
value of π was .62, with a 95% confidence interval from .55 to
.69. This corresponds to a four-alternative hit rate of 35%, with
a 95% confidence interval from 28% to 43%.
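The conversion between a raw hit rate and the two-alternative index can be sketched directly from the formula π = P(k − 1) / (P(k − 2) + 1). The inverse function is obtained by solving that formula for P; both function names are illustrative.

```python
def pi_index(hit_rate, k):
    """Express a hit rate from a k-alternative task on a two-alternative scale."""
    return hit_rate * (k - 1) / (hit_rate * (k - 2) + 1)

def equivalent_hit_rate(pi, k=4):
    """Invert pi_index: the k-alternative hit rate that corresponds to pi."""
    return pi / ((k - 1) - pi * (k - 2))

chance_pi = pi_index(0.25, 4)             # chance performance with four alternatives
four_choice = equivalent_hit_rate(0.62)   # hit rate equivalent to the mean index
```

Chance performance on a four-alternative task maps to .5, and an index of .62 maps back to a four-alternative hit rate of about 35%, as stated in the text.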
Cohen (1988, 1992) has also categorized effect sizes into
small, medium, and large, with medium denoting an effect size
that should be apparent to the naked eye of a careful observer.
For a statistic such as π, which indexes the deviation of a
proportion from .5, Cohen considers .65 to be a medium effect size:
A statistically unaided observer should be able to detect the bias
of a coin that comes up heads on 65% of the trials. Thus, at .62,
the psi ganzfeld effect size falls just short of Cohen's naked-eye
criterion. From the phenomenology of the ganzfeld experi-
menter, the corresponding hit rate of 35% implies that he or she
will see a subject obtain a hit approximately every third session
rather than every fourth.
It is also instructive to compare the psi ganzfeld effect with
the results of a recent medical study that sought to determine
whether aspirin can prevent heart attacks (Steering Committee
of the Physicians' Health Study Research Group, 1988). The
study was discontinued after 6 years because it was already clear
that the aspirin treatment was effective (p < .00001) and it was
considered unethical to keep the control group on placebo med-
ication. The study was widely publicized as a major medical
breakthrough. But despite its undisputed reality and practical
importance, the size of the aspirin effect is quite small: Taking
aspirin reduces the probability of suffering a heart attack by
only .008. The corresponding effect size (h) is .068, about one
third to one fourth the size of the psi ganzfeld effect (Atkinson
et al., 1993, p. 236; Utts, 1991 b).
In sum, we believe that the psi ganzfeld effect is large enough
to be of both theoretical interest and potential practical impor-
tance.
Experimental Correlates of the Psi Ganzfeld Effect
We showed earlier that the technique of correlating variables
with effect sizes across studies can help to assess whether meth-
odological flaws might have produced artifactual positive out-
comes. The same technique can be used more affirmatively to
explore whether an effect varies systematically with conceptually
relevant variations in experimental procedure. The discovery of such
correlates can help to establish an effect as genuine,
suggest ways of increasing replication rates and effect sizes, and
enhance the chances of moving beyond the simple demonstra-
tion of an effect to its explanation. This strategy is only heuris-
tic, however. Any correlates discovered must be considered
quite tentative, both because they emerge from post hoc explo-
ration and because they necessarily involve comparisons across
heterogeneous studies that differ simultaneously on many inter-
related variables, known and unknown. Two such correlates
emerged from the meta-analysis of the psi ganzfeld effect.
Single- versus multiple-image targets. Although most of the
28 studies in the meta-analysis used single pictures as targets, 9
(conducted by three different investigators) used View Master
stereoscopic slide reels that presented multiple images focused
on a central theme. Studies using the View Master reels pro-
duced significantly higher hit rates than did studies using the
single-image targets (50% vs. 34%), t(26) = 2.22, p = .035, two-
tailed.
Sender-receiver pairing. In 17 of the 28 studies, partici-
pants were free to bring in friends to serve as senders. In 8 stud-
ies, only laboratory-assigned senders were used. (Three studies
used no sender.) Unfortunately, there is no record of how many
participants in the former studies actually brought in friends.
Nevertheless, those 17 studies (conducted by six different inves-
tigators) had significantly higher hit rates than did the studies
that used only laboratory-assigned senders (44% vs. 26%), t(23)
= 2.39, p = .025, two-tailed.
The Joint Communique
After their published exchange in 1985, Hyman and Honor-
ton agreed to contribute a joint communique to the subsequent
discussion that was published in 1986. First, they set forth their
areas of agreement and disagreement:
We agree that there is an overall significant effect in this data base
that cannot reasonably be explained by selective reporting or
multiple analysis. We continue to differ over the degree to which
the effect constitutes evidence for psi, but we agree that the final
verdict awaits the outcome of future experiments conducted by a
broader range of investigators and according to more stringent
standards. (Hyman & Honorton, 1986, p. 351)
They then spelled out in detail the "more stringent stan-
dards" they believed should govern future experiments. These
standards included strict security precautions against sensory
leakage, testing and documentation of randomization methods
for selecting targets and sequencing the judging pool, statistical
correction for multiple analyses, advance specification of the
status of the experiment (e.g., pilot study or confirmatory ex-
periment), and full documentation in the published report of
the experimental procedures and the status of statistical tests
(e.g., planned or post hoc).
The National Research Council Report
In 1988, the National Research Council (NRC) of the Na-
tional Academy of Sciences released a widely publicized report
commissioned by the U.S. Army that assessed several contro-
versial technologies for enhancing human performance, includ-
ing accelerated learning, neurolinguistic programming, mental
practice, biofeedback, and parapsychology (Druckman &
Swets, 1988; summarized in Swets & Bjork, 1990). The report's
conclusion concerning parapsychology was quite negative:
"The Committee finds no scientific justification from research
conducted over a period of 130 years for the existence of para-
psychological phenomena" (Druckman & Swets, 1988, p. 22).
An extended refutation strongly protesting the committee's
treatment of parapsychology has been published elsewhere
(Palmer et al., 1989). The pertinent point here is simply that
the NRC's evaluation of the ganzfeld studies does not reflect an
additional, independent examination of the ganzfeld database
but is based on the same meta-analysis conducted by Hyman
that we have discussed in this article.
Hyman chaired the NRC's Subcommittee on Parapsychol-
ogy, and, although he had concurred with Honorton 2 years ear-
lier in their joint communique that "there is an overall signifi-
cant effect in this data base that cannot reasonably be explained
by selective reporting or multiple analysis" (p. 351) and that
"significant outcomes have been produced by a number of
different investigators" (p. 352), neither of these points is ac-
knowledged in the committee's report.
The NRC also solicited a background report from Harris and
Rosenthal (1988a), which provided the committee with a com-
parative methodological analysis of the five controversial areas
just listed. Harris and Rosenthal noted that, of these areas,
"only the Ganzfeld ESP studies [the only psi studies they evalu-
ated] regularly meet the basic requirements of sound experi-
mental design" (p. 53), and they concluded that
it would be implausible to entertain the null given the combined p
from these 28 studies. Given the various problems or flaws pointed
out by Hyman and Honorton . . . we might estimate the obtained
accuracy rate to be about 1/3 . . . when the accuracy rate expected
under the null is 1/4. (p. 51)3
The Autoganzfeld Studies
In 1983, Honorton and his colleagues initiated a new series
of ganzfeld studies designed to avoid the methodological prob-
lems he and others had identified in earlier studies (Honorton,
1979; Kennedy, 1979). These studies complied with all of the
detailed guidelines that he and Hyman were to publish later in
their joint communique. The program continued until Septem-
ber 1989, when a loss of funding forced the laboratory to close.
The major innovations of the new studies were computer con-
trol of the experimental protocol (hence the name autoganzfeld)
and the introduction of videotaped film clips as target
stimuli.
Method
The basic design of the autoganzfeld studies was the same as that
described earlier4: A receiver and sender were sequestered in separate,
acoustically isolated chambers. After a 14-min period of progressive
relaxation, the receiver underwent ganzfeld stimulation while describ-
ing his or her thoughts and images aloud for 30 min. Meanwhile, the
sender concentrated on a randomly selected target. At the end of the
ganzfeld period, the receiver was shown four stimuli and, without know-
ing which of the four had been the target, rated each stimulus for its
similarity to his or her mentation during the ganzfeld.
The targets consisted of 80 still pictures (static targets) and 80 short
video segments complete with soundtracks (dynamic targets), all re-
corded on videocassette. The static targets included art prints, pho-
tographs, and magazine advertisements; the dynamic targets included
excerpts of approximately 1-min duration from motion pictures, TV
shows, and cartoons. The 160 targets were arranged in judging sets of
four static or four dynamic targets each, constructed to minimize simi-
larities among targets within a set.
Target selection and presentation. The VCR containing the taped
targets was interfaced to the controlling computer, which selected the
target and controlled its repeated presentation to the sender during the
ganzfeld period, thus eliminating the need for a second experimenter to
accompany the sender. After the ganzfeld period, the computer ran-
domly sequenced the four-clip judging set and presented it to the re-
ceiver on a TV monitor for judging. The receiver used a computer game
paddle to make his or her ratings on a 40-point scale that appeared on
3 In a troubling development, the chair of the NRC Committee
phoned Rosenthal and asked him to delete the parapsychology section
of the paper (R. Rosenthal, personal communication, September 15,
1992). Although Rosenthal refused to do so, that section of the Harris-
Rosenthal paper is nowhere cited in the NRC report.
4 Because Honorton and his colleagues have complied with the Hy-
man-Honorton specification that experimental reports be sufficiently
complete to permit others to reconstruct the investigator's procedures,
readers who wish to know more detail than we provide here are likely to
find whatever they need in the archival publication of these studies in
the Journal of Parapsychology (Honorton et al., 1990).
the TV monitor after each clip was shown. The receiver was permitted
to see each clip and to change the ratings repeatedly until he or she was
satisfied. The computer then wrote these and other data from the session
into a file on a floppy disk. At that point, the sender moved to the receiv-
er's chamber and revealed the identity of the target to both the receiver
and the experimenter. Note that the experimenter did not even know
the identity of the four-clip judging set until it was displayed to the re-
ceiver for judging.
Randomization. The random selection of the target and sequencing
of the judging set were controlled by a noise-based random number gen-
erator interfaced to the computer. Extensive testing confirmed that the
generator was providing a uniform distribution of values throughout
the full target range (1-160). Tests on the actual frequencies observed
during the experiments confirmed that targets were, on average, selected
uniformly from among the 4 clips within each judging set and that the
4 judging sequences used were uniformly distributed across sessions.
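The article does not say which statistical test was applied to the observed frequencies. One common choice is a chi-square goodness-of-fit test against a uniform distribution, sketched below for the position of the target within its four-clip judging set. The counts are HYPOTHETICAL placeholders chosen only to sum to 329 sessions; they are not data from the studies. The closed-form tail probability used here holds for 3 degrees of freedom only.

```python
import math


def chi_square_uniform(counts):
    """Chi-square statistic for observed counts against equal expected counts."""
    total = sum(counts)
    expected = total / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


def chi_square_sf_df3(x):
    """Upper-tail probability of a chi-square variate with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# HYPOTHETICAL counts of how often each of the 4 clip positions was the target.
position_counts = [85, 80, 79, 85]

stat = chi_square_uniform(position_counts)
p_value = chi_square_sf_df3(stat)
print(f"chi-square(3) = {stat:.2f}, p = {p_value:.2f}")
```

A large p value, as with these placeholder counts, would give no reason to doubt uniform selection; a small one would indicate a bias in the generator or its interface.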
Additional control features. The receiver's and sender's rooms were
sound-isolated, electrically shielded chambers with single-door access
that could be continuously monitored by the experimenter. There was
two-way intercom communication between the experimenter and the
receiver but only one-way communication into the sender's room; thus,
neither the experimenter nor the receiver could monitor events inside
the sender's room. The archival record for each session includes an au-
diotape containing the receiver's mentation during the ganzfeld period
and all verbal exchanges between the experimenter and the receiver
throughout the experiment.
The automated ganzfeld protocol has been examined by several
dozen parapsychologists and behavioral researchers from other fields,
including well-known critics of parapsychology. Many have partici-
pated as subjects or observers. All have expressed satisfaction with the
handling of security issues and controls.
Parapsychologists have often been urged to employ magicians as con-
sultants to ensure that the experimental protocols are not vulnerable
either to inadvertent sensory leakage or to deliberate cheating. Two
"mentalists," magicians who specialize in the simulation of psi, have
examined the autoganzfeld system and protocol. Ford Kross, a profes-
sional mentalist and officer of the mentalist's professional organization,
the Psychic Entertainers Association, provided the following written
statement: "In my professional capacity as a mentalist, I have reviewed
Psychophysical Research Laboratories' automated ganzfeld system and
found it to provide excellent security against deception by subjects"
(personal communication, May, 1989).
Daryl J. Bem has also performed as a mentalist for many years and is
a member of the Psychic Entertainers Association. As mentioned in
the author note, this article had its origins in a 1983 visit he made to
Honorton's laboratory, where he was asked to critically examine the
research protocol from the perspective of a mentalist, a research psy-
chologist, and a subject. Needless to say, this article would not exist if he
did not concur with Ford Kross's assessment of the security procedures.
Experimental Studies
Altogether, 100 men and 140 women participated as receivers in 354
sessions during the research program.5 The participants ranged in age
from 17 to 74 years (M = 37.3, SD = 11.8), with a mean formal educa-
tion of 15.6 years (SD = 2.0). Eight separate experimenters, including
Honorton, conducted the studies.
The experimental program included three pilot and eight formal
studies. Five of the formal studies used novice (first-time) participants
who served as the receiver in one session each. The remaining three
formal studies used experienced participants.
Pilot studies. Sample sizes were not preset in the three pilot studies.
Study 1 comprised 22 sessions and was conducted during the initial
development and testing of the autoganzfeld system. Study 2 comprised
9 sessions testing a procedure in which the experimenter, rather than
the receiver, served as the judge at the end of the session. Study 3 com-
prised 35 sessions and served as practice for participants who had com-
pleted the allotted number of sessions in the ongoing formal studies but
who wanted additional ganzfeld experience. This study also included
several demonstration sessions when TV film crews were present.
Novice studies. Studies 101-104 were each designed to test 50 par-
ticipants who had had no prior ganzfeld experience; each participant
served as the receiver in a single ganzfeld session. Study 104 included 16
of 20 students recruited from the Juilliard School in New York City to
test an artistically gifted sample. Study 105 was initiated to accommo-
date the overflow of participants who had been recruited for Study 104,
including the 4 remaining Juilliard students. The sample size for this
study was set to 25, but only 6 sessions had been completed when the
laboratory closed. For purposes of exposition, we divided the 56 sessions
from Studies 104 and 105 into two parts: Study 104/105(a) comprises
the 36 non-Juilliard participants, and Study 104/105(b) comprises the
20 Juilliard students.
Study 201. This study was designed to retest the most promising
participants from the previous studies. The number of trials was set to
20, but only 7 sessions with 3 participants had been completed when
the laboratory closed.
Study 301. This study was designed to compare static and dynamic
targets. The sample size was set to 50 sessions. Twenty-five experienced
participants each served as the receiver in 2 sessions. Unknown to the
participants, the computer control program was modified to ensure that
they would each have 1 session with a static target and 1 session with a
dynamic target.
Study 302. This study was designed to examine a dynamic target
set that had yielded a particularly high hit rate in the previous studies.
The study involved experienced participants who had had no prior ex-
perience with this particular target set and who were unaware that only
one target set was being sampled. Each served as the receiver in a single
session. The design called for the study to continue until 15 sessions
were completed with each of the targets, but only 25 sessions had been
completed when the laboratory closed.
The 11 studies just described comprise all sessions conducted during
the 6.5 years of the program. There is no "file drawer" of unreported
sessions.
Overall hit rate. As in the earlier meta-analysis, receivers'
ratings were analyzed by tallying the proportion of hits achieved
and calculating the exact binomial probability for the observed
number of hits compared with the chance expectation of .25.
As noted earlier, 240 participants contributed 354 sessions. For
reasons discussed later, Study 302 is analyzed separately, reduc-
ing the number of sessions in the primary analysis to 329.
As Table 1 shows, there were 106 hits in the 329 sessions, a
hit rate of 32% (z = 2.89, p = .002, one-tailed), with a 95%
confidence interval from 30% to 35%. This corresponds to an
effect size (π) of .59, with a 95% confidence interval from .53 to
.64.
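The analysis described above (an exact binomial probability converted to a z score) can be sketched as follows. The conversion through the inverse normal distribution is an assumption about how the z scores were derived, and the function names are illustrative only.

```python
import math
from statistics import NormalDist


def binomial_upper_tail(hits: int, trials: int, p_chance: float = 0.25) -> float:
    """Exact P(X >= hits) for X ~ Binomial(trials, p_chance)."""
    return sum(
        math.comb(trials, k) * p_chance**k * (1 - p_chance) ** (trials - k)
        for k in range(hits, trials + 1)
    )


def z_from_p(p_one_tailed: float) -> float:
    """Standard normal deviate whose upper-tail area equals p_one_tailed."""
    return NormalDist().inv_cdf(1 - p_one_tailed)


# Primary analysis reported in the text: 106 hits in 329 sessions.
p_overall = binomial_upper_tail(106, 329)
z_overall = z_from_p(p_overall)
print(f"p = {p_overall:.4f}, z = {z_overall:.2f}")
```

The same function applies to any single study or subsample, for example `binomial_upper_tail(10, 20)` for a sample with 10 hits in 20 sessions.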
Table 1 also shows that when Studies 104 and 105 are com-
bined and re-divided into Studies 104/105(a) and 104/105(b), 9
5 A recent review of the original computer files uncovered a duplicate
record in the autoganzfeld database. This has now been eliminated, re-
ducing by one the number of subjects and sessions. As a result, some of
the numbers presented in this article differ slightly from those in Hon-
orton et al. (1990).
Table 1
Outcome by Study

             Study/subject       N          N        N      %       Effect
Study        description         subjects   trials   hits   hits    size π    z
1            Pilot               19         22       8      36      .62       0.99
2            Pilot               4          9        3      33      .60       0.25
3            Pilot               24         35       10     29      .55       0.32
101          Novice              50         50       12     24      .47       -0.30
102          Novice              50         50       18     36      .63       1.60
103          Novice              50         50       15     30      .55       0.67
104/105(a)   Novice              36         36       12     33      .60       0.97
104/105(b)   Juilliard sample    20         20       10     50      .75       2.20
201          Experienced         3          7        3      43      .69       0.69
301          Experienced         25         50       15     30      .56       0.67
302          Experienced         25         25       16     54a     .78a      3.04a
Overall (Studies 1-301)                     329      106    32      .59       2.89

Note. All z scores are based on the exact binomial probability, with p = .25 and q = .75.
a Adjusted for response bias; the hit rate actually observed was 64%.
of the 10 studies yield positive effect sizes, with a mean effect
size (π) of .61, t(9) = 4.44, p = .0008, one-tailed. This effect size
is equivalent to a four-alternative hit rate of 34%. Alternatively,
if Studies 104 and 105 are retained as separate studies, 9 of the
10 studies again yield positive effect sizes, with a mean effect
size (π) of .62, t(9) = 3.73, p = .002, one-tailed. This effect size
is equivalent to a four-alternative hit rate of 35% and is identical
to that found across the 28 studies of the earlier meta-analysis.6
Considered together, sessions with novice participants (Stud-
ies 101-105) yielded a statistically significant hit rate of 32.5%
(p = .009), which is not significantly different from the 31.6%
hit rate achieved by experienced participants in Studies 201 and
301. And, finally, each of the eight experimenters also achieved
a positive effect size, with a mean π of .60, t(7) = 3.44, p = .005,
one-tailed.
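The equivalences quoted above between the effect size π and a four-alternative hit rate can be reproduced with a small conversion. The sketch assumes that π is the Rosenthal-Rubin index, which rescales a hit proportion obtained with k alternatives onto a two-alternative scale where .50 is chance; the formula is not restated in this section, so treat it as an assumption to be checked against the earlier definition.

```python
def pi_from_hit_rate(hit_rate: float, k: int = 4) -> float:
    """Convert a hit proportion among k alternatives to the pi effect size."""
    return hit_rate * (k - 1) / (1 + hit_rate * (k - 2))


def hit_rate_from_pi(pi: float, k: int = 4) -> float:
    """Inverse conversion: the k-alternative hit rate equivalent to a given pi."""
    return pi / ((k - 1) - pi * (k - 2))


# Effect sizes quoted in the text, converted to four-alternative hit rates.
for pi_value in (0.59, 0.61, 0.62):
    print(f"pi = {pi_value:.2f} -> hit rate = {hit_rate_from_pi(pi_value):.1%}")

# Overall autoganzfeld result: 106 hits in 329 sessions.
print(f"pi for 106/329 = {pi_from_hit_rate(106 / 329):.2f}")
```

Under this formula a chance hit rate of .25 maps to π = .50, which is why values above .50 indicate performance above chance.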
The Juilliard sample. There are several reports in the liter-
ature of a relationship between creativity or artistic ability and
psi performance (Schmeidler, 1988). To explore this possibility
in the ganzfeld setting, 10 male and 10 female undergraduates
were recruited from the Juilliard School. Of these, 8 were music
students, 10 were drama students, and 2 were dance students.
Each served as the receiver in a single session in Study 104 or
105. As shown in Table 1, these students achieved a hit rate of
50% (p = .014), one of the five highest hit rates ever reported for
a single sample in a ganzfeld study. The musicians were partic-
ularly successful: 6 of the 8 (75%) successfully identified their
targets (p = .004; further details about this sample and their
ganzfeld performance were reported in Schlitz & Honorton,
1992).
Study size and effect size. There is a significant negative cor-
relation across the 10 studies listed in Table 1 between the num-
ber of sessions included in a study and the study's effect size (π),
r = -.64, t(8) = 2.36, p < .05, two-tailed. This is reminiscent
of Hyman's discovery that the smaller studies in the original
ganzfeld database were disproportionately likely to report sta-
tistically significant results. He interpreted this finding as evi-
dence for a bias against the reporting of small studies that fail to
achieve significant results. A similar interpretation cannot be
applied to the autoganzfeld studies, however, because there are
no unreported sessions.
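The correlation reported above can be recomputed from the rounded values in Table 1. The sketch below uses the trial counts and effect sizes of the 10 studies that exclude Study 302; because the tabled effect sizes are rounded to two decimals, the result should be read as an approximate check rather than a reproduction of the original computation.

```python
import math

# Sessions per study and effect size (pi) for Studies 1, 2, 3, 101, 102, 103,
# 104/105(a), 104/105(b), 201, and 301, as listed in Table 1.
trials = [22, 9, 35, 50, 50, 50, 36, 20, 7, 50]
effect_sizes = [0.62, 0.60, 0.55, 0.47, 0.63, 0.55, 0.60, 0.75, 0.69, 0.56]


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_from_r(r, n):
    """t statistic with n - 2 degrees of freedom for testing r against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


r = pearson_r(trials, effect_sizes)
t = t_from_r(r, len(trials))
print(f"r = {r:.2f}, t({len(trials) - 2}) = {t:.2f}")
```

The sign of t follows the sign of r; the text reports its magnitude.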
One reviewer of this article suggested that the negative corre-
lation might reflect a decline effect in which earlier sessions of a
study are more successful than later sessions. If there were such
an effect, then studies with fewer sessions would show larger
effect sizes because they would end before the decline could set
in. To check this possibility, we computed point-biserial corre-
lations between hits (1) or misses (0) and the session number
within each of the 10 studies. All of the correlations hovered
around zero; six were positive, four were negative, and the over-
all mean was .01.
An inspection of Table 1 reveals that the negative correlation
derives primarily from the two studies with the largest effect
sizes: the 20 sessions with the Juilliard students and the 7 ses-
sions of Study 201, the study specifically designed to retest the
most promising participants from the previous studies. Accord-
ingly, it seems likely that the larger effect sizes of these two stud-
ies (and hence the significant negative correlation between the
number of sessions and the effect size) reflect genuine perfor-
mance differences between these two small, highly selected sam-
ples and other autoganzfeld participants.
Study 302. All of the studies except Study 302 randomly
sampled from a pool of 160 static and dynamic targets. Study
302 sampled from a single, dynamic target set that had yielded
a particularly high hit rate in the previous studies. The four film
clips in this set consisted of a scene of a tidal wave from the
movie Clash of the Titans, a high-speed sex scene from A Clock-
work Orange, a scene of crawling snakes from a TV documen-
tary, and a scene from a Bugs Bunny cartoon.
6 As noted above, the laboratory was forced to close before three of
the formal studies could be completed. If we assume that the remaining
trials in Studies 105 and 201 would have yielded only chance results,
this would reduce the overall z for the first 10 autoganzfeld studies from
2.89 to 2.76 (p = .003). Thus, inclusion of the two incomplete studies
does not pose an optional stopping problem. The third incomplete
study, Study 302, is discussed below.
The experimental design called for this study to continue un-
til each of the clips had served as the target 15 times. Unfortu-
nately, the premature termination of this study at 25 sessions
left an imbalance in the frequency with which each clip had
served as the target. This means that the high hit rate observed
(64%) could well be inflated by response biases.
As an illustration, water imagery is frequently reported by
receivers in ganzfeld sessions, whereas sexual imagery is rarely
reported. (Some participants probably are reluctant both to re-
port sexual imagery and to give the highest rating to the sex-
related clip.) If a video clip containing popular imagery (such as
water) happens to appear as a target more frequently than a
clip containing unpopular imagery (such as sex), a high hit rate
might simply reflect the coincidence of those frequencies of oc-
currence with participants' response biases. And, as the second
column of Table 2 reveals, the tidal wave clip did in fact appear
more frequently as the target than did the sex clip. More gener-
ally, the second and third columns of Table 2 show that the fre-
quency with which each film clip was ranked first closely
matches the frequency with which each appeared as the target.
One can adjust for this problem by using the observed fre-
quencies in these two columns to compute the hit rate expected
if there were no psi effect. In particular, one can multiply each
proportion in the second column by the corresponding propor-
tion in the third column (yielding the joint probability that the
clip was the target and that it was ranked first) and then sum
across the four clips. As shown in the fourth column of Table 2,
this computation yields an overall expected hit rate of 34.08%.
When the observed hit rate of 64% is compared with this base-
line, the effect size (h) is .61. As shown in Table 1, this is equiv-
alent to a four-alternative hit rate of 54%, or a π value of .78,
and is statistically significant (z = 3.04, p = .0012).
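The adjustment described above can be sketched in two steps. The first step computes the hit rate expected without psi from the target and first-rank frequencies; the counts used for it here are HYPOTHETICAL placeholders, not the Table 2 values. The second step uses the figures quoted in the text (64% observed, 34.08% expected) and assumes that h is Cohen's arcsine-based effect size, which is then mapped back onto the 25% chance baseline.

```python
import math


def expected_hit_rate(target_counts, first_rank_counts):
    """Sum over clips of P(clip is target) * P(clip is ranked first)."""
    n_targets = sum(target_counts)
    n_ranks = sum(first_rank_counts)
    return sum(
        (t / n_targets) * (r / n_ranks)
        for t, r in zip(target_counts, first_rank_counts)
    )


def cohens_h(p1, p2):
    """Cohen's h: difference between two arcsine-transformed proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


def hit_rate_from_h(h, baseline=0.25):
    """Hit rate that lies h above the given baseline on the arcsine scale."""
    return math.sin(math.asin(math.sqrt(baseline)) + h / 2) ** 2


# Step 1: HYPOTHETICAL counts for four clips across 25 sessions.
demo_expected = expected_hit_rate([7, 4, 6, 8], [9, 3, 6, 7])

# Step 2: figures quoted in the text.
h_302 = cohens_h(0.64, 0.3408)
equivalent_rate = hit_rate_from_h(h_302)
print(f"h = {h_302:.2f}, equivalent four-alternative hit rate = {equivalent_rate:.0%}")
```

Note that when targets and first-place rankings are both uniform across the four clips, the expected hit rate reduces to the usual .25.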
The psi effect can be seen even more clearly in the remaining
columns of Table 2, which control for the differential popularity
of the imagery in the clips by displaying how frequently each
was ranked first when it was the target and how frequently it was
ranked first when it was one of the control clips (decoys). As can
be seen, each of the four clips was selected as the target relatively
more frequently when it was the target than when it was a decoy,
a difference that is significant for three of the four clips. On
average, a clip was identified as the target 58% of the time when
it was the target and only 14% of the time when it was a decoy.
Dynamic versus static targets. The success of Study 302
raises the question of whether dynamic targets are, in general,
more effective than static targets. This possibility was also sug-
gested by the earlier meta-analysis, which revealed that studies
using multiple-image targets (View Master stereoscopic slide
reels) obtained significantly higher hit rates than did studies us-
ing single-image targets. By adding motion and sound, the video
clips might be thought of as high-tech versions of the View Mas-
ter reels.
The 10 autoganzfeld studies that randomly sampled from
both dynamic and static target pools yielded 164 sessions with
dynamic targets and 165 sessions with static targets. As pre-
dicted, sessions using dynamic targets yielded significantly
more hits than did sessions using static targets (37% vs. 27%;
Fisher's exact p < .04).
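A Fisher's exact test of this kind can be sketched with the hypergeometric distribution. The hit counts below (61 of 164 dynamic sessions and 45 of 165 static sessions) are not stated in the text; they are inferred from the reported percentages and the overall total of 106 hits, so treat them as an assumption. The test is computed one-sided, on the further assumption that the directional prediction was tested that way.

```python
import math


def fisher_exact_one_sided(hits_a, n_a, hits_b, n_b):
    """P(group A obtains at least hits_a hits) with all table margins fixed."""
    total_hits = hits_a + hits_b
    total = n_a + n_b
    denominator = math.comb(total, n_a)
    upper = min(total_hits, n_a)
    numerator = sum(
        math.comb(total_hits, k) * math.comb(total - total_hits, n_a - k)
        for k in range(hits_a, upper + 1)
    )
    return numerator / denominator


# Inferred counts: dynamic targets 61/164 (37%), static targets 45/165 (27%).
p_dynamic = fisher_exact_one_sided(61, 164, 45, 165)
print(f"one-sided Fisher's exact p = {p_dynamic:.3f}")
```

The integer arithmetic keeps the computation exact until the final division, which avoids rounding problems with the very large binomial coefficients involved.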
Sender-receiver pairing. The earlier meta-analysis revealed
that studies in which participants were free to bring in friends
to serve as senders produced significantly higher hit rates than
studies that used only laboratory-assigned senders. As noted,
however, there is no record of how many of the participants in
the former studies actually did bring in friends. Whatever the
case, sender-receiver pairing was not a significant correlate of
psi performance in the autoganzfeld studies: The 197 sessions
in which the sender and receiver were friends did not yield a
significantly higher proportion of hits than did the 132 sessions
in which they were not (35% vs. 29%; Fisher's exact p = .28).
Correlations between receiver characteristics and psi perfor-
mance. Most of the autoganzfeld participants were strong be-
lievers in psi: On a 7-point scale ranging from strong disbelief in
psi (1) to strong belief in psi (7), the mean was 6.2 (SD = 1.03);
only 2 participants rated their belief in psi below the midpoint
of the scale. In addition, 88% of the participants reported per-
sonal experiences suggestive of psi, and 80% had some training
in meditation or other techniques involving internal focus of
attention.
All of these appear to be important variables. The correlation
between belief in psi and psi performance is one of the most
consistent findings in the parapsychological literature (Palmer,
1978). And, within the autoganzfeld studies, successful perfor-
mance of novice (first-time) participants was significantly pre-
dicted by reported personal psi experiences, involvement with
meditation or other mental disciplines, and high scores on the
Feeling and Perception factors of the Myers-Briggs Type Inven-
tory (Honorton, 1992; Honorton & Schechter, 1987; Myers &
McCaulley, 1985). This recipe for success has now been inde-
pendently replicated in another laboratory (Broughton, Kan-
thamani, & Khilji, 1990).
The personality trait of extraversion is also associated with
better psi performance. A meta-analysis of 60 independent
studies with nearly 3,000 subjects revealed a small but reliable
positive correlation between extraversion and psi performance,
especially in studies that used free-response methods of the kind
used in the ganzfeld experiments (Honorton, Ferrari, & Bem,
1992). Across 14 free-response studies conducted by four inde-
pendent investigators, the correlation for 612 subjects was .20
(z = 4.82, p = 1.5 × 10^-6). This correlation was replicated in
the autoganzfeld studies, in which extraversion scores were
available for 218 of the 240 subjects, r = .18, t(216) = 2.67, p =
.004, one-tailed.
Finally, there is the strong psi performance of the Juilliard
students, discussed earlier, which is consistent with other studies
in the parapsychological literature suggesting a relationship be-
tween successful psi performance and creativity or artistic abil-
ity.
Discussion
Earlier in this article, we quoted from the abstract of the Hy-
man-Honorton (1986) communique: "We agree that the final
verdict awaits the outcome of future experiments conducted by
a broader range of investigators and according to more stringent
standards" (p. 351). We believe that the "stringent standards"
requirement has been met by the autoganzfeld studies. The re-
sults are statistically significant and consistent with those in the
earlier database. The mean effect size is quite respectable in
comparison with other controversial research areas of human
performance (Harris & Rosenthal, 1988a). And there are reli-
able relationships between successful psi performance and con-
ceptually relevant experimental and subject variables, relation-
ships that also replicate previous findings. Hyman (1991) has
also commented on the autoganzfeld studies: "Honorton's ex-
periments have produced intriguing results. If . . . independent
laboratories can produce similar results with the same relation-
ships and with the same attention to rigorous methodology, then
parapsychology may indeed have finally captured its elusive
quarry" (p. 392).
Issues of Replication
As Hyman's comment implies, the autoganzfeld studies by
themselves cannot satisfy the requirement that replications be
conducted by a "broader range of investigators." Accordingly,
we hope the findings reported here will be sufficiently provoca-
tive to prompt others to try replicating the psi ganzfeld effect.
We believe that it is essential, however, that future studies
comply with the methodological, statistical, and reporting stan-
dards set forth in the joint communique and achieved by the
autoganzfeld studies. It is not necessary for studies to be as au-
tomated or as heavily instrumented as the autoganzfeld studies
to satisfy the methodological guidelines, but they are still likely
to be labor intensive and potentially expensive.7
Statistical Power and Replication
Would-be replicators also need to be reminded of the power
requirements for replicating small effects. Although many aca-
demic psychologists do not believe in psi, many apparently do
believe in miracles when it comes to replication. Tversky and
Kahneman (1971) posed the following problem to their col-
leagues at meetings of the Mathematical Psychology Group and
the American Psychological Association:
Suppose you have run an experiment on 20 subjects and have ob-
tained a significant result which confirms your theory (z = 2.23,
p