Document Creation Date: 
November 4, 2016
Document Release Date: 
January 22, 2003
Publication Date: 
June 24, 1991
Approved For Release 2003/04/18 : CIA-RDP96-00789R003100030001-4

Anomalous Mental Phenomena: Selected Papers

Compiled By: The Cognitive Sciences Laboratory
24 June 1991

Science Applications International Corporation
An Employee-Owned Company
5150 El Camino Real, Suite B-31, Los Altos, California 94022
(415) 960-5910
Other SAIC Offices: [illegible], Seattle, Tucson

I INTRODUCTION

In this volume, we present a selected set of papers on, and/or in support of, anomalous mental phenomena. No section could possibly be complete; however, we have chosen papers that are representative of their particular sections. The sections, which are separated by blue sheets, are as follows:

Section                                                   Number of Papers
I    Introduction
II   Meta-analyses of Anomalous Mental Phenomena          8
III  Main-stream Publications                             7
IV   Anomalous-mental-phenomena Journal Publications      6
V    Magnetoencephalography                               3
VI   Physics                                              6

II META-ANALYSES OF ANOMALOUS MENTAL PHENOMENA

As in all behavioral sciences, replication of experiments in anomalous mental phenomena (AMP) is critical before any putative effects can be verified as part of nature. Because of the complex nature of most behavioral experiments, drawing conclusions from a body of similar experiments has been problematical. Meta-analysis, however, is a relatively new statistical approach that has been specifically designed to address the particular difficulties inherent in the behavioral sciences. The papers in this section have been selected because they represent all such analyses of a substantial portion of the published AMP literature to date. Through replication and meta-analysis, the general scientific community will have tools with which to judge the claims of the AMP literature.

The number that appears in the upper right-hand corner of the first page of each publication is keyed to the following descriptions:

1. Utts, J., "Successful Replication Versus Statistical Significance," Journal of Parapsychology, Vol. 52, pp. 305-320, (December, 1988). By defining, in statistical terms, the meaning of replication for [illegible] effects, Utts, a Professor of Statistics at the University of California at Davis, sets the statistical basis for meta-analysis.

2. Honorton, C., "Error Some Place!" Journal of Communication, pp. 103-116, (Winter, 1975). This paper predates the development of formal meta-analysis, but Honorton provides a critical review of all the ESP card-guessing experiments from 1934 to 1939. The paper includes a description of the claims and counter-claims surrounding the controversy of the day.

3. Honorton, C. and Ferrari, D. C., "'Future Telling': A Meta-Analysis of Forced-Choice Precognition Experiments, 1935-1987," Journal of Parapsychology, Vol. 53, pp. 282-308, (December, 1989). Using the full complement of meta-analytical tools, Honorton provides a critical review of all the ESP experiments during which the target material (i.e., usually ESP cards) is generated after the guess has been recorded.

4. Honorton, C., Berger, R. E., Varvoglis, M. P., Quant, M., Derr, P., Schechter, E. I., and Ferrari, D. C., "Psi Communication in the Ganzfeld," Journal of Parapsychology, Vol. 54, pp. 99-137, (June, 1990). This paper provides a meta-analysis of Ganzfeld experiments (i.e., a form of anomalous cognition). The database is comprised of 11 series for a total of 355 individual trials.

5. Radin, D. I. and Nelson, R. D., "Evidence for Consciousness-Related Anomalies in Random Physical Systems," Foundations of Physics, Vol. 19, No. 12, pp. 1499-1514, (December, 1989). Radin and Nelson analyze over 800 experiments that claim evidence for mental human-machine interactions (i.e., anomalous perturbation).
After a careful analysis, which includes accounting for experiment flaws, they conclude that there is substantial statistical evidence to support the claim.

6. Honorton, C., Ferrari, D. C., and Bem, D. J., "Extraversion and ESP Performance: Meta-Analysis and a New Confirmation," Proceedings of the Parapsychological Association 33rd Annual Convention, Chevy Chase, MD, (August, 1990). In an important link to traditional psychological experimentation, this paper provides a meta-analysis for the correlation of ESP performance and a traditional personality variable, extraversion.

7. Rosenthal, R., "Meta-Analytic Procedures and the Nature of Replication: The Ganzfeld Debate," Journal of Parapsychology, Vol. 50, pp. 319-336, (December, 1986). Rosenthal, a professor of psychology at Harvard University, is one of the early developers of the meta-analysis techniques. In this paper, he comments about the Ganzfeld controversy.

8. Utts, J., "Replication and Meta-Analysis in Parapsychology," Accepted for publication in Statistical Science. In this paper, Utts provides an independent and objective overview of the AMP meta-analyses that follow.

Journal of Parapsychology, Vol. 52, December 1988

SUCCESSFUL REPLICATION VERSUS STATISTICAL SIGNIFICANCE

BY JESSICA UTTS

ABSTRACT: The aim of this paper is to show that successful replication in parapsychology should not be equated with the achievement of statistical significance, whether at the .05 or at any other level. The p value from a hypothesis test is closely related to the size of the sample used for the test; so a definition of successful replication based on a specific p value favors studies done with large samples.
Many "nonsignificant" studies may simply be ones for which the sample size was not large enough to detect the small magnitude effect that was operating. Conversely, "significant" studies may result from a small but conceptually insignificant bias, magnified by a very large sample.

The paper traces the history of the definition of statistical significance in parapsychology and then outlines the problems with using hypothesis-testing results to define successful replications, especially when applied in a cookbook fashion. Finally, suggestions are given for alternative approaches to looking at experimental data. These include calculating statistical power before doing an experiment, using estimation instead of, or in conjunction with, hypothesis testing, and implementing some of the ideas from Bayesian statistics.

Replication is a major issue in parapsychology. Arguments about whether a given research paradigm has been successful tend to focus on what the replication rate has been. For example, the recent review of parapsychology by the National Research Council includes statements such as "...of these 188 [RNG] experiments with some claim to scientific status, 58 reported statistically significant results (compared with the 9 or 10 experiments that would be expected by chance)" (Druckman & Swets, 1988, p. 185). In each section, the report critically evaluates "significant" experiments and ignores "nonsignificant" experiments. The extent to which nonsignificant experiments are ignored is exemplified by the following oversight, in which the total number of studies is equated with the number of "successful" studies: "Of the thirteen scientifically reported experiments [of remote viewing], nine are classified as successful in their outcomes by Hansen et al.... As it turns out, all but one of the nine scientifically reported studies of remote viewing suffer from the flaw of sensory cueing" (p. 183, emphasis added).
Apparently the authors decided that the four experiments that did not attain a p value of .05 or less did not even warrant acknowledgment.

The practice of defining a successful replication as an experiment that attains a p value of .05 or less is common in parapsychology, psychology, and some other disciplines that use statistics. However, like many other conventions in science, it is based on a series of historical events rather than on rational thought. In this paper, I will trace some of the history leading to this definition of a "successful" experiment, outline some problems with this approach, and suggest some methods that parapsychologists should consider in addition to the usual hypothesis-testing regimen. Rao (1984) and Honorton (1984) have discussed similar problems and solutions in the context of psi experiments.

HISTORY

It has not always been the case among parapsychologists that an experiment was deemed successful if it reached a significance level of p = .05. In 1917, John Edgar Coover, who was the Thomas Welton Stanford Psychical Research Fellow at Stanford University from 1912 to 1937, published a book with the results from several experiments he had conducted up to that time (Coover, 1917/1975). Although hypothesis testing as we know it today had not yet been formalized, he essentially conducted tests on many facets of this data and found no evidence for psi that was convincing to him. His conclusions regarding these results are typified by an example he gave in which the hit rate for 518 trials was 30.1%, when 25% was expected by chance (exact p value = .00476):

We get 0.9938 [p-value = 1 -
0.9938 = 0.0062] for the probability that chance deviations will not exceed this limit [of 30.1 percent].... Since this value, then, lies within the field of chance deviation, although the probability of its occurrence by chance is fairly low, it cannot be accepted as a decisive indication of some cause beyond chance which operated in favor of success in guessing. (p. 82)

He then revealed what level of evidence would convince him that nonchance factors were operating: "...if we meet the requirement of a degree of accuracy usual in scientific work by making P = 0.9999779, when absolute certainty is P = 1, then [there is] satisfactory evidence for some cause in addition to chance" (p. 83). In other words, he was defining significance with a p value of 1 - 0.9999779 = 2.21 x 10^-5.

Coover was not alone in requiring that results conform to arbitrarily stringent significance levels. In 1940, when Rhine et al. published Extra-Sensory Perception After Sixty Years, they included the following definitions in the glossary:

p-value = probability of success in each trial

SIGNIFICANCE: When the probability that chance factors alone produced a given deviation is sufficiently small to provide relative certainty that chance is not a reasonable expectation, the deviation is significantly above or below the chance level. Among ESP results, this is arbitrarily taken to mean a deviation in the expected direction such that the critical ratio is 2.5 times the standard deviation (or four times the probable error) or greater. (p. 423-424)

Thus, significance was defined by z ≥ 2.5, or p ≤ .0062.

Seventeen years later, in their book Parapsychology: Frontier Science of the Mind, Rhine and Pratt (1957) suggested that .01 was the appropriate threshold:

In order for such judgments to have the necessary objectivity, a criterion of significance is established by practice and general agreement among the research workers in a particular field....
Most workers in parapsychology accept a probability of .01 as the criterion of significance. (p. 186)

Finally, the Journal of Parapsychology has included a definition of significance in its glossary for many years, but the appropriate p value has fluctuated back and forth between .01 and .02, finally settling at .02 in 1968. The following are excerpts from those glossaries:

December 1949: "A numerical result is significant when it equals or surpasses some criterion of degree of chance improbability. Common criteria are: a probability value of .01 or less."

March 1950 to June 1957: "The criterion commonly used in this Journal is a probability value of .02 or less."

September 1957: "The criterion commonly used in this Journal is P = .01."

December 1957 to December 1967: "The criterion commonly used in parapsychology today is a probability value of .01 or less."

March 1968 to December 1986: "The criterion commonly used in parapsychology today is a probability value of .02 (odds of 50 to 1 against chance) or less.... Odds of 20 to 1 (probability of .05) are regarded as strongly suggestive."

March 1987: The term significance no longer appears in the glossary.

By the mid-1980's, despite the value of .02 given in the Journal of Parapsychology, significance seemed to have been determined to correspond to a p value of .05. For example, in their bibliography of remote-viewing research, Hansen, Schlitz, and Tart (1984) claim: "We have found that more than half (fifteen out of twenty-eight) of the published formal experiments have been successful, where only one in twenty would be expected by chance." As mentioned in my introduction, .05 was the value used by the National Research Council in their recent evaluation of parapsychology. Both Hyman (1985) and Honorton (1985) used .05 as the criterion for a successful ganzfeld study.
In discussing the Schmidt REG experiments, Palmer (1985) implicitly used .05 as the cut-off for significance by observing: "Based on Z-tests ... 25 of the 33 (76%) were significant at the .05 level, two-tailed. In two of the seven non-significant studies...." (p. 102).

This definition of significance is obviously not unique to parapsychology. A popular introductory textbook in psychology states that:

Psychologists used a statistical inference procedure that gives them an estimate of the probability that an observed difference could have occurred by chance. This computation is based on the size of the difference and the spread of the scores. By common agreement, they accept a difference as "real" when the probability that it might be due to chance is less than 5 in 100 (indicated by the notation p < .05). A significant difference is one that meets this criterion.... With a statistically significant difference, a researcher can draw a conclusion about the behavior that was under investigation. (Zimbardo, 1988, p. 54)

Given the weight that has been attached to .05 as the criterion for significance, one would think that it resulted from careful consideration of the issue by statisticians and psychologists. Unfortunately, such is not the case. Its roots apparently lie in the following passage published in 1926 by one of the founders of modern statistics, Sir Ronald A. Fisher:

It is convenient to draw the line at about the level at which we can say: "Either there is something in the treatment, or a coincidence has occurred such as does not occur more than once in twenty trials." ...If one in twenty does not seem high enough odds, we may, if we prefer it, draw the line at one in fifty (the 2 per cent point), or one in a hundred (the 1 per cent point). Personally, the writer prefers to set a low standard of significance at the 5 per cent point, and ignore entirely all results which fail to reach that level.
A scientific fact should be regarded as experimentally established only if a properly designed experiment rarely fails to give this level of significance. (Fisher, 1926, p. 504; also quoted in Savage, 1976, p. 471)

Thus began the belief that an experiment is successful only if the null hypothesis can be rejected using α = 0.05. As an immediate consequence of this belief, Fisher and his followers created tables of F statistics that included values only for tail areas of .05 and .01. Since researchers did not have access to computer algorithms to determine intermediate p values, success came to be measured in terms of these two values alone.

PROBLEMS WITH HYPOTHESIS TESTING

Misconceptions about p Values

Most modern research reports include p values instead of simply discussing whether an experimental result is significant at a prespecified level. Although this is somewhat better than the old method of "one star or two" (corresponding to a significant result at .05 or .01, respectively), it is still a misleading way to examine experimental results.

The problem is that many researchers interpret p values as being related to the probability that the null hypothesis is true. Even some sophisticated researchers tend to think that an extremely small p value must correspond to a very large effect in the population and that a large p value (say > .10) means that there is no effect. In other words, the size of the p value is incorrectly interpreted as the size of the effect. It should be interpreted as the probability of observing results as extreme or more so than those observed, if there is no effect.

To see how arbitrary it is to base a decision about the truth or falsity of a statement on a p value, consider a binomial study based on a sample of size n which results in z = 0.30, p value = .38, one-tailed. One would probably abandon the hypothesis under study and decide not to pursue the given line of research.
Now suppose that the study had been run with a sample of size 100n instead and resulted in the exact same proportion of hits. Then we would find z = 3.00, p value = .0013. These results would be regarded as highly significant!

As another example, consider a chi-square test for randomness based on a sequence of n numbers, each of which can take the values 1, 2, ... 10. Suppose that the test results in a chi-square value of 11.0, df = 9, p value = 0.28. Now suppose the sequence was three times as long but the proportions of each digit remained the same. Then each term in the numerator of the chi-square statistic would be multiplied by 3^2, whereas each term in the denominator would only be multiplied by 3. The degrees of freedom would not change, but the new result would be χ² = 33.0, df = 9, p value = .00013. In the first case, the conclusion would be that the sequence was sufficiently random, yet a sequence three times as long with the same pattern would be seen to deviate considerably from randomness!

This problem was recognized more than 50 years ago by Berkson (1938):

We may assume that it is practically certain that any series of real observations does not actually follow a normal curve with absolute exactitude in all respects, and no matter how small the discrepancy between the normal curve and the true curve of observations, the chi-square P will be small if the sample has a sufficiently large number of observations in it.

If this be so, then we have something here that is apt to trouble the conscience of a reflective statistician using the chi-square test. For I suppose it would be agreed by statisticians that a large sample is always better than a small sample.
If, then, we know in advance the P that will result from an application of a chi-square test to a large sample, there would seem to be no use in doing it on a smaller one, but since the result of the former test is known, it is no test at all. (pp. 526-527, emphasis in original)

Replication

Very often researchers simply do not understand the connection between the p value and the size of the sample. For example, Rosenthal and Gaito (1963) asked nine faculty members and ten graduate students in a university psychology department to rate their degree of belief or confidence in results of hypothetical studies with various p values and with sample sizes of 10 and 100. Given the same p value, one should have more confidence in a study with a smaller sample because it would take a larger underlying effect to obtain the small p value for a small sample. Unfortunately, these respondents demonstrated that they were far more likely to believe results based on the large sample when the p values were the same. (For a discussion of this example and some other problems with hypothesis testing in psychology, see Bakan, 1967.)

One consequence of this misunderstanding is that researchers misinterpret what constitutes a "successful replication" of an experiment. Tversky and Kahneman (1982) asked 84 members of the American Psychological Association or the Mathematical Psychology Group the following question:

Suppose you have run an experiment on 20 subjects, and have obtained a significant result which confirms your theory (z = 2.23, p < .05, two-tailed). You now have cause to run an additional group of 10 subjects. What do you think the probability is that the results will be significant, by a one-tailed test, separately for this group? (p. 23)

The median answer given was .85. Only 9 of the 84 respondents gave an answer between .40 and .60.
Assuming that the value obtained in the first test was close to the true population value, the probability of achieving a p value ≤ .05 on the second test is actually only about .47. This is because the sample size in the second study is so small. The effect would have to be quite large in order to be detected with such a small sample.

In the same survey, Tversky and Kahneman also asked:

An investigator has reported a result that you consider implausible. He ran 15 subjects, and reported a significant value, t = 2.46. Another investigator has attempted to duplicate his procedure, and he obtained a nonsignificant value of t with the same number of subjects. The direction was the same in both sets of data. You are reviewing the literature. What is the highest value of t in the second set of data that you would describe as a failure to replicate? (p. 28)

The majority of respondents considered t = 1.70 as a failure to replicate. But if the results from both studies are combined, then (assuming equal variances) the result is t = 2.94, df = 29, p value = .003. The paradox is that the new study decreases faith in the original result if viewed separately but increases it when combined with the original data!

This misunderstanding about replication is quite prevalent in the psi literature, as demonstrated by the emphasis on successful replication, where success is defined in terms of a specific p value, regardless of sample size. As an example of how unnecessarily dis[illegible], I have shown elsewhere (Utts, 1986) that if the true hit rate in a binomial study (such as a ganzfeld experiment) is actually 33%, and 25% is expected by chance, then a study based on a sample of size 26 should be expected to be "successful" (p ≤ .05) only about one fifth of the time.
Even a study based on a sample of size 100 should be "successful" only about half of the time. It is no wonder that there are so many "unsuccessful" attempts at replication in psi.

As another example of the paradoxical nature of this definition of replication, consider the "unsuccessful" direct-hit ganzfeld studies covered by the meta-analyses of Hyman (1985) and Honorton (1985). Using those studies with p(hit) = .25, there were 13 out of 24 that were nonsignificant, α = 0.05, one-tailed. (See Honorton, p. 84, Table A1.) But when these 13 "failures" are combined, the result is 106 hits out of 367 trials, z = 1.66, p = .0485!

Problems with Point Null Hypotheses

A point null hypothesis is one that specifies a particular value ("point") as the one being tested. Most hypothesis testing is done with point null hypotheses. The problem with this approach is that any given hypothesis is bound to be false, even if just by a minuscule amount. For example, in a coin-tossing experiment, the null hypothesis is that the coin is fair, that is to say, H0: P = .5000000. This is never precisely true in nature. All coins and coin-tossers introduce a slight bias into the experiment. This slight bias can produce a very small p value if the sample size is large enough. If, for example, the true probability of heads is .5001, and the observed proportion of heads falls right at this value, then the null hypothesis will be rejected at .05 if the sample size is at least 6.7 x 10^7. As long as there is any bias at all, the p value can be made arbitrarily small by taking a large enough sample.

In practice, this problem was rarely serious before it became possible to collect large amounts of data rapidly using computers. Statisticians have often used ESP as an example of one of the few cases where it really is possible to specify an exact value for the null hypothesis.
But even this view is changing, as shown by this comment from a recent issue of a popular statistics journal:

It is rare, and perhaps impossible, to have a null hypothesis that can be exactly modeled as θ = θ0. One might feel that hypotheses such as H0: A subject has no ESP, or H0: Talking to plants has no effect on their growth, are representable as exact (and believable) point nulls, but, even here, minor biases in the experiments will usually prevent exact representations as points. (Berger & Delampady, 1987, p. 320)

In summary, hypothesis testing as it is currently formulated tends to be a misleading approach to examining data. Small samples tend to lead to "nonsignificant" studies, whereas large samples can lead to extremely small p values, even if the null hypothesis is only slightly wrong. Many researchers do not understand the meaning of a p value and do not understand how closely replication issues are tied to sample size. Arguments about replication should not be based on p values alone.

SOLUTIONS

Power Calculations

If a hypothesis test is to be done at all, a researcher should at least determine in advance whether it is likely to be successful. The statistical power of a test is the probability that the null hypothesis will be rejected. It obviously depends on what the true underlying state of nature is. Because this information cannot be known (or there would be no point in doing the experiment), it is a good idea to look at power for a variety of possibilities before conducting the experiment. The results will tell you whether you are likely to be able to reject the null hypothesis, using the sample size you have planned, for specific values of the magnitude of the effect.

Statistical power is a function of the sample size, the true underlying magnitude of the effect, the level of significance for which the experiment would be considered a success, and the method of analysis used.
It does not depend on the data.

As an example, suppose you are planning to conduct a test of the hypothesis H0: P = .25 using a series of 10 independent trials. Power calculations would proceed as follows:

1. Find the cutoff point for the number of hits that would lead to rejection of H0. In this case, the p value for 5 hits is .08, and for 6 hits it is .02, so 6 hits would probably be required to reject the null hypothesis.

2. Power for a specific alternative is the probability that the null hypothesis would be rejected if that alternative value is true. In this case, power = P(6 or more hits). This can be computed directly, using the binomial formula, for any specified hit rate. Here are some examples:

Hit rate    Power = P(6 or more hits)
0.30        .047
0.33        .073
0.40        .166
0.50        .377

Notice that even if the true hit rate is 50% instead of the chance level of 25%, the chances of a "successful" replication are poor, that is, only 37.7%. In most psi applications, 30% or 33% is probably a more realistic approximation to the true hit rate, so there would be a very small chance of having this experiment succeed with only 10 trials.

As a second example, suppose you are planning to run the same experiment with 100 trials and are planning to use the normal approximation instead of an exact test. Further, suppose you will reject the null hypothesis if z ≥ 1.645, where z is the usual critical ratio, corrected for continuity:

z = (number of hits - 0.5 - 25) / √(100 × .25 × .75) = .23(number of hits - 25.5).

Using simple algebra, note that z ≥ 1.645 when the number of hits ≥ 32.65. Thus, the null hypothesis will be rejected if there are 33 or more hits, so power = P(33 or more hits).
Computing this for the same hypothetical hit rates as in the previous example gives:

Hit rate    Power = P(33 or more hits)
0.30        .289
0.33        .538
0.40        .939
0.50        .9998

Now there is a more reasonable chance for a successful study, although it is still only 29% even if the true hit rate is 30%.

For studies in which the null hypothesis does not involve a single value, it can be more difficult to compute power because it is not so easy to specify a reasonable alternative. In these cases, it is still possible to look at the p value that can be expected if psychic functioning were to occur at specified levels for the sample size planned. For example, McClenon and Hyman (1987) conducted a remote-viewing study with eight trials, one for each of eight subjects, and used the preferential-ranking method of Solfvin, Kelly, and Burdick (1978) on the subject rankings. Each subject was asked to rank-order eight choices of potential targets as compared to the response he or she had produced. By chance, the average rank should be 4.5. If psychic functioning had reduced the average rank to 4.0, the p value would have been .298, not significant. Even if the average rank had been reduced to 3.5, the study would still not have been significant, p value = .126. The average rank would have to be 3.0 before this study would achieve a significant result. A parapsychologist experienced in remote viewing should be able to determine in advance whether such a study would be likely to be successful with such a small sample.

The lesson here is that a "nonsignificant" study may be nothing more than a study with low power. Before investing time and money in a new study, it should be determined whether it is likely to succeed if psychic functioning is operating at a given level.
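
The two worked power examples in this section can be reproduced with a short sketch. It is written in Python, which the paper itself does not use, and the function names (upper_tail, exact_cutoff, normal_cutoff, power) are illustrative assumptions rather than anything taken from the source. The sketch assumes that power is the exact binomial tail probability at or above the rejection cutoff, with the cutoff found either exactly (10 trials) or from the continuity-corrected critical ratio (100 trials).

```python
from math import ceil, comb, sqrt

def upper_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def exact_cutoff(n, p0, alpha=0.05):
    """Smallest hit count whose one-tailed p value under H0 is at most alpha."""
    for k in range(n + 1):
        if upper_tail(n, k, p0) <= alpha:
            return k
    return n + 1  # no attainable hit count rejects H0

def normal_cutoff(n, p0, z_crit=1.645):
    """Smallest hit count whose continuity-corrected z is at least z_crit."""
    return ceil(n * p0 + 0.5 + z_crit * sqrt(n * p0 * (1 - p0)))

def power(n, cutoff, true_rate):
    """Probability of reaching the cutoff when the true hit rate is true_rate."""
    return upper_tail(n, cutoff, true_rate)

if __name__ == "__main__":
    designs = ((10, exact_cutoff(10, 0.25)), (100, normal_cutoff(100, 0.25)))
    for n, cutoff in designs:
        for rate in (0.30, 0.33, 0.40, 0.50):
            print(f"n={n:3d} cutoff={cutoff:2d} hit rate={rate:.2f} "
                  f"power={power(n, cutoff, rate):.4f}")
```

Separating the cutoff (which depends only on the null value and the significance level) from the power (which depends on the assumed true hit rate) mirrors the two numbered steps in the text, and makes it easy to try other sample sizes before committing to a design.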

Estimation

An approach that avoids many of the problems with hypothesis testing is to construct a "confidence interval" or an "interval estimate" for the magnitude of an effect. This is done by computing an interval of values that almost certainly covers the true population value. The degree of certainty is called the confidence coefficient and is specified by the researcher. Common values are 95% and 99%.

As an example, consider a binomial study with 100 trials that results in 35 hits. Using the normal approximation, one would expect the proportion of hits in the sample to be within 1.96 standard deviations of the true hit rate 95% of the time. The appropriate standard deviation for the proportion P of hits is √(P(1 - P)/n). Thus, a 95% confidence interval for the true hit rate is found by adding and subtracting 1.96 of these standard deviations to the proportion of hits observed in the sample. The resulting interval in this case is 0.35 - 0.09 to 0.35 + 0.09, or 0.26 to 0.44. This tells us that with a fair amount of certainty (95%), the true hit rate is covered by the interval from 0.26 to 0.44. For the same proportion of hits in a study with 1,000 trials, the interval would be from 0.32 to 0.38. The larger the sample size, the shorter the width of the interval.

Consider two studies designed to test H0: P = .5:

           z       p value    n
Study 1    3.60    .0004      1,000
Study 2    2.40    .0164      100

Which study provides more convincing evidence that there is a strong effect? In keeping with the results of Rosenthal and Gaito (1963) discussed earlier, most people would say that the first study shows a stronger effect, both because the p value is smaller and because it is based on a larger sample. In fact, the opposite is true. The number of hits for the two studies are 557 (55.7%) and 62 (62%), respectively; the smaller study had a higher hit rate.
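
The interval arithmetic used in this section can be sketched in a few lines. The sketch below is in Python (not used in the paper), and the function name normal_interval is a hypothetical label; it assumes the normal-approximation interval described above, with the standard deviation estimated from the observed proportion of hits.

```python
from math import sqrt

def normal_interval(hits, trials, z=1.96):
    """Normal-approximation confidence interval for a binomial hit rate.

    The standard deviation is estimated from the observed proportion,
    sqrt(p_hat * (1 - p_hat) / trials), as in the worked example in the text.
    """
    p_hat = hits / trials
    half_width = z * sqrt(p_hat * (1 - p_hat) / trials)
    return p_hat - half_width, p_hat + half_width

if __name__ == "__main__":
    # Hit counts and sample sizes quoted in this section.
    for hits, trials in ((35, 100), (350, 1000), (557, 1000), (62, 100)):
        low, high = normal_interval(hits, trials)
        print(f"{hits}/{trials}: {low:.3f} to {high:.3f}")
```

Under these assumptions the lower limit for 62 hits in 100 trials comes out near 0.525, so agreement with a two-decimal figure quoted in the text can depend on rounding. The point of the comparison is the interval width, which shrinks roughly with the square root of the sample size.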
The 95% confidence intervals for the hit rates in the two studies are (0.53 to 0.59) and (0.53 to 0.72), respectively, so in both studies we are relatively sure that the hit rate is at least 53%, but in the second study it could be as high as 72% whereas in the first it is probably no higher than 59%.

In studies with huge sample sizes, confidence intervals make it evident that an infinitesimal p value does not correspond to an effect of large magnitude. For example, consider a study based on 100,000 trials and designed to test H0: P = .50. Suppose there were 50,500 hits. Then z = 3.16, and the p value is $7.9 \times 10^{-4}$. But what does this mean in practical terms? A 95% confidence interval for the true hit rate is from 0.5019 to 0.5081. Thus, it appears that the true hit rate is indeed different from 0.50, but reporting the results in this way makes it clear that the magnitude of the difference is very small. The reader can decide whether an effect of this size has any meaning in the context of the experiment.

In summary, confidence intervals are preferable to hypothesis tests for the following reasons:

1. They show the magnitude of the effect.
2. They show that the accuracy of the conclusion is highly dependent on the sample size.
3. They remove the focus from decision making, which is arbitrary at best because of sample size problems.
4. They highlight the distinction between statistical significance and practical significance.
5. They allow the reader of a research report to come to his or her own conclusion.

Meta-Analyses

Meta-analytic techniques may be viewed by some parapsychologists as the solution to studying the issue of replication. Even though these techniques can address the replication issue in useful ways, they also contain some dangerous pitfalls. For example, both Hyman (1985) and Honorton (1985) used "vote-counting" in their meta-analyses of the ganzfeld data base.
In other words, they tallied the number of significant studies in the data base. This procedure inherits all of the problems associated with the original determination of whether a study was "significant" in the first place. A series of studies, each with low power, may all be determined to be nonsignificant, when the combined data may lead to an extremely significant result. Conversely, a series of studies based on large samples may all be significant, but the magnitude of the effect may be very small. A vote-count showing that most studies are significant could mislead researchers into believing that there was a large effect.

The concept of effect size was introduced to account for the fact that individual study results are highly dependent on sample size. Estimating the effect sizes for a series of studies and seeing whether they are similar is a useful way of studying replication. However, examining only the effect size for an individual study does not give any indication of the accuracy of the result. This should be done in conjunction with some estimate of the accuracy of the result, such as a confidence interval.

Bayesian Methods

Many statisticians believe that the conceptual framework of hypothesis testing and interval estimation is philosophically incorrect. Rather, they start by assigning prior probabilities, based on subjective belief, to various hypotheses, and then combine these "priors" with the data to compute final or "posterior" probabilities for the hypotheses. This is called the Bayesian approach to statistics. An introduction to the ideas of Bayesian analysis can be found in Berger and Berry (1988) or Edwards, Lindman, and Savage (1963). A more technical reference is Berger (1985).
Berger and Berry (1988), in a recent article in American Scientist, discussed the use of Bayesian methods instead of classical methods:

    The first step of this demonstration is to calculate the actual probability that the hypothesis is true in light of the data. This is the domain of Bayesian statistics, which processes data to produce "final probabilities" ... for hypotheses. Thus, the conclusion of a Bayesian analysis might be that the final probability of H is 0.30. The direct simplicity of such a statement compared with the convoluted reasoning necessary to interpret a P-value is in itself a potent argument for Bayesian methods. Nothing is free, however, and the elegantly simple Bayesian conclusion requires additional input. To obtain the final probability of a hypothesis in light of the experimental data, it is necessary to specify the probability of the hypothesis before or apart from the experimental data. Where does this initial probability come from? The answer is simple. It must be subjectively chosen by the person interpreting the data. A person who doubts the hypothesis initially might choose a probability of 0.1; by contrast, someone who believes in it might choose 0.9. (p. 162)

They then provide an example of testing the hypothesis H: P = .5, where P is the proportion of hits expected in a binomial experiment. Suppose that in 17 trials there are 13 successes (76.5%). Then the p value is .049, two-tailed. Unless, of course, the experiment was designed to stop at the fourth failure instead of at the 17th trial. Then the p value, with the identical data, would only be .021. Such problems arise with classical methods, but not with Bayesian methods. Using the Bayesian approach, suppose that one's prior belief that H is true is 50%.
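The dependence of the classical p value on the stopping rule can be verified directly. The sketch below assumes that "two-tailed" means doubling the one-tailed probability under P = .5, which appears to reproduce the quoted values of .049 and .021; the function names are illustrative.

```python
from math import comb


def fixed_trials_p_value(successes, trials):
    """Two-tailed p value when the number of trials is fixed in advance.

    One-tailed probability of `successes` or more in `trials` fair trials,
    doubled (an assumption about how the two-tailed value was formed).
    """
    upper = sum(comb(trials, k) for k in range(successes, trials + 1)) / 2**trials
    return 2 * upper


def stop_at_failure_p_value(successes, failures):
    """Two-tailed p value when sampling stops at the `failures`-th failure.

    Seeing at least `successes` successes before that failure is the same
    event as seeing at most failures - 1 failures in the first
    successes + failures - 1 trials.
    """
    m = successes + failures - 1
    upper = sum(comb(m, j) for j in range(failures)) / 2**m
    return 2 * upper


# Identical data (13 successes, 4 failures) under the two designs.
p_fixed = fixed_trials_p_value(13, 17)
p_stopped = stop_at_failure_p_value(13, 4)

if __name__ == "__main__":
    print(f"fixed at 17 trials:     p = {p_fixed:.3f}")
    print(f"stop at fourth failure: p = {p_stopped:.3f}")
```

The data are the same in both calls; only the experimenter's stated intention about when to stop changes the result.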
If H isn't true, the prior belief is that the true value of P is equally likely to be anywhere between 0.5 - c and 0.5 + c (where c is some constant), but could not possibly be farther than that from 0.5. The choice of c represents prior opinion about the strength of the effect, if there is one. Choosing c = 0.1 (the effect isn't likely to be very strong even if it exists) results in a final probability of 0.41 for H (given that there were 13 successes in 17 trials), whereas choosing c = 0.4 results in a probability of 0.21 for H. In other words, the final degree of belief in H is dependent on one's prior belief about the strength of the effect. It also depends on prior opinion about the veracity of H, and on the observed data.

One reason that Bayesian methods are not more widely used is that they are often difficult to apply. Another reason is that researchers are uncomfortable with having to specify subjective degrees of belief in their hypotheses. This approach makes particular sense for parapsychology, however, because most researchers have strong opinions about the probability that psi is real, and these opinions play a central role in how psi researchers and critics evaluate the evidence. Posterior probabilities in Bayesian analyses are a function of both the prior probabilities and the strength of the evidence; it may be informative to formalize these opinions and to see how much evidence would be needed to increase the posterior probability of a psi hypothesis to a non-negligible level when the prior probability was close to zero.

REFERENCES

BAKAN, D. (1967). On method: Toward a reconstruction of psychological investigation. San Francisco: Jossey-Bass, Inc.
BERGER, J. O. (1985). Statistical decision theory and Bayesian analysis. New York: Springer-Verlag.
BERGER, J. O., & BERRY, D. A. (1988, March-April). Statistical analysis and the illusion of objectivity. American Scientist, pp. 159-165.
BERGER, J. O., & DELAMPADY, M. (1987). Testing precise hypotheses. Statistical Science, 2(3), 317-334.
BERKSON, J. (1938). Some difficulties of interpretation encountered in the application of the chi-square test. Journal of the American Statistical Association, 33, 526-542.
COOVER, J. E. (1975). Experiments in psychical research. New York: Arno Press. (Originally published 1917)
DRUCKMAN, D., & SWETS, J. A. (1988). Enhancing human performance. Washington, D.C.: National Academy Press.
EDWARDS, W., LINDMAN, H., & SAVAGE, L. J. (1963). Bayesian statistical inference for psychological research. Psychological Review, 70, 193-242.
FISHER, R. A. (1926). The arrangement of field experiments. Journal of the Ministry of Agriculture of Great Britain, 33, 503-513.
HANSEN, G. P., SCHLITZ, M. J., & TART, C. T. (1984). Bibliography, remote-viewing research, 1973-1981. In R. Targ & K. Harary, The mind race (pp. 265-269). New York: Villard Books.
HONORTON, C. (1984). How to evaluate and improve the replicability of parapsychological effects. In B. Shapin & L. Coly (Eds.), The repeatability problem in parapsychology (pp. 238-255). New York: Parapsychology Foundation, Inc.
HONORTON, C. (1985). Meta-analysis of psi ganzfeld research: A response to Hyman. Journal of Parapsychology, 49, 51-91.
HYMAN, R. (1985). The ganzfeld psi experiment: A critical appraisal. Journal of Parapsychology, 49, 3-49.
McCLENON, J., & HYMAN, R. (1987). A remote viewing experiment conducted by a skeptic and a believer. Zetetic Scholar, Nos. 12/13, 21-33.
PALMER, J. (1985). An evaluative report on the current status of parapsychology. U.S. Army Research Institute for the Behavioral and Social Sciences, Alexandria, VA.
RAO, K. R. (1984). Replication in conventional and controversial sciences. In B. Shapin & L. Coly (Eds.), The repeatability problem in parapsychology (pp. 22-41). New York: Parapsychology Foundation, Inc.
RHINE, J. B., & PRATT, J. G. (1957). Parapsychology: Frontier science of the mind. Springfield, IL: Charles C. Thomas.
RHINE, J. B., PRATT, J. G., STUART, C. E., SMITH, B. M., & GREENWOOD, J. A. (1940). Extra-sensory perception after sixty years. Boston: Bruce Humphries.
ROSENTHAL, R., & GAITO, J. (1963). The interpretation of levels of significance by psychological researchers. Journal of Psychology, 55, 33-38.
SAVAGE, L. J. (1976). On rereading R. A. Fisher. Annals of Statistics, 4, 441-500.
SOLFVIN, G. F., KELLY, E. F., & BURDICK, D. S. (1978). Some new methods of analysis for preferential-ranking data. Journal of the American Society for Psychical Research, 72, 93-110.
TVERSKY, A., & KAHNEMAN, D. (1982). Belief in the law of small numbers. In D. Kahneman, P. Slovic, & A. Tversky (Eds.), Judgment under uncertainty: Heuristics and biases. Cambridge: Cambridge University Press.
UTTS, J. M. (1986). The ganzfeld debate: A statistician's perspective. Journal of Parapsychology, 50, 393-402.
ZIMBARDO, P. G. (1988). Psychology and life. Glenview, IL: Scott, Foresman and Co.

Division of Statistics
University of California
Davis, CA 95616

Paranormal Communication

"Error Some Place!"

by Charles Honorton

Review of the ESP controversy traces debate from statistical and methodological issues to the a priori critique and the paradigm of "normal science."

Asked his opinion of ESP, a skeptical psychologist once retorted, "Error Some Place!" I believe he was right, but for the wrong reasons.
Western science has always been ambivalent toward the mental side of reality, and it is perhaps not surprising that the occurrence of "psychic" phenomena is one of the most controversial topics in the history of science.

The first serious effort toward scientific examination of psi claims was undertaken by the Society for Psychical Research (SPR), founded in London in 1882 for the purpose of "making an organized and systematic attempt to investigate the large group of phenomena designated by such terms as mesmeric, psychical, and spiritualistic." The SPR leadership included many distinguished scholars of the period, and similar organizations quickly spread to other countries, including the American Society for Psychical Research, founded in New York in 1885 under the aegis of William James, who himself took an active role in early investigations of mediumistic communications.

These turn-of-the-century investigators focused much of their attention on authenticating individual cases of spontaneous experiences suggestive of psi communication. While a great deal of provocative material was carefully examined and reported (e.g., 18), the limitations inherent in the case study approach prohibited definitive conclusions. However thoroughly authenticated, spontaneous cases cannot provide adequate assessment of such potential sources of contamination as chance coincidence, unconscious inference and sensory leakage, retroactive falsification, or deliberate fraud.

Charles Honorton is director of research in the Division of Parapsychology and Psychophysics, Department of Psychiatry, Maimonides Medical Center, Brooklyn, N.Y.

Journal of Communication, Winter 1975

Early experimental approaches primarily involved the "telepathic" reproduction of drawings at a distance (6 ).
While often striking correspondences were obtained, the experimental conditions did not usually provide for random selection of target (stimulus) material, and were not always adequate with respect to the possibility of sensory leakage, intentional or otherwise.

Neither the spontaneous case studies nor the early experimental efforts made much impact upon the scientific community, though they drew critical comment from prominent period scientists. "Neither the testimony of all the Fellows of the Royal Society, nor even the evidence of my own senses," proclaimed Helmholtz, "would lead me to believe in the transmission of thought from one person to another independently of the recognized channels of sense." Thomas Huxley declined an invitation to participate in some of the early SPR investigations, saying he would sooner listen to the idle gossip of old women.

The rudiments of an experimental methodology for testing psi were suggested three centuries ago by Francis Bacon. In Sylva Sylvarum, a work published posthumously, Bacon discussed "experiments in consort, monitory touching transmission of spirits and forces of imagination." He suggested that "the motions of shuffling cards, or casting of dice" could be used to test the "binding of thoughts. . . . The experiment of binding of thoughts should be diversified and tried to the full; and you are to note whether it hit for the most part though not always" (2).

The application of probability theory to the assessment of deviations from theoretically expected chance outcomes was introduced to psychical research in 1884 by the French Nobel laureate, Charles Richet, in experiments involving card-guessing. The popularity of card-guessing as an experimental methodology was greatly influenced by the work of J. B.
Rhine and his associates at Duke University in the early 1930s. Rhine (50) devised a standard set of procedures around a simplified card deck containing randomized sequences of five geometric forms (circle, cross, wavy lines, square, and star). These "ESP cards" were prepared in packs of 25, and each "run" through the pack was associated with a constant binomial probability of 1/5, since subjects were not given trial-by-trial feedback. Providing the experimental conditions were adequate to eliminate illicit sensory cues, recording errors, and rational inference, statistically significant departures from binomial chance expectation were interpreted as indicating extrasensory communication.

Initially, "telepathy" tests consisted of having a subject in one room attempt to identify the order of the cards as they were observed by an "agent" in another room. In "clairvoyance" tests, the subject attempted to "guess" the order of the cards directly, as they lay concealed in an opaque container or in another room, without an agent. "Precognition" tests, introduced somewhat later (59), required the subject to make anticipatory guesses of the card order before the pack was shuffled or otherwise randomized.

Rhine introduced the term "ESP" in his first major report on the Duke University work in 1934 (50). He reported a total of 85,724 card-guessing trials, carried out with a wide variety of subjects and under a wide range of test conditions. The results as a whole were astronomically significant, though informal exploratory trials were indiscriminately pooled with those carried out under more carefully controlled conditions.
The best-controlled work during this period was the Pearce-Pratt distance series of clairvoyance tests (58), in which the subject, Pearce, located in one building, attempted to identify the order of the cards as they were handled, but not viewed, by Pratt, the experimenter, located in another building. The level of accuracy obtained in this series of 1,850 trials was associated with a probability of [illegible].

As a stimulant to experimental research, Rhine's work had unprecedented influence. For the first time a common methodology was adopted and employed on a large scale by a number of independent and widely separated investigators. For the first time, also, the scientific community was confronted with a body of data, collected through conventional methods, which it could no longer ignore, nor too hastily accept. The wide-scale adoption of the card-guessing methodology was accompanied by a plethora of critical articles, challenging almost every aspect of the evaluative techniques and the experimental conditions. During the period between 1934 and 1940, approximately 60 critical articles by 40 authors appeared, primarily in the psychological literature. While card-guessing is no longer the primary methodology in experimental parapsychology, the questions which arose over its use are of equal relevance to the more sophisticated approaches used today.

The first major issue concerned the validity of the assumption that the probability of success in the card-guessing experiments was actually 1/5. If chance expectation is other than 1/5, the significance of the observed deviations would obviously be in doubt. This issue was quickly resolved by mathematical proof and through empirical "cross-checks," a form of control series in which responses (guesses) were deliberately compared with target orders for which they were not intended (e.g., responses on run n1 matched with the target sequence for run n2).
Empirical cross-checks were reported for 24 separate experimental series involving a total of 12,228 runs (305,700 individual trials). While the actual experimental run scores (e.g., guesses on run n1 compared to targets for run n1) were highly significant and yielded a mean scoring rate of 7.23/25, the control cross-check scores were in all cases nonsignificant, with a mean scoring rate of 5.04 (4).

Several critics questioned the applicability of the binomial distribution as a basis for assessing the statistical significance of ESP card-guessing data. Willoughby (78) proposed the use of an empirical control series, but later withdrew the suggestion after comparing the two methods (79). Alternative methods of deriving the probable error and recommendations for using the empirical standard deviation were also proposed and later withdrawn (21, 22). Concern over this issue diminished and was generally abandoned following the publication of a large chance control series involving half a million trials and demonstrating close approximation to the binomial model (12).

Another question arose about whether the binomial model provides sufficient approximation to the normal distribution to allow use of normal probability integral tables for determination of significance levels (17). Stuart and Greenwood (73) showed that when the normal distribution is used as an approximation to the binomial model, discrepancies are important only with cases of borderline significance and few trials.

The use of the binomial critical ratio (z) to evaluate the significance of the ESP card-guessing deviations was generally approved by professional statisticians (6, 20).
Fisher (10), however, commented that high levels of statistical significance should not be accepted as substitutes for independent replication. In another vein, Huntington (20) asked, "If mathematics has successfully disposed of the hypothesis of chance, what has psychology to say about the hypothesis of ESP?"

The most frequently expressed methodological concern was the possibility of some form of "sensory leakage," giving the ESP subject enough information about the targets to account for significant, extrachance results. As early as 1895, two Danish psychologists, Hansen and Lehmann (16), reported that with the aid of parabolic reflectors subjects could detect digits and other material silently concentrated upon by an agent. In these experiments, the subject and agent sat with their heads close to the foci of two concave mirrors. While the agent concentrated on the number, he made a special effort to keep his lips closed. Under these conditions, the subjects were frequently successful in identifying the number. These results were interpreted by Hansen and Lehmann as supporting the hypothesis of "involuntary whispering." The utilization of subtle sensory cues was demonstrated in a careful investigation by S. G. Soal of a stage "telepathist" (66). There were also reports, such as the case of "Ilga K.," a mentally retarded Latvian child who could read any text, even in a foreign language, when someone stood behind her, reading "silently." Experiments with dictaphone recordings revealed that "Ilga" was responding to very slight auditory cues (3).

able to the ESP hypothesis made 71.5 percent more errors of commission (increasing ESP scores), while those who were unfavorable to the ESP hypothesis made 100 percent more errors of omission (decreasing ESP scores).
Murphy (37) reported an analysis of 175,000 trials from experiments reporting positive evidence for ESP and found only 175 errors (0.10 percent). Greenwood (12) reported only 90 recording errors in rechecking his 500,000-trial control study, of which 76 were errors of omission.

Some critics also alleged that improper selection of data could account for experimental successes. This could be done in several ways: (a) selection of subjects; (b) selection of particular blocks of data out of larger samples; (c) selection of one of several forms of analysis; and (d) selective reporting of particular studies. The questions raised have sometimes been stated cynically in the form, "Parapsychologists must run 100 subjects before they find one with 'ESP'." As if in defense against this charge, a number of the reported studies specifically stated that all of the data collected were included in the analysis (see 43, pp. 118-124, Table 12).

Concerning selection of subjects, Warner (76) suggested two criteria: first, results of "poor" subjects must be included up to the point when they are discontinued since it does not matter how many trials a given subject makes as long as all of the trials (for all subjects) are included; second, exclude all preliminary trials (for both "good" and "poor" subjects) and use preliminary screening studies to select "good" candidates for formal work. These criteria were generally endorsed by the chief critics of the period (e.g., 23).

The question of post hoc selection of analyses was not a point of serious concern in the period between 1934 and 1940, though it is relevant to the assessment of some of the process-oriented investigations reported more recently.
The question of whether nonsignificant studies were withheld from publication involves an issue which is of great concern to the behavioral sciences as a whole (70, 81) and one which is difficult to accurately assess since there is no way of knowing how many studies may have been withheld from publication because their results failed to disconfirm the null hypothesis.

Several studies of American Psychological Association publication policies (4, 70, 81) indicate that experimental studies in general are more likely to be published if the null hypothesis is rejected at the conventional .05 and .01 alpha levels than if it is not rejected. These studies also indicate that a negligible proportion of published studies are replications. Bozarth and Roberts (4), in a survey of 1,334 articles from psychological journals, found that 94 percent of the articles involving statistical tests of significance reported rejection of specific null hypotheses; only eight articles (less than 1 percent) involved replications of previously published studies.

With respect to the implications of such selection for the ESP hypothesis, there are two partial answers. First, considering the degree of critical interest which prevailed in the 1930s, it seems unlikely that nonsignificant findings would have been repressed during this period; second, the high levels

It is clear that at least some of the early exploratory series reported in Rhine's monograph were open to criticism for inadequate controls against sensory cues.
While Rhine did not base major conclusions on such poorly controlled data, inclusion of them in his monograph provided a ready target for critical reviewers and sidetracked discussion away from the better controlled work, such as the Pearce-Pratt series, which was not susceptible to explanation by sensory cues.

Defects in an early commercial printing of ESP cards were reported by several investigators (18, 25). It was found that the cards were warped and could under certain conditions be identified from the back. This discovery circulated widely for a time as an explanation of all successful (i.e., statistically significant) experimental series. The parapsychologists retorted that defective cards had not been employed in any of the experiments reported in the literature and that, in any case, they could not account for results from studies involving adequate screening with such devices as opaque envelopes, screens, distance, or work involving the precognition paradigm in which the target sequences were not generated until after the subject had made his responses (53, 54, 72).

By 1940 nearly one million experimental trials had been reported under conditions which precluded sensory leakage. These included five studies in which the target cards were enclosed in opaque sealed envelopes (41, 45, 46, 54, 59), 16 studies employing opaque screens (7, 8, 11, 19, 83, 34, 35, 38, 41, 42, 44, 45, 46, 59, 71), ten studies involving separation of subjects and targets in different buildings (50, 51, 52, 53, 34, 32, 8, 77, 61, 60), and two studies involving precognition tasks (59, 75). These data are summarized in Table 1. The results were independently significant in 27 of the 33 experiments. By the end of the 1930s there was general agreement that the better-controlled ESP experiments could not be accounted for on the basis of sensory leakage.
The hypothesis that significant "extrachance" deviations in ESP experiments might be attributable to motivated scoring errors was investigated in several studies. In one investigation (26), 28 observers recorded 11,125 mock ESP trials. Of these, 126 (1.13 percent) were misrecorded. Observers favor-

Table 1: ESP card-guessing experiments (1934-1939) excluding sensory cues.

Method                                                          Studies   N (Trials)   Mean/25   P<
"Clairvoyance" paradigm, stimuli in sealed, opaque envelopes       5       129,775      5.21     [illegible]

[Column headings illegible]
Combined z               6.14     -1.29
Studies with p < .05     87.5%    0.0%
Mean ES                  .055     .005
SD                       .045     .035

t(15) = 2.61, p = .01
r = .559

These results are quite striking and suggest that future studies combining these moderators should yield especially reliable effects.

SUMMARY AND CONCLUSIONS

Our meta-analysis of forced-choice precognition experiments confirms the existence of a small but highly significant precognition effect. The effect appears to be replicable; significant outcomes are reported by 40 investigators using a variety of methodological paradigms and subject populations.

The precognition effect is statistically very robust: it remains highly significant despite elimination of studies with z scores in the upper and lower 10% of the z-score distribution and when a third of the remaining investigators (the major contributors of precognition studies) are eliminated.

Estimates of the "filedrawer" problem and consideration of parapsychological publication practices indicate that the precognition effect cannot plausibly be explained on the basis of selective publication bias. Analyses of precognition effect sizes in relation to eight measures of research quality fail to support the hypothesis that the observed effect is driven to any appreciable extent by methodological flaws; indeed, several analyses indicate that methodologically superior studies yield stronger effects than methodologically weaker studies.
Analyses of parapsychological alternatives to precognition, although limited to the subset of studies using random number tables, provide no support for the hypothesis that the effect results from the operation of contemporaneous ESP and PK at the time of randomization.

Although the overall precognition effect size is small, this does not imply that it has no practical consequences. It is, for example, of the same order of magnitude as effect sizes leading to the early termination of several major medical research studies. In 1981, the National Heart, Lung, and Blood Institute discontinued its study of propranolol because the results were so favorable to the propranolol treatment that it would be unethical to continue placebo treatment (Kolata, 1981); the effect size was 0.04. More recently, The Steering Committee of the Physicians' Health Study Research Group (1988), in a widely publicized report, terminated its study of the effects of aspirin in the prevention of heart attacks for the same reason. The aspirin group suffered significantly fewer heart attacks than a placebo control group; the associated effect size was 0.03.

The most important outcome of the meta-analysis is the identification of several moderating variables that appear to covary systematically with precognition performance. The largest effects are observed in studies using subjects selected on the basis of prior test performance, who are tested individually, and who receive trial-by-trial feedback. The outcomes of studies combining these factors contrast sharply with the null outcomes associated with the combination of group testing, unselected subjects, and no feedback of results. Because the two groups of studies were conducted by a subset of the same investigators, it is unlikely that the observed difference in performance is due to experimenter effects.
Indeed, these outcomes- mderscore the importance of carefully examining differences in ubject populations, test setting, and so forth, before resorting to acile "explanations" based on psi-mediated experirrienter effects or he "elusiveness of psi." The identification of these moderating variables has important nplications for our understanding of the. phenomena and provides clear direction for future research. The existence of moderating ariables indicates that the precognition effect is not merely an nexplained departure from a theoretical chance baseline, but ather is an effect that covaries with factors known to influence lore familiar aspects of human performance. It should now be pos- tble to exploit these moderating ['actors to increase die magnitude nd reliability of precognition effects in new studies. RFTERENCES ?, .KERS, C. (198). Parapsychology is science, but its findings are inconclusive. Behavioral and Brain Sciences, 10, 566-568. 17-1?000?0001?COON68/00-96dC1N-V10 : 81?/170/?00Z asealaN JOA peACLIddV 302 /wawa l'arapsychology BARNETT, V., '11 (1978). (hi1/icr. iii statistical data. New York: Julni Wiley & Sons. BROWNLEE, K. A. (1965). Siatistical theory and methodology in science and engi- neering. New York: John Wiley & Sons. -0 ? COHEN, J. (1977). Statistical power analysis for the behavioral sciences. New York: ? Academic Press. 0 ? DAWES, R. M., LANDNIAN, J., & WILLIAMS, J. (1984). Reply to Kurosawa. Amer- ? icon Psychologist, 39, 74-75. m 1-ioNoRToN, C. (1985). Meta-analysis of psi ganzfeld research: A response to O Ifyinan. Journal of Parapsychology, 49, 51 ?92. ? HYMAN, R. (1985). The ganzfeld psi experiment: A critical appraisal. journal 7 of l'imipAyrholoo, 49, 3-50. (1 ? KOLATA, G. B. (1981). Drug !build to help heart attack survivors. Science, 214, ? 774-775. r%,) MANGAN, G. L. (1955). Evidence of displacement in a precognition test.Immtal 0 0 of Parapsychology. 19, 35-11. C4 MORRIS, R. L. (1982). 
Assessing experimental support for true precognition. Journal of Parapsychology, 46, 321-336.
ROSENTHAL, R. (1984). Meta-analytic procedures for social research. Beverly Hills, CA: Sage.
STEERING COMMITTEE OF THE PHYSICIANS' HEALTH STUDY RESEARCH GROUP. (1988). Preliminary report: Findings from the aspirin component of the ongoing Physicians' Health Study. New England Journal of Medicine, 318, 262-264.
STERLING, T. D. (1959). Publication decisions and their possible effects on inferences drawn from tests of significance, or vice versa. Journal of the American Statistical Association, 54, 30-34.
WILKINSON, L. (1984). SYSTAT: The system for statistics. Evanston, IL: SYSTAT.

CHRONOLOGICAL LISTING OF STUDIES IN META-ANALYSIS

CARINGTON, W. (1935). Preliminary experiments in precognitive guessing. Journal of the Society for Psychical Research, 29, 86-104.
RHINE, J. B. (1938). Experiments bearing on the precognition hypothesis: I. Pre-shuffling card calling. Journal of Parapsychology, 2, 38-54.
RHINE, J. B., SMITH, B. M., & WOODRUFF, J. L. (1938). Experiments bearing on the precognition hypothesis: II. The role of ESP in the shuffling of cards. Journal of Parapsychology, 2, 119-131.
HUMPHREY, B. M., & PRATT, J. G. (1941). A comparison of five ESP test procedures. Journal of Parapsychology, 5, 267-293.
RHINE, J. B. (1941). Experiments bearing upon the precognition hypothesis: III. Mechanically selected cards. Journal of Parapsychology, 5, 1-57.
STUART, C. E. (1941). An analysis to determine a test predictive of extra-chance scoring in card-calling tests. Journal of Parapsychology, 5, 99-137.
HUMPHREY, B. M., & RHINE, J. B. (1942). A confirmatory study of salience in precognition tests. Journal of Parapsychology, 6, 190-219.
RHINE, J. B. (1942). Evidence of precognition in the covariation of salience ratios.
Journal of Parapsychology, 6, 111-143.
NICOL, J. F., & CARINGTON, W. (1947). Some experiments in willed die-throwing. Proceedings of the Society for Psychical Research, 48, 164-175.
THOULESS, R. H. (1949). A comparative study of performance in three psi tasks. Journal of Parapsychology, 13, 263-273.
BASTIN, E. W., & GREEN, J. M. (1953). Some experiments in precognition. Journal of Parapsychology, 17, 137-143.
MCMAHAN, E. A., & BATES, E. K. (1954). Report of further Marchesi experiments. Journal of Parapsychology, 18, 82-92.
MANGAN, G. L. (1955). Evidence of displacement in a precognition test. Journal of Parapsychology, 19, 35-44.
OSIS, K. (1955). Precognition over time intervals of one to thirty-three days. Journal of Parapsychology, 19, 82-91.
NIELSEN, W. (1956). An exploratory precognition experiment. Journal of Parapsychology, 20, 33-39.
NIELSEN, W. (1956). Mental states associated with success in precognition. Journal of Parapsychology, 20, 96-109.
FAHLER, J. (1957). ESP card tests with and without hypnosis. Journal of Parapsychology, 21, 179-185.
MANGAN, G. L. (1957). An ESP experiment with dual-aspect targets involving one trial a day. Journal of Parapsychology, 21, 273-283.
ANDERSON, M., & WHITE, R. (1958). A survey of work on ESP and teacher-pupil attitudes. Journal of Parapsychology, 22, 246-268.
NASH, C. B. (1958). Correlation between ESP and religious value. Journal of Parapsychology, 22, 204-209.
ANDERSON, M. (1959). A precognition experiment comparing time intervals of a few days and one year. Journal of Parapsychology, 23, 81-89.
ANDERSON, M., & GREGORY, E. (1959). A two-year program of tests for clairvoyance and precognition with a class of public school pupils. Journal of Parapsychology, 23, 149-177.
NASH, C. B. (1960). Can precognition occur diametrically? Journal of Parapsychology, 24, 26-32.
FREEMAN, J. A. (1962). An experiment in precognition. Journal of Parapsychology, 26, 123-130.
RHINE, J. B. (1962).
The precognition of computer numbers in a public test. Journal of Parapsychology, 26, 244-251.
RYZL, M. (1962). Training the psi faculty by hypnosis. Journal of the Society for Psychical Research, 41, 234-252.
SANDERS, M. S. (1962). A comparison of verbal and written responses in a precognition experiment. Journal of Parapsychology, 26, 23-34.
FREEMAN, J. (1963). Boy-girl differences in a group precognition test. Journal of Parapsychology, 27, 175-181.
RAO, K. R. (1963). Studies in the preferential effect: II. A language ESP test involving precognition and "intervention." Journal of Parapsychology, 27, 147-160.
FREEMAN, J. (1964). A precognition test with a high-school science club. Journal of Parapsychology, 28, 214-221.
FREEMAN, J., & NIELSEN, W. (1964). Precognition score deviations as related to anxiety levels. Journal of Parapsychology, 28, 239-249.
SCHMEIDLER, G. (1964). An experiment on precognitive clairvoyance: Part I. The main results. Journal of Parapsychology, 28, 1-14.
FREEMAN, J. A. (1965). Differential response of the sexes to contrasting arrangements of ESP target material. Journal of Parapsychology, 29, 251-258.
OSIS, K., & FAHLER, J. (1965). Space and time variables in ESP. Journal of the American Society for Psychical Research, 59, 130-145.
FAHLER, J., & OSIS, K. (1966). Checking for awareness of hits in a precognition experiment with hypnotized subjects. Journal of the American Society for Psychical Research, 60, 340-346.
FREEMAN, J. A. (1966). Sex differences and target arrangement: High-school booklet tests of precognition. Journal of Parapsychology, 30, 227-235.
ROGERS, D. P. (1966). Negative and positive affect and ESP run-score variance. Journal of Parapsychology, 30, 151-159.
ROGERS, D. P., & CARPENTER, J. C. (1966).
The decline of variance of ESP scores within a testing session. Journal of Parapsychology, 30, 141-150.
BRIER, B. (1967). A correspondence ESP experiment with high-I.Q. subjects. Journal of Parapsychology, 31, 143-148.
BUZBY, D. E. (1967). Subject attitude and score variance in ESP tests. Journal of Parapsychology, 31, 43-50.
BUZBY, D. E. (1967). Precognition and a test of sensory perception. Journal of Parapsychology, 31, 135-142.
FREEMAN, J. A. (1967). Sex differences, target arrangement, and primary mental abilities. Journal of Parapsychology, 31, 271-279.
HONORTON, C. (1967). Creativity and precognition scoring level. Journal of Parapsychology, 31, 29-42.
CARPENTER, J. C. (1968). Two related studies on mood and precognition run-score variance. Journal of Parapsychology, 32, 75-89.
DUVAL, P., & MONTREDON, E. (1968). ESP experiments with mice. Journal of Parapsychology, 32, 153-166.
FEATHER, S. R., & BRIER, R. (1968). The possible effect of the checker in precognition tests. Journal of Parapsychology, 32, 167-175.
FREEMAN, J. A. (1968). Sex differences and primary mental abilities in a group precognition test. Journal of Parapsychology, 32, 176-182.
NASH, C. S., & NASH, C. B. (1968). Effect of target selection, field dependence, and body concept on ESP performance. Journal of Parapsychology, 32, 248-257.
RHINE, L. E. (1968). Note on an informal group test of ESP. Journal of Parapsychology, 32, 47-53.
RYZL, M. (1968). Precognition scoring and attitude toward ESP. Journal of Parapsychology, 32, 1-8.
RYZL, M. (1968). Precognition scoring and attitude. Journal of Parapsychology, 32, 183-189.
CARPENTER, J. C. (1969). Further study on a mood adjective check list and ESP run-score variance. Journal of Parapsychology, 33, 48-56.
DUVAL, P., & MONTREDON, E. (1969). Precognition in mice: A confirmation. Journal of Parapsychology, 33, 71-72.
FREEMAN, J. A. (1969).
The psi-differential effect in a precognition test. Journal of Parapsychology, 33, 206-212.
FREEMAN, J. A. (1969). A precognition experiment with science teachers. Journal of Parapsychology, 33, 307-310.
JOHNSON, M. (1969). Attitude and target differences in a group precognition test. Journal of Parapsychology, 33, 324-325.
MONTREDON, E., & ROBINSON, A. (1969). Further precognition work with mice. Journal of Parapsychology, 33, 162-163.
SCHMIDT, H. (1969). Precognition of a quantum process. Journal of Parapsychology, 33, 99-108.
BENDER, H. (1970). Differential scoring of an outstanding subject on GESP and clairvoyance. Journal of Parapsychology, 34, 272-273.
FREEMAN, J. A. (1970). Sex differences in ESP response as shown by the Freeman picture-figure test. Journal of Parapsychology, 34, 37-46.
FREEMAN, J. A. (1970). Ten-page booklet tests with elementary-school children. Journal of Parapsychology, 34, 192-196.
FREEMAN, J. (1970). Shift in scoring direction with junior-high-school students: A summary. Journal of Parapsychology, 34, 275.
FREEMAN, J. A. (1970). Mood, personality, and attitude in precognition tests. Journal of Parapsychology, 34, 322.
HARALDSSON, E. (1970). Subject selection in a machine precognition test. Journal of Parapsychology, 34, 182-191.
HARALDSSON, E. (1970). Precognition of a quantum process: A modified replication. Journal of Parapsychology, 34, 329-330.
NIELSEN, W. (1970). Relationships between precognition scoring level and mood. Journal of Parapsychology, 34, 93-116.
SCHMIDT, H. (1970). Precognition test with a high-school group. Journal of Parapsychology, 34, 70.
BELOFF, J., & BATE, D. (1971). An attempt to replicate the Schmidt findings. Journal of the Society for Psychical Research, 46, 21-31.
HONORTON, C. (1971). Automated forced-choice precognition tests with a "sensitive." Journal of the American Society for Psychical Research, 65, 476-481.
MITCHELL, E. D. (1971). An ESP test from Apollo 14.
Journal of Parapsychology, 35, 89-107.
SCHMIDT, H., & PANTAS, L. (1971). Psi tests with psychologically equivalent conditions and internally different machines. Journal of Parapsychology, 35, 326-327.
STANFORD, R. G. (1971). Extrasensory effects upon "memory." Journal of the American Society for Psychical Research, 64, 161-186.
STEILBERG, B. J. (1971). Investigation of the paranormal gifts of the Dutch sensitive Lida T. Journal of Parapsychology, 35, 219-225.
THOULESS, R. H. (1971). Experiments on psi self-training with Dr. Schmidt's pre-cognitive apparatus. Journal of the Society for Psychical Research, 46, 15-21.
HONORTON, C. (1972). Reported frequency of dream recall and ESP. Journal of the American Society for Psychical Research, 66, 369-374.
JOHNSON, M., & NORDBECK, B. (1972). Variation in the scoring behavior of a "psychic" subject. Journal of Parapsychology, 36, 122-132.
KELLY, E. F., & KANTHAMANI, B. K. (1972). A subject's efforts toward voluntary control. Journal of Parapsychology, 36, 185-197.
SCHMIDT, H., & PANTAS, L. (1972). Psi tests with internally different machines. Journal of Parapsychology, 36, 222-232.
CRAIG, J. G. (1973). The effect of contingency on precognition in the rat. Research in Parapsychology 1972, 154-156.
FREEMAN, J. A. (1973). The psi quiz: A new ESP test. Research in Parapsychology 1972, 132-134.
ARTLEY, B. (1974). Confirmation of the small-rodent precognition work. Journal of Parapsychology, 38, 238-239.
HARRIS, S., & TERRY, J. (1974). Precognition in a water-deprived Wistar rat. Journal of Parapsychology, 38, 239.
RANDALL, J. L. (1974). An extended series of ESP and PK tests with three English schoolboys. Journal of the Society for Psychical Research, 47, 485-494.
EYSENCK, H. J. (1975). Precognition in rats.
Journal of Parapsychology, 39, 222-227.
HARALDSSON, E. (1975). Reported dream recall, precognitive dreams, and ESP. Research in Parapsychology 1974, 47-48.
HONORTON, C., RAMSEY, M., & CABIBBO, C. (1975). Experimenter effects in extrasensory perception. Journal of the American Society for Psychical Research, 69, 135-149.
KANTHAMANI, H., & RAO, H. H. (1975). Response tendencies and stimulus structure. Journal of Parapsychology, 39, 97-105.
LEVIN, J. (1975). A series of psi experiments with gerbils. Journal of Parapsychology, 39, 363-365.
TERRY, J. C., & HARRIS, S. A. (1975). Precognition in water-deprived rats. Research in Parapsychology 1974, 81.
DAVIS, J. W., & HAIGHT, J. (1976). Psi experiments with rats. Journal of Parapsychology, 40, 54-55.
JACOBS, J., & BREEDERVELD, H. (1976). Possible influences of birth order on ESP ability. Research Letter (Parapsychology Laboratory, University of Utrecht), No. 7, 10-20.
NEVILLE, R. C. (1976). Some aspects of precognition testing. Research in Parapsychology 1975, 29-31.
DRUCKER, S. A., DREWES, A. A., & RUBIN, L. (1977). ESP in relation to cognitive development and IQ in young children. Journal of the American Society for Psychical Research, 71, 289-298.
HARALDSSON, E. (1977). ESP and the defense mechanism test (DMT): A further validation. European Journal of Parapsychology, 2, 104-114.
SARGENT, C. L. (1977). An experiment involving a novel precognition task. Journal of Parapsychology, 41, 275-293.
BIERMAN, D. J. (1978). Testing the "advanced wave" hypothesis: An attempted replication. European Journal of Parapsychology, 2, 206-212.
BRAUD, W. (1979). Project Chicken Little: A precognition experiment involving the SKYLAB space station. European Journal of Parapsychology, 3, 149-165.
HARALDSSON, E., & JOHNSON, M. (1979). ESP and the defense mechanism test (DMT) Icelandic study No. III: A case of the experimenter effect? European Journal of Parapsychology, 3, 11-20.
O'BRIEN, J. T. (1979).
An examination of the checker effect. Research in Parapsychology 1978, 153-155.
CLEMENS, D. B., & PHILLIPS, D. T. (1980). Further studies of precognition in mice. Research in Parapsychology 1979, 156.
HARALDSSON, E. (1980). Scoring in a precognition test as a function of the frequency of reading on psychical phenomena and belief in ESP. Research Letter (Parapsychology Laboratory, University of Utrecht), No. 10, 1-8.
SARGENT, C., & HARLEY, T. A. (1981). Three studies using a psi-predictive trait variable questionnaire. Journal of Parapsychology, 45, 199-214.
WINKELMAN, M. (1981). The effect of formal education on extrasensory abilities: The Ozolco study. Journal of Parapsychology, 45, 321-336.
NASH, C. B. (1982). ESP of present and future targets. Journal of the Society for Psychical Research, 51, 374-377.
THALBOURNE, M., BELOFF, J., & DELANOY, D. (1982). A test for the "extraverted sheep versus introverted goats" hypothesis. Research in Parapsychology 1981, 155-156.
CRANDALL, J. E., & HITE, D. D. (1983). Psi-missing and displacement: Evidence for improperly focused psi? Journal of the American Society for Psychical Research, 77, 209-228.
[illegible] ... preliminary findings. Research in Parapsychology 1982, 103-105.
JOHNSON, M., & HARALDSSON, E. (1984). The Defense Mechanism Test as a predictor of ESP scores: Icelandic studies IV and V. Journal of Parapsychology, 48, 185-200.
TEDDER, W. (1984). Computer-based long-distance ESP: An exploratory examination (RB/PS). Research in Parapsychology 1983, 100-101.
HESELTINE, G. L. (1985). PK success during structured and nonstructured RNG operation. Journal of Parapsychology, 49, 155-163.
HARALDSSON, E., & JOHNSON, M. (1986). The Defense Mechanism Test (DMT) as a predictor of ESP performance: Icelandic studies VI and VII. Research in Parapsychology 1985, 43-44.
VASSY, Z. (1986). Experimental study of complexity dependence in precognition. Journal of Parapsychology, 50, 235-270.
HONORTON, C. (1987). Precognition and real-time ESP performance in a computer task with an exceptional subject. Journal of Parapsychology, 51, 291-320.

Psychophysical Research Laboratories
P.O. Box 569
Plainsboro, NJ 08536

PSI COMMUNICATION IN THE GANZFELD

EXPERIMENTS WITH AN AUTOMATED TESTING SYSTEM AND A COMPARISON WITH A META-ANALYSIS OF EARLIER STUDIES

BY CHARLES HONORTON, RICK E. BERGER, MARIO P. VARVOGLIS, MARTA QUANT, PATRICIA DERR, EPHRAIM I. SCHECHTER, AND DIANE C. FERRARI

ABSTRACT: A computer-controlled testing system was used in 11 experiments on ganzfeld psi communication. The automated ganzfeld system controls target selection and presentation, subjects' blind-judging, and data recording and storage. Videotaped targets included video segments (dynamic targets) as well as single images (static targets). Two hundred and forty-one volunteer subjects completed 355 psi ganzfeld sessions. The subjects, on a blind basis, correctly identified randomly selected and remotely viewed targets to a statistically significant degree, z = 3.89, p = .00005. Study outcomes were homogeneous across the 11 series and eight different experimenters. Performance on dynamic targets was highly significant, z = 4.62, p = .0000019, as was the difference between dynamic and static targets, p = .002. Suggestively stronger performance occurred with friends than with unacquainted sender/receiver pairs, p = .0635. The automated ganzfeld study outcomes are compared with a meta-analysis of 28 earlier ganzfeld studies. The two data sets are consistent on four dimensions: overall success rate, impact of dynamic and static targets, effect of sender/receiver acquaintance, and prior ganzfeld experience.
The combined z for all 39 studies is 7.53, p = 9 × 10^-14.

This work was supported by the James S. McDonnell Foundation of St. Louis, Missouri, and by the John E. Fetzer Foundation of Kalamazoo, Michigan. We wish to thank Marilyn J. Schlitz, Peter Rojcewicz, and Rosemarie Pilkington for their help in recruiting participants; Daryl J. Bem of Cornell University and Donald McCarthy of St. Johns University for helpful comments on an earlier draft of this paper; Edwin C. May of SRI International for performing the audio spectrum analysis; and Robert Rosenthal of Harvard University for suggestions concerning data analysis. We also wish to thank several PRL colleagues who contributed in various ways to the work reported here: Nancy Sondow for assistance in the preparation of the relaxation exercise and instruction tape that was used throughout, and George Hansen and Linda Moore who served frequently as lab senders. Hansen also provided technical assistance and conducted a data audit resulting in the correction of several minor errors that appeared in a version of this report presented at the 32nd Annual Convention of the Parapsychological Association. Finally, we thank the 241 volunteer participants for providing us with such interesting data.

Research on psi communication in the ganzfeld developed as the result of earlier research suggesting that psi functioning is frequently associated with internal attention states brought about through dreaming, hypnosis, meditation, and similar naturally occurring or artificially induced states (Braud, 1978; Honorton, 1977).
This generalization, based on converging evidence from spontaneous case studies, clinical observations, and experimental studies, led to the development of a low-level descriptive model of psi functioning, according to which internal attention states facilitate psi detection by attenuating sensory and somatic stimuli that normally mask weaker psi input (Honorton, 1977, 1978). This "noise-reduction" model thus identified sensory deprivation as a key to the frequent association between psi communication and internal attention states, and the ganzfeld procedure was developed specifically to test the impact of perceptual isolation on psi performance.

Fifteen years have passed since the initial reports of psi communication in the ganzfeld (Braud, Wood, & Braud, 1975; Honorton & Harper, 1974; Parker, 1975). Dozens of additional psi ganzfeld studies have appeared since then, and the success of the paradigm has triggered substantial critical interest. Indeed, there is at least one critical review or commentary for every ganzfeld study reporting significant evidence of psi communication (Akers, 1984; Alcock, 1986; Blackmore, 1980, 1987; Child, 1986; Druckman & Swets, 1988; Harley & Matthews, 1987; Harris & Rosenthal, 1988; Honorton, 1979, 1983, 1985; Hövelmann, 1986; Hyman, 1983, 1985, 1988; Hyman & Honorton, 1986; Kennedy, 1979; McClenon, 1986; Palmer, 1986; Palmer, Honorton, & Utts, 1989; Parker & Wiklund, 1987; Rosenthal, 1986; Sargent, 1987; Scott, 1986; Stanford, 1984, 1986; Stokes, 1986; Utts, 1986).

Of the many controversies spanning the history of parapsychological inquiry, the psi ganzfeld domain is unique in three respects. First, the central issue involves the replicability of a theoretically based technique rather than the special abilities of exceptional individuals (Honorton, 1977).
Second, meta-analytic techniques have been used to assess statistical significance, effect size, and potential threats to validity (Harris & Rosenthal, 1988; Honorton, 1985; Hyman, 1985, 1988; Rosenthal, 1986). Third, investigators and critics have agreed on specific guidelines for the conduct and evaluation of future psi ganzfeld research (Hyman & Honorton, 1986).

The Automated Ganzfeld Testing System

Psi ganzfeld experiments typically involve four participants. The subject (or receiver, R) attempts to gain target-relevant imagery while in the ganzfeld; following the ganzfeld/imagery period, R tries, on a blind basis, to identify the actual target from among four possibilities. A physically isolated sender (Se) views the target and attempts to communicate salient aspects of it to R. Two experimenters (Es) are usually required. One E manages R, elicits R's verbal report of ganzfeld imagery (mentation), and supervises R's blind judging of the target and decoys; a second E supervises Se, and randomly selects and records the target.

We developed an automated ganzfeld testing system ("autoganzfeld") to eliminate potential methodological problems that were identified in earlier ganzfeld studies (Honorton, 1979; Hyman & Honorton, 1986; Kennedy, 1979) and to explore factors associated with successful performance. The system provides computer control of target selection and presentation, blind judging, subject feedback, and data recording and storage (Berger & Honorton, 1986). A computer-controlled videocassette recorder (VCR) accesses and automatically presents target stimuli to Se. A second E is required only for assistance in target selection. The system includes an experimental design module through which E specifies the sample size and status of a new series.

The system was designed to enable further assessment of factors identified with successful performance in earlier ganzfeld studies.
Differences in target type and sender/receiver acquaintance seem to be particularly important. Significantly better performance occurred in studies using dynamic rather than static targets. Dynamic targets contain multiple images reinforcing a central theme, whereas static targets contain a single image. Also, studies permitting subjects to have friends as their senders yielded significantly superior performance compared to those requiring subjects to work with laboratory senders. (See "Comparison of Study Outcomes with Ganzfeld Meta-Analysis" in the Results section.)

The autoganzfeld system uses both dynamic and static targets. The dynamic targets are excerpts from films; static targets include art work and photographs. Receivers may, if they choose, bring friends or family members to serve as their senders; a session setup module registers the sender type and other session information.

In this report, we present the results of the 11 autoganzfeld series conducted between the inauguration of the experiments in February, 1983, and September, 1989, when funding problems required suspension of the PRL research program.¹ We focus on (1) evidence for psi in the autoganzfeld situation, (2) the impact of dynamic versus static targets, (3) the effects of sender/receiver acquaintance, (4) the impact of prior psi ganzfeld experience, and (5) a comparison of these four factors with the outcomes of earlier nonautomated psi ganzfeld experiments. Our findings on demographic, psychological, and target factors will be presented in later reports.

¹This article conforms to the reporting guidelines recommended by Hyman and Honorton (1986). Because of the size of this database, however, it is not practical to include the data in an appendix to the report. Instead, we will supply the data to qualified investigators in a Lotus-compatible, MS-DOS computer disk file. There is a small fee to cover materials and mailing. Address inquiries to the Journal.

Subjects

The participants are 100 men and 141 women ranging in age from 17 to 74 years (mean = 37.3, SD = 11.8). This is a well-educated group; the mean formal education is 15.6 years (SD = 2.0). Our primary sources of recruitment include referrals from colleagues (24%), media presentations concerning PRL research (23%), friends or acquaintances of PRL staff (20%), and referrals from other participants (18%).

Belief in psi is strong in this population. On a seven-point scale where "1" indicates strong disbelief and "7" indicates strong belief in psi, the mean is 6.20 (SD = 1.03); only two participants rated their belief in psi below the midpoint of the scale. Personal experiences suggestive of psi were reported by 88% of the subjects; 80% reported ostensible telepathic experiences. Eighty percent of the participants have had some training in meditation or other techniques involving internal focus of attention.

Participant Orientation

Initial contact. New participants receive an information pack before their first session. The information pack includes a 55-item personal history survey (Participant Information Form [PIF]; Psychophysical Research Laboratories, 1983), Form F of the Myers-Briggs Type Indicator (MBTI; Briggs & Myers, 1957), general information about the research program, and directions for reaching PRL. Participants usually return the completed questionnaires before their first session. However, if new participants are scheduled on short notice, they either complete the questionnaires at PRL or, in a few cases, at home after the session.

Whenever possible, new participants are encouraged to come in for a preliminary orientation session, prior to their first PRL ganzfeld session. The orientation serves as a "get acquainted" session for participants and the PRL staff, and introduces participants to the PRL program and facility. Participants who avail themselves of this option generally complete the MBTI and PIF questionnaires during the orientation session. We inform new participants that they may bring a friend or family member to serve as their sender. When a participant chooses not to do so, a PRL staff member serves as sender. We encourage participants to reschedule their session rather than feel they must come in to "fulfill an obligation" if they are not feeling well.

Session orientation. We greet participants at the door when they arrive and attempt to create a friendly and informal social atmosphere. Coffee, tea, and soft drinks are available. E and other staff members engage in conversation with R during this period. When a laboratory sender is used, time is taken for sender and receiver to become acquainted.

If the participant is a novice, we describe the rationale and background of the ganzfeld research, and we seek to create positive expectations concerning R's ability to identify the target. This information is tailored to our perception of the needs of the individual participant, but it generally includes four elements: (1) a brief review of experimental, clinical, and spontaneous case trends indicating that ESP is more readily detected during internal attention states such as dreaming, hypnosis, and meditation (Honorton, 1977), (2) the notion that these states all involve physical relaxation and functional sensory deprivation, suggesting that weak ESP impressions may be more readily detected when perceptual and somatic noise is reduced, (3) the development of the ganzfeld technique to test this noise-reduction hypothesis, and (4) the long-term success of the ganzfeld technique as a means of facilitating psi communication in unselected subjects.
We encourage "goal orientation" and discourage excessive "task orientation" during the session; this is especially emphasized with participants who appear to be anxious or overly concerned about their ability to succeed in the ganzfeld task. We discourage participants from analyzing their mentation during the session, and tell them that they will have an opportunity to analyze their mentation during the judging procedure. They are encouraged to adopt the role of an outside observer of their mental processes during the ganzfeld. Again, this is emphasized with those who appear anxious about their performance; they are advised to relax, follow the taped instructions, and to simply allow the procedure to work. We inform participants that they may experience various types of correspondence between their mentation and the target; they are told that they may experience direct, literal correspondences to the target, but that they should also be prepared for correspondences involving distortions or transformations of the target content, cognitive associations, and similarities in emotional tone. Finally, we orient new participants to where Se and E will be located during the session.

Layout and Equipment

R and Se are sequestered in nonadjacent, sound-isolated and electrically shielded rooms. Both rooms are copper-screened, and are 14 ft apart on opposite sides of E's monitoring room, which provides the only access. R and Se remain isolated in their respective rooms until R completes the blind-judging procedure.

R's room is an Industrial Acoustics Corp. IAC 1205A Sound-Isolation Room, consisting of two 4-inch sheetrock-filled steel panels. The two panels are separated by a 4-inch air space, for a total thickness of one foot. The inside walls and ceiling of Se's room are covered with 4-inch Sonex acoustical material, similar to that used in commercial broadcast studios.
A free-standing Sonex-covered plywood barrier (5 ft wide by 8 ft high) positioned inside the sender's room, between Se's chair and the acoustical door, blocks sound transmission through the door frame. Figure 1 shows the floor plan of the experimental rooms.

E occupies a console housing the computer system and other equipment. The computer is an Apple II Plus with two disk drives, a printer, and an expansion chassis. The computer peripherals include a real-time clock, a noise-based random number generator (RNG), a Cavri Interactive Video Interface, an Apple game paddle, and a fan. Other equipment includes a color TV monitor, the VCR used to access and display targets, and three electrically isolated audiocassette recorders. One audiocassette recorder presents audio stimuli (prerecorded relaxation exercises, session instructions, and white noise). Another plays background music during the experimental setup. The third records R's ganzfeld mentation and judging period associations. There is two-way intercom communication between E and R. One-way audio communication from R to Se allows Se to listen to R's ganzfeld mentation.

[Figure 1. Floor plan of experimental suite. Labels: RECEIVER (Industrial Acoustics 1205A Sound Isolation Room); EXPERIMENTER (E's equipment console); SENDER (double wall with 4-inch Sonex acoustical padding and acoustical door). Scale bar: 5 ft.]

Receiver Preparation

R sits in a comfortable reclining chair in the IAC room. Se keeps R company while E prepares R for visual and auditory ganzfeld stimulation. Translucent hemispheres are taped over R's eyes with Micropore tape. Headphones are placed over R's ears. A clip-on microphone is fastened to R's collar. A 600-watt red-filtered floodlight, located approximately 6 ft in front of R's face, is adjusted in intensity until R reports a comfortable, shadow-free, homogeneous visual field.
White noise level is similarly adjusted; R is informed that the white noise should be as loud as possible without being annoying or uncomfortable. The ganzfeld light and white noise intensity are adjusted from E's console after R and Se are sequestered in their respective rooms.

Sender Preparation

Se sits in a comfortable reclining chair in the sender's room. Se faces a color TV monitor, wearing headphones. During the session, Se can hear R's mentation report through one headphone; if dynamic targets are used, Se hears the target audio channel through the other headphone.

Series Manager Setup Procedures

E accesses the autoganzfeld computer program through the Series Manager software. Series Manager is a password-protected, menu-driven control program. It provides the only means through which an experimenter may specify parameters for the series design, register new participants in the series, set up a session, and run a session. The Series Manager menu is accessed through entry of a private (and nonechoing) password.

Series design. A valid series design must exist before sessions can be run in an experimental series. The design is created through the Series Manager "design" module. The design module prompts E to specify the type of series (pilot, screening, or formal), the number of participants, the maximum number of trials per participant, the total number of trials per series, and the series name. There is no provision for changing the series design once it is accepted by E. Design parameters are saved in a disk file; they are passed to the experimental program at the beginning of the session.

Participant registration. When R is new to a series, E accesses "Participant Registration" from the Series Manager menu before the session. E is prompted to enter R's name and identification number.
The module verifies that the maximum number of participants specified in the design is not exceeded. (An error message appears if an attempt is made to register more participants than are specified in the design; then, control is returned to the Series Manager menu.)

Session setup. E then selects "Session Setup" from the Series Manager menu. E is prompted to enter R's name, and the program verifies that R has not already completed the maximum number of trials specified in the design module. (An error message appears if a participant has completed the number of sessions allowed for the series or has not been properly registered; control is then returned to the Series Manager menu.) E enters Se's name and the sender type: lab, lab friend, or friend. Lab senders are PRL staff members whose acquaintance with the participant is limited to the experiment. Lab friend refers to PRL staff senders who have some social acquaintance with R outside the laboratory. Friend senders are friends or family members of the participant. Finally, E enters the ganzfeld light and noise intensity levels and his or her initials. E then leaves the monitoring room while another PRL staff person supervises target selection.

Targets

The system uses short video segments (dynamic targets) and still pictures (static targets) as targets. Dynamic targets include excerpts from motion pictures, documentaries, and cartoons. Static targets include art prints, photographs, and magazine advertisements. There are 160 targets, arranged in judging sets of four dynamic or four static targets. The sets were constructed to minimize similarities among targets within a set. The targets are recorded on four one-half-inch VHS format videocassettes; each videocassette contains 10 target sets (5 dynamic and 5 static). A signal recorded on an audio track of each videocassette allows computer access of the targets.
Target display time (to Se during each sending period and to R during the judging period) is approximately one minute; blank space added to briefer targets ensures that the VCR remains in play mode for the same length of time for all targets.

Preview packs. The video display format of the autoganzfeld targets does not permit simultaneous viewing of the entire target set during the judging procedure as is done in many nonautomated ganzfeld studies. Each target set is therefore accompanied by a preview pack containing brief excerpts of all four targets in the set; this gives R a general impression of the range of target possibilities. R views the preview pack at the beginning of the judging procedure; it runs approximately 30 sec.

Target Selection

The target selector (TS) is a PRL staff member who has no contact with either E or R until after the blind-judging procedure. TS is needed to load the videocassette containing the target into the VCR. TS is informed which of the four videocassettes contains the target, but remains blind to the target's identity. If Se is a staff member, Se serves this role; otherwise, a staff member not involved in the session serves as TS. (In the latter case, Se and R are sequestered in their respective rooms before TS enters the monitoring room.)

The Series Manager program prompts TS to press a key on the computer keyboard. A program call to the hardware RNG obtains the target value (a number between 1 and 160) and stores it in computer memory.² The program determines the target set and videocassette number from the target value. The videocassette number is displayed on the monitor, and TS is prompted to insert that videocassette into the VCR. The program verifies that the correct videocassette has been inserted and clears the monitor screen; if the videocassette is not correct, an error message prompts TS to insert the correct videocassette.
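The text does not spell out how the program derives the target set and videocassette number from the target value. A minimal sketch, assuming that targets are numbered consecutively within sets of four (which agrees with Set 20 containing Targets 77-80) and that sets are stored in order, ten per videocassette, might look like the following. The function name `locate_target` and the cassette ordering are illustrative assumptions, not the original Apple II program.

```python
TARGETS_PER_SET = 4       # each judging set holds four targets
SETS_PER_CASSETTE = 10    # each videocassette holds ten sets
N_TARGETS = 160           # full target pool


def locate_target(target_value):
    """Return (set number, cassette number, position in set), all 1-based.

    Assumes consecutive numbering of targets within sets and of sets
    within cassettes; the source text does not confirm the cassette order.
    """
    if not 1 <= target_value <= N_TARGETS:
        raise ValueError("target value must be between 1 and 160")
    target_set = (target_value - 1) // TARGETS_PER_SET + 1
    cassette = (target_set - 1) // SETS_PER_CASSETTE + 1
    position = (target_value - 1) % TARGETS_PER_SET + 1
    return target_set, cassette, position


# Target 77 falls in Set 20 under this numbering, as the text states.
print(locate_target(77))
```

Under these assumptions only the cassette number needs to be shown to TS, which is how TS can load the correct tape while remaining blind to the target itself.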
TS places a cardboard cover over the VCR's front panel to conceal the digital counters and VU meters. Finally, TS leaves the monitoring room with the three remaining videocassettes, knocking three times on the monitoring room door as a signal for E to return.

Relaxation Exercises and Ganzfeld Instructions

R and Se undergo a 14-min prerecorded relaxation exercise before the mentation/sending period. This provides a unique shared experience for R and Se before the ESP task. The relaxation exercise includes progressive relaxation exercises and autogenic phrases (Jacobson, 1929; Shultz, 1950). Ganzfeld instructions are recorded after the relaxation exercise. The instructions and relaxation exercise are delivered in a slow, soothing but confident manner with ocean sounds in the background. The style of presentation is similar to a hypnotic induction procedure. The ganzfeld instructions to R, which are also heard by Se, are as follows:

During this experiment we want you to think out loud. Report all of the images, thoughts, and feelings that pass through your mind. Do not cling to any of them. Just observe them as they go by. At some point during the session, we will send you the target information. Do not try to anticipate or conjure up this information. Just give yourself the suggestion, right now, in the form of making a wish, that the information will appear in consciousness at the appropriate time. Keep your eyes open as much as possible during the session and allow your consciousness to flow through the sound you will hear through the headphones. One of us will be monitoring you in the other room. Now get as comfortable as possible, release all conscious hold of your body, and allow it to relax completely. As soon as you begin observing your mental processes, start thinking out loud. Continue to share your thoughts, images, and feelings with us throughout the session.
² An exception occurs in the two target comparison series (Series 301 and 302). See the descriptions of these series under Experienced Subjects Series.

Mentation/Sending Procedures

Receiver mentation report. After the relaxation exercise and instructions, R listens to the white noise through headphones for 30 minutes. R reports whatever thoughts, images, and feelings occur in the ganzfeld. The mentation report is monitored by E and Se from their respective rooms. The mentation report is tape recorded, and E takes detailed notes for review with R prior to judging.

Target presentation and sender procedures. A Cavri Video Interface automates computer access and control of targets from a JVC BR-6400U VCR. An electronic video switcher selectively routes the video output (VCR or computer text mode) to three color TV monitors, one each for E, R, and Se. E's and R's monitors remain in computer text mode until the judging period. During each of the six sending periods, Se's TV monitor is switched from computer text to VCR mode.

At the beginning of each sending period, Se's monitor displays the prompt, "Silently communicate the contents and meaning of the target to [R's first name]." Se views the target and attempts to communicate its contents to R. Se mentally reinforces R for target-related associations and mentally discourages R when the mentation is unrelated to the target.

Judging Procedure

After the mentation period, E turns off the ganzfeld light and reads back R's mentation from the session notes. R remains in ganzfeld during the mentation review to minimize any abrupt shift in state. E's and R's TV monitors are switched into VCR mode by the computer, which also prompts Se to "Silently direct [R's first name] to select the target that you saw." Se's TV monitor remains blank (computer mode) during this period.
R removes the eye covers and views the preview pack. From their respective rooms, R and E then view the four potential targets (the actual target and three decoys), which are presented in one of four random sequences. R, viewing each candidate, associates to the item as though it were the actual target, describing perceived similarities between the item and the ganzfeld mentation. While R associates to each candidate, E points out potential correspondences that R may have overlooked.³ R views any of the target candidates as often as desired before proceeding to the judging task.

³ This applies to Pilot Series 3, Novice Series 103-105, and to Experienced Series 201 and 302. It does not apply to the earlier series (Pilot Series 1-2; Novice Series 101-102; or Experienced Series 301). This practice was initiated because participants frequently failed to identify obvious correspondences between their mentation and target elements.

A 40-point rating scale then appears on R's TV monitor. The scale is labelled 0% on the left and 100% on the right. Using a computer-game paddle to move a pointer horizontally across the rating scale, R indicates the degree of similarity between his ganzfeld mentation and each potential target. E and Se view R's ratings on their monitors. The program checks for ties, and, if they occur, R re-rates the four candidates to obtain unique ratings for each. The program then converts R's ratings into ranks. A rank of 1 is assigned to the candidate R believes has the strongest similarity to his ganzfeld mentation; a rank of 4 is given to the candidate R believes is least like his ganzfeld experience.

Feedback and Post-Session Procedures

After R finishes judging, Se leaves the sender's room and enters R's room with E. Se reveals the actual target, which the computer automatically displays on R's TV monitor. The session data are written to a floppy disk file.

Following feedback, E is prompted to back up the series data disk. The target videocassette is then automatically wound to a position near the center of the videocassette (frame 50,000). E selects "Analysis" from the Series Manager menu and obtains a hardcopy printout of the session data file. The printout includes: the file name, R's name and ID number, series type, session number, Se's name, E's initials, date and start time, target number, target position in the set, R's target ranking, the standardized target rating (z score), target judging sequence, target name, target type and set number, sender type, light and white noise levels, finish time, and optional experimenter's comments. The printout is attached to E's notes on R's mentation and placed in a ring binder containing all such information for the series. The audio tape of the session is similarly filed.

Experimenters

Eight Es contributed to the autoganzfeld database. Honorton, one of the originators of the psi ganzfeld technique, has conducted psi ganzfeld experiments over a 16-year period. Derr and Varvoglis worked with Honorton at Maimonides Medical Center and were trained by him. Berger is primarily responsible for the technical implementation of the autoganzfeld system. He trained Honorton, Derr, Varvoglis, and Schechter in its use. Honorton trained Quant, Ferrari, and Schlitz in the use of the autoganzfeld system.⁴

Experimental Series

Altogether, 241 participants contributed 355 sessions in 11 series. To fully address the issue of selective reporting, we include every session completed from the inauguration of the experiments in February, 1983, to September, 1989, when the PRL facility was closed. Thus, this database has no "file-drawer" problem (Rosenthal, 1984).

The studies include three pilot series and eight formal series. Five of the formal series were single-session studies with novice participants. The remaining three formal series involved experienced participants.
Pilot Series

Series 1. This initial pilot series was conducted during the development and testing of the autoganzfeld system. It served to test system operation, to detect and correct programming errors, and to fine-tune session timing functions. Nineteen subjects contributed 22 sessions as Rs. Seven, including PRL staff members, had prior experience as Rs in nonautomated ganzfeld studies at Maimonides Medical Center. The remaining 12 Rs were novices with no prior ganzfeld experience. Series sample size was not specified in advance; the series continued until we were satisfied that the system was operating reliably.

Series 2. This pilot series was designed by Berger in an attempt to avert potential displacement effects and subject judging problems by having E rather than R serve as judge; R received feedback only to the actual target. Four participants contributed to this series. Nine of the planned 50 sessions were completed before Berger's departure from PRL, when this series was discontinued.

⁴ Berger, Schechter, and Varvoglis have doctorate degrees in psychology. Quant holds a master's degree in counselling psychology, and Ferrari has a bachelor's degree in psychology. Schlitz has conducted independent ganzfeld and remote-viewing research in other laboratories and has a master's degree in anthropology.

Series 3. This pilot series was a practice series for participants who completed the allotted number of sessions in ongoing formal series but who wanted additional ganzfeld experience. This series also includes several demonstration sessions when TV film crews were present and provided receiver experience for new PRL staff. The sample size was not preset.

Novice ("First-Timers") Series

The identification of characteristics associated with successful initial performance was a major goal of the PRL ganzfeld project (Honorton & Schechter, 1987). Except for Series 105, each novice series includes 50 ganzfeld novices, that is, participants with no prior ganzfeld experience. Each novice contributed a single ganzfeld session. Most novices had not participated in any psi experiment prior to the novice series.

Series 101. This is the first novice series.

Series 102. Beginning with this series, R was prompted after the mentation period to estimate the number of minutes since the end of the relaxation/instructions tape.

Series 103. Starting with this series, Rs were given the option of having no sender (i.e., "clairvoyance" condition). Only four participants opted to have no sender.

Series 104. A visiting scientist (Marilyn Schlitz) served as E in seven sessions and as Se in six sessions with subjects from The Juilliard School in New York.

Series 105. This series was started to accommodate the overflow of Juilliard students from Series 104. The sample size was set to 25. Six sessions were completed at the time the PRL program was suspended. (There were 20 Juilliard students altogether.
Sixteen were in Series 104 and four were in Series 105.)

Experienced Subjects Series

Series 201. This series involved especially promising subjects. The number of trials was set to 20. Seven sessions by three Rs were completed at the time the PRL program was suspended.

Series 301. This series compared dynamic and static targets. Sample size was set to 50 sessions. Twenty-five experienced subjects each contributed two sessions. The autoganzfeld program was modified for this series so that each R would have one session with dynamic targets and one session with static targets. Subjects were informed of this only after completing both sessions.

Series 302. This series used a single dynamic target set (Set 20). In earlier series, Target 77 ("Tidal Wave Engulfing Ancient City") had an especially strong success rate while Target 79 ("High-Speed Sex Trio") had never been correctly identified. We made two program modifications for this series. The target selection ("Randomize") routine was modified to select only targets in Set 20, and the VCR tape-centering routine was modified to wind the videotape to a randomly selected position between frame numbers 85,000 and 95,000. The second modification insured that E could not be cued, perhaps unconsciously, by the time required to wind the tape from its initial position to the target location.

The study involved experienced Rs who had no prior experience with Set 20. Each R contributed one session. Participants were unaware of the purpose of the study or that it was limited to one target set. The design called for the series to continue until 15 sessions were completed with each of the two targets of interest. Twenty-five sessions were completed when the PRL program was suspended.

Statistical Analysis

Except for two pilot series, series sample sizes were specified in advance.
Our primary hypothesis was that the observed success rate (the proportion of correctly identified targets) would reliably exceed the null hypothesis expectation of .25. To test this hypothesis, we calculated the exact binomial probability for the observed number of direct hits (ranks of 1) with p = .25 and q = .75. On the basis of the overwhelmingly positive outcomes of earlier studies, we preset alpha to .05, one-tailed.

We also tested two secondary hypotheses, based on patterns of success in earlier psi ganzfeld research. These are: (1) that dynamic targets are significantly superior to static targets, and (2) that performance is significantly enhanced when the sender is a friend of R, compared to when R and Se are not acquainted. We initially planned to test these hypotheses by chi-square tests, a trial-based analysis. However, a consultant (Dr. Robert Rosenthal) suggested that a t test using the series as the unit would be a more powerful test of these hypotheses, and we have followed his recommendation. The remaining analyses are exploratory.⁵

⁵ The statistical analyses in this report were performed using SYSTAT (Wilkinson, 1988). When t tests are reported on samples with unequal variances, they are calculated using the separate variances within groups for the error and degrees of freedom following Brownlee (1965). Combined zs are based on Stouffer's method (Rosenthal, 1984). Unless otherwise specified, p levels are one-tailed.

TABLE 1
OUTCOME BY SERIES

Series   Series type   N subjects   N trials   N hits   % hits   Effect size (h)   z
1        Pilot         19           22         8        36       .25               .99
2        Pilot         4            9          3        33       .18               .25
3        Pilot         25           36         10       28       .07               [illegible]
101      Novice        50           50         12       24       -.02              -.30
102      Novice        50           50         18       36       .24               1.60
103      Novice        50           50         15       30       .11               .67
104      Novice        50           50         18       36       .24               1.60
105      Novice        6            6          4        67       .87               1.78
201      Experienced   3            7          3        43       .38               .69
301      Experienced   25           50         15       30       .11               .67
302      Experienced   25           25         16       64       .81               3.93
Overall                241          355        122      34       .20               3.89

Note. The z scores are based on the exact binomial probability with p = .25 and q = .75.

RESULTS

Overall Success Rate

Ganzfeld hit rate. There were 241 participants, who contributed 355 autoganzfeld sessions. The 122 direct hits (34.4%) yield an exact binomial p of .00005 (z = 3.89). The effect size, Cohen's h (Cohen, 1977), is .20. The 95% confidence interval (CI) is a hit rate from 30% to 39%. Because this level of accuracy would occur about one time in 20,000 by chance, we reject the null hypothesis. (See Table 1.)

Success rate by series. Of the 11 series, 10 yield positive outcomes. The mean series effect size is .29, SD = .29, t (10) = 3.32.

Homogeneity of effect sizes. Traditionally, psi investigators have been preoccupied by whether there is a significant nonzero effect. An equally important issue, however, is the size of the effect. There is a growing tendency among behavioral scientists to define replicability in terms of the homogeneity of effect sizes (Hedges, 1987; Rosenthal, 1986; Utts, 1986). Two or more studies are replicates of one another if their effect sizes are homogeneous. We assess the homogeneity of effect sizes across the 11 series by performing a chi-square homogeneity test comparing the effect size for each series with the weighted mean effect size (Hedges, 1981; Rosenthal, 1984). The formula is:

\[ \chi^2(k - 1) = \sum_{i=1}^{k} N_i \,(h_i - \bar{h})^2, \]

where k is the number of studies, N_i is the sample size of the ith study, and the weighted mean effect size is:

\[ \bar{h} = \frac{\sum_{i=1}^{k} N_i h_i}{\sum_{i=1}^{k} N_i}. \]

The test shows that the series effect sizes are not significantly nonhomogeneous: χ² = 16.25, 10 df, p = .093.

Homogeneity of Outcome by Experimenter

Eight Es contributed to the autoganzfeld database. (See Table 2.) All eight experimenters have positive effect sizes. A chi-square homogeneity test, using the mean effect sizes for each E weighted by sample size, indicates that the results are homogeneous across experimenters: χ² = 7.13, 7 df, p = .415.

TABLE 2
OUTCOME BY EXPERIMENTER

Experimenter   N trials   N hits   % hits   Effect size (h)
Quant          106        38       36       .24
Honorton       72         27       38       .29
Berger         53         18       34       .20
Derr           45         12       27       .05
Varvoglis      43         11       26       .03
Schechter      14         5        36       .23
Ferrari        15         9        60       .79
Schlitz        7          2        29       .08

Subject-Based Analysis

Seventy-six percent of the participants (N = 183) contributed a single session as R. Fifty-eight Rs contributed multiple sessions. Participants with multiple sessions either had direct hits or strongly suggestive target mentation correspondences in their first session. (See Table 3.)

TABLE 3
GANZFELD SUCCESS IN RELATION TO NUMBER OF SESSIONS

No. of sessions as receiver   1      2      3      4+
N subjects                    183    23     24     11
N trials                      183    46     72     54
Hits                          53     19     31     19
% Hits                        29     41     43     35
Effect size (h)               .09    .34    .38    .22

Success rate by subjects. To test the consistency of ganzfeld performance across participants, we use the standardized ratings of the target and decoys (Stanford's z scores; Stanford & Sargent, 1983) as the dependent variable. Stanford zs are averaged for participants with multiple sessions. Direct hits and Stanford zs are highly correlated. In this database, r (353) is .776. The mean Stanford z for the 241 participants is .21 (SD = 1.04), and t (240) = 3.22 (p = .00073). The 95% CI is a Stanford z from .08 to .35. The effect size (Cohen's d; Cohen, 1977) is .21. (The effect size for subjects is nearly identical to the trial-based effect size, h = .20.)
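As a cross-check on the trial-based statistics reported in this section, the sketch below recomputes the exact binomial tail probability, Cohen's h, and the homogeneity chi-square from the rounded per-series values in Table 1. It is an illustration only, not the SYSTAT procedure the authors used; the function names are hypothetical, and because the inputs are rounded table entries the results agree with the published figures only approximately.

```python
from math import asin, comb, sqrt


def binomial_tail(hits, trials, p=0.25):
    """Exact one-tailed probability P(X >= hits) for X ~ Binomial(trials, p)."""
    return sum(comb(trials, k) * p**k * (1 - p) ** (trials - k)
               for k in range(hits, trials + 1))


def cohen_h(hit_rate, chance=0.25):
    """Cohen's h: difference between arcsine-transformed proportions."""
    return 2 * asin(sqrt(hit_rate)) - 2 * asin(sqrt(chance))


def homogeneity_chi2(effect_sizes, sample_sizes):
    """Return (chi-square with k - 1 df, N-weighted mean effect size)."""
    total = sum(sample_sizes)
    mean_h = sum(n * h for n, h in zip(sample_sizes, effect_sizes)) / total
    chi2 = sum(n * (h - mean_h) ** 2
               for n, h in zip(sample_sizes, effect_sizes))
    return chi2, mean_h


# Trials and effect sizes per series, in Table 1 order (Series 1 ... 302).
series_n = [22, 9, 36, 50, 50, 50, 50, 6, 7, 50, 25]
series_h = [.25, .18, .07, -.02, .24, .11, .24, .87, .38, .11, .81]

p_overall = binomial_tail(122, 355)      # reported as about .00005
h_overall = cohen_h(122 / 355)           # reported as .20
chi2, mean_h = homogeneity_chi2(series_h, series_n)  # reported chi-square: 16.25

print(f"p = {p_overall:.6f}, h = {h_overall:.2f}, "
      f"chi2 = {chi2:.2f}, mean h = {mean_h:.2f}")
```

Weighting each squared deviation by N_i reflects the fact that the sampling variance of h for a single proportion tested against a fixed value is approximately 1/N_i, so larger series count more heavily in both the weighted mean and the test statistic.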
Thus, there is a general tendency for participants to give higher ratings to the actual target than to the decoys, and the significance of these experiments is not attributable to exceptional performance by a few outstanding subjects.

Dynamic Versus Static Targets

The success rate for dynamic targets is highly significant. There are 190 dynamic target sessions and 77 direct hits (40%, h = .32; exact binomial p = 1.9 × 10⁻⁶, z = 4.62). The hit rate for static targets is not significant (165 trials, 45 hits, 27%, h = .05, p = .276, z = .59). Using the series effect size as the outcome variable and target type as the predictor variable, the point-biserial correlation (r_pb) between ganzfeld performance and target type is .663, t (17) = 3.65, p = .002.⁶ The 95% CI for dynamic targets is a hit rate from 34% to 47%. The CI for static targets is from 21% to 34%. Thus, our hypothesis concerning the superiority of dynamic targets is strongly supported.

Sender/Receiver Pairing

Receivers are more successful with friends than with laboratory senders, although the difference is not statistically significant. The number of sessions in this analysis is 351 because four subjects opted to have no sender. The best performance occurs with friend senders. Sessions with laboratory senders, although significant, have the lowest success rate. (See Table 4.)

TABLE 4
SENDER/RECEIVER PAIRING

Sender as    N trials   N hits   % hits   Effect size (h)   z      p
Lab          140        46       33       .18               2.01   .023
Lab friend   66         24       36       .24               1.93   .026
Friend       145        52       36       .24               2.83   .0023

Using series effect sizes as the unit of analysis and sender type as the predictor variable (combining lab friend and friends), r_pb is .363, t (17) = 1.61, p = .0635.⁷ The 95% CI for friends is a hit rate from 33.3% to 47%. For lab senders, the CI is from 18.3% to 41.8%.
Thus, although the effect of sender type is not statistically significant, there is a trend toward better results with friends.

⁶ Separate effect sizes were obtained for the dynamic and static target sessions of each series. Since Series 302 used dynamic targets only, the analysis is based on 11 dynamic target effect sizes and 8 static target effect sizes; two static target series (105 and 201) had extremely small sample sizes (2 and 3 sessions, respectively). A similar procedure is used in the analyses of sender/receiver pairing and experienced versus novice subjects.

⁷ Three series involving laboratory senders were eliminated from this analysis because of extremely small sample sizes. These include Series 2 (n = 2), Series 105 (n = 2), and Series 201 (n = 1). Thus, the point-biserial correlation is based on 11 series with friends and 8 series with laboratory senders.

Ganzfeld Experience

Two hundred and eighteen participants had their first experience as ganzfeld receivers in the autoganzfeld series. (This includes the 5 Novice Series 101-105 and 12 novices in Series 1.) For all but 24 (11%), their initial autoganzfeld session provided their first experience as a participant in any parapsychological research. Of the 218 novices, 71 (32.5%, h = .17) correctly identified their target (exact binomial p = .0073, z = 2.44).

Participants with some ganzfeld experience contributed 137 trials and 51 hits (37%, h = .26, p = .001, z = 3.09). When series effect sizes are used as the unit of analysis and prior ganzfeld experience is used as the predictor variable, r_pb is .078, t (10) = 0.25, p = .41. The 95% CI for novices is a hit rate from 25.5% to 49.5%. The CI for experienced participants is from 29% to 50%.

Participation by PRL Laboratory Staff

For completeness, we report the contribution of laboratory staff as subjects in this database. PRL staff members contributed 12 sessions as R.
These sessions yield 3 hits (exact binomial p = .50; h = .00).

White Noise and Ganzfeld Illumination Levels

The mean white noise level (in arbitrary units of 0-7.5) is 2.97 (SD = 1.77). As measured from the headphones, the mean noise level is approximately 68 dB. The mean light intensity (arbitrary units of 0-100) is 73.8 (SD = 26.1). Preferred noise and light intensity levels are highly correlated: r = .569, t(353) = 12.99.

Neither noise nor light intensity is significantly related to ganzfeld performance. The point-biserial correlation between hits and noise level is -.026, t(353) = -0.48, p = .631, two-tailed. For light intensity, rpb is -.040, t(353) = -0.76, p = .449, two-tailed.

RANDOMNESS TESTS

The adequacy of randomization was a major source of disagreement in two meta-analytic reviews of earlier psi ganzfeld research (Honorton, 1985; Hyman, 1985). In this section we document the adequacy of our randomization procedure according to guidelines agreed on by Hyman and Honorton (1986).

Global Tests of Random Number Generator

Full-range frequency analysis. As described earlier, autoganzfeld targets are selected through a program call to the RNG for values within the target range (1-160). The number of experimental sessions (N = 355) is too small to assess the RNG output distribution for the full range, so we performed a large-scale control series to test the distribution of values. Twelve control samples were collected. These included five samples with 156,000 trials, six samples with 1,560 trials, and one sample of 1,560,000 trials. The 12 resulting chi-square values were compared to a chi-square distribution with 155 df, using the Kolmogorov-Smirnov (KS) one-sample test. The KS test yields a two-tailed p = .577, indicating that the RNG used in these experiments provides a uniform distribution of values throughout the full target range.*
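The frequency analyses in this section rest on the chi-square goodness-of-fit statistic. As a rough illustration only (the authors' software is not described in the text), the Python sketch below computes that statistic for a set of target counts and its upper-tail probability, using the closed-form expression that holds for integer degrees of freedom. The function names and the example counts are hypothetical, and the Kolmogorov-Smirnov step applied to the 12 control samples is not reproduced here.

```python
import math


def chi_square_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over k < df/2 of (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum of x^k / (1*3*...*(2k+1))
    term, total = 1.0, 0.0
    for k in range((df - 1) // 2):
        if k > 0:
            term *= x / (2 * k + 1)
        total += term
    tail = math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * total
    return math.erfc(math.sqrt(x / 2)) + tail


def uniformity_test(counts):
    """Chi-square test of the hypothesis that all categories are equally likely."""
    n = sum(counts)
    expected = n / len(counts)
    stat = sum((c - expected) ** 2 / expected for c in counts)
    df = len(counts) - 1
    return stat, df, chi_square_sf(stat, df)


if __name__ == "__main__":
    # Hypothetical counts for the four targets of one set in a 40,000-trial control run.
    stat, df, p = uniformity_test([10050, 9930, 10080, 9940])
    print(f"chi-square = {stat:.2f}, df = {df}, p = {p:.3f}")
```

A large p value here means only that the observed counts are compatible with equal selection probabilities; it does not by itself establish independence between successive RNG calls.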
Test of frequency distribution for Set 20. We used a single target set (Set 20) in Series 302. We repeated the frequency analysis in a 40,000-trial control sample, restricting target selection to the four target values within Set 20 (Targets 77-80). A chi-square test of the distribution of targets within Set 20 shows that the RNG produces a uniform distribution of the target values within the set: χ² = 3.19, 3 df, p = .363.

Tests of the Experimental RNG Usage

Each autoganzfeld session required two RNG calls. An RNG call at the beginning of the session determined the target; another, made before the judging procedure, determined the order in which the target and decoys were presented for judging.

Distribution of targets in the experiment. A chi-square test of the distribution of values within the target sets shows that the targets were selected uniformly from among the four possibilities within each set; the χ² with 3 df is 0.86, p = .835.

Distribution of judging order. A chi-square test of the judging order indicates that the targets were uniformly distributed among the four possible judging sequences: the χ² with 3 df is 1.85, p = .604.

* One of the preview pack elements for Set 6, containing Targets 21-24, was damaged. This required filtering the RNG calls in the experiment and control tests to bypass the damaged portion of the videotape, leaving the targets in Pool 6 unused. Thus, for the full-range analyses reported here, there are 155 df rather than 159.

Summary

The randomness tests demonstrate that the RNG used for target selection in these experiments provides an adequate source of random numbers and was functioning properly during the experiments.

EXAMPLES OF TARGET-MENTATION CORRESPONDENCES

In this section, we present some examples of correspondences between targets and ganzfeld mentation. Although conclusions cannot be drawn from qualitative data, this material should not be ignored.
It constitutes the raw data on which the objective statistical evidence is based, and may provide important insights concerning the underlying process. These examples are excerpts from sessions of subjects' ganzfeld mentation reports, identified by them during the blind judging procedure as providing their basis for rating the target.

Target 90, Static: Dali's "Christ Crucified."

Series: 1. Participant ID: 77. Rank = 1. z score = 1.67.

"...I think of guides, like spirit guides, leading me and I come into like a court with a king. It's quiet.... It's like heaven. The king is something like Jesus. Woman. Now I'm just sort of somersaulting through heaven.... Brooding.... Aztecs, the Sun God.... High priest.... Fear.... Graves. Woman. Prayer.... Funeral.... Dark. Death.... Souls.... Ten Commandments. Moses...."

Target 77, Dynamic: Tidal wave engulfing ancient city. From "The Clash of the Titans," a film based on Greek mythology. A huge tidal wave crashes to the shore. The scene shifts to a center courtyard of an ancient Greek city; there is a statue in the center, and buildings with Greek columns around the periphery. People are running to escape consumption by the tidal wave. Water rushes through the buildings, destroying the columns and the statue; people scurry through a stone tunnel, just ahead of the engulfing water; debris floats through the water.

Series: 1. Participant ID: 87. Rank = 1. z score = 1.42.

"...The city of Bath comes to mind. The Romans. The reconstruction of the baths through archaeology. The Parthenon. Also getting sort of buildings like Stonehenge but sort of a cross between Stonehenge and the Parthenon. The Byzantine Empire. The Gates of Thunder. The Holy See. Tables floating about.... The number 7 very clearly. That just popped out of nowhere. It reminds me a bit of one of the first Clash albums, however.
The Clash, "Two Sevens" I think it was called, I'm not sure...." [The target was number 77.]

Series: 302. Participant ID: 267. Rank = 1. z score = 2.00.

"...A big storm over New York City. I'm assuming it's New York City. No, it's San Francisco.... A big storm and danger. It looks so beautiful but I'm getting the sense of danger from it.... It's a storm. An earthquake...."

Target 63, Dynamic: Horses. From the film "The Lathe of Heaven." An overhead view of five horses galloping in a snow storm. The camera zooms in on the horses as they gallop through the snow. The scene shifts to a close-up of a single horse trotting in a grassy meadow, first at normal speed, then in slow-motion. The scene shifts again; the same horse trotting slowly through empty city streets.

Series: 101. Participant ID: 92. Rank = 1. z score = 1.25.

"...I keep going to the mountains.... It's snowing.... Moving again, this time to the left, spinning to the left.... Spinning. Like on a carousel, horses. Horses on a carousel, a circus...."

Target 46, Dynamic: Collapsing Bridge. Newsreel footage of the collapse of a bridge in the 1940s. The bridge is swaying back and forth and up and down. Light posts are swaying. The bridge collapses from the center into the water.

Series: 101. Participant ID: 135. Rank = 1. z score = 1.94.

"...Something, some vertical object bending or swaying, almost something swaying in the wind.... Some thin vertical object, bending to the left.... Some kind of ladder-like structure but it seems to be almost blowing in the wind. Almost like a ladder-like bridge over some kind of chasm that's waving in the wind. This is not vertical this is horizontal.... A bridge, a drawbridge over something. It's like one of those old English type bridges that opens up from either side. The middle part comes up. I see it opening. It's opening. There was a flash of an old English stone bridge but then back to this one that's opening. The bridge is lifting, both sides now.
Now both sides are straight up. Now it's closing again. It's closing, it's coming down, it's closed. Arc, images of arcs, arcs, bridges. Passageways, many arcs. Bridges with many arcs...."

Target 137, Static: "Working on a Watermelon Farm." This painting shows a black man with his back to the picture; his suspenders form a V-shape around his shoulders. A dog is in front of the man; there are watermelons between the dog and the man. The man faces a dirt path with watermelon patches on either side. On the left side, another man pushes a wheelbarrow filled with huge watermelons.

Series: 101. Participant ID: 105. Rank = 2. z score = 0.98.

"...a small lamb, very soft, outside. Small, playful.... I see a 'V' shape.... An apple.... I see a kitchen towel with a picture on it. Apple seeds or a fruit cut in half showing the seeds. A tomato or an apple. The fruit was red on the outside.... I thought of watermelon as in a watermelon basket. Thinking of kids playing on a beach. Little kids playing with balls that are bigger than they are and buckets that are three-quarters their size.... I had a thought of going through a tunnel, not the kind of tunnel you see on Earth but the type of tunnel described when someone dies."

Target 64, Dynamic: 1920s Car Sinking. From the film "Ghost Story." The scene depicts the murder of a young blonde woman by three young men in the 1920s. The men are all wearing suits; one of the men is wearing a fedora hat that is turned up in the back. The men push an old car into a lake. The camera shifts between close-ups of their facial expressions, and the car, as it slowly sinks into the water. The woman's face and hand appear in the car's large rectangular rear window; she silently screams out for help. The car disappears beneath the water as the sequence ends.

Series: 102. Participant ID: 154. Rank = 1. z score = 1.45.

"...Girl with a haircut.... Blond hair.... A car....
The back of someone's head.... Someone running to the right.... Someone on the right in a brown suit... and a fedora hat turned up very much in the back.... Fedora, trench coat, dark tie.... A tire of a car. The car's going to the left. An old movie.... I'm picturing an Edward G. Robinson movie.... Big roundish car like 1940's. Those scenes from the back window. Bumping once in a while up and down looking through the back window you could see that it was probably a big screen in back of the car and the car's standing still actually.... I think it's a movie I saw. They're being shot at and shooting at the window and then the girl gets shot.... Girl with the blonde haircut.... Someone walking in a suit, brown suit.... It's the 1940's again, 30's maybe. Except it looks like it's in color. Something red, blood... blood on someone's lap.... A dead person all of a sudden.... A big mouth opened. Yelling, but no sound.... Two people running near a train.... Dressed in 1920 type suits with balloony pants, like knickers.... A big, old-fashioned white car with a flat top. 1920's...."

Target 107, Static: Stained-Glass Madonna with Child. This is a stained-glass window depicting the Virgin Mary and Christ child.

Series: 102. Participant ID: 183. Rank = 2. z score = 0.61.

"Some kind of a house, structure.... Some kind of wall or building. Something with the sky in the background. Thinking of a bell. A bell structure. Something with a hole with the light coming through the hole.... Like a stained glass window like you see in churches."

Target 19, Static: Flying Eagle. An eagle with outstretched wings is about to land on a perch; its claws are extended. The eagle's head is white and its wings and body are black.

Series: 104. Participant ID: 316. Rank = 1. z score = 2.00.

"...A black bird. I see a dark shape of a black bird with a very pointed beak with his wings down.... Almost needle-like beak....
Something that would fly or is flying... like a big parrot with long feathers on a perch. Lots of feathers, tail feathers, long, long, long.... Flying, a big huge, huge eagle. The wings of an eagle spread out.... The head of an eagle. White head and dark feathers.... The bottom of a bird...."

Target 144, Dynamic: Hell. From the film "Altered States." This sequence depicts a psychedelic experience. Everything is tinted red. The rapidly shifting scenes include: a man screaming; many people in the midst of fire and smoke; a man screaming in an isolation tank; people in agony; a large sun with a corona around it; a mass crucifixion; people jumping off a precipice, in the midst of fire, smoke, and molten lava; spiraling crucifixes. There is a close-up of a lizard's head, slowly opening its mouth, at the end of the sequence.

Series: 104. Participant ID: 321. Rank = 1. z score = 1.49.

"...I just see a big 'X'. A big 'X'.... I see a tunnel in front of me. It's like a tunnel of smog or a tunnel of smoke. I'm going down.... I'm going down it at a pretty fast speed.... I still see the color red, red, red, red, red, red, red, red.... Ah, suddenly the sun.... The kind of cartoon sun you see when you can see each pointy spike around the sphere.... I stepped on a piece of glass and there's a bit of blood coming out of my foot.... A lizard, with a big, big, big head...."

Target 148, Static: Three Unusual Planes. Three small aircraft flying in formation. The planes are white and have swept-back wings; their landing gear is extended. A winding road is visible below.

Series: 104. Participant ID: 322. Rank = 2. z score = 0.39.

"...A jet plane.... A 747 on the way to Greece. Blue skies. Sounds like it's going higher.... I think I'm back on the plane again. I never used to be afraid of flying until recently.... They need better insulated jets, soundproof like these rooms. They could use these comfortable seats, too. And the leg room.
The service isn't bad either.... Still can't get the feeling of being in an airplane out of my mind. Flying over Greenland and Iceland when I went to England.... Feels like we're going higher and higher.... Descending. It seems we're descending.... Big airplanes flying over with people like me staring down.... Flying around in a piece of tin.... Feel like I'm getting a G-force. Maybe I am taking off. Sure feels like it. Feels like we're going straight up.... I always feel like when I'm on the plane going home, I just hope that plane makes it past the Rocky Mountains...."

Target 10, Static: Santa and Coke. This is a Coca-Cola Christmas ad from the 1950s, showing Santa Claus holding a Coke bottle in his left hand; three buttons are visible on Santa's suit. Behind Santa and to his left is a large bottle cap with the Coca-Cola logo leaning against an ornamented Christmas tree.

Series: 104. Participant ID: 332. Rank = 1. z score = 1.11.

"...There's a man with a dark beard and he's got a sharp face.... There's another man with a beard. Now there's green and white and he's in bushes and he's sort of colonial. He looks like Robin Hood and he's wearing a hat.... I can see him from behind. I can see his hat and he has a sack over his shoulder.... Window ledge is looking down and there's a billboard that says 'Coca-Cola' on it.... There's a snowman again and it's got a carrot for a nose and three black buttons coming down the front.... There's a white beard again. There's a man with a white beard.... There's an old man with a beard...."

Target 70, Dynamic: Dancing in NY City Streets. From the film "The Wiz." The span of a yellow-paved bridge over a body of water and automobile traffic is visible in the opening scene; the New York City skyline is in the background. A hot-air balloon flies overhead.
The scene shifts as Dorothy (Diana Ross), her dog Toto, the Lion, Tin Man, and Scarecrow dance along the bridge; one of the bridge's supporting arches is behind them. The Chrysler Building is in the background. At the end of the sequence, the characters dance in front of a painted backdrop of an old-fashioned building.

Series: 105. Participant ID: 336. Rank = 1. z score = 1.40.

"Big colorful hot air balloons.... White brick wall.... Ocean.... People walking before my eyes. Several people.... A dog. Hot air balloon.... A nightclub singer.... Back of a woman's head, short curly hair.... Water.... Balloon, big balloon.... Yellow.... Very tall building. Looking down at a city. Leaving a city, going up.... Faces. An arc.... Water.... A woman's face.... Cars, freeway.... A rock-n-roll star chanting.... Architecture. A jester's geometrical figures, designs.... Yellow chocolate bar. Water. Going down into water, deep down.... Man with long golden hair and sun glasses.... The Bay, San Francisco Bay. A lion.... Highways.... Lion, see a lion.... Tornado.... Balloon.... Face mask.... City.... Leaning Tower of Pisa.... Long hallway, doorway.... Long road. Long, long desert road...."

Target 22, Dynamic: Spiders. From the documentary "Life on Earth." A spider is weaving its web. The spider's long legs spring up and down repeatedly, weaving strands of the web. The body of the spider is constantly in motion, and bounces up and down. A close-up shows one of the veins of the web being stretched out by the spider. Various views of the web.

Series: 301. Participant ID: 146. Rank = 2. z score = 0.65.

"...Now visual patterns more like a spider web and the color. And then like the form of the veins of a windmill.... Something like a spider web again. A spider web. A pattern that instead of a spider web it looks like basket weaving....
An image of the way some children were able to do something like flying when I was a child though I never had one. It was a (forgotten what it was called) a pogo stick or a jump stick, something in which you jumped up and down and you could hop quite a distance by doing so.... I have kinesthetic images all over as in vigorous motion expressed in flying or jumping on this sort of spring stick that I mentioned.... Vigorous motion. It's as though I were trying to combine relaxation with participating in an image of something very vigorous.... I really feel carried away by these images of vigorous activity without being able to localize this activity as to what...."

Target 108, Static: Two fire eaters. A young fire eater, in the foreground, facing to the right of the picture, blows a huge flame out of his mouth. In the background there is another fire eater. A group of people are watching on the left side of the picture.

Series: 301. Participant ID: 146. Rank = 1. z score = 1.71.

"...I keep having images of flames now and then.... The sound reminds me of flames too.... I find flames again.... In these new images the fire takes on a very menacing meaning.... Rather mountainous sticking up of bare rocks just as though they had come from a recently formed volcano. Volcanos of course get back to the fire, extreme heat. I had an image of a volcano with molten lava inside the crater. Molten lava running down the side of the volcano.... Cold. Written out there behind the visual field and thinking how it contrasts with my images of flames. Although my images of flames didn't actually include much real feeling of heat. I didn't have any imagery of heat in connection with the flames. Just abstract thought of flames.... Now I think of the water as a way of putting out flames. Suddenly, I was biting my lip. Biting my lip as though lips had something to do with the imagery and I see lips out in front of me....
And the lips I see are bright red, reminding me of the flame imagery earlier. And then a bright heart such as Valentine's candy in the shape of a heart. The cinnamon flavored candies that I remember as a child having at Valentine's. Red color.... This red as in the cinnamon candy is a deep very intense red. And similarly for the flames. And now I see the word 'red'...."

Target 94, Dynamic: Hang Gliders. The sequence shows a skier on a V-shaped hang glider. The skier flies high up above snow covered mountains and a pine forest. At the end, the skier lands on a mountain slope and skis away. The sequence is accompanied by Pachelbel's Canon.

Series: 301. Participant ID: 188. Rank = 1. z score = 1.26.

"...Some kind of 'V' shape, like an open book.... I get some mountain.... Some kind of bird with a long wing.... The shape of an upside down 'V'.... Ski, something about skiing came to me.... Some kind of a body like an oval shape of a body with wings on top of it in a shape. Another 'V' like a wing shape.... Something with wings.... Again the shape of an umbrella came into my mind. A butterfly shape...."

Target 80, Dynamic: Bugs Bunny in Space. In this cartoon, there is a close-up of the lower part of a cigar-shaped rocketship and the supports holding it up. The rocket assembly slides over to the launching pad, directly above Bugs Bunny's underground patch. The scene shifts to the underground patch, as Bugs Bunny climbs up the ladder leading out of his patch. Unknowingly, he climbs up through the interior of the rocketship. The rocket's supports pull away and then it takes off into space. The rocket's nose cone spins as Bugs Bunny appears through the top and he sees the Earth recede rapidly in the distance. As the sequence ends, Bugs Bunny is hit in the belly by a comet.

Series: 302. Participant ID: 292. Rank = 1. z score = 1.48.

"...Space craft.... The solar system.
The underside of a helicopter or a submarine or some kind of fish that you're seeing from underneath.... Sort of being underneath it. Sort of being underneath.... A very strange image like a cartoon character, animated character. With his mouth open kind of.... Like a hypodermic needle or a candle or this shaft like thing with a pointed top again.... Missiles flying.... An aerial perspective.... I'm just kind of editing here I think. I'm really hoping all this rocketship kind of imagery isn't because of the noise. I feel like I'm in a rocketship or something.... That image of the ship going into the belly of the mother ship...."

COMPARISON OF STUDY OUTCOMES WITH GANZFELD META-ANALYSIS

In this section, we compare the automated ganzfeld study outcomes with the results of earlier ganzfeld studies, summarized in a meta-analysis (Honorton, 1985). We compare the two databases on four dimensions: (1) overall success rate, (2) dynamic versus static targets, (3) sender/receiver pairing, and (4) novice versus experienced subjects.

TABLE 5
COMPARISON OF OVERALL PERFORMANCE IN AUTOMATED GANZFELD AND META-ANALYSIS DATA SETS

Outcome variable    Database        N studies   Mean   SD     t      df   p
z scores            Meta-analysis   28          1.25   1.57   0.33   25   .748
                    Autoganzfeld    11          1.10   1.14
Effect sizes (h)    Meta-analysis   28          .28    .46    0.14   28   .892
                    Autoganzfeld    11          .29    .29

Note. The p values are two-tailed.

Overall Success Rate

To assess the consistency of results, we compare the 11 autoganzfeld series to the 28 studies in a meta-analysis of earlier ganzfeld studies (Honorton, 1985, Table A1, p. 84), using direct hits as the dependent variable. The outcomes of the two data sets are consistent. Both display a predominance of positive outcomes: 23 of the 28 studies in the meta-analysis (82%) and 10 of the 11 autoganzfeld series (91%) yield positive z scores. The mean autoganzfeld z scores and effect sizes are very similar to those in the meta-analysis.
(See Table 5.)

Combined Estimates of Ganzfeld Success Rate

Because the z scores and effect sizes for the automated ganzfeld are consistent with the original set of 28 studies in the meta-analysis, a better estimate of their true population values may be obtained by combining them. Positive outcomes were obtained in 33 of the 39 studies (85%); the 95% CI is from 69% to 99%. Table 6 shows a stem-and-leaf frequency plot of the z scores (Tukey, 1977). Unlike other methods of displaying frequency distributions, the stem-and-leaf plot retains the numerical data precisely. (Turned on its side, the stem-and-leaf plot becomes a conventional histogram.) Each number includes a stem and one or more leaves. For example, the stem 1 is followed by leaves of 6,6,6,7,7,7, representing z scores of 1.6, 1.6, 1.6, 1.7, 1.7, 1.7. In the display, the letter "H" identifies the upper and lower hinges of the distribution, and "M" identifies its median.

TABLE 6
DISTRIBUTION OF z SCORES

Stem      Leaf
-1.       97
-0.       85
-0.       33
 0.  H    222224
 0.  M    6667777999
 1.       666777
 2.  H    011
 2.       8
 3.       01124
 3.       9
 4.       0

Minimum z = -1.97
Lower hinge = 0.25
Median z = 0.92
Mean z = 1.28
Upper hinge = 2.08
Maximum z = 4.02
SD = 1.44
Skewness (g1) = 0.05
Kurtosis (g2) = -0.37
Combined (Stouffer) z = 7.53

The z's range from -1.97 to 4.02 (mean z = 1.21, SD = 1.45), and the 95% CI is a z from .76 to 1.66.

The combined z for the 39 studies is 7.53 (p = 9 × 10^[exponent illegible in source]). Rosenthal's (1984) file-drawer statistic indicates that 778 additional studies with z scores averaging zero would be required to reduce the significance of the combined ganzfeld database to nonsignificance; that is a ratio of 19 unknown studies for every known study.

A stem-and-leaf display of the effect sizes is shown in Table 7. The effect sizes range from -.93 to 1.44 (mean h = .28, SD = .41). The two most extreme values on both sides of the distribution are outliers.
The 95% CI is an h between .15 and .41; the equivalent hit rate is from 31.5% to 44.5%.

TABLE 7
DISTRIBUTION OF EFFECT SIZES (COHEN'S h)

Stem      Leaf
OUTSIDE VALUES
-.9       3
-.4       0

-.3       [leaf illegible]
-.1       0
-.0       51
 .0       7779
 .1  H    002888
 .2  M    1334
 .3       11144777
 .4  H    01113
 .5       7
 .7       3
 .8       17
OUTSIDE VALUES
1.3       3
1.4       4

Minimum h = -.93
Lower hinge = 0.10
Median h = [illegible]
Mean h = 0.2[final digit illegible]
Upper hinge = [illegible]
Maximum h = 1.4[final digit illegible]
SD = 0.41
Skewness (g1) = 0.2[final digit illegible]
Kurtosis (g2) = 2.49

Dynamic Versus Static Targets

The use of video sequences as targets is a novel feature of the autoganzfeld database. However, a comparable difference in target type exists in the earlier ganzfeld studies. Of the 28 direct hits studies in the meta-analysis, 9 studies (by three independent investigators) used View Master stereoscopic slide reels as targets (Honorton, 1985, Studies 7-8, 16-19, 21, 38-39). Static targets (single pictures or slides) were used in the remaining 19 studies by seven independent investigators (Studies 1, 2, 4, 10-12, 23-31, 33-34, 41-42). Like the autoganzfeld video sequences, View Master targets present a variety of images reinforcing a central target theme.

To compare the relative impact of dynamic and static targets in the autoganzfeld and meta-analysis, we obtained point-biserial correlations for each data set using target type (static or dynamic) as the predictor variable and the series effect size, Cohen's h, as the outcome variable. We test the difference between the two correlations using Cohen's q (Cohen, 1977). Dynamic targets yield significantly larger effect sizes in both data sets. For the meta-analysis, rpb is .409, t(26) = 2.28, p = .015; and for the autoganzfeld, as reported above, rpb is .663. The two correlations are not significantly different (q = .36; z = 1.14). Therefore, we combine the two data sets to obtain a better estimate of the relationship between effect size
and target type: rpb = .439, t(45) = 3.28, p = .002. The 95% CIs are 24% to 36% for static targets and 38% to 55% for dynamic targets. Thus, the cumulative evidence strongly indicates that dynamic targets are more accurately retrieved than static targets.

Sender/Receiver Pairing

A similar analysis compares the effects of sender/receiver pairing in the two databases. Studies in the meta-analysis did not routinely provide detailed breakdowns regarding sender/receiver pairing. Sender/receiver pairing in the meta-analysis can only be coded according to whether subjects could bring friends to serve as their sender or were restricted to laboratory senders. In 17 studies, by six independent investigators, subjects were free to bring friends (Honorton, 1985, Studies 1-2, 4, 7-8, 16, 23-28, 30, 33-34, 38-39). Laboratory-assigned senders were used exclusively in the remaining 8 studies, by four independent investigators (Studies 10-12, 18-19, 21, 29, 41). (Three studies using clairvoyance procedures and no senders are excluded from this analysis.) For the autoganzfeld studies, we calculated separate effect sizes for each series by sender type (combining lab friend and friend for comparability with the meta-analysis).

In the meta-analysis, rpb(23) is .403; larger effect sizes occurred in studies where friends could serve as sender (t = 2.11, p = .023). For the autoganzfeld, as reported above, rpb is .363, in the same direction. The two correlations are very similar (q = .05; z = 0.14) and are combined to give a better estimate of the relationship between sender/receiver pairing and ganzfeld study outcome: rpb = .38, t(42) = 2.66, p = .0055. The 95% CIs are 20% to 34% for unacquainted sender/receiver pairs and 31.1% to 49.2% for friends. Thus, the sender/receiver relationship does have a significant impact on performance.
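The correlation comparisons reported for target type and sender/receiver pairing can be checked with a few lines of arithmetic. The sketch below is an illustration under stated assumptions, not the authors' procedure: it assumes the conventional definitions (Cohen's q as the difference of Fisher z-transformed correlations, with standard error sqrt(1/(n1 - 3) + 1/(n2 - 3)), and t = r * sqrt(df) / sqrt(1 - r^2) for a correlation), and it takes each sample size as the reported degrees of freedom plus 2. The function names are hypothetical.

```python
import math


def t_from_r(r, df):
    """t statistic for a correlation r tested against zero with df degrees of freedom."""
    return r * math.sqrt(df) / math.sqrt(1 - r * r)


def cohens_q(r1, n1, r2, n2):
    """Cohen's q (difference of Fisher z-transformed correlations) and its z test."""
    q = math.atanh(r1) - math.atanh(r2)
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return q, q / se


if __name__ == "__main__":
    # Target type: autoganzfeld r = .663 (19 effect sizes), meta-analysis r = .409 (28 studies).
    q, z = cohens_q(0.663, 19, 0.409, 28)
    print(f"target type: q = {q:.2f}, z = {z:.2f}")

    # Sender pairing: meta-analysis r = .403 (25 studies), autoganzfeld r = .363 (19 effect sizes).
    q, z = cohens_q(0.403, 25, 0.363, 19)
    print(f"sender pairing: q = {q:.2f}, z = {z:.2f}")

    # t statistic for the autoganzfeld target-type correlation.
    print(f"t(17) = {t_from_r(0.663, 17):.2f}")
```

Under these assumptions the q and z values agree with the reported figures to two decimals, which suggests, but does not prove, that the published values were obtained in this way.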
Effect of Prior Ganzfeld Experience

The meta-analysis includes 14 studies, by nine independent investigators, in which novices are used exclusively (Honorton, 1985, Studies 2, 4, 8, 10-12, 16-18, 23-24, 31, 41-42). Experienced or mixed samples of novice and experienced subjects are used in the remaining 14 studies, by four different investigators (Studies 1, 7, 19, 21, 25-30, 33-34, 38-39). Studies using experienced subjects were more successful than those limited to novices; the point-biserial correlation between level of experience and effect size is .229, t(26) = 1.20, p = .12. For the autoganzfeld studies, as reported above, rpb is .078. The two correlations do not differ significantly (q = .155; z = 0.40), and the combined rpb is .194, t(38) = 1.22, p = .105. The respective 95% CIs are 24.5% to 44.5% for novices and 35.5% to 48% for experienced subjects.

The 95% CIs for these comparative analyses are shown graphically in Figure 2. The bottom two rows are CIs for the overall hit rates in the meta-analysis and autoganzfeld, respectively. The next two rows give the CIs for dynamic targets in the two data sets, and so on.

[Figure 2 plot: 95% confidence limits for each data set and condition (vertical axis), plotted against effect size (h) from -1.0 to 1.0 (horizontal axis).]

Figure 2. Comparison of autoganzfeld and meta-analysis 95% confidence limits. Abbreviations are defined as follows: Meta = meta-analysis studies, Auto = automated ganzfeld studies, Dyn = dynamic targets, Sta = static targets, Lab = laboratory senders, Fr = sender is friend or acquaintance of receiver, Novice = no prior ganzfeld experience, Exper = prior ganzfeld experience.
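The combined estimates used throughout this comparison (the Stouffer z, the file-drawer count, and the conversion between Cohen's h and a hit rate) follow from short formulas. The sketch below is illustrative only: the function names are hypothetical, the 1.645 criterion is an assumed one-tailed .05 cutoff for the file-drawer statistic, and chance expectation is taken as .25 because each trial offers four judging alternatives. The individual study z scores are not listed here, so the file-drawer example starts from the reported combined z of 7.53 for 39 studies.

```python
import math


def stouffer_z(z_scores):
    """Combined z for independent studies: sum of z scores over sqrt(number of studies)."""
    return sum(z_scores) / math.sqrt(len(z_scores))


def fail_safe_n(sum_z, k, z_crit=1.645):
    """Number of unreported zero-effect studies needed to pull the combined z down to z_crit."""
    return (sum_z / z_crit) ** 2 - k


def cohens_h(p1, p0=0.25):
    """Cohen's h: difference between arcsine-transformed proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p0))


def hit_rate_from_h(h, p0=0.25):
    """Invert cohens_h: the hit rate that lies h units above chance expectation p0."""
    return math.sin(h / 2 + math.asin(math.sqrt(p0))) ** 2


if __name__ == "__main__":
    k = 39
    sum_z = 7.53 * math.sqrt(k)  # recover the sum of z scores from the combined z
    print(f"file-drawer estimate: {fail_safe_n(sum_z, k):.0f} studies")
    print(f"h for a 50% hit rate: {cohens_h(0.50):.2f}")
    low, high = hit_rate_from_h(0.15), hit_rate_from_h(0.41)
    print(f"hit rates for h = .15 and h = .41: {low:.3f}, {high:.3f}")
```

Converted this way, the interval in h comes out close to the hit-rate interval given in the text; small differences are to be expected from rounding in the published values.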
DISCUSSION

We now consider various rival hypotheses that might account for the experimental outcomes, and the degree to which the automated ganzfeld experiments, viewed in conjunction with the earlier psi ganzfeld studies, constitute evidence for psi communication. Finally, we consider directions for future research suggested by these findings.

Rival Hypotheses

Sensory Cues. Only Se knows the identity of the target until R finishes the automated judging procedure. If Se is not a PRL staff member, a staff member not otherwise involved in the session supervises target selection. In either case, the target selector knows only which videocassette contains the target. The target selector leaves the monitoring room with the remaining three target tapes after knocking three times on the monitoring room door, signalling E to return. Since the target selector only knows the videocassette number, variations in knocking cannot communicate any useful information to E. The cardboard cover over the VCR eliminates any visual cues to E regarding the position of the videotape or the activity of the VU meters (which are active when the target is dynamic and has a soundtrack).

Sensory transmission from Se to R during the ganzfeld session is eliminated by having R and Se in separate, sound-attenuated rooms. If either participant leaves their room before R's ratings have been registered in the computer, the session is unconditionally aborted. The videotape target display system prevents potential handling cues during the judging procedure. Computer registration of R's target ratings and automated feedback after the session prevents the possibility of cheating by Se during feedback, raised by Hyman (1985).
After about 80% of the sessions were completed, it was becoming clear that our hypothesis concerning the superiority of dynamic targets over static targets was receiving substantial confirmation. Because dynamic targets contain auditory as well as visual information, we conducted a supplementary test to assess the possibility of auditory leakage from the VCR soundtrack to R. With the VCR audio set to normal amplification, no auditory signal could be detected through R's headphones, with or without white noise. When an external amplifier was added between the VCR and R's headphones and with the white noise turned completely off, the soundtrack could sometimes be faintly detected. It is unlikely that subjects could have detected any target audio signal with the normal VCR amplification and white noise; as we have reported, there is no correlation between ganzfeld success rate and white noise level in these experiments. Nevertheless, to totally exclude any possibility of subliminal cueing, we modified the equipment. Additional testing confirmed that this modification effectively eliminated all leakage. This was formally confirmed by an audio spectrum analysis, covering the frequency domain between 475 Hz and 15.2 kHz.

The critical question, of course, is whether performance on dynamic targets diminished after this modification. The answer is no; in fact, performance improved. Before the modification, the direct hit rate on dynamic targets was 38% (150 trials, 57 hits, h = .28, exact binomial p = .00029, z = 3.44); the 95% CI was from 31% to 45%. Following the modification, the direct hit rate was 50% (40 trials, 20 hits, h = .52, exact binomial p = .00057, z = 3.25) with a 95% CI from 37% to 63%. The direct hit rate for all targets (static and dynamic) after the modification was 44% (64 trials, 28 hits, h = .39, exact binomial p = .00082, z = 3.15).

Randomization. As Hyman and Honorton (1986, p.
357) have pointed out, "Because ganzfeld experiments involve only one target selection per session..., the ganzfeld investigator can restrict his or her attention to a frequency analysis allowing assessment of the degree to which targets occur with equal probability." We have documented both the general adequacy of the RNG used for target selection and its proper functioning during the experiment.

Data selection. Except for two pilot studies, the number of participants and trials were specified in advance for each series. The pilot or formal status of each series was similarly specified in advance and recorded on disk before beginning the series. We have reported all trials, including pilot and ongoing series, using the automated ganzfeld system. Thus, there is no "file-drawer" problem in this database.

Psi ganzfeld success rate is similar for pilot and formal sessions. The proportion of hits for the 66 pilot sessions is .32 (h = .16, p = .129, z = 1.13). For the 289 formal sessions, the proportion correct is .35 (h = .22, p = .0001, z = 3.71). The difference is not significant: χ² = 0.11, 1 df, p = .734.

If we assume that the remaining trials in the three unfinished series would yield only chance results, these series would still be statistically significant (exact binomial p = .009, z = 2.36). This would reduce the overall z for all 11 series from 3.89 to 3.61. Thus, inclusion of the three incomplete studies does not pose an optional stopping problem.

Multiple analysis. Informal examination of recent issues of several American Psychological Association journals suggests that correction for multiple comparisons is not a common practice in more conventional areas of psychological inquiry.
Nevertheless, half of Hyman's (1985) 50-page critique of earlier psi ganzfeld research focused on issues related to multiple testing. In the present case, advance specification of the primary hypothesis and method of analysis prevents problems involving multiple analysis or multiple indices in our test of the overall psi ganzfeld effect. Our direct hits analysis is actually less significant than either the sum of ranks method (z = 4.01, p = 2.7 x 10^-5) or Stanford's z scores (t = 4.53, 354 df, p = 4.1 x 10^-6).

In addition to the primary hypothesis, however, we also tested two secondary hypotheses concerning the impact of target type and sender/receiver pairing on psi performance, and we have presented several purely exploratory analyses as well. Our Results section includes 15 significance tests involving psi performance as the dependent variable, and the p values cited are not adjusted for multiple comparisons. Of the 15 significance tests, 9 are associated with p < .05. The Bonferroni multiple comparisons procedure provides a conservative method of adjusting the alpha level when several simultaneous tests of significance are performed (Holland & Copenhaver, 1988; Hyman & Honorton, 1986; Rosenthal & Rubin, 1984). When the Bonferroni adjustment is applied, six of the nine individually significant outcomes remain significant; these are: the overall hit rate, the subject-based analysis using Stanford z scores, the difference between dynamic and static targets, the dynamic target hit rate, and the hit rate for experienced subjects.

Although the relationship between psi performance and sender type is not independently significant in the autoganzfeld, the correlation coefficient of .363 is close to that observed in the meta-analysis (r = .403), and the combined result is significant. The cumulative evidence, therefore, does support the conclusion that the sender/receiver relationship is a significant moderator of ganzfeld psi performance.
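The Bonferroni adjustment described above divides the nominal alpha level by the number of simultaneous tests. A minimal sketch follows; the function names are hypothetical, and the p values in the usage lines are illustrative only (the first is taken from the dynamic-target hit rate reported earlier, the second is made up).

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test alpha level after a Bonferroni adjustment."""
    return alpha / n_tests


def survives_bonferroni(p_value, n_tests, alpha=0.05):
    """True if a single p value remains significant after the adjustment."""
    return p_value <= bonferroni_threshold(alpha, n_tests)


# With 15 tests, each individual test must reach p <= .05 / 15.
print(bonferroni_threshold(0.05, 15))
print(survives_bonferroni(0.00029, 15))  # a p value reported in the text
print(survives_bonferroni(0.02, 15))     # illustrative value, not from the text
```

With 15 tests, an outcome that is individually significant at p = .02 no longer counts as significant, which is why only some of the nine individually significant outcomes remain so.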
Security. Given the large number of subjects and the significance of the outcome using subjects as the unit of analysis, subject deception is not a plausible explanation. The automated ganzfeld protocol has been examined by several dozen parapsychologists and behavioral researchers from other fields, including well-known critics of parapsychology. Many have participated as subjects, senders, or observers. All have expressed satisfaction with our handling of security issues and controls.

In addition, two experts on the simulation of psi ability have examined the autoganzfeld system and protocol. Ford Kross has been a professional mentalist for over 20 years. He is the author of many articles in mentalist periodicals and has served as Secretary/Treasurer of the Psychic Entertainers Association. Mr. Kross has provided us with the following statement: "In my professional capacity as a mentalist, I have reviewed Psychophysical Research Laboratories' automated ganzfeld system and found it to provide excellent security against deception by subjects" (personal communication, May, 1989). We have received similar comments from Daryl Bem, Professor of Psychology at Cornell University. Professor Bem is well known for his research in social and personality psychology. He is also a member of the Psychic Entertainers Association and has performed for many years as a mentalist. He visited PRL for several days and was a subject in Series 101.

The issue of investigator integrity can only be conclusively addressed through independent replications. It is, however, worth drawing attention to the 13 sessions in which a visiting scientist, Marilyn J. Schlitz, served as either experimenter (N = 7, 29% hits, h = .08) or sender (N = 6, 67% hits, h = .86). Altogether, these sessions yielded 6 direct hits (N = 13, 46.2% hits, h = .45). This effect size is more than twice as large as that for the database as a whole.
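The effect size h quoted with each hit rate is consistent with Cohen's h, the difference between arcsine-transformed proportions, taken against the 25% chance rate of a four-choice judging task. A minimal sketch under that assumption (the function name is hypothetical):

```python
import math


def cohens_h(p_observed, p_chance=0.25):
    """Cohen's h: difference between two arcsine-transformed proportions."""
    phi_observed = 2.0 * math.asin(math.sqrt(p_observed))
    phi_chance = 2.0 * math.asin(math.sqrt(p_chance))
    return phi_observed - phi_chance


# Hit counts reported in the text, compared with the chance rate of .25.
print(round(cohens_h(6 / 13), 2))    # 13 sessions, 6 direct hits
print(round(cohens_h(57 / 150), 2))  # dynamic targets before the modification
```

Because the arcsine transform stabilizes the variance of a proportion, equal values of h are comparable in detectability across different base rates, which makes it a convenient index for comparing hit rates across conditions.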
Status of the Evidence for Psi Communication in the Ganzfeld

The automated ganzfeld studies satisfy the methodological guidelines recommended by Hyman and Honorton (1986). The results are statistically significant. The effect size is homogeneous across 11 experimental series and eight different experimenters. Moreover, the autoganzfeld results are consistent with the outcomes of the earlier, nonautomated ganzfeld studies; the combined z of 7.53 would be expected to arise by chance less than one time in 9 trillion.

We have shown that, contrary to the assertions of certain critics (Druckman & Swets, 1988, p. 175), the ganzfeld psi effect exhibits "consistent and lawful patterns of covariation found in other areas of inquiry." The automated ganzfeld studies display the same patterns of relationships between psi performance and target type, sender/receiver acquaintance, and prior testing experience found in earlier ganzfeld studies, and the magnitude of these relationships is consistent across the two data sets. The impact of target type and sender/receiver acquaintance is also consistent with patterns in spontaneous case studies, linking ostensible psi experiences to emotionally significant events and persons. These findings cannot be explained by conventional theories of coincidence (Diaconis & Mosteller, 1989).

Hyman and Honorton (1986) have stated,

... the best way to resolve the [ganzfeld] controversy ... is to await the outcome of future ganzfeld experiments. These experiments, ideally, will be carried out in such a way as to circumvent the file-drawer problem, problems of multiple analysis, and the various defects in randomization, statistical application, and documentation pointed out by Hyman.
If a variety of parapsychologists and other investigators continue to obtain significant results under these conditions, then the existence of a genuine communications anomaly will have been demonstrated. (pp. 353-354)

We have presented a series of experiments that satisfy these guidelines. Although no single investigator or laboratory can satisfy the requirement of independent replication, the automated ganzfeld studies are quite consistent with the earlier studies. On the basis of the cumulative evidence, we conclude that the ganzfeld effect represents a genuine communications anomaly. This conclusion will either be strengthened or weakened by additional independent replications, but there is no longer any justification for the claim made by some critics that the existing evidence does not warrant serious attention by the scientific community.

Recommendations for Future Research

Recent psi ganzfeld research has necessarily focused on methodological issues arising from the ganzfeld controversy. It is essential that future studies comply with the methodological standards agreed [...] It is imperative that serious attention be given to conditions associated with successful outcomes.

Small to medium effect sizes characterize many research findings in the biomedical and social sciences (e.g., Cohen, 1977; Rosenthal, 1984). Rosenthal (1986) and Utts (1986) make a strong case for more careful consideration of the magnitude of effect in the design and analysis of future ganzfeld studies. The automated ganzfeld studies show a success rate slightly in excess of 34%. Utts's (1986) power analysis shows that for an effect of this size, the investigator has only about one chance in three of obtaining a statistically significant result in a 50-trial experiment. Even with 100 trials, an unusually large sample size in ganzfeld research, the probability of a significant outcome is only about .5.
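A power figure of this kind can be approximated with an exact binomial calculation. The sketch below is not Utts's procedure; it assumes a one-tailed test at alpha = .05 against a chance hit rate of .25, and the function names are hypothetical. Other test choices (for example a two-tailed criterion) would give different figures.

```python
from math import comb


def upper_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def critical_hits(n, p_chance=0.25, alpha=0.05):
    """Smallest hit count whose one-tailed chance probability is <= alpha."""
    for k in range(n + 2):  # k = n + 1 gives an empty tail, so the loop always returns
        if upper_tail(n, k, p_chance) <= alpha:
            return k


def power(n, p_true, p_chance=0.25, alpha=0.05):
    """Probability of reaching the critical hit count when the true rate is p_true."""
    return upper_tail(n, critical_hits(n, p_chance, alpha), p_true)


# A 50-trial experiment with a true hit rate of 34%.
print(critical_hits(50), round(power(50, 0.34), 2))
```

Under these assumptions a 50-trial series needs 19 hits to reach significance, and a true hit rate of 34% reaches that count roughly one time in three.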
We urge ganzfeld investigators to use dynamic targets and to design their studies to allow subjects to have the option to have friends or acquaintances as their senders. The similarity of the autoganzfeld and meta-analysis data sets strongly indicates that these factors are important moderators of psi ganzfeld performance. If our estimate of the impact of dynamic and static targets is accurate, a 50-session series using dynamic targets has approximately an 84% chance of yielding a significant outcome. A comparable series with static targets has only about one chance in five of achieving significance.

REFERENCES

ALCOCK, J. E. (1986). Comments on the Hyman-Honorton ganzfeld controversy. Journal of Parapsychology, 50, 345-348.
AKERS, C. (1984). Methodological criticisms of parapsychology. In S. Krippner (Ed.), Advances in parapsychological research, Vol. 4 (pp. 112-164). Jefferson, NC: McFarland.
BERGER, R. E., & HONORTON, C. (1986). An automated psi ganzfeld testing system. In D. H. Weiner & D. I. Radin (Eds.), Research in parapsychology 1985 (pp. 85-88). Metuchen, NJ: Scarecrow Press.
BLACKMORE, S. (1980). The extent of selective reporting of ESP ganzfeld studies. European Journal of Parapsychology, 3, 213-219.
BLACKMORE, S. (1987). A report of a visit to Carl Sargent's laboratory. Journal of the Society for Psychical Research, 54, 186-198.
BRAUD, W. G. (1978). Psi conducive conditions: Explorations and interpretations. In B. Shapin & L. Coly (Eds.), Psi and states of awareness (pp. 1-34). New York: Parapsychology Foundation, Inc.
BRAUD, W. G., WOOD, R., & BRAUD, L. W. (1975). Free-response GESP performance during an experimental hypnagogic state induced by visual and acoustic ganzfeld techniques: A replication and extension. Journal of the American Society for Psychical Research, 69, 105-113.
BRIGGS, K. C., & MYERS, I. B. (1957). Myers-Briggs Type Indicator Form F. Palo Alto, CA: Consulting Psychologists Press, Inc.
BROWNLEE, K. A. (1965). Statistical theory and methodology in science and engineering. New York: John Wiley & Sons, Inc.
CHILD, I. L. (1986). Comments on the ganzfeld controversy. Journal of Parapsychology, 50, 337-344.
COHEN, J. (1977). Statistical power analysis for the behavioral sciences (rev. ed.). New York: Academic Press.
DIACONIS, P., & MOSTELLER, F. (1989). Methods for studying coincidences. Journal of the American Statistical Association, 84, 853-861.
DRUCKMAN, D., & SWETS, J. (1988). Enhancing human performance: Issues, theories, and techniques. Washington, DC: National Academy Press.
HARLEY, T., & MATTHEWS, G. (1987). Cheating, psi, and the appliance of science: A reply to Blackmore. Journal of the Society for Psychical Research, 54, 199-207.

FOUNDATIONS OF PHYSICS
An International Journal Devoted to the Conceptual Bases and Fundamental Theories of Modern Physics, Biophysics, and Cosmology

Volume 19, Number 12, December 1989. Plenum Press, New York-London. This issue completes Volume 19. FNDPA4 19(12) 1441-1538 (1989). ISSN 0015-9018.

Editor: Alwyn van der Merwe

Editorial Board: Asim O. Barut, Peter G. Bergmann, Nikolai N. Bogolubov, David Bohm, Robert S. Cohen, Olivier Costa de Beauregard, Robert H. Dicke, Hao, Max Jammer, Brian D. Josephson, R. Bruce Lindsay, Per-Olov Löwdin, Henry Margenau, Jagdish Mehra, André Mercier, Louis Néel, Kazuhiko Nishijima, James L. Park, Linus Pauling, Rudolph Peierls, Karl R. Popper, Ilya Prigogine, Abdus Salam, John L. Synge, Hans-J. Treder, Jean-Pierre Vigier, Mikhail Vol'kenshtein, Carl Friedrich von Weizsäcker, Eugene P.
Wigner, Chen-Ning Yang

Founding Editors: Henry Margenau and Wolfgang Yourgrau

Foundations of Physics, Vol. 19, No. 12, 1989

Evidence for Consciousness-Related Anomalies in Random Physical Systems

Dean I. Radin¹ and Roger D. Nelson²

Received May 6, 1988; revised June 12, 1989

Speculations about the role of consciousness in physical systems are frequently observed in the literature concerned with the interpretation of quantum mechanics. While only three experimental investigations can be found on this topic in physics journals, more than 800 relevant experiments have been reported in the literature of parapsychology. A well-defined body of empirical evidence from this domain was reviewed using meta-analytic techniques to assess methodological quality and overall effect size. Results showed effects conforming to chance expectation in control conditions and unequivocal non-chance effects in experimental conditions. This quantitative literature review agrees with the findings of two earlier reviews, suggesting the existence of some form of consciousness-related anomaly in random physical systems.

1. INTRODUCTION

The nature of the relationship between human consciousness and the physical world has intrigued philosophers for millennia. In this century, speculations about mind-body interactions persist, often contributed by physicists in discussions of the measurement problem in quantum mechanics. Virtually all of the founders of quantum theory (Planck, de Broglie, Heisenberg, Schrödinger, Einstein) considered this subject in depth,(1) and contemporary physicists continue this tradition.(2-7)

¹ Department of Psychology, Princeton University, Princeton, New Jersey 08544. Present address: Contel Technology Center, 15000 Conference Center Drive, P.O. Box 10814, Chantilly, Virginia 22021-3808.
² Department of Mechanical and Aerospace Engineering, Princeton University, Princeton, New Jersey 08544.

The following expression of the problem can be found in a recent interpretation of quantum theory:

If conscious choice can decide what particular observation I measure, and therefore into what states my consciousness splits, might not conscious choice also be able to influence the outcome of the measurement? One possible place where mind may influence matter is in quantum effects. Experiments on whether it is possible to affect the decay rates of nuclei by thinking suitable thoughts would presumably be easy to perform, and might be worth doing.(8)

Given the distinguished history of speculations about the role of consciousness in quantum mechanics, one might expect that the physics literature would contain a sizable body of empirical data on this topic. A search, however, reveals only three studies. The first is in an article by Hall, Kim, McElroy, and Shimony, who reported an experiment "based upon taking seriously the proposal that the reduction of the wave packet is due to a mind-body interaction, in which both of the interacting systems are changed."(9) This experiment examined whether one person could detect if another person had previously observed a quantum mechanical event (gamma emission from sodium-22 atoms). The idea was based on the supposition that if person A's observation actually changes the physical state of a system, then when person B observes the same system later, B's experience may be different according to whether A has or has not looked at the system.
Hall et al.'s results, based on a total of 554 trials, did not support the hypothesis; the observed number of "hits" obtained in their experiment was precisely the number expected by chance (277), while the variance of their measurements was significantly smaller than expected (p < [...]).

The second study is referred to by Hall et al., who end their article by pointing out that a similar, unpublished experiment using cobalt-57 as the source was successful (40 hits out of 67 trials).(10)

The third study is a more systematic investigation reported by Jahn and Dunne,(11) who summarize results of over 25 million binary trials collected during seven years of experimentation with random-event generators. These experiments, involving long-term data collection with 33 unselected individuals, provide persuasive, replicable evidence of an anomalous correlation between conscious intention and the output of random number generators.

Thus, of three pertinent experiments referenced in mainstream physics journals, one describes results statistically too close to chance expectation and two describe positive effects. Given the theoretical implications of such an effect, it is remarkable that no further experiments of this type can be found in the physics literature; but this is not to say that no such experiments have been performed. In fact, dozens of researchers have reported conceptually identical experiments in the puzzling and uncertain domain of parapsychology. Perhaps because of the insular nature of scientific disciplines, the vast majority of these experiments are unknown to most scientists.
A few critics who have considered this literature have dismissed the experiments as being flawed, nonreplicable, or open to fraud,(12-16) but their assertions are countered by at least two detailed reviews which provide strong statistical support for the existence of anomalous consciousness-related effects with random number generators.(17,18) In this paper, we describe the results of a comprehensive, quantitative meta-analysis which focused on the questions of methodological quality and replicability in these experiments.

2. THE EXPERIMENTS

The experiments involved some form of microelectronic random number generator (RNG), a human observer, and a set of instructions for the observer to attempt to "influence" the RNG to generate particular numbers, or changes in a distribution, solely by intention. RNGs are usually based upon a source of truly random events such as electronic noise, radioactive decay, or randomly seeded pseudorandom sequences.(19) Feedback about the distribution of random events is often provided in the form of a digital display, but audio feedback, computer graphics, and a variety of other mechanisms have also been used. Some of the RNGs described in the literature are technically sophisticated, the best devices employing electromagnetic shielding, environmental failsafe mechanisms triggered by deviant voltages, currents, or temperature, automatic computer-based data recording on magnetic media, redundant hard copy output, periodic randomness calibrations, and so on.(18-20)

RNGs are typically designed to produce a sequence of random bits at the press of a button. After generating a sequence of, say, 100 random bits (0's or 1's), the number of 1's in the sequence may be provided as feedback.
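The basic measurement described here, counting the 1's in a block of random bits and comparing the count with chance expectation, can be sketched as follows. This is a simulation under stated assumptions, not a description of any device in the reviewed literature: Python's software pseudorandom generator stands in for the hardware noise source, and the function names are hypothetical.

```python
import math
import random


def run_trial(n_bits=100, rng=random):
    """Generate n_bits random bits and return the number of 1's."""
    return sum(rng.getrandbits(1) for _ in range(n_bits))


def z_score(total_ones, total_bits, p=0.5):
    """Standard normal deviate of a count of 1's against chance expectation."""
    expected = total_bits * p
    standard_deviation = math.sqrt(total_bits * p * (1 - p))
    return (total_ones - expected) / standard_deviation


# One simulated button press: 100 bits, with the count of 1's as feedback.
generator = random.Random(12345)  # seeded only so the sketch is repeatable
ones = run_trial(100, generator)
print(ones, round(z_score(ones, 100), 2))
```

For 100 bits the chance expectation is 50 with a standard deviation of 5, so a count of 60 corresponds to Z = 2.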
In an experimental protocol using a binary RNG, a run might consist of an observer being asked to cause the RNG to produce, in three successive button presses, a high number (sum of 1's greater than chance expectation of 50), a low number (less than 50), and a control condition with no directional intention. An experiment might consist of a group of individuals each contributing a hundred such runs, or one individual contributing several thousand runs. Results are usually analyzed by comparing high aim and low aim means against a control mean or theoretical chance expectation.

3. META-ANALYTIC PROCEDURES

The quantitative literature review, also called meta-analysis, has become a valuable tool in the behavioral and social sciences.(21) Meta-analysis is analogous to well-established procedures used in the physical sciences to determine parameters and constants. The technique assesses replication of an effect within a body of studies by examining the distribution of effect sizes.(22-24) In the present context, the null hypothesis (no mental influence on the RNG output) specifies an expected mean effect size of zero. A homogeneous distribution of effect sizes with nonzero mean indicates replication of an effect, and the size of the deviation of the mean from its expected value estimates the magnitude of the effect.

Meta-analyses assume that effects being compared are similar across different experiments, that is, that all studies seek to estimate the same population parameters.
Thus the scope of a quantitative review must be strictly delimited to ensure appropriate commonality across the different studies that are combined.(21,25) This can present a nontrivial problem in meta-analytic reviews because replication studies typically investigate a number of variables in addition to those studied in the original experiments. In the present case, because different subjects, experimental protocols, and RNGs were employed within the reviewed literature, some heterogeneity attributable to these factors was expected in the obtained distribution of effect sizes. However, the circumscription for the review required that every study in the database have the same primary goal or hypothesis, and hence estimate the same underlying effect.

Experiments selected for review examined the following hypothesis: The statistical output of an electronic RNG is correlated with observer intention in accordance with prespecified instructions, as indicated by the directional shift of distribution parameters (usually the mean) from expected values.

Because this "directional shift" is most often reported as a standard normal deviate (i.e., Z score) in the reviewed experiments, we determined effect size as a Z score normalized by the square root of the sample size (N), $e = Z/\sqrt{N}$, where N was the total number of individual random events (with probability of a hit at p = 0.5, p = 0.25, etc.). This effect size measure is equivalent to a Pearson product moment correlation.

3.1. Unit of Analysis

To avoid redundant inclusion of data in a meta-analysis, "units of analysis" are often specified.
We employed the following method: If an author distinguished among several experiments reported in a single article with titles such as "pilot test" or "confirmatory test," or provided independent statistical summaries, each of these studies was coded and quality-assessed separately. If an experiment consisted of two or more conditions comparing different intentions or types of RNG devices, the data were split into separate units of analysis to allow the results to be coded unambiguously. In general, within a given reviewed report, the largest possible aggregation of nonoverlapping data collected under a single intentional aim was defined as the unit of analysis (hereafter called an experiment or study).

For each experiment, a Z score was assigned corresponding to whether the observed result matched the direction of intention. Thus, a negative Z obtained under intention to "aim low" was recorded as a positive score. When sufficient data were provided in a report, Z was calculated from those data and compared with the reported results; the new calculation was used if there was a discrepancy. If only probability levels were reported, these were transformed into the corresponding Z score. For experiments reported only as "nonsignificant," a conservative value of Z = 0 was assigned; if the outcome was reported only as "statistically significant," Z = 1.645 was assigned; and if sample size was not reported or could not be calculated from the information provided, a special code of N = 1 was assigned.

3.2. Assessing Quality

Because the hypothesized anomalous effect is not easily accommodated within the prevailing scientific world-view, it is particularly important to assess the trustworthiness of each reviewed experiment.
Unfortunately, estimating experimental quality tends to be a subjective task confounded by prior expectations and beliefs.(26,27) Estimates of inter-judge reliability in assessing the quality of research reports, for example, rarely exceed correlations of 0.5.(28) We addressed this problem by assigning to each experiment a single quality weight derived from a set of sixteen binary (present/absent) criteria. The first author coded and double-checked the coding for all studies; the second author independently coded the first 100 studies. Inter-judge reliability for quality criteria was r = 0.802 with 98 degrees of freedom.

These criteria were developed from published criticisms about random-number generator experiments(14,15,29-33) and from expert opinion on important methodological considerations when performing studies involving human behavior.(20,34,35) Collectively, these criteria form a measure of credibility by which to judge the reported data. The criteria assess the integrity of the experiment in four categories, namely procedures,
Each criterion was coded as being present or absent in the report of an experiment, specifically excluding consideration of previously published descriptions of RNG devices or control tests. This strategy was employed to reflect lower confidence in such experiments since, for example, random- ness tests conducted once on an RNG do not guarantee acceptable perfor- mance in the same RNG in all future experiments. As a result, assessed quality was conservative, that is, lower than the "true" quality for some experiments, especially those reported only as abstracts or conference proceedings. Using unit weights (which have been shown to be robust in such applications1361) on each of the sixteen descriptors, the quality rating for an individual experiment was simply the sum of the descriptors. Thus, while a quality score near zero indicated a low quality or poorly reported experiment, a score near sixteen reflected a highly credible experiment. 3.3. Assessing Effect Size Assume that each of K experiments produces effect size estimates e of a parameter E, based on N samples, and that each e has a known standard error s. The weighted mean effect size is calculated as e. = E co,e,lEco? where co, = 1/4 = N1, and i ranges from 1 to K. The standard error of e. is se= (E co )-112. A test for homogeneity for the K estimates of e; is given by HK=Ea),(e,?e.)2, where HK has a chi-square distribution with K-1 degrees of freedom.(") The same procedure can be followed to test for homogeneity of effect size across M independent investigators. In this case, e.; and se; are calculated per investigator, and the test for homogeneity is performed as H m=E e.,)2, where e. and cu., are mean weighted effect size and 1/se2 per investigator, respectively, e. m=E coje. ;ix cop and j ranges from 1 to M. HM has M? 1 degrees of freedom. For a quality-weighted analysis, we may determine e. Q= E (Q,cojedlE(Qico,), where Qi is the quality assessed for experiment i. 
The standard error associated with $e_Q$ is $s_{e_Q} = (\sum_i Q_i^2 \omega_i / (\sum_i Q_i \omega_i)^2)^{1/2}$; the test for homogeneity is similar to that described above. Finally, following the practice of reviewers in the physical sciences,(23,24) we deleted potential "outlier" studies to obtain a homogeneous distribution of effect sizes and to reduce the possibility that the calculated mean effect size may have been spuriously enlarged by extreme values. The procedure used was as follows: If the homogeneity statistic for all studies was significant [...]

[...]

Free-response studies involving group testing. Only two FR studies involved group testing (Table 1, row 8). Both studies were contributed by the same investigator. The mean weighted r is .19 (z = 1.83, p = .067, 95% CI from -.01 to .37). The results are significantly nonhomogeneous (χ²(1) = 7.53, p [...]

Notes. r is the weighted average correlation coefficient (Hedges & Olkin, 1985). χ² is the within-group homogeneity statistic (Rosenthal, 1984).

Consistency across Investigators

Table 3 shows the overall FR results by investigator. Three of the four investigators have significant ESP/extraversion correlations and the results of the fourth investigator (Braud) approach significance. The z by investigator is 5.11, a result that should arise by chance less than one time in 3.3 million. The results are homogeneous across investigators (χ²(3) = 2.51, p > .05).

Although 10 of the 14 FR studies were contributed by one investigator (Sargent), evidence for the relationship between free-response ESP performance and extraversion is not dependent upon that investigator. When Sargent's work is eliminated, the results of the three remaining investigators still strongly support a relationship between ESP performance and extraversion (z = 3.35, p = 0.0008, two-tailed).
Therefore, we conclude that the ESP/extraversion relationship is consistent across investigators.

Extraversion Measures

Each FR investigator used a different scale for measuring extraversion. Marsh used the Bernreuter Personality Inventory (Super, 1942); Sargent and his group used the Cattell 16PF (Cattell, Eber & Tatsuoka, 1970); Braud and Bellis & Morris used scales constructed by the investigators (with no psychometric validation provided). It is impossible to isolate the effects of the instruments for measuring extraversion from the ensemble of procedures and research styles associated with the investigators. All that can be said is that a relationship between extraversion and ESP performance is evident in studies using four different measures of extraversion.

Selective Reporting

In order to assess the vulnerability of these studies to selective reporting, we used Rosenthal's (1984) "Fail-safe N" statistic to estimate the number of unreported studies averaging null outcomes necessary to reduce the known data base to nonsignificance. The Fail-safe N is 140 studies. In other words, if we were to assume that the observed outcomes arise from selective reporting, it would be necessary to postulate 10 unreported studies averaging null outcomes for each reported study. Therefore, we conclude that the free-response ESP/extraversion relationship cannot be explained on the basis of selective reporting.

Power Analysis

The FR mean r of .20 is equivalent to an average ESP scoring advantage for extraverts over introverts of 0.4 standard deviations. The FR studies' average sample size is 44 subjects, and the likelihood of detecting a correlation of .2 at the five percent significance level with this sample size (the statistical power) is 37 percent (Cohen, 1977, p. 87).
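The power figure can be approximated with Fisher's z transformation. The sketch below is a rough check under stated assumptions, not Cohen's tabled method: it uses a normal approximation and assumes a one-tailed test at the five percent level, and the function name and defaults are illustrative only.

```python
import math
from statistics import NormalDist


def power_for_correlation(r, n, alpha=0.05):
    """Approximate one-tailed power to detect a population correlation r with n subjects.

    Uses the Fisher z approximation: atanh(r) * sqrt(n - 3) is treated as a
    unit-variance normal deviate (an assumption of this sketch).
    """
    nd = NormalDist()
    delta = math.atanh(r) * math.sqrt(n - 3)   # expected standardized effect
    critical = nd.inv_cdf(1 - alpha)           # one-tailed critical value
    return 1 - nd.cdf(critical - delta)


power_44 = power_for_correlation(0.2, 44)
print(round(power_44, 2))
```

Under these assumptions the approximation gives a power of about 36 percent for r = .2 and 44 subjects, close to the tabled value of 37 percent.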
Thus, in a sample of 14 studies, the expected number of statistically significant studies is 5.2; the actual number of significant studies is seven (exact binomial probability, with p = .37 & q = .63, p = .23, one-tailed). Thus, the observed rate of significant outcomes is consistent with a correlation of .2.

Achievement of statistical significance, assuming a correlation of .2, is essentially a coin toss with sample sizes less than 48 subjects; a sample size of 180 is necessary to achieve 85 percent power.

In the following section, we explore the predictive validity of the ESP/extraversion meta-analysis by comparing the meta-analytic estimate to the outcome of a new data set.

Table 4. ESP/Extraversion Correlations by Experimenter in the PRL Novice Series

Experimenter    N Subjects      r        z      Experimenter EI Score
Honorton            41         .27     1.71          101
Quant               69         .29     2.38          103
Derr                22         .03     0.68           81
Berger              13        -.37    -1.18          115
Varvoglis           21         .os     0.32          133
Schechter            7        -.05    -0.10          125
Ferrari             10        -.20    -0.54          133
Schlitz              7         .15     0.92           69

Note. r is the weighted average correlation coefficient (Hedges & Olkin, 1985).

A New Confirmation

Extraversion data is available for 221 of the 241 subjects in a series of ESP ganzfeld studies reported by Honorton, Berger, Varvoglis, Quant, Derr, Hansen, Schechter & Ferrari (1990) and conducted at the Psychophysical Research Laboratories (PRL) in Princeton, N.J. The experimental procedures are described in detail in the Honorton, et al. (1990) report.

Subjects

The subjects were 131 women and 90 men. Their average age is 37 years (sd = 11.7). This is a well-educated group; the mean formal education is 15.5 years (sd = 2.0) and belief in psi is strong in this population. On a seven-point scale where "1" indicates strong disbelief and "7" indicates strong belief in psi, the mean is 6.20 (sd = 1.03).
Personal experiences suggestive of psi were reported by 88 percent of the subjects; eighty percent reported ostensible telepathic experiences. Eighty percent have had some training in meditation or other techniques involving internal focus of attention. One hundred and sixty-three subjects contributed a single ESP ganzfeld session and 58 contributed multiple sessions.

Extraversion Measure

Extraversion was measured using the continuous scores of the Extraversion/Introversion (EI) Scale in Form F of the Myers-Briggs Type Indicator (MBTI; Briggs & Myers, 1957). The MBTI was not used in any of the meta-analysis studies. The MBTI EI Scale is constructed so that scores below 100 indicate extraversion and scores above 100 indicate introversion. (For consistency with the meta-analysis, we have reversed the signs so that positive correlations reflect a positive relationship between ESP performance and extraversion.) The mean EI score for the PRL subjects is 100.36 (sd = 25.18).

ESP Measure

ESP performance was measured using the standardized ratings of the target and decoys (Stanford's z-scores; Stanford and Sargent, 1983). Stanford z's were averaged for subjects with multiple sessions.

Results

Overall results. The correlation between ESP performance and extraversion in the PRL series is significant (r = .18, 219 df, t = 2.67, p = .008, two-tailed, 95% CI from .05 to .30). This outcome is very close to the meta-analytic estimate for free-response studies (r = .20) and the difference between the two correlations is nonsignificant (Cohen's q = .02, z = -0.26, p = .793, two-tailed).

Ganzfeld Novices. The results are similar if we restrict our analysis to the five PRL Novice series with inexperienced subjects who each completed a single ganzfeld session. MBTI data is available for 190 of the 205 Novices and the mean weighted r for the five series is .17 (z = 2.25, p = .024, two-tailed, 95% CI from .02 to .31).
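The confidence interval and the Cohen's q comparison reported under "Overall results" can be checked with Fisher's z transformation. The sketch below is illustrative rather than the authors' procedure: the function names are hypothetical, and the meta-analytic sample size of 612 is an assumption (the 833 subjects in the combined data base minus the 221 PRL subjects).

```python
import math
from statistics import NormalDist


def fisher_ci(r, n, level=0.95):
    """Confidence interval for a correlation via Fisher's z transformation."""
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


def cohens_q_test(r1, n1, r2, n2):
    """Return (q, z, two-tailed p) for the difference between two independent correlations."""
    q = math.atanh(r1) - math.atanh(r2)
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = q / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return q, z, p


low, high = fisher_ci(0.18, 221)
# n2 = 612 is an assumed sample size for the meta-analytic free-response studies.
q, z, p = cohens_q_test(0.18, 221, 0.20, 612)
print(round(low, 2), round(high, 2), round(abs(q), 2), round(z, 2), round(p, 3))
```

With these inputs the interval rounds to .05 and .30 and the comparison gives z of about -0.26, in line with the values reported in the text.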
The ESP/extraversion correlations are homogeneous across the five series ($\chi^2_4$ = 2.88, p > .05). Eleven subjects in the first Novice series (Series 101) completed the MBTI between six and eighteen months after their ESP ganzfeld session and we did not maintain records of their identity. However, the results are essentially the same when this series is eliminated. The mean weighted r for the remaining four Novice series is .19 (z = 2.30, p = .021, two-tailed, 95% CI from .03 to .34).

Outcome by experimenter. Eight experimenters contributed to the PRL data base (Honorton, et al., 1990). Table 4 shows the ESP/extraversion correlation by experimenter for the five Novice series. The mean weighted r for the eight experimenters is .16 (z = 2.09, p = .037, two-tailed, 95% CI from .01 to .30). The results are homogeneous across the eight experimenters ($\chi^2_7$ = 6.43, p > .05).

Outcome in relation to EI status of experimenter. It is possible that the relationship between ESP performance and extraversion is moderated by personality characteristics of the experimenter. The last column of Table 4 shows the MBTI EI scores for each experimenter. Only two experimenters (Derr and Schlitz) are extraverts. Two others (Honorton and Quant) are borderline introverts. While the above analyses indicate that the ESP/extraversion correlation is consistent across experimenters, there is a nonsignificant tendency for the relationship to be stronger in the data of less introverted experimenters (r = .47, 6 df, p = .235, two-tailed).

Combined Estimate of the Relationship between Free-response ESP Performance and Extraversion

Combining the new confirmation with the meta-analysis, the overall mean weighted r is .19 (z = 5.50, p = 3.8 x 10^-8, 95% CI from .13 to .26).
The "Fail-safe N" for the combined estimate is 181 studies, or a ratio of 12 unreported studies averaging null effects for each known study. Four of the five investigators have overall significant outcomes and the outcomes are homogeneous across investigators ($\chi^2_4$ = 6.03, p > .05).

Discussion

The Meta-Analysis

Forced-choice studies. The meta-analysis challenges the conclusions from earlier narrative reviews of the relationship between extraversion and forced-choice ESP performance (Eysenck, 1967; Palmer, 1977; Sargent, 1981). The apparent relationship between extraversion and ESP performance in these studies appears to be due to the influence of subjects' knowledge of their ESP performance on their subsequent responses to the extraversion measures. Evidence for a relationship between ESP and extraversion occurs only when extraversion was measured after the ESP test; no evidence of an ESP/extraversion relationship is found in studies where extraversion was measured before the ESP task.

Evidence for a nonzero effect in the forced-choice studies is also limited to the subset of studies involving ESP testing procedures that were vulnerable to potential sensory leakage. There is reason to believe, however, that this may result from a procedural confound: six of the eight studies in this subgroup for which information on the order of testing is available also involved extraversion testing following ESP feedback.

The apparent biasing effect of ESP feedback probably arises from one of two possibilities. Awareness of "success" or "failure" may lead subjects to later perceive themselves as more extraverted or introverted. Or, the problem may arise from an experimenter expectancy effect (Rosenthal & Rubin, 1978), in which subjects respond to the investigator's expectations that extraverts are more successful in ESP tasks than introverts. Obviously, further research will be necessary to clarify the problem.
The existence of this problem, however, necessarily arouses concern over the viability of reported relationships between ESP performance and other personality factors such as neuroticism (Palmer, 1977). Much of the research in these areas was conducted by the same investigators, and it is likely that similar methods were used. We believe that conclusions regarding the relationship between ESP performance and other personality variables should be suspended until the relevant study domains can be examined with respect to this problem.

Free-response studies. The meta-analysis does support the existence of a relationship between extraversion and free-response ESP performance. The free-response studies are not amenable to explanation in terms of an order artifact or other identifiable threats to validity. The overall correlation of .20 would be expected to occur only about one time in 674,000 by chance. Three of the four investigators contributing to this data base obtained significant ESP/extraversion relationships, and the fourth investigator's results approach significance. The correlations are homogeneous across investigators, and across the largest grouping of studies in which subjects were tested individually. The effect remains highly significant even when 71 percent of the studies, contributed by one investigator, are eliminated from consideration. Thus, the relationship seems to be robust. Estimation of the filedrawer problem (Rosenthal, 1984) indicates that it would be necessary to postulate 10 unreported studies averaging null results for every retrieved study in order to account for the observed effect on the basis of selective reporting.
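The filedrawer estimate can be illustrated with Rosenthal's fail-safe N, commonly written as $N_{fs} = (\sum z)^2 / 2.706 - k$ for a one-tailed criterion of p = .05, where k is the number of retrieved studies. The sketch below is only an illustration: the function name is hypothetical, and the z scores are invented rather than the study-level values behind the figures reported here.

```python
def fail_safe_n(z_scores, z_crit=1.645):
    """Number of unreported null studies needed to bring the combined result to p = .05.

    z_crit = 1.645 is the one-tailed .05 critical value; 1.645**2 is about 2.706.
    """
    k = len(z_scores)
    return (sum(z_scores) ** 2) / (z_crit ** 2) - k


# Hypothetical z scores for three retrieved studies (not from the paper).
hypothetical_z = [2.0, 1.5, 2.5]
n_fs = fail_safe_n(hypothetical_z)
ratio = n_fs / len(hypothetical_z)   # unreported null studies per retrieved study
print(round(n_fs, 1), round(ratio, 1))
```

The ratio of the fail-safe N to the number of retrieved studies is the quantity the text reports as "unreported studies for every retrieved study."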
The New Confirmation

The results of the confirmation, involving a new set of investigators and a new scale of extraversion, support the meta-analytic findings and increase their generalizability. The relationship between free-response ESP performance and extraversion now spans 833 subjects and five independent investigator teams. The homogeneity of the effect across the eight experimenters in the confirmatory study further increases our confidence that the effect is replicable and is not dependent upon unknown characteristics of individual investigators. A nonsignificant trend in the data does suggest that the ESP/extraversion relationship may, to some extent, be moderated by the experimenter's extravertedness and it may be advisable for future investigators to record and report extraversion/introversion scores of the experimenters.

The Predictive Validity of Meta-Analysis

Meta-analysis is a powerful tool for summarizing existing evidence. It enables more precise estimation of the significance and magnitude of behavioral effects than has been possible with traditional narrative reviews, and is useful in identifying moderating variables. In the present case, meta-analytic techniques revealed a serious source of bias that had been overlooked in earlier narrative reviews of the ESP/extraversion domain. Moreover, the meta-analysis identified a subset of the domain that is not amenable to the discovered bias and provided an estimate of the magnitude of the relationship between ESP and extraversion in that subset.

Ultimately, the usefulness of meta-analysis will be judged by its ability to predict new outcomes and in this regard we consider the results of the confirmation study to be especially noteworthy.
The correlation between ESP performance and extraversion in the confirmation study is very close to that predicted by the meta-analysis. This is the second test of the predictive validity of meta-analysis in parapsychological problem areas; we have previously reported that ESP ganzfeld performance in a new series of studies (Honorton, et al., 1990) closely matched the outcomes of earlier studies in a meta-analysis (Honorton, 1985). Predictability is the hallmark of successful science and these findings lead us to be optimistic concerning the prospect that parapsychology may be approaching this more advanced stage of development.

References

Briggs, K. C., and Myers, I. B. (1957). Myers-Briggs Type Indicator Form F. Palo Alto, CA: Consulting Psychologists Press, Inc.

Cattell, R. B., Eber, H. W., & Tatsuoka, M. M. (1970). Handbook for the Sixteen Personality Factor Questionnaire. Champaign, IL: Institute for Personality and Ability Testing.

Cohen, J. (1977). Statistical power analysis for the behavioral sciences. New York: Academic Press. (Revised Edition.)

Eysenck, H. J. (1967). Personality and extra-sensory perception. Journal of the Society for Psychical Research, 44, 55-70.

Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis. New York: Academic Press.

Hedges, L. V. (1987). How hard is hard science, how soft is soft science? The empirical cumulativeness of research. American Psychologist, 42, 443-455.

Honorton, C. (1985). Meta-analysis of psi ganzfeld research: a response to Hyman. Journal of Parapsychology, 49, 51-92.

Honorton, C., Berger, R. E., Varvoglis, M. P., Quant, M., Derr, P., Hansen, G., Schechter, E. I., and Ferrari, D. C. (1990). Psi communication in the ganzfeld: experiments with an automated testing system and a comparison with a meta-analysis of earlier studies. In Research in Parapsychology 1989. Metuchen, NJ: Scarecrow Press. (In press.)

Honorton, C., & Ferrari, D. C.
(1989). "Future Telling": a meta-analysis of forced-choice precognition experiments, 1935-1987. Journal of Parapsychology, 53, in press.

Hyman, R. (1985). The psi ganzfeld experiment: A critical appraisal. Journal of Parapsychology, 49, 3-49.

McCarthy, D., & Schechter, E. I. (1986). Estimating effect size from critical ratios. In D. H. Weiner & D. I. Radin (Eds.), Research in Parapsychology 1985. Scarecrow Press, pp. 95-96.

Palmer, J. (1977). Attitudes and personality traits in experimental ESP research. In B. B. Wolman (Ed.), Handbook of parapsychology. New York: Van Nostrand Reinhold.

Palmer, J., & Lieberman, R. (1975). The influence of psychological set on ESP and out-of-the-body experiences. Journal of the American Society for Psychical Research, 69, 193-213.

Radin, D. I., & Nelson, R. D. (1989). Evidence for consciousness-related anomalies in random physical systems. Foundations of Physics, 19, 1499-1514.

Rosenthal, R. (1984). Meta-analytic procedures for social research. Beverly Hills, CA: Sage.

Rosenthal, R., & Rubin, D. B. (1978). Interpersonal expectancy effects: The first 345 studies. Behavioral and Brain Sciences, 3, 377-386.

Sargent, C. L. (1981). Extraversion and performance in 'extra-sensory perception' tasks. Personality and Individual Differences, 2, 137-143.

Stanford, R. G., and Sargent, C. L. (1983). Z scores in free-response methodology: comments on their utility and correction of an error. Journal of the American Society for Psychical Research, 77, 319-326.

Super, D. E. (1942). The Bernreuter Personality Inventory: a review of research. Psychological Bulletin, 39, 94-125.

Tukey, J. W. (1977). Exploratory data analysis. Reading, MA: Addison-Wesley.

Studies Used in the Meta-Analysis

Some reports contain more than one study.
For reports with multiple studies, the number of studies is indicated in brackets following the reference.

Ashton, H. T., Dear, P. R., & Harley, T. A. (1981). A four-subject study of psi in the ganzfeld. Journal of the Society for Psychical Research, 51, 12-21.

Astrom, J. (1965). GESP and the MPI measures. Journal of Parapsychology, 29, 292-293.

Bellis, J., & Morris, R. L. (1980). Openness, closedness and psi. Research in Parapsychology 1979, 98-99.

Braud, L. W. (1976). Openness versus closedness and its relationship to psi. Research in Parapsychology 1975, 155-159.

Braud, L. W. (1977). Openness vs. closedness and its relationship to psi. Research in Parapsychology 1976, 162-165.

Casper, G. W. (1952). Effects of the receiver's attitude toward the sender in ESP tests. Journal of Parapsychology, 16, 212-218.

Fisk, G. W. (1960). The Rhodes experiment. Linkage in extra-sensory perception by M. C. Marsh. Journal of the Society for Psychical Research, 40, 219-239 and M. C. Marsh. (unpublished). Linkage in Extra-Sensory Perception. Unpublished doctoral dissertation, Dept. of Psychology, Rhodes University, Grahamstown, South Africa. 450 pages.

Green, C. E. (1966). Extra-sensory perception and the Maudsley Personality Inventory. Journal of the Society for Psychical Research, 43, 285-286.

Green, C. E. (1966). Extra-sensory perception and the extraversion scale of the Maudsley Personality Inventory. Journal of the Society for Psychical Research, 43, 337.

Haraldsson, E. (1970). Psychological variables in a GESP test using plethysmograph recordings. Proceedings of the Parapsychological Association, 7, 6-7.

Harley, T. A., & Sargent, C. L. (1980). Trait and state factors influencing ESP performance in the ganzfeld. Research in Parapsychology 1979, 126-127.

Humphrey, B. M. (1945). An exploratory correlation study of personality measures and ESP scores. Journal of Parapsychology, 9, 116-123.
[3 studies]

Humphrey, B. M. (1951). Introversion-extraversion ratings in relation to scores in ESP tests. Journal of Parapsychology, 15, 252-262.

Kanthamani, B. K. (1966). ESP and social stimulus. Journal of Parapsychology, 30, 31-38.

Kanthamani, B. K., & Rao, K. R. (1972). Personality characteristics of ESP subjects: III. Extraversion and ESP. Journal of Parapsychology, 36, 198-212.

Krishna, S. R., & Rao, K. R. (1981). Personality and 'belief' in relation to language ESP scores. Research in Parapsychology 1980, 61-63. [2 studies]

McElroy, W. A., and Brown, W. K. R. (1950). Electric shocks for errors in ESP card tests. Journal of Parapsychology, 14, 257-266.

Nash, C. B. (1966). Relation between ESP scoring level and the Minnesota Multiphasic Personality Inventory. Journal of the American Society for Psychical Research, 60, 56-62. [8 studies]

Nicol, J. F., & Humphrey, B. M. (1953). The exploration of ESP and human personality. Journal of the American Society for Psychical Research, 47, 133-178.

Nicol, J. F., & Humphrey, B. M. (1955). The repeatability problem in ESP-personality research. Journal of the American Society for Psychical Research, 49, 125-156.

Nielsen, W. (1970). Relationships between precognition scoring level and mood. Journal of Parapsychology, 34, 93-116.

Nielsen, W. (1970). Studies in group targets: a social psychology class. Proceedings of the Parapsychological Association, 7, -57.

Nielsen, W. (1970). Studies in group targets: an unusual high school group. Proceedings of the Parapsychological Association, 7, 57-58.

Sargent, C. L. (1978). Hypnosis as a psi-conducive state: a controlled replication study. Journal of Parapsychology, 42, 257-275. [2 studies]

Sargent, C. L. (1980). Exploring psi in the ganzfeld. New York: Parapsychology Foundation, Inc. [2 studies]

Sargent, C. L., Bartlett, H. J., and Moss, S. P. (1982). Response structure and temporal incline in ganzfeld free-response GESP testing. Journal of Parapsychology, 46, 85-110.
[2 studies]

Sargent, C. L., Harley, T. A., Lane, J., & Radcliffe, K. (1981). Ganzfeld psi-optimization in relation to session duration. Research in Parapsychology 1980, 82-84.

Sargent, C. L., & Harley, T. A. (1981). Three studies using a psi-predictive trait variable questionnaire. Journal of Parapsychology, 45, 199-214.

Sargent, C. L., and Matthews, G. (1982). Ganzfeld GESP performance with variable-duration testing. Research in Parapsychology 1981, 159-160.

Shields, E. (1962). Comparison of children's guessing ability (ESP) with personality characteristics. Journal of Parapsychology, 26, 200-210. [2 studies]

Shrager, E. F. (1978). The effects of sender-receiver relationship and associated personality variables on ESP scores. Journal of the American Society for Psychical Research, 72, 35-47. [2 studies]

Szczygielski, D., & Schmeidler, G. R. (1975). ESP and two measures of introversion. Research in Parapsychology 1974, 15-17.

Thalbourne, M. A., Beloff, J., and Delanoy, D. (1982). A test for the 'extraverted sheep versus introverted goats' hypothesis. Research in Parapsychology 1981, 155-156. [2 studies]

Thalbourne, M. A., Beloff, J., Delanoy, D., & Jungkuntz, J. H. (1983). Some further tests of the extraverted sheep versus introverted goats hypothesis. Research in Parapsychology 1982, 199-200. [4 studies]

Thalbourne, M. A., and Jungkuntz, J. H. (1983). Extraverted sheep versus introverted goats: experiments VII and VIII. Journal of Parapsychology, 47, 49-51. [2 studies]

Journal of Parapsychology, Vol. 50,
December 1986

META-ANALYTIC PROCEDURES AND THE NATURE OF REPLICATION: THE GANZFELD DEBATE

By ROBERT ROSENTHAL

ABSTRACT: This paper is a commentary on the valuable debate between Charles Honorton (1985) and Ray Hyman (1985) about the evidence for psi in the ganzfeld situation. Their debate was a creative, constructive, and task-oriented dialogue that served admirably to sharpen the issues involved. In my commentary I focus on the concept of replication, distinguishing the troublesome older view from a more useful alternative. Specific issues related to replication are discussed including problems of multiple testing, subdividing studies, weighting replications, and problems of small effects. The earlier meta-analytic work is summarized, evaluated, and compared with a meta-analysis of a different controversial area. Rival hypotheses of procedural and statistical types are discussed, and a tentative inference is offered. The conclusion calls for wider use of newer views of the success of replication.

Science in general and parapsychological inquiry in particular have been well served by the recent ganzfeld debate between Charles Honorton (1985) and Ray Hyman (1985) as organized by the Journal's editor, K. Ramakrishna Rao. Two serious and highly knowledgeable scholars have invested a great amount of time, energy, and creative thought to produce a debate that is a model of task-oriented, constructive dialogue. It is clear that the participants have been devoted to clarifying and understanding the scientific issues rather than simply to "scoring points."

As a result of their efforts we have an excellent review of the issues to be considered in evaluating the data generated by the ganzfeld experiments. In addition, through their meta-analytic work, we have an enormously valuable quantitative summary of the ganzfeld studies. In the end, Hyman and Honorton have not resolved all their differences, nor is it likely that they will.
Hyman has raised cogent and telling questions. Honorton has answered them in cogent and telling terms. I am sure that Hyman will have excellent [...]

The preparation of this paper and the development of some of the procedures described within it were supported by the National Science Foundation. Much of the summary and interpretation of the meta-analyses will be included in a paper commissioned by the National Academy of Sciences that is in preparation by Monica J. Harris and Robert Rosenthal.

[...] study failed to replicate that of Smith. Such errors are made very frequently in most areas of psychology and the other behavioral sciences.

Pseudo-Successful Replications

Return now to Table 1 and focus attention on cell B, the cell of "successful replication." Suppose that two investigators both rejected the null hypothesis at p < .05 with both results in the same direction. Suppose further, however, that in one study the effect size r was .90 whereas in the other study the effect size r was only .10, significantly smaller than the r of .90 (Rosenthal & Rubin, 1982a). In this case our interpretation is more complex. We have indeed had a successful replication of the rejection of the null, but we have not come even close to a successful replication of the effect size.

"Successful Replication" of Type II Error

Cell C of Table 1 represents the situation in which both studies failed to reject the null hypothesis. Under those conditions investigators might conclude that there was no relationship between the variables investigated. Such a conclusion could be very much in error, the more so the lower the power of the two studies (Cohen, 1977).
If power levels of the two studies (assuming medium effect sizes in the population) were very high, say .90 or .95, then two failures to obtain a significant relationship would provide evidence that the effect investigated was not likely to be a very large effect. If power calculations had been made assuming a very small effect size, two failures to reject the null, although not providing strong evidence for the null, would at least suggest that the size of the effect in the population was probably quite modest.

If sample sizes of the two studies failing to reject the null were modest so that power to detect all but the largest effects was low, very little could be concluded from two failures to reject except that the effect sizes were unlikely to be enormous. For example, two investigators with N's of 20 and 40, respectively, find results not significant at p < .05. The effect sizes phi (i.e., r for dichotomous variables) were .29 and .20, respectively, and both p's were approximately .20. The combined p of these two results, however, is .035 [$(z_1 + z_2)/\sqrt{2} = z$], and the mean effect size in the mid-.20's is not trivial (Rosenthal & Rubin, 1982b).

TABLE 3
COMPARISON OF TWO SETS OF REPLICATIONS

                           Replication set A      Replication set B
                           Study 1    Study 2     Study 1    Study 2
N                             96         15          98         27
p (two-tailed)               .05        .05         .01        .18
z (p)                       1.96       1.96        2.58       1.34
r                            .20        .50         .26        .26
z (r)                        .20        .55         .27        .27
Cohen's q (z_r1 - z_r2)           .35                    .00

Comparing Views of Replication

The traditional, not very useful, view of replication modeled in Table 1 has two primary characteristics:

1. It focuses on significance level as the relevant summary statistic of a study.

2. It makes its evaluation of whether replication has been successful in a dichotomous fashion. For example, replications are successful if both or neither p < .05 (or .01, etc.), and they are unsuccessful if one p < .05 (or .01, etc.) and the other p > .05 (or .01, etc.).
Psychologists' reliance on a dichotomous decision procedure accompanied by an untenable discontinuity of credibility in results varying in p levels has been well documented (Nelson, Rosenthal, & Rosnow, 1986; Rosenthal & Gaito, 1963, 1964).

The newer, more useful view of replication success has two primary characteristics:

1. It focuses on effect size as the more important summary statistic of a study with only a relatively minor interest in the statistical significance level.

2. It makes its evaluation of whether replication has been successful in a continuous fashion. For example, two studies are not said to be successful or unsuccessful replicates of each other but, rather, the degree of failure to replicate is specified.

Table 3 shows two sets of replications. Replication set A shows two results both rejecting the null but with a difference in effect sizes of .30 in units of r or .35 in units of Fisher's z transformation of r (Cohen, 1977; Rosenthal & Rosnow, 1984; Snedecor & Cochran, 1980). That difference, in units of r or Fisher's z, is the degree [...]

[...] multiple questions, multiple dependent variables make good scientific sense. However, as both Honorton (1985) and Hyman (1985) point out, the use of multiple dependent variables may affect the accuracy of the p levels computed. For example, if five dependent variables are used and one of these is found to show an effect at p = .05, it would be misleading to say that an effect has been demonstrated at p < .05. That is because the actual p of finding one p significant at .05 (or any other chosen level) increases as the number of tests made increases. That is not a good reason to decrease the variety of dependent variables used, assuming there is a good theoretical basis for choosing to use each one. Alternate procedures are available.
Bonferroni procedures can be used to adjust for the number of tests made (Rosenthal & Rubin, 1983). To overcome the conservatism of this basic approach and decrease Type II errors, it is possible to weight the dependent variables according to their importance and apply a so-called ordered Bonferroni procedure (Rosenthal & Rubin, 1984, 1985). Perhaps it is most useful, however, to apply specially developed procedures to integrate all the information from all the dependent variables and obtain only a single overall test of significance and effect size estimate. This can be accomplished very easily so long as we have reasonable estimates of the intercorrelations among the dependent variables (Rosenthal & Rubin, 1986).

Subdividing Studies

An issue discussed in the ganzfeld debate has to do with the subdivision of studies into substudies as a function of different experimental procedures or individual difference variables such as sex, age, degree of belief in psi effects, and the like (Schmeidler, 1968). As long as all the data are preserved and entered into the meta-analysis, no harm is done by subdividing. Indeed, subdividing is very useful in the search for moderator variables (Rosenthal, 1984).

Subdividing could have a very biasing effect on the accuracy of a cited p value if the overall data are subdivided in various ways, significant results are reported for one or more substudies, and the rest of the substudies are "thrown away." In the ordinary more proper application of meta-analytic procedures, however, subdividing makes little difference. Consider a psi experiment with an overall nonsignificant effect (p = .13, two-tailed).
After the study is over, it is noted that about half the subjects were favorable toward psi and half were not and that there had been both female and male subjects.

TABLE 4
SUBDIVISION OF A LARGER EXPERIMENT

            Believing subjects      Disbelieving subjects
            Two-tailed p     z      Two-tailed p     z
Females         .05         2.0         .62         0.5
Males           .32         1.0         .62        -0.5

Note: For the study as a whole, p was .13 and z was 1.5 before subdividing. Positive z's reflect results in the predicted direction; negative z's reflect results in the unpredicted direction.

Suppose that a subgroup of subjects, say female believers, show a significant psi effect but the remaining groups do not. No harm is done by reporting that fact, though an adjustment is useful in reporting the obtained p that takes into account how many subgroups were tested. It is essential, however, that the results of significance tests for the nonsignificant subgroups also be entered into the meta-analysis. Table 4 illustrates the situation; four substudies have been formed, only one of which was significant. When we combine the results of the four substudies, however, we find the overall z to be $[(2.0) + (1.0) + (0.5) + (-0.5)]/\sqrt{4} = 1.5$, p = .13, two-tailed. Essentially, subdividing makes little difference so long as no data are discarded. If a particular substudy showed great promise of evidencing psi, nothing would prevent the investigator from conducting new studies using only the preselected experimental conditions or types of subjects. It would also be appropriate to conduct a meta-analysis on all the substudies that could be found that met the promising condition. In that case, however, the initial "study of discovery" should be entered with an adjustment for the fact that several tests of significance were computed (Rosenthal & Rubin, 1983, 1984).

Flaw Effects and Weighting Replications

There are few flawless studies in the behavioral sciences.
Flaws can increase Type I or Type II errors, and the wise meta-analyst would do well to note how well Hyman (1985) and Honorton (1985) have searched for and evaluated flaws. For each flaw, it would be desirable to make some estimate of how much difference it made to the outcome. In the present debate some flaws seemed to make a difference and others did not. When flaws matter we can adjust for

of failure to replicate. That both studies were able to reject the null and at exactly the same p level is simply a function of sample size. Replication set B shows two studies with different p values, one significant at p < .05, the other not significant. However, the two effect size estimates are in excellent agreement. We would say, accordingly, that replication set B shows more successful replication than does replication set A.

It should be noted that the values of Table 3 were chosen so that the combined probability of the two studies of set A would be identical to the combined probability of the two studies of set B; $(z_1 + z_2)/\sqrt{2} = z$ of 2.77, p = .0028, one-tailed.

The Metrics of the Success of Replication

Once we adopt a view of the success of replication as a function of similarity of effect sizes obtained, we can become more precise in our assessments of the success of replication. Figure 1 shows the "replication plane" generated by crossing the results of the first study conducted (expressed in units of the effect size r) by the results of the second study conducted. All perfect replications, those in which the effect sizes are identical in the two studies, fall on a diagonal rising from the lower left corner (-1.00, -1.00) to the upper right corner (+1.00, +1.00). The results of replication set B from Table 3 are shown to fall exactly on the diagonal of successful replication (+.26, +.26). The results of replication set A are shown to fall somewhat above the line representing perfect replication.
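The distance of a replication pair from the diagonal can also be measured on Fisher's z scale, the index Cohen called q, and tested for significance. The sketch below is a minimal illustration rather than the author's own computation. The effect sizes .20 and .50 are illustrative values, chosen only because they reproduce the differences of .30 (in r) and .35 (in Fisher's z) quoted for set A, and the sample sizes are hypothetical. The test divides the difference between the two Fisher z's by sqrt(1/(N1 - 3) + 1/(N2 - 3)), the standard result for comparing two independent correlations.

```python
from math import atanh, sqrt
from statistics import NormalDist


def cohens_q(r1: float, r2: float) -> float:
    """Difference between two correlations on Fisher's z scale."""
    return atanh(r1) - atanh(r2)


def q_test(r1: float, n1: int, r2: float, n2: int) -> tuple[float, float]:
    """Return (z, two-tailed p) for the difference between two independent r's."""
    q = cohens_q(r1, r2)
    standard_error = sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = q / standard_error
    p_two_tailed = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed


# Illustrative effect sizes (assumed; see the note above).
print(round(cohens_q(0.50, 0.20), 2))  # 0.35

# Hypothetical sample sizes of 15 and 96.
z, p = q_test(0.50, 15, 0.20, 96)
print(round(z, 2), round(p, 2))
```

With sample sizes this small, a difference of .35 on the Fisher z scale would not be statistically significant, which is consistent with treating the pair as a fairly successful replication.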
Figure 1 shows that although set B reflects a more successful replication than set A, the latter is also located fairly close to the line and is, therefore, a fairly successful replication set as well.

Cohen's q. An alternative to the indexing of the success of replication by the difference between obtained effect size r's is to transform the r's to Fisher's z's before taking the difference. Fisher's z metric is distributed nearly normally and can thus be used in setting confidence intervals and testing hypotheses about r's, whereas r's distribution is skewed, and the more so as the population value of r moves further from zero. Cohen's q, the difference between the two Fisher z's, is especially useful for testing the significance of difference between two obtained effect size r's. This is accomplished by means of the fact that

$$\frac{q}{\sqrt{\dfrac{1}{N_1 - 3} + \dfrac{1}{N_2 - 3}}}$$

is distributed as z, the standard normal deviate (Rosenthal, 1984; Rosenthal & Rubin, 1982a; Snedecor & Cochran, 1980). When there are more than two effect size r's to be evaluated for their variability (i.e., heterogeneity), the three references above all provide the appropriate formula for computing the test of the heterogeneity of effect sizes.

[Figure 1. The replication plane. Both axes run from -1.00 to +1.00 in units of r; sets A and B are plotted.]

ISSUES RELATED TO REPLICATION

Multiple Testing

In ganzfeld studies, in parapsychological research more broadly, and, indeed, in most areas of behavioral science, it is common that more than one test of significance is computed to evaluate a research hypothesis. There may, for example, be a set of several dependent variables used to evaluate outcome. So long as there are

these flaws in our weighting of studies. For example, we can give weights of zero to truly terrible studies and lowered but nonzero weights to less than truly terrible studies.
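The weighting idea can be made concrete with a weighted form of the Stouffer combination, z = sum(w_i * z_i) / sqrt(sum(w_i^2)); with equal weights this reduces to the sum of the z's divided by the square root of their number. The sketch below is one possible implementation under that assumption. The first call uses the four substudy z's given for Table 4; the quality weights in the second call are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def stouffer_z(zs, weights=None):
    """Combine standard normal deviates, optionally weighting each study."""
    if weights is None:
        weights = [1.0] * len(zs)
    numerator = sum(w * z for w, z in zip(weights, zs))
    denominator = sqrt(sum(w * w for w in weights))
    return numerator / denominator


def two_tailed_p(z):
    """Two-tailed p value for a standard normal deviate."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Unweighted: the four substudies of Table 4.
substudy_zs = [2.0, 1.0, 0.5, -0.5]
z_all = stouffer_z(substudy_zs)
print(round(z_all, 2), round(two_tailed_p(z_all), 2))  # 1.5 0.13

# Hypothetical quality weights: full, full, half, and zero weight.
z_weighted = stouffer_z(substudy_zs, weights=[1.0, 1.0, 0.5, 0.0])
print(round(z_weighted, 2))  # 2.17
```

A zero weight removes a study's contribution entirely, while intermediate weights let a flawed study count for less without being thrown away.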
Such weighting may lead to less biased conclusions than simple discarding of studies for flaws (Fiske, 1978; Rosenthal, 1984; Rosenthal & Rubin, 1985).

Replication Difficulty and Small Effects

Although Hyman (1985) and Honorton (1985) disagree on the degree of confidence warranted by the ganzfeld literature, they agree that the results reported do not reflect an enormous magnitude of effect. In Cohen's (1977) terminology, the average size of ganzfeld effect reported by Hyman (1985) and Honorton (1985) is on the small side. That, of course, is not surprising. Controversial research areas are characterized by small effect sizes. For example, in a recent review of five controversial areas of human performance research, Harris and Rosenthal (1986) estimated the actual effect sizes (r) to range only from .00 to .18 with a median of .10 and a 95% confidence interval ranging from .02 to .19.

Small effect sizes are just what we should expect from controversial areas. According to fundamental principles of statistical power (Cohen, 1977), if the true effect size were substantial, studies with only modest sample sizes would routinely be able to reject the null. For example, if the population value of r were .60, 90% of replication attempts would be significant at p < .05 with sample sizes of 24 (Cohen, 1977, p. 92). However, if the population value of r were .10, the median of our five controversial areas (Harris & Rosenthal, 1986), only 7% of replication attempts would be significant at p < .05 with sample sizes of 24. For the small population value of r (.10), it would require sample sizes of over 1,000 to achieve a 90% rate of rejecting the null at p < .05.

Even though controversial research areas are characterized by small effects (including zero as a possibility), that does not mean that the effects are of no practical importance.
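Two calculations behind this section can be sketched briefly. The first approximates the power figures attributed to Cohen's tables by using the Fisher z normal approximation with a two-tailed test; that approximation is an assumption adopted here for illustration, so its results can differ slightly from tabled values. The second expresses a correlation r as the gap between success rates of .50 - r/2 and .50 + r/2, the binomial effect size display of Rosenthal and Rubin (1982b). The function names are hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist

_NORMAL = NormalDist()


def approx_power_r(rho: float, n: int, alpha: float = 0.05) -> float:
    """Approximate two-tailed power to detect a population correlation rho."""
    z_crit = _NORMAL.inv_cdf(1.0 - alpha / 2.0)
    shift = atanh(rho) * sqrt(n - 3)  # expected test statistic under rho
    upper = 1.0 - _NORMAL.cdf(z_crit - shift)
    lower = _NORMAL.cdf(-z_crit - shift)
    return upper + lower


def besd(r: float) -> tuple[float, float]:
    """Success rates implied by r in a binomial effect size display."""
    return 0.5 - r / 2.0, 0.5 + r / 2.0


print(round(approx_power_r(0.60, 24), 2))    # 0.89
print(round(approx_power_r(0.10, 24), 2))    # 0.07
print(round(approx_power_r(0.10, 1050), 2))  # 0.9
print(tuple(round(rate, 2) for rate in besd(0.10)))  # (0.45, 0.55)
```

Under this approximation a true r of .10 is detected only about 7% of the time with 24 subjects, and the rate of rejection does not reach about 90% until the sample size passes 1,000.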
Indeed, the median small effect of five areas cited above (r = .10) is equivalent to improving our success rate from 45% to a success rate of 55% (Rosenthal & Rubin, 1982b).

Before leaving the topic of replication difficulty, it may help us to place this problem in useful perspective by noting that it is not only in the parapsychological or other behavioral sciences that replication difficulties emerge. Indeed, students of the physical sciences have pointed out failures to replicate the construction of TEA-lasers despite the availability of detailed instructions for replication. Apparently TEA-lasers could be replicated dependably only when the replication instructions were accompanied by a scientist who had actually built a laser (Collins, 1985).

SUMMARIZING THE META-ANALYSES

Hyman (1985) and Honorton (1985) have done important meta-analytic work on the topic of the ganzfeld experiments; it is this work I summarize here. Five indices of "psi" success have been used in ganzfeld research (Honorton, 1985). One criticism of research in this area is that some investigators used several such indices in their studies and failed to adjust their reported levels of significance (p) for the fact that they had made multiple tests (Hyman, 1985). Because most studies used a particular one of these five methods, the method of direct hits, Honorton focused his meta-analysis on just those 28 studies (of a total of 42) for which direct hit data were available.

The method of direct hits scores a success only when the single correct target is chosen out of a set of t total targets. Thus, the probability of success on a single trial is 1/t with t usually = 4 but sometimes 5 or 6. The other methods, using some form of partial credit, appear to be more precise in that they use more of the information available.
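With direct hits, each study reduces to an observed hit rate set against the chance rate 1/t. One common way to express that gap, and the one used for the effect sizes discussed next, is Cohen's h, the difference between arcsine-transformed proportions, where the transform is 2 * arcsin(sqrt(p)). The sketch below computes h and inverts it to recover the hit rate implied by a given h. The function names are hypothetical, and the inputs are the rounded figures quoted in the text.

```python
from math import asin, sin, sqrt


def cohens_h(p_obtained: float, p_expected: float) -> float:
    """Difference between arcsine-transformed proportions."""
    return 2.0 * asin(sqrt(p_obtained)) - 2.0 * asin(sqrt(p_expected))


def hit_rate_from_h(h: float, p_expected: float) -> float:
    """Proportion that lies h arcsine units above p_expected."""
    phi = 2.0 * asin(sqrt(p_expected)) + h
    return sin(phi / 2.0) ** 2


# Chance rate with t = 4 targets.
chance = 1 / 4
print(round(cohens_h(0.38, chance), 2))         # 0.28
print(round(hit_rate_from_h(0.28, chance), 2))  # 0.38
print(round(hit_rate_from_h(0.18, chance), 2))  # 0.33

# Equal raw differences j are not equal on the h scale.
print(round(cohens_h(0.65, 0.45), 2), round(cohens_h(0.25, 0.05), 2))  # 0.4 0.6
```

The last line shows why h is preferred to the raw difference j: both pairs differ by .20 in raw proportions, yet the pair nearer zero is about half again as large on the h scale.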
Although they differ in their interpretation of the results, Honorton (1985) and Hyman (1985) agree quite well on the basic quantitative results of the meta-analysis of these 28 studies. This agreement holds both for the estimation of statistical significance (Honorton, 1985, p. 58) and of effect size (Hyman, 1985, p. 13).

Stem-and-Leaf Display

Table 5 shows a stem-and-leaf display of the 28 effect size estimates based on the direct hits studies summarized by Honorton (1985, p. 84). The effect size estimates shown in Table 5 are in units of Cohen's h, which is the difference between (a) the arcsine transformed proportion of direct hits obtained and (b) the arcsine transformed proportion of direct hits expected under the null hypothesis (i.e., 1/t). The advantage of h over j, the difference between raw proportions, is that all h values that are identical are identically detectable whereas all j values that are identical (e.g., .65 - .45 and .25 - .05) are not equally detectable (Cohen, 1977, p. 181).

TABLE 5
STEM-AND-LEAF PLOT OF "DIRECT HIT" GANZFELD STUDIES: COHEN'S h

[Stems run from 1.4 down to -.9 in steps of .1, with separate stems for .0 and -.0. The leaves, in printed order from top to bottom, are 4 3 3 8 0 2 2 2 4 1 2 2 4 4 7 8 2 3 8 8 7 7 9 5 0 2 0 3; their alignment with the stems is not recoverable here.]

Tukey (1977) developed the stem-and-leaf plot as a special form of frequency distribution to facilitate the inspection of a batch of data. Each number in the data batch is made up of one stem and one leaf, but each stem may serve several leaves. Thus, the stem .1 is followed by leaves of 3, 8, 8 representing the numbers .13, .18, .18. The first digit is the stem; the next digit is the leaf. The stem-and-leaf display functions as any other frequency distribution but the original data are retained precisely.

Distribution of studies. From Table 5 we see that the distribution of effect sizes is unimodal, with the bulk of the results (80%) falling between -.10 and .58. The distribution is nicely symmetrical, with the skewness index (g1 = .17) only 24% of that required for significance at p < .05 (Snedecor & Cochran, 1980, pp. 78-79, 492). The tails of the distribution, however, are too long for normality with kurtosis index g2 = 2.04, p = .02. Relative to what we would expect from a normal distribution, we have studies that show larger positive and larger negative effect sizes than would be reasonable. Indeed, the two largest positive effect sizes are significant outliers at p < .05, and the largest negative effect size approaches significance, with a Dixon index of .37 compared to one of .40 for the largest positive effect size (Snedecor & Cochran, 1980, pp. 279-280, 490). The total sample of studies is still small; however, if a much larger sample showed the same result, that would be a pattern consistent with the idea that both strong positive results ("psi") and strong negative results ("psi-missing") might be more likely to find their way into print or at least to be more available to a meta-analyst.

Distribution of subjects. It is useful to examine the distribution of effect sizes obtained in the summarized studies. It would also be useful to examine the distribution of effect sizes obtained by individual subjects within the studies summarized. For example, in a study with a mean h of .20, is the distribution of h fairly normal with centering at .20, or is the distribution skewed with the bulk of the subjects centered closer to zero but with a few subjects earning consistently high values of h?

Distribution of investigators. Just as it is useful to examine the distribution of the results of studies and of subjects within studies, it is also useful to examine the distribution of results obtained by different investigators (Honorton, 1985; Hyman, 1985; Rosenthal, 1969, 1984).
The 28 direct hit studies were conducted by 10 different investigators (Honorton, 1985, p. 60). Four investigators conducted only one study each, two conducted two studies each, two conducted three studies each, one conducted five studies, and one conducted nine studies. Analysis of variance showed that these 10 investigators differed significantly and importantly in the average magnitude of the effects they obtained with F(9,18) = 3.81, p < .01, eta = .81. Interestingly, there was little relationship between the mean effect size obtained by each investigator and the number of studies conducted (r = .11; t(8) = 0.31, p > .70).

That different investigators may obtain significantly different results from their subjects is well known in various areas of psychology (Rosenthal, 1966). For example, in such a standard experimental area as eyelid conditioning, studies conducted at Iowa obtained results in the predicted direction 94% of the time, whereas those conducted elsewhere obtained such results only 62% of the time with χ²(1) = 4.05, p < .05, N = 25, r = .40 (Rosenthal, 1966, p. 24; Spence, 1964).

TABLE 6
STATISTICAL SUMMARY OF "DIRECT HIT" GANZFELD STUDIES

Central tendency (Cohen's h)
  Unweighted mean               .28
  Weighted mean                 .23
  Median                        .32
  Proportion positive sign      .82

Variability
  Maximum                      1.44
  Quartile 3 (Q3)               .42
  Median (Q2)                   .32
  Quartile 1 (Q1)               .08
  Minimum                      -.93
  Q3 - Q1                       .34
  .75(Q3 - Q1)                  .26
  S                             .45

Significance tests
  Combined Stouffer z          6.60
  t test of mean z             3.23
  z of proportion positive     3.10

Correlation of h
  With z                        .86
  With raw j                    .98

Confidence intervals(a)      From     To
  80%                         .17    .39
  95%                         .11    .45
  99%                         .04    .52
  99.9%                      -.03    .59

(a) Based on N of 28 studies.

Summary of Stem-and-Leaf Display

Table 6 provides a summary of the stem-and-leaf display of Table 5 and some additional useful information about central tendency, variability, significance tests, confidence intervals, and correlations between Cohen's h and (a) significance level (z) and (b) raw difference in proportions (j). Only a few comments are required.

Effect size. The bulk of the results (82%) show a positive effect size where 50% would be expected under the null (p = .0004). The mean effect size, h, of .28 is equivalent to having a direct hit rate of .38 when .25 was expected under the null. The 95% confidence interval suggests the likely range of effect sizes to be from .11 to .45, equivalent to accuracy rates of .30 to .46 when .25 was expected under the null hypothesis.

Significance testing. The overall probability that obtained accuracy was better than the accuracy expected under the null was a p of 3.37/10^11 associated with a Stouffer z of 6.60 (Mosteller & Bush, 1954; Rosenthal, 1978a, 1984).

File-drawer analysis. A combined p as low as that obtained can be used as a guide to the tolerance level for null results that never found their way into the meta-analytic data base (Rosenthal, 1979, 1984). It has long been believed that studies failing to reach statistical significance may be less likely to be published (Rosenthal, 1966; Sterling, 1959). Thus it may be that there is a residual of nonsignificant studies languishing in the investigators' file drawers. With simple calculations, it can be shown that, for the current studies summarized, there would have to be 423 studies with mean p = .50, one-tailed, or z = 0.00 in those file drawers before the overall combined p would become just > .05, as Honorton (1985) has pointed out. That many studies unretrieved seems unlikely for this specialized area of parapsychology (Honorton, 1985; Hyman, 1985).
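The "simple calculations" can be sketched as follows, assuming the usual file-drawer tolerance reasoning of Rosenthal (1979). If k studies have a Stouffer z of Z, their z's sum to Z * sqrt(k); adding X studies that average z = 0 leaves that sum unchanged but divides it by sqrt(k + X), so the combined z falls to the one-tailed .05 critical value of about 1.645 when (Z * sqrt(k) / 1.645)^2 = k + X. The function name is hypothetical, and the inputs are the rounded values quoted above.

```python
from math import sqrt
from statistics import NormalDist


def failsafe_n(stouffer_z: float, k: int, alpha_one_tailed: float = 0.05) -> float:
    """Number of unretrieved z = 0 studies needed to bring p up to alpha."""
    z_crit = NormalDist().inv_cdf(1.0 - alpha_one_tailed)  # about 1.645
    sum_z = stouffer_z * sqrt(k)
    return (sum_z / z_crit) ** 2 - k


# 28 direct-hit studies with a combined Stouffer z of 6.60.
print(round(failsafe_n(6.60, 28)))  # 423
```

The result is about 15 unretrieved null studies for every study in the data base, which is the sense in which so large a file drawer seems unlikely.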
Based on experience with meta-analyses in other domains of research (e.g., interpersonal expectancy effects) the mean z or effect size for nonsignificant studies is not 0.00 but a value pulled strongly from 0.00 toward the mean z or mean effect size of the obtained studies (Rosenthal & Rubin, 1978).

Comparison with an Earlier Meta-Analysis

It is instructive to compare the results of the ganzfeld research meta-analysis by Honorton (1985) with the results of an older and larger meta-analysis of another controversial research domain, that of interpersonal expectancy effects (Rosenthal & Rubin, 1978). In that analysis, eight areas of expectancy effects were summarized; effect sizes (Cohen's d, roughly equivalent to Cohen's h) ranged from .14 to 1.73 with a grand mean d of .70. Honorton's mean effect size (h = .28) exceeds the mean d of two of the eight areas (reaction time experiments [d = .17], and studies using laboratory interviews [d = .14]).

The earlier meta-analysis displayed the distribution of the z's associated with the obtained p levels. Table 7 shows a comparison of the two meta-analyses' distributions of z's. It is interesting to note the high degree of similarity in the distributions of significance levels. The total proportion of significant results is somewhat higher for the ganzfeld studies but not significantly so (χ²(1) = 1.07, N = 373, p = .30, φ = .05).

TABLE 7
PROPORTION OF STUDIES REACHING CRITICAL LEVELS OF SIGNIFICANCE FOR TWO RESEARCH AREAS

                              Expected     Expectancy     Ganzfeld
Interval for z                proportion   research(a)    research(b)   Difference
Predicted direction
  +3.72 and above               .0001         .07            .04          -.03
  +3.09 and above               .001          .12            .18           .06
  +2.33 and above               .01           .19            .25           .06
  +1.65 and above               .05           .36            .43           .07
Not significant
  -1.64 to +1.64                .90           .60            .50          -.10
Unpredicted direction
  -1.65 and below               .05           .03            .07           .04

(a) N = 345 studies; from Rosenthal & Rubin (1978).
(b) N = 28 studies; from Honorton (1985).

INTERPRETING THE META-ANALYTIC RESULTS

Although the results of the meta-analysis are clear, the meaning of these results is open to various interpretations. The most obvious interpretation might be that at a very low p, and with a fairly impressive effect size, the ganzfeld psi phenomenon has been demonstrated. However, there are rival hypotheses that will need to be considered, many of them put forward in the detailed evaluation by Hyman (1985).

Procedural Rival Hypotheses

Sensory leakage. A standard rival hypothesis to the hypothesis of ESP is that sensory leakage occurred and that the receiver was knowingly or unknowingly cued by the sender or by an intermediary between the sender and receiver. As early as 1895, Hansen and Lehmann (1895) described "unconscious whispering" in the laboratory, and Kennedy (1938, 1939) was able to show that senders in telepathy experiments could give auditory cues to their receivers quite unwittingly. Ingenious use of parabolic sound reflectors made this demonstration possible. Moll (1898), Stratton (1921), and Warner and Raible (1937) all gave early warnings on the dangers of unintentional cueing (for summaries see Rosenthal, 1965, 1966). The subtle kinds of cues described by these early workers were just the kind we have come to look for in searching for cues given off by experimenters that might serve to mediate the experimenter expectancy effects found in laboratory settings (Rosenthal, 1966, 1985).

By their nature, ganzfeld studies tend to minimize problems of sensory cueing.
An exception occurs when the subject is asked to choose which of four (or more) stimuli has been "sent" by another person or agent. When the same stimuli held originally by the sender are shown to the receiver, finger smudges or other marks may serve as cues. Honorton has shown, however, that studies controlling for this type of cue yield at least as many significant effects as do the studies not controlling for this type of cue.

Recording errors. A second rival hypothesis has nearly as long a history. Kennedy and Uphoff (1939) and Sheffield and Kaufman (1952) both found biased errors of recording the data of parapsychological experiments. In a meta-analysis of 139,000 recorded observations in 21 studies, it was found that about 1% of all observations were in error and that, of the errors committed, twice as many favored the hypothesis as opposed it (Rosenthal, 1978b). Although it is difficult to rule recording errors out of ganzfeld studies (or any other kind of research), their magnitude is such that they could probably have only a small biasing effect on the estimated average effect size (Rosenthal, 1978b, p. 1007).

Intentional error. The very recent history of science has reminded us that even though fraud in science is not quite of epidemic proportion, it must be given close attention (Broad & Wade, 1982; Zuckerman, 1977). Fraud in parapsychological research has been a constant concern, a concern found to be justified by periodic flagrant examples (Rhine, 1975). In the analyses of Hyman (1985) and Honorton (1985), in any case, there appeared to be no relationship between degree of monitoring of participants and the results of the study.

Statistical Rival Hypotheses

File-drawer issues. The problem of biased retrieval of studies for any meta-analysis was described earlier.
Part of this problem is addressed by the 10-year-old norm of the Parapsychological Association of reporting negative results at its meetings and in its journals (Honorton, 1985). Part of this problem is addressed also by Blackmore (1980), who conducted a survey to retrieve unreported ganzfeld studies. She found that 7 of her total of 19 studies were judged significant overall by the investigators. This proportion of significant results (.37) was not significantly (or appreciably) lower than the proportion of published studies found significant (.43) in Honorton's (1985) meta-analysis of direct hit ganzfeld studies (χ²(1) = 0.17, φ = .06). Somewhat similar results were obtained by Sommer (in press) in her analysis of research on the menstrual cycle. She found 61% of the published results to be significant compared to 40% of the unpublished studies; χ²(1) = 2.30, p < .065, one-tailed, φ = .20. The results of the Blackmore and Sommer studies did not differ significantly (z = 0.69). Taken together, these studies provide only modest evidence for a serious file-drawer problem.

A problem that seems to be a special case of the file-drawer problem was pointed out by Hyman (1985). That was a possible tendency to report the results of pilot studies along with subsequent significant results when the pilot data were significant. At the same time it is possible that pilot studies were conducted without promising results, pilot studies that then found their way into the file drawers. In any case, it is nearly impossible to have an accurate estimate of the number of unretrieved studies or pilot studies actually conducted. Chances seem good, however, that there would be fewer than the 423 results of mean z = 0.00 required to bring the overall combined p to > .05.

Multiple testing. Each ganzfeld study may have more than one dependent variable for scoring degree of success.
If investigators use these dependent variables sequentially until they find one significant at p < .05, the true p will be higher than .05 (Hyman, 1985). This issue was discussed earlier; it is not an inherently intractable one (Rosenthal & Rubin, 1986).

Randomization. Hyman (1985) has noted that the target stimulus may not have been selected in a truly random way from the pool of potential targets. To the extent that this is the case, the p values calculated can be in error. Hyman (1985) and Honorton (1985) disagree over the frequency in this sample of studies of improper randomization. In addition, they disagree over the magnitude of the relationship between inadequate randomization and study outcome. Hyman felt this relationship to be significant and positive; Honorton felt this relationship to be nonsignificant and negative. Because the median p level of just those 16 studies using random number tables or generators (z = .94) was essentially identical to that found for all 28 studies, it seems unlikely that poor randomization procedures were associated with much of an increase in significance level (Honorton, 1985, p. 71).

Statistical errors. Hyman (1985) and Honorton agree that 6 of the 28 studies contained statistical errors. However, the median effect size of these studies (h = .33) was very similar to the overall median (h = .32), so that it seems unlikely that these errors had a major effect on the overall effect size estimate. Omitting these six studies from the analysis decreases the mean h from .28 to .26. Such a drop is equivalent to a drop of the mean accuracy rate from .38 to .37 when .25 is the expected value under the null.

A Tentative Inference

On the basis of the preceding summary and the very valuable meta-analytic evaluations of Honorton (1985) and Hyman (1985), what are we to believe?
It would be easiest to say, "Let's wait until more data have been accumulated from studies purged of the problems noted by Hyman, Honorton, and others." That is not a realistic approach. At any point in time some judgment can be made, and though our judgment might be more accurate later on when those more nearly perfect studies become available, the situation for the ganzfeld domain seems reasonably clear. We feel it would be implausible to entertain the null given the combined p from these 28 studies. Given the various problems or flaws pointed out by Hyman and Honorton, the true effect size is almost surely smaller than the mean h of .28, equivalent to a mean accuracy of 38% when 25% is expected under the null. We are persuaded that the net result of statistical errors was a biased increase in estimated effect size of at least a full percentage point (from 37% to 38%). Furthermore, we are persuaded that file-drawer and related problems are such that some of the smaller effect size results have probably been kept off the market. If pressed to estimate a more accurate effect size, we might think in terms of a shrinkage of h from the obtained value of .28 to perhaps an h of .18. Thus, when the accuracy rate expected under the null is 1/4, we might estimate the obtained accuracy rate to be about 1/3.

CONCLUSION

Parapsychologists in particular and scientists in general owe a great debt of gratitude to Ray Hyman (1985) and Charles Honorton (1985) for their careful and extensive analytic and meta-analytic work on the ganzfeld problem. Their debate has yielded an especially high light/heat ratio, and many of the important issues have now been brought out into bold relief.

In my commentary on the ganzfeld debate, I focused most closely on the concept of replication.
That seemed appropriate, not only because of the centrality of the problem of replicability in the parapsychological literature, but also because of the centrality of the problem in many sciences, especially when the effect sizes sought in the population are small. The effect size zero is only a special case of the class of small effect sizes. In closing I want only to suggest that parapsychological and other behavioral sciences would be well served to modify their view of the success of replication in the direction of the following newer view:

1. A replication is successful to the degree that the second study obtains an effect size similar to the effect size of the first study.
2. Three or more investigations are successful replicates of one another to the extent that the effect sizes are homogeneous.
3. Significance testing has nothing to do with success of replication though it can be useful in many ways, including the assessment of the likelihood of the null given all prior research (weighted as desired and as reasonable) and the likelihood of real differences among the effect sizes of two or more studies.

REFERENCES

BLACKMORE, S. (1980). The extent of selective reporting of ESP ganzfeld studies. European Journal of Parapsychology, 3, 213-219.
BROAD, W., & WADE, N. (1982). Betrayers of the truth. New York: Simon and Schuster.
COHEN, J. (1977). Statistical power analysis for the behavioral sciences (rev. ed.). New York: Academic Press.
COLLINS, H. M. (1985). Changing order: Replication and induction in scientific practice. Beverly Hills, CA: Sage.
FISKE, D. W. (1978). The several kinds of generalization. The Behavioral and Brain Sciences, 3, 393-394.
HANSEN, F. C. C., & LEHMANN, A. (1895). Ueber unwillkürliches Flüstern. Philosophische Studien, 11, 471-530.
HARRIS, M. J., & ROSENTHAL, R. (1986).
Interpersonal expectancy effects and human performance research. Report prepared for the National Academy of Sciences.
HONORTON, C. (1985). Meta-analysis of psi ganzfeld research: A response to Hyman. Journal of Parapsychology, 49, 51-91.
HYMAN, R. (1985). The ganzfeld psi experiment: A critical appraisal. Journal of Parapsychology, 49, 3-49.
KENNEDY, J. L. (1938). Experiments on "unconscious whispering." Psychological Bulletin, 35, 526. (Abstract)
KENNEDY, J. L. (1939). A methodological review of extra-sensory perception. Psychological Bulletin, 36, 59-103.
KENNEDY, J. L., & UPHOFF, H. F. (1939). Experiments on the nature of extra-sensory perception: III. The recording error criticism of extra-chance scores. Journal of Parapsychology, 3, 226-245.
MOLL, A. (1898). Hypnotism (4th ed.). New York: Scribner.
MOSTELLER, F. M., & BUSH, R. R. (1954). Selected quantitative techniques. In G. Lindzey (Ed.), Handbook of social psychology: Vol. 1. Theory and method (pp. 289-334). Cambridge, MA: Addison-Wesley.
NELSON, N., ROSENTHAL, R., & ROSNOW, R. L. (1986). Interpretation of significance levels and effect sizes by psychological researchers. American Psychologist, 41, 1299-1301.
RAO, K. R. (1985). The ganzfeld debate. Journal of Parapsychology, 49, 1-2.
RHINE, J. B. (1975). Second report on a case of experimenter fraud. Journal of Parapsychology, 39, 306-325.
ROSENTHAL, R. (1965). Clever Hans: A case study of scientific method. In O. Pfungst, Clever Hans (pp. ix-xlii). New York: Holt, Rinehart and Winston.
ROSENTHAL, R. (1966). Experimenter effects in behavioral research. New York: Appleton-Century-Crofts.
ROSENTHAL, R. (1969). Interpersonal expectations. In R. Rosenthal & R. L. Rosnow (Eds.), Artifact in behavioral research (pp. 181-277). New York: Academic Press.
ROSENTHAL, R. (1978a). Combining results of independent studies. Psychological Bulletin, 85, 185-193.
ROSENTHAL, R. (1978b). How often are our numbers wrong?
American Psychologist, 33, 1005-1008.
ROSENTHAL, R. (1979). The "file drawer problem" and tolerance for null results. Psychological Bulletin, 86, 638-641.
ROSENTHAL, R. (1984). Meta-analytic procedures for social research. Beverly Hills, CA: Sage.
ROSENTHAL, R. (1985). Nonverbal cues in the mediation of interpersonal expectancy effects. In A. W. Siegman & S. Feldstein (Eds.), Multichannel integrations of nonverbal behavior (pp. 105-128). Hillsdale, NJ: Lawrence Erlbaum Associates.
ROSENTHAL, R., & GAITO, J. (1963). The interpretation of levels of significance by psychological researchers. Journal of Psychology, 55, 33-38.
ROSENTHAL, R., & GAITO, J. (1964). Further evidence for the cliff effect in the interpretation of levels of significance. Psychological Reports, 15, 570.
ROSENTHAL, R., & ROSNOW, R. L. (1984). Essentials of behavioral research: Methods and data analysis. New York: McGraw-Hill.
ROSENTHAL, R., & RUBIN, D. B. (1978). Interpersonal expectancy effects: The first 345 studies. The Behavioral and Brain Sciences, 3, 377-386.
ROSENTHAL, R., & RUBIN, D. B. (1979). Comparing significance levels of independent studies. Psychological Bulletin, 86, 1165-1168.
ROSENTHAL, R., & RUBIN, D. B. (1982a). Comparing effect sizes of independent studies. Psychological Bulletin, 92, 500-504.
ROSENTHAL, R., & RUBIN, D. B. (1982b). A simple, general purpose display of magnitude of experimental effect. Journal of Educational Psychology, 74, 166-169.
ROSENTHAL, R., & RUBIN, D. B. (1983). Ensemble-adjusted p values. Psychological Bulletin, 94, 540-541.
ROSENTHAL, R., & RUBIN, D. B. (1984). Multiple contrasts and ordered Bonferroni procedures. Journal of Educational Psychology, 76, 1028-1034.
ROSENTHAL, R., & RUBIN, D. B. (1985). Statistical analysis: Summarizing evidence versus establishing facts. Psychological Bulletin, 97, 527-529.
ROSENTHAL, R., & RUBIN, D. B. (1986). Meta-analytic procedures for combining studies with multiple effect sizes. Psychological Bulletin, 99, 400-406.
SCHMEIDLER, G. R. (1968). Parapsychology. In International Encyclopedia of the Social Sciences (pp. 386-390). New York: MacMillan & Free Press.
SHEFFIELD, F. D., KAUFMAN, R. S., & RHINE, J. B. (1952). A PK experiment at Yale starts a controversy. Journal of the American Society for Psychical Research, 46, 111-117.
SNEDECOR, G. W., & COCHRAN, W. G. (1980). Statistical methods (7th ed.). Ames: Iowa State University Press.
SOMMER, B. (in press). The file drawer effect and publication rates in menstrual cycle research. Psychology of Women Quarterly.
SPENCE, K. W. (1964). Anxiety (drive) level and performance in eyelid conditioning. Psychological Bulletin, 61, 129-139.
STERLING, T. D. (1959). Publication decisions and their possible effects on inferences drawn from tests of significance--or vice versa. Journal of the American Statistical Association, 54, 30-34.
STRATTON, G. M. (1921). The control of another person by obscure signs. Psychological Review, 28, 301-314.
TRUZZI, M. (1981). Reflections on paranormal communication: A zetetic perspective. In T. A. Sebeok & R. Rosenthal (Eds.), The Clever Hans phenomenon (pp. 297-309). New York: New York Academy of Sciences.
TUKEY, J. W. (1977). Exploratory data analysis. Reading, MA: Addison-Wesley.
WARNER, L., & RAIBLE, M. (1937). Telepathy in the psychophysical laboratory. Journal of Parapsychology, 1, 44-51.
ZUCKERMAN, H. (1977). Deviant behavior and social control in science. In E. Sagarin (Ed.), Deviance and social change (pp. 87-138). Beverly Hills, CA: Sage.

Department of Psychology
Harvard University
Cambridge, MA 02138

REPLICATION AND META-ANALYSIS IN PARAPSYCHOLOGY

Jessica Utts
Division of Statistics
University of California, Davis

1.
INTRODUCTION

In a June 1990 Gallup Poll, 49% of the 1,236 respondents claimed to believe in extrasensory perception (ESP), and one in four claimed to have had a personal experience involving telepathy (Gallup and Newport, 1991). Other surveys have shown even higher percentages; the University of Chicago's National Opinion Research Council recently surveyed 1,473 adults, of which 67% claimed that they had experienced ESP (Greeley, 1987). Public opinion is a poor arbiter of science, however, and experience is a poor substitute for the scientific method.

For more than a century, small numbers of scientists have been conducting laboratory experiments to study phenomena such as telepathy, clairvoyance, and precognition, collectively known as "psi" abilities. This paper will examine some of that work, as well as some of the statistical controversies it has generated.

Parapsychology, as this field is called, has been a source of controversy throughout its history. Strong beliefs tend to be resistant to change even in the face of data, and many people, scientists included, seem to have made up their minds on the question without examining any empirical data at all. A critic of parapsychology recently acknowledged that "The level of the debate during the past 130 years has been an embarrassment for anyone who would like to believe that scholars and scientists adhere to standards of rationality and fair play" (Hyman, 1985a, p. 89). While much of the controversy has focused on poor experimental design and potential fraud, there have been attacks and defenses of the statistical methods as well, sometimes calling into question the very foundations of probability and statistical inference.

Most of the criticisms have been leveled by psychologists. For example, a 1988 report of the U.S.
National Academy of Sciences concluded that "The committee finds no scientific justification from research conducted over a period of 130 years for the existence of parapsychological phenomena" (Druckman and Swets, 1988, p. 22). The chapter on parapsychology was written by a subcommittee chaired by a psychologist who had published a similar conclusion prior to his appointment to the committee (Hyman, 1985a, p. 7). There were no parapsychologists involved with the writing of the report. Resulting accusations of bias (Palmer, Honorton and Utts, 1989) led U.S. Senator Claiborne Pell to request that the Congressional Office of Technology Assessment (OTA) conduct an investigation with a more balanced group. A one-day workshop was held on September 30, 1988, bringing together parapsychologists, critics, and experts in some related fields (including the author of this paper). The report concluded that parapsychology needs "a fairer hearing across a broader spectrum of the scientific community, so that emotionality does not impede objective assessment of experimental results" (Office of Technology Assessment, 1989).

It is in the spirit of the OTA report that this article is written. After Section 2, which offers an anecdotal account of the role of statisticians and statistics in parapsychology, the discussion turns to the more general question of replication of experimental results. Section 3 illustrates how replication has been (mis)interpreted by scientists in many fields. Returning to parapsychology in Section 4, a particular experimental regime called the "ganzfeld" is described, and an extended debate about the interpretation of the experimental results is discussed. Section 5 examines a meta-analysis of recent ganzfeld experiments designed to resolve the debate.
Finally, Section 6 contains a brief account of meta-analyses that have been conducted in other areas of parapsychology, and conclusions are given in Section 7.

2. STATISTICS AND PARAPSYCHOLOGY

Parapsychology had its beginnings in the investigation of purported mediums and other anecdotal claims in the late 19th century. The Society for Psychical Research was founded in Britain in 1882, and its American counterpart was founded in Boston in 1884. While these organizations and their members were primarily involved with investigating anecdotal material, a few of the early researchers were already conducting "forced-choice" experiments such as card-guessing. (Forced-choice experiments are like multiple choice tests; on each trial the subject must guess from a small, known set of possibilities.) Notable among these was Nobel Laureate Charles Richet, who is generally credited with being the first to recognize that probability theory could be applied to card-guessing experiments (Rhine, 1977, p. 26; Richet, 1884).

F.Y. Edgeworth, partly in response to what he considered to be incorrect analyses of these experiments, offered one of the earliest treatises on the statistical evaluation of forced-choice experiments in two articles published in the Proceedings of the Society for Psychical Research (Edgeworth, 1885, 1886). Unfortunately, as noted by Mauskopf and McVaugh (1979) in their historical account of the period, Edgeworth's papers were "perhaps too difficult for their immediate audience" (p. 105).

Edgeworth began his analysis by using Bayes' theorem to derive the formula for the posterior probability that chance was operating, given the data. He then continued with an argument "savouring more of Bernoulli than Bayes" in which "it is consonant, I submit, to experience, to put 1/2 both for α and β," i.e. for both the prior probability that chance alone was operating, and the prior probability that "there should have been some additional agency."
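In modern notation, the starting point of Edgeworth's argument can be sketched as follows. The symbols are introduced here for illustration and are not taken from Edgeworth's papers: D stands for the observed data, and the two priors are the probabilities he sets to 1/2.

```latex
% Posterior probability of "chance alone" given the data D (Bayes' theorem),
% with prior \alpha for chance and prior \beta for an additional agency.
\[
P(\text{chance} \mid D)
  = \frac{\alpha \, P(D \mid \text{chance})}
         {\alpha \, P(D \mid \text{chance}) + \beta \, P(D \mid \text{agency})}
\]
% With Edgeworth's choice \alpha = \beta = 1/2 the priors cancel:
\[
P(\text{chance} \mid D)
  = \frac{P(D \mid \text{chance})}
         {P(D \mid \text{chance}) + P(D \mid \text{agency})}
\]
```

With equal priors the posterior depends only on how probable the data are under each of the two explanations.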
He then reasoned (using a Taylor Series expansion of the posterior probability formula) that if there were a large probability of observing the data given that some additional agency was at work, and a small objective probability of the data under chance, then the latter (binomial) probability "may be taken as a rough measure of the sought a posteriori probability in favour of mere chance" (p. 195). Edgeworth concluded his article by applying his method to some data published previously in the same journal. He found the probability against chance to be .99996, which he said "may fairly be regarded as physical certainty" (p. 199). He concluded: "Such is the evidence which the calculus of probabilities affords as to the existence of an agency other than mere chance. The calculus is silent as to the nature of that agency -- whether it is more likely to be vulgar illusion or extraordinary law. That is a question to be decided, not by formulae and figures, but by general philosophy and common sense" (p. 199).

Both the statistical arguments and the experimental controls in these early experiments were somewhat loose. For example, Edgeworth treated as binomial an experiment in which one person chose a string of eight letters and another attempted to guess the string. Since it has long been understood that people are poor random number (or letter) generators, there is no statistical basis for analyzing such an experiment. Nonetheless, Edgeworth and his contemporaries set the stage for the use of controlled experiments with statistical evaluation in laboratory parapsychology.

One of the first American researchers to use statistical methods in parapsychology was John Edgar Coover, who was the Thomas Welton Stanford Psychical Research Fellow, in the Psychology Department at Stanford University, from 1912 to 1937 (Dommeyer, 1975).
In 1917 Coover published a large volume summarizing his work (Coover, 1917). Coover believed that his results were consistent with chance, but others have argued that Coover's definition of significance was too strict (Dommeyer, 1975). For example, in one evaluation of his telepathy experiments, Coover found a two-tailed p-value of .0062. He concluded "Since this value, then, lies within the field of chance deviation, although the probability of its occurrence by chance is fairly low, it cannot be accepted as a decisive indication of some cause beyond chance which operated in favor of success in guessing" (Coover, 1917, p. 82). On the next page he made it explicit that he would require a p-value of .0000221 to declare that something other than chance was operating.

It was during the summer of 1930, with the card-guessing experiments of J.B. Rhine at Duke University, that parapsychology began to take hold as a laboratory science. In fact, Rhine's laboratory still exists under the name of the Foundation for Research on the Nature of Man, housed at the edge of the Duke University campus.

It wasn't long after Rhine published his first book, Extrasensory Perception, in 1934, that the attacks on his methodology began. Since his claims were wholly based on statistical analyses of his experiments, the statistical methods were closely scrutinized by critics anxious to find a plausible explanation for Rhine's positive results.

The most persistent critic was a psychologist from McGill University named Chester Kellogg (Mauskopf and McVaugh, 1979). Kellogg's main argument was that Rhine was using the binomial distribution (and normal approximation) on a series of trials that were not independent.
The experiments in question consisted of having a subject guess the order of a deck of 25 cards, with five each of five symbols, so technically Kellogg was correct.

By 1937 several mathematicians and statisticians had come to Rhine's aid. Mauskopf and McVaugh (1979) speculated that since statistics was itself a young discipline, "a number of statisticians were equally outraged by Kellogg, whose arguments they saw as discrediting their profession" (p. 258). The major technical work, which acknowledged that Kellogg's criticisms were accurate but did little to change the significance of the results, was conducted by Charles Stuart and Joseph A. Greenwood and published in the first volume of the Journal of Parapsychology (Stuart and Greenwood, 1937). Stuart, who had been an undergraduate in mathematics at Duke, was one of Rhine's early subjects, and continued to work with him as a researcher until Stuart's death in 1947. Greenwood was a Duke mathematician, who apparently converted to a statistician at the urging of Rhine.

Another prominent figure who was distressed with Kellogg's attack was E. V. Huntington, a mathematician at Harvard. After corresponding with Rhine, Huntington decided that, rather than further confuse the public with a technical reply to Kellogg's arguments, a simple statement should be made to the effect that the mathematical issues in Rhine's work had been resolved. Huntington must have successfully convinced his former student, Burton Camp of Wesleyan, that this was a wise approach. Camp was the 1937 President of IMS. When the annual meetings were held in December of 1937 (jointly with AMS and AAAS), Camp released a statement to the press that read: "Dr. Rhine's investigations have two aspects: experimental and statistical. On the experimental side Mathematicians, of course, have nothing to say.
On the statistical side, however, recent mathematical work has established the fact that, assuming that the experiments have been properly performed, the statistical analysis is essentially valid. If the Rhine investigation is to be fairly attacked, it must be on other than mathematical grounds" (Camp, 1937).

One statistician who did emerge as a critic was William Feller. In a talk at the Duke Mathematical Seminar on April 24, 1940, Feller raised three criticisms to Rhine's work (Feller, 1940). They had been raised before by others (and continue to be raised even today). The first was that inadequate shuffling of the cards resulted in additional information from one series to the next. The second was what is now known as the "file-drawer effect," namely, that if one combines the results of published studies only, there is sure to be a bias in favor of successful studies. The third was that the results were enhanced by the use of optional stopping, i.e. by not specifying the number of trials in advance. All three of these criticisms were addressed in a rejoinder by Greenwood and Stuart (1940), but Feller was never convinced. Even in its third edition published in 1968, his book An Introduction to Probability Theory and Its Applications still contains his conclusion about Greenwood and Stuart: "Both their arithmetic and their experiments have a distinct tinge of the supernatural" (Feller, 1968, p. 407). In his discussion of Feller's position, Diaconis (1978) remarks, "I believe Feller was confused...he seemed to have decided the opposition was wrong and that was that."

Several statisticians have contributed to the literature in parapsychology to greater or lesser degrees. T.N.E.
Greville devoted much of his professional life to developing statistical methods for parapsychology; Fisher (1924, 1929) addressed some specific problems in card-guessing experiments; Wilks (1965) described various statistical methods for parapsychology; Lindley (1957) presented a Bayesian analysis of some parapsychology data; and Diaconis (1978) pointed out some problems with certain experiments and presented a method for analyzing experiments when feedback is given.

Occasionally, attacks on parapsychology have taken the form of attacks on statistical inference in general, at least as it is applied to real data. Spencer-Brown (1957) attempted to show that true randomness is impossible, at least in finite sequences, and that this could be the explanation for the results in parapsychology. That argument re-emerged in a recent debate on the role of randomness in parapsychology, initiated by psychologist J. Barnard Gilmore (Gilmore, 1989; Utts, 1989a; Palmer, 1989; Gilmore, 1990; Palmer, 1990). Gilmore stated that "The agnostic statistician, advising on research in psi, should take account of the possible inappropriateness of classical inferential statistics" (1989, p. 338). In his second paper, Gilmore reviewed several non-psi studies showing purportedly random systems that do not behave as they should under randomness (e.g. Iversen, Longcor, Mosteller, Gilbert, and Youtz, 1971; and Spencer-Brown, 1957). Gilmore concluded that "Anomalous data ... should not be found nearly so often if classical statistics offers a valid model of reality" (1990, p. 54), thus rejecting the use of classical statistical inference for real-world applications in general.

3. REPLICATION

Implicit and explicit in the literature on parapsychology is the assumption that in order to truly establish itself, the field needs to find a repeatable experiment.
For example, Diaconis (1978) starts the summary of his article in Science with the words "In search of repeatable ESP experiments, modern investigators..." (p. 131). On October 28-29, 1983, the 32nd International Conference of the Parapsychology Foundation was held in San Antonio, Texas, to address "The Repeatability Problem in Parapsychology." The Conference Proceedings (Shapin and Coly, 1985) reflect the diverse views among parapsychologists on the nature of the problem. Honorton (1985a) and Rao (1985), for example, both argued that strict replication is uncommon in most branches of science, and that parapsychology should not be singled out as unique in this regard. Other authors expressed disappointment in the lack of a single repeatable experiment in parapsychology, with titles such as "Unrepeatability: Parapsychology's Only Finding" (Blackmore, 1985), and "Research Strategies for Dealing with Unstable Phenomena" (Beloff, 1985).

It has never been clear, however, just exactly what would constitute acceptable evidence of a repeatable experiment. In the early days of investigation, the major critics "insisted that it would be sufficient for Rhine and Soal to convince them of ESP if a parapsychologist could perform successfully a single 'fraud-proof' experiment" (Hyman, 1985a, p. 71). However, as soon as well-designed experiments showing statistical significance emerged, the critics realized that a single experiment could be statistically significant just by chance. British psychologist C.E.M.
Hansel quantified the new expectation, that the experiment should be repeated a few times, as follows: "If a result is significant at the .01 level and this result is not due to chance but to information reaching the subject, it may be expected that by making two further sets of trials the antichance odds of one hundred to one will be increased to around a million to one, thus enabling the effects of ESP -- or whatever is responsible for the original result -- to manifest itself to such an extent that there will be little doubt that the result is not due to chance" (Hansel, 1980, p. 298). In other words, three consecutive experiments at p ≤ .01 would convince Hansel that something other than chance was at work.

This argument implies that if a particular experiment produces a statistically significant result, but subsequent replications fail to attain significance, then the original result was probably due to chance, or at least remains unconvincing. The problem with this line of reasoning is that there is no consideration given to sample size or power. Only an experiment with extremely high power should be expected to be "successful" three times in succession.

It is perhaps a failure of the way statistics is taught that many scientists do not understand the importance of power in defining successful replication. To illustrate this point, psychologists Tversky and Kahneman (1982) distributed a questionnaire to their colleagues at a professional meeting, with the question: "An investigator has reported a result that you consider implausible. He ran 15 subjects, and reported a significant value, t = 2.46. Another investigator has attempted to duplicate his procedure, and he obtained a nonsignificant value of t with the same number of subjects. The direction was the same in both sets of data. You are reviewing the literature.
What is the highest value of t in the second set of data that you would describe as a failure to replicate?" (1982, p. 28).

In reporting their results, Tversky and Kahneman stated: "The majority of our respondents regarded t = 1.70 as a failure to replicate. If the data of two such studies (t = 2.46 and t = 1.70) are pooled, the value of t for the combined data is about 3.00 (assuming equal variances). Thus, we are faced with a paradoxical state of affairs, in which the same data that would increase our confidence in the finding when viewed as part of the original study, shake our confidence when viewed as an independent study" (1982, p. 28).

At a recent presentation to the History and Philosophy of Science Seminar at the University of California at Davis, I asked the following question. Two scientists, Professors A and B, each have a theory they would like to demonstrate. Each plans to run a fixed number of Bernoulli trials and then test H0: p = .25 versus Ha: p > .25. Professor A has access to large numbers of students each semester to use as subjects. In his first experiment he runs 100 subjects, and there are 33 successes (p = .04, one-tailed). Knowing the importance of replication, Professor A runs an additional 100 subjects as a second experiment. He finds 36 successes (p = .009, one-tailed).

Professor B only teaches small classes. Each quarter she runs an experiment on her students to test her theory. She carries out ten studies this way, with the following results:

 n    Number of successes    One-tailed p-value
10             4                    .22
15             6                    .15
17             6                    .23
25             8                    .17
30            10                    .20
40            13                    .18
18             7                    .14
10             5                    .08
15             5                    .31
20             7                    .21

I asked the audience by a show of hands to indicate whether or not they felt the scientists had successfully demonstrated their theories.
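The one-tailed p-values quoted for Professors A and B can be checked with an exact binomial tail sum. The sketch below is illustrative only: the function and variable names are hypothetical, and it assumes (the text does not say) that the reported values are exact binomial tail probabilities under p = .25. The `pooled` helper simply adds up each professor's trials and successes.

```python
from math import comb


def binom_upper_tail(successes: int, trials: int, p0: float = 0.25) -> float:
    """Exact one-tailed p-value P(X >= successes) for X ~ Binomial(trials, p0)."""
    return sum(
        comb(trials, k) * p0**k * (1 - p0) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# (trials, successes) pairs as listed in the text.
professor_a = [(100, 33), (100, 36)]
professor_b = [(10, 4), (15, 6), (17, 6), (25, 8), (30, 10),
               (40, 13), (18, 7), (10, 5), (15, 5), (20, 7)]


def pooled(experiments):
    """Total trials and total successes across a list of experiments."""
    return sum(n for n, _ in experiments), sum(k for _, k in experiments)


if __name__ == "__main__":
    for name, experiments in (("A", professor_a), ("B", professor_b)):
        for n, k in experiments:
            print(f"Professor {name}: {k}/{n}  p = {binom_upper_tail(k, n):.3f}")
        n_total, k_total = pooled(experiments)
        p_total = binom_upper_tail(k_total, n_total)
        print(f"Professor {name} pooled: {k_total}/{n_total}  p = {p_total:.4f}")
```

Running the script prints each experiment's p-value followed by the pooled result for each professor, so the individual and combined pictures can be compared side by side.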
Professor A's theory received overwhelming support, with approximately 20 votes, while Professor B's theory received only one vote.

If you aggregate the results of the experiments for each Professor, you will notice that each conducted 200 trials, and Professor B actually demonstrated a higher level of success than Professor A, with 71 as opposed to 69 successful trials. The one-tailed p-values for the combined trials are .0017 for Professor A and .0006 for Professor B.

To address the question of replication more explicitly, I also posed the following scenario. In December of 1987 it was decided to prematurely terminate a study on the effects of aspirin in reducing heart attacks because the data were so convincing (See e.g. Greenhouse and Greenhouse, 1988; Rosenthal, 1990a). The physician-subjects had been randomly assigned to take aspirin or a placebo. There were 104 heart attacks among the 11,037 subjects in the aspirin group, and 189 heart attacks among the 11,034 subjects in the placebo group (chi-square = 25.01, p < .00001).

After showing the results of that study, I presented the audience with two hypothetical experiments conducted to try to replicate the original result, with outcomes as follows:

REPLICATION #1                        REPLICATION #2
             Heart Attack                          Heart Attack
             Yes      No                           Yes      No
Aspirin       11    1156              Aspirin       20    2314
Placebo       19    1090              Placebo       48    2170
Chi-square = 2.596, p = .11           Chi-square = 13.206, p = .0003

I asked the audience to indicate which one they thought was a more successful replication. The audience chose the second one, as would most journal editors, because of the "significant p-value". In fact, the first replication has almost exactly the same proportion of heart attacks in the two groups as the original study, and is thus a very close replication of that result.
The second replication has very different proportions, and in fact the relative risk from the second study is not even contained in a 95% confidence interval for relative risk from the original study. The magnitude of the effect has been much more closely matched by the "non-significant" replication.

Fortunately, psychologists are beginning to notice that replication is not as straightforward as they were originally led to believe. A special issue of the Journal of Social Behavior and Personality was entirely devoted to the question of replication (Neuliep, 1990). In one of the articles, Rosenthal cautioned his colleagues: "Given the levels of statistical power at which we normally operate, we have no right to expect the proportion of significant results that we typically do expect, even if in nature there is a very real and very important effect" (Rosenthal, 1990b, p. 16).

Jacob Cohen, in his insightful article titled "Things I Have Learned (So Far)," identified another misconception common among social scientists: "Despite widespread misconceptions to the contrary, the rejection of a given null hypothesis gives us no basis for estimating the probability that a replication of the research will again result in rejecting that null hypothesis" (Cohen, 1990, p. 1307).

Cohen and Rosenthal both advocate the use of effect sizes as opposed to significance levels when defining the strength of an experimental effect. In general, effect sizes measure the amount by which the data deviate from the null hypothesis in terms of standardized units. For instance, the effect size for a two-sample t-test is usually defined to be the difference in the two means, divided by the standard deviation for the control group. This measure can be compared across studies without the dependence on sample size inherent in significance levels.
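The aspirin example can be checked numerically. The sketch below recomputes the chi-square statistics from the counts given in the text and compares relative risks across the three studies. Two assumptions are mine rather than the paper's: the chi-square is computed without a continuity correction, and the 95% interval uses the common log relative-risk standard error, since the paper does not state which interval method it used. All function names are illustrative.

```python
from math import exp, log, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def relative_risk(a, b, c, d):
    """Risk in the first row divided by risk in the second row."""
    return (a / (a + b)) / (c / (c + d))


def relative_risk_ci(a, b, c, d, z=1.96):
    """Approximate 95% interval from the log relative-risk standard error (an assumption)."""
    se = sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    centre = log(relative_risk(a, b, c, d))
    return exp(centre - z * se), exp(centre + z * se)


# Each entry: (aspirin yes, aspirin no, placebo yes, placebo no).
studies = {
    "original": (104, 11037 - 104, 189, 11034 - 189),
    "replication 1": (11, 1156, 19, 1090),
    "replication 2": (20, 2314, 48, 2170),
}

if __name__ == "__main__":
    low, high = relative_risk_ci(*studies["original"])
    print(f"original 95% interval for relative risk: ({low:.3f}, {high:.3f})")
    for name, table in studies.items():
        rr = relative_risk(*table)
        inside = low < rr < high
        print(f"{name}: chi-square = {chi_square_2x2(*table):.3f}, "
              f"relative risk = {rr:.3f}, inside original interval: {inside}")
```

Comparing each replication's relative risk with the original interval is one way to express replication in terms of effect size rather than significance level.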
(Of course there will still be variability in the sample effect sizes, decreasing as a function of sample size.) Comparison of effect sizes across studies is one of the major components of meta-analysis.

Similar arguments have recently been made in the medical literature. For example, Gardner and Altman (1986) stated that the use of p-values "to define two alternative outcomes - significant and not significant - is not helpful and encourages lazy thinking" (p. 746). They advocated the use of confidence intervals instead.

As discussed in the next section, the arguments used to conclude that parapsychology has failed to demonstrate a replicable effect hinge on these misconceptions of replication and failure to examine power. A more appropriate analysis would compare the effect sizes for similar experiments across experimenters and across time to see if there have been consistent effects of the same magnitude. Rosenthal also advocates this view of replication: "The traditional view of replication focuses on significance level as the relevant summary statistic of a study and evaluates the success of a replication in a dichotomous fashion. The newer, more useful view of replication focuses on effect size as the more important summary statistic of a study and evaluates the success of a replication not in a dichotomous but in a continuous fashion" (Rosenthal, 1990b, p. 28).

The dichotomous view of replication has been used throughout the history of parapsychology, by both parapsychologists and critics (Utts, 1988). For example, the National Academy of Sciences Report critically evaluated "significant" experiments, but entirely ignored "nonsignificant" experiments. In the next three sections we will examine some of the results in parapsychology using the broader, more appropriate definition of replication.
In doing so, we will show that the results are far more interesting than the critics would have us believe.

4. THE GANZFELD DEBATE IN PARAPSYCHOLOGY

An extensive debate took place in the mid-1980's between a parapsychologist and a critic, questioning whether or not a particular body of parapsychological data had demonstrated psi abilities. The experiments in question were all conducted using the ganzfeld setting (described below). Several authors were invited to write commentaries on the debate. As a result, this data base has been more thoroughly analyzed by both critics and proponents than any other, and provides a good source for studying replication in parapsychology.

The debate concluded with a detailed series of recommendations for further experiments, and left open the question of whether or not psi abilities had been demonstrated. A new series of experiments that followed the recommendations was conducted over the next few years. The results of the new experiments will be presented in Section 5.

4.1 Free-response Experiments

Recent experiments in parapsychology tend to use more complex target material than the cards and dice used in the early investigations, partially to alleviate boredom on the part of the subjects and partially because they are thought to "more nearly resemble the conditions of spontaneous psi occurrences" (Burdick and Kelly, 1977, p. 109). These experiments fall under the general heading of "free-response" experiments, because the subject is asked to give a verbal or written description of the target, rather than being forced to make a choice from a small discrete set of possibilities. Various types of target material have been used, including pictures, short segments of movies on video tapes, actual locations, and small objects.
Despite the more complex target material, the statistical methods used to analyze these experiments are similar to those for forced-choice experiments.

A typical experiment proceeds as follows. Before conducting any trials, a large pool of potential targets is assembled, usually in packets of four. Similarity of targets within a packet is kept to a minimum, for reasons made clear below. At the start of an experimental session, after the subject is sequestered in an isolated room, a target is selected at random from the pool. A sender is placed in another room with the target. The subject is asked to provide a verbal or written description of what he or she thinks is in the target, knowing only that it is a photograph, an object, etc.

After the subject's description has been recorded and secured against the potential for later alteration, a judge (who may or may not be the subject) is given a copy of the subject's description and the four possible targets that were in the packet with the correct target. A properly conducted experiment either uses video tapes or has two identical sets of target material and uses the duplicate set for this part of the process, to ensure that clues such as fingerprints don't give away the answer. Based on the subject's description, and of course on a blind basis, the judge is asked to either rank the four choices from most to least likely to have been the target, or to select the one from the four that seems to best match the subject's description. If ranks are used, the statistical analysis proceeds by summing the ranks over a series of trials and comparing the sum to what would be expected by chance.
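The sum-of-ranks comparison can be sketched as follows, under assumptions that are mine rather than the source's: trials are independent, the rank of the true target is uniform on 1 to 4 under chance, there are no ties, and a normal approximation is adequate. The function name and the example figures (20 trials, rank sum of 40) are hypothetical.

```python
import math
from statistics import NormalDist


def sum_of_ranks_z(rank_sum, n_trials, n_choices=4):
    """z-score for an observed sum of target ranks (rank 1 = best match).

    Under chance each rank is uniform on 1..n_choices, so a single trial has
    mean (n_choices + 1) / 2 and variance (n_choices**2 - 1) / 12.
    A positive z means the targets were ranked better than chance predicts.
    """
    expected = n_trials * (n_choices + 1) / 2
    variance = n_trials * (n_choices ** 2 - 1) / 12
    return (expected - rank_sum) / math.sqrt(variance)


# Hypothetical series: 20 trials whose target ranks sum to 40.
z = sum_of_ranks_z(rank_sum=40, n_trials=20)
p_one_tailed = 1 - NormalDist().cdf(z)
print(round(z, 2), round(p_one_tailed, 4))
```

With four choices the chance expectation is 2.5 per trial, so a rank sum well below 2.5 times the number of trials is what counts as evidence against chance.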
If the selection method is used, a "direct hit" occurs if the correct target is chosen, and the number of direct hits over a series of trials is compared to the number expected in a binomial experiment with p = .25.

Note that the subjects' responses cannot be considered to be "random" in any sense, so probability assessments are based on the random selection of the target and decoys. In a correctly designed experiment, the probability of a direct hit by chance is .25 on each trial, regardless of the response, and the trials are independent. These and other issues related to analyzing free-response experiments are discussed by Utts (1989b).

4.2 The Psi Ganzfeld Experiments

The ganzfeld procedure is a particular kind of free-response experiment utilizing a perceptual isolation technique originally developed by Gestalt psychologists for other purposes. Evidence from spontaneous case studies and experimental work had led parapsychologists to a model proposing that psychic functioning may be masked by sensory input and by inattention to internal states (Honorton, 1977). The ganzfeld procedure was specifically designed to test whether or not reduction of external "noise" would enhance psi performance.

In these experiments, the subject is placed in a comfortable reclining chair in an acoustically shielded room. To create a mild form of sensory deprivation, the subject wears headphones through which white noise is played, and stares into a constant field of red light. This is achieved by taping halved translucent ping-pong balls over the eyes and then illuminating the room with red light. In the psi ganzfeld experiment, the subject speaks into a microphone and attempts to describe the target material being observed by the sender in a distant room.
At the 1982 Annual Meeting of the Parapsychological Association, a debate took place over the degree to which the results of the psi ganzfeld experiments constituted evidence of psi abilities. Psychologist and critic Ray Hyman and parapsychologist Charles Honorton each analyzed the results of all known psi ganzfeld experiments to date, and reached strikingly different conclusions. The debate continued with the publication of their arguments in separate articles in the March 1985 issue of the Journal of Parapsychology. Finally, in the December 1986 issue of the Journal of Parapsychology, Hyman and Honorton wrote a joint article in which they highlighted their agreements and disagreements, and outlined detailed criteria for future experiments. That same issue contained commentaries on the debate by ten other authors.

The data base analyzed by Hyman and Honorton consisted of results taken from 34 reports written by a total of 47 authors. Honorton counted 42 separate experiments described in the reports, of which 28 reported enough information to determine the number of direct hits achieved. Twenty-three of the studies (55%) were classified by Honorton as having achieved statistical significance at .05.

4.3 The Vote-Counting Debate

Vote-counting is the term commonly used for the technique of drawing inferences about an experimental effect by counting the number of significant versus non-significant studies of the effect. Hedges and Olkin (1985) give a detailed analysis of the inadequacy of this method, showing that it is more and more likely to make the wrong decision as the number of studies increases. While Hyman acknowledged that "vote-counting raises many problems" (Hyman, 1985b, p. 8), he nonetheless spent half of his critique of the ganzfeld studies showing why Honorton's count of 55% was wrong.
Hyman's first complaint was that several of the studies contained multiple conditions, each of which should be considered as a separate study. Using this definition he counted 80 studies (thus further reducing the sample sizes of the individual studies), of which 25 (31%) were "successful." Honorton's response to this was to invite readers to examine the studies and decide for themselves if the varying conditions constituted separate experiments.

Hyman next postulated that there was selection bias, so that significant studies were more likely to be reported. He raised some important issues about how pilot studies may be terminated and not reported if they don't show significant results, or may at least be subject to optional stopping, allowing the experimenter to determine the number of trials. He also presented a chi-square analysis that "suggests a tendency to report studies with a small sample only if they have significant results" (Hyman, 1985b, p. 14), but I have questioned his analysis elsewhere (Utts, 1986, p. 397).

Honorton refuted Hyman's argument with four rejoinders (Honorton, 1985b, p. 66). In addition to reinterpreting Hyman's chi-square analysis, Honorton pointed out that the Parapsychological Association has an official policy encouraging the publication of non-significant results in its journals and proceedings, that a large number of reported ganzfeld studies did not achieve statistical significance, and that there would have to be 15 studies in the "file-drawer" for every one reported to cancel out the observed significant results.

The remainder of Hyman's vote-counting analysis consisted of showing that the effective error rate for each study was actually much higher than the nominal 5%.
For example, each study could have been analyzed using the direct hit measure, the sum of ranks measure, or one of two other measures used for free-response analyses. Hyman carried out a simulation study that showed the true error rate would be .22 if "significance" was defined by requiring at least one of these four measures to achieve the .05 level. He suggested several other ways in which multiple testing could occur, and concluded that the effective error rate in each experiment was not the nominal .05, but rather was probably close to the 31% he had determined to be the actual success rate in his vote-count.

Honorton acknowledged that there was a multiple testing problem, but he had a two-fold response. First, he applied a Bonferroni correction and found that the number of significant studies (using his definition of a study) only dropped from 55% to 45%. Next, he proposed that a uniform index of success be applied to all studies. He used the number of direct hits, since it was by far the most commonly reported measure and was the measure used in the first published psi ganzfeld study. He then conducted a detailed analysis of the 28 studies reporting direct hits and found that 43% were significant at .05 on that measure alone. Further, he showed that significant effects were reported by six of the 10 independent investigators, and thus were not due to just one or two investigators or laboratories. He also noted that success rates were very similar for reports published in refereed journals and those published in unrefereed monographs and abstracts.

While Hyman's arguments identified issues such as selective reporting and optional stopping that should be considered in any meta-analysis, the dependence of significance levels on sample size makes the vote-counting technique almost useless for assessing the magnitude of the effect.
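The multiple-testing issue raised above can be illustrated with a short calculation. It assumes, purely for simplicity, that the four measures are statistically independent. That is my assumption, not Hyman's: his figure of .22 came from a simulation and need not match. The function names are hypothetical.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance that at least one of n_tests independent tests is significant under the null."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the overall error rate at or below alpha."""
    return alpha / n_tests


fwer = familywise_error_rate(0.05, 4)
threshold = bonferroni_threshold(0.05, 4)
print(round(fwer, 4), threshold)
```

Under independence, four tests at the .05 level give an overall error rate of roughly .19, which is why a correction such as Bonferroni's, testing each measure at .0125, is applied before counting a study as significant.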
Consider for example the 24 studies where the direct hit measure was reported and the chance probability of a direct hit was .25, the most common type of study in the data base. (There were 4 direct hit studies with other chance probabilities and 14 that did not report direct hits.) Of the 24 studies, 13 (54%) were "nonsignificant" at α = .05, one-tailed. But if the 367 trials in these "failed replications" are combined, there are 106 direct hits, z = 1.66, and p = .0485, one-tailed. This is reminiscent of the dilemma of Professor B in Section 3.

Power is typically very low for these studies. The median sample size for the studies reporting direct hits was 28. If there is a real effect and it increases the success probability from the chance .25 to an actual .33 (a value whose rationale will be made clear below), the power for a study with 28 trials is only .181 (Utts, 1986). It should be no surprise that there is a "repeatability" problem in parapsychology.

4.4 Flaw Analysis and Future Recommendations

The second half of Hyman's paper consisted of a "Meta-Analysis of Flaws and Successful Outcomes" (1985b, p. 30), designed to explore whether or not various measures of success were related to specific flaws in the experiments. While many critics have argued that the results in parapsychology can be explained by experimental flaws, Hyman's analysis was the first to attempt to quantify the relationship between flaws and significant results.

Hyman identified 12 potential flaws in the ganzfeld experiments, such as inadequate randomization, multiple tests used without adjusting the significance level (thus inflating the significance level from the nominal 5%), and failure to use a duplicate set of targets for the judging process (thus allowing possible clues such as fingerprints).
Using cluster and factor analyses, the 12 binary flaw variables were combined into three new variables, which Hyman named General Security, Statistics and Controls.

Several analyses were then conducted. The one reported with the most detail is a factor analysis utilizing 17 variables for each of 36 studies. Four factors emerged from the analysis. From these, Hyman concluded that security had increased over the years, that the significance level tended to be inflated the most for the most complex studies, and that both effect size and level of significance were correlated with the existence of flaws.

Following his factor analysis, Hyman picked the three flaws that seemed to be most highly correlated with success, which were inadequate attention to both randomization and documentation, and the potential for ordinary communication between the sender and receiver. A regression equation was then computed using each of the three flaws as dummy variables, and the effect size for the experiment as the dependent variable. From this equation, Hyman concluded that a study without these three flaws would be predicted to have a hit rate of 27%. He concluded that this is "well within the statistical neighborhood of the 25% chance rate" (ibid, p. 37), and thus "the ganzfeld psi data base, despite initial impressions, is inadequate either to support the contention of a repeatable study or to demonstrate the reality of psi" (ibid, p. 38).

Honorton discounted both Hyman's flaw classification and his analysis. He did not deny that flaws existed, but objected that Hyman's analysis was faulty and impossible to interpret. Honorton asked psychometrician David Saunders to write an Appendix to his article, evaluating Hyman's analysis.
Saunders first criticized Hyman's use of a factor analysis with 17 variables (many of which were dichotomous) and only 36 cases, and concluded that "the entire analysis is meaningless" (Saunders, 1985, p. 87). He then noted that Hyman's choice of the three flaws to include in his regression analysis constituted a clear case of multiple analysis, since there were 84 possible sets of three that could have been selected (out of nine potential flaws), and Hyman chose the set most highly correlated with effect size. Again, Saunders concluded that "any interpretation drawn from [the regression analysis] must be regarded as meaningless" (ibid, p. 88).

Hyman's results were also contradicted by Harris and Rosenthal (1988b) in an analysis requested by Hyman in his capacity as Chair of the National Academy of Sciences' Subcommittee on Parapsychology. Using Hyman's flaw classifications and a multivariate analysis, Harris and Rosenthal concluded that "Our analysis of the effects of flaws on study outcome lends no support to the hypothesis that ganzfeld research results are a significant function of the set of flaw variables" (1988b, p. 3).

Hyman and Honorton were in the process of preparing papers for a second round of debate when they were invited to lunch together at the 1986 Meeting of the Parapsychological Association. They discovered that they were in general agreement on several major issues, and decided to coauthor a "Joint Communique" (Hyman and Honorton, 1986). It is clear from their paper that they both thought it was more important to set the stage for future experimentation than to continue the technical arguments over the current data base. In the abstract to their paper they wrote: "We agree that there is an overall significant effect in this data base that cannot reasonably be explained by selective reporting or multiple analysis.
We continue to differ over the degree to which the effect constitutes evidence for psi, but we agree that the final verdict awaits the outcome of future experiments conducted by a broader range of investigators and according to more stringent standards" (Ibid, p. 351).

The paper then outlined what these standards should be. They included controls against any kind of sensory leakage, thorough testing and documentation of randomization methods used, better reporting of judging and feedback protocols, control for multiple analyses, and advance specification of number of trials and type of experiment. Indeed, any area of research could benefit from such a careful list of procedural recommendations.

4.5 Rosenthal's Meta-Analysis

The same issue of the Journal of Parapsychology in which the Joint Communique appeared also carried commentaries on the debate by 10 separate authors. In his commentary, psychologist Robert Rosenthal, one of the pioneers of meta-analysis in psychology, summarized the aspects of Hyman's and Honorton's work that would typically be included in a meta-analysis (Rosenthal, 1986). It is worth reviewing Rosenthal's results so that they can be used as a basis of comparison for the more recent psi ganzfeld studies reported in Section 5.

Rosenthal, like Hyman and Honorton, focused only on the 28 studies for which direct hits were known. He chose to use an effect size measure called Cohen's h, which is the difference between the arcsin transformed proportions of direct hits that were observed and expected:

$$h = 2\left(\arcsin\sqrt{\hat{p}} - \arcsin\sqrt{p}\right)$$

where $\hat{p}$ is the observed proportion of direct hits and $p$ is the proportion expected by chance. One advantage of this measure over the difference in raw proportions is that it can be used to compare experiments with different chance hit rates. If the observed and expected numbers of hits were identical, the effect size would be zero.
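Cohen's h, as defined above, is simple to compute. The sketch below also includes the inverse transformation, which converts an effect size back into the hit rate it implies for a given chance rate. The function names and the 40% example are illustrative assumptions.

```python
import math


def cohens_h(observed_rate, chance_rate):
    """Cohen's h: twice the difference of the arcsine-transformed proportions."""
    return 2 * (math.asin(math.sqrt(observed_rate)) - math.asin(math.sqrt(chance_rate)))


def hit_rate_from_h(h, chance_rate):
    """Observed hit rate implied by an effect size h at the given chance rate."""
    return math.sin(h / 2 + math.asin(math.sqrt(chance_rate))) ** 2


# Illustrative values: a 40% hit rate when 25% is expected by chance.
example_h = cohens_h(0.40, 0.25)
print(round(example_h, 2))
print(round(hit_rate_from_h(example_h, 0.25), 2))
```

The inverse is useful when a summary is reported only as an effect size and the reader wants the equivalent hit rate against a chance rate of .25.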
Of the 28 studies, 23 (82%) had effect sizes greater than zero, with a median effect size of .32 and a mean of .28. These correspond to direct hit rates of .40 and .38 respectively, when .25 is expected by chance. A 95% confidence interval for the true effect size is from .11 to .45, corresponding to direct hit rates of from .30 to .46 when chance is .25.

A common technique in meta-analysis is to calculate a "combined z," found by summing the individual z scores and dividing by the square root of the number of studies. The result should have a standard normal distribution if each z score has a standard normal distribution. For the ganzfeld studies, Rosenthal reported a combined z of 6.60 with a p-value of 3.37 x 10^[exponent illegible in source]. He also reiterated Honorton's file-drawer assessment by calculating that there would have to be 423 studies unreported to negate the significant effect in the 28 direct hit studies.

Finally, Rosenthal acknowledged that because of the flaws in the data base and the potential for at least a small file drawer effect, the true average effect size was probably closer to .18 than .28. He concluded, "Thus, when the accuracy rate expected under the null is 1/4, we might estimate the obtained accuracy rate to be about 1/3" (Ibid, p. 333). This is the value used for the earlier power calculation.

It is worth mentioning that Rosenthal was commissioned by the National Academy of Sciences to prepare a background paper to accompany its 1988 report on parapsychology. That paper (Harris and Rosenthal, 1988a) contained much of the same analysis as his commentary summarized above. Ironically, the discussion of the ganzfeld work in the National Academy Report focused on Hyman's 1985 analysis, but never mentioned the work it had commissioned Rosenthal to perform, which contradicted the final conclusion in the report.
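The earlier power calculation from Section 4.3 can be reproduced approximately with an exact binomial computation. This is a sketch under stated assumptions: a one-tailed test at .05, rejection when the number of direct hits reaches the smallest count whose upper-tail probability under chance is at most .05, and a true hit rate of .33. The function names are hypothetical, and the original calculation in Utts (1986) may have been carried out differently.

```python
from math import comb


def binom_upper_tail(k, n, p):
    """P(X >= k) for X distributed Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def critical_hits(n, chance_rate, alpha=0.05):
    """Smallest hit count whose upper-tail probability under chance is at most alpha."""
    for k in range(n + 2):  # k = n + 1 always qualifies (empty tail), so a value is returned
        if binom_upper_tail(k, n, chance_rate) <= alpha:
            return k


def power(n, chance_rate, true_rate, alpha=0.05):
    """Probability of reaching the critical hit count when the true hit rate is true_rate."""
    return binom_upper_tail(critical_hits(n, chance_rate, alpha), n, true_rate)


k_crit = critical_hits(28, 0.25)
study_power = power(28, 0.25, 0.33)
print(k_crit, round(study_power, 3))
```

With 28 trials a study needs 12 or more direct hits to reach significance, and under a true hit rate of .33 that happens in fewer than one study in five, which is consistent with the power of .181 quoted in Section 4.3.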
5. A META-ANALYSIS OF RECENT GANZFELD EXPERIMENTS

After the initial exchange with Hyman at the 1982 Parapsychological Association Meeting, Honorton and his colleagues developed an automated ganzfeld experiment that was designed to eliminate the methodological flaws identified by Hyman. The execution and reporting of the experiments followed the detailed guidelines agreed upon by Hyman and Honorton.

Using this "autoganzfeld" experiment, eleven experimental series were conducted by eight experimenters between February 1983 and September 1989, when the equipment had to be dismantled due to lack of funding. In this section the results of these experiments are summarized and compared to the earlier ganzfeld studies. Much of the information is derived from Honorton et al (1990).

5.1 The Automated Ganzfeld Procedure

Like earlier ganzfeld studies, the "autoganzfeld" experiments require four participants. The first is the Receiver (R), who attempts to identify the target material being observed by the Sender (S). The Experimenter (E) prepares R for the task, elicits the response from R, and supervises R's judging of the response against the four potential targets. (Judging is double-blind; E does not know which is the correct target.) The fourth participant is the lab assistant (LA) whose only task is to instruct the computer to randomly select the target. No one involved in the experiment knows the identity of the target.

Both R and S are sequestered in sound-isolated, electrically shielded rooms. R is prepared as in earlier ganzfeld studies, with white noise and a field of red light.
In a non-adjacent room, S watches the target material on a television and can hear R's target description ("mentation") as it is being given. The mentation is also tape-recorded.

The judging process takes place immediately after the 30 minute sending period. On a TV monitor in the isolated room, R views the four choices from the target pack that contains the actual target. R is asked to rate each one according to how closely it matches the ganzfeld mentation. The ratings are converted to ranks, and if the correct target is ranked first, a direct hit is scored. The entire process is automatically recorded by the computer. The computer then displays the correct choice to R as feedback.

There were 160 pre-selected targets, used with replacement, in ten of the eleven series. They were arranged in packets of 4, and the decoys for a given target were always the remaining three in the same set. Thus, even if a particular target in a set were consistently favored by R's, the probability of a direct hit under the null hypothesis would remain at 1/4. Popular targets should be no more likely to be selected by the computer's random number generator than any of the others in the set. The selection of the target by the computer is the only source of randomness in these experiments. This is an important point, and one that is often misunderstood. (See Utts, 1989b for elucidation.)

Eighty of the targets were "dynamic," consisting of scenes from movies, documentaries and cartoons; and 80 were "static", consisting of photographs, art prints, and advertisements. The four targets within each set were all of the same type. Earlier studies indicated that dynamic targets were more likely to produce successful results, and one of the goals of the new experiments was to test that theory.
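The point made above, that response preferences do not change the chance hit rate, can be checked with a small exact calculation. The sketch assumes that under the null hypothesis the receiver's choice is independent of the target, which is selected uniformly at random from the packet. The function name and the preference values are hypothetical.

```python
from fractions import Fraction


def direct_hit_probability(response_preferences):
    """Chance probability of a direct hit for a packet with a uniformly selected target.

    response_preferences[i] is the probability that the receiver ranks item i first.
    Under the null this choice is independent of which item is the target.
    """
    n_items = len(response_preferences)
    target_probability = Fraction(1, n_items)
    return sum(Fraction(pref) * target_probability for pref in response_preferences)


# A receiver who strongly favors the first item in the packet.
biased = [Fraction(7, 10), Fraction(1, 10), Fraction(1, 10), Fraction(1, 10)]
# A receiver with no preference at all.
unbiased = [Fraction(1, 4)] * 4

print(direct_hit_probability(biased), direct_hit_probability(unbiased))
```

Both cases give exactly 1/4, because the preferences sum to one and each is weighted by the same target probability.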
The randomization procedure used to select the target and the order of presentation for judging was thoroughly tested before and during the experiments. A detailed description is given by Honorton et al (1990, p. 118-120).

Three of the eleven series were pilot series, five were formal series with novice receivers, and three were formal series with experienced receivers. The last series with experienced receivers was the only one that did not use the 160 targets. Instead, it used only one set of four dynamic targets in which one target had previously received several first place ranks, and one had never received a first place rank. The receivers, none of whom had had prior exposure to that target pack, were not aware that only one target pack was being used. They each contributed one session only to the series. This will be called the "special series" in what follows.

Except for two of the pilot series, numbers of trials were planned in advance for each series. Unfortunately, three of the formal series were not yet completed when the funding ran out, including the special series, and one pilot study with advance planning was terminated early when the experimenter relocated. There were no unreported trials during the six-year period under review, so there was no "file-drawer".

Overall, there were 183 R's who contributed only one trial and 58 who contributed more than one, for a total of 241 participants and 355 trials. Only twenty-three R's had previously participated in ganzfeld experiments and 194 R's (81%) had never participated in any parapsychological research.
5.2 Results

While acknowledging that no probabilistic conclusions can be drawn from qualitative data, Honorton et al (1990) included several examples of session excerpts that R's identified as providing the basis for their target rating. To give a flavor for the dream-like quality of the mentation and the amount of information that can be lost by only assigning a rank, the first example is reproduced here. The target was a painting by Salvador Dali called "Christ Crucified." The correct target received a first place rank. The part of the mentation R used to make this assessment read:

"... I think of guides, like spirit guides, leading me and I come into a court with a king. It's quiet.... It's like heaven. The king is something like Jesus. Woman. Now I'm just sort of summersaulting through heaven.... Brooding.... Aztecs, the Sun God.... High priest.... Fear.... Graves. Woman. Prayer.... Funeral.... Dark. Death.... Souls.... Ten Commandments. Moses...." (Ibid, p. 120).

Over all eleven series there were 122 direct hits in the 355 trials, for a hit rate of 34.4% (exact binomial p-value = .00005) when 25% were expected by chance. Cohen's h is .20, and a 95% confidence interval for the overall hit rate is from .30 to .39. This calculation assumes, of course, that the probability of a direct hit is constant and independent across trials, an assumption that may be questionable except under the null hypothesis of no psi abilities.

Honorton et al also calculated effect sizes for each of the eleven series and each of the eight experimenters. All but one of the series (the first novice series) had positive effect sizes, as did all of the experimenters.
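The overall summary figures quoted above (122 direct hits in 355 trials) can be checked with a short script. It is a sketch only: it assumes a one-tailed exact binomial test against a chance rate of .25 and the usual definition of Cohen's h, and small rounding differences from the published values are to be expected. The function names are hypothetical.

```python
import math
from math import comb


def exact_binomial_p(hits, trials, chance_rate):
    """One-tailed exact p-value: P(X >= hits) for X distributed Binomial(trials, chance_rate)."""
    return sum(
        comb(trials, i) * chance_rate**i * (1 - chance_rate) ** (trials - i)
        for i in range(hits, trials + 1)
    )


def cohens_h(observed_rate, chance_rate):
    """Cohen's h: twice the difference of the arcsine-transformed proportions."""
    return 2 * (math.asin(math.sqrt(observed_rate)) - math.asin(math.sqrt(chance_rate)))


hits, trials, chance = 122, 355, 0.25
hit_rate = hits / trials
p_value = exact_binomial_p(hits, trials, chance)
effect_size = cohens_h(hit_rate, chance)
print(round(hit_rate, 3), p_value, round(effect_size, 2))
```

The same two functions can be applied to any of the subgroup comparisons, since each is reported as a count of direct hits out of a number of trials.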
The special series with experienced R's had an exceptionally high effect size with h = .81, corresponding to 16 direct hits out of 25 trials (64%), but the remaining series and the experimenters had relatively homogeneous effect sizes given the amount of variability expected by chance. If the special series is removed, the overall hit rate is 32.1%, h = .16. Thus, the positive effects are not due to just one series or one experimenter.

Seventy-one of the 218 trials contributed by novices were direct hits (32.5%, h = .17), compared with 51 hits in the 137 trials by those with prior ganzfeld experience (37%, h = .26). The hit rates and effect sizes were 31% (h = .14) for the combined pilot series, 32.5% (h = .17) for the combined formal novice series, and 41.5% (h = .35) for the combined experienced series. The last figure drops to 31.6% if the outlier series is removed. Finally, without the outlier series the hit rate for the combined series where all of the planned trials were completed was 31.2% (h = .14) while it was 35% (h = .22) for the combined series that were terminated early. Thus, optional stopping cannot account for the positive effect.

There were two interesting comparisons that had been suggested by earlier work and were preplanned in these experiments. The first was to compare results for trials with dynamic targets with those for static targets. In the 190 dynamic target sessions there were 77 direct hits (40%, h = .32) and for the static targets there were 45 hits in 165 trials (27%, h = .05), thus indicating that dynamic targets produced far more successful results.

The second comparison of interest was whether or not the sender was a friend of the receiver. This was a choice the receiver could make. If he or she did not bring a friend, a lab member acted as sender.
There were 211 trials with friends as senders (some of whom were also lab staff), resulting in 76 direct hits (36%, h = .24). Four trials used no sender. The remaining 140 trials used non-friend lab staff as senders and resulted in 46 direct hits (33%, h = .18). Thus, trials with friends as senders were slightly more successful than those without.

Consonant with the definition of replication based on consistent effect sizes, it is informative to compare the autoganzfeld experiments with the direct hit studies in the previous data base. The overall success rates are extremely similar. The overall direct hit rate was 34.4% for the autoganzfeld studies and was 38% for the comparable direct hit studies in the earlier meta-analysis. Rosenthal's (1986) adjustment for flaws had placed a more conservative estimate at 33%, very close to the observed 34.4% in the new studies.

One limitation of this work is that the autoganzfeld studies, while conducted by eight experimenters, all used the same equipment in the same laboratory. Unfortunately, the level of funding available in parapsychology and the cost in time and equipment to conduct proper experiments make it difficult to amass large amounts of data across laboratories. Another autoganzfeld laboratory is currently being constructed at the University of Edinburgh in Scotland, so interlaboratory comparisons may be possible in the near future.

Based on the effect size observed to date, large samples are needed to achieve reasonable power. If there is a constant effect across all trials, resulting in 33% direct hits when 25% are expected by chance, to achieve a one-tailed significance level of .05 with 95% probability would require 345 sessions.

We end this section by returning to the aspirin and heart attack example in Section 3, and expanding a comparison noted by Atkinson et al (1990, p.
237). Computing the equivalent of Cohen's h for comparing observed heart attack rates in the aspirin and placebo groups results in h = .068. Thus, the effect size observed in the ganzfeld data base is triple the much-publicized effect of aspirin on heart attacks.

6. OTHER META-ANALYSES IN PARAPSYCHOLOGY

Four additional meta-analyses have been conducted in various areas of parapsychology since the original ganzfeld meta-analyses were reported. Three of the four analyses focused on evidence of psi abilities, while the fourth examined the relationship between extraversion and psychic functioning. In this section, each of the four analyses will be briefly summarized.

There are only a handful of English-language journals and proceedings in parapsychology, so retrieval of the relevant studies in each of the four cases was simple to accomplish by searching those sources in detail and by searching other bibliographic data bases for keywords.

Each analysis included an overall summary, an analysis of the quality of the studies versus the size of the effect, and a "file-drawer" analysis to determine the possible number of unreported studies. Three of the four also contained comparisons across various conditions.

6.1 Forced-choice Precognition Experiments

Honorton and Ferrari (1989) analyzed forced-choice experiments conducted from 1935 to 1987, in which the target material was randomly selected after the subject had attempted to predict what it would be. The time delay in selecting the target ranged from under a second to one year. Target material included items as diverse as ESP cards and automated random number generators. Two investigators, S.G. Soal and Walter J. Levy, were not included because some of their work has been suspected to be fraudulent.

Overall Results.
There were 309 studies reported by 62 senior authors, including more than 50,000 subjects and nearly two million individual trials. Honorton and Ferrari used z/√n as the measure of effect size (ES) for each study, where n was the number of Bernoulli trials in the study. They reported a mean ES of 0.020, and a mean z-score of 0.65 over all studies. They also reported a combined z of 11.41, p = 6.3 × 10^-25. Thirty percent (92) of the studies were statistically significant at α = .05. The mean ES per investigator was 0.033, and the significant results were not due to just a few investigators.

Quality. Eight dichotomous quality measures were assigned to each study, resulting in possible scores from zero for the lowest quality, to eight for the highest. They included features such as adequate randomization, preplanned analysis, and automated recording of the results. The correlation between study quality and effect size was 0.081, indicating a slight tendency for higher quality studies to be more successful, contrary to claims by critics that the opposite would be true. There was a clear relationship between quality and year of publication, presumably because over the years experimenters in parapsychology have responded to suggestions from critics for improving their methodology.

File-drawer. Following Rosenthal (1984), the authors calculated the "fail-safe N" indicating the number of unreported studies that would have to be sitting in file-drawers in order to negate the significant effect. They found N = 14,268, or a ratio of 46 unreported studies for each one reported. They also followed a suggestion by Dawes et al (1984) and computed the mean z for all studies with z > 1.65. If such studies were a random sample from the upper 5% tail of a N(0,1) distribution, the mean z would be 2.06. In this case it was 3.61.
They concluded that selective reporting could not explain these results.

Comparisons. Four variables were identified that appeared to have a systematic relationship to study outcome. The first was that the 25 studies using subjects selected on the basis of good past performance were more successful than the 223 using unselected subjects, with mean effect sizes of .051 and .008, respectively. Second, the 97 studies testing subjects individually were more successful than the 105 studies that used group testing; mean effect sizes were .021 and .004, respectively.

Timing of feedback was the third moderating variable, but information was only available for 104 studies. The 15 studies that never told the subjects what the targets were had a mean effect size of -.001. Feedback after each trial produced the best results; the mean ES for the 47 studies was .035. Feedback after each set of trials resulted in a mean ES of .023 (21 studies), while delayed feedback (also 21 studies) yielded a mean ES of only .009. There is a clear ordering: as the gap between time of feedback and time of the actual guesses decreased, effect sizes increased.

The fourth variable was the time interval between the subject's guess and the actual target selection, available for 144 studies. The best results were for the 31 studies that generated targets less than a second after the guess (mean ES = .045), while the worst were for the 7 studies that delayed target selection by at least a month (mean ES = .001). The mean effect sizes showed a clear trend, decreasing in order as the time interval increased from minutes to hours to days to weeks to months.
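Two of the quantities used in Section 6.1 are easy to compute directly. The following sketch, which uses only the Python standard library, shows the z/√n effect size and the expected mean z for studies drawn at random from the upper 5% tail of a N(0,1) distribution. The function names and the example inputs to effect_size are hypothetical; the code illustrates the arithmetic and is not the authors' own procedure.

```python
import math
from statistics import NormalDist


def effect_size(z, n):
    """Effect size z / sqrt(n) for a study with n Bernoulli trials."""
    return z / math.sqrt(n)


def upper_tail_mean(tail_prob=0.05):
    """Mean of a standard normal variable, given that it falls in its upper tail."""
    std_normal = NormalDist()
    cutoff = std_normal.inv_cdf(1 - tail_prob)  # about 1.645 for a 5% tail
    return std_normal.pdf(cutoff) / tail_prob


# Hypothetical study: z = 2.0 from 10,000 trials.
print(effect_size(2.0, 10_000))          # 0.02
# Expected mean z if only chance results from the top 5% were reported.
print(round(upper_tail_mean(0.05), 2))   # 2.06
# Ratio of fail-safe N to the number of reported studies.
print(round(14268 / 309))                # 46
```

The observed mean of 3.61 for the significant studies lies well above the 2.06 expected from the selective-reporting model, which is the comparison the text relies on.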
6.2 Attempts to Influence Random Physical Systems

Radin and Nelson (1989) examined studies designed to test the hypothesis that "The statistical output of an electronic RNG [random number generator] is correlated with observer intention in accordance with prespecified instructions" (p. 1502). These experiments typically involve RNGs based on radioactive decay, electronic noise, or pseudorandom number sequences seeded with true random sources. Usually the subject is instructed to try to influence the results of a string of binary trials by mental intention alone. A typical protocol would ask a subject to press a button (thus starting the collection of a fixed-length sequence of bits), and then try to influence the random source to produce more zeroes or more ones. A run might consist of three successive button presses, one each in which the desired result was more zeroes or more ones, and one as a control with no conscious intention. A z score would then be computed for each button press.

The 832 studies in the analysis were conducted from 1959 to 1987, and included 235 "control" studies in which the output of the RNGs was recorded but there was no conscious intention involved. These were usually conducted before and during the experimental series, as tests of the RNGs.

Results. The effect size measure used was again z/√n, where z was positive if more bits of the specified type were achieved. The mean effect size for control studies was not significantly different from zero (-1.0 × 10^-5). The mean effect size for the experimental studies was also very small, 3.2 × 10^-4, but it was significantly higher than the mean ES for the control studies (z = 4.1).

Quality. Sixteen quality measures were defined and assigned to each study, under the four general categories of procedures, statistics, data, and the RNG device.
A score of 16 reflected the highest quality. The authors regressed mean effect size on mean quality for each investigator, and found a slope of 2.5 × 10^-5 with standard error of 3.2 × 10^-5, indicating little relationship between quality and outcome. They also calculated a weighted mean effect size, using quality scores as weights, and found that it was very similar to the unweighted mean ES. They concluded that "differences in methodological quality are not significant predictors of effect size" (p. 1507).

File-drawer. Radin and Nelson used several methods for estimating the number of unreported studies (p. 1508-10). Their estimates ranged from 200 to 1000 based on models assuming that all significant studies were reported. They also calculated the fail-safe N to be 54,000.

6.3 Attempts to Influence Dice

Radin and Ferrari (1991) examined 148 studies, published from 1935 to 1987, designed to test whether or not consciousness can influence the results of tossing dice. They also found 31 "control" studies in which no conscious intention was involved.

Results. The effect size measure used was z/√n, where z was based on the number of throws in which the die landed with the desired face (or faces) up, in n throws. The weighted mean ES for the experimental studies was 0.0122 with a standard error of 0.00062; for the control studies the mean and standard error were 0.00093 and 0.00255, respectively. Weights for each study were determined by quality, giving more weight to high quality studies. Combined z scores for the experimental and control studies were reported by Radin and Ferrari to be 18.2 and 0.18, respectively.

Quality. Eleven dichotomous quality measures were assigned, ranging from automated recording to whether or not control studies were interspersed with the experimental studies.
The final quality score for each study combined these with information on method of tossing the dice, and with source of subject (defined below). A regression of quality score versus effect size resulted in a slope of -.002, with a standard error of .0011. However, when effect sizes were weighted by sample size there was a significant relationship between quality and effect size, leading Radin and Ferrari to conclude that higher quality studies produced lower weighted effect sizes.

File-drawer. Radin and Ferrari calculated Rosenthal's fail-safe N for this analysis to be 17,974. Using the assumption that all significant studies were reported, they estimated the number of unreported studies to be 1,152. As a final assessment, they compared studies published before and after 1975, when the Journal of Parapsychology adopted an official policy of publishing nonsignificant results. They concluded, based on that analysis, that more nonsignificant studies were published after 1975, and thus "We must consider the overall (1935-1987) data base as suspect with respect to the filedrawer problem."

Comparisons. Radin and Ferrari noted that there was bias in both the experimental and control studies across die face. Six was the face most likely to come up, consistent with the observation that it has the least mass. Therefore, they examined results for the subset of 69 studies in which targets were evenly balanced among the six faces. They still found a significant effect, with mean and standard error for effect size of 8.6 × 10^-3 and 1.1 × 10^-3, respectively. The combined z was 7.617 for these studies.

They also compared effect sizes across types of subjects used in the studies, categorizing them as unselected, experimenter and other subjects, experimenter as sole subject, and specially selected subjects.
Like Honorton and Ferrari (1989), they found the highest mean ES for studies with selected subjects; it was approximately .02, more than twice that for unselected subjects.

6.4 Extraversion and ESP Performance

Honorton, Ferrari and Bem (1990) conducted a meta-analysis to examine the relationship between scores on tests of extraversion and scores on psi-related tasks. They found 60 studies by 17 investigators, conducted from 1945 to 1983.

Results. The effect size measure used for this analysis was the correlation between each subject's extraversion score and ESP score. A variety of measures had been used for both scores across studies, so various correlation coefficients were used. Nonetheless, a stem and leaf diagram of the correlations showed an approximate bell shape with mean and standard deviation of .19 and .26, respectively, and with an additional outlier at r = .91. Honorton et al reported that when weighted by degrees of freedom, the weighted mean r was .14, with a 95% confidence interval covering .10 to .19.

Forced-choice versus Free-response Results. Because forced-choice and free-response tests differ qualitatively, Honorton et al chose to examine their relationship to extraversion separately. They found that for free-response studies there was a significant correlation between extraversion and ESP scores, with mean r = .20 and z = 4.46. Further, this effect was homogeneous across both investigators and extraversion scales. For forced-choice studies, there was a significant correlation between ESP and extraversion, but only for those studies that reported the ESP results to the subjects before measuring extraversion. Honorton et al speculated that the relationship was an artifact, in which extraversion scores were temporarily inflated as a result of positive feedback on ESP performance.

Confirmation with New Data.
Following the extraversion/ESP meta-analysis, Honorton et al attempted to confirm the relationship using the autoganzfeld data base. Extraversion scores based on the Myers-Briggs Type Indicator were available for 221 of the 241 subjects who had participated in autoganzfeld studies. The correlation between extraversion scores and ganzfeld rating scores was r = .18, with a 95% confidence interval from .05 to .30. This is consistent with the mean correlation of r = .20 for free-response experiments, determined from the meta-analysis. These correlations indicate that extraverted subjects can produce higher scores in free-response ESP tests.

7. CONCLUSIONS

Parapsychologists often make a distinction between "proof-oriented research" and "process-oriented research." The former is typically conducted to test the hypothesis that psi abilities exist, while the latter is designed to answer questions about how psychic functioning works. Proof-oriented research has dominated the literature in parapsychology. Unfortunately, many of the studies used small samples and would thus be nonsignificant even if a moderate-sized effect exists. The recent focus on meta-analysis in parapsychology has revealed that there are small but consistently nonzero effects across studies, experimenters, and laboratories. The size of the effects in forced-choice studies appears to be comparable to those reported in some medical studies that had been heralded as breakthroughs. (See Section 5, and Honorton and Ferrari, 1989, p. 301.) Free-response studies show effect sizes of far greater magnitude.

A promising direction for future process-oriented research is to examine the causes of individual differences in psychic functioning. The ESP/extraversion meta-analysis is a step in that direction.
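As a check on the extraversion result cited from Section 6.4, the confidence interval for the autoganzfeld correlation (r = .18, 221 subjects) can be approximated with Fisher's z transformation. The source does not say how the interval was obtained, so the method here is an assumption, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for a correlation via Fisher's z.

    Assumes a bivariate normal sample of size n; standard error is 1/sqrt(n - 3).
    """
    z = math.atanh(r)
    margin = NormalDist().inv_cdf(0.5 + level / 2) / math.sqrt(n - 3)
    return math.tanh(z - margin), math.tanh(z + margin)


low, high = fisher_ci(0.18, 221)
print(round(low, 2), round(high, 2))  # 0.05 0.3
```

Under these assumptions the interval agrees with the reported range of .05 to .30.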
In keeping with the idea of individual differences, Bayes and empirical Bayes methods would appear to make more sense than the classical inference methods commonly used, since they would allow individual abilities and beliefs to be modelled. Jeffreys (1990) reported a Bayesian analysis of some of the RNG experiments, and showed that conclusions were closely tied to prior beliefs even though hundreds of thousands of trials were available.

It may be that the nonzero effects observed in the meta-analyses can be explained by something other than ESP, such as shortcomings in our understanding of randomness and independence. Nonetheless, there is an anomaly that needs an explanation. As I have argued elsewhere (Utts, 1987), research in parapsychology should receive more support from the scientific community. If ESP does not exist, there is little to be lost by erring in the direction of further research, which may in fact uncover other anomalies. If ESP does exist, there is much to be lost by not doing process-oriented research, and much to be gained by discovering how to enhance and apply these abilities to important world problems.

REFERENCES

Atkinson, R.L., Atkinson, R.C., Smith, E.E. and Bem, D.J. (1990). Introduction to Psychology, 10th Ed. Harcourt Brace Jovanovich, San Diego.
Beloff, J. (1985). Research strategies for dealing with unstable phenomena. In The Repeatability Problem in Parapsychology (B. Shapin and L. Coly, eds.) 1-21. Parapsychology Foundation, New York.
Blackmore, S.J. (1985). Unrepeatability: Parapsychology's only finding. In The Repeatability Problem in Parapsychology (B. Shapin and L. Coly, eds.) 183-206. Parapsychology Foundation, New York.
Burdick, D.S. and Kelly, E.F. (1977).
Statistical methods in parapsychological research. In Handbook of Parapsychology (B.B. Wolman, ed.) 81-130. Van Nostrand Reinhold, New York.
Camp, B.H. (1937). (Statement in Notes Section.) Journal of Parapsychology 1 305.
Cohen, J. (1990). Things I have learned (so far). American Psychologist 45 1304-1312.
Coover, J.E. (1917). Experiments in Psychical Research at Leland Stanford Junior University. Stanford University, Stanford, CA.
Dawes, R.M., Landman, J. and Williams, J. (1984). Reply to Kurosawa. American Psychologist 39 74-75.
Diaconis, P. (1978). Statistical problems in ESP research. Science 201 131-136.
Dommeyer, F.C. (1975). Psychical Research at Stanford University. Journal of Parapsychology 39 173-205.
Druckman, D. and Swets, J.A., Eds. (1988). Enhancing Human Performance: Issues, Theories, and Techniques. National Academy Press, Washington, DC.
Edgeworth, F.Y. (1885). The calculus of probabilities applied to psychical research. Proceedings of the Society for Psychical Research 3 190-199.
Edgeworth, F.Y. (1886). The calculus of probabilities applied to psychical research II. Proceedings of the Society for Psychical Research 4 189-208.
Feller, W.K. (1968). An Introduction to Probability Theory and Its Applications, Volume 1, 3rd Ed. Wiley, New York.
Feller, W.K. (1940). Statistical aspects of ESP. Journal of Parapsychology 4 271-297.
Fisher, R.A. (1924). A method of scoring coincidences in tests with playing cards. Proceedings of the Society for Psychical Research 34 181-185.
Fisher, R.A. (1929). The statistical method in psychical research. Proceedings of the Society for Psychical Research 39 189-192.
Gallup, G.H. Jr. and Newport, F. (1991). Belief in paranormal phenomena among adult Americans. Skeptical Inquirer 15 137-146.
Gardner, M.J. and Altman, D.G. (1986).
Confidence intervals rather than p-values: estimation rather than hypothesis testing. British Medical Journal 292 746-750.
Gilmore, J.B. (1989). Randomness and the search for psi. Journal of Parapsychology 53 309-340.
Gilmore, J.B. (1990). Anomalous significance in pararandom and psi-free domains. Journal of Parapsychology 54 53-58.
Greeley, A. (1987). Mysticism goes mainstream. American Health 7 47-49.
Greenhouse, J.B. and Greenhouse, S.W. (1988). An aspirin a day...? Chance 1 24-31.
Greenwood, J.A. and Stuart, C.E. (1940). A review of Dr. Feller's critique. Journal of Parapsychology 4 299-319.
Hansel, C.E.M. (1980). ESP and Parapsychology: A Critical Re-evaluation. Prometheus Books, Buffalo.
Harris, M.J. and Rosenthal, R. (1988a). Interpersonal Expectancy Effects and Human Performance Research. National Academy Press, Washington DC.
Harris, M.J. and Rosenthal, R. (1988b). Postscript to Interpersonal Expectancy Effects and Human Performance Research. National Academy Press, Washington DC.
Hedges, L.V. and Olkin, I. (1985). Statistical Methods for Meta-Analysis. Academic Press, Inc., Orlando, FL.
Honorton, C. (1977). Psi and internal attention states. In Handbook of Parapsychology (B.B. Wolman, ed.) 435-472. Van Nostrand Reinhold, New York.
Honorton, C. (1985a). How to evaluate and improve the replicability of parapsychological effects. In The Repeatability Problem in Parapsychology (B. Shapin and L. Coly, eds.) 238-255. Parapsychology Foundation, New York.
Honorton, C. (1985b). Meta-analysis of psi ganzfeld research: A response to Hyman. Journal of Parapsychology 49 51-91.
Honorton, C., Berger, R.E., Varvoglis, M.P., Quant, M., Derr, P., Schechter, E.I. and Ferrari, D.C. (1990). Psi communication in the ganzfeld: experiments with an automated testing system and a comparison with a meta-analysis of earlier studies.
Journal of Parapsychology 54 99-139.
Honorton, C. and Ferrari, D.C. (1989). "Future telling": A meta-analysis of forced-choice precognition experiments, 1935-1987. Journal of Parapsychology 53 281-308.
Honorton, C., Ferrari, D.C., and Bem, D.J. (1990). Extraversion and ESP Performance: A meta-analysis and a new confirmation. Proceedings of the Annual Meeting of the Parapsychological Association.
Hyman, R. (1985a). A critical overview of parapsychology. In A Skeptic's Handbook of Parapsychology (P. Kurtz, ed.) 1-96. Prometheus Books, Buffalo.
Hyman, R. (1985b). The ganzfeld psi experiment: A critical appraisal. Journal of Parapsychology 49 3-49.
Hyman, R. and Honorton, C. (1986). Joint communique: The psi ganzfeld controversy. Journal of Parapsychology 50 351-364.
Iversen, G.R., Longcor, W.H., Mosteller, F., Gilbert, J.P., and Youtz, C. (1971). Bias and runs in dice throwing and recording: A few million throws. Psychometrika 36 1-19.
Jeffreys, W.H. (1990). Bayesian analysis of random event generator data. Journal of Scientific Exploration 4 153-169.
Lindley, D.V. (1957). A statistical paradox. Biometrika 44 187-192.
Mauskopf, S.H. and McVaugh, M. (1979). The Elusive Science: Origins of Experimental Psychical Research. The Johns Hopkins University Press, Baltimore.
McVaugh, M.R. and Mauskopf, S.H. (1976). J.B. Rhine's Extrasensory Perception and its background in psychical research. Isis 67 161-189.
Neuliep, J.W. (Ed.) (1990). Handbook of replication research in the behavioral and social sciences. Journal of Social Behavior and Personality (Special Issue) 5(4).
Office of Technology Assessment (1989). Report of a workshop on experimental parapsychology. Journal of the American Society for Psychical Research 83 317-339.
Palmer, J. (1989). A reply to Gilmore. Journal of Parapsychology 53 341-344.
Palmer, J. (1990).
Reply to Gilmore: Round two. Journal of Parapsychology 54 59-61.
Palmer, J.A., Honorton, C. and Utts, J. (1989). Reply to the National Research Council study on parapsychology. Journal of the American Society for Psychical Research 83 31-49.
Radin, D.I. and Ferrari, D.C. (1991). Effects of consciousness on the fall of dice: A meta-analysis. Journal of Scientific Exploration 5 (to appear).
Radin, D.I. and Nelson, R.D. (1989). Evidence for consciousness-related anomalies in random physical systems. Foundations of Physics 19 1499-1514.
Rao, K.R. (1985). Replication in conventional and controversial sciences. In The Repeatability Problem in Parapsychology (B. Shapin and L. Coly, eds.) 22-41. Parapsychology Foundation, New York.
Rhine, J.B. (1934). Extrasensory Perception. Boston Society for Psychical Research, Boston. (Reprinted by Branden Press in 1964).
Rhine, J.B. (1977). History of experimental studies. In Handbook of Parapsychology (B.B. Wolman, ed.) 25-47. Van Nostrand Reinhold, New York.
Richet, C. (1884). La suggestion mentale et le calcul des probabilites. Revue Philosophique 18 608-674.
Rosenthal, R. (1984). Meta-Analytic Procedures for Social Research. Sage, Beverly Hills.
Rosenthal, R. (1986). Meta-analytic procedures and the nature of replication: The ganzfeld debate. Journal of Parapsychology 50 315-336.
Rosenthal, R. (1990a). How are we doing in soft psychology? American Psychologist 45 775-777.
Rosenthal, R. (1990b). Replication in behavioral research. Journal of Social Behavior and Personality 5 1-30.
Saunders, D.R. (1985). On Hyman's factor analysis. Journal of Parapsychology 49 86-88.
Shapin, B. and Coly, L. (Eds.) (1985). The Repeatability Problem in Parapsychology. Parapsychology Foundation, New York.
Spencer-Brown, G. (1957). Probability and Scientific Inference. Longmans Green, London and New York.
Stuart, C.E.
and Greenwood, J.A. (1937). A review of criticisms of the mathematical evaluation of ESP data. Journal of Parapsychology 1 295-304.
Tversky, A. and Kahneman, D. (1982). Belief in the law of small numbers. In D. Kahneman, P. Slovic and A. Tversky (eds.), Judgment under uncertainty: Heuristics and biases. Cambridge University Press, Cambridge.
Utts, J. (1986). The ganzfeld debate: A statistician's perspective. Journal of Parapsychology 50 395-402.
Utts, J. (1987). Psi, statistics, and society. Behavioral and Brain Sciences 10 615-616.
Utts, J. (1988). Successful replication versus statistical significance. Journal of Parapsychology 52 305-320.
Utts, J. (1989a). Randomness and randomization tests: A reply to Gilmore. Journal of Parapsychology 53 345-351.
Utts, J. (1989b). Analyzing free-response data - a progress report. To appear in Psi Research Methodology: A Re-examination (L. Coly, ed.). Parapsychology Foundation, New York.
Wilks, S.S. (1965). N.Y. Statistician 16 (nos. 6 and 7).

III MAIN-STREAM PUBLICATIONS

One measure of the acceptance of anomalous mental phenomena as a valid area for investigation is the degree to which research papers appear in the main-stream scientific literature. The reports in this section have been selected because they are a representative sample of such papers. The number that appears in the upper right-hand corner of the first page for each publication is keyed to the following descriptions:

9. Targ, R. and Puthoff, H. E., "Information Transmission Under Conditions of Sensory Shielding," Nature, Vol. 251, pp. 602-607, (October, 1974). Targ and Puthoff describe a series of experiments with selected individuals, including Mr. Uri Geller, and introduce an anomalous cognition technique called remote viewing.
The paper also includes a pilot experiment to investigate the effects of anomalous cognition on the alpha rhythms in the brain.

10. Puthoff, H. E. and Targ, R., "A Perceptual Channel for Information Transfer over Kilometer Distances: Historical Perspective and Recent Research," Proceedings of the IEEE, Vol. 64, No. 3, pp. 329-354, (March, 1976). Puthoff and Targ provide a historical review of the pertinent literature and describe over 50 remote viewing (i.e., anomalous cognition) trials. The paper also includes representative examples of remote viewing.

11. Jahn, R. G., "The Persistent Paradox of Psychic Phenomena: An Engineering Perspective," Invited Paper, Proceedings of the IEEE, Vol. 70, No. 2, pp. 136-170, (February, 1982). Jahn describes a replication of remote viewing and extends the distance to over 10,000 kilometers. In addition to an independent overview of parapsychology, Jahn also includes descriptions of a number of anomalous perturbation experiments.

12. Child, I. L., "Psychology and Anomalous Observations: The Question of ESP in Dreams," American Psychologist, Vol. 40, No. 11, pp. 1219-1230, (November, 1985). Professor Child, the then Chairman of the Psychology Department at Yale University, provides a critical review of the anomalous cognition dream studies conducted at Maimonides Medical Center in the early 1970's. Professor Child warns the general psychological research community not to dismiss the body of research and suggests that it should be of wide interest to them.

13. Atkinson, R. L., Atkinson, R. C., Smith, E. E., and Bem, D. J., Introduction to Psychology, 10th Edition, pp. 234-243, Harcourt Brace Jovanovich, New York, (1990). Professor Bem included anomalous cognition in a chapter on consciousness and its altered states in a widely-used introductory text in psychology.
Bem provides definitions of terms, a review of the experimental evidence for anomalous cognition, an analysis of the debate over the evidence, and a review of the anecdotal evidence.

14. Walker, E. H., May, E. C., Spottiswoode, S. J. P., and Piantanida, T., "Testing Schrodinger's Paradox with a Michelson Interferometer," Physica B, Vol. 151, pp. 339-348, (1988). While not directly related to anomalous mental phenomena, this paper describes an experimental test to determine if consciousness is a necessary ingredient for determining physical reality. The authors conclude that it is not, and thus, this result has implications for anomalous perturbation research.

15. Hyman, R., "Parapsychological Research: A Tutorial Review and Critical Appraisal," Invited Paper, Proceedings of the IEEE, Vol. 74, No. 6, pp. 823-849, (June, 1986). Dr. Hyman is a Professor of Psychology at the University of Oregon in Eugene and has been a long-time critic of and commentator on the field of parapsychology. Hyman reviews the historical experiments and provides a critical analysis of the current research.

Nature, Vol. 251, No. 5476, October 18, 1974. Published weekly by Macmillan Journals Ltd.
Nicolson 628 Serum dopamine Pfr-hydroxylase activity in developing hypertensive rats?T. Nagatsu, ? ? T. Kato, Y. Numata (Sudo), K. Thula, H. Umezawa, M. Matsuzaki and T. Takeuchi 630 Enzymatic synthesis of acetylcholine by a serotonin-containing neurone from Helix? M. R. Hanky, G. A. Cottrell, P. C. Einson and F. Fortnum 631 Approved For Release 2003/04/18 : CIA-RDP96-00789R0031000300014 Approved For Release 2003/04/18 : CIA-RDP96-00789R003100030001-4 Information transmission tinder conditions of sensory shielding. WE present results of experiments Suggesting the existence of one or more perceptual modalities through which individuals obtain information about -their environment,' -although this .information is not presented .to any .known sense. The litera- .-ture1-3 and our Observations lead us to conclude that such .abilities can be studied under laboratory conditions. We have investigated the ability of certain peopk to describe -graphical material or remote scenes shielded against ordinary perception. In addition, we performed pilot studies to determine if electroencephalographic (EEG) recordings might indicate perception of remote happenings even in the absence of correct overt responses. We concentrated on what we consider /o be our primary responsibility?to resolve under conditions as unambiguous as possible the basic issue of whether a certain class of pan- -normal perception phenomena exists. So we conducted our .experiments with sufficient control, utilising visual, ,acoustic and electrical shielding, to ensure that all conventional paths of sensory input were blocked. At all times we took measures to prevent ,sensory leakage and to prevent deception, whether intentional or unintentional. Our goal is not just to catalogue interesting, events, but to uncover patterns of cause-effect relationships That lend them- selves to analysis and hypothesis in the forms with which we are familiar in scientific study. 
The results presented here constitute a first step toward that goal; we have established under known conditions a data base from which departures as a function of physical and psychological variables can be studied in future work.

REMOTE PERCEPTION OF GRAPHIC MATERIAL

First, we conducted experiments with Mr Uri Geller in which we examined his ability, while located in an electrically shielded room, to reproduce target pictures drawn by experimenters located at remote locations. Second, we conducted double-blind experiments with Mr Pat Price, in which we measured his ability to describe remote outdoor scenes many miles from his physical location. Finally, we conducted preliminary tests using EEGs, in which subjects were asked to perceive whether a remote light was flashing, and to determine whether a subject could perceive the presence of the light, even if only at a noncognitive level of awareness.

In preliminary testing Geller apparently demonstrated an ability to reproduce simple pictures (line drawings) which had been drawn and placed in opaque sealed envelopes which he was not permitted to handle. But since each of the targets was known to at least one experimenter in the room with Geller, it was not possible on the basis of the preliminary testing to discriminate between Geller's direct perception of envelope contents and perception through some mechanism involving the experimenters, whether paranormal or subliminal.

So we examined the phenomenon under conditions designed to eliminate all conventional information channels, overt or subliminal. Geller was separated from both the target material and anyone knowledgeable of the material, as in the experiments of ref. 4.

In the first part of the study a series of 13 separate drawing experiments were carried out over 7 days.
No experiments are deleted from the results presented here.

At the beginning of the experiment either Geller or the experimenters entered a shielded room so that from that time forward Geller was at all times visually, acoustically and electrically shielded from personnel and material at the target location. Only following Geller's isolation from the experimenters was a target chosen and drawn, a procedure designed to eliminate pre-experiment cueing. Furthermore, to eliminate the possibility of pre-experiment target forcing, Geller was kept ignorant as to the identity of the person selecting the target and as to the method of target selection. This was accomplished by the use of three different techniques: (1) pseudo-random technique of opening a dictionary arbitrarily and choosing the first word that could be drawn (Experiments 1-4); (2) targets, blind to experimenters and subject, prepared independently by SRI scientists outside the experimental group (following Geller's isolation) and provided to the experimenters during the course of the experiment (Experiments 5-7, 11-13); and (3) arbitrary selection from a target pool decided upon in advance of daily experimentation and designed to provide data concerning information content for use in testing specific hypotheses (Experiments 8-10). Geller's task was to reproduce with pen on paper the line drawing generated at the target location. Following a period of effort ranging from a few minutes to half an hour, Geller either passed (when he did not feel confident) or indicated he was ready to submit a drawing to the experimenters, in which case the drawing was collected before Geller was permitted to see the target.

To prevent sensory cueing of the target information, Experiments 1 through 10 were carried out using a shielded room in SRI's facility for EEG research.
The acoustic and visual isolation is provided by a double-walled steel room, locked by means of an inner and outer door, each of which is secured with a refrigerator-type locking mechanism. Following target selection when Geller was inside the room, a one-way audio monitor, operating only from the inside to the outside, was activated to monitor Geller during his efforts. The target picture was never discussed by the experimenters after the picture was drawn and brought near the shielded room. In our detailed examination of the shielded room and the protocol used in these experiments, no sensory leakage has been found.

The conditions and results for the 10 experiments carried out in the shielded room are displayed in Table 1 and Fig. 1. All experiments except 4 and 5 were conducted with Geller inside the shielded room. In Experiments 4 and 5, the procedure was reversed. For those experiments in which Geller was inside the shielded room, the target location was in an adjacent room at a distance of about 4 m, except for Experiments 3 and 8, in which the target locations were, respectively, an office at a distance of 475 m and a room at a distance of about 7 m.

A response was obtained in all experiments except Numbers 5-7. In Experiment 5, the person-to-person link was eliminated by arranging for a scientist outside the usual experimental group to draw a picture, lock it in the shielded room before Geller's arrival at SRI, and leave the area. Geller was then led by the experimenters to the shielded room and asked to draw the picture located inside the room. He said that he got no clear impression and therefore did not submit a drawing. The elimination of the person-to-person link was examined further in the second series of experiments with this subject.

Fig. 1 Target pictures and responses drawn by Uri Geller under shielded conditions.

Table 1

Expt  Date      Geller location                     Target location            Target                                               Figure
1     8/4/73    Shielded room 1*                    Adjacent room (4.1 m)†     Firecracker                                          1a
2     8/4/73    Shielded room 1                     Adjacent room (4.1 m)      Grapes                                               1b
3     8/5/73    Shielded room 1                     Office (475 m)             Devil                                                1c
4     8/5/73    Room adjacent to shielded room 1    Shielded room 1 (3.2 m)    Solar system                                         1d
5     8/6/73    Room adjacent to shielded room 1    Shielded room 1 (3.2 m)    Rabbit                                               No drawing
6     8/7/73    Shielded room 1                     Adjacent room (4.1 m)      Tree                                                 No drawing
7     8/7/73    Shielded room 1                     Adjacent room (4.1 m)      Envelope                                             No drawing
8     8/8/73    Shielded room 1                     Remote room (6.75 m)       Camel                                                1e
9     8/8/73    Shielded room 1                     Adjacent room (4.1 m)      Bridge                                               1f
10    8/8/73    Shielded room 1                     Adjacent room (4.1 m)      Seagull                                              1g
11    8/9/73    Shielded room 2‡                    Computer (54 m)            Kite (computer CRT)                                  2a
12    8/10/73   Shielded room 2                     Computer (54 m)            Church (computer memory)                             2b
13    8/10/73   Shielded room 2                     Computer (54 m)            Arrow through heart (computer CRT, zero intensity)   2c

Dates are given as month/day/year.
*EEG Facility shielded room (see text).
†Perceiver-target distances measured in metres.
‡SRI Radio Systems Laboratory shielded room (see text).

Experiments 6 and 7 were carried out while we attempted to record Geller's EEG during his efforts to perceive the target pictures. The target pictures were, respectively, a tree and an envelope. He found it difficult to hold adequately still for good EEG records, said that he experienced difficulty in getting impressions of the targets and again submitted no drawings.

Experiments 11 through 13 were carried out in SRI's Engineering Building, to make use of the computer facilities available there.
For these experiments, Geller was secured in a double-walled, copper-screen Faraday cage 54 m down the hall and around the corner from the computer room. The Faraday cage provides 120 dB attenuation for plane wave radio frequency radiation over a range of 15 kHz to 1 GHz. For magnetic fields the attenuation is 68 dB at 15 kHz and decreases to 3 dB at 60 Hz. Following Geller's isolation, the targets for these experiments were chosen by computer laboratory personnel not otherwise associated with either the experiment or Geller, and the experimenters and subject were kept blind as to the contents of the target pool.

For Experiment 11, a picture of a kite was drawn on the face of a cathode ray tube display screen, driven by the computer's graphics program. For Experiment 12, a picture of a church was drawn and stored in the memory of the computer. In Experiment 13, the target drawing, an arrow through a heart (Fig. 2c), was drawn on the face of the cathode ray tube and then the display intensity was turned off so that no picture was visible.

To obtain an independent evaluation of the correlation between target and response data, the experimenters submitted the data for judging on a 'blind' basis by two SRI scientists who were not otherwise associated with the research. For the 10 cases in which Geller provided a response, the judges were asked to match the response data with the corresponding target data (without replacement). In those cases in which Geller made more than one drawing as his response to the target, all the drawings were combined as a set for judging. The two judges each matched the target data to the response data with no error. For either judge such a correspondence has an a priori probability, under the null hypothesis of no information channel, of $P = (10!)^{-1} = 3 \times 10^{-7}$.

A second series of experiments was carried out to determine whether direct perception of envelope contents was possible without some person knowing of the target picture.

One hundred target pictures of everyday objects were drawn by an SRI artist and sealed by other SRI personnel in double envelopes containing black cardboard. The hundred targets were divided randomly into groups of 20 for use in each of the three days' experiments.

On each of the three days of these experiments, Geller passed. That is, he declined to associate any envelope with a drawing that he made, expressing dissatisfaction with the existence of such a large target pool. On each day he made approximately 12 recognisable drawings, which he felt were associated with the entire target pool of 100. On each of the three days, two of his drawings could reasonably be associated with two of the 20 daily targets. On the third day, two of his drawings were very close replications of two of that day's target pictures. The drawings resulting from this experiment do not depart significantly from what would be expected by chance.

In a simpler experiment Geller was successful in obtaining information under conditions in which no persons were knowledgeable of the target. A double-blind experiment was performed in which a single 3/4 inch die was placed in a 3 × 4 × 5 inch steel box. The box was then vigorously shaken by one of the experimenters and placed on the table, a technique found in control runs to produce a distribution of die faces differing nonsignificantly from chance. The orientation of the die within the box was unknown to the experimenters at that time. Geller would then write down which die face was uppermost. The target pool was known, but the targets were individually prepared in a manner blind to all persons involved in the experiment. This experiment was performed ten times, with Geller passing twice and giving a response eight times. In the eight times in which he gave a response, he was correct each time.
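As a rough check on the chance baseline for the die test just described, the minimal sketch below assumes that each of the eight calls is an independent guess with a 1-in-6 chance of being right. Ignoring the two passes is an assumption of this sketch, not an analysis given in the paper. It also recomputes the 1/10! figure quoted earlier for the error-free blind matching of 10 responses to 10 targets. The function names are illustrative only.

```python
from fractions import Fraction
from math import factorial


def p_all_correct(n_calls: int, n_faces: int = 6) -> Fraction:
    """Chance probability that n_calls independent guesses are all right."""
    return Fraction(1, n_faces) ** n_calls


def p_perfect_matching(n_items: int) -> Fraction:
    """Chance probability of matching n_items responses to n_items targets
    with no error, when every one-to-one assignment is equally likely."""
    return Fraction(1, factorial(n_items))


# Assumption: eight independent calls on a fair six-sided die.
die = p_all_correct(8)
# Ten responses matched to ten targets without replacement.
matching = p_perfect_matching(10)

print(f"8 of 8 die calls: 1 in {die.denominator:,} ({float(die):.2e})")
print(f"10 of 10 matches: 1 in {matching.denominator:,} ({float(matching):.2e})")
```

Under these assumptions the die result comes to 1 in 1,679,616, and the matching result to 1 in 3,628,800, or about 2.8 × 10^-7, consistent with the rounded value of 3 × 10^-7 quoted for the judging.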
The distribution of responses consisted of three 2s, one 4, two 5s, and two 6s. The probability of this occurring by chance is approximately one in $10^{6}$.

In certain situations significant information transmission can take place under shielded conditions. Factors which appear to be important and therefore candidates for future investigation include whether the subject knows the set of targets in the target pool, the actual number of targets in the target pool at any given time, and whether the target is known by any of the experimenters.

It has been widely reported that Geller has demonstrated the ability to bend metal by paranormal means. Although metal bending by Geller has been observed in our laboratory, we have not been able to combine such observations with adequately controlled experiments to obtain data sufficient to support the paranormal hypothesis.

REMOTE VIEWING OF NATURAL TARGETS

A study by Osis (ref. 5) led us to determine whether a subject could describe randomly chosen geographical sites located several miles from the subject's position and demarcated by some appropriate means (remote viewing). This experiment carried out with Price, a former California police commissioner and city councilman, consisted of a series of double-blind, demonstration-of-ability tests involving local targets in the San Francisco Bay area which could be documented by several independent judges. We planned the experiment considering that natural geographical places or man-made sites that have existed for a long time are more potent targets for paranormal perception experiments than are artificial targets prepared in the laboratory. This is based on subject opinions that the use of artificial targets involves a 'trivialisation of the ability' as compared with natural pre-existing targets.

In each of nine experiments involving Price as subject and SRI experimenters as a target demarcation team, a remote location was chosen in a double-blind protocol. Price, who remained at SRI, was asked to describe this remote location, as well as whatever activities might be going on there.

Several descriptions yielded significantly correct data pertaining to and descriptive of the target location.

In the experiments a set of twelve target locations clearly differentiated from each other and within 30 min driving time from SRI had been chosen from a target-rich environment (more than 100 targets of the type used in the experimental series) prior to the experimental series by an individual in SRI management, the director of the Information Science and Engineering Division, not otherwise associated with the experiment. Both the experimenters and the subject were kept blind as to the contents of the target pool, which were used without replacement.

An experimenter was closeted with Price at SRI to wait 30 min to begin the narrative description of the remote location. The SRI locations from which the subject viewed the remote locations consisted of an outdoor park (Experiments 1, 2), the double-walled copper-screen Faraday cage discussed earlier (Experiments 3, 4, and 6-9), and an office (Experiment 5). A second experimenter would then obtain a target location from the Division Director from a set of travelling orders previously prepared and randomised by the Director and kept under his control. The target demarcation team (two to four SRI experimenters) then proceeded directly to the target by automobile without communicating with the subject or experimenter remaining behind. Since the experimenter remaining with the subject at SRI was in ignorance both as to the particular target and as to the target pool, he was free to question Price to clarify his descriptions. The demarcation team then remained at the target site for 30 min after the 30 min allotted for travel. During the observation period, the remote-viewing subject would describe his impressions of the target site into a tape recorder. A comparison was then made when the demarcation team returned.

Price's ability to describe correctly buildings, docks, roads, gardens and so on, including structural materials, colour, ambience and activity, sometimes in great detail, indicated the functioning of a remote perceptual ability. But the descriptions contained inaccuracies as well as correct statements. To obtain a numerical evaluation of the accuracy of the remote viewing experiment, the experimental results were subjected to independent judging on a blind basis by five SRI scientists who were not otherwise associated with the research. The judges were asked to match the nine locations, which they independently visited, against the typed manuscripts of the tape-recorded narratives of the remote viewer. The transcripts were unlabelled and presented in random order. The judges were asked to find a narrative which they would consider the best match for each of the places they visited. A given narrative could be assigned to more than one target location. A correct match requires that the transcript of a given date be associated with the target of that date. Table 2 shows the distribution of the judges' choices.

Fig. 2 Computer drawings and responses drawn by Uri Geller. a, Computer drawing stored on video display; b, computer drawing stored in computer memory only; c, computer drawing stored on video display with zero intensity.

Table 2 Distribution of correct selections by judges A, B, C, D, and E in remote viewing experiments

Places visited by judges          Descriptions chosen by judges (columns 1-9)
1 Hoover Tower                    ABCDE  D
2 Baylands Nature Preserve        ABC
3 Radio Telescope                 ACD  BE
4 Redwood City Marina             CD  ABDE
5 Bridge Toll Plaza               ABD  DCE
6 Drive-In Theatre                C
7 Arts and Crafts Garden Plaza    ABCE
8 Church                          [not legible]
9 Rinconada Park                  CE  A[?]

[Row and column positions of the letter entries are not reliably recoverable from the scan; entries are listed as extracted.]

Of the 45 selections (5 judges, 9 choices), 24 were correct. Bold type indicates the description chosen most often for each place visited. Correct choices lie on the main diagonal. The number of correct matches by Judges A through E is 7, 6, 5, 3, and 3, respectively. The expected number of correct matches from the five judges was five; in the experiment 24 such matches were obtained. The a priori probability of such an occurrence by chance, conservatively assuming assignment without replacement on the part of the judges, is $P = 8 \times 10^{-10}$.

Among all possible analyses, the most conservative is a permutation analysis of the plurality vote of the judges' selections assuming assignment without replacement, an approach independent of the number of judges. By plurality vote, six of the nine descriptions and locations were correctly matched. Under the null hypothesis (no remote viewing and a random selection of descriptions without replacement), this outcome has an a priori probability of $P = 5.6 \times 10^{-4}$, since, among all possible permutations of the integers one through nine, the probability of six or more being in their natural position in the list has that value. Therefore, although Price's descriptions contain inaccuracies, the descriptions are sufficiently accurate to permit the judges to differentiate among the various targets to the degree indicated.

EEG EXPERIMENTS

An experiment was undertaken to determine whether a physiological measure such as EEG activity could be used as an indicator of information transmission between an isolated subject and a remote stimulus. We hypothesised that perception could be indicated by such a measure even in the absence of verbal or other overt indicators (refs 6, 7).
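The permutation analysis quoted above for the remote-viewing judging (six or more of nine items in their natural position) can be checked by direct counting. The sketch below is one way to do it, assuming every permutation of the nine descriptions is equally likely under the null hypothesis. The helper names are illustrative and do not come from the paper.

```python
from fractions import Fraction
from math import comb, factorial


def derangements(n: int) -> int:
    """Number of permutations of n items with no item in its natural position."""
    if n == 0:
        return 1
    if n == 1:
        return 0
    prev2, prev1 = 1, 0  # D(0), D(1)
    for k in range(2, n + 1):
        # Recurrence: D(k) = (k - 1) * (D(k - 1) + D(k - 2))
        prev2, prev1 = prev1, (k - 1) * (prev1 + prev2)
    return prev1


def p_at_least_k_fixed(n: int, k: int) -> Fraction:
    """P(a uniformly random permutation of n items has >= k fixed points)."""
    # Choose which j items stay fixed, then derange the remaining n - j.
    favourable = sum(comb(n, j) * derangements(n - j) for j in range(k, n + 1))
    return Fraction(favourable, factorial(n))


p = p_at_least_k_fixed(9, 6)
print(f"P(six or more of nine in place) = {p} = {float(p):.2e}")
```

Counting this way gives 205 favourable permutations out of 9! = 362,880, or about 5.6 × 10^-4, which agrees with the value stated for the plurality-vote analysis.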
It was assumed that the application of remote stimuli would result in responses similar to those obtained under conditions of direct stimulation. For example, when normal subjects are stimulated with a flashing light, their EEG typically shows a decrease in the amplitude of the resting rhythm and a driving of the brain waves at the frequency of the flashes (ref. 8). We hypothesised that if we stimulated one subject in this manner (a sender), the EEG of another subject in a remote room with no flash present (a receiver) might show changes in alpha (9-11 Hz) activity, and possibly EEG driving similar to that of the sender.

We informed our subject that at certain times a light was to be flashed in a sender's eyes in a distant room, and if the subject perceived that event, consciously or unconsciously, it might be evident from changes in his EEG output. The receiver was seated in the visually opaque, acoustically and electrically shielded double-walled steel room previously described. The sender was seated in a room about 7 m from the receiver.

To find subjects who were responsive to such a remote stimulus, we initially worked with four female and two male volunteer subjects, all of whom believed that success in the experimental situation might be possible. These were designated 'receivers'. The senders were either other subjects or the experimenters. We decided beforehand to run one or two sessions of 36 trials each with each subject in this selection procedure, and to do a more extensive study with any subject whose results were positive.

A Grass PS-2 photostimulator placed about 1 m in front of the sender was used to present flash trains of 10 s duration. The receiver's EEG activity from the occipital region (Oz), referenced to linked mastoids, was amplified with a Grass 5P-1 preamplifier and associated driver amplifier with a bandpass of 1-120 Hz. The EEG data were recorded on magnetic tape with an Ampex SP 300 recorder.
On each trial, a tone burst of fixed frequency was presented to both sender and receiver and was followed in one second by either a train of flashes or a null flash interval presented to the sender. Thirty-six such trials were given in an experimental session, consisting of 12 null trials (no flashes following the tone), 12 trials of flashes at 6 f.p.s. and 12 trials of flashes at 16 f.p.s., all randomly intermixed, determined by entries from a table of random numbers. Each of the trials generated an 11-s EEG epoch. The last 4 s of the epoch was selected for analysis to minimise the desynchronising action of the warning cue. This 4-s segment was subjected to Fourier analysis on a LINC computer.

Spectrum analyses gave no evidence of EEG driving in any receiver, although in control runs the receivers did exhibit driving when physically stimulated with the flashes. But of the six subjects studied initially, one subject (H. H.) showed a consistent alpha blocking effect. We therefore undertook further study with this subject.

Data from seven sets of 36 trials each were collected from this subject on three separate days. This comprises all the data collected to date with this subject under the test conditions described above. The alpha band was identified from average spectra, then scores of average power and peak power were obtained from individual trials and subjected to statistical analysis.

Of our six subjects, H. H. had by far the most monochromatic EEG spectrum. Figure 3 shows an overlay of the three averaged spectra from one of this subject's 36-trial runs, displaying changes in her alpha activity for the three stimulus conditions.

Mean values for the average power and peak power for

Table 3 EEG data for H.H. showing average power and peak power in the 9-11 Hz band, as a function of flash frequency and sender

                                    Average power              Peak power
Sender                              0       6       16         0        6        16      (flash frequency)
J.L.                                94.8    84.1    76.8       357.7    329.2    289.6
R.T.                                41.3    45.5    37.0       160.7    161.0    125.0
No sender (subject informed)        25.1    35.7    28.2       87.5     95.7     81.7
J.L.                                54.2    55.3    44.8       191.4    170.5    149.3
J.L.                                56.8    50.9    32.8       240.6    178.0    104.6
R.T.                                39.8    24.9    30.3       145.2    74.2     122.1
No sender (subject not informed)    86.0    53.0    52.1       318.1    180.6    202.3
Averages                            56.8    49.9    43.1       214.5    169.8    153.5
                                            -12%    -24% (P