PROJECT STAR GATE RESEARCH AND PEER REVIEW PLAN

Document Number (FOIA) /ESDN (CREST): 
CIA-RDP96-00789R002700010001-1
Release Decision: 
RIPPUB
Original Classification: 
S
Document Page Count: 
106
Document Creation Date: 
November 4, 2016
Document Release Date: 
February 13, 2003
Sequence Number: 
1
Publication Date: 
June 1, 1994
Content Type: 
RS
Approved For Release 2003/04/18 : CIA-RDP96-00789R002700010001-1

SECRET

PRG-TR-1068-SL

DEFENSE INTELLIGENCE AGENCY

PROJECT STAR GATE RESEARCH AND PEER REVIEW PLAN (U)

JUNE 1994

NOFORN SECRET LIMDIS STAR GATE

This document was prepared by the
Technology Assessment and Support Activity
Office for Ground Forces
Directorate for Military Assessments
National Military Intelligence Production Center
Defense Intelligence Agency

Date of Publication: June 1994

REPRODUCTION REQUIRES APPROVAL OF ORIGINATOR OR HIGHER DOD AUTHORITY

LIMITED DISSEMINATION: FURTHER DISSEMINATION ONLY AS DIRECTED BY DIA/PAG OR HIGHER DOD AUTHORITY

CLASSIFIED BY MULTIPLE SOURCES
DECLASSIFY ON OADR

SECRET NOT RELEASABLE TO FOREIGN NATIONALS STAR GATE

UNCLASSIFIED

OUTLINE

EXECUTIVE SUMMARY
I. INTRODUCTION
II. PLAN OBJECTIVES
III. SIGNIFICANCE OF EFFORT
IV. PLAN OVERVIEW
V. BASIC RESEARCH PLAN FOR ANOMALOUS COGNITION
VI. BASIC RESEARCH PLAN FOR ANOMALOUS PERTURBATION
VII. APPLIED RESEARCH PLAN FOR ANOMALOUS COGNITION
[SG1B]
IX. POTENTIAL RESEARCH RETURN
X. PROJECT OVERSIGHT
XI. DEVELOPMENT OF EVALUATION CRITERIA
XII. BUDGET AND RESOURCE REQUIREMENTS (FYs 95-99)

APPENDICES

A. CONGRESSIONALLY-DIRECTED ACTION, DEFENSE AUTHORIZATION CONFERENCE
B. TERMINOLOGY AND DEFINITIONS
C. POTENTIAL RESEARCH SUPPORT FACILITIES
D. RESOURCE LITERATURE
E. CURRENT CONTRACTOR SCIENTIFIC OVERSIGHT COMMITTEE MEMBERSHIP
F. CURRENT CONTRACTOR INSTITUTIONAL REVIEW BOARD
G. ACADEMIC STUDIES REGARDING THE SCIENTIFIC VALIDITY OF AMP
H. AN ASSESSMENT OF THE ENHANCED HUMAN PERFORMANCE PROGRAM
I. IN-HOUSE STAFFING REQUIREMENTS

SECRET

(U) EXECUTIVE SUMMARY:

(S/NF/SG/LIMDIS) In compliance with the congressional conferees' request (Appendix A), DIA proposes to develop a multi-year research and development program, subject to rigorous scientific and technical oversight, to demonstrate the scientific validity of the STAR GATE program and to show that results of military and intelligence value can be obtained in a cost-effective manner using anomalous mental phenomena (AMP).

(S/NF/SG/LIMDIS) This proposed program, if successfully implemented, will:

- Identify the underlying mechanisms of AMP.
- Establish the limits of operational usefulness of
- Determine the degree to which foreign activities in AMP represent a threat to national security.
- Lead to the development of countermeasures to neutralize this threat.
- Use research findings to improve operational activities.
- Develop data fusion criteria to integrate AMP results with other intelligence sources.

(S/NF/SG/LIMDIS) Due to the diversity of the STAR GATE mission/objectives, both external resources and in-house expertise are required. Since this Activity possesses no in-house R&D capability, external R&D support is absolutely required to meet the Congressional concerns addressed in this program plan. A balance will be maintained between external and in-house activities, and every effort will be made to integrate and link these activities where appropriate.
The external aspect permits a wide range of expertise covering many disciplines to be focused on this area; this also has the benefit of ensuring peer group review and of facilitating a variety of scientific interactions. In-house personnel with a wide range of expertise in this phenomenology will need to be retained to make this proposed plan work.

(S/NF/SG/LIMDIS) In order to fulfill Congressional direction, DIA proposes to convene a Scientific Evaluation Panel (SEP) composed of representatives from each of the Service Scientific Advisory Boards. The purpose of the SEP is to review and validate the methodology outlined in the plan in order to address the cost-effectiveness and performance criteria for the STAR GATE program's research and development objectives, and to recommend which objectives should be pursued and the program scope required to achieve them. If the SEP determines that objectives in the plan are viable and executable, the General Defense Intelligence Program (GDIP) Manager will compete this initiative with others for the limited available resources remaining in the program.

(U) The proposed ongoing R&D effort will be reviewed every two years by the SEP to determine whether the STAR GATE program can show results that are cost-effective and satisfy reasonable performance criteria.

(C) An annual report will document the current operational, technical and administrative status of the program.

I. (U) INTRODUCTION:

(S/NF/SG/LIMDIS) This program plan was developed in response to a Defense Authorization Conference Congressionally Directed Action (CDA) to prepare a long-term, systematic and comprehensive research and peer review plan in order to investigate anomalous mental phenomena (AMP), and to apply program research results to potential operational activities. This plan also describes key in-house activities along with an appropriately integrated basic and applied external research support effort.

(S/NF/SG/LIMDIS) Specifically, this program plan represents DIA's view on how best to proceed with both in-house activities and external research support for the period of FY95 through FY99. Research findings, both domestic and foreign, and results from operational activities may lead to updates of this plan in order to reflect improved phenomena understanding and to pursue follow-on research and/or application directions.

(S/NF/SG/LIMDIS) An underlying and fundamental premise governing the implementation of this program plan is that a well-integrated interdisciplinary approach is the most appropriate strategy for conducting research in this diverse field. Consequently, this plan includes a wide variety of research topics based on recent findings from leading-edge pursuits in other disciplines that are suspected of being germane to STAR GATE. Other topics are derived from a review of worldwide research, consultations with leading area experts, and insights gained from previous research and application activities associated with the STAR GATE program.

(S/NF/SG/LIMDIS) This program plan also includes recommended FY funding which will allow the STAR GATE program to show results that are cost-effective and will at the same time satisfy reasonable program performance criteria.
The implementation of this program plan will preclude the recurrence of the yearly cycle of project start-up, limited progress, and anticipated project shut-down which previously inhibited program activity.

(S/NF/SG/LIMDIS) In sum, the implementation of this research and peer review plan will allow DIA to successfully accomplish identified R&D activities which, in turn, will enhance the capability of STAR GATE personnel to engage in operational activities and to assess the work done by potential adversaries, thereby reducing the risk of technological surprise.

(U) Terminology and definitions are discussed at Appendix B.

II. (U) PLAN OBJECTIVES:

(S/NF/SG/LIMDIS) The objective of this follow-on research and peer review plan is to make further progress in phenomena understanding and/or validation, in applications understanding, and in operational feasibility evaluation. This continued work will have a direct bearing on DIA's ability both to assess the significance of foreign research and to perform a systematic review of potential applications regarding these phenomena.

(S/NF/SG/LIMDIS) Accomplishment of the various activities identified in this plan will further enhance threat assessment of foreign achievements in this area, and will help achieve the potential for U.S. military/intelligence applications on select tasks as a supplement to HUMINT operations.

(U) It is anticipated that this plan will assist decision makers in their review and consideration of future directions for this field, and that this plan can begin formal implementation starting in FY95.

(S/NF/SG/LIMDIS) In compliance with the Congressional conferees' request, DIA recommends that a period of six to nine months be set aside at the beginning of this new program for the purpose of identifying the most promising and cost-effective experiments to be conducted under the program to meet the overall research objectives outlined below.
It is further suggested that a series of small working groups consisting of scientific experts from a variety of pertinent disciplines meet during this time period to accomplish this end. Their suggestions will be presented to the STAR GATE Scientific Oversight Committee for final approval.

[SG1B]

III. (U) SIGNIFICANCE OF EFFORT:

(S/NF/SG/LIMDIS) STAR GATE is a dynamic approach for pursuing the largely unexplored area of human consciousness and subconsciousness interaction. Its scope is comprehensive; a wide range of phenomenological issues are examined that include psychological, physiological/neurophysiological, physics and other leading-edge scientific areas. Although broad in scope, STAR GATE is well grounded due to its solid independent scientific review base. STAR GATE is based on a dynamic style in all its endeavors, especially in its pursuit of on-going foreign activities in this area.

(S/NF/SG/LIMDIS) One of the tasks previously levied on DIA by the FY91 Defense Authorization Act was to develop a long-range comprehensive plan for investigating parapsychological phenomena. This task was one of several objectives included in a new program for this phenomenological area that identified DIA as executive agent. Moreover, the FY91 Defense Authorization Act authorized a funding level of $2 million for DIA in order to initiate this new program. As a result, a balanced and integrated plan to include operations, foreign assessment, and research and development was implemented. In addition, a new DIA limited dissemination (LIMDIS) program, codeword STAR GATE, was established in order to accomplish the objectives that were set forth in this plan.
(S/NF/SG/LIMDIS) The external research support conducted under monies appropriated to date comes to a close in the March/April 1994 time-frame. This matters because, when research activities utilizing human subjects are interrupted, it has generally been necessary to begin again rather than resume from the point of termination. Consequently, it is important for the STAR GATE program to remain stable. Research involving human use differs considerably from that involving physical systems. For example, data from human subjects cannot be collected or analyzed as rapidly, in that additional empirical data is often required to reach analytical conclusions. This type of data analysis utilizing human subjects can only be achieved with an in-place, uninterrupted, multi-year research and development program. Therefore, should it be decided to go forward with this program, it should be done in a timely fashion.

(S/NF) The funding allocation for external research received by STAR GATE in FY91 and continued through FY93 permitted several important research areas to be initiated and continued. It is anticipated that results of this research will assist in clarifying some of the possible future research directions; consequently, not all long-range research possibilities can be identified in this plan. However, almost all of the major investigation areas can be addressed, and many of the specifics can be identified with reasonable confidence. Figure 1 presents an overview of overall research objectives for both Anomalous Cognition (AC) and Anomalous Perturbation (AP) which will be considered for inclusion in this program.
(S/NF) Previous basic research activities from FY91 through FY93 focused on the following: (1) validating findings from previous magnetoencephalograph (MEG) research and initiating new work with a variety of conditions and individuals; (2) performing a variety of anomalous cognition (AC) experiments to determine potential correlations (e.g., target type, environmental factors); (3) developing various theoretical constructs that might be testable and that could help explain the phenomena; (4) examining effects of altered states on data quality; (5) initiating review of and research into the energetics area; and (6) examining various application possibilities (e.g., communication, search).

(U) Results from previous basic and applied research activity have been factored into this research and development plan and provide the basis upon which further R&D efforts will be built.

IV. (U) PLAN OVERVIEW:

A. (U) BASIC RESEARCH OBJECTIVES

(S/NF/SG/LIMDIS) The objective of basic research is to understand the fundamental, underlying mechanisms for AMP. To achieve this objective in an efficient way, basic research of the detection mechanism should begin in a conservative direction. That is, assume that a putative "sensorial" system exists for AMP and that it most likely will behave similarly to those common elements which are known through the five senses. This conservative approach generalizes to understanding the source of AMP and its propagation mechanisms.
UNCLASSIFIED

Figure 1 (U) Research Overview
(Chart not reproduced. Top-level node: Anomalous Phenomena (Mental), divided into Cognition (Detector, Transmission, Source) and Perturbation, with basic and applied topics listed under each.)
SECRET

B. (U) APPLIED RESEARCH OBJECTIVES

(S/NF/SG/LIMDIS) The objective of applied research is to improve AMP functioning to its maximum possible limit. To realize this objective, it is critical to define AMP output measures that are consistent with a laboratory setting and/or an operational environment. The approach should also reflect scientific conservatism. In investigating any single variable (e.g., different training methodologies), all other variables should remain as constant as possible (e.g., use the same individuals and known good target systems).

C. (U) FOREIGN ASSESSMENT SUPPORT OBJECTIVES

(S/NF) From a research perspective, the objective of foreign assessment is to determine the degree to which claims from foreign laboratories can be confirmed in a U.S.-based setting. In science, replication is critical for understanding.

V. (U) BASIC RESEARCH PLAN FOR ANOMALOUS COGNITION:

A. (U) BASIC APPROACH

(S/NF) The link of basic and applied research with other applications investigations or with research activities is shown in Figure 2. The top of the chart shows that for any research or application task, certain conditions must be met (e.g., a reliable calibrated individual is required; proper scientific procedures need to be developed, etc.). Once these basic foundations are laid, basic/applied research can be initiated with a reasonable expectation of success and with assurance that results will not be ambiguous or fail scientific scrutiny.

(S/NF) This chart also illustrates the difference between basic and applied research; applied research relates to various methods for collecting, recording, improving and analyzing data output, while basic research is aimed at phenomena understanding.
In this chart, the "detector" is the human brain/mind, the "source" is the target or an aspect of the target, and "transmission" refers to notions of how information and/or energy are actually transmitted between source and detector.

(U) Figure 3 illustrates the interdisciplinary scope that will be brought to bear on this research problem. Leading-edge researchers in their various fields can provide clues, if not make direct contributions, that will assist in phenomena and applications understanding. Appendix C lists candidate research support facilities that could be involved in this long-range effort. Appendix D outlines pertinent research literature applicable to this field. Final selection will be based on how well the activities of these institutions will fit into specific time-lines and priorities to be established in FY95.

UNCLASSIFIED

Figure 2 (U) Research Objectives
(Chart not reproduced. Legible entries: reliable (calibrated) receiver, optimum protocol for data collection, optimum data assessment, integration of results; source, transmission, detector, integration; receiver selection, receiver training, target selection, protocols, analysis, integration, countermeasures.)

UNCLASSIFIED

Figure 3 (U) Integration of Scientific Disciplines
(Chart not reproduced. Legible entries surrounding "Anomalous Mental Phenomena": general relativity, thermodynamics, statistics/signal analysis, neural networks, cognitive neuroscience, artificial intelligence, anthropology.)

SECRET
Figure 4 lists milestones for the anomalous cognition basic research to be conducted under this plan.

B. (U) RESEARCH DETAILS

1. (U) Source.

(S/NF/SG/LIMDIS) Source research will address those topics that show promise for understanding the characteristics of the target or target area that may play a role in anomalous cognition (AC) occurrence and data quality. Aspects of the target that can be defined by conventional information theory (involving entropy/information content) will be explored in depth. A wide variety of targets with a wide range of information content, dynamics, or other parameters will be examined to explore this possible link. If not successful, other approaches to investigate the targets' innate nature and its possible link to phenomenon occurrence will be initiated. Definitive data in this area would also have implications for defining those targets which have the highest probability of successful data acquisition in an operational setting, thus establishing operational tasking parameters.

2. (U) Transmission.

(S/NF) The pursuit of possible transmission mechanisms for AC phenomena is essentially the most significant basic research task and also the most difficult to formulate. In this effort, a theoretical basis will be developed from extensions of current theory in light of recent advanced physics formulations. Some of these formulations permit unusual "information flows" that may, in fact, have relevance for this phenomenon. Testable models/constructs will be developed and evaluated. A variety of other possible explanations involving extensions of gravitation theory, quantum physics or other areas will be constructed and tested where possible. Some of these tests may require close cooperation of leading-edge researchers using equipment in their facility.

(C/NF) Effort in this area will also focus on integrating diverse aspects of the source, transmission, and detector categories. For example, it will examine how "targeting" occurs.
Insight will be drawn from in-depth reviews of various unusual physical effects identified by physical sciences researchers. These include distant particle coupling (Bell's theorem), ideas from quantum gravity, possible electrostatic/gravity interactions, unusual quantum physics, observational theories, vacuum "energy" potential, and a variety of other concepts.

UNCLASSIFIED

FIGURE 4 (U) BASIC RESEARCH MILESTONES - ANOMALOUS COGNITION
(Milestone chart; time frame 1995 through 1999. Activity bars not reproduced.)

SOURCE RESEARCH (TARGET)
- Information/Entropy Analysis
- Various Target Attributes (Size, Form, Content)

TRANSMISSION RESEARCH (MECHANISM)
- Four-Dimensional Calculations (Relativity Extensions)
- Unconventional Waves (Laboratory) (Long-Range Tests)
- Variables (Distance, Shielding, Energy)

DETECTOR RESEARCH (BRAIN)
- Neuroscience (EEG, Memory, Etc.)
- Environmental Factors
- Other Physiology (Electrical, Infrared)
- Implications from Medical/Animal Research

INTEGRATION
- Physical Sciences (Physics, Statistics, Parallel Processing, Etc.)
- Psychological Sciences (Psychology, Anthropology, Cognitive, Mental, Subliminal Perception, Etc.)
- Medical (Genetics, Etc.)

SECRET

(S/NF) Perhaps the most promising exploratory model of all is one based on little-understood aspects of the fundamental equations for electromagnetic wave propagation (Maxwell's equations). These equations indicate that forms of "wave propagation" could also exist that do not have the conventional electric or magnetic field components (i.e., vector and scalar waves). These waves would not be blocked by matter and therefore could be leading candidates for AC propagation or for certain aspects of AC phenomenon.
Research papers [SG1B] indicate that these waves are considered a leading candidate for AC transmissions by their researchers. Pilot study investigations in this area were conducted by PAG-TA in FY92 with promising preliminary results. Future research could couple with other DIA exploratory R&D efforts currently under way in this area.

(S/NF/SG/LIMDIS) Research on this topic will be closely integrated with research involving the anomalous perturbation (AP) aspect, since findings in the AP area would have direct implications for phenomena transmission mechanisms in general. Findings from the target (or target source) research area would also provide insight into possible transmission mechanisms. For example, different forms of the same target (e.g., target size, 2D vs 3D, holographic representations) may show patterns in the AC data that might provide clues regarding phenomena mechanisms.

3. (U) Detector.

(U) The most important and promising route to understanding the nature of the AC detection system in humans is through modern advances in neuroscience. Earlier neurophysiological results obtained from magnetoencephalograph (MEG) measurements begun in FY92 will be validated and expanded. This earlier work indicated that MEG correlations between visual evoked response areas of the brain may exist, and that remote stimuli might also be detectable in MEG data. Some of the specific investigations will examine a variety of near and far-field situations, other sensory modes and different types of individuals in order to search for potential variables. It might be possible, with advanced MEG instrumentation, to actually locate the exact brain areas involved in AC phenomena occurrence. Future research in this area could couple with research currently being explored at the National Laboratory.

(U) Other physical/psychophysical aspects of the central nervous system (CNS) will also be explored to look for possible correlates.
This would include galvanic skin responses (GSR) or other parameters.

(U) Related to this overall area are several investigations of possible environmental interactions with the brain that could affect AC data. These would include possible geomagnetic or electromagnetic influences.

(S/NF) A spin-off from findings in this basic research area could be unique communication applications. MEG correlates might exist between remotely located people. If so, transmission of remote messages (via a type of code) might be possible. Since AC phenomenon is not degraded by distance or shielding, the potential of transmitting basic "messages" to individuals in submarines would exist. Preliminary exploration of this application by PAG-TA has yielded promising results.

(S/NF) Another potential spin-off benefit from detector research in this program is that new insights into brain memory or parallel processing might be achieved. This could lead to new directions in advanced computer developments involving neural networks. For example, recent [SG1B] indicates that "wave-like" brain activity occurs in addition to usual neuronal processes. This wave-like phenomenon may have some link to the "phase shift" observed in MEG data from the previous MEG project. Further MEG work involving remote stimuli may help clarify such issues.

4. (U) Integration.

(U) The basic research activities will liberally avail themselves of the existing research communities that specialize in neuroscience, physics and statistics and the broader psychological/social sciences. Direct support with a variety of university departments, national and international, will be explored.
PAG-TA contacts with such national laboratories as Los Alamos, Lawrence Livermore, Oak Ridge, and have indicated an interest on their part in supporting the research efforts. Frequent conferences and data exchanges are anticipated. These data exchanges will ensure that a proper interdisciplinary approach is maintained, and that findings from other disciplines will be incorporated in this program where appropriate. This peer group dialogue will greatly benefit research sponsored through this plan: new ideas will be generated, and clues regarding phenomena operation will possibly be easier to identify.

(U) Some specific interdisciplinary examples that will benefit this program are as follows:

- In 1990 the American Anthropological Association (AAA) formed a new division, the Society for the Anthropology of Consciousness (SAC). This division has established a technical journal to support interdisciplinary, cross-cultural, experimental, and theoretical approaches to the study of consciousness. This group may be able to contribute to this program by providing cross-cultural examples. These members might also assist in the assessment of foreign data in this area.

- The psychophysiology of vision has already contributed to the earlier program. This plan calls for a collaborative effort with researchers in an attempt to understand how the central nervous system processes subliminal stimuli. This should assist in understanding how MEG correlates occur.

- The relationship between mind and body is currently discussed in the research literature as well as in the popular press. Researchers at the California Institute for Transpersonal Psychology (CITP) have been active in investigating the role of mental attitudes and body chemistry.
While there may not be a direct link with AC, an exchange of techniques and experimental designs would be helpful.

- The Journal of Cognitive Neuroscience contains at least one article of interest in each issue. This discipline is where most of the cognitive work with neuromagnetism is conducted. There is the possibility of joint investigations with researchers performing MEG investigations at the National Institutes of Health (NIH).

- Stanford University has been conducting research on internal mental imagery. The manipulation and control of this imagery is extremely important in understanding the source of internal noise during an AC session. A collaborative effort with Stanford should lead to methods for noise reduction.

- Neural networks are particularly good at recognizing subtle patterns in complex data, and are being applied in the subjective arena of decision making in business. In order to improve AC analysis, the program will conduct a collaborative effort with scientists who are active in neural network research and with selected individuals who have had success with interpreting highly subjective data.

- Statistics is the heart of AC research in that most of the results are usually quoted in statistical terms. Hypothesis testing has traditionally been the primary focus, but there are other possible approaches that should be explored. Statistics researchers at Harvard have already expressed interest in contributing to the research effort.

- A major portion of the effort will be a search for an AC evoked response in the brain. Sophisticated processing is required in that magnetic signals from the brain cannot be easily characterized by standard statistical practices. Several research facilities can contribute.
- Classical statistical thermodynamics may be at the heart of understanding the nature of an AC source of information. A physical property called entropy may be related to what is sensed by AC. The program intends to collaborate with a variety of university physics departments to calculate the appropriate parameters.

(S/NF) The specific experiments to be conducted in these research domains will be defined during the first six to nine months of the program, utilizing the recommendations of the working groups mentioned above, subject to approval by the Scientific Oversight Committee.

VI. (U) BASIC RESEARCH PLAN FOR ANOMALOUS PERTURBATION:

(S/NF) Figure 5 illustrates the basic approach for investigating "energetics", or anomalous perturbation (AP) phenomena. Intelligence reporting indicates that this aspect of AMP should receive attention in this research plan to prevent technological surprise. Thus, beginning in FY95, acceptance criteria will be established with which to judge the historical literature for potential AP effects. Using those criteria, a detailed review of the literature will begin in mid FY95 and, considering the size of that data base, will continue through FY95. Knowledge gained from this review may provide insights for the development of new AP target systems or provide data so that particular experiments can be replicated. Given the complexity of most AP experiments, considerable time is needed to plan and conduct them properly. If the results warrant, then application development may begin as early as FY96; however, the primary task of basic research on AP is to attempt to validate its existence. Findings from foreign research will be examined and factored into this activity as appropriate.

(S/NF) The keys to investigating this area will be in appropriate personnel selection and, very likely, in proper selection of the AP test device.
Thus, the initial phase of this effort will involve identification and solicitation of individuals known or claimed to have such talents. For example, certain expert martial arts or yoga practitioners might do well in such experiments due to their strong mental conditioning and capacity for intense mental focus. After locating such individuals, various instruments, such as microcomputer devices, sensitive electronic/sensor devices, or other unique or sensitive equipment, would be used as targets in AP experiments.

[Figure 5 (U) Basic Research Milestones - Anomalous Perturbation (To Include Biological Systems). Activity time frame 1995-1999: develop evaluation criteria; perform historical data base analysis; examine target systems (various technical targets, laboratory setting); conduct validation experiments (advanced sensors, complex components, far-field effects); pursue applications (countermeasures); personnel selection (solicit known talent, screening/training).]

(S/NF) Some of the unique sensor candidates include devices that are highly sensitive to very weak gravitational effects (such as Mossbauer devices or atomic clocks). Perhaps the most promising device is one that involves detection of an unusual non-electromagnetic wave (a vector/scalar wave). If experiments with such sensors are successful, then significant understanding of AP or AC phenomena would result. Experiments with such a device are a distinct near-term possibility; consequently, this will be given high priority in the early part of this long-range program.
(S/NF) Should these pilot experiments prove successful, then near and distant experiments would be developed for a wide variety of devices to evaluate application aspects. Potential applications could include, for example, remote switching (in a communication role) or possibly a countermeasure to minimize the effectiveness of threat systems such as sensitive computer components or sensors. Similarly, if these results are successful, they would provide insight regarding potential threats to U.S. systems or security.

(S/NF) The specific experiments to be conducted in these research domains will be defined during the first six to nine months of the program, utilizing the recommendations of the working groups mentioned above, subject to approval by the Scientific Oversight Committee.

VII. (U) APPLIED RESEARCH PLAN FOR ANOMALOUS COGNITION:

(U) Figure 6 illustrates the overall plan for the applied research portion for several main functional categories.

a. (U) SELECTION

(C) The most promising potential for selecting individuals is to identify ancillary activity that correlates with AC ability. If such a procedure can be identified, then receiver selection can be incorporated as part of other screening tests (e.g., fighter pilot candidacy), and thus large populations can be used. Among the items that will be examined are physiology (e.g., responses of the brain to external stimuli) and hypnotic susceptibility (i.e., an individual's predisposition for being hypnotized). The results of this effort will be examined continuously; however, a decision on whether to end the investigation will be made in mid FY96. Should the results at that time warrant, then refining of the techniques will continue to the end of FY 1998. The reason the initial research spans several years is that to validate even one psychological finding requires long-term testing of candidate individuals.
Current statistical methods require many AC sessions, and experience has shown that only a few sessions can be conducted per week for any single individual.

[Figure 6 (U) Applied Research Milestones - Anomalous Cognition. Activity time frame 1995-1999: personnel selection research (state parameters, psychology, solicit known talent, empirical screening); personnel training research (state parameters, empirical evaluation, practical application tests); application evaluation research (target characteristics, other aspects); protocol development (operational conditions, search/location projects, new applications/procedures); analysis method development (response definition, artificial intelligence, neural network analogies, combination of methods); data integration/assimilation development (intelligence data fusion methods, training/seminars, advanced training, various customers).]

(C) The previous program was able to estimate that approximately one percent of the general population possessed a high-quality, natural AC ability. Because the empirical method (i.e., asking large groups to attempt AC) is labor intensive and very inefficient, it is included in the research plan only as an alternate approach.

b. (U) TRAINING

(S/NF) Training has been a major part of the previous program; however, results of training approaches have been difficult to evaluate and have not been examined systematically. Systematic review of this issue was begun in FY 92. One of the methods that will be examined involves lowering an individual's visual subliminal threshold (i.e., the level below which an individual is not consciously aware of visual material). This could enhance the individual's sensitivity to AC data. Other forms of altered states, such as dreaming and hypnosis, will also be evaluated to see if such states can enhance AC data quality.

(U) Results on these issues should be available at the close of FY95. If no progress has been observed and if there have been no positive results from the basic research, the task ends. However, should any of the variables examined appear promising, then the task will be continued.

(S/NF) It is anticipated that all laboratory successes must be validated by simulating operational tasks. These experiments involve identifying the specialty to be tested, defining the acceptance criteria, and conducting sessions in which the complete target systems are known. This three-year activity runs concurrently with the other tasks but with a one-year offset to allow for planning.

c. (U) TARGET/APPLICATION SELECTION

(C) Based on earlier research, the most promising approach to target selection appears to be a single physical characteristic called entropy (i.e., a measure of inherent target information). Beginning in FY95, two and one half years have been allocated for the detailed study of this aspect of target properties. Initially, little experimentation is required; rather, a retrospective examination of previous target systems should indicate if this approach is valid. Included in this examination are detailed calculations of the information content of natural target scenes.
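
The plan does not state how the information content of a target scene would be calculated. As one possible illustration only, the sketch below assumes Shannon entropy over the gray-level histogram of a digitized scene; the function name shannon_entropy_bits and the two toy scenes are hypothetical and do not come from the program.

```python
import math
from collections import Counter


def shannon_entropy_bits(pixels):
    """Return the Shannon entropy, in bits per pixel, of a flat sequence of gray levels.

    Assumption: "information content" is taken to mean the entropy of the
    gray-level histogram. The plan itself does not specify a formula.
    """
    total = len(pixels)
    if total == 0:
        raise ValueError("pixels must not be empty")
    counts = Counter(pixels)
    # H = sum over levels of p * log2(1 / p), with p = count / total.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Two hypothetical 4x4 scenes, flattened row by row.
uniform_scene = [128] * 16            # a single gray level everywhere
varied_scene = [0, 64, 128, 255] * 4  # four equally frequent gray levels

print("uniform scene:", shannon_entropy_bits(uniform_scene))
print("varied scene:", shannon_entropy_bits(varied_scene))
```

Under this assumption a featureless scene scores zero and a scene with more evenly spread gray levels scores higher. A histogram measure ignores spatial arrangement, so it is only a first approximation to the target property discussed above.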

(S/NF) Beginning in mid FY96, other potential intrinsic target properties will be examined. For example, a target may be more readily sensed by AC if the collection of elements at the site (e.g., landmark, buildings, roads) constitutes a conceptually coherent unit as opposed to a collage of unrelated items. Quantitative definitions of targets will also be developed that include non-physical target parameters such as function, meaning, or relationships. These aspects are highly important in most operational projects and need to be quantified.

(S/NF) Part of this effort will involve investigations that serve two purposes: (1) add insight into the phenomenon; and (2) help evaluate the feasibility of certain potential applications. For example, long distance experiments could be conducted to or from deep caves or submarines in deep water to test communication potential and transmission theories. Experiments could also be conducted to targets on board space platforms to test distance and gravitational effects. Experiments to or from magnetically shielded rooms or certain earth locations (e.g., the magnetic pole) might indicate if magnetic fields influence the phenomenon. Experiments to opposite sides of the earth might also indicate if a mass or gravity effect can be noted.

(S/NF/SG/LIMDIS) This area of investigation will be integrated with a variety of applications in coordination with findings/investigations pursued by the in-house effort. Figure 9 identifies the main application or operational areas, along with the types of data desired. This activity will be integrated, where possible, into in-house pursuits that will explore these areas in a systematic fashion. Initial emphasis will be in counternarcotics and counterterrorism areas.

(S/NF/SG/LIMDIS) Specific types of applications that will be explored in depth include the search problem. Search tasks are expected to remain high priority operational tasks (e.g., hostage location, lost equipment or system location). Search tasks are complicated by timing issues, especially if the missing target is being moved frequently. Related to this will be examination of predictive capability in order to evaluate the feasibility of detecting hostile plans and intentions in advance. Pilot studies of other areas (e.g., code breaking, medical diagnostics, low intensity conflict support) will also be initiated.

(S/NF/SG/LIMDIS) Another application area that will be examined is "communications". Previous research indicates that with proper protocols, basic or coded messages can be sent and received via AC procedures. Redundant coding methods can readily enhance probability of success, and new statistical methods can also improve success rates. Communication applications may have significant value for search problems by providing additional information on the location of kidnapped or hostage victims. Such techniques might also help in determining hostage or POW state-of-health or other significant issues.

d. (U) PROTOCOLS

(U) Given the laboratory success of AC experimentation, the protocol task can build upon a substantial literature. Determining optimal, specialty-dependent protocols only requires extending current concepts. Several years are required due to the statistical nature of the analysis that is required to determine the effects of environment, receiver, target and feedback conditions. Several high-interest application areas (such as search/location) will be examined in detail.
A variety of session procedures will be evaluated to determine those that are beneficial to improving data quality.

(S/NF) Protocol effectiveness may be measured by quality, quantity, and/or usefulness of the AC information elicited by its use. The requirements for protocols that are designed for laboratory settings are considerably more restrictive than those required for operational settings. For example, providing limited information to a receiver while an operational session is in progress (i.e., intermediate feedback) might facilitate the acquisition of the desired data. This kind of feedback is strictly prohibited, however, in most protocols designed for laboratory experiments. Protocols may also vary depending on the nature of the data required. For example, for some search projects, only general data may be adequate. Such cases would not require development of highly specific details, and protocols for the sessions would not be as complex.

(U) A detailed protocol will need to consider a variety of potential session variables such as the individual's physical environment, mental state and attitude, and how the target or task is designated (e.g., coordinates, abstract terms). Other data include specifics of the session (monitor present or not), type of feedback, type of response data (e.g., predictive), and mode and method of response (e.g., drawings, verbal).

(S/NF) Currently, the only known way to resolve the above issues is to conduct a large number of trials for a given individual with as many of the potential variables as possible held constant. Standard statistical methods can then be used to identify trends, patterns, and operational constraints.

e. (U) DATA ANALYSIS

(U) This area requires extensive review of leading analysis tools, such as those required for describing imprecise concepts or data (i.e., artificial intelligence techniques, fuzzy sets). This work will be combined with findings from neural network analysis and research, or possibly combinations of other emerging advanced analysis methods.

(S/NF) Various approaches are anticipated to directly benefit operational evaluations. One promising technique involves procedures based on an adaptive (frequent data base update) approach. This will permit an individual's progression, and possibly time dependent data variables in an individual's track record, to be identified.

(S/NF) In addition to the search for new analysis methods, the current methods will also be reexamined. Laboratory requirements differ from those for operational activities in that the target can be controlled and well defined. For operational activities, uncertainties in tasking may arise, especially if operational requirements are changing or if some of the initial "known" data are incorrect. Such uncertainties complicate later analyses.

(S/NF) Analysis methods will also be developed that can make predictions on data quality for any given task. This will require development of an extensive track record for each individual based on both controlled and operational projects.

(S/NF) These analysis methods will also address certain practical issues. For example, a detailed, high-quality example of AC data may have little value to an intelligence analyst if that information was known from other sources. Likewise, a poor example of AC data might provide a single element as a tip-off for other assets, or provide the missing piece in a complex analysis, and thus be quite valuable.
The intelligence utility of AC data may in some cases be only weakly connected to the AC quality. Therefore a data fusion analysis procedure is needed for AC-derived operational data. Methods that permit appropriate data analysis from an accuracy and utility viewpoint will be developed.

f. (U) INTEGRATION

(U) This activity would be an ongoing review/integration effort in order to identify patterns or clues useful for understanding practical aspects of this phenomenological area.

(S/NF) Identifying approaches and procedures that permit assimilation of AC data from operational support projects into all-source intelligence analysis procedures will also be part of this support activity. Depending on results of applied research findings and operational pursuits, a basic seminar/training program for other applications-oriented elements might be established. Such a training/seminar program would focus on basic techniques and would augment possible operational training activity that might become part of the in-house effort. This would require several years to develop and establish.

(S/NF) The specific experiments to be conducted in these research domains will be defined during the first six to nine months of the program, utilizing the recommendations of the working groups mentioned above, subject to approval by the Scientific Oversight Committee.

SG1B

IX. (U) POTENTIAL RESEARCH RETURN:

(S/NF/SG/LIMDIS) The research pursuits identified in the overall research and peer review plan have the potential for achieving highly significant results using AMP to address problems of national security by pushing the phenomena to their natural limits. This overall result can be achieved by accomplishing the aforementioned program plan goals.

X. (U) PROGRAM OVERSIGHT

A. (U) PROJECT OVERSIGHT METHODOLOGY:

1. (U) PROGRAM MANAGEMENT/OVERSIGHT

(S/NF) DIA, as executive agent, proposes to implement a management structure that fosters a proactive, responsive, and creative environment for this activity. Both the external research and in-house activities will be centered in the Technology Assessment and Support Activity under the supervision of the Chief, Office for Ground Forces (DIA/PAG).

2. (U) SCIENTIFIC OVERSIGHT

(S/NF) Scientific oversight will be provided by the

3. (U) CONTRACTOR OVERSIGHT

a. (U) A contractor-sponsored Scientific Oversight Committee (SOC), consisting of scientists from the following disciplines: physics, astronomy, statistics, neuroscience, and psychology, will be tasked with the following:

-- (U) Reviewing and approving all experimental protocols prior to the collection of experimental data.

-- (U) Reviewing all experimental final reports as if they were submissions to technical scientific journals.

-- (U) Proposing directions for further

-- (U) Exercising unannounced drop-in privileges to view experiments in progress.

b. (U) A contractor-sponsored Human Use Review Board will also be formed and charged with the responsibility of assuring compliance with all U.S. and DoD regulations with regard to the use of humans in experimentation and assuring their safety.
Members should represent the health, legal, and spiritual professions IAW government guidelines.

XI. (U) DEVELOPMENT OF EVALUATION CRITERIA:

A. (U) SCIENTIFIC VALIDITY

(S/NF) A thorough review of DoD's activities in AMP was conducted in 1987 to evaluate the use of AMP for intelligence gathering purposes. The overall findings of this evaluation were that "...the Project Review Group has determined to its satisfaction that the work of the Enhanced Human Performance Group is scientifically sound...and is providing valuable insight into the nature of an anomaly which have a significant impact on the DoD." This research and development program will both draw from and add to this extensive data base to further demonstrate the scientific validity and practicality of AMP.

B. (U) PERFORMANCE

(S/NF) The ability of the STAR GATE program to produce results that have an intelligence value can only be measured by customer feedback evaluations. STAR GATE has developed feedback mechanisms and procedures for customers that should result in a method of quantifying this subjective feedback data so that operational value added and cost-effectiveness can be measured.

XII. (U) BUDGET AND RESOURCE REQUIREMENTS (FYs 95-99):

(S/NF/SG/LIMDIS) Due to the diversity of the STAR GATE mission/objectives, both external resources and in-house expertise are required. Since this Activity possesses no in-house R&D capability, external R&D support is absolutely required to meet the Congressional concerns that are addressed in this program plan. A balance will be maintained between external and in-house activities, and every effort will be made to integrate and link these activities where appropriate.
The external aspect permits a wide range of expertise covering many disciplines to be focused on this area; this also has the benefit of ensuring peer group review and of facilitating a variety of scientific interactions. In-house personnel with a wide range of expertise in this phenomenology will need to be retained to make this proposed plan work.

(S/NF/SG/LIMDIS) In order to fulfill Congressional Direction, the DIA proposes to convene a Scientific Evaluation Panel (SEP) composed of representatives from each of the Service Scientific Advisory Boards. The purpose of the SEP is to review and validate the methodology outlined in the plan in order to address the cost-effectiveness and performance criteria for the STAR GATE program's research and development objectives, and to propose recommendations as to which objectives should be pursued and the program scope required to achieve those objectives. If the SEP determines that objectives in the plan are viable and executable, the General Defense Intelligence Program (GDIP) Manager will compete this initiative with others for limited available resources remaining in the program.

(U) The proposed ongoing R&D effort will be reviewed every two years by the SEP to determine whether the STAR GATE program can show results that are cost-effective and satisfy reasonable performance criteria.

(C) An annual report will document the current operational, technical and administrative status of the program.

APPENDIX A

CONGRESSIONALLY-DIRECTED ACTION
DEFENSE AUTHORIZATION CONFERENCE

(S/NF) REQUEST: "The conferees are concerned that insufficient funds have been spent on research and development to establish the scientific basis for the STAR GATE program.
The conferees direct the Director of DIA to prepare a program plan and to submit an appropriate budget request for a research effort, over several years, to determine whether the STAR GATE program can show results that are cost-effective and satisfy reasonable performance criteria. This plan, and any research under this program, should be subject to peer review by neutral scientific experts. The Director of DIA is directed to prepare this research and peer review plan within existing program funds."

APPENDIX B

TERMINOLOGY AND DEFINITIONS

(U) PHENOMENA TERMINOLOGY:

(U) This phenomenological area has had a variety of descriptive terms over the years, such as paranormal, parapsychological, or psychical research. Foreign researchers use other terms: "psychoenergetics" in the USSR; "extraordinary human function" in the People's Republic of China (PRC). In general, this field is concerned with a largely unexplored area of human consciousness/subconsciousness interactions associated with unusual or underdeveloped human capabilities.

(U) Recently, researchers have shown a preference for terms that are neutral and that emphasize the anomalous or enigmatic nature of these phenomena. The term anomalous mental phenomena (AMP) is generally preferred.

(U) This area has two aspects: information access and energetics influence. Information access refers to a mental ability to describe remote areas or to access concealed data that are otherwise shielded from all known sensory channels. A recent term for this ability is anomalous cognition (AC). This term places emphasis on potential understanding that might be available from advances in sensory/brain functioning research or other related research.
Older terms for this aspect have included extra-sensory perception (ESP), remote viewing (RV), and in some cases, precognition.

(U) The energetics aspect refers to the ability to influence, via mental volition, physical or biological systems by an as yet unknown physical mechanism. An example of physical system influence would include affecting the output of sensors or electronic devices; biological systems influence would include affecting physiological parameters of an individual. A recent descriptive term for this ability is anomalous perturbation (AP). Older terms for this phenomenon included psychokinesis (PK) or telekinesis.

(U) GENERAL DEFINITIONS:

(S/NF) For this program, basic research is defined to mean any investigation or experiment for determining fundamental processes or for uncovering underlying parameters that are involved in this phenomenon. Basic research is primarily oriented toward understanding the physical, physiological, and psychological mechanisms of anomalous mental phenomena (AMP).

(S/NF) Applied research refers to any investigation directed toward developing particular applications or toward improving data quality and reliability. For anomalous cognition (AC) phenomena, research is primarily directed toward improving the output quality of AC data. This would include ways to develop or improve the utility of AC data for a variety of potential applications. For example, examination of spatial and temporal relationships of AC data could assist in developing a reliable search capability useful for locating missing people or equipment.

APPENDIX C

POTENTIAL RESEARCH SUPPORT FACILITIES

Science Applications International Corp. (Los Altos, CA)
Mind Science Foundation (San Antonio, TX)
Princeton Engineering Anomalies Laboratory (Princeton Univ, NJ)
American Society for Psychical Research (New York, NY)
St. John's University (Long Island, NY)
Foundation for Research into the Nature of Man (Durham, NC)
ARE/Atlantic University (Virginia Beach, VA)
University of Virginia (Charlottesville, VA)
Psychophysical Research Laboratories (Edinburgh, Scotland)
Edinburgh University (Edinburgh, Scotland)

OTHER RELATED DISCIPLINES

Psychology
Stanford University (Stanford, CA)
Cornell University (Ithaca, NY)

Anthropology
University of California (Berkeley, CA)
University of Arizona (Tucson, AZ)

Psychophysiology
SRI International (Menlo Park, CA)
Langley Porter Neuropsychiatric Institute (San Francisco, CA)
Menninger Foundation (Topeka, KS)

Psychoimmunology
California Institute for Transpersonal Psychology (Menlo Park, CA)

Cognitive Neuroscience
Los Alamos National Laboratory (Los Alamos, NM)
Sandia National Laboratory (Albuquerque, NM)
University of California (San Diego, CA)

Cognitive Psychology
Psychology Department, Princeton Univ (Princeton, NJ)
Psychology Department, City College of New York (New York, NY)

Artificial Intelligence
Massachusetts Institute of Technology (Cambridge, MA)
Stanford University (Stanford, CA)

Neural Networks
Massachusetts Institute of Technology (Cambridge, MA)
Science Applications International Corp. (Los Altos, CA)

Statistics/Signal Analysis
University of California (Davis, CA)
Harvard University (Cambridge, MA)

Thermodynamics
Rochester University (Rochester, NY)
Physics Department, Stanford University (Stanford, CA)

Quantum Measurement
International Business Machines, Research Laboratories (College Park, MD)

General Relativity
California Institute of Technology (Pasadena, CA)
University of Texas at Austin (Austin, TX)

Electromagnetic/Basic Research
Electronetics Corp (Buffalo, NY)
Battelle Corp (Columbus, OH)
Institute for Advanced Study (Austin, TX)

APPENDIX D

RESOURCE LITERATURE

1. A.R.E. Journal
2. Abnormal Hypnotic Phenomena
3. American Anthropologist
4. American Ethnologist
5. American Journal of Clinical Hypnosis
6. American Journal of Physiology
7. American Journal of Sociology
8. American Psychologist
9. American Society for Psychical Research
10. Annals of Eugenics
11. Annals of Mathematical Statistics
12. Annales de Sciences Psychiques
13. Archivio di Psicologia, Neurologia e Psichiatria
14. Association for the Anthropological Study of Consciousness Newsletter
15. Behavioral and Brain Sciences
16. Behavioral Science
17. Bell System Technical Journal
18. Biological Psychiatry
19. Biological Review
20. British Journal for the Philosophy of Science
21. British Journal of Psychology
22. Bulletin of the American Physical Research
23. Bulletin of the Boston Society for Psychic Research
24. Bulletin of the Los Angeles Neurological Societies
25. Contributions to Asian Studies
26. Electroencephalography and Clinical Neurophysiology
27. Endeavour
28. Ethnology
29. Exceptional Human Experience
30. Experientia
31. Experimental Medicine and Surgery
32. Fate
33. Fields within Fields
34. Foundations of Physics
35. Hibbert Journal
36. Human Biology
37. International Journal of Clinical and Experimental Hypnosis
38. International Journal of Comparative Sociology
39. International Journal of Neuropsychiatry
40. International Journal of Parapsychology
41. International Journal of Psychoanalysis
42. Journal of Abnormal and Social Psychology
43. Journal of Altered States of Consciousness
44. Journal of Applied Physics
45. Journal of Applied Psychology
46. Journal of Asian and African Studies
47. Journal of Biophysical and Biochemical Cytology
48. Journal of Cell Biology
49. Journal of Communication
50. Journal of Comparative and Physiological Psychology
51. Journal of Consulting Psychology
52. Journal of Existential Psychiatry
53. Journal of Experimental Biology
54. Journal of Experimental Psychology
55. Journal of General Psychology
56. Journal of Genetic Psychology
57. Journal of Mind and Behavior
58. Journal of Nervous and Mental Diseases
59. Journal of Personality
60. Journal of Personality and Social Psychology
61. Journal of Research in PSI Phenomena
62. Journal of Scientific Exploration
63. Journal of the American Academy of Psychoanalysis
64. Journal of the London Mathematical Society
65. Journal of the Royal Anthropological Institute of Great Britain and Ireland
66. Metapsichica
67. Mind-Brain Bulletin
68. Motivation and Emotion
69. Nature
70. Naturwissenschaftliche Rundschau
71. New Horizons
72. New Scientist
73. New Sense Bulletin
74. Newsletter of the Parapsychology Foundation
75. Parapsychology Bulletin
76. Parapsychology Abstracts International
77. Parapsychology Review
78. Perceptual and Motor Skills
79. Philosophy of Science
80. Physiology and Behavior
81. Proceedings of the Society for Psychical Research
82. Psychedelic Review
83. Psychic
84. Psychic Science
85. Psychoanalytic Quarterly
86. Psychoanalytic Review
87. Psychological Bulletin
88. Psychometrika
89. Psychophysiology
90. Physics Today
91. Renti Teyigongneng (EFHB Research) [PRC]
92. Revue Metapsychique
93. Revue Philosophique
94. Revue Philosophique de la France et de L'Etranger
95. Revue Philosophique Applique
96. Science
97. Skeptical Inquirer
98. Social Studies of Science
99. Subtle Energies
100. The Humanistic Psychology Institute
101. The Journal of Parapsychology
102. The Journal of the American Society for Psychical Research
103. Theta
104. Tijdschrift voor Parapsychologie
105. Tomorrow
106. Voprosy Filosofii (Questions of Philosophy) [RUSSIA]
107. Western Canadian Journal of Anthropology
108. Zeitschrift für die Gesamte Neurologie und Psychiatrie
109. Zeitschrift für Parapsychologie und Grenzgebiete der Psychologie
110. Zeitschrift für Tierpsychologie
111. Zeitschrift für Vergleichende Physiologie
112. Zetetic Scholar
113. Zhongguo Shehui Kexue (China Social Sciences) [PRC]
114. Ziran Zazhi (Nature) [PRC]

APPENDIX E

CURRENT CONTRACTOR SCIENTIFIC OVERSIGHT COMMITTEE MEMBERSHIP

Steven A. Hillyard
- Professor of Neurosciences, Department of Neurosciences, University of California, San Diego.
- Author or coauthor of 118 technical neuroscience publications.
- Eighty-two invited presentations at technical conferences.
- Ph.D., Yale University, 1968 (Psychology).

S. James Press
- Professor of Statistics, Department of Statistics, University of California, Riverside.
- Author or coauthor of 132 statistics publications.
- Author of 12 books and/or monographs.
- Ph.D., Stanford University, 1964 (Statistics).

Garrison Rapmund
- Responsible for facilitating transfer of Strategic Defense Initiative technologies to health care industries.
- Major General, USA, retired in 1986 as Assistant Surgeon General (R&D) and Commander, Army Medical R&D Command.
- M.D., Columbia University, 1953 (Pediatrics).

Melvin Schwartz
- Associate Director for High Energy and Nuclear Physics, Brookhaven National Laboratory.
- Author or coauthor of 40 technical publications in high energy physics; author of "Principles of Electrodynamics."
- Nobel Prize, Physics (1988).
- Ph.D., Columbia University, 1958 (Physics).

Yervant Terzian
- Professor of Physical Sciences, Chairman of the Department of Astronomy, Cornell University.
- Author/coauthor of numerous technical publications and books.
- Ph.D., Indiana University, 1965 (Astronomy).

Philip G. Zimbardo
- Professor of Psychology, Department of Psychology, Stanford University.
- Author/coauthor of numerous experimental psychology publications.
- Ph.D., Yale University, 1959 (Psychology).

APPENDIX F

CURRENT CONTRACTOR INSTITUTIONAL REVIEW BOARD MEMBERSHIP

Byron Wm. Brown, Jr., Ph.D.
- Biostatistics, Stanford University

Gary R. Fujimoto, M.D.
- Occupational Medicine, Palo Alto Medical Foundation

John Hanley, M.D.
- Neuropsychiatry, University of California, Los Angeles

Robert B. Livingston, M.D.
- Neuroscience, University of California, San Diego

Robin P. Michelson, M.D.
- Otolaryngology, University of California, San Francisco

Ronald Y. Nakasone, Ph.D.
- Buddhist Studies, Institute of Buddhist Studies, Berkeley, CA

Garrison Rapmund, M.D. (Chair)
- Air Force Science Advisory Board

Louis J. West, M.D.
- Neuropsychiatry, University of California, Los Angeles

APPENDIX G

ACADEMIC STUDIES REGARDING THE SCIENTIFIC VALIDITY OF AMP

Psychological Bulletin (January, 1994)
Version 4.7, October 1, 1993

Does Psi Exist? Replicable Evidence for an Anomalous Process of Information Transfer

Daryl J. Bem and Charles Honorton

Most academic psychologists do not yet accept the existence of psi, anomalous processes of information or energy transfer (such as telepathy or other forms of extrasensory perception) that are currently unexplained in terms of known physical or biological mechanisms. We believe that the replication rates and effect sizes achieved by one particular experimental method, the ganzfeld procedure, are now sufficient to warrant bringing this body of data to the attention of the wider psychological community. Competing meta-analyses of the ganzfeld database are reviewed, one by R. Hyman (1985), a skeptical critic of psi research, and the other by C. Honorton (1985), a parapsychologist and major contributor to the ganzfeld database. Next, the results of 11 new ganzfeld studies that comply with guidelines jointly authored by R. Hyman and C. Honorton (1986) are summarized. Finally, issues of replication and theoretical explanation are discussed.

The term psi denotes anomalous processes of information or energy transfer, processes such as telepathy or other forms of extrasensory perception that are currently unexplained in terms of known physical or biological mechanisms.
The term is purely descriptive: It neither implies that such anomalous phenomena are paranormal nor connotes anything about their underlying mechanisms.

Does psi exist? Most academic psychologists don't think so. A survey of more than 1,100 college professors in the United States found that 55% of natural scientists, 66% of social scientists (excluding psychologists), and 77% of academics in the arts, humanities, and education believed that ESP is either an established fact or a likely possibility. The comparable figure for psychologists was only 34%. Moreover, an equal number of psychologists declared ESP to be an impossibility, a view expressed by only 2% of all other respondents (Wagner & Monnet, 1979).

Daryl J. Bem, Department of Psychology, Cornell University. Charles Honorton, Department of Psychology, University of Edinburgh, Edinburgh, Scotland.

Sadly, Charles Honorton died of a heart attack on November 4, 1992, days before this article was accepted for publication. He was 46. Parapsychology has lost one of its most valued contributors. I have lost a valued friend.

This collaboration had its origins in a 1983 visit I made to Honorton's Psychophysical Research Laboratories (PRL) in Princeton, New Jersey, as one of several outside consultants brought in to examine the design and implementation of the experimental protocols.

Preparation of this article was supported, in part, by grants to Charles Honorton from the American Society for Psychical Research and the Parapsychology Foundation, both of New York City. The work at PRL summarized in the second half of this article was supported by the James S. McDonnell Foundation of St. Louis, Missouri, and by the John E. Fetzer Foundation of Kalamazoo, Michigan.

Helpful comments on drafts of this article were received from Deborah Delany, Edwin May, Donald McCarthy, Robert Morris, John Palmer, Robert Rosenthal, Lee Ross, Jessica Utts, Philip Zimbardo, and two anonymous reviewers.
Correspondence concerning this article should be addressed to Daryl J. Bem, Department of Psychology, Uris Hall, Cornell University, Ithaca, New York 14853. (Electronic mail may be sent to d.bem@cornell.edu.)

Psychologists are probably more skeptical about psi for several reasons. First, we believe that extraordinary claims require extraordinary proof. And although our colleagues from other disciplines would probably agree with this dictum, we are more likely to be familiar with the methodological and statistical requirements for sustaining such claims, as well as with previous claims that failed either to meet those requirements or to survive the test of successful replication. Even for ordinary claims, our conventional statistical criteria are conservative. The sacred p = .05 threshold is a constant reminder that it is far more sinful to assert that an effect exists when it does not (the Type I error) than to assert that an effect does not exist when it does (the Type II error).

Second, most of us distinguish sharply between phenomena whose explanations are merely obscure or controversial (e.g., hypnosis) and phenomena such as psi that would appear to fall outside our current explanatory framework altogether. (Some would characterize this as the difference between the unexplained and the inexplicable.) In contrast, many laypersons treat all exotic psychological phenomena as epistemologically equivalent; many even consider déjà vu to be a psychic phenomenon. The blurring of this critical distinction is aided and abetted by the mass media, "new age" books and mind-power courses, and "psychic" entertainers who present both genuine hypnosis and fake "mind reading" in the course of a single performance. Accordingly, most laypersons would not have to revise their conceptual model of reality as radically as we would to assimilate the existence of psi. For us, psi is simply more extraordinary.
Finally, research in cognitive and social psychology has sensitized us to the errors and biases that plague intuitive attempts to draw valid inferences from the data of everyday experience (Gilovich, 1991; Nisbett & Ross, 1980; Tversky & Kahneman, 1971). This leads us to give virtually no probative weight to anecdotal or journalistic reports of psi, the main source cited by our academic colleagues as evidence for their beliefs about psi (Wagner & Monnet, 1979).

Ironically, however, psychologists are probably not more familiar than others with recent experimental research on psi. Like most psychological research, parapsychological research is reported primarily in specialized journals; unlike most psychological research, however, contemporary parapsychological research is not usually reviewed or summarized in psychology's textbooks, handbooks, or mainstream journals. For example, only 1 of 64 introductory psychology textbooks recently surveyed even mentions the experimental procedure reviewed in this article, a procedure that has been in widespread use since the early 1970s (Roig, Icochea, & Cuzzucoli, 1991). Other secondary sources for nonspecialists are frequently inaccurate in their descriptions of parapsychological research. (For discussions of this problem, see Child, 1985; and Palmer, Honorton, & Utts, 1989.)

This situation may be changing. Discussions of modern psi research have recently appeared in a widely used introductory textbook (Atkinson, Atkinson, Smith, & Bem, 1990, 1993), two mainstream psychology journals (Child, 1985; Rao & Palmer, 1987), and a scholarly but accessible book for nonspecialists (Broughton, 1991).
The purpose of the present article is to supplement these broader treatments with a more detailed, meta-analytic presentation of evidence issuing from a single experimental method: the ganzfeld procedure. We believe that the replication rates and effect sizes achieved with this procedure are now sufficient to warrant bringing this body of data to the attention of the wider psychological community.

The Ganzfeld Procedure

By the 1960s, a number of parapsychologists had become dissatisfied with the familiar ESP testing methods pioneered by J. B. Rhine at Duke University in the 1930s. In particular, they believed that the repetitive forced-choice procedure in which a subject repeatedly attempts to select the correct "target" symbol from a set of fixed alternatives failed to capture the circumstances that characterize reported instances of psi in everyday life.

Historically, psi has often been associated with meditation, hypnosis, dreaming, and other naturally occurring or deliberately induced altered states of consciousness. For example, the view that psi phenomena can occur during meditation is expressed in most classical texts on meditative techniques; the belief that hypnosis is a psi-conducive state dates all the way back to the days of early mesmerism (Dingwall, 1968); and cross-cultural surveys indicate that most reported "real-life" psi experiences are mediated through dreams (Green, 1960; Prasad & Stevenson, 1968; L. E. Rhine, 1962; Sannwald, 1959).

There are now reports of experimental evidence consistent with these anecdotal observations. For example, several laboratory investigators have reported that meditation facilitates psi performance (Honorton, 1977). A meta-analysis of 25 experiments on hypnosis and psi conducted between 1945 and 1981 in 10 different laboratories suggests that hypnotic induction may also facilitate psi performance (Schechter, 1984).
And dream-mediated psi was reported in a series of experiments conducted at Maimonides Medical Center in New York and published between 1966 and 1972 (Child, 1985; Ullman, Krippner, & Vaughan, 1973).

In the Maimonides dream studies, two subjects (a "receiver" and a "sender") spent the night in a sleep laboratory. The receiver's brain waves and eye movements were monitored as he or she slept in an isolated room. When the receiver entered a period of REM sleep, the experimenter pressed a buzzer that signaled the sender, under the supervision of a second experimenter, to begin a sending period. The sender would then concentrate on a randomly chosen picture (the "target") with the goal of influencing the content of the receiver's dream.

Toward the end of the REM period, the receiver was awakened and asked to describe any dream just experienced. This procedure was repeated throughout the night with the same target. A transcription of the receiver's dream reports was given to outside judges who blindly rated the similarity of the night's dreams to several pictures, including the target. In some studies, similarity ratings were also obtained from the receivers themselves. Across several variations of the procedure, dreams were judged to be significantly more similar to the target pictures than to the control pictures in the judging sets (failures to replicate the Maimonides results were also reviewed by Child, 1985).

These several lines of evidence suggested a working model of psi in which psi-mediated information is conceptualized as a weak signal that is normally masked by internal somatic and external sensory "noise." By reducing ordinary sensory input, these diverse psi-conducive states are presumed to raise the signal-to-noise ratio, thereby enhancing a person's ability to detect the psi-mediated information (Honorton, 1969, 1977).
To test the hypothesis that a reduction of sensory input itself facilitates psi performance, investigators turned to the ganzfeld procedure (Braud, Wood, & Braud, 1975; Honorton & Harper, 1974; Parker, 1975), a procedure originally introduced into experimental psychology during the 1930s to test propositions derived from Gestalt theory (Avant, 1965; Metzger, 1930).

Like the dream studies, the psi ganzfeld procedure has most often been used to test for telepathic communication between a sender and a receiver. The receiver is placed in a reclining chair in an acoustically isolated room. Translucent ping-pong ball halves are taped over the eyes and headphones are placed over the ears; a red floodlight directed toward the eyes produces an undifferentiated visual field and white noise played through the headphones produces an analogous auditory field. It is this homogeneous perceptual environment that is called the Ganzfeld ("total field"). To reduce internal somatic "noise," the receiver typically also undergoes a series of progressive relaxation exercises at the beginning of the ganzfeld period.

The sender is sequestered in a separate acoustically isolated room, and a visual stimulus (art print, photograph, or brief videotaped sequence) is randomly selected from a large pool of such stimuli to serve as the target for the session. While the sender concentrates on the target, the receiver provides a continuous verbal report of his or her ongoing imagery and mentation, usually for about 30 minutes. At the completion of the ganzfeld period, the receiver is presented with several stimuli (usually four) and, without knowing which stimulus was the target, is asked to rate the degree to which each matches the imagery and mentation experienced during the ganzfeld period. If the receiver assigns the highest rating to the target stimulus, it is scored as a "hit."
Thus, if the experiment uses judging sets containing four stimuli (the target and three decoys or control stimuli), the hit rate expected by chance is .25. The ratings can also be analyzed in other ways; for example, they can be converted to ranks or standardized scores within each set and analyzed parametrically across sessions. And, as with the dream studies, the similarity ratings can also be made by outside judges using transcripts of the receiver's mentation report.

Meta-Analyses of the Ganzfeld Database

In 1985 and 1986, the Journal of Parapsychology devoted two entire issues to a critical examination of the ganzfeld database. The 1985 issue comprised two contributions: (a) a meta-analysis and critique by Ray Hyman (1985), a cognitive psychologist and skeptical critic of parapsychological research, and (b) a competing meta-analysis and rejoinder by Charles Honorton (1985), a parapsychologist and major contributor to the ganzfeld database. The 1986 issue contained four commentaries on the Hyman-Honorton exchange, a joint communique by Hyman and Honorton, and six additional commentaries on the joint communique itself. We summarize the major issues and conclusions here.

Replication Rates

Rates by study. Hyman's meta-analysis covered 42 psi ganzfeld studies reported in 34 separate reports written or published from 1974 through 1981. One of the first problems he discovered in the database was multiple analysis. As noted earlier, it is possible to calculate several indexes of psi performance in a ganzfeld experiment and, furthermore, to subject those indexes to several kinds of statistical treatment. Many investigators reported multiple indexes or applied multiple statistical tests without adjusting the criterion significance level for the number of tests conducted.
Worse, some may have "shopped" among the alternatives until finding one that yielded a significantly successful outcome. Honorton agreed that this was a problem.

Accordingly, Honorton applied a uniform test on a common index across all studies from which the pertinent datum could be extracted, regardless of how the investigators had analyzed the data in the original reports. He selected the proportion of hits as the common index because it could be calculated for the largest subset of studies: 28 of the 42 studies. The hit rate is also a conservative index because it discards most of the rating information; a second place ranking (a "near miss") receives no more credit than a last place ranking.

Honorton then calculated the exact binomial probability and its associated z score for each study. Of the 28 studies, 23 (82%) had positive z scores (p = 4.6 × 10^-4, exact binomial test with p = q = .5). Twelve of the studies (43%) had z scores that were independently significant at the 5% level (p = 3.5 × 10^-9, binomial test with 28 studies, p = .05, and q = .95), and 7 of the studies (25%) were independently significant at the 1% level (p = 9.8 × 10^-9). The composite Stouffer z score across the 28 studies was 6.60 (p = 2.1 × 10^-11).[1] A more conservative estimate of significance can be obtained by including 10 additional studies that also used the relevant judging procedure but did not report hit rates. If these studies are assigned a mean z score of zero, the Stouffer z across all 38 studies becomes 5.67 (p = 7.3 × 10^-9).

Thus, whether one considers only the studies for which the relevant information is available or includes a null estimate for the additional studies for which the information is not available, the aggregate results cannot reasonably be attributed to chance. [Footnote 1: Stouffer's z is computed by dividing the sum of the z scores for the individual studies by the square root of the number of studies (Rosenthal, 1978).]
And, by design, the cumulative outcome reported here cannot be attributed to the inflation of significance levels through multiple analysis.

Rates by laboratory. One objection to estimates such as those just described is that studies from a common laboratory are not independent of one another (Parker, 1978). Thus, it is possible for one or two investigators to be disproportionately responsible for a high replication rate whereas other, independent investigators are unable to obtain the effect.

The ganzfeld database is vulnerable to this possibility. The 28 studies providing hit rate information were conducted by investigators in 10 different laboratories. One laboratory contributed 9 of the studies, Honorton's own laboratory contributed 5, 2 other laboratories contributed 3 each, 2 contributed 2 each, and the remaining 4 laboratories each contributed 1. Thus, half of the studies were conducted by only 2 laboratories, 1 of them Honorton's own.

Accordingly, Honorton calculated a separate Stouffer z score for each laboratory. Significantly positive outcomes were reported by 6 of the 10 laboratories, and the combined z score across laboratories was 6.16 (p = 3.6 × 10^-10). Even if all of the studies conducted by the 2 most prolific laboratories are discarded from the analysis, the Stouffer z across the 8 other laboratories remains significant (z = 3.67, p = 1.2 × 10^-4). Four of these studies are significant at the 1% level (p = 9.2 × 10^-6, binomial test with 14 studies, p = .01, and q = .99), and each was contributed by a different laboratory. Thus, even though the total number of laboratories in this database is small, most of them have reported significant studies, and the significance of the overall effect does not depend on just one or two of them.
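The combined statistics quoted in this section can be re-derived from the definitions given in the text. The following is a minimal Python sketch, not the authors' own computation: the function names are illustrative, and the p values are assumed to be upper-tail binomial probabilities and one-tailed normal probabilities, an assumption that reproduces the reported figures to rounding. The last function applies Rosenthal's (1979) file-drawer estimate, which the article takes up under Selective Reporting; the one-tailed .05 criterion (z = 1.645) used there is also an assumption.

```python
import math

def binomial_upper_tail(successes, n, p):
    """P(X >= successes) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(successes, n + 1))

def normal_upper_tail(z):
    """One-tailed p value for a standard normal z score."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def stouffer_z(z_scores):
    """Stouffer's z: sum of the z scores divided by sqrt(number of studies)."""
    return sum(z_scores) / math.sqrt(len(z_scores))

def fail_safe_n(sum_z, k, z_crit=1.645):
    """Number of unreported z = 0 studies needed to pull the combined
    z of k known studies down to z_crit (assumed criterion)."""
    return sum_z**2 / z_crit**2 - k

# Replication counts among the 28 direct-hit studies
print(binomial_upper_tail(23, 28, 0.50))  # 23 positive z scores, about 4.6e-4
print(binomial_upper_tail(12, 28, 0.05))  # 12 significant at 5%, about 3.5e-9
print(binomial_upper_tail(7, 28, 0.01))   # 7 significant at 1%, about 9.8e-9

# The individual z scores are not listed in the text, so the sum of z
# is recovered from the reported Stouffer z of 6.60 across 28 studies.
sum_z = 6.60 * math.sqrt(28)
print(normal_upper_tail(6.60))            # about 2.1e-11
print(sum_z / math.sqrt(38))              # 10 added z = 0 studies, about 5.67
print(fail_safe_n(sum_z, 28))             # about 423 fugitive studies
```

Adding null studies leaves the sum of z scores unchanged and only enlarges the denominator, which is why the 38-study figure can be obtained from the 28-study figure without the individual study data.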
Selective Reporting

In recent years, behavioral scientists have become increasingly aware of the "file-drawer" problem: the likelihood that successful studies are more likely to be published than unsuccessful studies, which are more likely to be consigned to the file drawers of their disappointed investigators (Bozarth & Roberts, 1972; Sterling, 1959). Parapsychologists were among the first to become sensitive to the problem, and, in 1975, the Parapsychological Association Council adopted a policy opposing the selective reporting of positive outcomes. As a consequence, negative findings have been routinely reported at the association's meetings and in its affiliated publications for almost two decades. As has already been shown, more than half of the ganzfeld studies included in the meta-analysis yielded outcomes whose significance falls short of the conventional .05 level.

A variant of the selective reporting problem arises from what Hyman (1985) has termed the "retrospective study." An investigator conducts a small set of exploratory trials. If they yield null results, they remain exploratory and never become part of the official record; if they yield positive results, they are defined as a study after the fact and are submitted for publication. In support of this possibility, Hyman noted that there are more significant studies in the database with fewer than 20 trials than one would expect under the assumption that, all other things being equal, statistical power should increase with the square root of the sample size.
Although Honorton questioned the assumption that "all other things" are in fact equal across the studies and disagreed with Hyman's particular statistical analysis, he agreed that there is an apparent clustering of significant studies with fewer than 20 trials. (Of the complete ganzfeld database of 42 studies, 8 involved fewer than 20 trials, and 6 of those studies reported statistically significant results.)

Because it is impossible, by definition, to know how many unknown studies (exploratory or otherwise) are languishing in file drawers, the major tool for estimating the seriousness of selective reporting problems has become some variant of Rosenthal's file drawer statistic, an estimate of how many unreported studies with z scores of zero would be required to exactly cancel out the significance of the known database (Rosenthal, 1979). For the 28 direct-hit ganzfeld studies alone, this estimate is 423 fugitive studies, a ratio of unreported-to-reported studies of approximately 15:1. When it is recalled that a single ganzfeld session takes over an hour to conduct, it is not surprising that, despite his concern with the retrospective study problem, Hyman concurred with Honorton and other participants in the published debate that selective reporting problems cannot plausibly account for the overall statistical significance of the psi ganzfeld database (Hyman & Honorton, 1986).[2]

Methodological Flaws

If the most frequent criticism of parapsychology is that it has not produced a replicable psi effect, the second most frequent criticism is that many, if not most, psi experiments have inadequate controls and procedural safeguards.
A frequent charge is that positive results emerge primarily from initial, poorly controlled studies and then vanish as better controls and safeguards are introduced.

Fortunately, meta-analysis provides a vehicle for empirically evaluating the extent to which methodological flaws may have contributed to artifactual positive outcomes across a set of studies. First, ratings are assigned to each study that index the degree to which particular methodological flaws are or are not present; these ratings are then correlated with the studies' outcomes. Large positive correlations constitute evidence that the observed effect may be artifactual.

In psi research, the most fatal flaws are those that might permit a subject to obtain the target information in normal sensory fashion, either inadvertently or through deliberate cheating. This is called the problem of sensory leakage. Another potentially serious flaw is inadequate randomization of target selection.

Sensory leakage. Because the ganzfeld is itself a perceptual isolation procedure, it goes a long way toward eliminating potential sensory leakage during the ganzfeld portion of the session. There are, however, potential channels of sensory leakage after the ganzfeld period. For example, if the experimenter who interacts with the receiver knows the identity of the target, he or she could bias the receiver's similarity ratings in favor of correct identification. Only one study in the database contained this flaw, a study in which subjects actually performed slightly below chance expectation. [Footnote 2: A 1980 survey of parapsychologists uncovered only 19 completed but unreported ganzfeld studies. Seven of these had achieved significantly positive results, a proportion (.37) very similar to the proportion of independently significant studies in the meta-analysis (.43) (Blackmore, 1980).]
Second, if the stimulus set given to the receiver for judging contains the actual physical target handled by the sender during the sending period, there might be cues (e.g., fingerprints, smudges, or temperature differences) that could differentiate the target from the decoys. Moreover, the process of transferring the stimulus materials to the receiver's room itself opens up other potential channels of sensory leakage. Although contemporary ganzfeld studies have eliminated both of these possibilities by using duplicate stimulus sets, some of the earlier studies did not.

Independent analyses by Hyman and Honorton agreed that there was no correlation between inadequacies of security against sensory leakage and study outcome. Honorton further reported that if studies that failed to use duplicate stimulus sets were discarded from the analysis, the remaining studies are still highly significant (Stouffer z = 4.36, p = 6.8 × 10^-6).

Randomization. In many psi experiments, the issue of target randomization is critical because systematic patterns in inadequately randomized target sequences might be detected by subjects during a session or might match subjects' preexisting response biases. In a ganzfeld study, however, randomization is a much less critical issue because only one target is selected during the session and most subjects serve in only one session. The primary concern is simply that all the stimuli within each judging set be sampled uniformly over the course of the study. Similar considerations govern the second randomization, which takes place after the ganzfeld period and determines the sequence in which the target and decoys are presented to the receiver (or external judge) for judging.

Nevertheless, Hyman and Honorton disagreed over the findings here. Hyman claimed there was a correlation between flaws of randomization and study outcome; Honorton claimed there was not. The sources of this disagreement were in conflicting definitions of flaw categories, in the coding and assignment of flaw ratings to individual studies, and in the subsequent statistical treatment of those ratings.

Unfortunately, there have been no ratings of flaws by independent raters who were unaware of the studies' outcomes (Morris, 1991). Nevertheless, none of the contributors to the subsequent debate concurred with Hyman's conclusion, whereas four nonparapsychologists (two statisticians and two psychologists) explicitly concurred with Honorton's conclusion (Harris & Rosenthal, 1988b; Saunders, 1985; Utts, 1991a). For example, Harris and Rosenthal (one of the pioneers in the use of meta-analysis in psychology) used Hyman's own flaw ratings and failed to find any significant relationships between flaws and study outcomes in each of two separate analyses: "Our analysis of the effects of flaws on study outcome lends no support to the hypothesis that Ganzfeld research results are a significant function of the set of flaw variables" (1988b, p. 3; for a more recent exchange regarding Hyman's analysis, see Hyman, 1991; Utts, 1991a, 1991b).

Effect Size

Some critics of parapsychology have argued that even if current laboratory-produced psi effects turn out to be replicable and nonartifactual, they are too small to be of theoretical interest or practical importance. We do not believe this to be the case for the psi ganzfeld effect.

In psi ganzfeld studies, the hit rate itself provides a straightforward descriptive measure of effect size, but this measure cannot be compared directly across studies because they do not all use a four-stimulus judging set and, hence, do not all have a chance baseline of .25.
The next most obvious candidate, the difference in each study between the hit rate observed and the hit rate expected under the null hypothesis, is also intuitively descriptive but is not appropriate for statistical analysis because not all differences between proportions that are equal are equally detectable (e.g., the power to detect the difference between .55 and .25 is different from the power to detect the difference between .50 and .20).

To provide a scale of equal detectability, Cohen (1988) devised the effect size index h, which involves an arcsine transformation on the proportions before calculation of their difference. Cohen's h is quite general and can assess the difference between any two proportions drawn from independent samples or between a single proportion and any specified hypothetical value. For the 28 studies examined in the meta-analyses, h was .28, with a 95% confidence interval from .11 to .45.

But because values of h do not provide an intuitively descriptive scale, Rosenthal and Rubin (1989; Rosenthal, 1991) have recently suggested a new index, π, which applies specifically to one-sample, multiple-choice data of the kind obtained in ganzfeld experiments. In particular, it expresses all hit rates as the proportion of hits that would have been obtained if there had been only two equally likely alternatives, essentially a coin flip. Thus, π ranges from 0 to 1, with .5 expected under the null hypothesis. The formula is

$$\pi = \frac{P(k - 1)}{P(k - 2) + 1}$$

where P is the raw proportion of hits and k is the number of alternative choices available. Because it has such a straightforward intuitive interpretation, we use it (or its conversion back to an equivalent four-alternative hit rate) throughout this article whenever it is applicable. For the 28 studies examined in the meta-analyses, the mean value of π was .62, with a 95% confidence interval from .55 to .69.
This corresponds to a four-alternative hit rate of 35%, with a 95% confidence interval from 28% to 43%.

Cohen (1988, 1992) has also categorized effect sizes into small, medium, and large, with medium denoting an effect size that should be apparent to the naked eye of a careful observer. For a statistic such as π, which indexes the deviation of a proportion from .5, Cohen considers .65 to be a medium effect size: A statistically unaided observer should be able to detect the bias of a coin that comes up heads on 65% of the trials. Thus, at .62, the psi ganzfeld effect size falls just short of Cohen's naked-eye criterion. From the phenomenology of the ganzfeld experimenter, the corresponding hit rate of 35% implies that he or she will see a subject obtain a hit approximately every third session rather than every fourth.

It is also instructive to compare the psi ganzfeld effect with the results of a recent medical study that sought to determine whether aspirin can prevent heart attacks (Steering Committee of the Physicians' Health Study Research Group, 1988). The study was discontinued after 6 years because it was already clear that the aspirin treatment was effective (p < .00001) and it was considered unethical to keep the control group on placebo medication. The study was widely publicized as a major medical breakthrough. But despite its undisputed reality and practical importance, the size of the aspirin effect is quite small: Taking aspirin reduces the probability of suffering a heart attack by only .008. The corresponding effect size (h) is .068, about one third to one fourth the size of the psi ganzfeld effect (Atkinson et al., 1993, p. 236; Utts, 1991b).

In sum, we believe that the psi ganzfeld effect is large enough to be of both theoretical interest and potential practical importance.
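The two effect size indices used in this section can be checked numerically. The following is a minimal Python sketch, not code from the original studies: the function names (pi_index, hit_rate_from_pi, cohens_h) are hypothetical, and hit_rate_from_pi is simply the algebraic inverse of the formula for π given in the text.

```python
import math


def pi_index(p_hit: float, k: int) -> float:
    """Rosenthal-Rubin pi: a k-alternative hit rate re-expressed on a
    two-alternative (coin flip) scale, pi = P(k - 1) / (P(k - 2) + 1)."""
    return p_hit * (k - 1) / (p_hit * (k - 2) + 1)


def hit_rate_from_pi(pi: float, k: int) -> float:
    """Inverse of pi_index: the k-alternative hit rate equivalent to pi."""
    return pi / ((k - 1) - pi * (k - 2))


def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h: difference between arcsine-transformed proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


if __name__ == "__main__":
    # Chance performance with four alternatives maps to pi = .5.
    print(pi_index(0.25, 4))
    # The mean pi of .62 reported for the 28 studies, converted back to a
    # four-alternative hit rate (about 35% in the text).
    print(round(hit_rate_from_pi(0.62, 4), 3))
    # Equal raw differences (.30) are not equally detectable: h differs.
    print(round(cohens_h(0.55, 0.25), 3), round(cohens_h(0.50, 0.20), 3))
```

The last line illustrates the point made about raw differences: the pairs (.55, .25) and (.50, .20) both differ by .30, yet their h values are not the same, which is why the raw difference is not used as the effect size.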
Experimental Correlates of the Psi Ganzfeld Effect

We showed earlier that the technique of correlating variables with effect sizes across studies can help to assess whether methodological flaws might have produced artifactual positive outcomes. The same technique can be used more affirmatively to explore whether an effect varies systematically with conceptually relevant variations in experimental procedure. The discovery of such correlates can help to establish an effect as genuine, suggest ways of increasing replication rates and effect sizes, and enhance the chances of moving beyond the simple demonstration of an effect to its explanation. This strategy is only heuristic, however. Any correlates discovered must be considered quite tentative, both because they emerge from post hoc exploration and because they necessarily involve comparisons across heterogeneous studies that differ simultaneously on many interrelated variables, known and unknown. Two such correlates emerged from the meta-analyses of the psi ganzfeld effect.

Single- versus multiple-image targets. Although most of the 28 studies in the meta-analysis used single pictures as targets, 9 (conducted by three different investigators) used View Master stereoscopic slide reels that presented multiple images focused on a central theme. Studies using the View Master reels produced significantly higher hit rates than did studies using the single-image targets (50% vs. 34%), t(26) = 2.22, p = .035, two-tailed.

Sender-receiver pairing. In 17 of the 28 studies, participants were free to bring in friends to serve as senders. In 8 studies, only laboratory-assigned senders were used. (Three studies used no sender.) Unfortunately, there is no record of how many participants in the former studies actually brought in friends. Nevertheless, those 17 studies (conducted by six different investigators) had significantly higher hit rates than did the studies that used only laboratory-assigned senders (44% vs.
26%), t(23) = 2.39, p = .025, two-tailed.

The Joint Communique

After their published exchange in 1985, Hyman and Honorton agreed to contribute a joint communique to the subsequent discussion that was published in 1986. First they set forth their areas of agreement and disagreement:

    We agree that there is an overall significant effect in this data base that cannot reasonably be explained by selective reporting or multiple analysis. We continue to differ over the degree to which the effect constitutes evidence for psi, but we agree that the final verdict awaits the outcome of future experiments conducted by a broader range of investigators and according to more stringent standards. (Hyman & Honorton, 1986, p. 351)

They then spelled out in detail the "more stringent standards" they believed should govern future experiments. These standards included strict security precautions against sensory leakage, testing and documentation of randomization methods for selecting targets and sequencing the judging set, statistical correction for multiple analyses, advance specification of the status of the experiment (e.g., pilot study or confirmatory experiment), and full documentation in the published report of the experimental procedures and the status of statistical tests (e.g., planned or post hoc).

The National Research Council Report

In 1988, the National Research Council (NRC) of the National Academy of Sciences released a widely publicized report commissioned by the U.S. Army that assessed several controversial technologies for enhancing human performance, including accelerated learning, neurolinguistic programming, mental practice, biofeedback, and parapsychology (Druckman & Swets, 1988; summarized in Swets & Bjork, 1990). The report's conclusion concerning parapsychology was quite negative: "The Committee finds no scientific justification from research conducted over a period of 130 years for the existence of parapsychological phenomena" (Druckman & Swets, 1988, p. 22).

An extended refutation strongly protesting the committee's treatment of parapsychology has been published elsewhere (Palmer et al., 1989). The pertinent point here is simply that the NRC's evaluation of the ganzfeld studies does not reflect an additional, independent examination of the ganzfeld database but is based on the same meta-analysis conducted by Hyman that we have discussed in this article.

Hyman chaired the NRC's Subcommittee on Parapsychology, and, although he had concurred with Honorton 2 years earlier in their joint communique that "there is an overall significant effect in this data base that cannot reasonably be explained by selective reporting or multiple analysis" (p. 351) and that "significant outcomes have been produced by a number of different investigators" (p. 352), neither of these points is acknowledged in the committee's report.

The NRC also solicited a background report from Harris and Rosenthal (1988a), which provided the committee with a comparative methodological analysis of the five controversial areas just listed. Harris and Rosenthal noted that, of these areas, "only the Ganzfeld ESP studies [the only psi studies they evaluated] regularly meet the basic requirements of sound experimental design" (p. 63), and they concluded that

    it would be implausible to entertain the null given the combined p from these 28 studies. Given the various problems or flaws pointed out by Hyman and Honorton ... we might estimate the obtained accuracy rate to be about 1/3 ... when the accuracy rate expected under the null is 1/4. (p. 51) [3]

Footnote 3: In a troubling development, the chair of the NRC Committee phoned Rosenthal and asked him to delete the parapsychology section of the paper (R. Rosenthal, personal communication, September 15, 1992). Although Rosenthal refused to do so, that section of the Harris-Rosenthal paper is nowhere cited in the NRC report.

The Autoganzfeld Studies

In 1983, Honorton and his colleagues initiated a new series of ganzfeld studies designed to avoid the methodological problems he and others had identified in earlier studies (Honorton, 1979; Kennedy, 1979). These studies complied with all of the detailed guidelines that he and Hyman were to publish later in their joint communique. The program continued until September 1989, when a loss of funding forced the laboratory to close. The major innovations of the new studies were the computer control of the experimental protocol (hence the name autoganzfeld) and the introduction of videotaped film clips as target stimuli.

Method

The basic design of the autoganzfeld studies was the same as that described earlier [4]: A receiver and sender were sequestered in separate, acoustically isolated chambers. After a 14-minute period of progressive relaxation, the receiver underwent ganzfeld stimulation while describing his or her thoughts and images aloud for 30 minutes. Meanwhile, the sender concentrated on a randomly selected target. At the end of the ganzfeld period, the receiver was shown four stimuli and, without knowing which of the four had been the target, rated each stimulus for its similarity to his or her mentation during the ganzfeld.

Footnote 4: Because Honorton and his colleagues have complied with the Hyman-Honorton specification that experimental reports be sufficiently complete to permit others to reconstruct the investigators' procedures, readers who wish to know more detail than we provide here are likely to find whatever they need in the archival publication of these studies in the Journal of Parapsychology (Honorton et al., 1990).

The targets consisted of 80 still pictures (static targets) and 80 short video segments complete with soundtracks (dynamic targets), all recorded on videocassette. The static targets included art prints, photographs, and magazine advertisements; the dynamic targets included excerpts of approximately 1-min duration from motion pictures, TV shows, and cartoons. The 160 targets were arranged in judging sets of four static or four dynamic targets each, constructed to minimize similarities among targets within a set.

Target selection and presentation. The VCR containing the taped targets was interfaced to the controlling computer, which selected the target and controlled its repeated presentation to the sender during the ganzfeld period, thus eliminating the need for a second experimenter to accompany the sender. After the ganzfeld period, the computer randomly sequenced the four-clip judging set and presented it to the receiver on a TV monitor for judging. The receiver used a computer game paddle to make his or her ratings on a 40-point scale that appeared on the TV monitor after each clip was shown. The receiver was permitted to see each clip and to change the ratings repeatedly until he or she was satisfied. The computer then wrote these and other data from the session into a file on a floppy disk. At that point, the sender moved to the receiver's chamber and revealed the identity of the target to both the receiver and the experimenter. Note that the experimenter did not even know the identity of the four-clip judging set until it was displayed to the receiver for judging.

Randomization. The random selection of the target and sequencing of the judging set were controlled by a noise-based random number generator interfaced to the computer.
Extensive testing confirmed that the generator was providing a uniform distribution of values throughout the full target range (1-160). Tests on the actual frequencies observed during the experiments confirmed that targets were, on average, selected uniformly from among the 4 clips within each target set and that the 4 judging sequences used were uniformly distributed across sessions.

Additional control features. The receiver's and sender's rooms were sound-isolated, electrically shielded chambers with single-door access that could be continuously monitored by the experimenter. There was two-way intercom communication between the experimenter and the receiver but only one-way communication into the sender's room; thus, neither the experimenter nor the receiver could monitor events inside the sender's room. The archival record for each session includes an audiotape containing the receiver's mentation during the ganzfeld period and all verbal exchanges between the experimenter and the receiver throughout the experiment.

The automated ganzfeld protocol has been examined by several dozen parapsychologists and behavioral researchers from other fields, including well-known critics of parapsychology. Many have participated as subjects or observers. All have expressed satisfaction with the handling of security issues and controls.

Parapsychologists have often been urged to employ magicians as consultants to ensure that the experimental protocols are not vulnerable either to inadvertent sensory leakage or to deliberate cheating. Two "mentalists," magicians who specialize in the simulation of psi, have examined the autoganzfeld system and protocol.
Ford Kross, a professional mentalist and officer of the mentalists' professional organization, the Psychic Entertainers Association, provided the following written statement: "In my professional capacity as a mentalist, I have reviewed Psychophysical Research Laboratories' automated ganzfeld system and found it to provide excellent security against deception by subjects" (personal communication, May, 1989).

Daryl J. Bem has also performed as a mentalist for many years and is a member of the Psychic Entertainers Association. As mentioned in the author note, this article had its origins in a 1983 visit he made to Honorton's laboratory, where he was asked to critically examine the research protocol from the perspective of a mentalist, a research psychologist, and a subject. Needless to say, this article would not exist if he did not concur with Ford Kross's assessment of the security procedures.

Experimental Studies [5]

Altogether, 100 men and 140 women participated as receivers in 354 sessions during the research program. The participants ranged in age from 17 to 74 years (M = 37.3, SD = 11.8), with a mean formal education of 15.6 years (SD = 2.0). Eight separate experimenters, including Honorton, conducted the studies.

Footnote 5: A recent review of the original computer files uncovered a duplicate record in the autoganzfeld database. This has now been eliminated, reducing by one the number of subjects and sessions. As a result, some of the numbers presented in this article differ slightly from those in Honorton et al. (1990).

The experimental program included three pilot and eight formal studies. Five of the formal studies used novice (first-time) participants who served as the receiver in one session each. The remaining three formal studies used experienced participants.

Pilot studies. Sample sizes were not preset in the three pilot studies. Study 1 comprised 22 sessions and was conducted during the initial development and testing of the autoganzfeld system.
Study 2 comprised 9 sessions testing a procedure in which the experimenter, rather than the receiver, served as the judge at the end of the session. Study 3 comprised 35 sessions and served as practice for participants who had completed the allotted number of sessions in the ongoing formal studies but who wanted additional ganzfeld experience. This study also included several demonstration sessions when TV film crews were present.

Novice studies. Studies 101-104 were each designed to test 50 participants who had had no prior ganzfeld experience; each participant served as the receiver in a single ganzfeld session. Study 104 included 16 of 20 students recruited from the Juilliard School in New York City to test an artistically gifted sample. Study 105 was initiated to accommodate the overflow of participants who had been recruited for Study 104, including the four remaining Juilliard students. The sample size for this study was set to 25, but only 6 sessions had been completed when the laboratory closed. For purposes of exposition, we divided the 56 sessions from Studies 104 and 105 into two parts: Study 104/105(a) comprises the 36 non-Juilliard participants and Study 104/105(b) comprises the 20 Juilliard students.

Study 201. This study was designed to retest the most promising participants from the previous studies. The number of trials was set to 20, but only 7 sessions with 3 participants had been completed when the laboratory closed.

Study 301. This study was designed to compare static and dynamic targets. The sample size was set to 50 sessions. Twenty-five experienced participants each served as the receiver in 2 sessions. Unknown to the participants, the computer control program was modified to ensure that they would each have 1 session with a static target and 1 session with a dynamic target.

Study 302. This study was designed to examine a dynamic target set that had yielded a particularly high hit rate in the previous studies.
The study involved experienced participants who had had no prior experience with this particular target set and who were unaware that only one target set was being sampled. Each served as the receiver in a single session. The design called for the study to continue until 15 sessions were completed with each of the targets, but only 25 sessions had been completed when the laboratory closed.

The 11 studies just described comprise all sessions conducted during the 6.6 years of the program. There is no "file drawer" of unreported sessions.

Results

Table 1
Outcome by Study

Study        Subject description  N subjects   N trials  N hits  % hits  Effect size (π)      z
1            Pilot                19           22        8       36      .62               0.99
2            Pilot                [illegible]  9         3       33      .60               0.25
3            Pilot                24           35        10      29      .55               0.32
101          Novice               50           50        12      24      .47              -0.30
102          Novice               50           50        18      36      .63               1.60
103          Novice               50           50        15      30      .56               0.67
104/105(a)   Novice               36           36        12      33      .60               0.97
104/105(b)   Juilliard sample     20           20        10      50      .75               2.20
201          Experienced          3            7         3       43      .69               0.69
301          Experienced          25           50        15      30      .56               0.67
302          Experienced          25           25        16      54a     .78a              3.04
Overall (Studies 1-301)           240          329       106     32      .59               2.89

Note. All z scores are based on the exact binomial probability, with p = .25 and q = .75.
a Adjusted for response bias; the hit rate actually observed was 64%.

Overall hit rate. As in the earlier meta-analysis, receivers' ratings were analyzed by tallying the proportion of hits achieved and calculating the exact binomial probability for the observed number of hits compared with the chance expectation of .25. As noted earlier, 240 participants contributed 354 sessions. For reasons discussed later, Study 302 is analyzed separately, reducing the number of sessions in the primary analysis to 329. As Table 1 shows, there were 106 hits in the 329 sessions, a hit rate of 32% (z = 2.89, p = .002, one-tailed), with a 95% confidence interval from 30% to 35%.
This corresponds to an effect size (π) of .59, with a 95% confidence interval from .53 to .64.

Table 1 also shows that when Studies 104 and 105 are combined and re-divided into Studies 104/105(a) and 104/105(b), 9 of the 10 studies yield positive effect sizes, with a mean effect size (π) of .61, t(9) = 4.44, p = .0008, one-tailed. This effect size is equivalent to a four-alternative hit rate of 34%. Alternatively, if Studies 104 and 105 are retained as separate studies, 9 of the 10 studies again yield positive effect sizes, with a mean effect size (π) of .62, t(9) = 3.73, p = .002, one-tailed. This effect size is equivalent to a four-alternative hit rate of 35% and is identical to that found across the 28 studies of the earlier meta-analysis. [6]

Footnote 6: As noted above, the laboratory was forced to close before three of the formal studies could be completed. If we assume that the remaining trials in Studies 105 and 201 would have yielded only chance results, this would reduce the overall z for the first 10 autoganzfeld studies from 2.89 to 2.76 (p = .003). Thus, inclusion of the two incomplete studies does not pose an optional stopping problem. The third incomplete study, Study 302, is discussed below.

Considered together, sessions with novice participants (Studies 101-105) yielded a statistically significant hit rate of 32.5% (p = .009), which is not significantly different from the 31.6% hit rate achieved by experienced participants in Studies 201 and 301. And finally, each of the eight experimenters also achieved a positive effect size, with a mean π of .60, t(7) = 3.44, p = .005, one-tailed.

The Juilliard sample. There are several reports in the literature of a relationship between creativity or artistic ability and psi performance (Schmeidler, 1988). To explore this possibility in the ganzfeld setting, 10 male and 10 female undergraduates were recruited from the Juilliard School. Of these, 8 were music students, 10 were drama students, and 2 were dance students.
Each served as the receiver in a single session in Study 104 or 105. As shown in Table 1, these students achieved a hit rate of 50% (p = .014), one of the five highest hit rates ever reported for a single sample in a ganzfeld study. The musicians were particularly successful: 6 of the 8 (75%) successfully identified their targets (p = .004; further details about this sample and their ganzfeld performance were reported in Schlitz & Honorton, 1992).

Study size and effect size. There is a significant negative correlation across the 10 studies listed in Table 1 between the number of sessions included in a study and the study's effect size (π), r = -.64, t(8) = 2.36, p < .05, two-tailed. This is reminiscent of Hyman's discovery that the smaller studies in the original ganzfeld database were disproportionately likely to report statistically significant results. He interpreted this finding as evidence for a bias against the reporting of small studies that fail to achieve significant results. A similar interpretation cannot be applied to the autoganzfeld studies, however, because there are no unreported sessions.

One reviewer of this article suggested that the negative correlation might reflect a decline effect in which earlier sessions of a study are more successful than later sessions. If there were such an effect, then studies with fewer sessions would show larger effect sizes because they would end before a decline could set in. To check this possibility, we computed point-biserial correlations between hits (1) or misses (0) and the session number within each of the 10 studies. All of the correlations hovered around zero; six were positive, four were negative, and the overall mean was .01.

An inspection of Table 1 reveals that the negative correlation derives primarily from the two studies with the largest effect sizes: the 20 sessions with the Juilliard students and the 7 sessions of Study 201, the study specifically designed to retest the most promising participants from the previous studies. Accordingly, it seems likely that the larger effect sizes of these two studies (and hence the significant negative correlation between the number of sessions and the effect size) reflect genuine performance differences between these two small, highly selected samples and other autoganzfeld participants.

Study 302. All of the studies except Study 302 randomly sampled from a pool of 160 static and dynamic targets. Study 302 sampled from a single, dynamic target set that had yielded a particularly high hit rate in the previous studies. The four film clips in this set consisted of a scene of a tidal wave from the movie Clash of the Titans, a high-speed sex scene from A Clockwork Orange, a scene of crawling snakes from a TV documentary, and a scene from a Bugs Bunny cartoon.

Table 2
Study 302: Expected Hit Rate and Proportion of Sessions in Which Each Video Clip Was Ranked First When It Was a Target and When It Was a Decoy

Video clip   Relative frequency  Relative frequency of   Expected      Ranked first  Ranked first  Difference  Fisher's
             as target           first-place ranking     hit rate (%)  when target   when decoy                exact p
Tidal Wave   .28 (7/25)          .24 (6/25)              6.72          .57 (4/7)     .11 (2/18)    .46         .032
Snakes       .12 (3/25)          .12 (3/25)              1.44          .67 (2/3)     .05 (1/22)    .62         .029
Sex Scene    .16 (4/25)          .08 (2/25)              1.28          .25 (1/4)     .05 (1/21)    .20         .300
Bugs Bunny   .44 (11/25)         .56 (14/25)             24.64         .82 (9/11)    .36 (5/14)    .46         .027
Overall                                                  34.08         .58           .14           .44

The experimental design called for this study to continue until each of the clips had served as the target 15 times. Unfortunately, the premature termination of this study at 25 sessions left an imbalance in the frequency with which each clip had served as the target.
This means that the high hit rate observed (64%) could well be inflated by response biases.

As an illustration, water imagery is frequently reported by receivers in ganzfeld sessions whereas sexual imagery is rarely reported. (Some participants are probably reluctant both to report sexual imagery and to give the highest rating to the sex-related clip.) If a video clip containing popular imagery (such as water) happens to appear as a target more frequently than a clip containing unpopular imagery (such as sex), a high hit rate might simply reflect the coincidence of those frequencies of occurrence with participants' response biases. And, as the second column of Table 2 reveals, the tidal wave clip did in fact appear more frequently as the target than did the sex clip. More generally, the second and third columns of Table 2 show that the frequency with which each film clip was ranked first closely matches the frequency with which each appeared as the target.

One can adjust for this problem by using the observed frequencies in these two columns to compute the hit rate expected if there were no psi effect. In particular, one can multiply each proportion in the second column by the corresponding proportion in the third column (yielding the joint probability that the clip was the target and that it was ranked first) and then sum across the four clips. As shown in the fourth column of Table 2, this computation yields an overall expected hit rate of 34.08%. When the observed hit rate of 64% is compared with this baseline, the effect size (h) is .61. As shown in Table 1, this is equivalent to a four-alternative hit rate of 54%, or a π value of .78, and is statistically significant (z = 3.04, p = .0012).
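The response-bias adjustment described above can be reproduced from the counts in Table 2. The following is a minimal Python sketch under stated assumptions: the variable and function names are hypothetical, the counts are transcribed from Table 2, and the conversion of h back to a four-alternative hit rate (adding h to the arcsine-transformed chance rate of .25) is one straightforward reading of the text, not necessarily the authors' exact procedure.

```python
import math

SESSIONS = 25

# Counts transcribed from Table 2 (Study 302):
# clip -> (times it was the target, times it was ranked first, hits)
CLIPS = {
    "Tidal Wave": (7, 6, 4),
    "Snakes": (3, 3, 2),
    "Sex Scene": (4, 2, 1),
    "Bugs Bunny": (11, 14, 9),
}


def arcsine(p: float) -> float:
    """Cohen's arcsine transformation of a proportion."""
    return 2 * math.asin(math.sqrt(p))


def expected_hit_rate() -> float:
    """Hit rate expected from response bias alone: for each clip, the joint
    probability of being the target and being ranked first, summed."""
    return sum((t / SESSIONS) * (r / SESSIONS) for t, r, _ in CLIPS.values())


def observed_hit_rate() -> float:
    """Proportion of sessions in which the target was ranked first."""
    return sum(hits for _, _, hits in CLIPS.values()) / SESSIONS


def equivalent_four_choice_rate(h: float) -> float:
    """Hit rate that lies h arcsine units above the chance rate of .25
    (an assumed way of re-expressing h on a four-alternative scale)."""
    return math.sin((arcsine(0.25) + h) / 2) ** 2


if __name__ == "__main__":
    baseline = expected_hit_rate()
    observed = observed_hit_rate()
    h = arcsine(observed) - arcsine(baseline)
    rate = equivalent_four_choice_rate(h)
    pi = rate * 3 / (rate * 2 + 1)  # pi index for k = 4 alternatives
    print(f"expected {baseline:.4f}, observed {observed:.2f}, h {h:.2f}")
    print(f"equivalent hit rate {rate:.2f}, pi {pi:.2f}")
```

Under these assumptions the sketch recovers the figures quoted in the text: a bias-only baseline of 34.08%, h of about .61, and an adjusted four-alternative hit rate of about 54% (π of about .78).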
The psi effect can be seen even more clearly in the remaining columns of Table 2, which control for the differential popularity of the imagery in the clips by displaying how frequently each was ranked first when it was the target compared with how frequently it was ranked first when it was one of the control clips (decoys). As can be seen, each of the four clips was selected as the target relatively more frequently when it was the target than when it was a decoy, a difference that is significant for three of the four clips. On average, a clip was identified as the target 58% of the time when it was the target and only 14% of the time when it was a decoy.

Dynamic versus static targets. The success of Study 302 raises the question of whether dynamic targets are, in general, more effective than static targets. This possibility was also suggested by the earlier meta-analysis, which revealed that studies using multiple-image targets (View Master stereoscopic slide reels) obtained significantly higher hit rates than did studies using single-image targets. By adding motion and sound, the video clips might be thought of as high-tech versions of the View Master reels.

The 10 autoganzfeld studies that randomly sampled from both dynamic and static target pools yielded 164 sessions with dynamic targets and 165 sessions with static targets. As predicted, sessions using dynamic targets yielded significantly more hits than did sessions using static targets (37% vs. 27%; Fisher's exact p < .04).

Sender-receiver pairing. The earlier meta-analysis revealed that studies in which participants were free to bring in friends to serve as senders produced significantly higher hit rates than studies that used only laboratory-assigned senders.
As noted, however, there is no record of how many of the participants in the former studies actually did bring in friends. Whatever the case, sender-receiver pairing was not a significant correlate of psi performance in the autoganzfeld studies: The 197 sessions in which the sender and receiver were friends did not yield a significantly higher proportion of hits than did the 132 sessions in which they were not (35% vs. 29%; not significant by Fisher's exact test).

Correlations between receiver characteristics and psi performance

Most of the autoganzfeld participants were strong believers in psi: On a 7-point scale ranging from strong disbelief in psi (1) to strong belief in psi (7), the mean was 6.2 (SD = 1.03); only 2 participants rated their belief in psi below the midpoint of the scale. In addition, 88% of the participants reported personal experiences suggestive of psi, and 80% had some training in meditation or other techniques involving internal focus of attention.

All of these appear to be important variables. The correlation between belief in psi and psi performance is one of the most consistent findings in the parapsychological literature (Palmer, 1978). And within the autoganzfeld studies, successful performance of novice (first-time) participants was significantly predicted by reported personal psi experiences, involvement with meditation or other mental disciplines, and high scores on the Feeling and Perception factors of the Myers-Briggs Type Inventory (Honorton, 1992; Honorton & Schechter, 1987; Myers & McCaulley, 1985). This recipe for success has now been independently replicated in another laboratory (Broughton, Kanthamani, & Khilji, 1990).

The personality trait of extraversion is also associated with better psi performance.
A meta-analysis of 60 independent studies with nearly 3,000 subjects revealed a small but reliable positive correlation between extraversion and psi performance, especially in studies that used free-response methods of the kind used in the ganzfeld experiments (Honorton, Ferrari, & Bem, 1992). Across 14 free-response studies conducted by four independent investigators, the correlation for 612 subjects was .20 (z = 4.82, p = 1.5 × 10⁻⁶). This correlation was replicated in the autoganzfeld studies, in which extraversion scores were available for 218 of the 240 subjects, r = .18, t(216) = 2.67, p = .004, one-tailed.

Finally, there is the strong psi performance of the Juilliard students, discussed earlier, which is consistent with other studies in the parapsychological literature suggesting a relationship between successful psi performance and creativity or artistic ability.

Discussion

Earlier in this article we quoted from the abstract of the Hyman-Honorton communiqué: "We agree that the final verdict awaits the outcome of future experiments conducted by a broader range of investigators and according to more stringent standards" (p. 351). We believe that the "stringent standards" requirement has been met by the autoganzfeld studies. The results are statistically significant and consistent with those in the earlier database. The mean effect size is quite respectable in comparison with other controversial research areas of human performance (Harris & Rosenthal, 1988a). And there are reliable relationships between successful psi performance and conceptually relevant experimental and subject variables, relationships that also replicate previous findings. Hyman (1991) has also commented on the autoganzfeld studies: "Honorton's experiments have produced intriguing results.
If...independent laboratories can produce similar results with the same relationships and with the same attention to rigorous methodology, then parapsychology may indeed have finally captured its elusive quarry" (p. 392).

Issues of Replication

As Hyman's comment implies, the autoganzfeld studies by themselves cannot satisfy the requirement that replications be conducted by a "broader range of investigators." Accordingly, we hope the findings reported here will be sufficiently provocative to prompt others to try replicating the psi ganzfeld effect.

We believe that it is essential, however, that future studies comply with the methodological, statistical, and reporting standards set forth in the joint communiqué and achieved by the autoganzfeld studies. It is not necessary for studies to be as automated or as heavily instrumented as the autoganzfeld studies in order to satisfy the methodological guidelines, but they are still likely to be labor intensive and potentially expensive.⁷

⁷ As the closing of the autoganzfeld laboratory exemplifies, it is also difficult to obtain funding for psi research. The traditional, peer-refereed sources of funding familiar to psychologists have almost never funded proposals for psi research. The widespread skepticism of psychologists toward psi is almost certainly a contributing factor.

Statistical Power and Replication

Would-be replicators also need to be reminded of the power requirements for replicating small effects. Although many academic psychologists do not believe in psi, many apparently do believe in miracles when it comes to replication. Tversky and Kahneman (1971) posed the following problem to their colleagues at meetings of the Mathematical Psychology Group and the American Psychological Association:

Suppose you have run an experiment on 20 subjects and have obtained a significant result which confirms your theory (z = 2.23, p < .05, two-tailed). You now have cause to run an additional group of 10 subjects. What do you think the probability is that the results will be significant, by a one-tailed test, separately for this group? (p. 105)

The median estimate was .85, with 9 out of 10 respondents providing an estimate greater than .60. The correct answer is approximately .48.

As Rosenthal (1990) has warned: "Given the levels of statistical power at which we normally operate, we have no right to expect the proportion of significant results that we typically do expect, even if in nature there is a very real and very important effect" (p. 16). In this regard, it is again instructive to consider the medical study that found a highly significant effect of aspirin on the incidence of heart attacks. The study monitored more than 22,000 subjects. Had the investigators monitored 3,000 subjects, they would have had less than an even chance of finding a conventionally significant effect. Such is life with small effect sizes.

Given its larger effect size, the prospects for successfully replicating the psi ganzfeld effect are not quite so daunting, but they are probably still grimmer than intuition would suggest. If the true hit rate is in fact about 34% when 25% is expected by chance, then an experiment with 30 trials (the mean for the 28 studies in the original meta-analysis) has only about 1 chance in 6 of finding an effect significant at the .05 level with a one-tailed test. A 50-trial experiment boosts that chance to about 1 in 3. One must escalate to 100 trials in order to come close to the break-even point, at which one has a 50-50 chance of finding a statistically significant effect (Utts, 1986).
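The power figures above can be explored with a short calculation. The sketch below is only an illustration under stated assumptions: it uses an exact one-tailed binomial test with a true hit rate of .34 against a chance rate of .25, and the function names are invented for this example. The text's figures are attributed to Utts (1986), whose method is not described here, so the values this sketch produces need not match the rounded chances quoted above, particularly at 100 trials.

```python
from math import comb

def upper_tail(n, c, p):
    """P(X >= c) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c, n + 1))

def critical_count(n, chance=0.25, alpha=0.05):
    """Smallest number of hits whose one-tailed p value under chance is <= alpha."""
    for c in range(n + 2):  # c = n + 1 has tail probability 0, so the loop always returns
        if upper_tail(n, c, chance) <= alpha:
            return c

def power(n, true_rate=0.34, chance=0.25, alpha=0.05):
    """Probability that an n-trial experiment reaches one-tailed significance."""
    return upper_tail(n, critical_count(n, chance, alpha), true_rate)

for n in (30, 50, 100):
    print(f"n = {n:3d}: need >= {critical_count(n)} hits, power = {power(n):.2f}")
```

Because the number of hits is discrete, the exact test's actual significance level sits somewhat below .05 and shifts with the number of trials, which is one reason exact power values can differ from figures based on a normal approximation.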
(Recall that only 2 of the 11 autoganzfeld studies yielded results that were individually significant at the conventional .05 level.) Those who require that a psi effect be statistically significant every time before they will seriously entertain the possibility that an effect really exists know not what they ask.

Significance Versus Effect Size

The preceding discussion is unduly pessimistic, however, because it perpetuates the tradition of worshipping the significance level. Regular readers of this journal are likely to be familiar with recent arguments imploring behavioral scientists to overcome their slavish dependence on the significance level as the ultimate measure of virtue and instead to focus more of their attention on effect sizes: "Surely, God loves the .06 nearly as much as the .05" (Rosnow & Rosenthal, 1989, p. 1277). Accordingly, we suggest that achieving a respectable effect size with a methodologically tight ganzfeld study would be a perfectly welcome contribution to the replication effort, no matter how untenurable the p level renders the investigator.

Career consequences aside, this suggestion may seem quite counterintuitive. Again, Tversky and Kahneman (1971) have provided an elegant demonstration. They asked several of their colleagues to consider an investigator who runs 15 subjects and obtains a significant t value of 2.46. Another investigator attempts to duplicate the procedure with the same number of subjects and obtains a result in the same direction but with a nonsignificant value of t. Tversky and Kahneman then asked their colleagues to indicate the highest level of t in the replication study they would describe as a failure to replicate. The majority of their colleagues regarded t = 1.70 as a failure to replicate.
But if the data from two such studies (t = 2.46 and t = 1.70) were pooled, the t for the combined data would be about 3.00 (assuming equal variances):

Thus, we are faced with a paradoxical state of affairs, in which the same data that would increase our confidence in the finding when viewed as part of the original study, shake our confidence when viewed as an independent study. (Tversky & Kahneman, 1971, p. 108)

Such is the iron grip of the arbitrary .05. Pooling the data, of course, is what meta-analysis is all about. Accordingly, we suggest that two or more laboratories could collaborate in a ganzfeld replication effort by conducting independent studies and then pooling them in meta-analytic fashion, what one might call real-time meta-analysis. (Each investigator could then claim the pooled p level for his or her own curriculum vitae.)

Maximizing Effect Size

Rather than buying or borrowing larger sample sizes, those who seek to replicate the psi ganzfeld effect might find it more intellectually satisfying to attempt to maximize the effect size by attending to the variables associated with successful outcomes. Thus, researchers who wish to enhance the chances of successful replication should use dynamic rather than static targets. Similarly, we advise using participants with the characteristics we have reported to be correlated with successful psi performance. Random college sophomores enrolled in introductory psychology do not constitute the optimal subject pool.

Finally, we urge ganzfeld researchers to read carefully the detailed description of the warm social ambiance that Honorton et al. (1990) sought to create in the autoganzfeld laboratory. We believe that the social climate created in psi experiments is a critical determinant of their success or failure.
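The pooling arithmetic in the Tversky and Kahneman example discussed under "Significance Versus Effect Size" can be reproduced with a few lines. This is a sketch under assumptions that the text does not spell out: both studies are treated as one-sample (or paired) t tests on 15 subjects each, with the same within-study standard deviation, and the function name is invented for this illustration.

```python
from math import sqrt

def pooled_one_sample_t(t1, t2, n, sd=1.0):
    """t statistic for the combined data of two one-sample studies of size n each,
    assuming both studies share the same within-study standard deviation."""
    # Recover each study's mean from its t value: t = mean / (sd / sqrt(n)).
    m1 = t1 * sd / sqrt(n)
    m2 = t2 * sd / sqrt(n)
    grand_mean = (m1 + m2) / 2
    # Total sum of squares = within-study part + between-study part.
    ss_within = 2 * (n - 1) * sd**2
    ss_between = n * ((m1 - grand_mean) ** 2 + (m2 - grand_mean) ** 2)
    pooled_var = (ss_within + ss_between) / (2 * n - 1)
    return grand_mean / sqrt(pooled_var / (2 * n))

t_pooled = pooled_one_sample_t(2.46, 1.70, n=15)
print(f"pooled t = {t_pooled:.2f} on {2 * 15 - 1} degrees of freedom")
```

Under these assumptions the combined t comes out close to the "about 3.00" quoted in the text, which is larger than either study's t taken alone even though the second study would have been read as a failure to replicate.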
The Problem of "Other" Variables

This caveat about the social climate of the ganzfeld experiment prompted one reviewer of this article to worry that this provided "an escape clause" that weakens the falsifiability of the psi hypothesis: "Until Bem and Honorton can provide operational criteria for creating a warm social ambiance, the failure of an experiment with otherwise adequate power can always be dismissed as due to a lack of warmth."

Alas, it is true; we devoutly wish it were otherwise. But the operation of unknown variables in moderating the success of replications is a fact of life in all of the sciences. Consider, for example, an earlier article in this journal by Spence (1964). He reviewed studies testing the straightforward derivation from Hullian learning theory that high-anxiety subjects should condition more strongly than low-anxiety subjects. This hypothesis was confirmed 94% of the time in Spence's own laboratory at the University of Iowa but only 63% of the time in laboratories at other universities. In fact, Kimble and his associates at Duke University and the University of North Carolina obtained results in the opposite direction in two of three experiments.

In searching for a post hoc explanation, Spence (1964) noted that "a deliberate attempt was made in the Iowa studies to provide conditions in the laboratory that might elicit some degree of emotionality. Thus, the experimenter was instructed to be impersonal and quite formal ... and did not try to put [subjects] at ease or allay any expressed fears" (pp. 135-136). Moreover, he pointed out, his subjects sat in a dental chair whereas Kimble's subjects sat in a secretarial chair.
Spence even considered "the possibility that cultural backgrounds of southern and northern students may lead to a difference in the manner in which they respond to the different items in the [Manifest Anxiety] scale" (p. 136). If this was the state of affairs in an area of research as well established as classical conditioning, then the suggestion that the social climate of the psi laboratory might affect the outcome of ganzfeld experiments in ways not yet completely understood should not be dismissed as a devious attempt to provide an escape clause in case of replication failure.

The best the original researchers can do is to communicate as complete a knowledge of the experimental conditions as possible in an attempt to anticipate some of the relevant moderating variables. Ideally, this might include direct training by the original researchers or videotapes of actual sessions. Lacking these, however, the detailed description of the autoganzfeld procedures provided by Honorton et al. (1990) comes as close as current knowledge permits in providing for other researchers the "operational criteria for creating a warm social ambiance."

Theoretical Considerations

Up to this point, we have confined our discussion to strictly empirical matters. We are sympathetic to the view that one should establish the existence of a phenomenon, anomalous or not, before attempting to explain it. So suppose for the moment that we have a genuine anomaly of information transfer here. How can it be understood or explained?

The Psychology of Psi

In attempting to understand psi, parapsychologists have typically begun with the working assumption that, whatever its underlying mechanisms, it should behave like other, more familiar psychological phenomena. In particular, they typically assume that target information behaves like an external sensory stimulus that is encoded, processed, and experienced in familiar information-processing ways.
Similarly, individual psi performances should covary with experimental and subject variables in psychologically sensible ways. These assumptions are embodied in the model of psi that motivated the ganzfeld studies in the first place.

The ganzfeld procedure. As noted in the introduction, the ganzfeld procedure was designed to test a model in which psi-mediated information is conceptualized as a weak signal that is normally masked by internal somatic and external sensory "noise." Accordingly, any technique that raises the signal-to-noise ratio should enhance a person's ability to detect psi-mediated information. This noise-reduction model of psi organizes a large and diverse body of experimental results, particularly those demonstrating the psi-conducive properties of altered states of consciousness such as meditation, hypnosis, dreaming, and, of course, the ganzfeld itself (Rao & Palmer, 1987).

Alternative theories propose that the ganzfeld (and altered states) may be psi-conducive because it lowers resistance to accepting alien imagery, diminishes rational or contextual constraints on the encoding or reporting of information, stimulates more divergent thinking, or even just serves as a placebo-like ritual that participants perceive as being psi-conducive (Stanford, 1987). At this point, there are no data that would permit one to choose among these alternatives, and the noise-reduction model remains the most widely accepted.

The target. There are also a number of plausible hypotheses that attempt to account for the superiority of dynamic targets over static targets. Dynamic targets contain more information, involve more sensory modalities, evoke more of the receiver's internal schemata, are more lifelike, have a narrative structure, are more emotionally evocative, and are "richer" in other, unspecified ways.
Several psi researchers have attempted to go beyond the simple dynamic-static dichotomy to more refined or theory-based definitions of a good target. Although these efforts have involved examining both psychological and physical properties of targets, there is as yet not much progress to report (Delanoy, 1990).

The receiver. Some of the subject characteristics associated with good psi performance also appear to have psychologically straightforward explanations. For example, garden-variety motivational explanations seem sufficient to account for the relatively consistent finding that those who believe in psi perform significantly better than those who do not. (Less straightforward, however, would be an explanation for the frequent finding that nonbelievers actually perform significantly worse than chance [Broughton, 1991, p. 109].)

The superior psi performance of creative or artistically gifted individuals, like the Juilliard students, may reflect individual differences that parallel some of the hypothesized effects of the ganzfeld mentioned earlier. Artistically gifted individuals may be more receptive to alien imagery, be better able to transcend rational or contextual constraints on the encoding or reporting of information, or be more divergent in their thinking. It has also been suggested that both artistic and psi abilities might be rooted in superior right-brain functioning.

The observed relationship between extraversion and psi performance has been of theoretical interest for many years. Eysenck (1966) reasoned that extraverts should perform well in psi tasks because they are easily bored and respond favorably to novel stimuli. In a setting such as the ganzfeld, extraverts may become "stimulus starved" and thus be highly sensitive to any stimulation, including weak incoming psi information.
In contrast, introverts would be more inclined to entertain themselves with their own thoughts and thus continue to mask psi information despite the diminished sensory input. Eysenck also speculated that psi might be a primitive form of perception antedating cortical developments in the course of evolution, and, hence, cortical arousal might suppress psi functioning. Because extraverts have a lower level of cortical arousal than introverts, they should perform better in psi tasks (the evolutionary biology of psi has also been discussed by Broughton, 1991, pp. 347-352).

But there are more mundane possibilities. Extraverts might perform better than introverts simply because they are more relaxed and comfortable in the social setting of the typical psi experiment (e.g., the "warm social ambiance" of the autoganzfeld studies). This interpretation is strengthened by the observation that introverts outperformed extraverts in a study in which subjects had no contact with an experimenter but worked alone at home with materials they received in the mail (Schmidt & Schlitz, 1989). To help decide among these interpretations, ganzfeld experimenters have begun to use the extraversion scale of the NEO Personality Inventory (Costa & McCrae, 1992), which assesses six different facets of the extraversion-introversion factor.

The sender. In contrast to this information about the receiver in psi experiments, virtually nothing is known about the characteristics of a good sender or about the effects of the sender's relationship with the receiver. As has been shown, the initial suggestion from the meta-analysis of the original ganzfeld database that psi performance might be enhanced when the sender and receiver are friends was not replicated at a statistically significant level in the autoganzfeld studies.
A number of parapsychologists have entertained the more radical hypothesis that the sender may not even be a necessary element in the psi process. In the terminology of parapsychology, the sender-receiver procedure tests for the existence of telepathy, anomalous communication between two individuals; however, if the receiver is somehow picking up the information from the target itself, it would be termed clairvoyance, and the presence of the sender would be irrelevant (except for possible psychological reasons such as expectation effects).

At the time of his death, Honorton was planning a series of autoganzfeld studies that would systematically compare sender and no-sender conditions while keeping both the receiver and the experimenter blind to the condition of the ongoing session. In preparation, he conducted a meta-analytic review of ganzfeld studies that used no sender. He found 12 studies with a median of 33.5 sessions, conducted by seven investigators. The overall effect size (π) was .56, which corresponds to a four-alternative hit rate of 29%. But this effect size does not reach statistical significance (Stouffer z = 1.31, p = .095). So far, then, there is no firm evidence for psi in the ganzfeld in the absence of a sender. (There are, however, nonganzfeld studies in the literature that do report significant evidence for clairvoyance, including a classic card-guessing experiment conducted by J. B. Rhine and Pratt [1954].)

The Physics of Psi

The psychological level of theorizing discussed earlier does not, of course, address the conundrum that makes psi phenomena anomalous in the first place: their presumed incompatibility with our current conceptual model of physical reality.
Parapsychologists differ widely from one another in their taste for theorizing at this level, but several whose training lies in physics or engineering have proposed physical (or biophysical) theories of psi phenomena (an extensive review of theoretical parapsychology was provided by Stokes, 1987). Only some of these theories would force a radical revision in our conception of physical reality.

Those who follow contemporary debates in modern physics, however, will be aware that several phenomena predicted by quantum theory and confirmed by experiment are themselves incompatible with our current conceptual model of physical reality. Of these, it is the 1982 empirical confirmation of Bell's theorem that has created the most excitement and controversy among philosophers and the few physicists who are willing to speculate on such matters (Cushing & McMullin, 1989; Herbert, 1987). In brief, Bell's theorem states that any model of reality that is compatible with quantum mechanics must be nonlocal: It must allow for the possibility that the results of observations at two arbitrarily distant locations can be correlated in ways that are incompatible with any physically permissible causal mechanism.

Several possible models of reality that incorporate nonlocality have been proposed by both philosophers and physicists. Some of these models clearly rule out psi-like information transfer, others permit it, and some actually require it. Thus, at a grander level of theorizing, some parapsychologists believe that one of the more radical models of reality compatible with both quantum mechanics and psi will eventually come to be accepted. If and when that occurs, psi phenomena would cease to be anomalous.

But we have learned that all such talk provokes most of our colleagues in psychology and in physics to roll their eyes and gnash their teeth. So let's just leave it at that.
Skepticism Revisited

More generally, we have learned that our colleagues' tolerance for any kind of theorizing about psi is strongly determined by the degree to which they have been convinced by the data that psi has been demonstrated. We have further learned that their diverse reactions to the data themselves are strongly determined by their a priori beliefs about and attitudes toward a number of quite general issues, some scientific, some not. In fact, several statisticians believe that the traditional hypothesis-testing methods used in the behavioral sciences should be abandoned in favor of Bayesian analyses, which take into account a person's a priori beliefs about the phenomenon under investigation (e.g., Bayarri & Berger, 1991; Dawson, 1991).

In the final analysis, however, we suspect that both one's Bayesian a prioris and one's reactions to the data are ultimately determined by whether one was more severely punished in childhood for Type I or Type II errors.

References

Atkinson, R., Atkinson, R. C., Smith, E. E., & Bem, D. J. (1990). Introduction to psychology (10th ed.). San Diego, CA: Harcourt Brace Jovanovich.
Atkinson, R., Atkinson, R. C., Smith, E. E., & Bem, D. J. (1993). Introduction to psychology (11th ed.). San Diego, CA: Harcourt Brace Jovanovich.
Avant, L. L. (1965). Vision in the ganzfeld. Psychological Bulletin, 64, 246-258.
Bayarri, M. J., & Berger, J. (1991). Comment. Statistical Science, 6, 379-382.
Blackmore, S. (1980). The extent of selective reporting of ESP ganzfeld studies. European Journal of Parapsychology, 3, 213-219.
Bozarth, J. D., & Roberts, R. R. (1972). Signifying significant significance. American Psychologist, 27, 774-775.
Braud, W. G., Wood, R., & Braud, L. W. (1975). Free-response GESP performance during an experimental hypnagogic state induced by visual and acoustic ganzfeld techniques: A replication and extension. Journal of the American Society for Psychical Research, 69, 105-113.
Broughton, R. S.
(1991). Parapsychology: The controversial science. New York: Ballantine Books.
Broughton, R. S., Kanthamani, H., & Khilji, A. (1990). Assessing the PRL success model on an independent ganzfeld data base. In L. A. Henkel & J. Palmer (Eds.), Research in parapsychology 1989 (pp. 32-35). Metuchen, NJ: Scarecrow Press.
Child, I. L. (1985). Psychology and anomalous observations: The question of ESP in dreams. American Psychologist, 40, 1219-1230.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Cohen, J. (1992). Statistical power analysis. Current Directions in Psychological Science, 1, 98-101.
Costa, P. T., Jr., & McCrae, R. R. (1992). Revised NEO Personality Inventory (NEO-PI-R) and NEO Five-Factor Inventory (NEO-FFI) manual. Odessa, FL: Psychological Assessment Resources.
Cushing, J. T., & McMullin, E. (Eds.). (1989). Philosophical consequences of quantum theory: Reflections on Bell's theorem. Notre Dame, IN: University of Notre Dame Press.
Dawson, R. (1991). Comment. Statistical Science, 6, 382-385.
Delanoy, D. L. (1990). Approaches to the target: A time for reevaluation. In L. A. Henkel & J. Palmer (Eds.), Research in parapsychology 1989 (pp. 89-92). Metuchen, NJ: Scarecrow Press.
Dingwall, E. J. (Ed.). (1968). Abnormal hypnotic phenomena (4 vols.). London: Churchill.
Druckman, D., & Swets, J. A. (Eds.). (1988). Enhancing human performance: Issues, theories, and techniques. Washington, DC: National Academy Press.
Eysenck, H. J. (1966). Personality and extra-sensory perception. Journal of the Society for Psychical Research, 44, 55-71.
Gilovich, T. (1991). How we know what isn't so: The fallibility of human reason in everyday life. New York: Free Press.
Green, C. E. (1960). Analysis of spontaneous cases. Proceedings of the Society for Psychical Research, 53, 97-161.
Harris, M. J., & Rosenthal, R. (1988a). Human performance research: An overview. Washington, DC: National Academy Press.
Harris, M. J., & Rosenthal, R. (1988b). Postscript to "Human performance research: An overview." Washington, DC: National Academy Press.
Herbert, N. (1987). Quantum reality: Beyond the new physics. Garden City, NY: Anchor Books.
Honorton, C. (1969). Relationship between EEG alpha activity and ESP card-guessing performance. Journal of the American Society for Psychical Research, 63, 365-374.
Honorton, C. (1977). Psi and internal attention states. In B. B. Wolman (Ed.), Handbook of parapsychology (pp. 435-472). New York: Van Nostrand Reinhold.
Honorton, C. (1979). Methodological issues in free-response experiments. Journal of the American Society for Psychical Research, 73, 381-394.
Honorton, C. (1985). Meta-analysis of psi ganzfeld research: A response to Hyman. Journal of Parapsychology, 49, 51-91.
Honorton, C. (1992). The ganzfeld novice: Four predictors of initial ESP performance. Proceedings of the Parapsychological Association 35th Annual Convention, Las Vegas, NV, 51-58.
Honorton, C., Berger, R. E., Varvoglis, M. P., Quant, M., Derr, P., Schechter, E. I., & Ferrari, D. C. (1990). Psi communication in the ganzfeld: Experiments with an automated testing system and a comparison with a meta-analysis of earlier studies. Journal of Parapsychology, 54, 99-139.
Honorton, C., Ferrari, D. C., & Bem, D. J. (1992). Extraversion and ESP performance: Meta-analysis and a new confirmation. In L. A. Henkel & G. R. Schmeidler (Eds.), Research in parapsychology 1990 (pp. 35-38). Metuchen, NJ: Scarecrow Press.
Honorton, C., & Harper, S. (1974). Psi-mediated imagery and ideation in an experimental procedure for regulating perceptual input. Journal of the American Society for Psychical Research, 68, 156-168.
Honorton, C., & Schechter, E. I. (1987). Ganzfeld target retrieval with an automated testing system: A model for initial ganzfeld success. In D. B. Weiner & R. D. Nelson (Eds.), Research in parapsychology 1986 (pp. 36-39). Metuchen, NJ: Scarecrow Press.
Hyman, R. (1985). The ganzfeld psi experiment: A critical appraisal. Journal of Parapsychology, 49, 3-49.
Hyman, R. (1991). Comment. Statistical Science, 6, 389-392.
Hyman, R., & Honorton, C. (1986). A joint communiqué: The psi ganzfeld controversy. Journal of Parapsychology, 50, 351-364.
Kennedy, J. E. (1979). Methodological problems in free-response ESP experiments. Journal of the American Society for Psychical Research, 73, 1-15.
Metzger, W. (1930). Optische Untersuchungen am Ganzfeld: II. Zur Phänomenologie des homogenen Ganzfelds [Optical investigation of the Ganzfeld: II. Toward the phenomenology of the homogeneous Ganzfeld]. Psychologische Forschung, 13, 6-29.
Morris, R. L. (1991). Comment. Statistical Science, 6, 393-395.
Myers, I. B., & McCaulley, M. H. (1985). Manual: A guide to the development and use of the Myers-Briggs Type Indicator. Palo Alto, CA: Consulting Psychologists Press.
Nisbett, R. E., & Ross, L. (1980). Human inference: Strategies and shortcomings of social judgment. Englewood Cliffs, NJ: Prentice-Hall.
Palmer, J. (1978). Extrasensory perception: Research findings. In S. Krippner (Ed.), Advances in parapsychological research (Vol. 2, pp. 59-243). New York: Plenum.
Palmer, J. A., Honorton, C., & Utts, J. (1989). Reply to the National Research Council Study on Parapsychology. Journal of the American Society for Psychical Research, 83, 31-49.
Parker, A. (1975). Some findings relevant to the change in state hypothesis. In J. D. Morris, W. G. Roll, & R. L. Morris (Eds.), Research in parapsychology, 1974 (pp. 40-42). Metuchen, NJ: Scarecrow Press.
Parker, A. (1978). A holistic methodology in psi research. Parapsychology Review, 9, 1-6.
Prasad, J., & Stevenson, I. (1968). A survey of spontaneous psychical experiences in school children of Uttar Pradesh, India. International Journal of Parapsychology, 10, 241-261.
Rao, K. R., & Palmer, J. (1987). The anomaly called psi: Recent research and criticism. Behavioral and Brain Sciences, 10, 539-551.
Rhine, J. B., & Pratt, J. G. (1954). A review of the Pearce-Pratt distance series of ESP tests. Journal of Parapsychology, 18, 165-177.
Rhine, L. E. (1962). Psychological processes in ESP experiences. I. Waking experiences. Journal of Parapsychology, 26, 88-111.
Roig, M., Icochea, H., & Cuzzucoli, A. (1991). Coverage of parapsychology in introductory psychology textbooks. Teaching of Psychology, 18, 157-160.
Rosenthal, R. (1978). Combining results of independent studies. Psychological Bulletin, 85, 185-193.
Rosenthal, R. (1979). The "file drawer problem" and tolerance for null results. Psychological Bulletin, 86, 638-641.
Rosenthal, R. (1990). Replication in behavioral research. Journal of Social Behavior and Personality, 5, 1-30.
Rosenthal, R. (1991). Meta-analytic procedures for social research (Rev. ed.). Newbury Park, CA: Sage.
Rosenthal, R., & Rubin, D. B. (1989). Effect size estimation for one-sample multiple-choice-type data: Design, analysis, and meta-analysis. Psychological Bulletin, 106, 332-337.
Rosnow, R. L., & Rosenthal, R. (1989). Statistical procedures and the justification of knowledge in psychological science. American Psychologist, 44, 1276-1284.
Sannwald, G. (1959). Statistische Untersuchungen an Spontanphänomene [Statistical investigation of spontaneous phenomena]. Zeitschrift für Parapsychologie und Grenzgebiete der Psychologie, 3, 59-71.
Saunders, D. R. (1985). On Hyman's factor analyses. Journal of Parapsychology, 49, 86-88.
Schechter, E. I. (1984). Hypnotic induction vs. control conditions: Illustrating an approach to the evaluation of replicability in parapsychology. Journal of the American Society for Psychical Research, 78, 1-27.
Schlitz, M. J., & Honorton, C. (1992).
Ganzfeld psi performance within an artistically gifted population. Journal of the American Society for Psychical Research, 86, 83-98.
Schmeidler, G. R. (1988). Parapsychology and psychology: Matches and mismatches. Jefferson, NC: McFarland.
Schmidt, H., & Schlitz, M. J. (1989). A large scale pilot PK experiment with prerecorded random events. In L. A. Henkel & R. E. Berger (Eds.), Research in parapsychology 1988 (pp. 6-10). Metuchen, NJ: Scarecrow Press.
Spence, K. W. (1964). Anxiety (drive) level and performance in eyelid conditioning. Psychological Bulletin, 61, 129-139.
Stanford, R. G. (1987). Ganzfeld and hypnotic-induction procedures in ESP research: Toward understanding their success. In S. Krippner (Ed.), Advances in parapsychological research (Vol. 5, pp. 39-76). Jefferson, NC: McFarland.
Steering Committee of the Physicians' Health Study Research Group. (1988). Preliminary report: Findings from the aspirin component of the ongoing Physicians' Health Study. New England Journal of Medicine, 318, 262-264.
Sterling, T. C. (1959). Publication decisions and their possible effects on inferences drawn from tests of significance - or vice versa. Journal of the American Statistical Association, 54, 30-34.
Stokes, D. M. (1987). Theoretical parapsychology. In S. Krippner (Ed.), Advances in parapsychological research (Vol. 5, pp. 77-189). Jefferson, NC: McFarland.
Swets, J. A., & Bjork, R. A. (1990). Enhancing human performance: An evaluation of "new age" techniques considered by the U.S. Army. Psychological Science, 1, 85-96.
Tversky, A., & Kahneman, D. (1971). Belief in the law of small numbers. Psychological Bulletin, 2, 105-110.
Ullman, M., Krippner, S., & Vaughan, A. (1973). Dream telepathy. New York: Macmillan.
Utts, J. (1986). The ganzfeld debate: A statistician's perspective. Journal of Parapsychology, 50, 393-402.
Utts, J. (1991a). Rejoinder. Statistical Science, 6, 396-403.
Utts, J. (1991b). Replication and meta-analysis in parapsychology. Statistical Science, 6, 363-378.
Wagner, M. W., & Monnet, M. (1979). Attitudes of college professors toward extra-sensory perception. Zetetic Scholar, 5, 7-17.

Received September 28, 1992
Revision received March 10, 1993
Accepted March 14, 1993

Statistical Science 1991, Vol. 6, No. 4, 363-403

Replication and Meta-Analysis in Parapsychology

Jessica Utts

Abstract. Parapsychology, the laboratory study of psychic phenomena, has had its history interwoven with that of statistics. Many of the controversies in parapsychology have focused on statistical issues, and statistical models have played an integral role in the experimental work. Recently, parapsychologists have been using meta-analysis as a tool for synthesizing large bodies of work. This paper presents an overview of the use of statistics in parapsychology and offers a summary of the meta-analyses that have been conducted. It begins with some anecdotal information about the involvement of statistics and statisticians with the early history of parapsychology. Next, it is argued that most nonstatisticians do not appreciate the connection between power and "successful" replication of experimental effects. Returning to parapsychology, a particular experimental regime is examined by summarizing an extended debate over the interpretation of the results. A new set of experiments designed to resolve the debate is then reviewed. Finally, meta-analyses from several areas of parapsychology are summarized. It is concluded that the overall evidence indicates that there is an anomalous effect in need of an explanation.

Key words and phrases: Effect size, psychic research, statistical controversies, randomness, vote-counting.

Jessica Utts is Associate Professor, Division of Statistics, University of California at Davis, 469 Kerr Hall, Davis, California 95616.

1. INTRODUCTION

In a June 1990 Gallup Poll, 49% of the 1236 respondents claimed to believe in extrasensory perception (ESP), and one in four claimed to have had a personal experience involving telepathy (Gallup and Newport, 1991). Other surveys have shown even higher percentages; the University of Chicago's National Opinion Research Center recently surveyed 1473 adults, of whom 67% claimed that they had experienced ESP (Greeley, 1987).

Public opinion is a poor arbiter of science, however, and experience is a poor substitute for the scientific method. For more than a century, small numbers of scientists have been conducting laboratory experiments to study phenomena such as telepathy, clairvoyance and precognition, collectively known as "psi" abilities. This paper will examine some of that work, as well as some of the statistical controversies it has generated.

Parapsychology, as this field is called, has been a source of controversy throughout its history. Strong beliefs tend to be resistant to change even in the face of data, and many people, scientists included, seem to have made up their minds on the question without examining any empirical data at all. A critic of parapsychology recently acknowledged that "The level of the debate during the past 130 years has been an embarrassment for anyone who would like to believe that scholars and scientists adhere to standards of rationality and fair play" (Hyman, 1985a, page 89). While much of the controversy has focused on poor experimental design and potential fraud, there have been attacks and defenses of the statistical methods as well, sometimes calling into question the very foundations of probability and statistical inference.

Most of the criticisms have been leveled by psychologists. For example, a 1988 report of the U.S.
National Academy of Sciences concluded that "The committee finds no scientific justification from research conducted over a period of 130 years for the existence of parapsychological phenomena" (Druckman and Swets, 1988, page 22). The chapter on parapsychology was written by a subcommittee chaired by a psychologist who had published a similar conclusion prior to his appointment to the committee (Hyman, 1985a, page 7). There were no parapsychologists involved with the writing of the report. Resulting accusations of bias (Palmer, Honorton and Utts, 1989) led U.S. Senator Claiborne Pell to request that the Congressional Office of Technology Assessment (OTA) conduct an investigation with a more balanced group. A one-day workshop was held on September 30, 1988, bringing together parapsychologists, critics and experts in some related fields (including the author of this paper). The report concluded that parapsychology needs "a fairer hearing across a broader spectrum of the scientific community, so that emotionality does not impede objective assessment of experimental results" (Office of Technology Assessment, 1989).

It is in the spirit of the OTA report that this article is written. After Section 2, which offers an anecdotal account of the role of statisticians and statistics in parapsychology, the discussion turns to the more general question of replication of experimental results. Section 3 illustrates how replication has been (mis)interpreted by scientists in many fields. Returning to parapsychology in Section 4, a particular experimental regime called the "ganzfeld" is described, and an extended debate about the interpretation of the experimental results is discussed. Section 5 examines a meta-analysis of recent ganzfeld experiments designed to resolve the debate. Finally, Section 6 contains a brief account of meta-analyses that have been conducted in other areas of parapsychology, and conclusions are given in Section 7.

2. STATISTICS AND PARAPSYCHOLOGY

Parapsychology had its beginnings in the investigation of purported mediums and other anecdotal claims in the late 19th century. The Society for Psychical Research was founded in Britain in 1882, and its American counterpart was founded in Boston in 1884. While these organizations and their members were primarily involved with investigating anecdotal material, a few of the early researchers were already conducting "forced-choice" experiments such as card-guessing. (Forced-choice experiments are like multiple choice tests; on each trial the subject must guess from a small, known set of possibilities.) Notable among these was Nobel Laureate Charles Richet, who is generally credited with being the first to recognize that probability theory could be applied to card-guessing experiments (Rhine, 1977, page 26; Richet, 1884).

F. Y. Edgeworth, partly in response to what he considered to be incorrect analyses of these experiments, offered one of the earliest treatises on the statistical evaluation of forced-choice experiments in two articles published in the Proceedings of the Society for Psychical Research (Edgeworth, 1885, 1886). Unfortunately, as noted by Mauskopf and McVaugh (1979) in their historical account of the period, Edgeworth's papers were "perhaps too difficult for their immediate audience" (page 105).

Edgeworth began his analysis by using Bayes' theorem to derive the formula for the posterior probability that chance was operating, given the data. He then continued with an argument "savouring more of Bernoulli than Bayes" in which "it is consonant, I submit, to experience, to put 1/2 both for α and β," that is, for both the prior probability that chance alone was operating, and the prior probability that "there should have been some additional agency." He then reasoned (using a Taylor series expansion of the posterior probability formula) that if there were a large probability of observing the data given that some additional agency was at work, and a small objective probability of the data under chance, then the latter (binomial) probability "may be taken as a rough measure of the sought a posteriori probability in favour of mere chance" (page 195). Edgeworth concluded his article by applying his method to some data published previously in the same journal. He found the probability against chance to be 0.99996, which he said "may fairly be regarded as physical certainty" (page 199). He concluded:

Such is the evidence which the calculus of probabilities affords as to the existence of an agency other than mere chance. The calculus is silent as to the nature of that agency - whether it is more likely to be vulgar illusion or extraordinary law. That is a question to be decided, not by formulae and figures, but by general philosophy and common sense [page 199].

Both the statistical arguments and the experimental controls in these early experiments were somewhat loose. For example, Edgeworth treated as binomial an experiment in which one person chose a string of eight letters and another attempted to guess the string. Since it has long been understood that people are poor random number (or letter) generators, there is no statistical basis for analyzing such an experiment. Nonetheless, Edgeworth and his contemporaries set the stage for the use of controlled experiments with statistical evaluation in laboratory parapsychology. An interesting historical account of Edgeworth's involvement and the role telepathy experiments played in the early history of randomization and experimental design is provided by Hacking (1988).

One of the first American researchers to use statistical methods in parapsychology was John Edgar Coover, who was the Thomas Welton Stanford Psychical Research Fellow in the Psychology Department at Stanford University from 1912 to 1937 (Dommeyer, 1975). In 1917, Coover published a large volume summarizing his work (Coover, 1917). Coover believed that his results were consistent with chance, but others have argued that Coover's definition of significance was too strict (Dommeyer, 1975). For example, in one evaluation of his telepathy experiments, Coover found a two-tailed p-value of 0.0062. He concluded, "Since this value, then, lies within the field of chance deviation, although the probability of its occurrence by chance is fairly low, it cannot be accepted as a decisive indication of some cause beyond chance which operated in favor of success in guessing" (Coover, 1917, page 82). On the next page, he made it explicit that he would require a p-value of 0.0000221 to declare that something other than chance was operating.

It was during the summer of 1930, with the card-guessing experiments of J. B. Rhine at Duke University, that parapsychology began to take hold as a laboratory science. Rhine's laboratory still exists under the name of the Foundation for Research on the Nature of Man, housed at the edge of the Duke University campus.

It wasn't long after Rhine published his first book, Extrasensory Perception, in 1934, that the attacks on his methodology began. Since his claims were wholly based on statistical analyses of his experiments, the statistical methods were closely scrutinized by critics anxious to find a conventional explanation for Rhine's positive results.
The most persistent critic was a psychologist from McGill University named Chester Kellogg (Mauskopf and McVaugh, 1979). Kellogg's main argument was that Rhine was using the binomial distribution (and normal approximation) on a series of trials that were not independent. The experiments in question consisted of having a subject guess the order of a deck of 25 cards, with five each of five symbols, so technically Kellogg was correct.

By 1937, several mathematicians and statisticians had come to Rhine's aid. Mauskopf and McVaugh (1979) speculated that since statistics was itself a young discipline, "a number of statisticians were equally outraged by Kellogg, whose arguments they saw as discrediting their profession" (page 258). The major technical work, which acknowledged that Kellogg's criticisms were accurate but did little to change the significance of the results, was conducted by Charles Stuart and Joseph A. Greenwood and published in the first volume of the Journal of Parapsychology (Stuart and Greenwood, 1937). Stuart, who had been an undergraduate in mathematics at Duke, was one of Rhine's early subjects and continued to work with him as a researcher until Stuart's death in 1947. Greenwood was a Duke mathematician, who apparently converted to a statistician at the urging of Rhine.

Another prominent figure who was distressed with Kellogg's attack was E. V. Huntington, a mathematician at Harvard. After corresponding with Rhine, Huntington decided that, rather than further confuse the public with a technical reply to Kellogg's arguments, a simple statement should be made to the effect that the mathematical issues in Rhine's work had been resolved. Huntington must have successfully convinced his former student, Burton Camp of Wesleyan, that this was a wise approach. Camp was the 1937 President of IMS. When the annual meetings were held in December of 1937 (jointly with AMS and AAAS), Camp released a statement to the press that read:

Dr. Rhine's investigations have two aspects: experimental and statistical. On the experimental side mathematicians, of course, have nothing to say. On the statistical side, however, recent mathematical work has established the fact that, assuming that the experiments have been properly performed, the statistical analysis is essentially valid. If the Rhine investigation is to be fairly attacked, it must be on other than mathematical grounds [Camp, 1937].

One statistician who did emerge as a critic was William Feller. In a talk at the Duke Mathematical Seminar on April 24, 1940, Feller raised three criticisms of Rhine's work (Feller, 1940). They had been raised before by others (and continue to be raised even today). The first was that inadequate shuffling of the cards resulted in additional information from one series to the next. The second was what is now known as the "file-drawer effect," namely, that if one combines the results of published studies only, there is sure to be a bias in favor of successful studies. The third was that the results were enhanced by the use of optional stopping, that is, by not specifying the number of trials in advance. All three of these criticisms were addressed in a rejoinder by Greenwood and Stuart (1940), but Feller was never convinced. Even in its third edition published in 1968, his book An Introduction to Probability Theory and Its Applications still contains his conclusion about Greenwood and Stuart: "Both their arithmetic and their experiments have a distinct tinge of the supernatural" (Feller, 1968, page 407). In his discussion of Feller's position, Diaconis (1978) remarked, "I believe Feller was confused ... he seemed to have decided the opposition was wrong and that was that."

Several statisticians have contributed to the literature in parapsychology to greater or lesser degrees. T.
N. E. Greville developed applicable statistical methods for many of the experiments in parapsychology and was Statistical Editor of the Journal of Parapsychology (with J. A. Greenwood) from its start in 1937 through Volume 31 in 1967; Fisher (1924, 1929) addressed some specific problems in card-guessing experiments; Wilks (1965a, b) described various statistical methods for parapsychology; Lindley (1957) presented a Bayesian analysis of some parapsychology data; and Diaconis (1978) pointed out some problems with certain experiments and presented a method for analyzing experiments when feedback is given.

Occasionally, attacks on parapsychology have taken the form of attacks on statistical inference in general, at least as it is applied to real data. Spencer-Brown (1957) attempted to show that true randomness is impossible, at least in finite sequences, and that this could be the explanation for the results in parapsychology. That argument re-emerged in a recent debate on the role of randomness in parapsychology, initiated by psychologist J. Barnard Gilmore (Gilmore, 1989, 1990; Utts, 1989; Palmer, 1989, 1990). Gilmore stated that "The agnostic statistician, advising on research in psi, should take account of the possible inappropriateness of classical inferential statistics" (1989, page 338). In his second paper, Gilmore reviewed several non-psi studies showing purportedly random systems that do not behave as they should under randomness (e.g., Iversen, Longcor, Mosteller, Gilbert and Youtz, 1971; Spencer-Brown, 1957). Gilmore concluded that "Anomalous data ... should not be found nearly so often if classical statistics offers a valid model of reality" (1990, page 54), thus rejecting the use of classical statistical inference for real-world applications in general.

3. REPLICATION

Implicit and explicit in the literature on parapsychology is the assumption that, in order to truly establish itself, the field needs to find a repeatable experiment. For example, Diaconis (1978) started the summary of his article in Science with the words "In search of repeatable ESP experiments, modern investigators ..." (page 131). On October 28-29, 1983, the 32nd International Conference of the Parapsychology Foundation was held in San Antonio, Texas, to address "The Repeatability Problem in Parapsychology." The Conference Proceedings (Shapin and Coly, 1985) reflect the diverse views among parapsychologists on the nature of the problem. Honorton (1985a) and Rao (1985), for example, both argued that strict replication is uncommon in most branches of science and that parapsychology should not be singled out as unique in this regard. Other authors expressed disappointment in the lack of a single repeatable experiment in parapsychology, with titles such as "Unrepeatability: Parapsychology's Only Finding" (Blackmore, 1985), and "Research Strategies for Dealing with Unstable Phenomena" (Beloff, 1985).

It has never been clear, however, just exactly what would constitute acceptable evidence of a repeatable experiment. In the early days of investigation, the major critics "insisted that it would be sufficient for Rhine and Soal to convince them of ESP if a parapsychologist could perform successfully a single 'fraud-proof' experiment" (Hyman, 1985a, page 71). However, as soon as well-designed experiments showing statistical significance emerged, the critics realized that a single experiment could be statistically significant just by chance. British psychologist C. E. M. Hansel quantified the new expectation, that the experiment
should be repeated a few times, as follows:

If a result is significant at the .01 level and this result is not due to chance but to information reaching the subject, it may be expected that by making two further sets of trials the antichance odds of one hundred to one will be increased to around a million to one, thus enabling the effects of ESP - or whatever is responsible for the original result - to manifest itself to such an extent that there will be little doubt that the result is not due to chance [Hansel, 1980, page 298].

In other words, three consecutive experiments at p ≤ 0.01 would convince Hansel that something other than chance was at work.

This argument implies that if a particular experiment produces a statistically significant result, but subsequent replications fail to attain significance, then the original result was probably due to chance, or at least remains unconvincing. The problem with this line of reasoning is that there is no consideration given to sample size or power. Only an experiment with extremely high power should be expected to be "successful" three times in succession.

It is perhaps a failure of the way statistics is taught that many scientists do not understand the importance of power in defining successful replication. To illustrate this point, psychologists Tversky and Kahneman (1982) distributed a questionnaire to their colleagues at a professional meeting, with the question:

An investigator has reported a result that you consider implausible. He ran 15 subjects, and reported a significant value, t = 2.46. Another investigator has attempted to duplicate his procedure, and he obtained a nonsignificant value of t with the same number of subjects. The direction was the same in both sets of data. You are reviewing the literature.
What is the highest value of t in the second set of data that you would describe as a failure to replicate? [1982, page 28].

In reporting their results, Tversky and Kahneman stated:

The majority of our respondents regarded t = 1.70 as a failure to replicate. If the data of two such studies (t = 2.46 and t = 1.70) are pooled, the value of t for the combined data is about 3.00 (assuming equal variances). Thus, we are faced with a paradoxical state of affairs, in which the same data that would increase our confidence in the finding when viewed as part of the original study, shake our confidence when viewed as an independent study [1982, page 28].

At a recent presentation to the History and Philosophy of Science Seminar at the University of California at Davis, I asked the following question. Two scientists, Professors A and B, each have a theory they would like to demonstrate. Each plans to run a fixed number of Bernoulli trials and then test H0: p = 0.25 versus Ha: p > 0.25. Professor A has access to large numbers of students each semester to use as subjects. In his first experiment, he runs 100 subjects, and there are 33 successes (p = 0.04, one-tailed). Knowing the importance of replication, Professor A runs an additional 100 subjects as a second experiment. He finds 36 successes (p = 0.009, one-tailed).

Professor B only teaches small classes. Each quarter, she runs an experiment on her students to test her theory. She carries out ten studies this way, with the results in Table 1.

I asked the audience by a show of hands to indicate whether or not they felt the scientists had successfully demonstrated their theories. Professor A's theory received overwhelming support, with approximately 20 votes, while Professor B's theory received only one vote.
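The per-experiment p-values quoted for Professors A and B, and the role of power in such small studies, can be explored with an exact binomial tail. The following is a minimal sketch, not part of the original article: the function names are hypothetical, and the "true" success rate of 0.35 used for the power calculation is an assumption chosen purely for illustration.

```python
from math import comb


def binom_tail(n, k, p):
    """Exact one-tailed probability P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def power(n, p_null, p_true, alpha=0.05):
    """Power of the exact one-tailed test of H0: p = p_null at level alpha."""
    # Smallest success count whose tail probability under H0 is <= alpha.
    critical = next(k for k in range(n + 2) if binom_tail(n, k, p_null) <= alpha)
    return binom_tail(n, critical, p_true)


P_NULL = 0.25

# Professor A: two experiments of 100 trials each (counts from the text).
for n, k in [(100, 33), (100, 36)]:
    print(f"A: n={n:3d} successes={k:2d} p-value={binom_tail(n, k, P_NULL):.3f}")

# Professor B: the ten small experiments listed in Table 1.
table1 = [(10, 4), (15, 6), (17, 6), (25, 8), (30, 10),
          (40, 13), (18, 7), (10, 5), (15, 5), (20, 7)]
for n, k in table1:
    print(f"B: n={n:3d} successes={k:2d} p-value={binom_tail(n, k, P_NULL):.2f}")

# Assumed true success rate (illustrative only, not from the article).
P_TRUE = 0.35
for n in (10, 40, 100):
    pw = power(n, P_NULL, P_TRUE)
    print(f"n={n:3d} power={pw:.2f} P(three significant in a row)={pw**3:.3f}")
```

Under this assumed effect, the chance of three consecutive "successful" studies is the cube of the power of a single study, which is why small experiments rarely pass a three-in-a-row criterion even when an effect is present.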
If you aggregate the results of the experiments for each professor, you will notice that each conducted 200 trials, and Professor B actually demonstrated a higher level of success than Professor A, with 71 as opposed to 69 successful trials. The one-tailed p-values for the combined trials are 0.0017 for Professor A and 0.0006 for Professor B.

TABLE 1
Attempted replications for Professor B

n     Number of successes   One-tailed p-value
10    4                     0.22
15    6                     0.15
17    6                     0.23
25    8                     0.17
30    10                    0.20
40    13                    0.18
18    7                     0.14
10    5                     0.08
15    5                     0.31
20    7                     0.21

To address the question of replication more explicitly, I also posed the following scenario. In December of 1987, it was decided to prematurely terminate a study on the effects of aspirin in reducing heart attacks because the data were so convincing (see, e.g., Greenhouse and Greenhouse, 1988; Rosenthal, 1990a). The physician-subjects had been randomly assigned to take aspirin or a placebo. There were 104 heart attacks among the 11,037 subjects in the aspirin group, and 189 heart attacks among the 11,034 subjects in the placebo group (chi-square = 25.01, p < 0.00001).

After showing the results of that study, I presented the audience with two hypothetical experiments conducted to try to replicate the original result, with outcomes in Table 2.

TABLE 2
Hypothetical replications of the aspirin/heart attack study

              Replication #1        Replication #2
              Heart attack          Heart attack
              Yes      No           Yes      No
Aspirin       11       1156         20       2314
Placebo       19       1090         48       2170
Chi-square    2.596, p = 0.11       13.206, p = 0.0003

I asked the audience to indicate which one they thought was a more successful replication. The audience chose the second one, as would most journal editors, because of the "significant p-value." In fact, the first replication has almost exactly the same proportion of heart attacks in the two groups as the original study and is thus a very close replication of that result. The second replication has very different proportions, and in fact the relative risk from the second study is not even contained in a 95% confidence interval for relative risk from the original study. The magnitude of the effect has been much more closely matched by the "nonsignificant" replication.

Fortunately, psychologists are beginning to notice that replication is not as straightforward as they were originally led to believe. A special issue of the Journal of Social Behavior and Personality was entirely devoted to the question of replication (Neuliep, 1990). In one of the articles, Rosenthal cautioned his colleagues: "Given the levels of statistical power at which we normally operate, we have no right to expect the proportion of significant results that we typically do expect, even if in nature there is a very real and very important effect" (Rosenthal, 1990b, page 16).

Jacob Cohen, in his insightful article titled "Things I Have Learned (So Far)," identified another misconception common among social scientists: "Despite widespread misconceptions to the contrary, the rejection of a given null hypothesis gives us no basis for estimating the probability that a replication of the research will again result in rejecting that null hypothesis" (Cohen, 1990, page 1307).

Cohen and Rosenthal both advocate the use of effect sizes as opposed to significance levels when defining the strength of an experimental effect. In general, effect sizes measure the amount by which the data deviate from the null hypothesis in terms of standardized units. For instance, the effect size for a two-sample t-test is usually defined to be the difference in the two means, divided by the standard deviation for the control group. This measure can be compared across studies without the dependence on sample size inherent in significance levels. (Of course there will still be variability in the sample effect sizes, decreasing as a function of sample size.) Comparison of effect sizes across studies is one of the major components of meta-analysis.

Similar arguments have recently been made in the medical literature. For example, Gardner and Altman (1986) stated that the use of p-values "to define two alternative outcomes - significant and not significant - is not helpful and encourages lazy thinking" (page 746). They advocated the use of confidence intervals instead.

As discussed in the next section, the arguments used to conclude that parapsychology has failed to demonstrate a replicable effect hinge on these misconceptions of replication and failure to examine power. A more appropriate analysis would compare the effect sizes for similar experiments across experimenters and across time to see if there have been consistent effects of the same magnitude. Rosenthal also advocates this view of replication:

The traditional view of replication focuses on significance level as the relevant summary statistic of a study and evaluates the success of a replication in a dichotomous fashion. The newer, more useful view of replication focuses on effect size as the more important summary statistic of a study and evaluates the success of a replication not in a dichotomous but in a continuous fashion [Rosenthal, 1990b, page 28].

The dichotomous view of replication has been used throughout the history of parapsychology, by both parapsychologists and critics (Utts, 1988). For example, the National Academy of Sciences report critically evaluated "significant" experiments, but entirely ignored "nonsignificant" experiments.

In the next three sections, we will examine some of the results in parapsychology using the broader, more appropriate definition of replication.
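The numerical claims in this section can be checked with a few lines of code. The sketch below is not from the original article: it assumes an uncorrected Pearson chi-square for the 2x2 tables and a log-scale normal approximation for the relative-risk interval (the article does not say which interval method was used), and all function names are hypothetical.

```python
from math import comb, erfc, exp, sqrt


def binom_tail(n, k, p):
    """Exact one-tailed probability P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = erfc(sqrt(stat / 2))  # upper tail of chi-square with 1 df
    return stat, p_value


def relative_risk(a, b, c, d, z=1.96):
    """Relative risk of row 1 vs. row 2 with an approximate 95% interval.

    The interval uses the usual normal approximation on the log scale,
    which is an assumption of this sketch.
    """
    rr = (a / (a + b)) / (c / (c + d))
    se = sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return rr, rr * exp(-z * se), rr * exp(z * se)


# Aggregated Bernoulli trials for Professors A and B (200 trials each).
print(f"A: {binom_tail(200, 69, 0.25):.4f}  B: {binom_tail(200, 71, 0.25):.4f}")

# Cells are (heart attack, no heart attack) for aspirin, then placebo.
studies = {
    "original": (104, 11037 - 104, 189, 11034 - 189),
    "replication 1": (11, 1156, 19, 1090),
    "replication 2": (20, 2314, 48, 2170),
}
for name, cells in studies.items():
    stat, p = chi_square_2x2(*cells)
    rr, low, high = relative_risk(*cells)
    print(f"{name}: chi-square={stat:.3f} p={p:.5f} "
          f"RR={rr:.3f} (95% CI {low:.3f} to {high:.3f})")
```

Comparing each replication's relative risk with the interval from the original study treats replication as a question about effect size rather than about crossing a significance threshold.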
In doing so, we will show that the results are far more interesting than the critics would have us believe.

4. THE GANZFELD DEBATE IN PARAPSYCHOLOGY

An extensive debate took place in the mid-1980s between a parapsychologist and a critic, questioning whether or not a particular body of parapsychological data had demonstrated psi abilities. The experiments in question were all conducted using the ganzfeld setting (described below). Several authors were invited to write commentaries on the debate. As a result, this data base has been more thoroughly analyzed by both critics and proponents than any other and provides a good source for studying replication in parapsychology.

The debate concluded with a detailed series of recommendations for further experiments, and left open the question of whether or not psi abilities had been demonstrated. A new series of experiments that followed the recommendations was conducted over the next few years. The results of the new experiments will be presented in Section 5.

4.1 Free-Response Experiments

Recent experiments in parapsychology tend to use more complex target material than the cards and dice used in the early investigations, partially to alleviate boredom on the part of the subjects and partially because they are thought to "more nearly resemble the conditions of spontaneous psi occurrences" (Burdick and Kelly, 1977, page 109). These experiments fall under the general heading of "free-response" experiments, because the subject is asked to give a verbal or written description of the target, rather than being forced to make a choice from a small discrete set of possibilities. Various types of target material have been used, including pictures, short segments of movies on video tapes, actual locations and small objects.
Despite the more complex target material, the statistical methods used to analyze these experiments are similar to those for forced-choice experiments. A typical experiment proceeds as follows. Before conducting any trials, a large pool of potential targets is assembled, usually in packets of four. Similarity of targets within a packet is kept to a minimum, for reasons made clear below. At the start of an experimental session, after the subject is sequestered in an isolated room, a target is selected at random from the pool. A sender is placed in another room with the target. The subject is asked to provide a verbal or written description of what he or she thinks is in the target, knowing only that it is a photograph, an object, etc.

After the subject's description has been recorded and secured against the potential for later alteration, a judge (who may or may not be the subject) is given a copy of the subject's description and the four possible targets that were in the packet with the correct target. A properly conducted experiment either uses video tapes or has two identical sets of target material and uses the duplicate set for this part of the process, to ensure that clues such as fingerprints don't give away the answer. Based on the subject's description, and of course on a blind basis, the judge is asked either to rank the four choices from most to least likely to have been the target, or to select the one from the four that seems to best match the subject's description. If ranks are used, the statistical analysis proceeds by summing the ranks over a series of trials and comparing the sum to what would be expected by chance. If the selection method is used, a "direct hit" occurs if the correct target is chosen, and the number of direct hits over a series of trials is compared to the number expected in a binomial experiment with p = 0.25.
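The two analyses just described can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure of any particular study: the function names and the ten ranks are hypothetical, ranks are assumed to run from 1 (best match) to 4, and the sum of ranks is referred to a normal approximation, which the text does not specify.

```python
import math

def direct_hit_p_value(hits, n_trials, p_chance=0.25):
    """Exact one-tailed p-value P(X >= hits) for X ~ Binomial(n_trials, p_chance)."""
    return sum(
        math.comb(n_trials, k) * p_chance**k * (1 - p_chance) ** (n_trials - k)
        for k in range(hits, n_trials + 1)
    )

def sum_of_ranks_z(ranks, n_choices=4):
    """z-score of the rank sum; positive when the target is ranked better than chance.

    Under the null each rank is assumed uniform on 1..n_choices, with
    mean (n_choices + 1) / 2 and variance (n_choices**2 - 1) / 12.
    """
    n = len(ranks)
    expected = n * (n_choices + 1) / 2
    sd = math.sqrt(n * (n_choices**2 - 1) / 12)
    return (expected - sum(ranks)) / sd

# Hypothetical ranks given to the correct target in ten trials (1 = direct hit).
example_ranks = [1, 2, 1, 4, 3, 1, 2, 1, 3, 1]
example_hits = example_ranks.count(1)
p_hits = direct_hit_p_value(example_hits, len(example_ranks))
z_ranks = sum_of_ranks_z(example_ranks)
print(f"direct hits: {example_hits}/10, exact p = {p_hits:.4f}")
print(f"sum of ranks: {sum(example_ranks)}, z = {z_ranks:.2f}")
```

Both statistics depend only on the rank assigned to the correct target, so the same trial record supports either analysis.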
Note that the subjects' responses cannot be considered to be "random" in any sense, so probability assessments are based on the random selection of the target and decoys. In a correctly designed experiment, the probability of a direct hit by chance is 0.25 on each trial, regardless of the response, and the trials are independent. These and other issues related to analyzing free-response experiments are discussed by Utts (1991).

4.2 The Psi Ganzfeld Experiments

The ganzfeld procedure is a particular kind of free-response experiment utilizing a perceptual isolation technique originally developed by Gestalt psychologists for other purposes. Evidence from spontaneous case studies and experimental work had led parapsychologists to a model proposing that psychic functioning may be masked by sensory input and by inattention to internal states (Honorton, 1977). The ganzfeld procedure was specifically designed to test whether or not reduction of external "noise" would enhance psi performance.

In these experiments, the subject is placed in a comfortable reclining chair in an acoustically shielded room. To create a mild form of sensory deprivation, the subject wears headphones through which white noise is played, and stares into a constant field of red light. This is achieved by taping halved translucent ping-pong balls over the eyes and then illuminating the room with red light. In the psi ganzfeld experiments, the subject speaks into a microphone and attempts to describe the target material being observed by the sender in a distant room.

At the 1982 Annual Meeting of the Parapsychological Association, a debate took place over the degree to which the results of the psi ganzfeld experiments constituted evidence of psi abilities. Psychologist and critic Ray Hyman and parapsychologist Charles Honorton each analyzed the results of all known psi ganzfeld experiments to date, and they reached strikingly different conclusions (Honorton, 1985b; Hyman, 1985b).
The debate continued with the publication of their arguments in separate articles in the March 1985 issue of the Journal of Parapsychology. Finally, in the December 1986 issue of the Journal of Parapsychology, Hyman and Honorton (1986) wrote a joint article in which they highlighted their agreements and disagreements and outlined detailed criteria for future experiments. That same issue contained commentaries on the debate by 10 other authors.

The data base analyzed by Hyman and Honorton (1986) consisted of results taken from 34 reports written by a total of 47 authors. Honorton counted 42 separate experiments described in the reports, of which 28 reported enough information to determine the number of direct hits achieved. Twenty-three of the studies (55%) were classified by Honorton as having achieved statistical significance at 0.05.

4.3 The Vote-Counting Debate

Vote-counting is the term commonly used for the technique of drawing inferences about an experimental effect by counting the number of significant versus nonsignificant studies of the effect. Hedges and Olkin (1985) give a detailed analysis of the inadequacy of this method, showing that it is more and more likely to make the wrong decision as the number of studies increases. While Hyman acknowledged that "vote-counting raises many problems" (Hyman, 1985b, page 8), he nonetheless spent half of his critique of the ganzfeld studies showing why Honorton's count of 55% was wrong.

Hyman's first complaint was that several of the studies contained multiple conditions, each of which should be considered as a separate study. Using this definition he counted 80 studies (thus further reducing the sample sizes of the individual studies), of which 25 (31%) were "successful."
Honorton's response to this was to invite readers to examine the studies and decide for themselves if the varying conditions constituted separate experiments.

Hyman next postulated that there was selection bias, so that significant studies were more likely to be reported. He raised some important issues about how pilot studies may be terminated and not reported if they don't show significant results, or may at least be subject to optional stopping, allowing the experimenter to determine the number of trials. He also presented a chi-square analysis that "suggests a tendency to report studies with a small sample only if they have significant results" (Hyman, 1985b, page 14), but I have questioned his analysis elsewhere (Utts, 1986, page 397).

Honorton refuted Hyman's argument with four rejoinders (Honorton, 1985b, page 66). In addition to reinterpreting Hyman's chi-square analysis, Honorton pointed out that the Parapsychological Association has an official policy encouraging the publication of nonsignificant results in its journals and proceedings, that a large number of reported ganzfeld studies did not achieve statistical significance and that there would have to be 15 studies in the "file-drawer" for every one reported to cancel out the observed significant results.

The remainder of Hyman's vote-counting analysis consisted of showing that the effective error rate for each study was actually much higher than the nominal 5%. For example, each study could have been analyzed using the direct hit measure, the sum of ranks measure or one of two other measures used for free-response analyses. Hyman carried out a simulation study that showed the true error rate would be 0.22 if "significance" was defined by requiring at least one of these four measures to achieve the 0.05 level.
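The inflation that arises from testing several measures on the same data can be illustrated with a small Monte Carlo sketch. It is not a reconstruction of Hyman's simulation: only two of the four measures (direct hits and sum of ranks) are modeled, and the trial count, number of simulated experiments, seed and normal approximation for the rank sum are all assumptions chosen for illustration.

```python
import math
import random

def critical_hits(n_trials, p_chance=0.25, alpha=0.05):
    """Smallest hit count k whose exact binomial tail P(X >= k) is <= alpha."""
    tail = 0.0
    # Accumulate the upper tail from k = n_trials downward.
    for k in range(n_trials, -1, -1):
        tail += math.comb(n_trials, k) * p_chance**k * (1 - p_chance) ** (n_trials - k)
        if tail > alpha:
            return k + 1
    return 0

def simulate_error_rates(n_trials=30, n_experiments=20000, seed=1):
    """Null-hypothesis rejection rates for each measure and for 'either one'."""
    rng = random.Random(seed)
    k_crit = critical_hits(n_trials)
    z_crit = 1.6449  # one-tailed 5% point of the standard normal
    sd_sum = math.sqrt(n_trials * 1.25)  # variance of a uniform rank on 1..4 is 1.25
    n_hits = n_ranks = n_either = 0
    for _ in range(n_experiments):
        # Under the null, the rank given to the true target is uniform on 1..4.
        ranks = [rng.randint(1, 4) for _ in range(n_trials)]
        sig_hits = ranks.count(1) >= k_crit
        sig_ranks = (2.5 * n_trials - sum(ranks)) / sd_sum >= z_crit
        n_hits += sig_hits
        n_ranks += sig_ranks
        n_either += sig_hits or sig_ranks
    return (n_hits / n_experiments, n_ranks / n_experiments, n_either / n_experiments)

rate_hits, rate_ranks, rate_either = simulate_error_rates()
print(f"direct hits: {rate_hits:.3f}, sum of ranks: {rate_ranks:.3f}, either: {rate_either:.3f}")
```

Because both measures are computed from the same ranks, the "either" rate lies between the larger single-measure rate and the sum of the two; allowing further measures can only raise it.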
He suggested several other ways in which multiple testing could occur and concluded that the effective error rate in each experiment was not the nominal 0.05, but rather was probably close to the 31% he had determined to be the actual success rate in his vote-count.

Honorton acknowledged that there was a multiple testing problem, but he had a two-fold response. First, he applied a Bonferroni correction and found that the number of significant studies (using his definition of a study) only dropped from 55% to 45%. Next, he proposed that a uniform index of success be applied to all studies. He used the number of direct hits, since it was by far the most commonly reported measure and was the measure used in the first published psi ganzfeld study. He then conducted a detailed analysis of the 28 studies reporting direct hits and found that 43% were significant at 0.05 on that measure alone. Further, he showed that significant effects were reported by six of the 10 independent investigators and thus were not due to just one or two investigators or laboratories. He also noted that success rates were very similar for reports published in refereed journals and those published in unrefereed monographs and abstracts.

While Hyman's arguments identified issues such as selective reporting and optional stopping that should be considered in any meta-analysis, the dependence of significance levels on sample size makes the vote-counting technique almost useless for assessing the magnitude of the effect. Consider, for example, the 24 studies where the direct hit measure was reported and the chance probability of a direct hit was 0.25, the most common type of study in the data base. (There were four direct hit studies with other chance probabilities and 14 that did not report direct hits.) Of the 24 studies, 13 (54%) were "nonsignificant" at α = 0.05, one-tailed. But if the
367 trials in these "failed replications" are combined, there are 106 direct hits, z = 1.66, and p = 0.0485, one-tailed. This is reminiscent of the dilemma of Professor B in Section 3.

Power is typically very low for these studies. The median sample size for the studies reporting direct hits was 28. If there is a real effect and it increases the success probability from the chance 0.25 to an actual 0.33 (a value whose rationale will be made clear below), the power for a study with 28 trials is only 0.181 (Utts, 1986). It should be no surprise that there is a "repeatability" problem in parapsychology.

4.4 Flaw Analysis and Future Recommendations

The second half of Hyman's paper consisted of a "Meta-Analysis of Flaws and Successful Outcomes" (1985b, page 30), designed to explore whether or not various measures of success were related to specific flaws in the experiments. While many critics have argued that the results in parapsychology can be explained by experimental flaws, Hyman's analysis was the first to attempt to quantify the relationship between flaws and significant results.

Hyman identified 12 potential flaws in the ganzfeld experiments, such as inadequate randomization, multiple tests used without adjusting the significance level (thus inflating the significance level from the nominal 5%) and failure to use a duplicate set of targets for the judging process (thus allowing possible clues such as fingerprints). Using cluster and factor analyses, the 12 binary flaw variables were combined into three new variables, which Hyman named General Security, Statistics and Controls.

Several analyses were then conducted. The one reported with the most detail is a factor analysis utilizing 17 variables for each of 36 studies.
Four factors emerged from the analysis. From these, Hyman concluded that security had increased over the years, that the significance level tended to be inflated the most for the most complex studies and that both effect size and level of significance were correlated with the existence of flaws.

Following his factor analysis, Hyman picked the three flaws that seemed to be most highly correlated with success, which were inadequate attention to both randomization and documentation and the potential for ordinary communication between the sender and receiver. A regression equation was then computed using each of the three flaws as dummy variables, and the effect size for the experiment as the dependent variable. From this equation, Hyman concluded that a study without these three flaws would be predicted to have a hit rate of 27%. He concluded that this is "well within the statistical neighborhood of the 25% chance rate" (1985b, page 37), and thus "the ganzfeld psi data base, despite initial impressions, is inadequate either to support the contention of a repeatable study or to demonstrate the reality of psi" (page 38).

Honorton discounted both Hyman's flaw classification and his analysis. He did not deny that flaws existed, but he objected that Hyman's analysis was faulty and impossible to interpret. Honorton asked psychometrician David Saunders to write an Appendix to his article, evaluating Hyman's analysis. Saunders first criticized Hyman's use of a factor analysis with 17 variables (many of which were dichotomous) and only 36 cases and concluded that "the entire analysis is meaningless" (Saunders, 1985, page 87). He then noted that Hyman's choice of the three flaws to include in his regression analysis constituted a clear case of multiple analysis, since there were 84 possible sets of three that could have been selected (out of nine potential flaws), and Hyman chose the set most highly correlated with effect size.
Again, Saunders concluded that "any interpretation drawn from [the regression analysis] must be regarded as meaningless" (1985, page 88).

Hyman's results were also contradicted by Harris and Rosenthal (1988b) in an analysis requested by Hyman in his capacity as Chair of the National Academy of Sciences' Subcommittee on Parapsychology. Using Hyman's flaw classifications and a multivariate analysis, Harris and Rosenthal concluded that "Our analysis of the effects of flaws on study outcome lends no support to the hypothesis that ganzfeld research results are a significant function of the set of flaw variables" (1988b, page 3).

Hyman and Honorton were in the process of preparing papers for a second round of debate when they were invited to lunch together at the 1986 Meeting of the Parapsychological Association. They discovered that they were in general agreement on several major issues, and they decided to coauthor a "Joint Communique" (Hyman and Honorton, 1986). It is clear from their paper that they both thought it was more important to set the stage for future experimentation than to continue the technical arguments over the current data base. In the abstract to their paper, they wrote:

We agree that there is an overall significant effect in this data base that cannot reasonably be explained by selective reporting or multiple analysis. We continue to differ over the degree to which the effect constitutes evidence for psi, but we agree that the final verdict awaits the outcome of future experiments conducted by a broader range of investigators and according to more stringent standards [page 351].

The paper then outlined what these standards should be. They included controls against any kind of sensory leakage, thorough testing and documentation of randomization methods used, better reporting of judging and feedback protocols, control for multiple analyses and advance specification of number of trials and type of experiment.
Indeed, any area of research could benefit from such a careful list of procedural recommendations.

4.5 Rosenthal's Meta-Analysis

The same issue of the Journal of Parapsychology in which the Joint Communique appeared also carried commentaries on the debate by 10 separate authors. In his commentary, psychologist Robert Rosenthal, one of the pioneers of meta-analysis in psychology, summarized the aspects of Hyman's and Honorton's work that would typically be included in a meta-analysis (Rosenthal, 1986). It is worth reviewing Rosenthal's results so that they can be used as a basis of comparison for the more recent psi ganzfeld studies reported in Section 5.

Rosenthal, like Hyman and Honorton, focused only on the 28 studies for which direct hits were known. He chose to use an effect size measure called Cohen's h, which is the difference between the arcsin transformed proportions of direct hits that were observed and expected: $h = 2(\arcsin\sqrt{\hat{p}} - \arcsin\sqrt{p})$, where $\hat{p}$ is the observed and $p$ the expected proportion of direct hits. One advantage of this measure over the difference in raw proportions is that it can be used to compare experiments with different chance hit rates.

If the observed and expected numbers of hits were identical, the effect size would be zero. Of the 28 studies, 23 (82%) had effect sizes greater than zero, with a median effect size of 0.32 and a mean of 0.28. These correspond to direct hit rates of 0.40 and 0.38 respectively, when 0.25 is expected by chance. A 95% confidence interval for the true effect size is from 0.11 to 0.45, corresponding to direct hit rates of from 0.30 to 0.46 when chance is 0.25.

A common technique in meta-analysis is to calculate a "combined z," found by summing the individual z scores and dividing by the square root of the number of studies. The result should have a standard normal distribution if each z score has a standard normal distribution.
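The conversions between Cohen's h and direct hit rates quoted above, and the combined z, are simple to reproduce. The sketch below is illustrative only: the function names are hypothetical, and the fail-safe count uses the usual Rosenthal-style formula with a one-tailed 0.05 criterion, which is assumed here rather than taken from the text.

```python
import math

def cohens_h(p_observed, p_expected):
    """Cohen's h: difference of arcsine-transformed proportions."""
    return 2 * (math.asin(math.sqrt(p_observed)) - math.asin(math.sqrt(p_expected)))

def hit_rate_from_h(h, p_expected=0.25):
    """Invert Cohen's h to the hit rate it implies for a given chance rate."""
    return math.sin(math.asin(math.sqrt(p_expected)) + h / 2) ** 2

def combined_z(z_scores):
    """Sum of the individual z scores divided by the square root of their number."""
    return sum(z_scores) / math.sqrt(len(z_scores))

def fail_safe_n(z_combined, n_studies, z_alpha=1.645):
    """Assumed Rosenthal-style fail-safe N: how many unreported zero-effect
    studies would pull the combined z down to z_alpha."""
    z_sum = z_combined * math.sqrt(n_studies)
    return (z_sum / z_alpha) ** 2 - n_studies

for h in (0.32, 0.28, 0.11, 0.45, 0.18):
    print(f"h = {h:.2f} -> hit rate {hit_rate_from_h(h):.2f}")
print(f"fail-safe N for z = 6.60, 28 studies: {fail_safe_n(6.60, 28):.0f}")
```

With a combined z of 6.60 over 28 studies, the figures Rosenthal reported for this data base, `fail_safe_n` returns about 423, or roughly 15 unreported studies for each one reported.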
For the ganzfeld studies, Rosenthal reported a combined z of 6.60 with a p-value of $3.37 \times 10^{-11}$. He also reiterated Honorton's file-drawer assessment by calculating that there would have to be 423 studies unreported to negate the significant effect in the 28 direct hit studies.

Finally, Rosenthal acknowledged that, because of the flaws in the data base and the potential for at least a small file-drawer effect, the true average effect size was probably closer to 0.18 than 0.28. He concluded, "Thus, when the accuracy rate expected under the null is 1/4, we might estimate the obtained accuracy rate to be about 1/3" (1986, page 333). This is the value used for the earlier power calculation.

It is worth mentioning that Rosenthal was commissioned by the National Academy of Sciences to prepare a background paper to accompany its 1988 report on parapsychology. That paper (Harris and Rosenthal, 1988a) contained much of the same analysis as his commentary summarized above. Ironically, the discussion of the ganzfeld work in the National Academy Report focused on Hyman's 1985 analysis, but never mentioned the work it had commissioned Rosenthal to perform, which contradicted the final conclusion in the report.

5. A META-ANALYSIS OF RECENT GANZFELD EXPERIMENTS

After the initial exchange with Hyman at the 1982 Parapsychological Association Meeting, Honorton and his colleagues developed an automated ganzfeld experiment that was designed to eliminate the methodological flaws identified by Hyman. The execution and reporting of the experiments followed the detailed guidelines agreed upon by Hyman and Honorton.

Using this "autoganzfeld" experiment, 11 experimental series were conducted by eight experimenters between February 1983 and September 1989, when the equipment had to be dismantled due to lack of funding. In this section, the results of these experiments are summarized and compared to the earlier ganzfeld studies.
Much of the information is derived from Honorton et al. (1990).

5.1 The Automated Ganzfeld Procedure

Like earlier ganzfeld studies, the "autoganzfeld" experiments require four participants. The first is the Receiver (R), who attempts to identify the target material being observed by the Sender (S). The Experimenter (E) prepares R for the task, elicits the response from R and supervises R's judging of the response against the four potential targets. (Judging is double blind; E does not know which is the correct target.) The fourth participant is the lab assistant (LA) whose only task is to instruct the computer to randomly select the target. No one involved in the experiment knows the identity of the target.

Both R and S are sequestered in sound-isolated, electrically shielded rooms. R is prepared as in earlier ganzfeld studies, with white noise and a field of red light. In a nonadjacent room, S watches the target material on a television and can hear R's target description ("mentation") as it is being given. The mentation is also tape recorded.

The judging process takes place immediately after the 30-minute sending period. On a TV monitor in the isolated room, R views the four choices from the target pack that contains the actual target. R is asked to rate each one according to how closely it matches the ganzfeld mentation. The ratings are converted to ranks and, if the correct target is ranked first, a direct hit is scored. The entire process is automatically recorded by the computer. The computer then displays the correct choice to R as feedback.

There were 160 preselected targets, used with replacement, in 10 of the 11 series. They were arranged in packets of four, and the decoys for a given target were always the remaining three in the same set. Thus, even if a particular target in a set were consistently favored by Rs, the probability of a direct hit under the null hypothesis would remain at 1/4.
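The claim that a consistently favored target leaves the chance hit rate unchanged can be checked with a short simulation. The sketch assumes a hypothetical receiver who picks position 0 of every set 70% of the time regardless of the target; the weights, trial count and seed are illustrative only.

```python
import random

def simulated_hit_rate(choice_weights, n_trials=100_000, seed=0):
    """Hit rate when the target is uniform at random and the receiver's choice is biased."""
    rng = random.Random(seed)
    positions = list(range(len(choice_weights)))
    hits = 0
    for _ in range(n_trials):
        # The computer selects each member of the set with equal probability.
        target = rng.choice(positions)
        # The receiver's first-place choice follows the fixed preference weights.
        choice = rng.choices(positions, weights=choice_weights)[0]
        hits += choice == target
    return hits / n_trials

biased = simulated_hit_rate([0.7, 0.1, 0.1, 0.1])
unbiased = simulated_hit_rate([0.25, 0.25, 0.25, 0.25])
print(f"biased receiver: {biased:.3f}, unbiased receiver: {unbiased:.3f}")
```

Analytically, if the receiver chooses position i with probability w_i and the target is uniform over the four positions, the hit probability is the sum of w_i/4, which equals 1/4 for any set of weights.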
Popular targets should be no more likely to be selected by the computer's random number generator than any of the others in the set. The selection of the target by the computer is the only source of randomness in these experiments. This is an important point, and one that is often misunderstood. (See Utts, 1991, for elucidation.)

Eighty of the targets were "dynamic," consisting of scenes from movies, documentaries and cartoons; 80 were "static," consisting of photographs, art prints and advertisements. The four targets within each set were all of the same type. Earlier studies indicated that dynamic targets were more likely to produce successful results, and one of the goals of the new experiments was to test that theory.

The randomization procedure used to select the target and the order of presentation for judging was thoroughly tested before and during the experiments. A detailed description is given by Honorton et al. (1990, pages 118-120).

Three of the 11 series were pilot series, five were formal series with novice receivers, and three were formal series with experienced receivers. The last series with experienced receivers was the only one that did not use the 160 targets. Instead, it used only one set of four dynamic targets in which one target had previously received several first place ranks and one had never received a first place rank. The receivers, none of whom had had prior exposure to that target pack, were not aware that only one target pack was being used. They each contributed one session only to the series. This will be called the "special series" in what follows.

Except for two of the pilot series, numbers of trials were planned in advance for each series.
Unfortunately, three of the formal series were not yet completed when the funding ran out, including the special series, and one pilot study with advance planning was terminated early when the experimenter relocated. There were no unreported trials during the 6-year period under review, so there was no "file drawer."

Overall, there were 183 Rs who contributed only one trial and 58 who contributed more than one, for a total of 241 participants and 355 trials. Only 23 Rs had previously participated in ganzfeld experiments, and 194 Rs (81%) had never participated in any parapsychological research.

5.2 Results

While acknowledging that no probabilistic conclusions can be drawn from qualitative data, Honorton et al. (1990) included several examples of session excerpts that Rs identified as providing the basis for their target rating. To give a flavor for the dream-like quality of the mentation and the amount of information that can be lost by only assigning a rank, the first example is reproduced here. The target was a painting by Salvador Dali called "Christ Crucified." The correct target received a first place rank. The part of the mentation R used to make this assessment read:

... I think of guides, like spirit guides, leading me and I come into a court with a king. It's quiet .... It's like heaven. The king is something like Jesus. Woman. Now I'm just sort of summersaulting through heaven .... Brooding .... Aztecs, the Sun God .... High priest .... Fear .... Graves. Woman. Prayer .... Funeral .... Dark. Death .... Souls .... Ten Commandments. Moses .... [Honorton et al., 1990].

Over all 11 series, there were 122 direct hits in the 355 trials, for a hit rate of 34.4% (exact binomial p-value = 0.00005) when 25% were expected by chance.
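The exact binomial p-value quoted for the pooled autoganzfeld trials can be reproduced directly from the counts in the text. The following is a sketch; the helper name is hypothetical, and the tail is summed term by term using exact integer binomial coefficients.

```python
import math

def binomial_upper_tail(hits, n_trials, p_chance):
    """Exact P(X >= hits) for X ~ Binomial(n_trials, p_chance)."""
    q = 1 - p_chance
    return sum(
        math.comb(n_trials, k) * p_chance**k * q ** (n_trials - k)
        for k in range(hits, n_trials + 1)
    )

hits, trials, chance = 122, 355, 0.25  # counts reported for all 11 series
hit_rate = hits / trials
p_value = binomial_upper_tail(hits, trials, chance)
# Cohen's h for the pooled hit rate against the chance rate.
h = 2 * (math.asin(math.sqrt(hit_rate)) - math.asin(math.sqrt(chance)))
print(f"hit rate = {hit_rate:.1%}, exact one-tailed p = {p_value:.1e}, Cohen's h = {h:.2f}")
```

The tail probability comes out on the order of the 0.00005 quoted above, and the effect size is close to 0.2.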
Cohen's h is 0.20, and a 95% confidence interval for the overall hit rate is from 0.30 to 0.39. This calculation assumes, of course, that the probability of a direct hit is constant and independent across trials, an assumption that may be questionable except under the null hypothesis of no psi abilities.

Honorton et al. (1990) also calculated effect sizes for each of the 11 series and each of the eight experimenters. All but one of the series (the first novice series) had positive effect sizes, as did all of the experimenters.

The special series with experienced Rs had an exceptionally high effect size with h = 0.81, corresponding to 16 direct hits out of 25 trials (64%), but the remaining series and the experimenters had relatively homogeneous effect sizes given the amount of variability expected by chance. If the special series is removed, the overall hit rate is 32.1%, h = 0.16. Thus, the positive effects are not due to just one series or one experimenter.

Of the 218 trials contributed by novices, 71 were direct hits (32.5%, h = 0.17), compared with 51 hits in the 137 trials by those with prior ganzfeld experience (37%, h = 0.26). The hit rates and effect sizes were 31% (h = 0.14) for the combined pilot series, 32.5% (h = 0.17) for the combined formal novice series, and 41.5% (h = 0.35) for the combined experienced series. The last figure drops to 31.6% if the outlier series is removed. Finally, without the outlier series the hit rate for the combined series where all of the planned trials were completed was 31.2% (h = 0.14), while it was 35% (h = 0.22) for the combined series that were terminated early. Thus, optional stopping cannot account for the positive effect.

There were two interesting comparisons that had been suggested by earlier work and were preplanned in these experiments.
The first was to compare results for trials with dynamic targets with those for static targets. In the 190 dynamic target sessions there were 77 direct hits (40%, h = 0.32) and for the static targets there were 45 hits in 165 trials (27%, h = 0.05), thus indicating that dynamic targets produced far more successful results.

The second comparison of interest was whether or not the sender was a friend of the receiver. This was a choice the receiver could make. If he or she did not bring a friend, a lab member acted as sender. There were 211 trials with friends as senders (some of whom were also lab staff), resulting in 76 direct hits (36%, h = 0.24). Four trials used no sender. The remaining 140 trials used nonfriend lab staff as senders and resulted in 46 direct hits (33%, h = 0.18). Thus, trials with friends as senders were slightly more successful than those without.

Consonant with the definition of replication based on consistent effect sizes, it is informative to compare the autoganzfeld experiments with the direct hit studies in the previous data base. The overall success rates are extremely similar. The overall direct hit rate was 34.4% for the autoganzfeld studies and was 38% for the comparable direct hit studies in the earlier meta-analysis. Rosenthal's (1986) adjustment for flaws had placed a more conservative estimate at 33%, very close to the observed 34.4% in the new studies.

One limitation of this work is that the autoganzfeld studies, while conducted by eight experimenters, all used the same equipment in the same laboratory. Unfortunately, the level of funding available in parapsychology and the cost in time and equipment to conduct proper experiments make it difficult to amass large amounts of data across laboratories.
Another autoganzfeld laboratory is currently being constructed at the University of Edinburgh in Scotland, so interlaboratory comparisons may be possible in the near future.

Based on the effect size observed to date, large samples are needed to achieve reasonable power. If there is a constant effect across all trials, resulting in 33% direct hits when 25% are expected by chance, to achieve a one-tailed significance level of 0.05 with 95% probability would require 345 sessions.

We end this section by returning to the aspirin and heart attack example in Section 3 and expanding a comparison noted by Atkinson, Atkinson, Smith and Bem (1990, page 237). Computing the equivalent of Cohen's h for comparing observed heart attack rates in the aspirin and placebo groups results in h = 0.068. Thus, the effect size observed in the ganzfeld data base is triple the much publicized effect of aspirin on heart attacks.

6. OTHER META-ANALYSES IN PARAPSYCHOLOGY

Four additional meta-analyses have been conducted in various areas of parapsychology since the original ganzfeld meta-analyses were reported. Three of the four analyses focused on evidence of psi abilities, while the fourth examined the relationship between extroversion and psychic functioning. In this section, each of the four analyses will be briefly summarized.

There are only a handful of English-language journals and proceedings in parapsychology, so retrieval of the relevant studies in each of the four cases was simple to accomplish by searching those sources in detail and by searching other bibliographic data bases for keywords.

Each analysis included an overall summary, an analysis of the quality of the studies versus the size of the effect and a "file-drawer" analysis to determine the possible number of unreported studies. Three of the four also contained comparisons across

6.1 Forced-Choice Precognition Experiments

Honorton and Ferrari (1989)
analyzed forced-choice experiments conducted from 1935 to 1987, in which the target material was randomly selected after the subject had attempted to predict what it would be. The time delay in selecting the target ranged from under a second to one year. Target material included items as diverse as ESP cards and automated random number generators. Two investigators, S. G. Soal and Walter J. Levy, were not included because some of their work has been suspected to be fraudulent.

Overall Results. There were 309 studies reported by 62 senior authors, including more than 50,000 subjects and nearly two million individual trials. Honorton and Ferrari used $z/\sqrt{n}$ as the measure of effect size (ES) for each study, where n was the number of Bernoulli trials in the study. They reported a mean ES of 0.020, and a mean z-score of 0.65 over all studies. They also reported a combined z of 11.41, $p = 6.3 \times 10^{-25}$. Some 30% (92) of the studies were statistically significant at $\alpha = 0.05$. The mean ES per investigator was 0.033, and the significant results were not due to just a few investigators.

Quality. Eight dichotomous quality measures were assigned to each study, resulting in possible scores from zero for the lowest quality, to eight for the highest. They included features such as adequate randomization, preplanned analysis and automated recording of the results. The correlation between study quality and effect size was 0.081, indicating a slight tendency for higher quality studies to be more successful, contrary to claims by critics that the opposite would be true. There was a clear relationship between quality and year of publication, presumably because over the years experimenters in parapsychology have responded to suggestions from critics for improving their methodology.

File Drawer.
Following Rosenthal (1984), the authors calculated the "fail-safe N" indicating the number of unreported studies that would have to be sitting in file drawers in order to negate the significant effect. They found N = 14,268, or a ratio of 46 unreported studies for each one reported. They also followed a suggestion by Dawes, Landman and Williams (1984) and computed the mean z for all studies with z > 1.65. If such studies were a random sample from the upper 5% tail of a N(0,1) distribution, the mean z would be 2.06. In this case it was 3.61. They concluded that selective reporting could not explain these results.

Comparisons. Four variables were identified that appeared to have a systematic relationship to study outcome. The first was that the 25 studies using subjects selected on the basis of good past performance were more successful than the 223 using unselected subjects, with mean effect sizes of 0.051 and 0.008, respectively. Second, the 97 studies testing subjects individually were more successful than the 105 studies that used group testing; mean effect sizes were 0.021 and 0.004, respectively. Timing of feedback was the third moderating variable, but information was only available for 104 studies. The 15 studies that never told the subjects what the targets were had a mean effect size of -0.001. Feedback after each trial produced the best results; the mean ES for the 47 studies was 0.035. Feedback after each set of trials resulted in mean ES of 0.023 (21 studies), while delayed feedback (also 21 studies) yielded a mean ES of only 0.009. There is a clear ordering; as the gap between time of feedback and time of the actual guesses decreased, effect sizes increased.

The fourth variable was the time interval between the subject's guess and the actual target selection, available for 144 studies.
The best results were for the 31 studies that generated targets less than a second after the guess (mean ES = 0.045), while the worst were for the seven studies that delayed target selection by at least a month (mean ES = 0.001). The mean effect sizes showed a clear trend, decreasing in order as the time interval increased from minutes to hours to days to weeks to months.

6.2 Attempts to Influence Random Physical Systems

Radin and Nelson (1989) examined studies designed to test the hypothesis that "The statistical output of an electronic RNG [random number generator] is correlated with observer intention in accordance with prespecified instructions" (page 1502). These experiments typically involve RNGs based on radioactive decay, electronic noise or pseudorandom number sequences seeded with true random sources. Usually the subject is instructed to try to influence the results of a string of binary trials by mental intention alone. A typical protocol would ask a subject to press a button (thus starting the collection of a fixed-length sequence of bits), and then try to influence the random source to produce more zeroes or more ones. A run might consist of three successive button presses, one each in which the desired result was more zeroes or more ones, and one as a control with no conscious intention. A z score would then be computed for each button press.

The 832 studies in the analysis were conducted from 1959 to 1987 and included 235 "control" studies, in which the output of the RNGs was recorded but there was no conscious intention involved. These were usually conducted before and during the experimental series, as tests of the RNGs.

Results. The effect size measure used was again $z/\sqrt{n}$, where z was positive if more bits of the specified type were achieved. The mean effect size for control studies was not significantly different from zero ($-1.0 \times 10^{-5}$).
The mean effect size for the experimental studies was also very small, $3.2 \times 10^{-4}$, but it was significantly higher than the mean ES for the control studies (z = 4.1).

Quality. Sixteen quality measures were defined and assigned to each study, under the four general categories of procedures, statistics, data and the RNG device. A score of 16 reflected the highest quality. The authors regressed mean effect size on mean quality for each investigator and found a slope of $2.5 \times 10^{-5}$ with standard error of $3.2 \times 10^{-5}$, indicating little relationship between quality and outcome. They also calculated a weighted mean effect size, using quality scores as weights, and found that it was very similar to the unweighted mean ES. They concluded that "differences in methodological quality are not significant predictors of effect size" (page 1507).

File Drawer. Radin and Nelson used several methods for estimating the number of unreported studies (pages 1508-1510). Their estimates ranged from 200 to 1000 based on models assuming that all significant studies were reported. They calculated the fail-safe N to be 54,000.

6.3 Attempts to Influence Dice

Radin and Ferrari (1991) examined 148 studies, published from 1935 to 1987, designed to test whether or not consciousness can influence the results of tossing dice. They also found 31 "control" studies in which no conscious intention was involved.

Results. The effect size measure used was $z/\sqrt{n}$, where z was based on the number of throws in which the die landed with the desired face (or faces) up, in n throws. The weighted mean ES for the experimental studies was 0.0122 with a standard error of 0.00062; for the control studies the mean and standard error were 0.00093 and 0.00255, respectively. Weights for each study were determined by quality, giving more weight to high quality studies.
Combined z scores for the experimental and control studies were reported by Radin and Ferrari to be 18.2 and 0.18, respectively.

Quality. Eleven dichotomous quality measures were assigned, ranging from automated recording to whether or not control studies were interspersed with the experimental studies. The final quality score for each study combined these with information on method of tossing the dice and with source of subject (defined below). A regression of quality score versus effect size resulted in a slope of -0.002, with a standard error of 0.0011. However, when effect sizes were weighted by sample size, there was a significant relationship between quality and effect size, leading Radin and Ferrari to conclude that higher-quality studies produced lower weighted effect sizes.

File Drawer. Radin and Ferrari calculated Rosenthal's fail-safe N for this analysis to be 17,974. Using the assumption that all significant studies were reported, they estimated the number of unreported studies to be 1152. As a final assessment, they compared studies published before and after 1975, when the Journal of Parapsychology adopted an official policy of publishing nonsignificant results. They concluded, based on that analysis, that more nonsignificant studies were published after 1975, and thus "We must consider the overall (1935-1987) data base as suspect with respect to the filedrawer problem."

Comparisons. Radin and Ferrari noted that there was bias in both the experimental and control studies across die face. Six was the face most likely to come up, consistent with the observation that it has the least mass. Therefore, they examined results for the subset of 69 studies in which targets were evenly balanced among the six faces. They still found a significant effect, with mean and standard error for effect size of $8.6 \times 10^{-3}$ and $1.1 \times 10^{-3}$, respectively. The combined z was 7.617 for these studies.
They also compared effect sizes across types of subjects used in the studies, categorizing them as unselected, experimenter and other subjects, experimenter as sole subject, and specially selected subjects. Like Honorton and Ferrari (1989), they found the highest mean ES for studies with selected subjects; it was approximately 0.02, more than twice that for unselected subjects.

6.4 Extroversion and ESP Performance

Honorton, Ferrari and Bem (1991) conducted a meta-analysis to examine the relationship between scores on tests of extroversion and scores on psi-related tasks. They found 60 studies by 17 investigators, conducted from 1945 to 1983.

Results. The effect size measure used for this analysis was the correlation between each subject's extroversion score and ESP score. A variety of measures had been used for both scores across studies, so various correlation coefficients were used. Nonetheless, a stem and leaf diagram of the correlations showed an approximate bell shape with mean and standard deviation of 0.19 and 0.26, respectively, and with an additional outlier at r = 0.91. Honorton et al. reported that when weighted by degrees of freedom, the weighted mean r was 0.14, with a 95% confidence interval covering 0.10 to 0.19.

Forced-Choice versus Free-Response Results. Because forced-choice and free-response tests differ qualitatively, Honorton et al. chose to examine their relationship to extroversion separately. They found that for free-response studies there was a significant correlation between extroversion and ESP scores, with mean r = 0.20 and z = 4.46. Further, this effect was homogeneous across both investigators and extroversion scales.

For forced-choice studies, there was a significant correlation between ESP and extroversion, but only for those studies that reported the ESP results to the subjects before measuring extroversion. Honorton et al.
speculated that the relationship was an artifact, in which extroversion scores were temporarily inflated as a result of positive feedback on ESP performance.

Confirmation with New Data

Following the extroversion/ESP meta-analysis, Honorton et al. attempted to confirm the relationship using the autoganzfeld data base. Extroversion scores, based on the Myers-Briggs Type Indicator, were available for 221 of the 241 subjects who had participated in autoganzfeld studies.

The correlation between extroversion scores and ganzfeld rating scores was r = 0.18, with a 95% confidence interval from 0.05 to 0.30. This is consistent with the mean correlation of r = 0.20 for free-response experiments, determined from the meta-analysis. These correlations indicate that extroverted subjects can produce higher scores in free-response ESP tests.

7. CONCLUSIONS

Parapsychologists often make a distinction between "proof-oriented research" and "process-oriented research." The former is typically conducted to test the hypothesis that psi abilities exist, while the latter is designed to answer questions about how psychic functioning works. Proof-oriented research has dominated the literature in parapsychology. Unfortunately, many of the studies used small samples and would thus be nonsignificant even if a moderate-sized effect exists.

The recent focus on meta-analysis in parapsychology has revealed that there are small but consistently nonzero effects across studies, experimenters and laboratories. The sizes of the effects in forced-choice studies appear to be comparable to those reported in some medical studies that had been heralded as breakthroughs. (See Section 5; also Honorton and Ferrari, 1989, page 301.) Free-response studies show effect sizes of far greater magnitude.
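The remark above that small studies would often be nonsignificant even when an effect exists can be illustrated with the hit rates quoted earlier in the paper (33% direct hits against 25% expected by chance). The sketch below assumes a one-tailed test based on the normal approximation to the binomial; that choice, the function names and the 40-session example are illustrative assumptions rather than details given in the text.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1

def sessions_needed(p0: float, p1: float, alpha: float = 0.05, power: float = 0.95) -> int:
    """Sessions needed for a one-tailed test of hit rate p0 to reach the
    requested power when the true hit rate is p1 (normal approximation)."""
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha)
    z_power = STD_NORMAL.inv_cdf(power)
    numerator = z_alpha * sqrt(p0 * (1.0 - p0)) + z_power * sqrt(p1 * (1.0 - p1))
    return ceil((numerator / (p1 - p0)) ** 2)

def approximate_power(n: int, p0: float, p1: float, alpha: float = 0.05) -> float:
    """Approximate power of the same one-tailed test with n sessions."""
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha)
    shift = (p1 - p0) * sqrt(n) - z_alpha * sqrt(p0 * (1.0 - p0))
    return STD_NORMAL.cdf(shift / sqrt(p1 * (1.0 - p1)))

n_required = sessions_needed(0.25, 0.33)
power_small_study = approximate_power(40, 0.25, 0.33)  # hypothetical 40-session study

print(n_required)
print(round(power_small_study, 2))
```

With these assumptions the required number of sessions comes out at 345, the figure quoted for the autoganzfeld effect size, while the hypothetical 40-session study would detect the same effect only about one time in three.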
A promising direction for future process-oriented research is to examine the causes of individual differences in psychic functioning. The ESP/extroversion meta-analysis is a step in that direction.

In keeping with the idea of individual differences, Bayes and empirical Bayes methods would appear to make more sense than the classical inference methods commonly used, since they would allow individual abilities and beliefs to be modeled. Jeffreys (1990) reported a Bayesian analysis of some of the RNG experiments and showed that conclusions were closely tied to prior beliefs even though hundreds of thousands of trials were available.

It may be that the nonzero effects observed in the meta-analyses can be explained by something other than ESP, such as shortcomings in our understanding of randomness and independence. Nonetheless, there is an anomaly that needs an explanation. As I have argued elsewhere (Utts, 1987), research in parapsychology should receive more support from the scientific community. If ESP does not exist, there is little to be lost by erring in the direction of further research, which may in fact uncover other anomalies. If ESP does exist, there is much to be gained by discovering how to enhance and apply these abilities to important world problems.

ACKNOWLEDGMENTS

I would like to thank Deborah Delany, Charles Honorton, Wesley Johnson, Scott Plous and an anonymous reviewer for their helpful comments on an earlier draft of this paper, and Robert Rosenthal and Charles Honorton for discussions that helped clarify details.

REFERENCES

ATKINSON, R. L., ATKINSON, R. C., SMITH, E. E. and BEM, D. J. (1990). Introduction to Psychology, 10th ed. Harcourt Brace Jovanovich, San Diego.
BELOFF, J. (1985). Research strategies for dealing with unstable phenomena. In The Repeatability Problem in Parapsychology (B. Shapin and L. Coly, eds.) 1-21. Parapsychology Foundation, New York.
BLACKMORE, S. J. (1985).
Unrepeatability: Parapsychology's only finding. In The Repeatability Problem in Parapsychology (B. Shapin and L. Coly, eds.) 183-206. Parapsychology Foundation, New York.
BURDICK, D. S. and KELLY, E. F. (1977). Statistical methods in parapsychological research. In Handbook of Parapsychology (B. B. Wolman, ed.) 81-130. Van Nostrand Reinhold, New York.
CAMP, B. H. (1937). (Statement in Notes Section.) Journal of Parapsychology 1 305.
COHEN, J. (1990). Things I have learned (so far). American Psychologist 45 1304-1312.
COOVER, J. E. (1917). Experiments in Psychical Research at Leland Stanford Junior University. Stanford Univ.
DAWES, R. M., LANDMAN, J. and WILLIAMS, J. (1984). Reply to Kurosawa. American Psychologist 39 74-75.
DIACONIS, P. (1978). Statistical problems in ESP research. Science 201 131-136.
DOMMEYER, F. C. (1975). Psychical research at Stanford University. Journal of Parapsychology 39 173-205.
DRUCKMAN, D. and SWETS, J. A., eds. (1988). Enhancing Human Performance: Issues, Theories, and Techniques. National Academy Press, Washington, D.C.
EDGEWORTH, F. Y. (1885). The calculus of probabilities applied to psychical research. In Proceedings of the Society for Psychical Research 3 190-199.
EDGEWORTH, F. Y. (1886). The calculus of probabilities applied to psychical research. II. In Proceedings of the Society for Psychical Research 4 189-208.
FELLER, W. K. (1940). Statistical aspects of ESP. Journal of Parapsychology 4 271-297.
FELLER, W. K. (1968). An Introduction to Probability Theory and Its Applications 1, 3rd ed. Wiley, New York.
FISHER, R. A. (1924). A method of scoring coincidences in tests with playing cards. In Proceedings of the Society for Psychical Research 34 181-185.
FISHER, R. A. (1929). The statistical method in psychical research. In Proceedings of the Society for Psychical Research 39 189-192.
GALLUP, G. H., JR., and NEWPORT, F. (1991). Belief in paranormal phenomena among adult Americans. Skeptical Inquirer 15 137-146.
GARDNER, M. J.
and ALTMAN, D. G. (1986). Confidence intervals rather than p-values: Estimation rather than hypothesis testing. British Medical Journal 292 746-750.
GILMORE, J. B. (1989). Randomness and the search for psi. Journal of Parapsychology 53 309-340.
GILMORE, J. B. (1990). Anomalous significance in pararandom and psi-free domains. Journal of Parapsychology 54 53-58.
GREELEY, A. (1987). Mysticism goes mainstream. American Health 7 47-49.
GREENHOUSE, J. B. and GREENHOUSE, S. W. (1988). An aspirin a day ... ? Chance 1 24-31.
GREENWOOD, J. A. and STUART, C. E. (1940). A review of Dr. Feller's critique. Journal of Parapsychology 4 299-319.
HACKING, I. (1988). Telepathy: Origins of randomization in experimental design. Isis 79 427-451.
HANSEL, C. E. M. (1980). ESP and Parapsychology: A Critical Re-evaluation. Prometheus Books, Buffalo, N.Y.
HARRIS, M. J. and ROSENTHAL, R. (1988a). Interpersonal Expectancy Effects and Human Performance Research. National Academy Press, Washington, D.C.
HARRIS, M. J. and ROSENTHAL, R. (1988b). Postscript to Interpersonal Expectancy Effects and Human Performance Research. National Academy Press, Washington, D.C.
HEDGES, L. V. and OLKIN, I. (1985). Statistical Methods for Meta-Analysis. Academic, Orlando, Fla.
HONORTON, C. (1977). Psi and internal attention states. In Handbook of Parapsychology (B. B. Wolman, ed.) 435-472. Van Nostrand Reinhold, New York.
HONORTON, C. (1985a). How to evaluate and improve the replicability of parapsychological effects. In The Repeatability Problem in Parapsychology (B. Shapin and L. Coly, eds.) 238-255. Parapsychology Foundation, New York.
HONORTON, C. (1985b). Meta-analysis of psi ganzfeld research: A response to Hyman. Journal of Parapsychology 49 51-91.
HONORTON, C., BERGER, R. E., VARVOGLIS, M. P., QUANT, M., DERR, P., SCHECHTER, E. I. and FERRARI, D. C. (1990).
Psi communication in the ganzfeld: Experiments with an automated testing system and a comparison with a meta-analysis of earlier studies. Journal of Parapsychology 54 99-139.
HONORTON, C. and FERRARI, D. C. (1989). "Future telling": A meta-analysis of forced-choice precognition experiments, 1935-1987. Journal of Parapsychology 53 281-308.
HONORTON, C., FERRARI, D. C. and BEM, D. J. (1991). Extraversion and ESP performance: A meta-analysis and a new confirmation. Research in Parapsychology 1990. The Scarecrow Press, Metuchen, N.J. To appear.
HYMAN, R. (1985a). A critical overview of parapsychology. In A Skeptic's Handbook of Parapsychology (P. Kurtz, ed.) 1-96. Prometheus Books, Buffalo, N.Y.
HYMAN, R. (1985b). The ganzfeld psi experiment: A critical appraisal. Journal of Parapsychology 49 3-49.
HYMAN, R. and HONORTON, C. (1986). Joint communique: The psi ganzfeld controversy. Journal of Parapsychology 50 351-364.
IVERSEN, G. R., LONGCOR, W. H., MOSTELLER, F., GILBERT, J. P. and YOUTZ, C. (1971). Bias and runs in dice throwing and recording: A few million throws. Psychometrika 36 1-19.
JEFFREYS, W. H. (1990). Bayesian analysis of random event generator data. Journal of Scientific Exploration 4 153-169.
LINDLEY, D. V. (1957). A statistical paradox. Biometrika 44 187-192.
MAUSKOPF, S. H. and MCVAUGH, M. (1979). The Elusive Science: Origins of Experimental Psychical Research. Johns Hopkins Univ. Press.
MCVAUGH, M. R. and MAUSKOPF, S. H. (1976). J. B. Rhine's Extrasensory Perception and its background in psychical research. Isis 67 161-189.
NEULIEP, J. W., ed. (1990). Handbook of replication research in the behavioral and social sciences. Journal of Social Behavior and Personality 5 (4) 1-510.
OFFICE OF TECHNOLOGY ASSESSMENT (1989). Report of a workshop on experimental parapsychology. Journal of the American Society for Psychical Research 83 317-339.
PALMER, J. (1989). A reply to Gilmore. Journal of Parapsychology 53 341-344.
PALMER, J. (1990). Reply to Gilmore: Round two. Journal of Parapsychology 54 59-61.
PALMER, J. A., HONORTON, C. and UTTS, J. (1989). Reply to the National Research Council study on parapsychology. Journal of the American Society for Psychical Research 83 31-49.
RADIN, D. I. and FERRARI, D. C. (1991). Effects of consciousness on the fall of dice: A meta-analysis. Journal of Scientific Exploration 5 61-83.
RADIN, D. I. and NELSON, R. D. (1989). Evidence for consciousness-related anomalies in random physical systems. Foundations of Physics 19 1499-1514.
RAO, K. R. (1985). Replication in conventional and controversial sciences. In The Repeatability Problem in Parapsychology (B. Shapin and L. Coly, eds.) 22-41. Parapsychology Foundation, New York.
RHINE, J. B. (1934). Extrasensory Perception. Boston Society for Psychical Research, Boston. (Reprinted by Branden Press, 1964.)
RHINE, J. B. (1977). History of experimental studies. In Handbook of Parapsychology (B. B. Wolman, ed.) 25-47. Van Nostrand Reinhold, New York.
RICHET, C. (1884). La suggestion mentale et le calcul des probabilités. Revue Philosophique 18 608-674.
ROSENTHAL, R. (1984). Meta-Analytic Procedures for Social Research. Sage, Beverly Hills.
ROSENTHAL, R. (1986). Meta-analytic procedures and the nature of replication: The ganzfeld debate. Journal of Parapsychology 50 315-336.
ROSENTHAL, R. (1990a). How are we doing in soft psychology? American Psychologist 45 775-777.
ROSENTHAL, R. (1990b). Replication in behavioral research. Journal of Social Behavior and Personality 5 1-30.
SAUNDERS, D. R. (1985). On Hyman's factor analysis. Journal of Parapsychology 49 86-88.
SHAPIN, B. and COLY, L., eds. (1985). The Repeatability Problem in Parapsychology. Parapsychology Foundation, New York.
SPENCER-BROWN, G. (1957). Probability and Scientific Inference. Longmans Green, London and New York.
STUART, C. E. and GREENWOOD, J. A. (1937).
A review of criticisms of the mathematical evaluation of ESP data. Journal of Parapsychology 1 295-304.
TVERSKY, A. and KAHNEMAN, D. (1982). Belief in the law of small numbers. In Judgment Under Uncertainty: Heuristics and Biases (D. Kahneman, P. Slovic and A. Tversky, eds.) 23-31. Cambridge Univ. Press.
UTTS, J. (1986). The ganzfeld debate: A statistician's perspective. Journal of Parapsychology 50 395-402.
UTTS, J. (1987). Psi, statistics, and society. Behavioral and Brain Sciences 10 615-616.
UTTS, J. (1988). Successful replication versus statistical significance. Journal of Parapsychology 52 305-320.
UTTS, J. (1989). Randomness and randomization tests: A reply to Gilmore. Journal of Parapsychology 53 345-351.
UTTS, J. (1991). Analyzing free-response data: A progress report. In Psi Research Methodology: A Reexamination (L. Coly, ed.). Parapsychology Foundation, New York. To appear.
WILKS, S. S. (1965a). Statistical aspects of experiments in telepathy. N.Y. Statistician 16 (6) 1-3.
WILKS, S. S. (1965b). Statistical aspects of experiments in telepathy. N.Y. Statistician 16 (7) 4-6.

Comment

M. J. Bayarri and James Berger

M. J. Bayarri is Titular Professor, Department of Statistics and Operations Research, University of Valencia, Avenida Dr. Moliner 50, 46100 Burjassot, Valencia, Spain. James Berger is the Richard M. Brumfield Distinguished Professor of Statistics,

1. INTRODUCTION

There are many fascinating issues discussed in this paper. Several concern parapsychology itself and the interpretation of statistical methodology therein. We are not experts in parapsychology, and so have only one comment concerning such matters: In Section 3 we briefly discuss the need to switch from P-values to Bayes factors in discussing evidence concerning parapsychology.

A more general issue raised in the paper is that of replication. It is quite illuminating to consider the issue of replication from a Bayesian perspective, and this is done in Section 2 of our discussion.

2. REPLICATION

Many insightful observations concerning replication are given in the article, and these spurred us to determine if they could be quantified within Bayesian reasoning. Quantification requires clear delineation of the possible purposes of replication, and at least two are obvious. The first is simple reduction of random error, achieved by obtaining more observations from the replication. The second purpose is to search for possible bias in the original experiment. We use "bias" in a loose sense here, to refer to any of the huge number of ways in which the effects being measured by the experiment can differ from the actual effects of interest. Thus a clinical trial without a placebo can suffer a placebo "bias"; a survey can suffer a "bias" due to the sampling frame being unrepresentative of the actual population; and possible sources of bias in parapsychological experiments have been extensively discussed.

Replication to Reduce Random Error

If the sole goal of replication of an experiment is to reduce random error, matters are very straightforward. Reviewing the Bayesian way of studying this issue is, however, useful and will be done through the following simple example.

EXAMPLE 1. Consider the example from Tversky and Kahneman (1982), in which an experiment results in a standardized test statistic of $z_1 = 2.46$. (We will assume normality to keep computations trivial.) The question is: What is the highest value of $z_2$ in a second set of data that would be considered a failure to replicate?
Two possible precise versions of this question are:

Question 1: What is the probability of observing $z_2$ for which the null hypothesis would be rejected in the replicated experiment?

Question 2: What value of $z_2$ would leave one's overall opinion about the null hypothesis unchanged?

Consider the simple case where $Z_1 \sim N(z_1 \mid \theta, 1)$ and (independently) $Z_2 \sim N(z_2 \mid \theta, 1)$, where $\theta$ is the mean and 1 is the standard deviation of the normal distribution. Note that we are considering the case in which no experimental bias is suspected and so the means for each experiment are assumed to be the same.

Suppose that it is desired to test $H_0: \theta \le 0$ versus $H_1: \theta > 0$, and suppose that initial prior opinion about $\theta$ can be described by the noninformative prior $\pi(\theta) = 1$. We consider the one-sided testing problem with a constant prior in this section, because it is known that then the posterior probability of $H_0$, to be denoted by $P(H_0 \mid \text{data})$, equals the P-value, allowing us to avoid complications arising from differences between Bayesian and classical answers.

After observing $z_1 = 2.46$, the posterior distribution of $\theta$ is $\pi(\theta \mid z_1) = N(\theta \mid 2.46, 1)$. Question 1 then has the answer (using predictive Bayesian reasoning)

$$P(\text{rejecting at level } \alpha \mid z_1) = \int_{c_\alpha}^{\infty} N(z_2 \mid 2.46, \sqrt{2})\, dz_2 = 1 - \Phi\!\left(\frac{c_\alpha - 2.46}{\sqrt{2}}\right),$$

where $\Phi$ is the standard normal cdf and $c_\alpha$ is the (one-sided) critical value corresponding to the level, $\alpha$, of the test. For instance, if $\alpha = 0.05$, then this probability equals 0.7178, demonstrating that there is a quite substantial probability that the second experiment will fail to reject. If $\alpha$ is chosen to be the observed significance level from the first experiment, so that $c_\alpha = z_1$, then the probability that the second experiment will reject is just 1/2. This is nothing but a statement of the well-known martingale property of Bayesianism, that what you "expect" to see in the future is just what you know today. In a sense, therefore, question 1 is exposed as being uninteresting.
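The two probabilities quoted for Question 1 can be reproduced numerically. The following is a minimal sketch, assuming the flat prior and unit-variance normal model of Example 1, under which the predictive distribution of $Z_2$ given $z_1$ is normal with mean $z_1$ and standard deviation $\sqrt{2}$; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1

def prob_second_rejects(z1: float, critical_value: float) -> float:
    """P(Z2 > critical_value | z1) under a flat prior on theta.

    The predictive distribution of Z2 given z1 is normal with mean z1
    and standard deviation sqrt(2).
    """
    return 1.0 - STD_NORMAL.cdf((critical_value - z1) / sqrt(2.0))

z1 = 2.46
c_05 = STD_NORMAL.inv_cdf(0.95)            # one-sided critical value for alpha = 0.05
p_alpha_05 = prob_second_rejects(z1, c_05)
p_alpha_obs = prob_second_rejects(z1, z1)  # alpha set to the observed significance level

print(f"alpha = 0.05:       {p_alpha_05:.4f}")
print(f"alpha = observed p: {p_alpha_obs:.4f}")
```

The first value should agree with the 0.7178 quoted above, and the second is exactly 1/2 because the critical value then coincides with the predictive mean.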
Question 2 more properly focuses on the fact that the stated goal of replication here is simply to reduce uncertainty in stated conclusions. The answer to the question follows immediately from noting that the posterior from the combined data $(z_1, z_2)$ is

$$\pi(\theta \mid z_1, z_2) = N(\theta \mid (z_1 + z_2)/2,\ 1/\sqrt{2}),$$

so that

$$P(H_0 \mid \text{data}) = \Phi\!\left(-(z_1 + z_2)/\sqrt{2}\right).$$

Setting this equal to $P(H_0 \mid z_1)$ and solving for $z_2$ yields $z_2 = (\sqrt{2} - 1) z_1 = 1.02$. Any value of $z_2$ greater than this will increase the total evidence against $H_0$, while any value smaller than 1.02 will decrease the evidence.

Replication to Detect Bias

The aspirin example dramatically raises the issue of bias detection as a motive for replication. Professor Utts observes that replication 1 gives results that are fully compatible with those of the original study, which could be interpreted as suggesting that there is no bias in the original study, while replication 2 would raise serious concerns of bias. We became very interested in the implicit suggestion that replication 2 would thus lead to less overall evidence against the null hypothesis than would replication 1, even though in isolation replication 2 was much more "significant" than was replication 1. In attempting to see if this is so, we considered the Bayesian approach to study of bias within the framework of the aspirin example.

EXAMPLE 2. For simplicity in the aspirin example, we reduce consideration to

$\theta$ = true difference in heart attack rates between aspirin and placebo populations multiplied by 1000;

$Y$ = difference in observed heart attack rates between aspirin and placebo groups in original study multiplied by 1000;

$X_i$ = difference in observed heart attack rates between aspirin and placebo groups in Replication i multiplied by 1000.

We assume that the replication studies are extremely well designed and implemented, so that one is very confident that the $X_i$ have mean $\theta$.
Using normal approximations for convenience, the data can be summarized as

$$X_1 \sim N(x_1 \mid \theta, 4.82), \qquad X_2 \sim N(x_2 \mid \theta, 3.63),$$

with actual observations $x_1 = 7.704$ and $x_2 = 13.07$.

Consider now the bias issue. We assume that the original experiment is somewhat suspect in this regard, and we will model bias by defining the mean of $Y$ to be

$$\eta = \theta + \beta,$$

where $\beta$ is the unknown bias. Then the data in the original experiment can be summarized by

$$Y \sim N(y \mid \eta, 1.54),$$

with the actual observation being $y = 7.707$.

Bayesian analysis requires specification of a prior distribution, $\pi(\beta)$, for the suspected amount of bias. Of particular interest then are the posterior distribution of $\beta$, assuming replication i has been performed, given by

$$\pi(\beta \mid y, x_i) \propto \pi(\beta) \exp\!\left\{ -\frac{(y - x_i - \beta)^2}{2(\sigma_i^2 + 1.54^2)} \right\},$$

where $\sigma_i^2$ is the variance ($4.82^2$ or $3.63^2$) from replication i; and the posterior probability of $H_0$, given by

$$P(H_0 \mid y, x_i) = \int_{-\infty}^{\infty} \Phi\!\left( -\frac{\sigma_i^2 (y - \beta) + 1.54^2\, x_i}{1.54\, \sigma_i \sqrt{\sigma_i^2 + 1.54^2}} \right) \pi(\beta \mid y, x_i)\, d\beta.$$

Recall that our goal here was to see if Bayesian analysis can reproduce the intuition that the original experiment could be trusted if replication 1 had been done, while it could not be trusted (in spite of its much larger sample size) had replication 2 been performed. Establishing this requires finding a prior distribution $\pi(\beta)$ for which $\pi(\beta \mid y, x_1)$ has little effect on $P(H_0 \mid y, x_1)$, but $\pi(\beta \mid y, x_2)$ has a large effect on $P(H_0 \mid y, x_2)$. To achieve the first objective, $\pi(\beta)$ must be tightly concentrated near zero. To achieve the second, $\pi(\beta)$ must be such that large $|y - x_2|$, which suggests presence of a large bias, can result in a substantial shift of posterior mass for $\beta$ away from zero.

A sensible candidate for the prior density $\pi(\beta)$ is the Cauchy (0, V) density

$$\pi(\beta) = \frac{1}{\pi V \left[1 + (\beta/V)^2\right]}.$$
Flat-tailed densities, such as this, are well known to have the property that when discordant data is observed (e.g., when $|y - x_2|$ is large), substantial mass shifts away from the prior center towards the likelihood center. It is easy to see that a normal prior for $\beta$ cannot have the desired behavior.

Our first surprise in consideration of these priors was how small $V$ needed to be chosen in order for $P(H_0 \mid y, x_1)$ to be unaffected by the bias. For instance, even with $V = 1.54/100$ (recall that 1.54 was the standard deviation of $Y$ from the original experiment), computation yields $P(H_0 \mid y, x_1) = 4.3 \times 10^{-5}$, compared with the P-value (and posterior probability from the original experiment assuming no bias) of $2.8 \times 10^{-7}$. There is a clear lesson here: even very small suspicions of bias can drastically alter a small P-value. Note that replication 1 is very consistent with the presence of no bias, and so the posterior distribution for the bias remains tightly concentrated near zero; for instance, the mean of the posterior for $\beta$ is then $7.2 \times 10^{-6}$, and the standard deviation is 0.25.

When we turned attention to replication 2, we found that it did not seriously change the prior perceptions of bias. Examination quickly revealed the reason: even the maximum likelihood estimate of the bias is no more than 1.4 standard deviations from zero, which is not enough to change strong prior beliefs. We, therefore, considered a third experiment, defined in Table 1. Transforming to approximate normality, as before, yields $X_3 \sim N(x_3 \mid \theta, 3.48)$, with $x_3 = 22.72$ being the actual observation. The maximum likelihood estimate of bias is now 3.95 standard deviations from zero, so there is potential for a substantial change in opinion about the bias.
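As a quick check on the figures quoted above, the following sketch recomputes the no-bias tail probability for the original study and the standardized maximum likelihood estimates of bias for replications 2 and 3. It assumes, as the text implies, that 1.54, 3.63 and 3.48 are standard deviations, that $H_0$ is $\theta \le 0$, and that the maximum likelihood estimate of the bias is $y - x_i$ with variance $1.54^2 + \sigma_i^2$. The function and variable names are illustrative only.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, written with the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def standardized_bias(y, x, sd_y, sd_x):
    """MLE of the bias (y - x) divided by its standard deviation."""
    return (y - x) / math.sqrt(sd_y ** 2 + sd_x ** 2)


# Values taken from the text: original study and replications 2 and 3.
Y_OBS, SD_Y = 7.707, 1.54
REPLICATIONS = {2: (13.07, 3.63), 3: (22.72, 3.48)}

# Posterior probability of H0 (theta <= 0) from the original study alone,
# assuming no bias and a flat prior on theta (an assumption of this sketch).
p_original = normal_cdf(-Y_OBS / SD_Y)

# Signed distance of the bias MLE from zero, in standard deviations.
bias_z = {
    i: standardized_bias(Y_OBS, x, SD_Y, sd)
    for i, (x, sd) in REPLICATIONS.items()
}

if __name__ == "__main__":
    print(f"P(H0 | y), no bias: {p_original:.2e}")
    for i, z in bias_z.items():
        print(f"replication {i}: bias MLE is {abs(z):.2f} SDs from zero")
```

The sign of each standardized value is negative because both replications observed a larger difference than the original study; only the magnitude matters for the comparison with prior beliefs about bias.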
Sure enough, computation when $V = 1.54/100$ yields that $E[\beta \mid y, x_3] = -4.9$ with (posterior) standard deviation equal to 6.62, which is a dramatic shift from prior opinion (that $\beta$ is Cauchy$(0, 1.54/100)$). The effect of this is to essentially ignore the original experiment in overall assessments of evidence. For instance, $P(H_0 \mid y, x_3) = 3.81 \times 10^{-11}$, which is very close to $P(H_0 \mid x_3) = 3.29 \times 10^{-11}$. Note that, if $\beta$ were set equal to zero, the overall posterior probability of $H_0$ (and P-value) would be $2.62 \times 10^{-13}$.

TABLE 1
Frequency of heart attacks in replication 3

            Heart attack    No heart attack
Aspirin           5              2309
Placebo          54              2116

Thus Bayesian reasoning can reproduce the intuition that replication which indicates bias can cast considerable doubt on the original experiment, while replication which provides no evidence of bias leaves evidence from the original experiment intact. Such behavior seems only obtainable, however, with flat-tailed priors for bias (such as the Cauchy) that are very concentrated (in comparison with the experimental standard deviation) near zero.

3. P-VALUES OR BAYES FACTORS?

Parapsychology experiments usually consider testing of $H_0$: No parapsychological effect exists. Such null hypotheses are often realistically represented as point nulls (see Berger and Delampady, 1987, for the reason that care must be taken in such representation), in which case it is known that there is a large difference between P-values and posterior probabilities (see Berger and Delampady, 1987, for review). The article by Jefferys (1990) dramatically illustrates this, showing that a very small P-value can actually correspond to evidence for $H_0$ when considered from a Bayesian perspective. (This is very related to the famous "Jeffreys" paradox.)
The argument in favor of the Bayesian approach here is very strong, since it can be shown that the conflict holds for virtually any sensible prior distribution; a Bayesian answer can be wrong if the prior information turns out to be inaccurate, but a Bayesian answer that holds for all sensible priors is unassailable.

Since P-values simply cannot be viewed as meaningful in these situations, we found it of interest to reconsider the example in Section 5 from a Bayes factor perspective. We considered only analysis of the overall totals, that is, $x = 122$ successes out of $n = 355$ trials. Assuming a simple Bernoulli trial model with success probability $\theta$, the goal is to test $H_0\colon \theta = 1/4$ versus $H_1\colon \theta \ne 1/4$. To determine the Bayes factor here, one must specify $g(\theta)$, the conditional prior density on $H_1$. Consider choosing $g$ to be uniform and symmetric, that is,

$$g_r(\theta) = \begin{cases} \dfrac{1}{2r} & \text{for } \tfrac{1}{4} - r \le \theta \le \tfrac{1}{4} + r, \\ 0 & \text{otherwise.} \end{cases}$$

Crudely, $r$ could be considered to be the maximum change in success probability that one would expect given that ESP exists. Also, these distributions are the "extreme points" over the class of symmetric unimodal conditional densities, so answers that hold over this class are also representative of answers over a much larger class. Note that here $r \le 0.25$ (because $0 \le \theta \le 1$); for the given data the $\theta > 0.5$ are essentially irrelevant, but if it were deemed important to take them into account one could use the more sophisticated binomial analysis in Berger and Delampady (1987).

For $g_r$, the Bayes factor of $H_1$ to $H_0$, which is to be interpreted as the relative odds for the hypotheses provided by the data, is given by

$$B(r) = \frac{\dfrac{1}{2r} \displaystyle\int_{1/4 - r}^{1/4 + r} \theta^{122} (1 - \theta)^{355 - 122}\, d\theta}{(1/4)^{122} (1 - 1/4)^{355 - 122}} \approx \frac{1}{2r}\,(63.13) \left[\Phi\left(\frac{r - .0937}{.0252}\right) - \Phi\left(-\frac{r + .0937}{.0252}\right)\right].$$

This is graphed in Figure 1. The P-value for this problem
was 0.00005, indicating overwhelming evidence against $H_0$ from a classical perspective. In contrast to the situation studied by Jefferys (1990), the Bayes factor here does not completely reverse the conclusion, showing that there are very reasonable values of $r$ for which the evidence against $H_0$ is moderately strong, for example 100/1 or 200/1. Of course, this evidence is probably not of sufficient strength to overcome strong prior opinions in favor of $H_0$ (one obtains final posterior odds by multiplying prior odds by the Bayes factor). To properly assess strength of evidence, we feel that such Bayes factor computations should become standard in parapsychology.

FIG. 1. The Bayes factor of $H_1$ to $H_0$ as a function of $r$, the maximum change in success probability that is expected given that ESP exists, for the ganzfeld experiment.

As mentioned by Professor Utts, Bayesian methods have additional potential in situations such as this, by allowing unrealistic models of iid trials to be replaced by hierarchical models reflecting differing abilities among subjects.

ACKNOWLEDGMENTS

M. J. Bayarri's research was supported in part by the Spanish Ministry of Education and Science under DGICYT Grant BE91-038, while visiting Purdue University. James Berger's research was supported by NSF Grant DMS-89-23071.

Comment

Ree Dawson is Senior Statistician, New England Biomedical Research Foundation, and Statistical Consultant, RFE/RL Research Institute. Her mailing address is 177 Morrison Avenue, Somerville, Massachusetts 02144.

This paper offers readers interested in statistical science multiple views of the controversial history of parapsychology and how statistics has contributed to its development. It first provides an account of how both design and inferential aspects of statistics have been pivotal issues in evaluating the outcomes of experiments that study psi abilities.
It then emphasizes how the idea of science as replication has been key in this field in which results have not been conclusive or consistent and thus meta-analysis has been at the heart of the literature in parapsychology. The author not only reviews past debate on how to interpret repeated psi studies, but also provides very detailed information on the Honorton-Hyman argument, a nice illustration of the challenges of resolving such debate. This debate is also a good example of how statistical criticism can be part of the scientific process and lead to better experiments and, in general, better science.

The remainder of the paper addresses technical issues of meta-analysis, drawing upon recent research in parapsychology for an in-depth application. Through a series of examples, the author presents a convincing argument that power issues cannot be overlooked in successive replications and that comparison of effect sizes provides a richer alternative to the dichotomous measure inherent in the use of p-values. This is particularly relevant when the potential effect size is small and resources are limited, as seems to be the case for psi studies.

The concluding section briefly mentions Bayesian techniques. As noted by the author, Bayes (or empirical Bayes) methodology seems to make sense for research in parapsychology. This discussion examines possible Bayesian approaches to meta-analysis in this field.

BAYES MODELS FOR PARAPSYCHOLOGY

The notion of repeatability maps well into the Bayesian set-up in which experiments, viewed as a random sample from some superpopulation of experiments, are assumed to be exchangeable. When subjects can also be viewed as an approximately random sample from some population, it is appropriate to pool them across experiments. Otherwise, analyses that partially pool information according to experimental heterogeneity need to be considered. Empirical and hierarchical Bayes methods offer a flexible modeling framework for such analyses, relying on empirical or subjective sources to determine the degree of pooling. These richer methods can be particularly useful to meta-analysis of experiments in parapsychology conducted under potentially diverse conditions.

For the recent ganzfeld series, assuming them to be independent binomially distributed as discussed in Section 5, the data can be summed (pooled) across series to estimate a common hit rate. Honorton et al. (1990) assessed the homogeneity of effects across the 11 series using a chi-square test that compares individual effect sizes to the weighted mean effect. The chi-square statistic $\chi^2_{10} = 16.25$, not statistically significant ($p = 0.093$), largely reflects the contribution of the last "special" series (contributes 9.2 units to the $\chi^2_{10}$ value), and to a lesser extent the novice series with a negative effect (contributes 2.5 units). The outlier series can be dropped from the analysis to provide a more conservative estimate of the presence of psi effects for this data (this result is reported in Section 5). For the remaining 10 series, the chi-square value $\chi^2_9 = 7.01$ strongly favors homogeneity, although more than one-third of its value is due to the novice series (number 4 in Table 1). This pattern points to the potential usefulness of a richer model to accommodate series that may be distinct from the others. For the earlier ganzfeld data analyzed by Honorton (1985b), the appeal of a Bayes or other model that recognizes the heterogeneity across studies is clear cut: $\chi^2_{23} = 56.6$, $p = 0.0001$, where only those studies with common chance hit rate have been included (see Table 2).

TABLE 1
Recent ganzfeld series

Series   Series type     N     Hit rate    Y_i     sigma_i
  1      Pilot           22     0.36      -0.58     0.44
  2      Pilot            9     0.33      -0.71     0.71
  3      Pilot           36     0.28      -0.94     0.37
  4      Novice          50     0.24      -1.15     0.33
  5      Novice          50     0.36      -0.58     0.30
  6      Novice          50     0.30      -0.85     0.31
  7      Novice          50     0.36      -0.58     0.30
  8      Novice           6     0.67       0.71     0.87
  9      Experienced      7     0.43      -0.28     0.76
 10      Experienced     50     0.30      -0.85     0.31
 11      Experienced     25     0.64       0.58     0.42
Overall                 355     0.34

TABLE 2
Earlier ganzfeld studies

Study     N     Hit rate    Y_i     sigma_i
  1       32     0.44      -0.24     0.36
  2        7     0.86       1.82     1.09
  3       30     0.43      -0.28     0.37
  4       30     0.23      -1.21     0.43
  5       20     0.10      -2.20     0.75
  6       10     0.90       2.20     1.05
  7       10     0.40      -0.41     0.65
  8       28     0.29      -0.90     0.42
  9       10     0.40      -0.41     0.65
 10       20     0.35      -0.62     0.47
 11       26     0.31      -0.80     0.42
 12       20     0.45      -0.20     0.45
 13       20     0.45      -0.20     0.45
 14       30     0.53       0.12     0.37
 15       36     0.33      -0.71     0.35
 16       32     0.28      -0.94     0.39
 17       40     0.28      -0.94     0.35
 18       26     0.46      -0.16     0.39
 19       20     0.60       0.41     0.46
 20      100     0.41      -0.36     0.20
 21       40     0.33      -0.71     0.34
 22       27     0.41      -0.36     0.39
 23       60     0.45      -0.20     0.26
 24       48     0.21      -1.33     0.35
Overall  722

Historic reliance on voting-count approaches to determine the presence of psi effects makes it natural to consider Bayes models that focus on the ensemble of experimental effects from parapsychological studies, rather than individual estimates. Recent work in parapsychology that compares effect sizes across studies, rather than estimating separate study effects, reinforces the need to examine this type of model. Louis (1984) develops Bayes and empirical Bayes methods for problems that consider the ensemble of parameter values to be the primary goal, for example, multiple comparisons. For the simple compound normal model, $Y_i \sim N(\theta_i, 1)$, $\theta_i \sim N(\mu, \tau^2)$, the standard Bayes estimates (posterior means)

$$\theta_i^* = \mu + D(Y_i - \mu) \quad \text{and} \quad D = \frac{\tau^2}{1 + \tau^2},$$

where the $\theta_i$ represent experimental effects of interest, are modified approximately to

$$\mu + \sqrt{D}\,(Y_i - \mu)$$

when an ensemble loss function is assumed. The new estimates adjust the shrinkage factor $D$ so that their sample mean and variance match the posterior expectation and variance of the $\theta$'s. Similar results are obtained when the model is generalized to the case of unequal variances, $Y_i \sim N(\theta_i, \sigma_i^2)$.
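The difference between the posterior-mean estimates and the ensemble-adjusted estimates described above can be seen in a few lines of code. This is a minimal sketch for the unit-variance compound normal model only, reading the shrinkage factor as $D = \tau^2/(1 + \tau^2)$; the function names and the numerical inputs are hypothetical and are not taken from the ganzfeld data.

```python
import math


def shrinkage_factor(tau2):
    """D = tau^2 / (1 + tau^2) when each Y_i has unit sampling variance."""
    return tau2 / (1.0 + tau2)


def posterior_mean(y, mu, tau2):
    """Standard Bayes estimate: shrink y toward mu by the factor D."""
    return mu + shrinkage_factor(tau2) * (y - mu)


def ensemble_estimate(y, mu, tau2):
    """Ensemble-loss estimate: shrink y toward mu by sqrt(D) instead of D."""
    return mu + math.sqrt(shrinkage_factor(tau2)) * (y - mu)


# Hypothetical illustration: prior mean 0, prior variance 1, observation 2.
MU, TAU2, Y = 0.0, 1.0, 2.0
pm = posterior_mean(Y, MU, TAU2)
ens = ensemble_estimate(Y, MU, TAU2)

if __name__ == "__main__":
    print(f"posterior mean:    {pm:.3f}")
    print(f"ensemble estimate: {ens:.3f}")
```

Because $0 \le D \le 1$ implies $\sqrt{D} \ge D$, the ensemble estimate is never pulled further toward $\mu$ than the posterior mean, which is why the set of ensemble estimates is more spread out than the set of posterior means.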
For the above model, the fraction of the new estimates above (or below) a cut point $C$ is a consistent estimate of the fraction of $\theta_i > C$ (or $\theta_i < C$). Thus, the use of ensemble, rather than component-wise, loss can help detect when individual effects are above a specified threshold by chance. For the meta-analysis of ganzfeld experiments, the observed binomial proportions transformed on the logit (or $\arcsin\sqrt{\cdot}$) scale can be modeled in this framework. Letting $d_i$ and $m_i$ denote the number of direct hits and misses respectively for the $i$th experiment, and $p_i$ the corresponding population proportion of direct hits, the $Y_i$ are the observed logits

$$Y_i = \log(d_i/m_i),$$

and $\sigma_i^2$, estimated by maximum likelihood as $1/d_i + 1/m_i$, is the variance of $Y_i$ conditional on $\theta_i = \operatorname{logit}(p_i)$.

The threshold $\operatorname{logit}(0.25) = -1.10$ can be used to identify the number of experiments for which the proportion of direct hits exceeds that expected by chance. Table 1 shows $Y_i$ and $\sigma_i$ for the 11 ganzfeld series. All but one of the series are well above the threshold; $Y_4$ marginally falls below $-1.10$. Any shrinkage toward a common hit rate will lead to an estimate for series 4, whether the posterior mean $\theta_4^*$ or its ensemble-loss counterpart, above the threshold. The use of ensemble loss (with its consistency property) provides more convincing support that all $\theta_i > -1.10$, although posterior estimates of uncertainty are needed to fully calibrate this. For the earlier ganzfeld data in Table 2, ensemble loss can similarly be used to determine the number of studies with $\theta_i < -1.10$ and specifically whether the negative effects of studies 4 and 24 ($Y_4 = -1.21$ and $Y_{24} = -1.33$) occurred as a result of chance fluctuation.

Features of the ganzfeld data in Section 5, such as the outlier series, suggest that further elaboration of the basic Bayesian set-up may be necessary for some meta-analyses in parapsychology. Hierarchical models provide a natural framework to specify these elaborations and explore how results change with the prior specification.
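The logit transformation and its estimated standard deviation, as defined earlier in this passage, can be checked against a row of Table 1. The sketch below uses series 4 (50 trials, hit rate 0.24); the counts of 12 hits and 38 misses are inferred from those two table entries and are an assumption of this example, as are the function names.

```python
import math


def logit(p):
    """Log-odds of a proportion p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))


def observed_logit(hits, misses):
    """Return (Y, sigma): the observed logit log(d/m) and sqrt(1/d + 1/m)."""
    y = math.log(hits / misses)
    sigma = math.sqrt(1.0 / hits + 1.0 / misses)
    return y, sigma


# Chance performance in the ganzfeld design is a hit rate of 1/4.
CHANCE_THRESHOLD = logit(0.25)

# Series 4 of Table 1: 12 hits and 38 misses (inferred from N = 50, rate 0.24).
y4, sigma4 = observed_logit(12, 38)

if __name__ == "__main__":
    print(f"threshold logit(0.25) = {CHANCE_THRESHOLD:.2f}")
    print(f"series 4: Y = {y4:.2f}, sigma = {sigma4:.2f}")
    print("below chance threshold:", y4 < CHANCE_THRESHOLD)
```

Series 4 is the only series whose observed logit falls below the chance threshold, and it does so by much less than one standard deviation, which is the sense in which it "marginally" falls below.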
This type of sensitivity analysis can expose whether conclusions are closely tied to prior beliefs, as observed by Jeffreys for RNG data (see Section 7). Quantifying the influence of model components deemed to be more subjective or less certain is important to broad acceptance of results as evidence of psi performance (or lack thereof).

Consider the initial model commonly used for Bayesian analysis of discrete data:

$$Y_i \mid p_i, n_i \sim B(p_i, n_i),$$
$$\theta_i \sim N(\mu, \tau^2), \qquad \theta_i = \operatorname{logit}(p_i),$$

with noninformative priors assumed for $\mu$ and $\tau^2$ (e.g., $\log \tau$ locally uniform). The distinctiveness of the last "special" series and, in general, the different types of series (pilot versus formal, novice versus experienced) raises the question of whether the experimental effects follow a normal distribution. Weighted normal plots (Ryan and Dempster, 1984) can be used to graphically diagnose the adequacy of second-stage normality (see Dempster, Selwyn and Weeks, 1983, for examples with binary response and normal superpopulation).

Alternatively, if nonnormality is suspected, the model can be revised to include some sort of heavy-tailed prior to accommodate possibly outlying series or studies. West (1985) incorporates additional scale parameters, one for each component of the model (experiment), that flexibly adapt to atypical $\theta_i$ and discount their influence on posterior estimates, thus avoiding under- or over-shrinkage due to such $\theta_i$. For example, the second stage can specify the prior as a scale mixture of normals:

$$\theta_i \sim N(\mu, \tau^2 \gamma_i^{-1}), \qquad k\gamma_i \sim \chi^2_k, \qquad \nu\tau^{-2} \sim \chi^2_\nu.$$

This approach for the prior is similar to others for maximum likelihood estimation that modify the sampling error distribution to yield estimates that are "robust" against outlying observations.
Like its maximum likelihood counterparts, in addition to the robust effect estimates $\theta_i^*$, the Bayes model provides (posterior) estimates of the scale factors $\gamma_i$. These can be interpreted as the weight given to the data for each $\theta_i$ in the analysis and are useful to diagnosing which model components (series or studies) are unusual and how they influence the shrinkage. When more complex groupings among the $\theta_i$ are suspected, for example, bimodal distribution of studies from different sites or experimenters, other mixture specifications can be used to further relax the shrinkage toward a common value.

For the 11 ganzfeld series, the last "outlier" series, quite distinct from the others (hit rate = 0.64), is moderately precise ($N = 25$). Omitting it from the analysis causes the overall hit rate to drop from 0.344 to 0.321. The scale mixture model is a compromise between these two values (on the logit scale), discounting the influence of series 11 on the estimated posterior common hit rate used for shrinkage. The scale factor $\gamma_{11}$, an indication of how separate $\theta_{11}$ is from the other parameters, also causes $\theta_{11}$ to be shrunk less toward the common hit rate than other, more homogeneous $\theta_i$, giving more weight to individual information for that series (see West, 1985).

The heterogeneity of the earlier ganzfeld data is more pronounced, and studies are taken from a variety of sources over time. For these data, the $\gamma_i$ can be used to explore atypical studies (e.g., study 6, with hit rate = 0.90, contributes more than 25% to the $\chi^2_{23}$ value for homogeneity) and groupings among effects, as well as protect the analysis from misspecification of second-stage normality.

Variation among ganzfeld series or studies and the degree to which pooling or shrinking is appropriate can be investigated further by considering a range of priors for $\tau^2$. If the marginal likelihood of $\tau^2$ dominates the prior specification, then results should not vary as the prior for $\tau^2$ is varied.
Otherwise, it is important to identify the degree to which subjective information about interexperimental variability influences the conclusions. This sensitivity analysis is a Bayesian enrichment of the simpler test of homogeneity directed toward determining whether or not complete pooling is appropriate.

To assess how well heterogeneity among historical control groups is determined by the data, Dempster, Selwyn and Weeks (1983) propose three priors for $\tau^2$ in the logistic-normal model. The prior distributions range from strongly favoring individual estimates, p(r2)dr