A SURVEY OF FREE-RESPONSE JUDGING PRACTICES

Document Type: 
Collection: 
Document Number (FOIA) /ESDN (CREST): 
CIA-RDP96-00792R000701020002-6
Release Decision: 
RIFPUB
Original Classification: 
U
Document Page Count: 
17
Document Creation Date: 
November 4, 2016
Document Release Date: 
May 17, 2000
Sequence Number: 
2
Case Number: 
Content Type: 
RP
Body: 
Approved For Release 2000/08/15: CIA-RDP96-00792R000701020002-6

A SURVEY OF FREE-RESPONSE JUDGING PRACTICES

Julie Milton
Psychology Department, University of Edinburgh
7 George Square, Edinburgh EH8 9JZ, Scotland, U.K.

An idealised model of the free-response judging process is developed, and its elements discussed in terms of judging practices in those free-response studies published in full between 1964 and 1985. A wide variety of occasionally conflicting judging practices was found, along with valuable indications for further research in this important area.

ACKNOWLEDGEMENTS: My thanks are due to Nancy Zingrone and Deborah Weiner for allowing me to use a draft version of their free-response bibliography.

While free-response methodology has been popular in ESP studies over recent years, very little research has been directed to the important question of how best to judge the correspondence between free-response material and the target. However, many experimenters have commented on judging issues, or have reported relevant analyses or data which, when brought together, may suggest strengths and weaknesses in our judging practices, and promising directions for future research. With these aims in mind, I have examined various aspects of procedure which might influence the success of judging, using as a database eighty-five free-response studies in which statistical assessment of the results was attempted and which were published in full between 1964 and 1987 inclusive, in the Journal of Parapsychology, Journal of the American Society for Psychical Research, Journal of the Society for Psychical Research, International Journal of Parapsychology, and European Journal of Parapsychology. Space constraints prevent me from presenting a summary table of these studies and their full references, but these can be obtained from me on request.
All of the papers in these journals (whether experimental or not), and those appearing in Research in Parapsychology during the same period, were searched for commentary relevant to free-response judging, as well as other sources where appropriate. The survey is in two sections. In the first section, a model of an ideal judging process is presented, and its elements discussed in terms of their importance in current judging practices. The second section addresses the issues of whether percipients or independent judges are best suited to perform the complex judging task, and what qualities a judge should have. Finally, the findings of the review are discussed with their implications for further research.

The underlying structure of the judging process

In a free-response ESP experiment, the percipient's task is to observe and report his or her thoughts, imagery, feelings and mental or physical experiences, which might relate to a randomly selected target. In free-response studies, the targets used are generally fairly complex (they may be people, or geographical locations, objects, and so on). The targets may have elements (such as colour, the presence or absence of people) which differ in their salience for the percipient, and in their frequency of occurrence. In addition, targets may be regarded as possessing various broad categories of content (such as semantic content, or emotional content), and these categories may themselves differ in their salience. The salience of both individual elements and categories of content may differ from one percipient to another, depending on individual differences.

Just as free-response targets are complex and varied, so too are the mentations reported by percipients. Mentations may be in the form of imagery in any sense modality, or merely abstract concepts; they may be vivid, bizarre, fleeting, spontaneous, or have other distinguishing characteristics.
Content of various kinds may be present in them, with varying chance frequencies of occurrence. Mentation items may relate in a variety of ways to the target material, such as semantically or by association, and to a greater or lesser degree. The type of correspondence may vary from percipient to percipient, or from mentation to mentation, or both. Certain types of mentation, and certain kinds of target-mentation correspondence, may be more likely to carry psi information than others.

The function of a free-response judge (in process-oriented research at least) [is to assess] the probability that psi was responsible for any resemblance between the target and the mentation (or inversely, the strength of the ESP component on a given trial). In the complex situation described above, one way of looking at the task of an ideal judge is that he or she should:

(i) assign some numerical value in proportion to the degree of correspondence between a single mentation item and the target (and, in some types of judging, to the controls);

(ii) increase this value (given a perfect match) in accordance with the rarity of occurrence of the mentation item's content in the mentation of all percipients in similar experimental conditions (or in the mentation of that particular percipient on other trials, if such data is available);

(iii) increase this value (given a perfect match) in accordance with the rarity of occurrence of the mentation item's content in the entire experimental target pool;

(iv) increase this value in accordance with the likelihood that the mentation item, by virtue of its characteristics, is psi-related (e.g., whether it was bizarre, vivid, spontaneous, or whatever characteristics, if any, are shown by research to mediate ESP);

(v) increase this value in accordance with the salience which the content of the mentation has for the percipient (e.g., if research shows that the presence of people in a target is highly salient to a percipient, then a mentation item bearing on the presence or absence of people would be weighted relatively heavily);

(vi) increase this value in accordance with the likelihood that the type of correspondence (semantic, emotional, etc.) between mentation item and target carries psi-related information, if such differences in likelihood are indicated by research.

Having thus arrived at a weighted measure of the correspondence between each mentation item in a trial and the target (and controls if appropriate), the measures may be summed across the trial or otherwise combined to yield the ESP score for that trial. Although this procedure resembles an atomistic judging procedure most closely in its structure, it can also be thought of as an implicit or idealised basis for holistic or coded judging procedures. In holistic judging, it is possible to think of the overall rating assigned to items in the judging set as a sum of individual mentation ratings weighted as appropriate. In coded judging, the decision of whether a given content category was present or absent could be regarded as being made according to the sum of weighted ratings of relevant mentation items. Further weightings could then be assigned to each decision according to the known salience of the content category and the rarity of that value of the code in the target pool.

The importance of elements of judging in the literature

Each of the six elements of judging has, in various forms, received occasional attention, either implicitly or overtly, in experimental and theoretical papers, although very little direct or systematic research has been done on this topic. Most opinion about how best to judge free-response material seems to be based on anecdotal observations. While such observations may be unreliable, they may also contain useful information about aspects of judging which should be investigated empirically.
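As an illustration only, the six-element model of the ideal judge outlined earlier might be sketched in Python as follows. The model says only that the correspondence value should be "increased in accordance with" each factor; the multiplicative combination, the inverse-frequency rarity weight, and every name used here (`MentationItem`, `rarity_weight`, `item_score`, `trial_score`) are assumptions made for the sketch, not part of any published judging procedure.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class MentationItem:
    """One mentation item, described by the six elements of the model."""
    correspondence: float              # (i) rated match to the target, e.g. 0.0 to 1.0
    mentation_frequency: float         # (ii) chance frequency of the content in mentation, (0, 1]
    target_pool_frequency: float       # (iii) chance frequency of the content in the target pool, (0, 1]
    psi_likelihood: float = 1.0        # (iv) weight for item characteristics (vivid, bizarre, ...)
    salience: float = 1.0              # (v) weight for salience of the content to the percipient
    correspondence_type: float = 1.0   # (vi) weight for the type of correspondence


def rarity_weight(frequency: float) -> float:
    """Assumed weighting: rarer content (lower frequency) earns a larger weight."""
    if not 0.0 < frequency <= 1.0:
        raise ValueError("frequency must lie in (0, 1]")
    return 1.0 / frequency


def item_score(item: MentationItem) -> float:
    """Combine the six elements multiplicatively (an assumption of this sketch)."""
    return (
        item.correspondence
        * rarity_weight(item.mentation_frequency)
        * rarity_weight(item.target_pool_frequency)
        * item.psi_likelihood
        * item.salience
        * item.correspondence_type
    )


def trial_score(items: Iterable[MentationItem]) -> float:
    """Sum the weighted item scores across a trial to give the trial's score."""
    return sum(item_score(item) for item in items)


items = [
    # Moderate match on content that is uncommon in both mentation and target pool.
    MentationItem(correspondence=0.5, mentation_frequency=0.5, target_pool_frequency=0.25),
    # Perfect match on very common content that is highly salient to the percipient.
    MentationItem(correspondence=1.0, mentation_frequency=1.0, target_pool_frequency=1.0, salience=2.0),
]
print(trial_score(items))  # 6.0
```

Under these assumed weights, a partial match on rare content (score 4.0) outweighs a perfect match on content that occurs on every trial (score 2.0), which is the behaviour that elements (ii) and (iii) are intended to produce.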
This being so, each of the six elements of judging is discussed in turn below in the context of commentary and experimental results in the literature surveyed.

(i) Assignment of a numerical value to correspondence

Ideally, the value assigned to the correspondence between a mentation item and a target should reflect the correspondence in some objective (and hence reliable) way. Sixteen studies, reported in 10 papers in the database surveyed, used atomistic judging, but in no case was interjudge reliability calculated for the allocation of such ratings. In eight of the studies, each point on the rating scale was labelled for the use of the judges (e.g., 0 = "no correspondence"), a practice which might be expected to increase interjudge reliability. The number of points on the rating scale ranged from two to eleven, with a mean of 4.2. It is possible that the scales at the low end of the range are too constrained to be sensitive, while those at the higher end require judges to make finer judgements than is appropriate, and so may be insensitive in effect because they increase error variance. In this latter case of a large rating range, interjudge reliability may be reduced. The same may be true of holistic rating scales, which ranged from 4 points to 101, and which were clearly reported as being labelled in only 14 out of the 52 studies in which a holistic scale was used. The number of items in the judging set may be a factor in determining the appropriate rating scale; in the studies surveyed, set size ranged from 2 to 36 items. Any future research which addresses the issue of the appropriate rating scale in this task could most usefully do so in the context of active training of judges, with feedback, in the use of such scales.
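To make the notion of interjudge reliability concrete, the following Python sketch computes a Pearson correlation between two judges' ratings of the same set of mentation items. This is one possible index among several (a rank-based coefficient may suit ordinal rating scales better); the function name `pearson_r` and the example ratings are hypothetical and are not drawn from any of the studies surveyed.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences of ratings."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of at least two ratings")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations about the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a judge gives constant ratings")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical ratings of six mentation items on a labelled four-point scale (0 to 3).
judge_a = [0, 1, 3, 2, 0, 3]
judge_b = [0, 2, 3, 2, 1, 3]
print(round(pearson_r(judge_a, judge_b), 2))
```

A value near +1 would indicate that the two judges order and space the items similarly; a value near zero would suggest that the scale, its labelling, or the judges' training is not yielding consistent ratings.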
Boerenkamp (1984) had considerable success in training eight independent judges to rate each statement made by a "psychic" about a missing person on a fully-labelled four-point scale of likelihood that it would apply to anyone in the population. To test the reliability of the judges' ratings, the judges were randomly assigned to two groups of four, and the groups' average ratings of each statement were correlated, yielding correlations ranging from r8 = +0.66 (36df, p