Chapter Twelve

The “File-Drawer” Problem and Calculation of Effect Size

The file-drawer problem appears to have two causes: the reluctance of researchers to report their null results and the reluctance of professional journal editors to include studies whose results fail to reach statistical significance. Such studies remain in the “file-drawers” of the researchers. How much would these inaccessible studies affect the results of our meta-analysis? The answer seems to be not much.[1]

See figure: Effect Size - Difference Between Two Means [PDF, 72.3 KB]

Effect size is usually defined as the difference between the means of two groups divided by the standard deviation of the control group: Δ = (Me − Mc) / Sc.[2]

Effect sizes calculated in this way estimate the difference between two group means measured in control-group standard deviations, as shown in the figure above. Glass et al. argue that the choice of denominator is critical and that choices other than the control group standard deviation are defensible;[3] nevertheless, they endorse the control group standard deviation as the standard choice.
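As a sketch, Glass's definition translates directly into Python's standard library; the function name and the score lists are invented for illustration.

```python
from statistics import mean, stdev

def glass_delta(experimental, control):
    """Glass's effect size: the difference between the two group means,
    measured in control-group standard deviations."""
    return (mean(experimental) - mean(control)) / stdev(control)

# Illustrative (made-up) post-test scores
treatment = [82, 88, 75, 91, 84, 79]
comparison = [70, 78, 65, 74, 72, 69]
delta = glass_delta(treatment, comparison)
```

Note that `statistics.stdev` computes the sample (n − 1) standard deviation, which is the usual choice for this estimate.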

Alternatively, Hedges and Olkin show that, for every effect size, both the bias and the variance of its estimate are smaller when the standard deviation is obtained by pooling the sample variances of the two groups instead of using the control group standard deviation by itself.[4] An effect size based on a pooled standard deviation estimates the difference between two group means measured in standard deviations estimated for the full population from which both experimental and control groups are drawn: g = (Me − Mc) / S,[5] where

S is the pooled standard deviation: S = √(((Ne − 1)Se² + (Nc − 1)Sc²) / (Ne + Nc − 2)).[6]
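A minimal sketch of the pooled-standard-deviation effect size, assuming the standard pooled-variance formula and using invented function names:

```python
from math import sqrt
from statistics import mean, stdev

def pooled_sd(experimental, control):
    """Pooled standard deviation:
    S = sqrt(((Ne - 1) * Se^2 + (Nc - 1) * Sc^2) / (Ne + Nc - 2))."""
    ne, nc = len(experimental), len(control)
    se, sc = stdev(experimental), stdev(control)
    return sqrt(((ne - 1) * se ** 2 + (nc - 1) * sc ** 2) / (ne + nc - 2))

def hedges_g(experimental, control):
    """Hedges' effect size: the mean difference measured in pooled
    standard deviations."""
    return (mean(experimental) - mean(control)) / pooled_sd(experimental, control)
```

When the two groups have equal variances, the pooled estimate reduces to that common standard deviation, and g coincides with Glass's Δ.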

Most commentators suggest that effect sizes can be treated as ordinary data points and entered into standard tests for statistical significance. Hedges and Olkin have shown, however, that the error variance around estimates of effect size is inversely proportional to the sample size of the studies from which the effect sizes are drawn. If the effect sizes in a review are drawn from studies employing widely different sample sizes, then the heterogeneity of variance among effect sizes prohibits their use in conventional t-tests, analyses of variance, and other inferential tests. That is the case in most of these reviews; therefore, the effect sizes reported in this study are treated only with descriptive statistics.
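The sample-size dependence can be illustrated with a commonly cited large-sample approximation for the variance of an effect-size estimate (associated with Hedges and Olkin's work); the formula here is an assumption of this sketch, not a quotation from the study:

```python
def g_variance(g, ne, nc):
    """Approximate large-sample variance of an effect-size estimate:
    v = (Ne + Nc) / (Ne * Nc) + g^2 / (2 * (Ne + Nc)).
    Both terms shrink as the sample sizes grow, so small and large
    studies yield estimates of very different precision."""
    return (ne + nc) / (ne * nc) + g ** 2 / (2 * (ne + nc))

# The same effect size estimated from a small and a large study
small_study = g_variance(0.4, 20, 20)
large_study = g_variance(0.4, 200, 200)
```

Mixing such unequal variances in a single t-test or analysis of variance violates the homogeneity-of-variance assumption, which is the objection raised above.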

The effect sizes for computer-based training range from 0.20 to 0.46 depending on the population.[7] The effect size for distance instruction (television) is 0.15, and for interactive videodiscs the effect sizes range from 0.17 to 0.66 depending on the population.[8] The effect size for flight simulation is 0.54, and the effect size for tutorials ranges from 0.25 to 0.41 depending on the presentation of the tutorial material.[9]
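Since these effect sizes are treated only descriptively, the reported ranges can be summarized with a short script; the values are taken from the text, and the labels follow the abbreviations in note 7:

```python
# Effect-size ranges reported in the text (low, high), in standard deviations
reported = {
    "CBT": (0.20, 0.46),
    "DI (television)": (0.15, 0.15),
    "IVD": (0.17, 0.66),
    "SIM (flight)": (0.54, 0.54),
    "Tutorial": (0.25, 0.41),
}

lows = [low for low, high in reported.values()]
highs = [high for low, high in reported.values()]
overall = (min(lows), max(highs))  # the 0.15-0.66 spread cited in the text
```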

See graph: Instructional Technology Effectiveness [PDF, 89.8 KB]

Although the effect sizes for instructional technology range from 0.15 to 0.66 standard deviations, all of the studies report favorable findings when compared with conventional instruction. There are many possible explanations for the differences in instructional technology effectiveness: population differences, system differences, interactivity, or individualization. From a purely utilitarian point of view, the reason may not be all that important. If, at the very least, using instructional technology forces the producer to rethink the content of the course to match the delivery system, then revisiting the pedagogy may be enough to produce the positive effect sizes. Whatever the reason for the differences in effectiveness, the use of instructional technology saves instructional time, reduces overhead costs, and produces a higher level of achievement for students in a variety of domains.



[1] Gene Glass and Barry McGaw, “Choice of the Metric for Effect Size in Meta-Analysis”; Larry Hedges, “Estimation of Effect Size from a Series of Independent Experiments”; Larry Hedges and Ingram Olkin, “Vote-Counting Methods in Research Synthesis.”

[2] Glass's effect size: Δ = (Me − Mc) / Sc, where Me = experimental mean, Mc = control mean, and Sc = control standard deviation.

[3] Gene Glass, Barry McGaw and Mary Lee Smith, Meta-Analysis in Social Research.

[4] Larry Hedges and Ingram Olkin, Statistical Methods for Meta-Analysis.

[5] g = Hedges' effect size, S = Hedges' pooled standard deviation.

[6] Ne = number of experimental subjects, Nc = number of control subjects, Se = standard deviation of the experimental group, Sc = standard deviation of the control group.

[7] The abbreviations in figure one: CBT = Computer-Based Training, DI = Distance Instruction, IVD = Interactive Videodisc, SIM = Simulation. More than 300 research studies were used to develop these effect sizes; see Chen-Lin Kulik, James Kulik, and Barbara Shwalb, “Effectiveness of Computer-Based Adult Education: A Meta-Analysis”; Chen-Lin Kulik and James Kulik, “Effectiveness of Computer-Based Education in Colleges”; Rob Johnston and J. Dexter Fletcher, A Meta-Analysis of the Effectiveness of Computer-Based Training for Military Instruction.

[8] Godwin Chu and Wilbur Schramm, Learning from Television; J. Dexter Fletcher, Effectiveness and Cost of Interactive Videodisc Instruction in Defense Training and Education; J. Dexter Fletcher, “Computer-Based Instruction: Costs and Effectiveness.”

[9] R. T. Hays, J. W. Jacobs, C. Prince, and E. Salas, “Flight Simulator Training Effectiveness: A Meta-Analysis”; Peter Cohen, James Kulik, and Chen-Lin Kulik, “Educational Outcomes of Tutoring.”


Historical Document
Posted: Mar 16, 2007 08:49 AM
Last Updated: Jun 28, 2008 01:02 PM