Inference and Attribution Errors in Test Interpretation

Terence J. Tracey and James Rounds

University of Illinois at Urbana-Champaign

 

In R. K. Goodyear & J. W. Lichtenberg, (Eds.), (1999)

Test interpretation:  Integrating science and practice

Boston:  Allyn & Bacon

 

 

 

The fool doth think he is wise, but the wise man knows himself to be a fool (Shakespeare, As You Like It, Act V, scene i, line 35)

 

Thou speakest wiser than thou art ware of (Shakespeare, As You Like It, Act II, scene iv, line 57)

 

The present chapter reviews the research on how information processing relates to the inference errors commonly made by those who must interpret assessment information.  Assessments (which are viewed as being composed of both psychological test results and other information obtained from the client and other sources) provide the grist for the interpreter's mill.  The interpreter must sort the information and decide on overall conclusions and implications.  As has been demonstrated in a variety of professional arenas, including psychology, humans are not particularly skilled at combining various pieces of information.  By familiarizing the practitioner with the common biases in information processing, it is hoped that the effects of these biases will be minimized in test interpretation and decision-making.  Much is known about how to develop and evaluate psychological tests.  We know less about how to use the information generated by tests.

 

As mental health workers and clinicians, we have the unenviable task of attempting to establish methods to enable the reliable and valid classification and understanding of individuals.  Clinicians are called upon, and present themselves, to help make decisions on a wide variety of issues, such as prediction of who will benefit from treatment, what treatment to select, recidivism, psychodiagnosis, child custody, who will become violent, admissions into programs, and hiring selection.

 

Our information typically is extremely subjective and fuzzy, consisting of impressions of how individuals present themselves to us or their reporting of background experiences.  Each new client or individual presents us with a wealth of information of varying quality.  We typically try to evaluate this wealth of fuzzy information and place people into very fuzzy sets (Cantor, Smith, French, & Mezzich, 1980).  To what information should we attend in our decision-making and how should we use this information?  Clearly as humans we cannot attend to all that is being presented.  Hopefully, through our training we have been shown what information is most salient or worthy of attention in our deliberations.  We need to reduce or simplify the information that we are presented with.  Then, after shrinking our perceptual field to a manageable size, we need to examine the information and make some decisions about the individual (e.g., is this person appropriate for this treatment, what is the prognosis or diagnosis).  In this simplified view, clinical decision making involves two different steps, one of information restriction and selection, and one of information aggregation or processing.  As has been demonstrated both for clinicians (Dawes, 1994; Wierzbicki, 1993) and for humans in general (Gilovich, 1991), we are not especially skilled at either of these steps with respect to judgments of other people.

 

Information restriction:  We see what we expect or wish to see

Although there are a variety of ways in which humans restrict the information that is attended to, we will focus on three specific forms that affect us as human information processors:  (a) our tendency to see patterns where none exist, (b) our tendency to seek confirmatory evidence, and (c) our usage of preconceived biases.  We, as counselors, are prone to these errors of information restriction.

 

Gilovich (1991) has summarized a wealth of research on humans' perceptions of relations and causes in everyday life and how we are very prone to impute order to ambiguous information.  We strive for predictability in our world.  He cites several examples of the clustering illusion to support this claim.  When presented with random sequences (either points on a map or sequences of shots made in a basketball game by different players), we tend to focus on clusters of points and infer a relation even though the actual process is random.  A common example of this inference of cause is the gambler's fallacy, where the probability of a particular event is overestimated given independent prior events (e.g., the probability of heads when the previous 4 coin tosses resulted in tails).  Gilovich has argued that humans are predisposed to look for and see order in the relations among events and that this tendency has served the species well in an evolutionary context, but that it is not without flaws.  Whether or not there is an evolutionary basis for this human tendency to impute order, it is clearly present in our thinking about the world and the behavior of other people.
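The coin-toss example can be checked with a short simulation.  The sketch below is only an illustration, not an analysis from the literature reviewed here; the function name, the number of tosses, and the seed are all hypothetical choices.  It estimates how often heads appears immediately after a run of four tails in a fair, independent sequence.

```python
import random


def heads_rate_after_tail_runs(n_tosses=200_000, run_length=4, seed=0):
    """Estimate P(heads) on tosses that immediately follow `run_length` tails.

    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    tosses = [rng.random() < 0.5 for _ in range(n_tosses)]  # True means heads
    follow_ups = [
        tosses[i]
        for i in range(run_length, n_tosses)
        # keep toss i only if the preceding `run_length` tosses were all tails
        if not any(tosses[i - run_length:i])
    ]
    return sum(follow_ups) / len(follow_ups)


rate = heads_rate_after_tail_runs()
print(f"Share of heads after four tails in a row: {rate:.3f}")
```

Because the tosses are independent, the estimate stays close to one half; the preceding run of tails carries no information about the next toss.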

 

Besides imputing order where none exists, humans are prone to self-confirmation in cases where equivocal information exists (Granberg & Brent, 1983; Sears & Whitley, 1973).  We believe what we wish to believe.  A partial reason for this tendency to believe what we wish is the human tendency to search for confirmatory information.  There has been a wealth of research demonstrating the human tendency to search out and attend only to evidence that confirms our ideas, beliefs, or hypotheses (Greenwald, Pratkanis, Leippe, & Baumgardner, 1986; Nisbett & Ross, 1980; Snyder & Campbell, 1980).  The problem with the confirmatory tendency is that only information supportive of one's beliefs is attended to, even in the face of extremely disconfirming information.  Information that could provide corrective feedback that our beliefs are in error is rarely attended to.  This process of seeking confirmation applies not only to daily information but also to our memory searches.  Snyder and Cantor (1979) found that we recall memories that support our beliefs while not remembering others that disconfirm them.  This process of searching for confirmation can thus lead to some very inaccurate conclusions.  As will be discussed below, it may also lead to increased confidence in one's conclusions: when one searches for information on the correctness of a belief, only confirming information is focused on, and it is thus easy to gain an unwarranted assurance that the belief is correct.

 

This human process of searching for confirmatory information has also been documented among clinicians (Haverkamp, 1993; Strohmer, Shivy, & Chiodo, 1990).  Although presumably clinicians have been trained to examine all information in making clinical decisions, clinicians are just as prone to search out and attend to only evidence that confirms initial clinical impressions.

 

Related to the usage of a confirmatory hypothesis testing strategy is the tendency to reify our preconceptions.  The literature has many examples of how clinicians make errors in their clinical decision-making that are related to beliefs regarding specific cues.  There is a tendency to overpathologize clients (Langer & Abelson, 1974; Rosenhan, 1973; Temerlin, 1968).  If the situation is one associated with pathology (e.g., the individual seeing you is a client admitting distress), counselors are prone to search for information indicative of pathology and then interpret this information as indicative of more pathology than may exist.  This tendency to overpathologize is especially pronounced when the cues of social class (Abromowitz & Dokecki, 1977), race (Abromowitz & Murray, 1983; Lopez, 1989), and sex (Abromowitz & Dokecki, 1977; Broverman et al., 1970; Davidson & Abromowitz, 1980; Whitley, 1979; Zedlow, 1978) are present.  Clinicians tend to view the poor, members of lower social classes, people of color, and females as having greater pathology.

 

Clinicians need to be aware of these limitations in the information attended to because inaccurate decisions regarding clients can easily be made.  It is difficult for clinicians to claim that they are attending to the appropriate information and, if they are, that they evaluate it in a manner free from inaccurate preconceived biases.  These biases in the information focused on could be related to the poor predictions yielded by interviews.  The literature documenting the lack of predictive validity of interviewing is quite strong in a wide variety of arenas (Bloom & Brundage, 1947; Carrol et al., 1988; DeVaul et al., 1957; Milstein et al., 1981) as well as in clinical practice (Oskamp, 1965).  As Meehl (1973) has noted, clinicians are not especially skilled at selecting the best information to which to attend.  The relative validity of information is often overlooked.  Usage of standardized assessment devices helps obviate some of the issues related to information restriction (Dahlstrom, 1993).  However, usage of the high quality information provided by psychological tests is not by itself enough to remedy the ills involved in our information processing.  Given quality information, the next task is to combine the information and to make decisions.

 

Clinical versus Actuarial Prediction

Years ago, Meehl (1954) penned what he calls his "disturbing little book" comparing clinical versus actuarial prediction.  In his review of the existing literature, he documented the relative superiority of actuarial methods (i.e., those methods using population base rates and/or regression techniques) over clinical methods in clinical decision making (diagnosis, treatment application, and prognostication).  So even with quality information from psychological tests, clinicians were still somewhat lacking with respect to clinical prediction.  Meehl, however, held out hope for clinical decision making with respect to situations that require unique combinations of data and the formulation of original hypotheses (Meehl, 1954, 1957, 1959).  Meehl (1986) subsequently has retracted many of these hopes for the clinician.  Extensive review of the literature (Dawes, Faust, & Meehl, 1989; Sawyer, 1966; Wiggins, 1981) has demonstrated that time and time again, actuarial methods of clinical prediction surpass clinical methods.  In almost all cases, optimal weighting by using regression methods results in prediction superior to clinicians' judgments.

 

Early critics (Holt, 1958; Zubin, 1956) of these conclusions noted that the research (a) gave too little consideration to the experience level of clinicians (expert clinicians may be much better at predicting than general clinicians, and certainly better than graduate student clinicians), (b) failed to take account of clinician confidence (clinicians may be more confident in some predictions than in others), (c) used situations too artificial for the results to apply to clinical assessment in the real world, and (d) relied on regression weights that may not generalize beyond the study in which they were obtained (the optimal weighting for one sample may not at all match that for another sample).  Subsequent research has focused on addressing these concerns.

 

In reviewing a wealth of studies, Faust (1991) and Dawes, Faust, and Meehl (1989) concluded that expertise has little effect on the results.  In conditions where clinicians can choose information and collect it in their preferred manner, expert clinicians still performed no better, and typically worse, than actuarial methods.  Expert clinicians clearly did at times outperform straight statistical models; however, there was little consistency to this.  Expert clinicians evidenced little agreement among themselves in the predictions made, and even for individual expert clinicians, there was little consistency in prediction accuracy from one case to the next.  In some contexts, a specific expert did an excellent job and in others the expert did poorly.  It should be noted that these results applied in areas in which the experts viewed themselves as skilled.  In addition, even simple summing of predictors (as opposed to using more sophisticated regression equations) outperformed the expert clinicians (Dawes, 1979).  If optimal weighting (i.e., regression) of information was not adopted, but a strategy of simply adding up the information was used, this procedure still yielded predictions unmatched in accuracy by the expert clinicians.
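To make the idea of "simply adding up the information" concrete, the following is a minimal sketch of a unit-weighted composite.  It assumes that each predictor is first standardized so that no scale dominates because of its metric, and that every scale is keyed in the same direction; the scale names and scores are hypothetical and are not data from any study cited here.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize one predictor to mean 0 and standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def unit_weighted_composite(predictors):
    """Sum the standardized predictors for each client with equal weights.

    `predictors` maps a scale name to a list of scores, one per client.
    No regression weights are estimated; every scale simply counts once.
    """
    standardized = [z_scores(scores) for scores in predictors.values()]
    return [sum(client_scores) for client_scores in zip(*standardized)]


# Hypothetical scores for three clients on two scales with different metrics.
scales = {
    "scale_a": [40, 50, 60],  # e.g., a 0 to 100 metric
    "scale_b": [1, 3, 5],     # e.g., a 1 to 5 metric
}
composite = unit_weighted_composite(scales)
print([round(c, 2) for c in composite])
```

The composite ranks clients consistently from case to case, which is the property the clinical combinations described above lacked.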

 

Similar results were obtained when clinician confidence in prediction was taken into account.  Indeed, many studies have found that expert clinicians often are no more accurate than less expert clinicians but that they have greater confidence in their predictions (Friedlander & Phillips, 1984; Friedlander & Stockton, 1983; Goldberg, 1959, 1968; Oskamp, 1962, 1965; Rock et al., 1987).  Goldberg (1986) concluded that in general there is no relation between confidence in the accuracy of one's prediction and its actual accuracy.  Confidence is related to accuracy only when the assessment is based on validated procedures.

 

Few areas of research in the field of psychology have yielded results as unequivocal as these (Meehl, 1986).  However, this body of literature has been viewed as having "almost zilch" (Dawes, 1988) impact on the practitioner and the field.  We as a profession continue to disregard the importance of this literature and its implications for practice.  It is not clear whether this eschewal of the actuarial versus clinical prediction debate rests on our professional inertia, our grandiosity and irrational overestimation of our abilities as clinicians, or a general wish that it weren't so.  Regardless, we cannot continue to ignore this literature from either a pragmatic or ethical position.  Even Holt, who has continually criticized this literature (Holt, 1958, 1970, 1978, 1991), has noted that "maybe there are still lots of clinicians who believe that they can predict anything better than a suitably programmed computer; if so, I agree that it is not only foolish but at times unethical of them to do so" (Holt, 1986, p. 378).

 

Attention needs to be placed on understanding how and why we as clinicians do not do as well when making predictions.  Two possible reasons may account for our relative inability to generate valid clinical predictions:  the specific nature of the clinical endeavor, and basic problems associated with the fallible process of human information processing.

 

Uniqueness of Clinical Practice

Einhorn (1986) proposed (and Dawes, 1991, later elaborated on) that perhaps a major reason for the clinician's relatively poor predictive capabilities is the different focus involved.  Actuarial methods focus on general trends and tend to view behavior as probabilistic.  In every specific prediction there is a certain amount of error built in, but over all cases attempts are made to minimize error.  This approach is very different from the approach of clinicians, who adopt a more deterministic model, wanting to know in individual cases what the causes are and to minimize error in each individual case.  If only for our own certainty in dealing with individual cases, we tend to adopt an approach that focuses on allowing no error.  The clinician desires accuracy on every occasion, presumably because the focus is on individuals and potential errors appear more salient in individual cases.  Inaccurate predictions may thus be viewed as too costly to the clinician.  A strategy of minimizing risk is adopted; to reduce error in the individual case, a very conservative approach is taken.  However, the acceptance of individual case error in the probabilistic models of the actuarial method paradoxically results in less overall error.  Thus the clinicians' focus on individual prediction may actually hurt the ability to predict individual behavior.

 

Clinicians are not alone in their relative inability to outperform actuarial prediction.  Similar results have been obtained in a variety of professional domains, such as medical diagnosis (Einhorn, 1979), predicting bank failures (Libby, 1976), stock market fluctuations (Johnson, 1988), internship matching (Johnson, 1988), and predicting graduate student success (Dawes, 1979).  The clinical versus actuarial prediction results may thus be attributable to the different foci involved in each model, but more probably they are related to the information processing capabilities of human decision makers.

 

Heuristics in Clinical Judgment

Tversky and Kahneman (1974) presented three heuristics that humans use to aid decision-making: representativeness, availability, and anchoring.  These heuristics serve to simplify decision-making, making it more efficient; however, they also can result in inaccurate decisions.  The application of these heuristics has been summarized widely as it pertains to general human decision-making (Dawes, 1988; Tversky & Kahneman, 1981) as well as clinical decision-making (Dawes, 1994; Dumont, 1991, 1993; Dumont & Lecomte, 1987; Salovey & Turk, 1991; Tracey, 1991; Turk & Salovey, 1985; Wierzbicki, 1993).

 

Representativeness refers to the extent to which something matches relevant categories.  A frequent example in clinical practice is the comparison of a specific client with diagnostic categories.  The clinician observes client behavior and then assesses the extent to which the behavior fits the different diagnostic types.  The question asked is "Is this behavior representative of this diagnostic type?"  If the behavior is seen as similar to the diagnosis, the client is typically diagnosed as falling in that diagnostic group.  The problem with this decision-making heuristic is that other relevant information is often ignored.  Three common problems with representativeness are insensitivity to (a) prior probabilities, (b) sample size, and (c) predictability.

 

Insensitivity to prior probabilities refers to the common failure to take account of base-rates in assessing representativeness.  An example of insensitivity to base-rates is the number of diagnoses of multiple personalities made by some clinicians.  Individuals with this diagnosis are extremely rare in the population, yet one of the authors has known some clinicians who have claimed upwards of 10 such clients in their caseload.  Besides the obvious exaggeration of symptoms necessary to make such a diagnosis, this diagnosis ignores the very low probability of occurrence.  The clinician sensitive to base-rates would very closely scrutinize any such low base-rate diagnosis.
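The effect of a low base-rate on the trustworthiness of a diagnosis can be sketched with Bayes' rule.  The function below is a minimal illustration; the base-rates, sensitivity, and specificity are hypothetical values chosen only to show the arithmetic and are not estimates for any actual diagnosis or instrument.

```python
def positive_predictive_value(base_rate, sensitivity, specificity):
    """Probability that a client truly has the condition, given a positive sign.

    base_rate   : proportion of the population with the condition
    sensitivity : P(sign present | condition present)
    specificity : P(sign absent  | condition absent)
    """
    true_positives = base_rate * sensitivity
    false_positives = (1 - base_rate) * (1 - specificity)
    return true_positives / (true_positives + false_positives)


# Hypothetical figures: the same sign applied to a rare and a common condition.
rare = positive_predictive_value(base_rate=0.001, sensitivity=0.90, specificity=0.95)
common = positive_predictive_value(base_rate=0.20, sensitivity=0.90, specificity=0.95)
print(f"rare condition:   {rare:.3f}")
print(f"common condition: {common:.3f}")
```

Under these assumed values, the same sign that is right about four times in five for the common condition is wrong for the large majority of positive cases when the condition is rare, which is why a low base-rate diagnosis deserves close scrutiny.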

 

Insensitivity to sample size refers to the frequent equating of information generated from large and small samples.  Obviously, comparing an individual instance to a category generated from a large sample is superior to comparing it to a category generated from a small sample, but this is frequently ignored.  Humans manifest this insensitivity in two ways:  by overgeneralizing from our own limited experience and by overgeneralizing from limited observation.  We build our clinical experience from a small sample of the individual clients we personally have interviewed, yet we frequently err in valuing our own sample as much as some larger sample.  For example, one of the authors had two clients early in his training who were diagnosed as compulsive, and each of these clients had violent episodes.  Upon seeing a third client who was similar but not violent, this author discounted the diagnosis of compulsive (even though it was very appropriate given assessment scales and DSM criteria) because the client did not resemble one part of the idiosyncratic category of compulsive (he was not violent).  The favoring of the limited personal sample while ignoring the information generated from larger samples (DSM and assessment scores) demonstrates this insensitivity to sample size.  The other manifestation of this insensitivity is overgeneralizing from a limited sample.  We as clinicians frequently make diagnoses from very limited bits of information and may be prone to ignore input from other sources that have much more information developed over a longer time.

 

Insensitivity to predictability is similar to insensitivity to base-rate in that no account is taken of the probability of events.  Where insensitivity to prior probabilities refers to our ignoring base-rate information, insensitivity to predictability refers to ignoring the differential probabilities of future behavior.  Some behaviors and events are much more likely to occur than others.  Insensitivity to predictability refers to the common pattern of viewing all predictions as equally likely, or underestimating the relative differences in predictability.  Predicting a highly probable event (e.g., what the client will be doing tomorrow) is viewed as equal in predictability to predicting an improbable event (e.g., what the client will be doing next year).  If one is able to predict tomorrow well, it is common (and inaccurate) to conclude that one can predict next year as well.

 

A concept related to insensitivity to predictability is the common misunderstanding regarding regression to the mean.  Less probable states are followed by more probable states.  The most probable future event after an extreme event is one that is less extreme.  A prediction that an extremely depressed client will feel less depressed in the next session is much more likely to be correct than a prediction that the client will become more depressed.
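Regression to the mean can be illustrated with a small simulation.  The sketch below assumes a deliberately simple model in which each observed score is a stable client level plus independent session-to-session noise; the means, standard deviations, sample size, and seed are hypothetical.  It selects the clients with the most extreme scores at one session and compares their average at the next session, with no intervention of any kind.

```python
import random
from statistics import mean


def simulate_extreme_group(n_clients=5000, top_fraction=0.05, seed=1):
    """Return (session 1 mean, session 2 mean) for the most extreme scorers.

    Assumed model: observed score = stable level + independent noise.
    """
    rng = random.Random(seed)
    stable = [rng.gauss(50, 10) for _ in range(n_clients)]
    session1 = [level + rng.gauss(0, 10) for level in stable]
    session2 = [level + rng.gauss(0, 10) for level in stable]

    # Select clients whose session 1 score falls in the top fraction.
    cutoff = sorted(session1)[int(n_clients * (1 - top_fraction))]
    extreme = [i for i in range(n_clients) if session1[i] >= cutoff]

    return (
        mean(session1[i] for i in extreme),
        mean(session2[i] for i in extreme),
    )


first, second = simulate_extreme_group()
print(f"extreme group, session 1: {first:.1f}")
print(f"same clients, session 2:  {second:.1f}")
```

The selected group remains above the overall average at the second session but is noticeably less extreme, even though nothing was done between sessions.  Apparent improvement in clients seen at their worst should be read with this in mind.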

 

A final aspect of representativeness is the confusion regarding reverse conditional probabilities, wherein the probability of one behavior (behavior A) given another (behavior B) is viewed as equal to the probability of behavior B given behavior A.  Clinicians seem especially prone to this bias.  A common piece of clinical lore is that clients who attempt suicide tend to do so after coming out of a depression, so that clinicians should be alert to elevations in mood of depressed clients.  The justification for this pattern is that the client has made a decision to kill him or herself and is thus less in turmoil.  The elevation of mood may indeed have occurred in each and every client who has attempted suicide, but this in no way implies that we should attend to mood changes as cues to suicide.  The probability of a mood elevation given a suicide attempt may indeed be extremely high, but this does not equal the probability of a suicide attempt given a mood elevation.  A conservative estimate of the base-rate of the latter (suicide attempt given a mood elevation) is in the ballpark of .00001 if not lower.  The probability of mood elevations being unrelated to suicide attempts is very high, so attending to mood elevations as a cue for suicide is not justified.  Another common example of confusion over reverse conditional probability is the usage of past incidents as diagnostic cues.  Just because current clients who report certain interpersonal difficulties have also reported past abuse does not mean that current abuse is related to having these specific interpersonal difficulties in the future.  Another example is clients with eating disorders.  Many clinicians have noted the perfectionistic tendencies in clients who have eating disorders and have suggested that perfectionistic tendencies should be used as a diagnostic sign.  However, the number of individuals who have perfectionistic tendencies who do not manifest eating disorders far exceeds the number that do.
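The relation between a conditional probability and its reverse is given by Bayes' rule.  The worked equation below uses the mood elevation example; the numerical values are purely hypothetical, chosen to show the arithmetic, and are not estimates from the literature.

```latex
% E = mood elevation, S = suicide attempt (hypothetical values)
\[
P(S \mid E) \;=\; \frac{P(E \mid S)\,P(S)}{P(E)}
\]
% Suppose P(E | S) = 1.0, P(S) = 0.0001, and P(E) = 0.5:
\[
P(S \mid E) \;=\; \frac{1.0 \times 0.0001}{0.5} \;=\; 0.0002
\]
```

Even when every attempt is preceded by a mood elevation, the reverse probability remains very small because attempts are rare and mood elevations are common.  The two conditional probabilities are equal only when the two events have the same base-rate.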

 

The second heuristic used by human information processors is that of availability, which refers to the incomplete nature of our memory search for information.  To facilitate speed of memory search, we focus on only the most salient aspects and frequently ignore other aspects that may also be relevant.  Those aspects that are more easily brought to mind are viewed as thus more salient.  Availability thus refers to memory access and this is affected by exposure, mood, imaginability, and category vividness. 

 

The bias of exposure is one especially relevant to clinicians.  Clinicians use their past and current clients as comparisons, so the quality of any decision rests upon the completeness of this sample and our ability to access it completely.  Cohen and Cohen (1984) demonstrated that our clinical samples are extremely biased and unrepresentative.  The clinical caseload very quickly gets filled with a relatively small number of clients who tend to be fairly pathological.  Given our familiarity with this group, these are the individuals that are most easily accessed as a comparison group.  Using this group as a basis for deciding the relative health and pathology of individuals is very problematic.  Cohen and Cohen called this natural tendency to inappropriately make decisions based on this very flawed sample the "clinician's illusion."  Howard et al. (1989) have documented that indeed for most clinicians, their caseload is composed of a very few, fairly pathological individuals.  Because of the ease of retrievability, these few clients serve as an inappropriate basis of clinical comparison for decision-making.
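One way to see how a caseload can fill with a small number of long-term clients is to note that, in a steady state, a group's share of the caseload at any moment is roughly proportional to its share of new clients multiplied by how long its members stay in treatment.  The sketch below rests on that simplifying assumption; the group names, intake shares, and durations are hypothetical.

```python
def caseload_share(groups):
    """Estimate each group's share of the caseload at a single point in time.

    `groups` maps a group name to (share_of_new_clients, months_in_treatment).
    Assumes a steady state in which presence in the caseload is proportional
    to intake share times duration of treatment.
    """
    weights = {name: share * months for name, (share, months) in groups.items()}
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# Hypothetical practice: most new clients are seen briefly, a few for two years.
hypothetical = {"brief": (0.9, 1), "long_term": (0.1, 24)}
shares = caseload_share(hypothetical)
for name, share in shares.items():
    print(f"{name}: {share:.0%} of current caseload")
```

Under these assumed figures, the one client in ten who stays for two years accounts for nearly three quarters of the clients seen on any given day, so the readily recalled caseload is a poor picture of the people who actually seek help.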

 

Our access to memory is also affected by our mood.  The literature on state dependent learning and recall (Bower, 1981; Forgas, 1990; Gilligen & Bower, 1984; Isen, 1984) is an example of this mood availability heuristic.  Similar to clients who are only able to access negative life experiences when depressed, we as clinicians suffer from the same retrievability flaw (Dumont, 1993).  If we are feeling angry with a client, we are most prone to access past clients to whom we had similar reactions.  This access to past clients to whom we reacted similarly is helpful, but conversely we would be less likely to remember other clients toward whom we were not angry and thus perhaps miss important comparisons.

 

The bias of imaginability refers to the tendency to retrieve information that is plausible without regard to its probability.  We construct a series of possible behaviors or plans based to a large extent on our ability to imagine their occurring.  If we can imagine a particular course of events, it is likely that we will plan accordingly, regardless of the probability of these events transpiring.  We use imaginability as a flawed indicator of probability of occurrence.  Being able to imagine that a client could commit suicide greatly increases our assessment that it could occur even though it could be extremely unlikely.  Because we incorrectly inflate the probability of events due to their imaginability, we often take a very conservative approach toward prevention even in the face of highly unlikely events.

 

Finally, the availability heuristic of category vividness also serves us well as information processors but can introduce bias in our decision-making.  Humans tend to retrieve those categories that are most vivid.  Aspects that are most memorable in their extremeness and characteristics are those that are most easily retrieved.  Information that is less exciting or remarkable is the information that tends to be last retrieved.  With respect to clinical decision-making, this aspect of availability ensures that the past clients most likely to be retrieved are those most disturbed, troubling, or most successful.  The norm is much less frequently accessed as it is less vivid.  There is also a tendency to access abstract information more readily than specific information.  Clinicians, for example, will be more likely to remember that a client has relationship problems but be unable to provide the specifics of the difficulties.  So with respect to the availability heuristic, we are most prone to retrieve information that is vivid (often defined as extreme), abstract (having few specifics to substantiate the concept), based on our own flawed sample of clients, and similar to our current mood.

 

The final major heuristic noted by Tversky and Kahneman (1974) is that of anchoring, which refers to our tendency to let initial information and impressions determine subsequent decision-making.  Even when presented with very different information, humans do not tend to adjust their decisions much away from the initial starting point, or anchor.  For example, if one receives early information from the client or other sources that the diagnosis of borderline may be appropriate, less pathological diagnoses may never be examined.  Clinicians tend to make decisions rapidly and maintain these decisions over time (Gauron & Dickinson, 1966; Meehl, 1960; Rubin & Shontz, 1960).  Providing more information to clinicians does not help alter incorrect clinical decisions (Clavelle & Turner, 1980) or lead to better decisions, although it does lead to the false impression that one has made a better decision (Oskamp, 1965, 1982).

 

The heuristics of representativeness, availability, and anchoring are important aids in human decision-making in that they allow for efficient processing of information.  Each heuristic, however, carries with it a bias, and this bias can affect the quality of decisions made.  It is these decision-making aids that help account for the relative superiority of actuarial methods of aggregating information over clinical methods.  Clinicians rely too much on memory and their own idiosyncratic weighting of information.  Actuarial models do not rely on memory and can be combined in a variety of straightforward manners (even by just averaging the different scales; Dawes, 1979).  Perhaps the biggest advantages of using psychological tests in this context are the high quality information associated with the scales (established validity) and the normative base (which includes information about base-rates), which alone will help offset some of the heuristic biases noted.

 

Information Aggregation

Goldberg (1991) has noted that the superiority of actuarial combination of psychological test data is related to five issues.  First, as we have previously discussed, one reason clinicians do not do well is that they ignore the different validities of the predictors.  Usage of sound psychological test data with uniformly high validity should obviate this problem.  Second, it is difficult to combine variables if they have different metrics (e.g., how does one intuitively combine scores from two variables, one with scores ranging from 0 to 100 and another with scores from 1 to 5?).  Typically, psychological test data provide information that is normed and scored in a common metric (e.g., T scores).  Third, clinicians typically are not consistent in their application of predictions made from data; they apply inconsistent weights to the predictors.  For example, a clinician may weight one predictor scale highly for one case and in the next case weight a different scale highly.  Clearly, applying a consistent manner of combining the data would improve prediction.  Dawes (1979) has demonstrated that even simple averaging of information or scales is superior to inconsistent clinician combination of information.  Fourth, clinicians are insensitive to different degrees of redundancy in information.  Sines (1959) demonstrated that when clinicians seek more information they tend to add psychological tests that overlap highly with those already included.  This usage of overlapping indices does little to increase prediction accuracy.  If added information is sought to improve a clinical decision, instruments with little overlap with the current measures should be used.  Only by adding nonredundant information will the incremental validity (prediction above and beyond that already obtained) improve.  Finally, clinicians are relatively insensitive to regression effects, as noted above, and thus need to take these into account when interpreting psychological test information.
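The point about redundancy can be made concrete with the standard formula for the squared multiple correlation of two predictors.  The sketch below is an illustration only; the validity coefficients and the overlap between instruments are hypothetical values, not figures from Sines (1959) or any other study.

```python
def r_squared_two_predictors(r_y1, r_y2, r_12):
    """Squared multiple correlation for a criterion regressed on two predictors.

    r_y1, r_y2 : correlation of each predictor with the criterion
    r_12       : correlation between the two predictors (their overlap)
    """
    return (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)


def incremental_validity(r_y1, r_y2, r_12):
    """Gain in explained variance from adding the second predictor to the first."""
    return r_squared_two_predictors(r_y1, r_y2, r_12) - r_y1**2


# Hypothetical case: the added test is equally valid (r = .50) either way,
# but overlaps heavily with the first test in one case and little in the other.
redundant = incremental_validity(r_y1=0.5, r_y2=0.5, r_12=0.9)
distinct = incremental_validity(r_y1=0.5, r_y2=0.5, r_12=0.1)
print(f"gain from a highly overlapping test: {redundant:.3f}")
print(f"gain from a largely distinct test:   {distinct:.3f}")
```

With these assumed values, the overlapping test adds about one percentage point of explained variance while the distinct test adds about twenty, even though both tests are equally valid when considered alone.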

 

One other important hypothesis testing strategy needs to be covered.  Typically, clinicians evaluate the accuracy of their interpretations and clinical decisions in relation to client reactions to these interpretations.  Client reaction is a highly fallible piece of feedback on the quality and accuracy of clinical decisions because clients are not that discriminating in their acceptance.  The well known "P. T. Barnum effect" relates to this lack of critical client evaluation.  Snyder, Shenkel, and Lowrey (1977) have reviewed the extensive literature on clients' and other individuals' willingness to accept almost any interpretation of psychological test data as an accurate description of themselves.  Most of what we interpret to our clients will probably be accepted uncritically.  It is thus a mistake to read too much accuracy into client affirmation of our interpretations.

 

Clinical Training and Experience as Remedies to Poor Clinical Judgment?

Given the abundance of research on our problems as information processors, how is it that we do not improve with experience?  Would we not expect our predictive skills to improve as we gain experience and feedback on our decisions?  The research demonstrates that this is not the case.  In general, novice and expert clinicians do not differ in accuracy (Friedlander & Phillips, 1984; Friedlander & Stockton, 1983; Goldberg, 1959; Oskamp, 1962, 1965; Taft, 1955), although reviewers of the research have differed somewhat regarding the strength of this conclusion (e.g., Faust, 1986; Faust & Ziskin, 1988; Garb, 1989; Rock, Bransford, Maisto, & Morey, 1987; Wiggins, 1973).  For example, one of the more favorable reviews with respect to clinicians' accuracy was done by Garb (1989), who concluded that with some sorts of data (biographical, projective tests, and neuropsychological tests) there was no difference between graduate students and experienced clinicians; however, there were differences when biographical, WAIS, and MMPI data were used (which Dawes [1994] attributes to the quality of the information and the training in knowing how to use this specific information).

 

Perhaps a major reason for our failure as clinicians to learn from experience is the hindsight bias (Wedding & Faust, 1989), which is the tendency to falsely believe, after an event has transpired, that we would have been able to predict it accurately.  A common term applied to this bias in the sports pages is the "Monday morning quarterback," where we criticize the wisdom of certain plays or strategies that occurred the previous day and claim with certainty that we would have done otherwise had the choice been ours.  This bias has been documented repeatedly (Arkes, Faust, Guilmette, & Hart, 1988; Arkes, Wortmann, Saville, & Harkness, 1981; Fischhoff, 1975) and may help account for our lack of usage of information.  This hindsight bias creates an "illusion of learning."  More experienced clinicians are more confident of their judgments than are novices, even though the judgments are no more accurate (Einhorn, 1986; Einhorn & Hogarth, 1978; Friedlander & Phillips, 1984; Oskamp, 1962, 1965).

 

Given that we as processors of information are quite fallible, what is to be done?  Should all attempts at clinical decision-making be abandoned in favor of statistical models?  We think that the response to this question is a qualified no.  Clearly statistical models have more predictive accuracy than we do as clinicians, and computerized test interpretation shows promise of outperforming us in this area as well (Eyde, Kowal, & Fishburne, 1991).  However, clinicians do have the ability to observe and select relevant information.  For example, Johnson (1988) and Einhorn (1986) demonstrated that experts' strength was in the selection of cues.  Regardless, clinicians need to be acutely aware of their limitations as processors of information.

 

It is the disease of not listening, the malady of not marking, that I am troubled withal (Shakespeare, King Henry IVth, Part 2, Act 1, scene ii, line 139)

Clinical Decision-Making Aids

Given our weaknesses as processors of information, what can we do to minimize these biases?  There are several recommendations that have been made (Arkes, 1981; Dumont, 1993; Salovey & Turk, 1991; Wierzbicki, 1993).

1.  Adopt a scientific approach to information evaluation and hypothesis testing (Tracey, 1991).  This involves not confusing the ability to explain with the ability to predict.  Clinicians should focus on making explicit predictions and then assessing the extent to which these predictions are borne out.  This process of making predictions forces the clinician to be explicit about assumptions and hinders the "hindsight" bias. 

2. Get quality information.  Dawes (1994) has noted that clinicians typically get poor feedback information.  So even if appropriate hypothesis testing were conducted, the information obtained would provide little corrective feedback.  For example, clinicians rarely obtain information on what has transpired with their clients after termination.  Frequently the only cases where feedback is obtained are those clients who have not succeeded and who return for treatment.  Attempts should be made to obtain reliable and valid information following termination to evaluate the accuracy of predictions made.  Also, care should be taken in using client acceptance of test interpretations as accuracy feedback because of the "P. T. Barnum effect."

3. Think Bayesian (Dawes, 1991).  This means being aware of base-rates as they relate to the probability of occurrence of different behaviors and to the predictability of different future behaviors.  Thinking Bayesian requires attention to the full range of individuals, both with and without the disorder of focus.  Thinking Bayesian also means that one should not equate reverse conditional probabilities: the probability of a sign given a disorder is not the probability of the disorder given the sign.  The ability to think Bayesian requires knowledge of simple Bayesian probability rules, but it also requires extensive knowledge of population probabilities.  Psychological tests help provide some of the information on base-rates and predictability of behavior.

4. Consider alternative hypotheses and engage in disconfirming hypothesis testing.  As noted, humans tend to seek confirming evidence and this strategy is not beneficial for accuracy of decision-making.  Specify disconfirming evidence and then seek this information out.

5. Rely less on memory as this relates to several biases in processing, especially availability.

6.  Use the best information and methods.  Clinicians need to choose high quality information.  This information should include the best psychological instruments.  If the information obtained is not enough and the clinician requires more, nonredundant scales should be chosen.  For example, if one is interested in assessing depression in a client and the results of the depression scale used are unclear, using another similar depression scale will add little information.  Also, the best methods should be used, and this means more actuarial combination of information (even straight averaging of scales) because this aggregation is consistent.  As noted, valuable clinical impressions gleaned from interviewing can and should be added to the prediction equation, but for best prediction clinicians should not rely on their own combination of the information because this tends to be idiosyncratic and inconsistent and to draw on only one or two of the available variables.

7. Recognize personal biases as they pertain to clinical decision-making, especially as they pertain to age, gender, class, and race.

8. Be aware of the effects of regression.  Less likely states tend to be followed by more likely states.  For example, a person who is unusually depressed today will, on average, tend to be less depressed tomorrow.
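
The warning in recommendation 3 against equating reverse conditional probabilities can be made concrete with a small worked example.  The sketch below applies Bayes' rule to hypothetical figures that are not drawn from any study: a disorder with a base-rate of 5% and a test that is positive for 90% of those with the disorder and negative for 90% of those without it.  The function name and all numbers are our assumptions.

```python
def probability_of_disorder_given_positive(base_rate, sensitivity, specificity):
    """Return P(disorder | positive test) using Bayes' rule.

    base_rate   = P(disorder) in the population being assessed
    sensitivity = P(positive test | disorder)
    specificity = P(negative test | no disorder)
    """
    true_positives = base_rate * sensitivity
    false_positives = (1 - base_rate) * (1 - specificity)
    return true_positives / (true_positives + false_positives)


# Hypothetical values chosen only for illustration.
base_rate = 0.05
sensitivity = 0.90
specificity = 0.90

posterior = probability_of_disorder_given_positive(
    base_rate, sensitivity, specificity
)
print(f"P(positive | disorder) = {sensitivity:.2f}")
print(f"P(disorder | positive) = {posterior:.2f}")
```

Under these assumed figures the probability of a positive result given the disorder is .90, but the probability of the disorder given a positive result is only about .32, because the many individuals without the disorder produce more false positives than the few with the disorder produce true positives.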

An attempt has been made to sensitize the reader to the many problems involved in clinical decision-making.  We as clinicians and as humans are clearly fallible decision-makers.  Psychological tests provide an avenue to improve our decision accuracy.  Care, however, must still be taken in their selection and interpretation.  Faust (1991) has written a wonderful tongue-in-cheek description of how clinicians would be different had we heeded Meehl's recommendations back in 1954 regarding our foibles as clinical decision-makers.  We ignored them then and continue in many ways to do so now.


References

Abramowitz, C. V., & Dokecki, P. R. (1977). The politics of clinical judgment: Early empirical returns. Psychological Bulletin, 84, 460-476.

Abramowitz, C. V., & Murray, J. (1983).  Race effects in psychotherapy.  In J. Murray & P. R. Abramson (Eds.), Bias in psychotherapy (pp. 215-255).  New York: Academic.

Arkes, H. R. (1981). Impediments to accurate clinical judgment and possible ways to minimize their impact.  Journal of Consulting and Clinical Psychology, 49, 323-330.

Arkes, H. R., Faust, D., Guilmette, T. J., & Hart, K. (1988). Eliminating the hindsight bias.  Journal of Applied Psychology, 73, 305-307.

Arkes, H. R., Wortmann, R. L., Saville, P., & Harkness, A. R. (1981).  The hindsight bias among physicians weighing the likelihood of diagnosis.  Journal of Applied Psychology, 66, 252-254.

Broverman, I. K., Broverman, D. M., Clarkson, F. E., Rosenkrantz, P. S., & Vogel, S. R. (1970).  Sex-role stereotypes and clinical judgments of mental health.  Journal of Consulting and Clinical Psychology, 34, 1-7.

Bower, G. (1981).  Mood and memory.  American Psychologist, 36, 129-148.

Cantor, N., Smith, E., French, R., & Mezzich, J. (1980).  Psychiatric diagnosis as prototype categorization.  Journal of Abnormal Psychology, 89, 181-193.

Carrol, J. S., Winer, R. L., Coates, D., Galegher, J., & Alibrio, J. J. (1988).  Evaluation, diagnosis, and prediction in parole decision-making.  Law and Society Review, 17, 199-228.

Chapman, L. J., & Chapman, J. P. (1969).  Illusory correlation as an obstacle to the use of valid diagnostic signs.  Journal of Abnormal Psychology, 73, 193-204.

Clavelle, P. R., & Turner, A. D. (1980).  Clinical decision-making among professionals and paraprofessionals.  Journal of Clinical Psychology, 33, 133-152.

Cohen, P., & Cohen, J. (1984).  The clinician's illusion.  Archives of General Psychiatry, 41, 1178-1182.

Dahlstrom, W. G. (1993).  Tests:  Small samples, large consequences.  American Psychologist, 48, 393-399.

Davidson, C. V., & Abramowitz, S. I. (1980).  Sex bias in clinical judgment: Later empirical returns.  Psychology of Women Quarterly, 4, 377-395.

Dawes, R. M. (1979).  The robust beauty of improper linear models in decision making.  American Psychologist, 34, 571-582.

Dawes, R. M. (1986).  Representative thinking in clinical judgment.  Clinical Psychology Review, 6, 425-441.

Dawes, R. M. (1988).  Rational choice in an uncertain world. San Diego:  Harcourt, Brace, Jovanovich.

Dawes, R. M. (1991).  Probabilistic versus causal thinking.  In D. Cicchetti & W. M. Grove (Eds.), Thinking clearly about psychology:  Vol. 1.  Matters of public interest (pp. 185-216).  Minneapolis:  University of Minnesota Press.

Dawes, R. M. (1994).  House of cards:  Psychology and psychiatry built on myth.  New York: Free Press.

Dawes, R. M., Faust, D., & Meehl, P. E. (1989). Clinical and actuarial judgment.  Science, 243, 1668-1674.

DeVaul, R. A., Jersey, F., Chappell, J. A., Carver, P., Short, B., & O'Keefe, (1957).  Medical school performance of initially rejected students.  Journal of the American Medical Association, 257, 47-51.

Dumont, F. (1991).  Expertise in psychotherapy:  Inherent liabilities of becoming experienced.  Psychotherapy, 28, 422-428.

Dumont, F. (1993).  Inferential heuristics in clinical problem formulation:  Selective review of their strengths and weaknesses.  Professional Psychology: Research and Practice, 24, 196-205.

Dumont, F., & Lecomte, C. (1987).  Inferential processes in clinical work:  Inquiry into logical errors that affect diagnostic judgments.  Professional Psychology: Research and Practice, 18, 433-438.

Einhorn, H. J. (1979).  Expert measurement and mechanical combination.  Organizational Behavior and Human Performance, 13, 171-192.

Einhorn, H. J. (1986).  Accepting error to make less error.  Journal of Personality Assessment, 50, 387-395.

Einhorn, H. J., & Hogarth, R. M. (1978).  Confidence in judgment:  Persistence of the illusion of validity.  Psychological Review, 85, 395-416.

Eyde, L. D., Kowal, D. M., & Fishburne, F. J. (1991).  The validity of computer-based test interpretations of the MMPI.  In T. B. Gutkin & S. L. Wise (Eds.), Buros-Nebraska Symposium on Measurement and Testing:  The computer and the decision-making process (Vol. 4, pp. 75-124).  Hillsdale, NJ: Erlbaum.

Faust, D. (1986).  Research on human judgment and its application to clinical practice. Professional Psychology:  Research and Practice, 17, 420-430.

Faust, D. (1991).  What if we had really listened?  Present reflections on altered pasts.  In D. Cicchetti & W. M. Grove (Eds.), Thinking clearly about psychology:  Vol. 1.  Matters of public interest (pp. 185-216).  Minneapolis:  University of Minnesota Press.

Faust, D., & Ziskin, J. (1988).  The expert witness in psychology and psychiatry.  Science, 241, 31-35.

Fischhoff, B. (1975).  Hindsight ≠ foresight: The effect of outcome knowledge on judgment under uncertainty.  Journal of Experimental Psychology:  Human Perception and Performance, 1, 288-299.

Forgas, Affective influences on individual and group judgments.  European Journal of Social Psychology, 20, 441-453.

Friedlander, M. L., & Phillips, S. D. (1984).  Preventing anchoring errors in clinical judgment.  Journal of Consulting and Clinical Psychology, 52, 366-371.

Friedlander, M. L., & Stockton, S. J. (1983).  Anchoring and publicity effects in clinical judgment.  Journal of Clinical Psychology, 39, 637-643.

Garb, H. N. (1989).  Clinical judgment, clinical training, and professional experience.  Psychological Bulletin, 105, 387-396.

Gauron, E. G., & Dickinson, J. K. (1966).  Diagnostic decision-making in psychiatry 2: Diagnostic styles.  Archives of General Psychiatry, 14, 233-237.

Gilovich, T. (1991).  How we know what isn't so:  The fallibility of human reason in everyday life.  New York:  Free Press.

Gilligan, S. G., & Bower, G. H. (1984).  Cognitive consequences of emotional arousal.  In C. Izard, J. Kagan, & R. Zajonc (Eds.), Emotions, cognition, and behavior (pp. 547-588). New York: Cambridge.

Goldberg, L. R. (1959).  The effectiveness of clinicians' judgments:  The diagnosis of organic brain damage from the Bender-Gestalt.  Journal of Consulting Psychology, 23, 25-33.

Goldberg, L. R. (1965).  Diagnostician versus diagnostic signs:  The diagnosis of psychosis versus neurosis from the MMPI.  Psychological Monographs, 79.

Goldberg, L. R. (1968).  Simple models or simple processes?  Some research on clinical judgment.  American Psychologist, 23, 483-496.

Goldberg, L. R. (1970).  Man versus model of man:  A rationale plus some evidence for a method of improving on clinical inferences.  Psychological Bulletin, 73, 422-432.

Goldberg, L. R. (1986).  Some informal explorations and ruminations about graphology.  In B. Nevo (Ed.), Scientific aspects of graphology (pp. 281-293).  Springfield, IL: Charles C. Thomas.

Goldberg, L. R. (1991).  Human mind versus regression equation:  Five contrasts.  In D. Cicchetti & W. M. Grove (Eds.), Thinking clearly about psychology:  Vol. 1.  Matters of public interest (pp. 173-184).  Minneapolis:  University of Minnesota Press.

Granberg, D., & Brent, E. (1983).  When prophecy bends: The preference-expectation link in U.S. presidential elections, 1952-1980.  Journal of Personality and Social Psychology, 45, 477-491.

Greenwald, A. G., Pratkanis, A. R., Leippe, M. R., & Baumgardner, M. H. (1986).  Under what conditions does theory obstruct research progress?  Psychological Review, 93, 216-229.

Haverkamp, B. E. (1993).  Confirmatory bias in hypothesis testing for client-identified and counselor self-generated hypotheses.  Journal of Counseling Psychology, 40, 303-315.

Holt, R. R. (1958).  Clinical and statistical prediction:  A reformulation and some new data.  Journal of Abnormal and Social Psychology, 56, 1-12.

Holt, R. R. (1970).  Yet another look at clinical and statistical prediction:  Or is clinical psychology worthwhile?  American Psychologist, 25, 337-349.

Holt, R. R. (1978).  A historical survey of the clinical-statistical controversy.  In R. R. Holt (Ed.), Methods in clinical psychology: Vol. 2.  Prediction and research (pp. 3-18).  New York: Plenum.

Holt, R. R. (1986).  Clinical and statistical prediction:  A retrospective and would be integrative perspective.  Journal of Personality Assessment, 50, 376-386.

Holt, R. R. (1991).  Judgment, inference, and reasoning in clinical perspective.  In D. C. Turk & P. Salovey (Eds.), Reasoning, inference, and judgment in clinical psychology (pp. 233-250).  New York:  Free Press.

Isen, A. M. (1984).  Toward understanding the role of affect in cognition.  In R. S. Wyer & T. K. Srull (Eds.), Handbook of social cognition (Vol. 3, pp. 179-236).  Hillsdale, NJ: Erlbaum.

Howard, K. I., Davidson, C. V., O'Mahoney, M. T., Orlinsky, D. E., & Brown, K. P. (1989).  Patterns of psychotherapy utilization.  American Journal of Psychiatry, 146, 775-778.

Johnson, E. J. (1988).  Expertise and decision under uncertainty:  Performance and process.  In M. T. H. Chi, R. Glaser, & M. J. Farr (Eds.), The nature of expertise (pp. 209-228).  Hillsdale, NJ: Erlbaum.

Kahneman, D., & Tversky, A. (1982).  Intuitive prediction: Biases and corrective procedures.  In D. Kahneman, P. Slovic, & A. Tversky (Eds.), Judgment under uncertainty:  Heuristics and biases (pp. 414-421).  New York:  Cambridge.

Langer, E. J., & Abelson, R. P. (1974).  A patient by any other name...: Clinical group differences in labeling bias.  Journal of Consulting and Clinical Psychology, 42, 4-9.

Libby, R.  (1976).  Man versus model of man:  Some conflicting evidence.  Organizational Behavior and Human Performance, 16, 1-12.

Lopez, S. R. (1989).  Patient variable biases in clinical judgment:  Conceptual overview and methodological considerations.  Psychological Bulletin, 106, 184-203.

Meehl, P. E. (1954).  Clinical versus statistical prediction:  A theoretical analysis and a review of the evidence.  Minneapolis:  University of Minnesota Press.

Meehl, P. E. (1957).  When shall we use our heads instead of the formula?  Journal of Counseling Psychology, 4, 268-273.

Meehl, P. E. (1959).

Meehl, P. E. (1960).  The cognitive activity of the clinician.  American Psychologist, 15, 19-27.

Meehl, P. E. (1973).  Why I do not attend case conferences.  In P. E. Meehl, Psychodiagnosis:  Selected papers (pp. 225-302).  Minneapolis, MN: University of Minnesota Press.

Meehl, P. E. (1986).  Causes and effects of my disturbing little book.  Journal of Personality Assessment, 50, 370-375.

Milstein, R. M., Wilkinson, L., Burrow, G. N., & Kessen, W. (1981).  Admission decisions and performance during medical school.  Journal of Medical Education, 56, 77-82.

Nisbett, R. E., & Ross, L. (1980).  Human inference:  Strategies and shortcomings of social judgment.  New York:  Prentice-Hall.

Oskamp, S. (1962).  The relationship of clinical experience and training methods to several criteria of clinical prediction.  Psychological Monographs:  General and Applied, 76 (No. 547), 1-28.

Oskamp, S. (1965).  Overconfidence in case-study judgments.  Journal of Consulting Psychology, 29, 261-265.

Oskamp, S. (1982).  Overconfidence in case-study judgments.  In D. Kahneman, P. Slovic, & A. Tversky (Eds.), Judgment under uncertainty:  Heuristics and biases (pp. 287-293).  New York: Cambridge.

Rock, D. L., Bransford, J. D., Maisto, S. A., & Morey, L. (1987).  The study of clinical judgment:  An ecological approach.  Clinical Psychology Review, 7, 645-661.

Rubin, M., & Shontz, F. C. (1960).  Diagnostic prototypes and diagnostic processes of clinical psychologists.  Journal of Consulting Psychology, 24, 234-239.

Rosenhan, D. L. (1973).  On being sane in insane places.  Science, 179, 250-258.

Salovey, P., & Turk, D. C. (1991).  Clinical judgment and decision-making.  In C. R. Snyder & D. R. Forsyth (Eds.), Handbook of social and clinical psychology:  The health perspective (pp. 416-437).  New York:  Pergamon.

Sawyer, J.  (1966).  Measurement and prediction, clinical and statistical.  Psychological Bulletin, 66, 178-200.

Sears, D. O., & Whitley, R. E. (1973).  Political persuasion.  In I. deS. Pool, W. Schramm, F. W. Frey, N. Maccoby, & E. B. Parker (Eds.), Handbook of communication (pp. 253-289).  Chicago: Rand-McNally.

Sines, L. K. (1959).  The relative contribution of four kinds of data to accuracy in personality assessment.  Journal of Consulting Psychology, 23, 483-492.

Snyder, C. R., Shenkel, R. J., & Lowrey, C. R. (1977).  Acceptance of personality interpretations:  The "Barnum effect" and beyond.  Journal of Consulting and Clinical Psychology, 45, 104-114.

Snyder, M., & Campbell, B. (1980).  Testing hypotheses about other people:  The role of the hypothesis.  Personality and Social Psychology Bulletin, 6, 421-426.

Snyder, M., & Cantor, N. (1979).  Testing hypotheses about other people:  The use of historical knowledge.  Journal of Experimental Social Psychology, 15, 330-342.

Strohmer, D. C., Shivy, V. A., & Chiodo, A. L. (1990).  Information processing strategies in counselor hypothesis testing:  The role of selective memory and expectancy.  Journal of Counseling Psychology, 37, 465-472.

Taft, R.  (1955).  The ability to judge people.  Psychological Bulletin, 52, 1-23.

Temerlin, M. K. (1968).  Suggestion effects in psychiatric diagnosis.  Journal of Nervous and Mental Disease, 147, 349-353.

Tracey, T. J. (1991). Counseling research as an applied science.  In C. E. Watkins & L. S. Schneider (Eds.), Research in counseling (pp. 1-31).  Hillsdale, NJ: Erlbaum.

Turk, D. C., & Salovey, P. (1985).  Cognitive structures, cognitive processes, and cognitive-behavior modification: II. Judgments and inferences of the clinician.  Cognitive Therapy and Research, 9, 19-33.

Tversky, A., & Kahneman, D. (1974).  Judgment under uncertainty: Heuristics and biases.  Science, 185, 1124-1131.

Tversky, A., & Kahneman, D. (1981).  The framing of decisions and the psychology of choice.  Science, 211, 453-458.

Wedding, D., & Faust, D. (1989).  Clinical judgment and decision making in neuropsychology.  Archives of Clinical Neuropsychology, 4, 233-265.

Whitley, B. E. (1979).  Sex roles and psychotherapy:  A current appraisal.  Psychological Bulletin, 86, 1309-1321.

Wierzbicki, M. (1993).  Issues in Clinical Psychology:  Subjective versus objective approaches.  Boston:  Allyn & Bacon.

Wiggins, J. (1973).  Personality and prediction:  Principles of personality assessment.  Reading, MA: Addison-Wesley.

Wiggins, J. (1981).  Clinical and statistical prediction:  Where are we and where do we go from here?  Clinical Psychology Review, 1, 3-18.

Zeldow, P. B. (1978).  Sex differences in psychiatric evaluation and treatment:  An empirical review.  Archives of General Psychiatry, 35, 89-93.

Zubin, J. (1956).  Clinical versus actuarial prediction:  A pseudo-problem.  In Proceedings, 1955 invitational conference on testing problems (pp. 107-128).  Princeton, NJ: Educational Testing Service.