Inference and Attribution Errors in Test Interpretation
Terence J. Tracey and James Rounds
University of Illinois at Urbana-Champaign
In R. K. Goodyear & J. W. Lichtenberg (Eds.) (1999),
Test interpretation: Integrating science and practice.
Boston: Allyn & Bacon
The fool doth think he is wise, but the
wise man knows himself to be a fool (Shakespeare, As You Like It, Act V, scene
i, line 35)
Thou speakest wiser than thou art ware of
(Shakespeare, As You Like It, Act II, scene iv, line 57)
The present chapter reviews the research
on how information processing relates to the inference errors commonly made by
those who must interpret assessment information. Assessments (which are viewed as being composed of both
psychological test results and other information obtained from the client and
other sources) provide the grist for the interpreter's mill. The interpreter must sort the information
and decide overall conclusions and implications. As has been demonstrated in a variety of professional arenas, including
psychology, humans are not particularly skilled at combining various pieces of
information. By familiarizing the
practitioner with the common biases in information processing, it is hoped that
the effects of these biases will be minimized in test interpretation and
decision-making. Much is known about
how to develop and evaluate psychological tests. We know less about how to use the information generated by
tests.
As mental health workers and clinicians,
we have the unenviable task of attempting to establish methods to enable the
reliable and valid classification and understanding of individuals. Clinicians are called upon, and present
themselves, to help make decisions on a wide variety of issues, such as
prediction of who will benefit from treatment, what treatment to select,
recidivism, psychodiagnosis, child custody, who will become violent, admissions
into programs, and hiring selection.
Our information typically is extremely
subjective and fuzzy, consisting of impressions of how individuals present
themselves to us or their reporting of background experiences. Each new client or individual presents us
with a wealth of information of varying quality. We typically try to evaluate this wealth of fuzzy information and
place people into very fuzzy sets (Cantor, Smith, French, & Mezzich,
1980). To what information should we
attend in our decision-making and how should we use this information? Clearly as humans we cannot attend to all
that is being presented. Hopefully,
through our training we have been shown what information is most salient or
worthy of attention in our deliberations.
We need to reduce or simplify the information that we are presented
with. Then after shrinking our
perceptual field to a manageable size, we need to examine the information and
make some decisions about the individual (e.g., is this person appropriate for
this treatment, what is the prognosis or diagnosis). In this simplified view, clinical decision making involves two
different steps, one of information restriction and selection, and one of
information aggregation or processing.
As has been demonstrated for clinicians (Dawes, 1994; Wierzbicki, 1993)
and for humans in general (Gilovich, 1991), we are not especially skilled at either
of these steps when judging other people.
Information restriction: We see what we expect or wish to see
Although there are a variety of ways
in which humans restrict the information that is attended to, we will focus on
three specific forms that affect us as human information processors: (a) our tendency to see patterns where none
exist, (b) our tendency to seek confirmatory evidence, and (c) our usage of
preconceived biases. We, as counselors,
are prone to these errors of information restriction.
Gilovich (1991) has summarized a wealth
of research on human perceptions of relations and causes in everyday life and
how we are very prone to impute order to ambiguous information. We strive for predictability in our world. He cites several examples of the clustering
illusion to support this claim. When
presented with random sequences (either points on a map or sequences of shots
made in a basketball game by different players), we tend to focus on clusters
of points and infer a relation even though the actual process is random. A common example of this inference of cause
is the gambler's fallacy, in which the probability of a particular event is
overestimated given independent prior events (e.g., the appearance of heads when the
previous 4 coin tosses resulted in tails).
Gilovich has argued that humans are predisposed to look for and see
order in the relations among events and that this tendency has served the
species well in an evolutionary context, but that it is not without flaws. Whether or not there is an evolutionary
basis in this human tendency to impute order, it is clearly present in our
thinking about the world and the behavior of other people.
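The independence that the gambler's fallacy overlooks can be checked by brute force. The following is a minimal sketch, assuming a fair coin; the function name is ours and is used only for illustration. It enumerates every equally likely sequence of five tosses and asks how often heads follows four tails.

```python
from itertools import product


def p_heads_after_four_tails():
    """Exact P(heads on toss 5 | tosses 1-4 were all tails) for a fair coin."""
    # All 2**5 sequences of five fair tosses are equally likely.
    sequences = list(product("HT", repeat=5))
    # Keep only the sequences that begin with four tails.
    after_four_tails = [s for s in sequences if s[:4] == ("T", "T", "T", "T")]
    # Count how many of those sequences end in heads.
    heads_next = [s for s in after_four_tails if s[4] == "H"]
    return len(heads_next) / len(after_four_tails)


print(p_heads_after_four_tails())
```

Exactly half of the sequences that start with four tails end in heads, so the run of tails carries no information about the next toss.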
Besides imputing order where none exists,
humans are prone to self-confirmation in cases where equivocal information
exists (Granberg & Brent, 1983; Sears & Whitley, 1973). We believe what we wish to believe. A partial reason for this tendency to
believe what we wish is the human tendency to search for confirmatory
information. There has been a wealth of
research demonstrating the human tendency to search out and attend only to
evidence that confirms our ideas, beliefs, or hypotheses (Greenwald, Pratkanis,
Leippe, & Baumgardner, 1986; Nisbett & Ross, 1980; Snyder &
Campbell, 1980). The problem with the
confirmatory tendency is that only information supportive of one's beliefs is
thus attended to, even in the face of extremely disconfirming information. Information that could provide corrective
feedback that our beliefs are in error is rarely attended to. This process of seeking confirmation
applies not only to daily information but also to our memory searches. Snyder and Cantor (1979) found that we
recalled memories that supported our beliefs while not remembering others which
disconfirmed our beliefs. This process
of searching for confirmation can thus lead to some very inaccurate conclusions
and, as will be discussed below, may lead to an increased confidence in one's
conclusions because in searching for information of the correctness of one's
belief, only confirming information is focused on and thus it is easy to gain
an inaccurate assurance that the belief is warranted.
This human process of searching for
confirmatory information has also been documented among clinicians (Haverkamp,
1993; Strohmer, Shivy, & Chiodo, 1990).
Although presumably clinicians have been trained to examine all
information in making clinical decisions, clinicians are just as prone to
search out and attend to only evidence that confirms initial clinical
impressions.
Related to the use of a confirmatory
hypothesis testing strategy is the tendency to reify our preconceptions. The literature has many examples of how
clinicians make errors in their clinical decision-making that are related to
beliefs regarding specific cues. There
is a tendency to overpathologize clients (Langer & Abelson, 1974; Rosenhan,
1973; Temerlin, 1968). If the situation
is one associated with pathology (e.g., the individual seeing you is a client
admitting distress), counselors are prone to search for information indicative
of pathology and then interpret this information as indicative of more
pathology than may exist. This tendency
to overpathologize is especially pronounced when the cues of social class
(Abramowitz & Dokecki, 1977), race (Abramowitz & Murray, 1983; Lopez,
1989), and sex (Abramowitz & Dokecki, 1977; Broverman et al., 1970;
Davidson & Abramowitz, 1980; Whitley, 1979; Zeldow, 1978) are present. Clinicians tend to view the poor, members of
lower social classes, people of color, and females as having greater pathology.
Clinicians need to be aware of these
limitations in the information attended to because inaccurate decisions
regarding clients can easily be made.
It is difficult for clinicians to claim that they are attending to the
appropriate information and, if it is attended to, that it is evaluated in a
manner that is free from inaccurate preconceived biases. These biases in the information focused on could
be related to the poor predictions yielded from interviews. The literature on the lack of predictive
validity of interviewing is quite strong in a wide variety of arenas (Bloom
& Brundage, 1947; Carrol et al., 1988; DeVaul et al., 1957; Milstein et al.,
1981) as well as in clinical practice (Oskamp, 1965). As Meehl (1973) has noted, clinicians are not especially skilled
at selecting the best information to which to attend. The relative validity of information is often overlooked. Use of standardized assessment devices helps
obviate some of the issues related to information restriction (Dahlstrom, 1993). However, use of the high-quality information
provided in psychological tests alone is not enough to remedy the ills involved
in our information processing. Given
quality information, the next task is to combine the information and to make
decisions.
Clinical versus Actuarial Prediction
Years ago, Meehl (1954) penned what he
calls his "disturbing little book" comparing clinical versus
actuarial prediction. In his review of
the existing literature, he documented the relative superiority of actuarial
methods (i.e., those methods using population base rates and/or regression
techniques) over clinical methods in clinical decision making (diagnosis,
treatment application, and prognostication).
So even with quality information from psychological tests, clinicians
were still somewhat lacking with respect to clinical prediction. Meehl, however, held out hope for
clinical decision making with respect to situations that require unique
combinations of data and the formulation of original hypotheses (Meehl, 1954,
1957, 1959). Meehl (1986) subsequently
retracted many of these hopes for the clinician. Extensive review of the literature (Dawes, Faust, & Meehl,
1989; Sawyer, 1966; Wiggins, 1981) has demonstrated that time and time again,
actuarial methods of clinical prediction surpass clinical methods. In almost all cases, optimal weighting by
regression methods yields prediction superior to clinicians'
judgments.
Early critics (Holt, 1958; Zubin, 1956)
of these conclusions noted four shortcomings of the research: (a) the lack of
consideration given to the experience level of clinicians (expert clinicians may be
much better at predicting than general clinicians and certainly than graduate
student clinicians), (b) the failure to take account of clinician confidence
(clinicians may be more confident in some predictions than in others),
(c) the artificiality of the situation, which may keep the results from applying to clinical
assessment in the real world, and (d) the lack of generalization of regression
weights obtained in one study (the optimal weighting for one sample may not at
all match that for another sample).
Subsequent research has focused on addressing these concerns with the
literature.
In reviewing a wealth of studies, Faust
(1991) and Dawes, Faust, and Meehl (1989) concluded that expertise has little
effect on the results. In conditions
where clinicians can choose information and collect it in their preferred
manner, expert clinicians still performed no better and typically worse than
actuarial methods. Clearly, expert
clinicians did at times outperform straight statistical models; however, there
was little consistency to this. Expert
clinicians evidenced little agreement among themselves in the predictions made
and even for individual expert clinicians, there was little consistency in
prediction accuracy from one case to the next.
In some contexts, a specific expert did an excellent job and in others
the expert did poorly. It should be
noted that these results applied in areas that the experts viewed themselves as
skilled. In addition, even simple
summing of predictors (as opposed to using more sophisticated regression
equations) outperformed the expert clinicians (Dawes, 1979). If optimal weighting (i.e., regressions) of
information was not adopted, but a strategy of simply adding up the information
was used, this procedure still yielded predictions unmatched in accuracy by the
expert clinicians.
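The "simply adding up" strategy can be sketched in a few lines. This is an illustration under our own assumptions, not a procedure taken from Dawes (1979): each predictor is standardized so that scales with different metrics contribute equally, every predictor is keyed so that higher scores point toward the criterion, and the scores are hypothetical. The function names are ours.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize raw scores to mean 0 and standard deviation 1."""
    m = mean(values)
    sd = pstdev(values)  # population SD; assumes the scores are not all equal
    return [(v - m) / sd for v in values]


def unit_weighted_composite(predictors):
    """Sum the standardized predictors case by case, each with weight 1.

    `predictors` is a list of score lists, one list per predictor, all
    covering the same cases in the same order and all keyed in the same
    direction (higher score = higher expected criterion).
    """
    standardized = [z_scores(p) for p in predictors]
    return [sum(case) for case in zip(*standardized)]


# Hypothetical scores for three cases on two scales with different metrics.
scale_a = [1, 2, 3]      # e.g., a 1-5 rating
scale_b = [10, 20, 30]   # e.g., a 0-100 score
composite = unit_weighted_composite([scale_a, scale_b])
print(composite)
```

No weights are estimated from data, so the composite cannot capitalize on chance in a particular sample, and the same rule is applied to every case.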
Similar results were obtained when
clinician confidence in prediction was taken into account. Indeed many studies have found that expert
clinicians often are no more accurate than less expert clinicians but that they
have greater confidence in their predictions (Friedlander & Phillips, 1984;
Friedlander & Stockton, 1983; Goldberg, 1959, 1968; Oskamp, 1962, 1965;
Rock, et al., 1987). Goldberg (1986)
concluded that in general there is no relation between confidence in the
accuracy of one's prediction and its actual accuracy. Confidence is related to accuracy only when the assessment is
based on validated procedures.
Few areas of research in the field of
psychology have yielded results as unequivocal as these (Meehl, 1986). However, this body of literature has been
viewed as having "almost zilch" (Dawes, 1988) impact on the
practitioner and the field. We as a
profession continue to disregard the importance of this literature and its
implications to practice. It is not
clear whether this eschewal of this actuarial versus clinical prediction debate
rests on our professional inertia, our grandiosity and irrational
overestimation of our abilities as clinicians, or general wishing it weren't
so. Regardless, we cannot continue to
ignore this literature from either a pragmatic or ethical position. Even Holt, who has continually criticized this
literature (Holt, 1958, 1970, 1978, 1991), has noted that "maybe there are
still lots of clinicians who believe that they can predict anything better than
a suitably programmed computer; if so, I agree that it is not only foolish but
at times unethical of them to do so" (Holt, 1986, p. 378).
Attention needs to be placed on
understanding how and why we as clinicians do not do as well when making
predictions. Two possible reasons may
account for our relative inability to generate valid clinical predictions: the specific nature of the clinical
endeavor, and basic problems associated with the fallible process of human
information processing.
Uniqueness of Clinical Practice
Einhorn (1986) proposed (and Dawes, 1991,
later elaborated on) that perhaps a major reason for the clinician's relatively
poor predictive capabilities is the different focus involved. Actuarial methods focus on general trends
and tend to view behavior as probabilistic.
In every specific prediction there is a certain amount of error built
in, but over all cases attempts are made to minimize error. This approach is very different from the
approach of clinicians, who adopt a more deterministic model, wanting to know in
individual cases what the causes are and to minimize error in each individual
case. If only for our own certainty in
dealing with individual cases, we tend to adopt an approach that focuses on
allowing no error. The clinician
desires accuracy on every occasion, presumably because the focus is on
individuals and potential errors appear more salient in
individual cases. Inaccurate
predictions may thus be viewed as too costly to the clinician. A strategy of minimizing risk is the
one adopted: to reduce error in the individual case, a very conservative
approach is taken. However, the
acceptance of individual case error in the probabilistic models of the
actuarial method paradoxically results in less overall error. Thus the clinicians' focus on individual
prediction may actually hurt the ability to predict individual behavior.
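A small worked example may make this paradox concrete. It is an illustration of ours, not an analysis from Einhorn (1986) or Dawes (1991). Suppose an outcome occurs in 70% of cases (a hypothetical figure) and no valid case-level information is available. One predictor always names the more likely outcome and accepts being wrong in the remaining cases. Another tries to get every case right and so predicts the outcome for 70% of cases and its absence for 30%, with no valid basis for choosing which cases go where.

```python
def accuracy_always_likelier(p):
    """Hit rate when the more probable outcome is predicted for every case."""
    return max(p, 1 - p)


def accuracy_probability_matching(p):
    """Hit rate when predictions mirror the base rate but are unrelated
    to which cases actually show the outcome."""
    # Correct when predicting "yes" and it happens, or "no" and it does not.
    return p * p + (1 - p) * (1 - p)


p = 0.7  # hypothetical base rate of the outcome
print(round(accuracy_always_likelier(p), 2))
print(round(accuracy_probability_matching(p), 2))
```

Under these assumptions the first predictor is right 70% of the time and the second only 58% of the time: accepting a known error rate yields fewer errors overall than attempting to eliminate error case by case.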
Clinicians are not alone in their
relative inability to outperform actuarial prediction. Identical results have been obtained in a
variety of professional domains, such as medical diagnosis (Einhorn, 1979),
predicting bank failures (Libby, 1976), stock market fluctuations (Johnson,
1988), internship matching (Johnson, 1988), and predicting graduate student
success (Dawes, 1979). The clinical
versus actuarial prediction results may thus be attributable to the different
foci involved in each model, but more probably they are related to the
information processing capabilities of human decision makers.
Heuristics in Clinical Judgment
Tversky and Kahneman (1974) presented
three heuristics that humans use to aid decision-making: representativeness,
availability, and anchoring. These
heuristics serve to simplify decision-making, making it more efficient;
however, they also can result in inaccurate decisions. The application of these heuristics has
been summarized widely as they pertain to general human decision-making (Dawes,
1988; Tversky & Kahneman, 1981) as well as clinical decision-making (Dawes,
1994; Dumont, 1991, 1993; Dumont & Lecomte, 1987; Salovey & Turk, 1991;
Tracey, 1991; Turk & Salovey, 1985; Wierzbicki, 1993).
Representativeness refers to the extent to which something
matches relevant categories. A frequent
example in clinical practice is the comparison of a specific client with
diagnostic categories. The clinician observes
client behavior and then assesses the extent to which the behavior fits the
different diagnostic types. The
question asked is one of "Is this behavior representative of this
diagnostic type?" If the behavior
is seen as similar to the diagnosis, the client is typically diagnosed as
falling in that diagnostic group. The problem with this decision-making heuristic is that often
other relevant information is ignored.
Three common problems with representativeness are insensitivity to (a)
prior probabilities, (b) sample size, and (c) predictability.
Insensitivity to prior probabilities
refers to the common failure to take account of base-rates in assessing
representativeness. An example of
insensitivity to base-rates is the number of diagnoses of multiple personalities
made by some clinicians.
Individuals with this diagnosis are extremely rare in the population, yet one of
the authors has known some clinicians who have claimed upwards of 10 such
clients in their caseload. Besides the
obvious exaggeration of symptoms necessary to make such a diagnosis, this
diagnosis ignores the very low probability of occurrence. The clinician sensitive to base-rates would
very closely scrutinize any such low base-rate diagnosis.
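Bayes' theorem shows why a low base-rate diagnosis deserves this scrutiny. The sketch below uses hypothetical figures of our own choosing: a clinical sign that is present in 90% of true cases and in 5% of non-cases, evaluated once for a condition with a base rate of 1 in 1,000 and once for a condition with a base rate of 1 in 5. The function and parameter names are illustrative.

```python
def posterior_probability(base_rate, sensitivity, false_positive_rate):
    """P(condition | sign present), computed with Bayes' theorem."""
    true_positives = base_rate * sensitivity
    false_positives = (1 - base_rate) * false_positive_rate
    return true_positives / (true_positives + false_positives)


# Same sign, same accuracy, very different base rates (all values hypothetical).
rare = posterior_probability(base_rate=0.001, sensitivity=0.90,
                             false_positive_rate=0.05)
common = posterior_probability(base_rate=0.20, sensitivity=0.90,
                               false_positive_rate=0.05)
print(round(rare, 3), round(common, 3))
```

With these figures the sign implies the rare condition in fewer than 2 of every 100 clients who show it, but implies the common condition in roughly 82 of every 100. How representative the sign looks is identical in both cases; only the base rate differs.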
Insensitivity to sample size refers to
the frequent equating of information generated from large and small
samples. Obviously comparing an
individual instance to a category generated from a large sample is superior to
comparing it to a category generated from a small sample, but this is frequently
ignored. Humans manifest this insensitivity
in two ways: by overgeneralizing from
our own limited experience and by overgeneralizing from limited
observation. We build our clinical
experience from a small sample of the individual clients we personally have
interviewed yet we frequently err in valuing our own sample as much as some
larger sample. For example, one of the
authors had two clients early in his training who were diagnosed as compulsive
and each of these clients had violent episodes. Upon seeing a third client who was similar but not violent, this
author discounted the diagnosis of compulsive (even though it was very
appropriate given assessment scales and DSM criteria) because the client did
not resemble one part of the idiosyncratic category of compulsive (he was not
violent). The favoring of the limited
personal sample while ignoring the information generated from larger samples
(DSM and assessment scores) demonstrates this insensitivity to sample size. The other manifestation of this
insensitivity is overgeneralizing from a limited sample. We as clinicians frequently make diagnoses
from very limited bits of information and may be prone to ignore input from
other sources that have much more information developed over a longer time.
Insensitivity to predictability is similar
to insensitivity to base-rate in that no account is taken of the probability of
events. Where insensitivity to prior
probabilities refers to our ignoring base-rate information, insensitivity to
predictability refers to ignoring the differential probabilities of future
behavior. Some behaviors and events are
much more likely to occur than others.
Insensitivity to predictability refers to the common pattern of viewing
all predictions as equally likely, or underestimating the relative differences
in predictability. Predicting a highly
probable event (e.g., what the client will be doing tomorrow) is viewed as
equal in predictability to predicting an improbable event (e.g., what the client will be
doing next year). If one is able to
predict tomorrow well, it is common (and inaccurate) to conclude that one can
predict next year as well.
A concept related to insensitivity to
predictability is the common misunderstanding regarding regression to the
mean. Less probable states tend to be followed
by more probable states. The most
probable future event after an extreme event is one that is less extreme. A prediction that an extremely depressed
client will feel less depressed in the next session is much more likely to be correct than
a prediction that the client will become more depressed.
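Regression to the mean can be expressed as a simple linear prediction. The sketch below assumes that scores on the two occasions share the same mean and standard deviation and correlate at a known value; the T-score metric (mean of 50) and the correlation of .70 are hypothetical values chosen for illustration.

```python
def expected_retest_score(observed, population_mean, test_retest_r):
    """Best linear prediction of a second score from a first score.

    Assumes both occasions share the same mean and standard deviation and
    that `test_retest_r` is the correlation between the two occasions.
    """
    return population_mean + test_retest_r * (observed - population_mean)


# A client scoring T = 80 on a depression scale at the first session.
print(round(expected_retest_score(80, 50, 0.7), 1))
# A client scoring at the mean is expected to stay at the mean.
print(round(expected_retest_score(50, 50, 0.7), 1))
```

Under these assumptions the extreme score of 80 is expected to fall to about 71 at the next session without any intervention, which is why improvement after an extreme score is weak evidence that a treatment worked.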
A final aspect of representativeness is
the confusion regarding reverse conditional probabilities, wherein the
probability of one behavior (behavior A) given another (behavior B) is viewed
as equal to the probability of behavior B given behavior A. Clinicians seem especially prone to this
bias. A common piece of clinical lore
is that clients who attempt suicide tend to do so after coming out of a
depression, so that clinicians should be alert to elevations in mood of
depressed clients. The justification
for this pattern is that the client has made a decision to kill himself or herself
and is thus less in turmoil. The
elevation of mood may indeed have occurred in each and every client who has
attempted suicide, but this in no way implies that we should attend to mood changes
as cues to suicide. The probability of
a mood elevation given a suicide attempt may indeed be extremely high, but this
does not equal the probability of a suicide attempt given a mood elevation. A conservative estimate of the base-rate of
the latter (suicide attempt given a mood elevation) is in the ballpark of
.00001 if not lower. The probability of
mood elevations being unrelated to suicide attempts is very high, so
attending to mood elevations as a cue for suicide is not justified. Another common example of confusion over
reverse conditional probability is the usage of past incidents as diagnostic
cues. That current clients who
report certain interpersonal difficulties have also reported past abuse does
not mean that current abuse is related to having these specific interpersonal
difficulties in the future. Another
example is clients with eating disorders.
Many clinicians have noted the perfectionistic tendencies in clients who
have eating disorders and have suggested that perfectionistic tendencies should
be used as a diagnostic sign. However,
the number of individuals who have perfectionistic tendencies who do not
manifest eating disorders far exceeds the number that do.
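The two conditional probabilities can be read off the same table of counts, which makes the asymmetry easy to see. The counts below are hypothetical and chosen only for illustration (a population of 10,000 in which 200 people have the condition and 3,000 show the sign); the function name is ours.

```python
def conditional_probabilities(both, sign_only, condition_only):
    """Return (P(sign | condition), P(condition | sign)) from counts.

    both           -- cases showing the sign and having the condition
    sign_only      -- cases showing the sign without the condition
    condition_only -- cases having the condition without the sign
    """
    p_sign_given_condition = both / (both + condition_only)
    p_condition_given_sign = both / (both + sign_only)
    return p_sign_given_condition, p_condition_given_sign


# Hypothetical counts: sign = perfectionism, condition = eating disorder.
p_sign_given_cond, p_cond_given_sign = conditional_probabilities(
    both=180, sign_only=2820, condition_only=20
)
print(round(p_sign_given_cond, 2), round(p_cond_given_sign, 2))
```

With these counts, 90% of those with the condition show the sign, yet only 6% of those who show the sign have the condition. The numerator is the same in both ratios; the denominators differ because far more people show the sign than have the condition.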
The second heuristic used by human
information processors is that of availability, which refers to the
incomplete nature of our memory search for information. To facilitate speed of memory search, we
focus on only the most salient aspects and frequently ignore other aspects that
may also be relevant. Those aspects
that are more easily brought to mind are viewed as more salient. Availability thus refers to memory access
and this is affected by exposure, mood, imaginability, and category
vividness.
The bias of exposure is one especially
relevant to clinicians. Clinicians use
their past and current clients as comparisons so the quality of any decision
rests upon the completeness of this sample and our ability to access it
completely. Cohen and Cohen (1984)
demonstrated that our clinical samples are extremely biased and
unrepresentative. The clinical caseload
very quickly gets filled with a relatively small number of clients who tend to be
fairly pathological. Given our
familiarity with this group, these are the individuals that are most easily accessed
as a comparison group. Using this group
as a basis of deciding the relative health and pathology of individuals is very
problematic. Cohen and Cohen called
this natural tendency to inappropriately make decisions based on this very
flawed sample the "clinician's illusion." Howard et al. (1989) have documented that indeed for most
clinicians, their caseload is composed of a very few, fairly pathological
individuals. Because of the ease of
retrievability, these few clients serve as an inappropriate basis of clinical
comparison for decision-making.
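One mechanism that would produce such a caseload is that clients who stay in treatment longer are more likely to be present at any given time. The sketch below assumes this mechanism and a steady flow of new clients; the proportions and durations are hypothetical, and the function name is ours.

```python
def caseload_share(new_case_share, duration):
    """Share of each client type in a steady-state caseload.

    new_case_share -- proportion of each type among people entering treatment
    duration       -- average time each type stays in treatment (same units)
    """
    # A type's presence at any moment is proportional to how often it
    # arrives multiplied by how long it stays.
    weights = [share * d for share, d in zip(new_case_share, duration)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical: 80% of new clients stay 2 months, 20% stay 24 months.
shares = caseload_share([0.8, 0.2], [2, 24])
print([round(s, 2) for s in shares])
```

Under these assumptions the long-staying clients make up only one fifth of those who enter treatment but three quarters of the caseload on any given day, so the sample most available to memory is far from representative of the people who seek help.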
Our access to memory is also affected by
our mood. The literature on state
dependent learning and recall (Bower, 1981; Forgas, 1990; Gilligan & Bower,
1984; Isen, 1984) is an example of this mood availability heuristic. Similar to clients who are only able to
access negative life experiences when depressed, we as clinicians suffer from
the same retrievability flaw (Dumont, 1993).
If we are feeling angry with a client, we are most prone to access past
clients to whom we had similar reactions.
This access to past clients to whom we reacted similarly is helpful,
but conversely we would be less likely to remember other clients toward whom we
were not angry and thus perhaps miss important comparisons.
The bias of imaginability refers to the
tendency to retrieve information that is plausible without regard to its
probability. We construct a series of
possible behaviors or plans based to a large extent on our ability to imagine
their occurring. If we can imagine a
particular course of events, it is likely that we will plan accordingly,
regardless of the probability of these events transpiring. We use imaginability as a flawed
indicator of probability of occurrence.
Being able to imagine that a client could commit suicide greatly
increases our assessment that it could occur even though it could be extremely
unlikely. Because we incorrectly
inflate the probability of events due to their imaginability, we often take the
very conservative approach toward prevention even in the face of highly
unlikely events.
Finally, the availability heuristic of
category vividness also serves us well as information processors but can
introduce bias into our decision-making.
Humans tend to retrieve those categories that are most vivid. Aspects that are most memorable in their
extremeness and characteristics are those that are most easily retrieved. Information that is less exciting or
remarkable is the information that tends to be last retrieved. With respect to clinical decision-making,
this aspect of availability ensures that the past clients most likely to be
retrieved are those most disturbed, troubling, or most successful. The norm is much less frequently accessed as
it is less vivid. There is also a
tendency to access abstract information more readily than
specific information. Clinicians, for example, will
be more likely to remember that a client has relationship problems but be unable
to provide the specifics of the difficulties.
So with respect to the availability heuristic, we are most prone to
retrieve information that is vivid (often defined as extreme), abstract (having
few specifics to substantiate the concept), based on our own flawed sample of
clients, and similar to our current mood.
The final major heuristic noted by
Tversky and Kahneman (1974) is that of anchoring, which refers to our
tendency to let initial information and impressions determine subsequent
decision-making. Even when presented
with very different information, humans do not tend to adjust their decisions
much away from the initial starting point, or anchor. For example, if one receives early information from the client or
other sources that the diagnosis of borderline may be appropriate, less pathological
diagnoses may never be examined. Clinicians tend to make decisions rapidly and maintain these
decisions over time (Gauron & Dickinson, 1966; Meehl, 1960; Rubin &
Shontz, 1960). Providing more information
to clinicians does not help alter incorrect clinical decisions (Clavelle &
Turner, 1980) or lead to better decisions, although it does lead to the false
impression that one has made a better decision (Oskamp, 1965, 1982).
The heuristics of representativeness,
availability, and anchoring are important aids in human decision-making in that
they allow for efficient processing of information. Each heuristic, however, carries with it a bias and this bias can
affect the quality of decisions made.
It is these decision-making aids that help account for the relative
superiority of actuarial methods of aggregating information over clinical
methods. Clinicians rely too much on
memory and their own idiosyncratic weighting of information. Actuarial models do not rely on memory, and
their information can be combined in a variety of straightforward manners (even by just averaging
the different scales; Dawes, 1979).
Perhaps the biggest advantages of using psychological tests in
this context are the high-quality information associated with the scales
(established validity) and the normative base (which includes information about
base-rates), which alone will help offset some of the heuristic biases noted.
Information Aggregation
Goldberg (1991) has noted that the superiority
of actuarial combination of psychological test data is related to five
issues. First, as we have previously
discussed, one reason clinicians do not do well is that they ignore the
different validities of the predictors.
Usage of sound psychological test data with uniformly high validity
should obviate this problem. Second, it
is difficult to combine variables if they have different metrics (e.g., how
does one intuitively combine scores from two variables one with scores ranging
from 0 to 100 and another with scores from 1 to 5?). Typically, psychological test data provides information that is
normed and scored in a common metric (e.g., T scores). Third, clinicians typically are not
consistent in their application of predictions made from data; they apply
inconsistent weights to the predictors.
For example, a clinician may weight one predictor scale highly for one
case and in the next case a different scale is weighted highly. Clearly, applying a consistent manner of
combining the data would improve prediction.
Dawes (1979) has demonstrated that even simple averaging of information
or scales is superior to inconsistent clinician combination of
information. Fourth, clinicians are
insensitive to different degrees of redundancy in information. Sines (1959) demonstrated that when
clinicians seek more information they tend to add psychological tests that overlap
highly with those already included.
This usage of overlapping indices does little to increase prediction
accuracy. If added information is
sought to improve a clinical decision, instruments with little overlap to the
current measures should be used. Only
by adding non redundant information will the incremental validity (prediction
above and beyond that already obtained) improve. Finally, clinicians are relatively insensitive to regression
effects as noted above and thus need to take these into account when
interpreting psychological test information.
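The cost of redundancy can be quantified with the standard formula for the squared multiple correlation of two predictors. The sketch below is an illustration under our own assumptions: two tests that each correlate .40 with the criterion (a hypothetical value), compared when the tests overlap heavily and when they overlap little. The function names are ours.

```python
def squared_multiple_correlation(r1, r2, r12):
    """R^2 for predicting a criterion from two standardized predictors.

    r1, r2 -- correlation of each predictor with the criterion
    r12    -- correlation between the two predictors (must not be +1 or -1)
    """
    return (r1**2 + r2**2 - 2 * r1 * r2 * r12) / (1 - r12**2)


def incremental_validity(r1, r2, r12):
    """Gain in R^2 from adding the second predictor to the first."""
    return squared_multiple_correlation(r1, r2, r12) - r1**2


# Both tests correlate .40 with the criterion (hypothetical values).
print(round(incremental_validity(0.4, 0.4, 0.9), 3))  # highly redundant tests
print(round(incremental_validity(0.4, 0.4, 0.1), 3))  # nearly independent tests
```

With these values the redundant second test adds less than one percentage point of explained variance, whereas the nearly independent test adds about thirteen, even though the two tests are equally valid on their own.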
One other important hypothesis testing
strategy needs to be covered. Typically,
clinicians evaluate the accuracy of their interpretations and clinical
decisions in relation to client reactions to these interpretations. Client reactions are a highly fallible
source of feedback on the quality and accuracy of clinical decisions because
clients are not very discriminating in what they accept. The well-known "P. T. Barnum
effect" relates to this lack of critical client evaluation. Snyder, Shenkel, and Lowrey (1977) have
reviewed the extensive literature on the willingness of clients and other individuals to
accept almost any interpretation of psychological test data as an accurate
description of themselves. Most of
what we interpret to our clients will probably be accepted uncritically. It is thus a mistake to treat
client affirmation of our interpretations as strong evidence of their accuracy.
Clinical Training and Experience as Remedies to Poor Clinical Judgment?
Given the abundance of research on our
shortcomings as information processors, how is it that we do not improve
with experience? Would we not expect to
improve our predictive skills as we gain experience and feedback on our
decisions? The research demonstrates
that, no, this is not the case. In
general, novice and expert clinicians do not differ in accuracy (Friedlander
& Phillips, 1984; Friedlander & Stockton, 1983; Goldberg, 1959; Oskamp,
1962, 1965; Taft, 1955) although reviewers of the research have differed
somewhat regarding the strength of this conclusion (e.g., Faust, 1986; Faust
& Ziskin, 1988; Garb, 1989; Rock, Bransford, Maisto, & Morey, 1987;
Wiggins, 1973). For example, one of the
more favorable reviews with respect to clinicians' accuracy was done by Garb
(1989), who concluded that with some sorts of data (biographical, projective
tests, and neuropsychological tests) there was no difference between graduate
students and experienced clinicians; however, there were differences when
biographical, WAIS, and MMPI data were used (which Dawes [1994] attributes to the
quality of the information and to training in how to use this specific
information).
Perhaps a major reason for our failure
as clinicians to learn from experience is the hindsight bias (Wedding
& Faust, 1989), which is the tendency to falsely believe, after an event has transpired, that we would have been able
to predict it accurately. A common term for this bias in the
sports pages is the "Monday morning quarterback," where we
criticize the wisdom of certain plays or strategies from the previous
day and claim with certainty that we would have done otherwise had we but had
the choice. This bias has been
documented repeatedly (Arkes, Faust, Guilmette, & Hart, 1988; Arkes,
Wortmann, Saville, & Harkness, 1981; Fischhoff, 1975) and may help account
for our failure to make use of information.
This hindsight bias creates an "illusion of learning." More experienced clinicians are more
confident of their judgments than are novices, even though the judgments are no
less accurate (Einhorn, 1986; Einhorn & Hogarth, 1978; Friedlander &
Phillips, 1984; Oskamp, 1962, 1965).
Given that we as processors of
information are quite fallible, what is to be done? Should all attempts at clinical decision-making be abandoned in favor of
statistical models? We think that the
answer to this question is a qualified no.
Clearly, statistical models have more predictive accuracy than we do as
clinicians, and computerized test interpretation shows promise of outperforming us
in this area as well (Eyde, Kowal, & Fishburne, 1991). However, clinicians do have the ability to
observe and select relevant information.
For example, Johnson (1988) and Einhorn (1986) demonstrated that
experts' strength was in the selection of cues.
Regardless, clinicians need to be acutely aware of their limitations as
processors of information.
It is the
disease of not listening, the malady of not marking, that I am troubled withal
(Shakespeare, King Henry IVth, Part 2, Act 1, scene ii, line 139)
Clinical Decision-Making Aids
Given our weaknesses as processors of
information, what can we do to minimize these biases? There are several recommendations that have been made (Arkes,
1981; Dumont, 1993; Salovey & Turk, 1991; Wierzbicki, 1993).
1. Adopt a scientific approach to information evaluation and hypothesis
testing (Tracey, 1991). This involves
not confusing the ability to explain with the ability to predict. Clinicians should focus on making explicit
predictions and then assessing the extent to which these predictions are borne
out. This process of making predictions
forces the clinician to be explicit about assumptions and hinders the
"hindsight" bias.
2. Get quality information. Dawes (1994) has noted that clinicians
typically get poor feedback information.
So even if appropriate hypothesis testing were conducted, the
information obtained would provide little corrective feedback. For example, clinicians rarely obtain
information on what has transpired with their clients after termination. Frequently the only cases where feedback is
obtained are those that have not succeeded and return for treatment. Attempts should be made to obtain reliable
and valid information following termination to evaluate the accuracy of predictions
made. Also, care should be taken in
using client acceptance of test interpretations as accuracy feedback because of
the "P. T. Barnum effect."
3. Think Bayesian (Dawes, 1991). This means being aware of base rates as they
relate to the probability of occurrence of different behaviors and to the
predictability of different future behaviors. Thinking Bayesian requires attention to the
full range of individuals, both with and without the disorder of focus. Also, thinking Bayesian means one should not
equate reverse conditional probabilities: the probability of a test sign given a disorder is not the probability of the disorder given the sign.
The ability to think Bayesian requires knowledge of simple Bayesian
probability rules, but it also requires extensive knowledge of population
probabilities. Psychological tests help
provide some of the information on base rates and predictability of behavior.
4. Consider alternative hypotheses and
engage in disconfirming hypothesis testing.
As noted, humans tend to seek confirming evidence and this strategy is
not beneficial for accuracy of decision-making. Specify disconfirming evidence and then seek this information
out.
5. Rely less on memory, because memory is subject to
several biases in processing, especially availability.
6. Use the best information and methods.
Clinicians need to choose high quality information. This information should include the best
psychological instruments. If the
information obtained is not sufficient and the clinician requires more,
nonredundant scales should be chosen. For example, if one is interested in
assessing depression in a client and the result from the depression scale used is not
clear, using another similar depression scale will add little information. Also, the best methods should be used, and this
means more actuarial combination of information (even straight averaging of
scales) because this aggregation is consistent. As noted, valuable clinical impressions gleaned from interviewing can and should be
added to the equation, but for best prediction the clinician should not rely on
his or her own combination of information because this tends to be
idiosyncratic and inconsistent, often drawing on only one or two variables rather than all of them.
7. Recognize personal biases as they
pertain to clinical decision-making, especially as they pertain to age, gender,
class, and race.
8. Be aware of the effects of
regression. Less likely states tend to
be followed by more likely states. A
severely depressed person will tend to feel less depressed tomorrow.
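The recommendation to think Bayesian (point 3 above) can be made concrete with a small worked calculation. The sketch below applies Bayes' rule to a hypothetical test sign. The base rate, sensitivity, and specificity are illustrative assumptions, not values from any instrument or study discussed here, and the function name is invented for the example.

```python
def probability_of_disorder_given_sign(base_rate, sensitivity, specificity):
    """Bayes' rule: P(disorder | sign) from P(sign | disorder) and the base rate."""
    # Clients who have the disorder and show the sign.
    true_positives = base_rate * sensitivity
    # Clients who do not have the disorder but show the sign anyway.
    false_positives = (1 - base_rate) * (1 - specificity)
    return true_positives / (true_positives + false_positives)


# Assumed values for illustration only.
base_rate = 0.05     # 5% of the client population has the disorder
sensitivity = 0.90   # P(sign | disorder)
specificity = 0.90   # P(no sign | no disorder)

p_disorder_given_sign = probability_of_disorder_given_sign(
    base_rate, sensitivity, specificity
)

# The two conditional probabilities are not interchangeable.
print(f"P(sign | disorder) = {sensitivity:.2f}")
print(f"P(disorder | sign) = {p_disorder_given_sign:.2f}")
```

With these assumed numbers, a sign shown by 90% of affected clients points to the disorder in only about one third of the clients who show it, because unaffected clients are far more numerous.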
An attempt has been made to sensitize the
reader to the many problems involved in clinical decision-making. We as clinicians and as humans are clearly
fallible decision-makers. Psychological
tests provide an avenue to improve our decision accuracy. Care, however, must still be taken in their
selection and interpretation. Faust
(1991) has written a wonderful tongue-in-cheek description of how clinicians
would be different had we heeded Meehl's recommendations back in 1954 regarding
our foibles as clinical decision-makers.
We ignored them then and continue in many ways to do so now.
References
Abramowitz, C. V., & Dokecki, P. R.
(1977). The politics of clinical judgment: Early empirical returns. Psychological
Bulletin, 84, 460-476.
Abramowitz, C. V., & Murray, J.
(1983). Race effects in
psychotherapy. In J. Murray & P. R.
Abramson (Eds.), Bias in psychotherapy (pp. 215-255). New York: Academic.
Arkes, H. R. (1981). Impediments to
accurate clinical judgment and possible ways to minimize their impact. Journal of Consulting and Clinical
Psychology, 49, 323-330.
Arkes, H. R., Faust, D., Guilmette, T.
J., & Hart, K. (1988). Eliminating the hindsight bias. Journal of Applied Psychology, 73,
305-307.
Arkes, H. R., Wortmann, R. L., Saville,
P., & Harkness, A. R. (1981). The
hindsight bias among physicians weighing the likelihood of diagnosis. Journal of Applied Psychology, 66,
252-254.
Broverman, I. K., Broverman, D. M.,
Clarkson, F. E., Rosenkrantz, P. S., & Vogel, S. R. (1970). Sex-role stereotypes and clinical judgments
of mental health. Journal of
Consulting and Clinical Psychology, 34, 1-7.
Bower, G. (1981). Mood and memory. American Psychologist, 36, 129-148.
Cantor, N., Smith, E., French, R.,
& Mezzich, J. (1980). Psychiatric
diagnosis as prototype categorization. Journal
of Abnormal Psychology, 89, 181-193.
Carroll, J. S., Wiener, R. L., Coates, D.,
Galegher, J., & Alibrio, J. J. (1988).
Evaluation, diagnosis, and prediction in parole decision-making. Law and Society Review, 17, 199-228.
Chapman, L. J., & Chapman, J. P.
(1969). Illusory correlation as an
obstacle to the use of valid diagnostic signs.
Journal of Abnormal Psychology, 73, 193-204.
Clavelle, P. R., & Turner, A. D.
(1980). Clinical decision-making among
professionals and paraprofessionals. Journal
of Clinical Psychology, 33, 133-152.
Cohen, P., & Cohen, J. (1984).
The clinician's illusion. Archives
of General Psychiatry, 41, 1178-1182.
Dahlstrom, W. G. (1993). Tests: Small samples, large consequences. American Psychologist, 48, 393-399.
Davidson, C. V., & Abramowitz, S. I.
(1980). Sex bias in clinical judgment:
Later empirical returns. Psychology
of Women Quarterly, 4, 377-395.
Dawes, R. M. (1979). The robust beauty of improper linear models
in decision making. American
Psychologist, 34, 571-582.
Dawes, R. M. (1986). Representative thinking in clinical
judgment. Clinical Psychology
Review, 6, 425-441.
Dawes, R. M. (1988). Rational choice in an uncertain world.
San Diego: Harcourt, Brace, Jovanovich.
Dawes, R. M. (1991). Probabilistic versus causal thinking. In D. Cicchetti & W. M. Grove (Eds.), Thinking
clearly about psychology: Vol. 1. Matters of public interest (pp.
185-216). Minneapolis: University of Minnesota Press.
Dawes, R. M. (1994). House of cards: Psychology and psychiatry built on myth. New York: Free Press.
Dawes, R. M., Faust, D., & Meehl, P.
E. (1989). Clinical and actuarial judgment.
Science, 243, 1668-1674.
DeVaul, R. A., Jersey, F., Chappell, J.
A., Carver, P., Short, B., & O'Keefe, (1987). Medical school performance of initially rejected students. Journal of the American Medical
Association, 257, 47-51.
Dumont, F. (1991). Expertise in psychotherapy: Inherent liabilities of becoming
experienced. Psychotherapy, 28,
422-428.
Dumont, F. (1993). Inferential heuristics in clinical problem
formulation: Selective review of their
strengths and weaknesses. Professional
Psychology: Research and Practice, 24, 196-205.
Dumont, F., & Lecomte, C.
(1987). Inferential processes in
clinical work: Inquiry into logical
errors that affect diagnostic judgments.
Professional Psychology: Research and Practice, 18, 433-438.
Einhorn, H. J. (1979). Expert measurement and mechanical
combination. Organizational Behavior
and Human Performance, 13, 171-192.
Einhorn, H. J. (1986). Accepting error to make less error. Journal of Personality Assessment, 50,
387-395.
Einhorn, H. J., & Hogarth, R. M.
(1978). Confidence in judgment: Persistence of the illusion of
validity. Psychological Review, 85,
395-416.
Eyde, L. D., Kowal, D. M., &
Fishburne, F. J. (1991). The
validity of computer-based test interpretations of the MMPI. In T. B. Gutkin & S. L. Wise (Eds.), Buros-Nebraska
Symposium on Measurement and Testing:
The computer and the decision-making process (Vol. 4, pp.
75-124). Hillsdale, NJ: Erlbaum.
Faust, D. (1986). Research on human judgment and its
application to clinical practice. Professional Psychology: Research and Practice, 17, 420-430.
Faust, D. (1991). What if we had really listened? Present reflections on altered pasts. In D. Cicchetti & W. M. Grove (Eds.), Thinking
clearly about psychology: Vol. 1. Matters of public interest (pp.
185-216). Minneapolis: University of Minnesota Press.
Faust, D., & Ziskin, J. (1988). The expert witness in psychology and
psychiatry. Science, 241, 31-35.
Fischhoff, B. (1975). Hindsight ≠ foresight: The effect of outcome
knowledge on judgment under uncertainty.
Journal of Experimental Psychology:
Human Perception and Performance, 1, 288-299.
Forgas, Affective influences on
individual and group judgments. European
Journal of Social Psychology, 20, 441-453.
Friedlander, M. L., & Phillips, S. D.
(1984). Preventing anchoring errors in
clinical judgment. Journal of
Consulting and Clinical Psychology, 52, 366-371.
Friedlander, M. L., & Stockton, S. J.
(1983). Anchoring and publicity effects
in clinical judgment. Journal of
Clinical Psychology, 39, 637-643.
Garb, H. N. (1989). Clinical judgment, clinical training, and
professional experience. Psychological
Bulletin, 105, 387-396.
Gauron, E. G., & Dickinson, J. K. (1966). Diagnostic decision-making in psychiatry 2:
Diagnostic styles. Archives of
General Psychiatry, 14, 233-237.
Gilovich, T. (1991). How we know what isn't so: The fallibility of human reason in everyday
life. New York: Free Press.
Gilligan, S. G., & Bower, G. H.
(1984). Cognitive consequences of
emotional arousal. In C. Izard, J.
Kagan, & R. Zajonc (Eds.), Emotions, cognition, and behavior (pp.
547-588). New York: Cambridge.
Goldberg, L. R. (1959). The effectiveness of clinicians'
judgments: The diagnosis of organic
brain damage from the Bender-Gestalt. Journal
of Consulting Psychology, 23, 25-33.
Goldberg, L. R. (1965). Diagnostician versus diagnostic signs: The diagnosis of psychosis versus neurosis
from the MMPI. Psychological
Monograph, 79.
Goldberg, L. R. (1968). Simple models or simple processes? Some research on clinical judgment. American Psychologist, 23, 483-496.
Goldberg, L. R. (1970). Man versus model of man: A rationale plus some evidence for a method
of improving on clinical inferences. Psychological
Bulletin, 73, 422-432.
Goldberg, L. R. (1986). Some informal explorations and ruminations
about graphology. In B. Nevo (Ed.), Scientific
aspects of graphology (pp. 281-293).
Springfield, IL: Charles C. Thomas.
Goldberg, L. R. (1991). Human mind versus regression equation: Five contrasts. In D. Cicchetti & W. M. Grove (Eds.), Thinking clearly
about psychology: Vol. 1. Matters of public interest (pp. 173-184). Minneapolis: University of Minnesota Press.
Granberg, D., & Brent, E.
(1983). When prophecy bends: The
preference-expectation link in U.S. presidential elections, 1952-1980. Journal of Personality and Social
Psychology, 45, 477-491.
Greenwald, A. G., Pratkanis, A. R.,
Leippe, M. R., & Baumgardner, M. H. (1986). Under what conditions does theory obstruct research
progress? Psychological Review, 93,
216-229.
Haverkamp, B. E. (1993). Confirmatory bias in hypothesis testing for
client-identified and counselor self-generated hypotheses. Journal of Counseling Psychology, 40,
303-315.
Holt, R. R. (1958). Clinical and statistical prediction: A reformulation and some new data. Journal of Abnormal and Social
Psychology, 56, 1-12.
Holt, R. R. (1970). Yet another look at clinical and statistical
prediction: Or is clinical psychology
worthwhile? American Psychologist,
25, 337-349.
Holt, R. R. (1978). A historical survey of the
clinical-statistical controversy. In R.
R. Holt (Ed.), Methods in clinical psychology: Vol. 2. Prediction and research (pp. 3-18). New York: Plenum.
Holt, R. R. (1986). Clinical and statistical prediction: A retrospective and would be integrative
perspective. Journal of Personality
Assessment, 50, 376-386.
Holt, R. R. (1991). Judgment, inference, and reasoning in
clinical perspective. In D. C. Turk
& P. Salovey (Eds.), Reasoning, inference, and judgment in clinical
psychology (pp. 233-250). New
York: Free Press.
Isen, A. M. (1984). Toward understanding the role of affect in
cognition. In R. S. Wyer & T. K.
Srull (Eds.), Handbook of social cognition (Vol. 3, pp. 179-236). Hillsdale, NJ: Erlbaum.
Howard, K. I., Davidson, C. V.,
O'Mahoney, M. T., Orlinsky, D. E., & Brown, K. P. (1989). Patterns of psychotherapy utilization. American Journal of Psychiatry, 146,
775-778.
Johnson, E. J. (1988). Expertise and decision under
uncertainty: Performance and
process. In M. T. H. Chi, R. Glaser,
and M. J. Farr (Eds.), The nature of expertise (pp. 209-228). Hillsdale,
NJ: Erlbaum.
Kahneman, D., & Tversky, A.
(1982). Intuitive prediction: Biases
and corrective procedures. In D.
Kahneman, P. Slovic, & A. Tversky (Eds.), Judgment under
uncertainty: Heuristics and biases
(pp. 414-421). New York: Cambridge.
Langer, E. J., & Abelson, R. P.
(1974). A patient by any other name...:
Clinical group differences in labeling bias.
Journal of Consulting and Clinical Psychology, 42, 4-9.
Libby, R. (1976). Man versus model
of man: Some conflicting evidence. Organizational Behavior and Human
Performance, 16, 1-12.
Lopez, S. R. (1989). Patient variable biases in clinical
judgment: Conceptual overview and
methodological considerations. Psychological
Bulletin, 106, 184-203.
Meehl, P. E. (1954). Clinical versus statistical
prediction: A theoretical analysis and
a review of the evidence.
Minneapolis: University of
Minnesota Press.
Meehl, P. E. (1957). When shall we use our heads instead of the
formula? Journal of Counseling
Psychology, 4, 268-273.
Meehl, P. E. (1959).
Meehl, P. E. (1960). The cognitive activity of the
clinician. American Psychologist, 15,
19-27.
Meehl, P. E. (1973). Why I do not attend case conferences. In P. E. Meehl, Psychodiagnosis: Selected papers (pp. 225-302). Minneapolis, MN: University of Minnesota
Press.
Meehl, P. E. (1986). Causes and effects of my disturbing little
book. Journal of Personality
Assessment, 50, 370-375.
Milstein, R. M., Wilkinson, L., Burrow,
G. N., & Kessen, W. (1981). Admission
decisions and performance during medical school. Journal of Medical Education, 56, 77-82.
Nisbett, R. E., & Ross, L.
(1980). Human inference: Strategies and shortcomings of social
judgment. New York: Prentice-Hall.
Oskamp, S. (1962). The relationship of
clinical experience and training methods to several criteria of clinical
prediction. Psychological
Monographs: General and Applied, 76
(No. 547), 1-28.
Oskamp, S. (1965). Overconfidence in case-study judgments. Journal of Consulting Psychology, 29,
261-265.
Oskamp, S. (1982). Overconfidence in case-study judgments. In D. Kahneman, P. Slovic, and A. Tversky
(Eds.), Judgment under uncertainty:
Heuristics and biases (pp. 287-293). New York: Cambridge.
Rock, D. L., Bransford, J. D., Maisto, S.
A., & Morey, L. (1987). The study
of clinical judgment: An ecological
approach. Clinical Psychology
Review, 7, 645-661.
Rubin, M., & Shontz, F. C.
(1960). Diagnostic prototypes and
diagnostic processes of clinical psychologists. Journal of Consulting Psychology, 24, 234-239.
Rosenhan, D. L. (1973). On being sane in insane places. Science, 179, 250-258.
Salovey, P., & Turk, D. C. (1991). Clinical judgment and decision-making. In C. R. Snyder & D. R. Forsyth (Eds.), Handbook of social
and clinical psychology: The health
perspective (pp. 416-437). New
York: Pergamon.
Sawyer, J. (1966). Measurement and
prediction, clinical and statistical. Psychological
Bulletin, 66, 178-200.
Sears, D. O., & Whitley, R. E.
(1973). Political persuasion. In I. deS. Pool, W. Schramm, F. W. Frey, N.
Maccoby, & E. B. Parker (Eds.), Handbook of communication (pp.
253-289). Chicago: Rand-McNally.
Sines, L. K. (1959). The relative
contribution of four kinds of data to accuracy in personality assessment. Journal of Consulting Psychology, 23,
483-492.
Snyder, C. R., Shenkel, R. J., &
Lowrey, C. R. (1977). Acceptance of
personality interpretations: The
"Barnum effect" and beyond. Journal
of Consulting and Clinical Psychology, 45, 104-114.
Snyder, M., & Campbell, B. (1980). Testing hypotheses about other people: The role of the hypothesis. Personality and Social Psychology
Bulletin, 6, 421-426.
Snyder, M., & Cantor, N. (1979). Testing hypotheses about other people: The use of historical knowledge. Journal of Experimental Social
Psychology, 15, 330-342.
Strohmer, D. C., Shivy, V. A., &
Chiodo, A. L. (1990). Information
processing strategies in counselor hypothesis testing: The role of selective memory and expectancy. Journal of Counseling Psychology, 37,
465-472.
Taft, R.
(1955). The ability to judge
people. Psychological Bulletin, 52,
1-23.
Temerlin, M. K. (1968). Suggestion effects in psychiatric
diagnosis. Journal of Nervous and
Mental Disease, 147, 349-353.
Tracey, T. J. (1991). Counseling research
as an applied science. In C. E. Watkins
and L. S. Schneider (Eds.), Research in counseling (pp. 1-31). Hillsdale, NJ: Erlbaum.
Turk, D. C., & Salovey, P.
(1985). Cognitive structures, cognitive
processes, and cognitive-behavior modification: II. Judgments and inferences of
the clinician. Cognitive Therapy and
Research, 9, 19-33.
Tversky, A., & Kahneman, D.
(1974). Judgment under uncertainty:
Heuristics and biases. Science, 185,
1124-1131.
Tversky, A., & Kahneman, D.
(1981). The framing of decisions and
the psychology of choice. Science,
211, 453-458.
Wedding, D., & Faust, D. (1989). Clinical judgment and decision making in
neuropsychology. Archives of
Clinical Neuropsychology, 4, 233-265.
Whitley, B. E. (1979). Sex roles and psychotherapy: A current appraisal. Psychological Bulletin, 86,
1309-1321.
Wierzbicki, M. (1993). Issues in Clinical Psychology: Subjective versus objective approaches. Boston:
Allyn & Bacon.
Wiggins, J. (1973). Personality and prediction: Principles of personality assessment. Reading, MA: Addison-Wesley.
Wiggins, J. (1981). Clinical and statistical prediction: Where are we and where do we go from
here? Clinical Psychology Review, 1,
3-18.
Zeldow, P. B. (1978). Sex differences in psychiatric evaluation
and treatment: An empirical
review. Archives of General
Psychiatry, 35, 89-93.
Zubin, J. (1956). Clinical versus actuarial prediction: A pseudo-problem. In Proceedings, 1955 invitational conference on testing
problems (pp. 107-128). Princeton,
NJ: Educational Testing Service.