Overall Assessment of the Speaker’s Experience of Stuttering (OASES): Documenting multiple outcomes in stuttering treatment

October 28, 2020 - Dr. J. Scott Yaruss, Robert W. Quesal

In recent years, there has been a growing discussion about the importance of evidence-based
practice (EBP) in the field of speech–language pathology (e.g., ASHA, 2005). Although the
changes associated with implementing EBP affect many aspects of the field, the need for clinicians
to document the results of their intervention and to select treatment approaches based on a
meaningful body of literature has been particularly apparent in the field of fluency disorders (e.g.,
Blood, 1993; Blood & Conture, 1998; Bothe, 2003, 2004; Conture, 1996; Conture & Guitar, 1993;
Cordes & Ingham, 1998; Finn, 2003; Ingham, 2003; Ingham & Riley, 1998; Langevin & Kully,
2003; Onslow, 2003; Power, 2002; St. Louis & Westbrook, 1987; Starkweather, 1993; Thomas &
Howell, 2001; Yaruss, 1998a, 1998b, 2001, 2004; Yaruss & Quesal, 2004a, 2004b).

One of the central tenets of an evidence-based approach to clinical practice is the measurement
and evaluation of treatment outcomes (Frattali, 1998a; Olswang, 1998; Sackett, Straus,
Richardson, Rosenberg, & Haynes, 2000). In fluency disorders, this has been addressed through
numerous studies that have documented the effects of treatment on factors such as clinician-rated
frequency of disfluencies, naturalness of speech, and speaking rate (e.g., Ingham, 1984;
Ingham & Onslow, 1985; Onslow, Costa, Andrews, & Harrison, 1996; Schiavetti & Metz, 1997).
Results from these studies have clearly demonstrated that treatment can effectively minimize
these observable symptoms of the disorder.

Still, as many authors have noted, there is more to the stuttering disorder than just the surface
features (e.g., Cooper, 1993; Manning, 1999, 2001; Murphy, 1999; Shapiro, 1999; Sheehan, 1970;
Starkweather & Givens-Ackerman, 1998; Van Riper, 1982; Yaruss, 1998a; Yaruss & Quesal,
2004a). Perhaps the most well-known account is Sheehan’s (1970) “iceberg” analogy, which
depicted the fact that much of the speaker’s experience of the stuttering disorder occurs “under
the surface.” Manning (1999) described the many ways in which stuttering can affect an individual’s
life, while highlighting the fact that many of the changes associated with treatment occur not
only “under the surface” but also “over time.” Similarly, Murphy (1999) emphasized the strong
emotions, such as shame and guilt, that develop in many people who stutter. In addition, numerous
autobiographical accounts (Bobrick, 1995; Carlisle, 1985; Jezer, 2003; Johnson, 1930) and
collected life stories of people who stutter (e.g., Ahlbach & Benson, 1994; Hood, 1998; St. Louis,
2001) have underscored the fact that the overall impact of stuttering on people’s lives involves
more than just the production of observable speech disfluencies. Indeed, even authors who have
focused their research primarily on the observable characteristics of stuttering have acknowledged
the importance of considering a speaker’s self-reports and perceptions about the disorder (Ingham
& Cordes, 1997).

The need for broad-based treatment outcomes research in stuttering

Although many authors have shown that stuttering involves more than the behaviors that can be observed on the surface, there are very few studies demonstrating the effects of treatment on these “intrinsic” (Manning, 2001) factors (see reviews in Bloodstein, 1995; Cordes, 1998).
The paucity of treatment outcomes research examining the totality of the stuttering disorder
makes it difficult for clinicians to apply a strictly evidence-based approach to selecting treatment
options (Quesal, Yaruss, & Molt, 2004; Yaruss & Quesal, 2002), for much of the treatment
that has been recommended by authorities over the years has not been subjected to empirical
research (Cordes, 1998). Thus, there is a compelling and immediate need for research on the
outcomes of treatment that address aspects of the stuttering disorder beyond the surface speech
behaviors.

There are a number of potential explanations for the relative lack of treatment outcomes
studies examining the less-observable components of stuttering. One possible explanation is the
fact that it is more difficult to define and measure the intrinsic aspects of a speaker’s experience of
stuttering (e.g., the speaker’s beliefs and feelings about stuttering, the impact of stuttering on the
speaker’s life) than it is to measure changes in speech behavior (e.g., the number of repetitions
or prolongations in a person’s speech). Nevertheless, several instruments for measuring broader
aspects of the stuttering disorder do exist (e.g., Ammons & Johnson, 1944; Andrews & Cutler,
1974; Brutten & Shoemaker, 1974; Crowe, Di Lollo, & Crowe, 2000; Erickson, 1969; Lanyon,
1967; Ornstein & Manning, 1985; Riley, Riley, & Maguire, 2004; Watson, 1988; Woolf, 1967;
Wright & Ayre, 2000). These instruments examine a wide range of factors, including the speaker’s
fluency in different speaking situations, the speaker’s confidence that he or she will be able to
maintain fluency in different situations, the emotional and cognitive reactions that speakers have
to stuttering in different speaking situations, the speaker’s opinions or attitudes about stuttering,
and other factors. Together, these instruments can give clinicians and researchers a more complete
picture of the speaker’s experience of the stuttering disorder, and the application of such tools in
the study of stuttering treatment outcomes could help to provide needed information about the
changes people experience as a result of therapy.

With a few notable exceptions (e.g., Boberg & Kully, 1994), however, such instruments have
not been widely used in stuttering treatment outcomes research. There are a number of possible
reasons for this. For example, some authors have asserted that existing “attitude scales” are simply
a reflection of a speaker’s fluency in certain situations (Ulliana & Ingham, 1984). Others have
stated that they are not convinced of the importance or measurability of emotional and cognitive
aspects of the disorder (see Ingham, 2003). Another explanation may be the difficulty associated
with administering multiple assessments to capture the range of behaviors, emotions, and consequences
associated with stuttering, for no one of the aforementioned instruments assesses the
totality of the disorder. Also, it is not always clear why certain items, factors, or constructs are
addressed in existing instruments, for not all of the instruments are based on a clearly defined
theoretical framework. Regardless of the specific explanation that is offered, the fact remains that
the literature contains numerous studies documenting reductions in stuttering that speakers can
achieve when using various methods of controlling fluency but significantly fewer studies documenting
changes speakers achieve in other, less-observable aspects of the disorder. If clinicians
and researchers wish to adhere to the principles of evidence-based practice when selecting broad-based
treatment approaches, then more comprehensive documentation of such changes is clearly
needed.

A framework for describing broad-based treatment outcomes

Although much of the necessary research has not yet been completed, documenting the broad-based
outcomes of treatment for a complex disorder such as stuttering is certainly not impossible. Indeed, most disorders (not just in speech–language pathology, but across the entire field of health
and rehabilitation science) involve far more than just the symptoms that can easily be observed,
counted, or classified. Accordingly, for the past several decades, there has been a growing emphasis
on the need to document factors such as changes in emotional reactions, functional outcomes, and
quality of life for a wide variety of disorders (see reviews in Granger & Gresham, 1984; Nagi,
1969; Pope & Tarlov, 1991). Rather than only documenting the fact that an individual may have
a certain disease or disorder, researchers and clinicians in the rehabilitation sciences have also
focused on developing ways to document the overall impact of those disorders on the individual’s
life.

As part of this effort, the World Health Organization (WHO) has developed two frameworks
for categorizing the totality of complex disorders, including not only the diagnosis (i.e., what
is wrong with the person), but also what that diagnosis means for the person’s life. The first of
these frameworks, the International Classification of Impairments, Disabilities, and Handicaps
(ICIDH; WHO, 1980, 1993), sought to describe the consequences that disorders could have on an
individual’s life. The ICIDH included three components (WHO, 1980, pp. 25–29): impairment,
or “loss or abnormality of psychological, physiological, or anatomical structure or function;”
disability, or “any restriction or lack . . . of ability to perform an activity in the manner or within
the range considered normal for a human being;” and handicap, or “a disadvantage for a given
individual, resulting from an impairment or a disability that limits or prevents the fulfillment of
a role that is normal . . . for that individual.”

Because of its emphasis on the individual’s experience of disorders, the ICIDH was widely
hailed as a framework that could be used to document treatment outcomes throughout the fields
of health and rehabilitation (e.g., Brandsma, Heerkens, & van Ravensberg, 1995; Chamie, 1990;
de Kleijn-de Vrankrijker, 1995; Halbertsma, 1995; Schuntermann, 1996; Yaruss, 1998a, 2001).
Still, there were a number of shortcomings that hindered the usefulness of the ICIDH for some
disorders (Badley, 1987; Thuriaux, 1995). For example, the original ICIDH failed to account for
differences between individuals that might exacerbate or mitigate their experience of disability or
handicap (e.g., coping mechanisms, attitudes, resources, support from the environment). Several
authors also raised concerns about the complexity of the three-tiered model, in particular highlighting
confusion surrounding the definition of the terms disability and handicap (e.g., Brandsma,
Lakerveld-Heyl, Van Ravensberg, & Heerkens, 1995).

To account for these and other issues, the WHO developed a revised framework, the International
Classification of Functioning, Disability, and Health (ICF; WHO, 2001). In the ICF, the
WHO simplified the descriptive framework to just two primary levels and expanded the system
to also address contextual factors that were omitted in the ICIDH. The ICF describes all health-related
experiences in terms of: (a) the structure and function of the body and (b) the activities
a person might engage in during their participation in daily life. When a person experiences
difficulties with body function or structure, they are termed impairments, and when a person
experiences difficulties with activities or participation, they are termed activity limitations or participation
restrictions. To account for individualized experiences of different people, the WHO
also added a parallel set of contextual factors to the model. These personal and environmental
factors describe the context, either within a person or surrounding the person, that could affect the
individual’s ability to function effectively. The resulting framework has the capacity to describe
all aspects of an individual’s health experience, including both normal and disordered functioning.
As such, the ICF holds considerable promise for helping clinicians and researchers consider
the wide range of changes that might occur during the course of treatment for disorders such as
stuttering.

Documenting multiple outcomes in stuttering treatment

In order to facilitate and support broad-based treatment outcomes research in stuttering, Yaruss
(1998a, 1998b, 2001) and Yaruss and Quesal (2004a, 2004b) adapted the WHO’s original ICIDH
and current ICF frameworks to the study of stuttering. Fig. 1 presents a schematized version of
the Yaruss and Quesal (2004a, 2004b) adaptation, which depicts how the stuttering disorder can
be viewed in terms of several interacting components:

• the presumed etiology or underlying cause(s) of the disorder;
• the impairment in body function, indicated by the observable characteristics of stuttering;
• the speaker’s affective, behavioral, and cognitive reactions to stuttering;
• the effects of the environment on stuttering, indicated by the difficulty in different speaking
situations and the reactions of others;
• the overall impact of stuttering on the speaker’s life, indicated by limitations in communication
activities and restrictions in participation in daily life.

By considering all of these components of the disorder, each drawn directly from the WHO’s
ICF model, clinicians can gain a greater understanding of not only the observable characteristics
of the disorder, but also the experience of stuttering from the perspective of the speaker.

Because this model describes the totality of the stuttering disorder in the context of the widely
accepted ICF framework, it provides an ideal foundation for the development of a comprehensive
measurement instrument that can be used both in daily treatment and in outcomes research. The
purpose of this manuscript is to present such an instrument, the Overall Assessment of the Speaker’s
Experience of Stuttering (OASES). To establish the value of the OASES as a tool for supporting
stuttering treatment research, the paper includes a detailed description of the development of the
OASES, along with an explanation of the testing and validation of individual assessment items,
a review of scoring procedures, and a summary of reliability and validity testing, involving more
than 300 people who stutter, that was conducted with various forms of the instrument throughout
its development.

It is important to note at the outset that the OASES was designed to supplement existing
clinician-administered measures of the stuttering impairment. Thus, the OASES can be used
alongside widely used measures such as the Stuttering Severity Instrument (Riley, 1994) or real-time
frequency counting procedures (see reviews in Conture, 2001; Yaruss, 1997, 1998c) to
provide a more complete account of the speaker’s overall experience of the stuttering disorder.
It is hoped that the availability of this type of broad-based measurement tool will facilitate the
collection of more comprehensive data about the outcomes of stuttering treatment from the perspective
of the individual who stutters and provide needed information supporting the use of
evidence-based practice throughout the field of fluency disorders.

Development and validation of the OASES

Development and validation of the OASES involved several stages, in which test items were
individually evaluated, compared to one another, and refined. At the outset of the project, several
key principles were defined in order to guide the development process and to ensure that the
resulting product would provide a useful tool for supporting treatment outcomes research.

First, it was determined that the final instrument should consist of a pencil-and-paper measure
that could be completed by people who stutter in a typical clinical setting. Thus, questions had
to be clearly and simply worded, with minimal ambiguity, yet still maintain a sufficient degree
of overlap between items and sections to ensure validity of results. Second, it was determined
that the final product should be able to be administered and scored in a reasonable period of
time to facilitate ease of use. This required that items be relatively limited in number, with a
small number of selection options, and items should be organized in such a way that clinicians
would be able to easily calculate scores without needing to refer to a complicated scoring procedure
or key. Third, it was determined that the test items should describe the experiences of
a broad cross-section of people who stutter. Thus, items that exhibited strong ceiling or floor
effects (indicating that they were relevant only for a small percentage of people who stutter)
were minimized, reworded, or eliminated in favor of more general questions that captured the
common experiences of people who stutter. Finally, and perhaps most importantly, the OASES
was designed to maintain a strong link to the WHO’s theoretical frameworks for describing
health experience. The instrument grew from an initial set of three individual tests, each examining
a separate aspect of the WHO’s framework, to a single broad-based tool designed to assess
the entire stuttering disorder. The following sections provide details about the development and
testing of the initial trial instruments, as well as the final version of the OASES, in order to
establish the reliability and validity of the instrument and support its use for treatment outcomes
research.

4.1. Initial trial instruments: SRS, FCS, and QOL-S

To facilitate the analysis of the constructs defined by the WHO, the earliest versions of the
OASES consisted of a set of three individual tests, each of which targeted a specific component of
the WHO model. Because the work was begun prior to the publication of the current ICF framework,
the original trial instruments were based on the WHO’s original ICIDH. Following Yaruss’s
(1998a) adaptation of the ICIDH, the three specific components of the stuttering disorder that were
targeted through the early trials were: (a) the speaker’s perceptions about stuttering (reactions),
(b) the speaker’s difficulties communicating in daily situations (disability and environment), and
(c) the overall impact of stuttering on the speaker’s quality of life (handicap).

4.1.1. Speaker’s reactions to stuttering (SRS)

The first instrument was designed to gather information about how speakers felt about their
speech, the actions they engaged in because of their stuttering, and their thoughts and perceptions
about their communication difficulties. Initial drafts of the SRS were developed based on a review
of existing instruments that examined people’s reactions to stuttering (e.g., Brutten & Shoemaker,
1974; Erickson, 1969; Ornstein & Manning, 1985; Woolf, 1967). Following Cooper (1993) and
others (e.g., Watson, 1988), these factors were described in terms of affective (e.g., negative
feelings and emotions such as embarrassment, anxiety, and shame), behavioral (e.g., tension and
struggle, avoidance), and cognitive (e.g., low self-esteem and negative self-evaluation) reactions.
For the affective reactions section, a preliminary list of more than 75 different emotion labels was
developed. This list was reviewed by focus groups involving more than 30 people who stutter, as
well as by more than 20 specialists in the treatment of stuttering, to determine which seemed most
relevant to people who stutter. Ultimately, a set of 20 specific “feeling” terms was selected for
further evaluation. Similar procedures were followed in the development of the other components
of the SRS draft: lists of behavioral and cognitive reactions were evaluated based on feedback from
people who stutter and stuttering specialists, then refined and compiled to create the complete
trial version of the SRS. This initial draft of the SRS contained a total of 100 items.

4.1.2. Functional communication and stuttering (FCS)

The second trial instrument addressed the disability of stuttering, as well as the impact of
the environment on the person’s speech, by examining the difficulties people experienced when
communicating in relevant situations in their lives. In some ways, this test was similar to some
previously existing “attitude inventories” in that it examined the role that different environments or
situations might play in affecting the speaker’s ability to communicate effectively. Nevertheless,
the “functional communication and stuttering” (FCS) scale differed from prior instruments in that
it did not focus primarily on situational factors and it did not seek information about the speaker’s
fluency in those situations. Instead, the FCS examined how much overall difficulty the speakers
experienced in general when communicating in those situations (Frattali, Thompson, Holland,
Wohl, & Ferketic, 1995). The FCS contained 35 items examining the speaker’s challenges communicating
in three key situations: (a) talking to other people in general, (b) communicating
at work, and (c) interacting in social situations. As with the SRS, the initial lists of situations
were refined based on feedback from focus groups of people who stutter and speech–language
pathologists specializing in the diagnosis and treatment of stuttering.

4.1.3. Quality of life and stuttering (QOL-S)

The third trial instrument examined the potential disadvantages people might experience
because of stuttering (i.e., the handicap, according to the ICIDH) by assessing the impact of
stuttering on speakers’ overall quality of life. Quality of life is a broad measure that encompasses
the speaker’s satisfaction with his or her ability to communicate, the impact on the speaker’s sense of
well-being and satisfaction with life, and the effect of stuttering on factors such as the speaker’s
health and perceived ability to achieve goals in life. The initial items in the trial version were
based, in part, on items found in other QOL instruments (e.g., Schipper, Clinch, & Powell, 1990;
Schumaker, Anderson, & Czajkowski, 1990; Testa & Simonson, 1996; WHOQOL, 1995; see also
Frattali, 1998b; Kaplan, Anderson, & Ganiats, 1993; McEwan, 1993). These constructs were then
modified and expanded, based on input from the focus groups and specialist reviewers, so they
applied more specifically to stuttering. The draft QOL-S instrument contained 30 items.

4.2. Initial pilot studies: item analysis

The early SRS, FCS, and QOL-S drafts were tested in a series of two pilot studies. The first pilot
study involved 39 participants, many of whom were personal contacts of the authors. This
initial study served primarily as an opportunity to further explore the opinions of the focus groups.
Specific issues that were examined in the first pilot study included the wording of test items, the
format and layout of the test forms, and the overall time required for test completion. Analyses
involved only basic descriptive analyses of central tendency and dispersion; more extensive revisions
were deferred until analysis of the second, more comprehensive pilot study. In the second
pilot study, the first complete versions of SRS, FCS, and QOL-S were distributed to approximately
85 people who stutter through clinical and personal contacts, and with the assistance of
the Research Committee of the National Stuttering Association. The instruments were distributed
with a brief demographic questionnaire, a respondent comment form, and the S-24 scale (Andrews
& Cutler, 1974) to provide a means of evaluating concurrent validity. A total of 71 packets were
returned (84% return rate), and these responses were used to support more detailed item analyses.

The distributions of each of the items in the trial instruments were examined individually to
ensure that they did not exhibit floor or ceiling effects, limited variability (i.e., a high proportion
of individuals providing the same answer), or non-normal distributions (i.e., noticeably skewed
positively or negatively; highly leptokurtic, suggesting that most respondents provided “neutral”
answers; or highly platykurtic, suggesting that responses were nearly random or uniform across
respondents). In some cases, items were retained even if the analyses did not indicate a normal
distribution because the items addressed constructs that the focus group discussions had previously
revealed to be particularly relevant for a subset of respondents (e.g., items pertaining to feelings of
guilt or questions about talking to children). Next, pairwise correlation analyses were conducted
to ensure that items were not redundant. Items that exhibited Pearson product–moment correlation
coefficients of 0.90 or above with any other item were eliminated, reworded, or combined. At the
same time, reliability coefficients were calculated within each instrument to ensure that related
items were examining related constructs. Cronbach’s alpha ranged from 0.93 to 0.96 within each
instrument, indicating a high level of internal consistency. Finally, Pearson product–moment
correlations were calculated between total scores for the different tests to ensure that the tests
were not too similar to one another and that they were actually examining different aspects of
the speaker’s experience of the stuttering disorder (r ranged from 0.76 to 0.83). Thus, results
from these analyses demonstrate that each of the three instruments was focused on a single and
unique construct representing a specific aspect of the stuttering disorder as defined by the WHO
frameworks.
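The redundancy screen and internal-consistency check described above can be illustrated with a minimal sketch. The response data below are hypothetical and do not come from the pilot studies, and the function names are illustrative only; the 0.90 threshold is the one reported in the text, and the alpha formula is the standard one based on sample variances.

```python
from itertools import combinations
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def cronbach_alpha(items):
    """Cronbach's alpha; `items` holds one list of responses per item."""
    k = len(items)
    # Total score for each respondent (sum across items).
    totals = [sum(resp) for resp in zip(*items)]
    item_var = sum(variance(col) for col in items)
    return (k / (k - 1)) * (1 - item_var / variance(totals))


def redundant_pairs(items, threshold=0.90):
    """Index pairs of items whose correlation is at or above the threshold."""
    return [(i, j) for i, j in combinations(range(len(items)), 2)
            if pearson_r(items[i], items[j]) >= threshold]


# Hypothetical data: 3 items (rows) answered by 6 respondents (columns).
responses = [
    [1, 2, 3, 4, 5, 3],
    [1, 2, 3, 4, 5, 4],
    [2, 1, 4, 3, 5, 2],
]
alpha = cronbach_alpha(responses)      # internal consistency of the item set
flagged = redundant_pairs(responses)   # candidate items to drop, reword, or merge
```

In this toy data set, the first two items would be flagged as candidates for elimination, rewording, or combination, while the third would be retained.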

The initial pilot studies proved to be very helpful for guiding the future development of the
instruments. For example, the initial version of the SRS examined both the frequency with which
individuals experienced the various feelings and emotions that were sampled as well as the
strength of those reactions to see if there was a difference in participants’ responses to these ways
of viewing the occurrence of negative reactions. Pilot analyses revealed that respondents had
difficulty consistently differentiating between the constructs of frequency and strength, so later
versions focused only on how frequently the feelings were experienced. Also, the initial drafts
of the SRS included both “positive” and “negative” feeling terms, though item analyses revealed
that all of the “positive” terms were highly correlated with one another and strongly negatively
correlated with the “negative” feeling terms. Furthermore, their presence introduced confusion
in the scoring process, so they were eliminated in later versions of the instrument. The initial pilot
analysis for the FCS revealed that respondents had difficulty interpreting questions about how much stuttering affected their ability to perform various activities. Specifically, it seemed that participants
were responding to how much they stuttered in those situations (e.g., Ulliana & Ingham,
1984), rather than how much difficulty the stuttering introduced in their ability to communicate in
those situations. As a result, all items in this section were reworded to examine overall “difficulty”
of communication in various situations. Early versions of these tests also examined different
anchoring terms and different scoring values for the anchoring terms (i.e., some items used higher
point values to indicate negative impact, while other items used lower point values to indicate
negative impact). Pilot analyses revealed that respondents were not consistent in following the
anchoring terms, and this led to contradictions in their responses. Thus, later versions used a
consistent set of anchoring terms, all with higher values indicating greater negative impact.
Finally, the early versions of the scales tested both 7-point Likert scales with a variety of different
anchoring terms, as well as 5-point versions with more consistent anchoring terms to enhance
ease-of-use, and results of the initial pilot study were used to modify the wording of the anchoring
terms. Comparison of the 7-point and 5-point pilot versions of the tests revealed that the 5-point
versions maintained a sufficient degree of variability while improving ease-of-use and enhancing
reliability. (For more information about the early development of the OASES, see Yaruss, 2001.)

4.3. Initial pilot studies: validity and reliability

The SRS, FCS, and QOL-S were designed to examine aspects of the stuttering disorder that had
not been thoroughly evaluated in standardized or criterion-referenced instruments. As a result, it
was difficult to identify an existing set of measures for establishing validity. The S-24 (Andrews
& Cutler, 1974) was selected to evaluate concurrent validity even though the SRS, FCS, and
QOL-S examine a broader set of constructs than the S-24, and it was anticipated that correlations
would be moderate in nature. Analyses yielded correlations ranging from 0.68 to 0.83 between
the three instruments (SRS, FCS, and QOL-S) and the S-24, with the highest correlation exhibited
between the SRS and the S-24. This was expected because the SRS contained the highest number
of items that are similar in nature to the “attitudinal” items included in the S-24. Content validity
was established through responses from the focus groups, narrative responses from participants,
and input from expert reviewers as described above to ensure that specific items were relevant
for a large cross-section of people who stutter. Furthermore, the item analyses described above
helped to ensure that all items were relevant to the experiences of people who stutter.

Finally, a preliminary assessment of test–retest reliability for these trial instruments was also
conducted to ensure the stability of responses. Five individuals completed the SRS, FCS, QOL-S,
and S-24 on two occasions, separated by approximately 2 weeks. Responses were compared based
on mean difference scores, correlations, and t-tests. Analyses confirmed many of the findings
from the item analyses, specifically that respondents had some difficulty with the wording of
certain items on the SRS and FCS scales, though reliability for most of the items was judged
to be quite high (point-to-point agreement greater than 80%; responses within ±1 for 97% of
responses). Those few items that contributed to lower test–retest reliability scores were eliminated
or reworded, and further reliability tests were conducted to ensure the overall reliability of the
final instrument (see below).
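The two agreement figures reported above can be illustrated with a short sketch. It assumes that "point-to-point agreement" means an identical response on both occasions and that "within ±1" means the two responses differ by at most one scale point; the paired responses are hypothetical, and the function name is illustrative only.

```python
def agreement_rates(first, second, tolerance=1):
    """Return (exact, within_tolerance) agreement proportions for paired responses."""
    pairs = list(zip(first, second))
    exact = sum(a == b for a, b in pairs) / len(pairs)
    close = sum(abs(a - b) <= tolerance for a, b in pairs) / len(pairs)
    return exact, close


# Hypothetical responses from one person to 10 items on a 5-point scale,
# collected on two occasions.
time1 = [1, 2, 3, 4, 5, 3, 2, 4, 1, 5]
time2 = [1, 2, 3, 5, 5, 3, 2, 4, 3, 5]
exact, close = agreement_rates(time1, time2)
```

Items that repeatedly fall outside the tolerance across respondents would be the natural candidates for rewording or removal.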

4.4. First integrated instrument: CASES

As a result of these pilot analyses, redundant or confusing items were eliminated or reworded.
In some cases, entire sections were modified to ensure that the subsections were more tightly integrated. Changes were also made to accommodate the revised structure and purpose of the
WHO’s ICF, which had been published during the completion of the pilot analyses described in
the previous section. Terminology from the ICIDH was replaced by the ICF terminology, and some
items were further refined to ensure consistency with the language of the new framework (see
Yaruss & Quesal, 2004a). To facilitate further analyses, the resulting instruments were combined
into a single tool encompassing all three of the previously described instruments. In addition, a
new general information section was added in order to gather information about the speaker’s
self-perceptions and self-ratings of fluency (i.e., the impairment in body function as defined in
the ICF), as well as speech naturalness, ease of communication, knowledge about stuttering,
and general opinions and attitudes about stuttering. This addition allowed the tool to serve as a
“Comprehensive Assessment of the Speaker’s Experience of Stuttering” or CASES. It is worth
noting that there was some expected overlap between the general concepts considered in this new
general information section and the specific topics examined in more detail in other components
of the instrument, though this was done intentionally to help clinicians and researchers evaluate
the reliability of the respondents’ answers.

The trial version of the CASES contained a total of 100 items on 5-point Likert scales, organized
into four sections: General Information, Reactions to Stuttering, Communication in Daily
Situations, and Quality of Life. Together, these four sections maintained a tight integration with the
WHO’s ICF: the general information section addressed the speaker’s experience of impairment
in body function, the reactions section addressed personal contextual factors, the communication
abilities section addressed both limitations in daily activities and the impact of environmental contextual
factors, and the quality of life section addressed restrictions in the person’s ability to participate
in life.

4.5. Additional pilot analyses and final revisions

To examine the reliability and validity of the integrated CASES instrument, substantial additional
pilot analyses were completed using a larger sample of respondents than the earlier pilot
studies described above. Every attempt was made to obtain a respondent group that would represent
a broad and varied cross-section of people who stutter. Unfortunately, however, it is difficult,
if not impossible, to obtain a truly random sample of people who stutter, for respondents in any
such study must be identified either through clinical contacts or through some other means that
would require self-selection.

Accordingly, participants for this pilot study were drawn, in part, from a mailing list of 500
individuals provided by the National Stuttering Association (NSA). It could be argued that selecting
a portion of the study population from the NSA mailing list might introduce some bias
into the findings. Still, a more careful consideration reveals that it is unlikely that a consistent
bias would be introduced through the use of this mailing list as a starting place for identifying
study participants. Research has shown that individuals on the NSA mailing list come from
many different backgrounds (McClure & Yaruss, 2003) and have widely varying opinions about
stuttering (Yaruss, Quesal, & Murphy, 2002). Individuals on the NSA mailing list also report
having vastly different experiences, both in and out of therapy (Yaruss, Quesal, Reeves, et
al., 2002). Furthermore, a recent survey of the NSA membership (McClure & Yaruss, 2003)
revealed that most of the individuals on the NSA mailing list are not actually members of a
local support group, and that the majority are not active participants in NSA events. Indeed,
based on available demographic information for this study, nearly 60% of the respondents identified
through the NSA mailing list were not active participants in the organization. To further minimize the likelihood that the sample would be affected by selection biases, the respondent
population also included other individuals who stutter not affiliated in any way with the NSA.
These individuals were recruited from around the country through professional and clinical contacts.
In all, more than 550 people who stutter received the trial CASES as part of this third pilot
study.

Ultimately, 183 forms were obtained for data analysis. Of these, six were excluded because
respondents were less than 18 years of age, and four were excluded because the forms were not
completed at all or because only a few items were marked. (If a respondent did not complete at
least one of the four sections in its entirety, that form was discarded.) As a result, data analyses
were based upon responses from 173 adults who stutter (ages 18–70 years) who completed and
returned the draft CASES forms. Although this represents a relatively low overall response rate
(approximately 30%), it was determined that the total number of responses was still sufficient to
permit descriptive analyses of the results for individual test items, as well as to allow adequate
power in the analyses of potential relationships between test sections.

Prior to the data analyses, each of the forms was examined individually to ensure that the form
was completed correctly. Occasionally, respondents skipped entire sections of the test, so those
sections were excluded (N = 6 sections out of a possible 692 sections [173 respondents × 4 sections
per form] or 0.87% of the total number of sections). In addition, some respondents provided the
same answer to all of the items within one of the four major sections of the instrument (e.g.,
answering all 30 of the items relating to quality of life with a 1 or a 5). Although this could
reflect the true nature of the individual’s responses (e.g., for a person who perceives absolutely no
negative impact of stuttering or who experiences no difficulty in communicating in any speaking
situations), it was determined that such sections should be discarded—only for the purposes of
assessing the reliability of the test items—because of the possibility that the respondent was
simply not paying careful attention to completing that section of the form. A total of 12 sections
out of a possible 692 sections (1.7% of the total number of sections) were discarded for this
reason. Note that the entire form was not discarded; only the section with questionable responses
was excluded from analysis. In total, 18 individual sections (2.6% of all sections) were discarded, and 14
respondents (8.1% of all respondents) had at least one section of their forms discarded because
they did not complete a section or because they provided the same answer to all items within the
section.
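
The screening rules described above can be illustrated with a short sketch. The function name `screen_section` and the use of `None` to mark a blank item are assumptions made for illustration; the sketch also assumes that a section counts as "uniform" only when every item was answered with the same value. The final lines simply re-derive the percentages reported in the text.

```python
from typing import Optional, Sequence


def screen_section(responses: Sequence[Optional[int]]) -> str:
    """Classify one section of one form as 'skipped', 'uniform', or 'keep'.

    None marks an item the respondent left blank (an illustrative convention).
    """
    answered = [r for r in responses if r is not None]
    if not answered:
        return "skipped"  # the entire section was left blank
    if len(answered) == len(responses) and len(set(answered)) == 1:
        return "uniform"  # every item answered with the same value
    return "keep"


def percent(part: int, whole: int) -> float:
    """Express part/whole as a percentage."""
    return 100.0 * part / whole


TOTAL_SECTIONS = 173 * 4  # 173 respondents x 4 sections per form = 692

print(round(percent(6, TOTAL_SECTIONS), 2))   # 0.87 (skipped sections)
print(round(percent(12, TOTAL_SECTIONS), 1))  # 1.7  (uniform sections)
print(round(percent(18, TOTAL_SECTIONS), 1))  # 2.6  (all discarded sections)
print(round(percent(14, 173), 1))             # 8.1  (respondents affected)
```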

As with the initial draft versions, detailed individual item analyses were conducted to confirm
that the items in the CASES did not exhibit floor or ceiling effects, limited variability, or non-normal
distributions. Note that for the vast majority of the items, this was the third time such data
were analyzed, for the items had already been studied in the first two pilot studies of the SRS,
FCS, and QOL. As a result, it was anticipated that most items would show the desired statistical
properties. All but 4 of the 100 items exhibited ranges from the minimum possible score of 1 to
the maximum possible score of 5, with a mean score across items ranging from 1.7 to 3.5 (S.D.
ranging from 0.75 to 1.6). Skewness values ranged from −0.5 to 1.5 (mean = 0.3; S.D. = 0.4) and
kurtosis ranged from −1.4 to 1.7 (mean = −0.6, S.D. = 0.5). None of the items exhibited unusually
high or low values of kurtosis. Items that exhibited skewness values greater than 1.0 (N = 5 of the
100 items) were examined individually to identify the reason the distribution was non-normal. It
was determined that the non-normal distributions were expected for each of these five specific
items because they targeted factors that were relevant for a subset of respondents, but not for all
respondents (e.g., the difficulty experienced in educational settings, difficulties communicating
with children, etc.). Thus, these particular items were retained in the instrument even though
responses were not as evenly distributed as other items.
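
As an illustration of the item-level screening described above, the following sketch computes skewness and kurtosis for a single item and flags the item for individual review when skewness exceeds 1.0. The text does not state which estimators were used; this sketch assumes simple moment-based skewness and excess kurtosis (so that a normal distribution yields 0 for both), and the function names are hypothetical.

```python
from statistics import fmean
from typing import Sequence, Tuple


def skew_and_excess_kurtosis(scores: Sequence[float]) -> Tuple[float, float]:
    """Return moment-based skewness and excess kurtosis for one item."""
    n = len(scores)
    mean = fmean(scores)
    m2 = sum((x - mean) ** 2 for x in scores) / n  # second central moment
    if m2 == 0:
        raise ValueError("item has no variability")
    m3 = sum((x - mean) ** 3 for x in scores) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in scores) / n  # fourth central moment
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def needs_review(scores: Sequence[float], skew_limit: float = 1.0) -> bool:
    """Flag an item whose skewness exceeds the limit named in the text."""
    skew, _ = skew_and_excess_kurtosis(scores)
    return skew > skew_limit


# Hypothetical responses to one item that is relevant to few respondents.
item = [1, 1, 1, 1, 5]
print(needs_review(item))  # True
```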

Pairwise correlations were calculated among all items to ensure that items were not redundant.
As before, the maximum acceptable pairwise correlation for items within a given section of the CASES was
set at 0.90. Pearson product–moment correlation coefficients ranged from 0.01 to 0.89, so
no further revisions to the items were necessary for this reason. Also, as before, correlations
were calculated for total scores between the four sections of the CASES to ensure that the different
sections were not too similar. Pearson r values ranged from 0.66 to 0.85, with the highest
correlation observed between total scores on reactions and quality of life sections. Again, this
finding confirmed that the individual sections of the instrument were not simply evaluating the
same constructs. Finally, Cronbach’s alpha coefficient, calculated independently for each of the
four sections of the instrument, revealed strong internal reliability (α ranged from 0.92 to 0.97)
and a high degree of probability that the items within each section were addressing the same
constructs.
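
The two checks described in this paragraph (redundancy between items and internal consistency within a section) can be sketched as follows. This is a minimal illustration rather than the authors' analysis code: it assumes complete data for every respondent, uses the standard formula for Cronbach's alpha with sample variances, and treats an item pair as redundant when r exceeds 0.90. All function names are hypothetical.

```python
import math
from statistics import fmean, variance
from typing import List, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def redundant_pairs(items: Sequence[Sequence[float]],
                    threshold: float = 0.90) -> List[Tuple[int, int, float]]:
    """List item pairs (i, j, r) whose correlation exceeds the threshold."""
    flagged = []
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            r = pearson_r(items[i], items[j])
            if r > threshold:
                flagged.append((i, j, r))
    return flagged


def cronbach_alpha(items: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; items[i][j] is respondent j's score on item i."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # per-respondent totals
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))
```

In this layout each inner list holds one item's scores across respondents, so a section with 30 items and 173 respondents would be passed as 30 lists of 173 values.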

These analyses confirmed that all of the test items exhibited appropriate reliability and validity
to support the use of the instrument in both clinical and research applications. Again, however,
based on the analyses, as well as feedback from expert reviewers and respondents, a few
very minor wording changes were made to specific items on the instrument. The resulting final
version of the revised instrument was renamed the Overall Assessment of the Speaker’s Experience
of Stuttering (OASES). The next section describes in more detail the final format of the
OASES instrument, the scoring procedures that were developed to allow the use of the instrument
in treatment outcomes research, and the final reliability and validity analyses that were
completed.

5. The Overall Assessment of the Speaker’s Experience of Stuttering

The final version of the OASES consists of 100 items, each scored on a Likert scale ranging
from 1 to 5. The instrument, which requires approximately 20 min to complete, is organized into
four sections: (a) General Information, (b) Reactions to Stuttering, (c) Communication in Daily
Situations, and (d) Quality of Life. Section I (General information) contains 20 items pertaining
to speakers’ perceived fluency and speech naturalness, knowledge about stuttering and stuttering
therapy, and overall perceptions about stuttering in general. Section II (Reactions) contains 30
items examining speakers’ affective, behavioral, and cognitive reactions. Section III (Communication
in daily situations) contains 25 items assessing the degree of difficulty speakers have when
communicating in general situations, at work, in social situations, and at home. Note that these
items specifically examine the communication difficulty speakers experience in these situations,
not their fluency in the situations. Section IV (Quality of life) contains 25 items about how much
stuttering interferes with speakers’ satisfaction with their ability to communicate, their relationships,
their ability to participate in their lives, and their overall sense of well-being. A sample
completed OASES, with actual data from one of the respondents in the validity testing sample, is
shown in Appendix A.

5.1. Scoring procedures

For each item on the OASES, response scales are organized so that higher scores indicate
a greater degree of negative impact associated with stuttering and lower scores indicate less
negative impact. Although this organization could possibly introduce some responder bias, it was
determined that this was necessary to maintain ease of scoring for the practitioner using this test
in a typical clinical setting.

Because not all of the items on the OASES apply to all individuals who stutter (e.g., not all
respondents will have children), respondents must skip those items that are not relevant for them.
As a result, scoring could not simply be based on a sum of the number of points in each section.
(Note that in the example in Appendix A, the respondent did not complete all items for Parts III and
IV.) To ensure that skipped items would not affect total scoring, two scoring rules were established.
First, it was determined that an individual section should only be scored if the respondent has
completed at least one-half of the items in that section. Second, a straightforward scoring procedure
was developed based on the calculation of a ratio of the total number of points in the respondents’
answers divided by the total number of points possible for the items that were completed.
Calculating this “impact score” involves three steps: First, the clinician calculates the number
of points in the respondent’s answers on a section-by-section basis. Second, the clinician counts
the total number of items the respondent completed on each section and multiplies the values by
5 (since each item is based on a 5-point scale) to obtain the number of possible points in each
section. Third, the clinician divides the number of points in the respondent’s scores by the number
of possible points. This value is then multiplied by 100. For example, if a respondent answered
18 out of the 20 items in Section I, and his total number of points for Section I was 67, then the
maximum possible score for Section I would be 18×5 = 90, and his ratio for Section I would be
67/90 = 0.744. When this value is multiplied by 100, the result is an impact score of 74.4. Because
of this scoring procedure, all impact scores range from a minimum score of 20 (if the speaker
answers 1 for every item within a section) up to a maximum of 100 (if the speaker answers 5
for every item within the section). This is true regardless of the number of items a speaker might
skip and regardless of whether the clinician is scoring just one section at a time or the entire
instrument. To facilitate the calculation of impact scores, a scoring summary sheet is provided as
the last page of the OASES.
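
The three-step calculation can be expressed compactly in code. The following is a minimal sketch of the scoring rules as described above, not an official scoring tool; the function name `impact_score` and the use of `None` for skipped items are illustrative assumptions. It reproduces the worked example of 67 points on 18 completed items.

```python
from typing import Optional, Sequence


def impact_score(responses: Sequence[Optional[int]]) -> Optional[float]:
    """Return the impact score for one section, or None if it is unscorable.

    Each response is an integer from 1 to 5; None marks a skipped item.
    """
    answered = [r for r in responses if r is not None]
    # Rule 1: score a section only if at least half of its items are complete.
    if not answered or 2 * len(answered) < len(responses):
        return None
    points = sum(answered)        # step 1: points in the respondent's answers
    possible = 5 * len(answered)  # step 2: completed items x 5 points each
    return 100.0 * points / possible  # step 3: ratio multiplied by 100


# Worked example from the text: 18 of 20 items completed, 67 points in total.
section_one = [4] * 15 + [3, 2, 2] + [None, None]
print(round(impact_score(section_one), 1))  # 74.4
```

Because the denominator counts only completed items, the score is bounded by 20 and 100 however many items are skipped, which is the property noted above.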

5.2. Impact ratings

Many clinicians and researchers are accustomed to providing severity ratings as part of their
description of clients or study participants. Data from the OASES do not yield an index of stuttering
severity, per se. Rather, scores provide an indication of the impact of stuttering on various aspects
of the speaker’s life. In order to develop an “impact rating” that would be analogous to a traditional
severity rating, analyses were conducted on the pilot data to determine if scores could be divided
into logical groups representing differing degrees of stuttering impact (mild, mild-to-moderate,
moderate, moderate-to-severe, severe).

Based on the distribution of the pilot data across participants, these groupings were defined
based on the degree of variability around the mean, specifically: more than 1.5 S.D. below the mean
(mild impact), between 1.5 S.D. and 0.5 S.D. below the mean (mild-to-moderate impact), between
0.5 S.D. below the mean and 0.5 S.D. above the mean (moderate impact), 0.5 S.D. to 1.5 S.D. above
the mean (moderate-to-severe impact), and more than 1.5 S.D. above the mean (severe impact).
Separating the data in this fashion resulted in a relatively normal distribution of impact ratings for
the pilot data, with the “mild” and “severe” impact ratings each accounting for 8% of the overall
distribution of the respondents’ data, the “mild-to-moderate” and “moderate-to-severe” impact
ratings each accounting for 26% of the data, and the “moderate” impact rating accounting for
32% of the data. Note that some rounding was used in the calculation of specific cut-off values
of the impact scores to facilitate scoring and to account for the fact that the distributions of the
impact scores were not exactly the same for each of the four sections of the instrument. These
impact ratings and the corresponding impact scores are shown in Table 1 and in the scoring form included in Appendix A.

Table 1
OASES impact ratings and cut-off scores

Impact rating         Impact scores
Mild                  20.0–29.9
Mild-to-moderate      30.0–44.9
Moderate              45.0–59.9
Moderate-to-severe    60.0–74.9
Severe                75.0–100

Also, to facilitate scoring, the same impact score cut-off values were
used for each individual section of the OASES, as well as for the overall impact score provided
for the entire instrument. (As shown in Appendix A, this overall score is obtained by combining
scores across all four components of the instrument, with each of the different sections assigned
equal weighting.)
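
For illustration, the cut-off values in Table 1 can be applied with a simple lookup. The sketch below assumes that each band is closed at its lower bound and open at its upper bound (so a score of exactly 30.0 is rated mild-to-moderate); the function name `impact_rating` is hypothetical.

```python
# Upper bounds (exclusive) for each band, taken from Table 1.
CUTOFFS = [
    (30.0, "Mild"),
    (45.0, "Mild-to-moderate"),
    (60.0, "Moderate"),
    (75.0, "Moderate-to-severe"),
]


def impact_rating(score: float) -> str:
    """Map an impact score (20 to 100) to its Table 1 impact rating."""
    if not 20.0 <= score <= 100.0:
        raise ValueError("impact scores range from 20 to 100")
    for upper_bound, label in CUTOFFS:
        if score < upper_bound:
            return label
    return "Severe"  # 75.0 and above


print(impact_rating(74.4))  # Moderate-to-severe
```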

Although the impact ratings may be viewed as a type of severity index, it is still important to
consider all of the responses to the individual items in the instrument rather than focusing just
on the label provided by the impact rating. This is true both for treatment planning and for the
evaluation of treatment outcomes. For this reason, impact scores are calculated separately for each
of the four sections of the instrument, as well as for the instrument as a whole. If used cautiously,
these impact ratings can provide a means of communicating general information about a speaker’s
experience of stuttering to others.

5.3. Final evaluation of reliability

After the wording of all items was finalized and the procedure for calculating impact scores
and ratings was developed, the OASES was subjected to one final round of reliability tests. Recall
that very few changes were made to the CASES instrument in the creation of the OASES, so
these final reliability analyses were conducted simply to confirm the findings of all of the pilot
studies conducted on the earlier drafts and to assess test–retest reliability of the impact scores.
The OASES was distributed to 20 adults who stutter through professional and clinical contacts.
Fourteen respondents (70% response rate; mean age of respondents = 45.4 years; S.D. = 9.26 years;
range = 22–65 years) completed the instrument on two occasions separated by 10–14 days, with
no intervening therapy during the retest period.

Based on this sample, test–retest reliability was examined in several ways. First, point-to-point
agreement was assessed individually for each item on the instrument. Note that it was
expected that not all of the responses would be identical across administrations, for the difference
between a score of a 4 or a 5 on a particular item could be affected by a variety of factors (e.g.,
the individual’s experience during the time between the first and second testing, the individual’s
“mood” or attitude on the day of testing, etc.). Thus, the more important question for the test–retest
analysis was whether the variability between test administrations would yield differences that
could affect the overall results provided by the instrument (e.g., the impact scores or impact
ratings).

Analyses revealed that participants’ scores on individual items were identical for 77.7% of
all of the 1399 responses provided by the 14 participants and within ±1 for 98.5% of all
responses. Thus, the vast majority of responses indicated strong consistency from one test administration to the next. These results were confirmed by findings of very small mean differences
for individual responses across administrations for each of the four parts of the OASES
(mean differences for each part of the OASES across all 14 participants ranged from 0.21 to
0.28; standard error ranged from 0.05 to 0.07). Next, impact scores were compared for each
of the four parts of the instrument, as well as for the overall instrument. Analyses revealed
a high degree of test–retest reliability for impact scores, with mean differences ranging from
2.1 to 3.0 (standard error ranged from 1.98 to 2.65). Pearson product–moment correlations for
impact scores obtained from the first and second administration of the instrument ranged from
0.90 to 0.97 for each of the 14 respondents. Finally, impact ratings were compared. Because
of the high degree of reliability for the impact scores, comparison of the impact ratings also
revealed strong reliability between initial and follow-up administrations of the OASES. None
of the impact ratings for the 14 participants changed from the first to second administration of
the instrument. Thus, it is clear from the detailed description of the development and validation
of the instrument described above that the final version of the OASES exhibits strong reliability
and validity that are sufficient to support its use in the evaluation of stuttering treatment
outcomes.
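
The item-level agreement figures reported above (identical responses and responses within ±1) can be computed as in the following sketch. It assumes that items skipped on either administration are left out of the comparison, which the text does not specify; the function name and the sample data are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def point_to_point_agreement(
    first: Sequence[Optional[int]],
    second: Sequence[Optional[int]],
) -> Tuple[float, float]:
    """Return (% identical, % within +/-1) across two administrations."""
    # Compare only items answered on both occasions (an assumption).
    pairs = [(a, b) for a, b in zip(first, second)
             if a is not None and b is not None]
    if not pairs:
        raise ValueError("no items were answered on both occasions")
    identical = sum(a == b for a, b in pairs)
    within_one = sum(abs(a - b) <= 1 for a, b in pairs)
    return 100.0 * identical / len(pairs), 100.0 * within_one / len(pairs)


# Hypothetical responses from one participant on five items.
test_scores = [1, 2, 3, 4, 5]
retest_scores = [1, 2, 4, 4, 3]
print(point_to_point_agreement(test_scores, retest_scores))  # (60.0, 80.0)
```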

6. Discussion

The purpose of this paper has been to present a new instrument for measuring the overall
impact of stuttering through assessment of multiple aspects of the disorder. The Overall
Assessment of the Speaker’s Experience of Stuttering (OASES) seeks to accomplish this goal
by focusing on the speaker’s experience of stuttering, as defined, in part, by the WHO’s ICF
framework. Specific factors that are addressed include: the speaker’s self-perception of fluency,
stuttering, and speech naturalness, as well as the speaker’s knowledge about the disorder and
overall attitudes about stuttering (Section I); the speaker’s affective, behavioral, and cognitive
reactions to the disorder (Section II); the functional communication difficulties experienced by
the speaker in different communication environments (Section III); and the speaker’s judgment
of how stuttering affects overall quality of life (Section IV). As such, the OASES is
designed to supplement commonly used clinician-based measures of speech fluency and naturalness
in order to describe the experience of the stuttering disorder from the perspective of the
speaker.

As noted above, numerous other instruments have been developed over the years to assess
various aspects of the stuttering disorder. Each of these instruments has unique strengths and
specific areas of focus. For example, the Speech Situation Checklist (SSC; Brutten & Shoemaker,
1974) provides information about a client’s speech-related anxiety in different speaking situations.
The Self-Efficacy Scale for Adults Who Stutter (SESAS; Ornstein & Manning, 1985) provides
information about a speaker’s confidence that he or she will be able to enter and maintain fluency in
different situations. Broadly defined communication “attitudes” are examined in Erickson’s S-Scale
(Erickson, 1969) and the S-24 (Andrews & Cutler, 1974), and more specific affective, behavioral,
and cognitive aspects of stuttering are assessed in the Inventory of Communication Attitudes (ICA;
Watson, 1988). More recently, Riley et al. (2004) presented the Subjective Screening of
Stuttering (SSS), which examines the speaker’s self-rated stuttering severity, internal or external
locus of control, and avoidance of words or situations. Finally, one of the most notable recent
additions to the list of instruments designed to assess various aspects of the stuttering disorder is the
Wright and Ayre Stuttering Self-Rating Profile (WASSP; Wright & Ayre, 2000). Like the OASES, the
WASSP is based on the WHO’s ICF framework, so it seeks to describe the entire disorder, though not with the same degree of detail as that seen in the OASES. Specifically, the WASSP contains
a total of 24 items that assess the speaker’s perceptions of stuttering behaviors, negative thoughts
and feelings about stuttering, avoidance of speaking situations, and “disadvantage” experienced
because of stuttering.

It is the present authors’ opinion that the use of all of these instruments (as well as others that
have not been listed here) would significantly enhance clinicians’ and researchers’ understanding
of how the disorder affects individuals who stutter. Of course, it would not be feasible to utilize
such a broad spectrum of assessments on a regular basis. Thus, one of the authors’ primary
goals in developing the OASES has been to encourage broad-based assessment of the speaker’s
experience of stuttering through the use of a single, comprehensive, easy-to-use but detailed
measurement instrument that could be used both in treatment planning and in treatment outcomes
research.

To ensure that the instrument would assess the totality of the disorder, the OASES was based
on a widely accepted and validated framework that is used throughout the fields of health and
rehabilitation (i.e., the WHO’s ICF). To ensure a high degree of reliability and validity of the
instrument, the OASES was subjected to extensive testing and refinement. Specifically, focus
groups of people who stutter and expert reviews by numerous stuttering specialists helped to
verify content and construct validity. A series of pilot studies, which, together, included a total of
more than 300 adults who stutter, helped to ensure the usability of the instrument and provided
data to support evaluation of test items. Although it is possible that the nature of the validation
sample may have introduced some bias in the test items, detailed item analyses ensured the
reliability of individual items and minimized the likelihood of consistent bias across subsections
of the instrument. Finally, statistical analyses demonstrated the internal consistency of the four
sections of the OASES while assuring that the sections do indeed examine relatively independent
constructs.

Through this effort, the OASES has evolved into a single, tightly integrated and theoretically
motivated tool that provides clinicians and researchers with critical information about the speaker’s
experience of the stuttering disorder. It is hoped that widespread use of this new tool will enhance
the ability of clinicians and researchers to conduct more thorough empirical evaluations of the
outcomes of stuttering treatment. This will increase the knowledge base about the results of
broad-based treatment approaches for stuttering and provide the opportunity for researchers to
more appropriately evaluate the outcomes of treatments that address factors in addition to the
observable aspects of a speaker’s fluency.