Clinical Methods

The Osteoporosis Risk Assessment Tool: Establishing Content Validity Through a Panel of Experts
Christine A. Wynd and Michelle Atkins Schaefer

Twenty-three items were initially developed for the Atkins Osteoporosis Risk Assessment Tool (ORAT) after a thorough examination of the literature. These items were reviewed for relevance to the domain of content by a panel of eight experts using Lynn's (1986) two-stage process for content validation. The Content Validity Index and the kappa coefficient of agreement were analyzed from panelists' quantitative ratings, and 15 items were retained. Qualitative suggestions from the experts were also used to improve the final items in the ORAT. Copyright 2002, Elsevier Science (USA). All rights reserved.

THE NURSING PROFESSION requires high-quality measurement tools for both clinical and research use. Tools and instruments should consistently measure items, concepts, and constructs in a stable and reliable fashion. Also, these instruments need to closely represent the domain of content that is of essential interest to nurses. This latter requirement is defined as validity, or the degree to which an instrument measures what it purports to measure (Anders, Tomai, Clute, & Olson, 1997; Knapp, 1985). One of the first steps in instrument development is to ascertain content validity, defined as the extent to which an instrument has the necessary number and types of items to represent the universe of content (Davis, 1992; Davis & Grant, 1993; Knapp, 1985). The purpose of this article is to describe the process used to assure content validity of a newly developed tool called the Atkins Osteoporosis Risk Assessment Tool (ORAT) (Atkins, 1996). The ORAT is described together with the content validation process, emphasizing the use of a panel of experts. Nurses who wish to replicate this process are provided with practical information and details.

Lynn (1986) established an important method for instrument content validation that has been used by many researchers (Anders et al., 1997; Davis, 1992; Davis & Grant, 1993; Grant & Davis, 1997; Tilden, Nelson, & May, 1990). Her efforts began as an attempt to differentiate face validity and content validity and to standardize a process for establishing content validity. Face validity was defined as validity by assumption, a nonquantitative, conceptual, and logical approach to linking parts of an instrument with its general purpose (Lynn, 1986). Lynn asserted, however, that in comparison to face validity, content validity could be quantified through use of a two-stage process that included a developmental stage and a judgment/quantification stage.

The developmental stage requires three steps. First, the domain of content is identified through a comprehensive review of the literature and/or consultation with content experts. During this first step, all content elements and subelements are reviewed and established as part of the phenomenon of interest. The second step is item generation, or the development of explicit items reflecting the domain of content and the purpose for the instrument. Finally, in the third step, the actual instrument is constructed through appropriate wording of instrument items, development of instructions to subjects, establishment of a mechanism for scoring responses, and placement of items into a proper format for use with subjects (Lynn, 1986).

Stage two, the judgment/quantification stage, requires a panel of content experts to review the instrument and validate the relevance of items to the domain of content. This is the quantitative phase, which measures the proportion or percent of experts who are in agreement about the relevance of the instrument and its items. The Content Validity Index (CVI), originally discussed in the writings of Martuza (1977), Waltz & Bausell (1983), and Waltz, Strickland, & Lenz (1991), is used to establish proportion/percent agreement among the experts. Lynn (1986) recommended the use of a relevance rating scale providing ordinal level data through four Likert-like choices (4 = very relevant, 3 = relevant but needs minor alteration, 2 = unable to assess relevance without item revision, 1 = not relevant). The four-choice rating scheme offers more in-depth information about item revisions and relevance; however, in the end, only the proportion of items receiving ratings of 3 and 4 constitutes the actual CVI (Waltz & Bausell, 1983), and any items rated at levels 1 and 2 should be further revised or eliminated. The CVI formula is represented by

\[
\text{CVI (or \% agreement)} = \frac{\text{number of experts agreeing on items rated as 3 or 4}}{\text{total number of experts}}
\]

The following ranges of CVI magnitude for expert agreement were used to evaluate content validity. Items were considered to have adequate content validity if they achieved an agreement of 89% or higher.
Questionable items ranged from 70% to 88% agreement, and items were found to have unacceptable content validity if they achieved an agreement of 69% or lower (Tilden et al., 1990).

Percent agreement and proportion agreement are often criticized because there is a risk for inflated values from chance agreement (Garvin, Kennedy, & Cissna, 1988; Suen & Ary, 1989; Waltz, Strickland, & Lenz, 1991). Lynn (1986) purports to control for chance agreement by determining a minimum number of experts from the standard error of the proportion. Although Lynn (1986) makes an excellent case for quantifying and standardizing researchers' approaches to content validation, the authors of this article also used Cohen's (1960) kappa in an attempt to improve on the measure of expert agreement. A critique of proportion agreement and attempted resolutions are described elsewhere (Wynd, Schmidt, & Atkins, submitted); however, expert agreements, comments, and the resulting ORAT items are discussed in the next sections of this article.
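The two agreement measures described above can be illustrated with a minimal sketch. It is not the authors' analysis: the function names and the sample ratings are hypothetical, rounding to a whole percent before applying the bands is an assumption, and the kappa shown is the basic two-rater form of Cohen's (1960) statistic applied to ratings dichotomized as relevant (3 or 4) versus not relevant (1 or 2). The procedure the authors applied to the full expert panel is described elsewhere and is not reproduced here.

```python
from collections import Counter


def item_cvi(ratings):
    """Proportion of experts who rated an item 3 or 4 on the 4-point relevance scale."""
    if not ratings:
        raise ValueError("at least one expert rating is required")
    return sum(1 for r in ratings if r >= 3) / len(ratings)


def classify_agreement(cvi):
    """Apply the bands quoted in the text (Tilden et al., 1990).

    Assumption: agreement is rounded to a whole percent before banding.
    """
    percent = round(cvi * 100)
    if percent >= 89:
        return "adequate"
    if percent >= 70:
        return "questionable"
    return "unacceptable"


def cohen_kappa(rater_a, rater_b):
    """Two-rater Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items on which the two raters match.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal proportions, summed over categories.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1:
        return float("nan")  # kappa is undefined when chance agreement is total
    return (p_o - p_e) / (1 - p_e)


# Hypothetical ratings from seven experts for a single item.
ratings = [4, 4, 3, 4, 3, 4, 2]
cvi = item_cvi(ratings)
print(f"CVI = {cvi:.2f} ({classify_agreement(cvi)})")

# Hypothetical dichotomized judgments (1 = relevant, 0 = not relevant) on eight items.
expert_1 = [1, 1, 1, 0, 1, 1, 0, 1]
expert_2 = [1, 1, 0, 0, 1, 1, 1, 1]
print(f"kappa = {cohen_kappa(expert_1, expert_2):.3f}")
```

In this made-up example the two experts agree on 75% of the items, yet kappa is much lower because most of that agreement would be expected by chance alone, which is the concern raised about percent agreement.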

The ORAT is an instrument developed for assessing potential osteoporosis risk in both men and women. Demographic and health-related factors are evaluated for their potential to protect against or promote risk for osteoporosis. Subject responses are scored and weighted according to the degree of risk. Construction of the ORAT began with the developmental stage and a thorough literature review to identify the domain of content regarding osteoporosis risk. The current state of the science was assessed and, as a result, 23 risk factors were included in the instrument (age, gender, and race; history of fractures, kidney stones, thyroid disease, and hormone replacement therapy; previous diagnosis of osteoporosis; previous bone densitometry measurements; age at onset of menopause; use of estrogen, steroids, heparin, cyclosporine, antacids, and barbiturates; family history of fractures; use of calcium supplements; daily consumption of caffeine, alcohol, and servings of calcium-rich foods; smoking history; the amount of weight-bearing exercises performed per week).

Table 1. Example of Items From the Relevance Rating Scale Used by Expert Panelists to Review the ORAT

Relevance Rating Scale Item                                                            Ratings
What is your age? (<65, 65-75, >75)                                                    1  2  3  4
  Suggestions:
What is your race? (Black, White/Asian)                                                1  2  3  4
  Suggestions:
What is your sex?                                                                      1  2  3  4
  Suggestions:
Have you ever been diagnosed with osteoporosis?                                        1  2  3  4
  Suggestions:
How many cups/glasses per day do you drink of caffeinated beverages? (0-1, 2-3, >3)    1  2  3  4
  Suggestions:
How many alcohol-containing beverages do you drink per day? (0, 1-2, >2)               1  2  3  4
  Suggestions:
Do you now/have you ever smoked cigarettes?                                            1  2  3  4
  Suggestions:
Do you take estrogen?                                                                  1  2  3  4
  Suggestions:

Note. Instructions are as follows. Items listed below were selected for inclusion in the Atkins ORAT. Please review the tool and rate each item and its selected responses for relevance in assessing risk for osteoporosis. Under each item, please provide any suggestions/recommendations for item revision. Ratings are 1 = not relevant, 2 = unable to assess relevance without item revision, 3 = relevant but needs minor alteration, and 4 = very relevant.

An explicit rationale for selecting the panel of experts is needed as researchers enter into the judgment/quantification stage of instrument development (Davis & Grant, 1993). Experts are often chosen based on their knowledge of the content or the domain of interest. In the current study, experts were sought if they had clinical and research knowledge about osteoporosis risk in adults. This type of knowledge is often shown by the individual expert's professional certifications and credentials, publications in peer-reviewed journals, presentations at professional meetings, and funded research (Davis & Grant, 1993; Grant & Davis, 1997). Frequently, researchers desire experts from various disciplines because of the broad nature of the content area (Davis, 1992). Experts selected for reviewing the ORAT included physicians and nurses with current, clinical, advanced practice in the areas of women's health and rheumatology. Nurse researchers studying various elements of osteoporosis were also selected to participate as expert panelists. Additionally, there was widespread geographic representation, from Washington, DC to Portland, Oregon, to account for differences in colloquial terms that could affect instrument comprehension by many diverse groups (Grant & Davis, 1997). A panel of 12 experts was identified through the research team members' professional contacts with nationally and locally recognized professionals.

It was important to provide the experts with enough information about the ORAT to allow them to make intelligent and helpful reviews. Therefore, experts were given the following materials: (1) a cover letter, (2) a copy of the ORAT itself, and (3) a copy of the relevance rating scale.

The cover letter asked experts to participate in the instrument review and informed them of their selection based on recognized expertise in the area of osteoporosis. The purpose and significance of the content validation study were explained together with the potential uses of the ORAT. Also, instructions were provided for the entire expert review process. The relevance rating scale allowed experts to rate each separate item by using the four-point Likert scheme. Eventually, the scale was used to quantitatively analyze agreement for each item and the entire tool. Instructions guided the experts to rate each of the 23 original items on the ORAT and to review the weighted responses and scoring mechanism. Table 1 provides examples of items from the relevance rating scale.

Experts were also instructed to comment qualitatively about item content and to suggest revision of items that were inconsistent with the domain of content. Experts were encouraged to address the wording of each item and the clarity of instrument instructions. Problems with wording, clarity of meaning, construct of the item in terms of English grammar, and repetition were reviewed and revisions made. Finally, experts were requested to evaluate the entire instrument for comprehensiveness in dealing with osteoporosis risk factors and to identify any areas omitted from the ORAT. Qualitative comments were recorded, opinions synthesized, and recommendations incorporated into item and instrument revisions (Grant & Davis, 1997; Tilden, Nelson, & May, 1990). Experts were asked to respond within a 4-week period, and a stamped, self-addressed envelope was enclosed with the materials to facilitate that response. Two weeks after the deadline had passed, responses were compiled and summarized, and a thank-you letter was sent to all experts who participated in the process.

Eight out of the 12 experts selected by the research team participated in the content validity study by returning the ORAT and completing the relevance rating scale (a 67% response rate). The 23 items and the entire tool were reviewed. Fifteen items received adequate validity at 100% agreement. One item, asking subjects about a past history of kidney stones, achieved a rating of 57% agreement, signifying unacceptable validity. This item was eliminated without further consideration. Seven items, rated at a level of questionable validity (86% agreement), were closely examined using qualitative remarks from the panel of experts. These seven items assessed previous bone densitometry readings; caffeine consumption; and the use of medications including steroids, aluminum-containing antacids, barbiturates, heparin, and cyclosporine. The CVI for the entire ORAT resulted in a 65% (0.65) agreement, and the kappa coefficient analysis was equal to 0.0394 (p .5000), indicating the need for revisions to improve content validity. Chance agreement appeared to be greater than the observed value of rater agreement. Kappa coefficients for individual ORAT items were analyzed and a cutoff point for questionable items was established at 0.476. Items below this value were either eliminated or reworded, based on the experts' recommendations (Table 2).

Table 2. CVI Values and Kappa Coefficients for Individual ORAT Items

ORAT Item/Risk Factor                              CVI Value   Kappa Coefficient
Age                                                1.00        .714
Gender                                             1.00        .714
Previous fractures to the hip, spine, or wrist     1.00        .714
Diagnosis of thyroid disease                       1.00        .714
Use of thyroid replacement medication              1.00        .714
Race                                               1.00        .524
Previous diagnosis of osteoporosis                 1.00        .524
Use of estrogen replacement medication             1.00        .524
Amount of weight-bearing exercises per week        1.00        .524
History of previous bone densitometry              0.86        .476
Use of steroid medication                          0.86        .476
Family history of fractures                        1.00        .429
Diagnosis of kidney stones                         0.57        .429
Age at onset of menopause                          1.00        .429
Use of calcium supplements                         1.00        .429
Alcohol consumption per day                        1.00        .429
Servings per day of foods rich in calcium          1.00        .429
Smoking history                                    1.00        .429
Use of heparin medication                          0.86        .333
Use of cyclosporine medication                     0.86        .333
Use of antacids                                    0.86        .286
Use of barbiturates                                0.86        .286
Caffeine consumption per day                       0.86        .286

A final 15 items remained in the ORAT after all quantitative and qualitative data from experts were reviewed. Items receiving both high-percent agreements and higher kappa coefficients included age, gender, race, history of fractures, previous diagnosis of osteoporosis, history of thyroid disease and use of thyroid replacement medication, use of estrogen replacement therapy, amount of weight-bearing exercises performed per week, family history of fractures, age at onset of menopause, use of calcium supplements, servings of calcium-rich foods per day, alcohol consumption per day, and smoking history.

Qualitative comments from the experts led to revisions in the wording of items, especially with an emphasis on the history of fractures occurring in adulthood, a maternal family history of hip fractures, current cigarette smoking, and the important additions of height, weight, and body mass index measurements to assess thinness. Experts confirmed that these conditions represented the most frequently reported risk factors in the current research literature. Experts advised using lay terminology to define medical words such as osteoporosis, menopause, hysterectomy, and estrogen therapy. Also, experts identified the need to provide examples of calcium-rich foods and weight-bearing exercises.



Suggestions were made to combine the two questions addressing thyroid disease and treatment and to add items extending assessment of the natural onset of menopause to surgical and exercise-induced cessation of menstruation. Even though questions about medications received low CVI and kappa ratings, it was decided to retain the one item asking about steroid use because steroids are known to seriously affect bone density. Current smoking status is riskier than simply having a history of smoking; therefore, the assessment of current use was incorporated into the revised item. Finally, one expert advised that excessive alcohol consumption is a known risk factor for osteoporosis, but an occasional drink may prove beneficial to bone health; thus, careful weighting of responses was required in terms of assessing the number of drinks per day. Experts pointed out that bone densitometry is the diagnostic tool used to identify individuals with osteoporosis who are at risk for current or future fractures. Although bone density is the strongest predictor of fracture risk, other major risk factors include a history of previous fractures as an adult, maternal history of hip fracture, thinness (weight < 127 pounds), and frailty (age). It was suggested that a risk assessment tool for fractures might be of even greater value to clinicians, and several experts recommended calling the instrument the osteoporotic fracture risk assessment tool (OFRAT).
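The quantitative results reported above can be cross-checked against the values in Table 2 with a short sketch. The item CVI and kappa values are copied from Table 2; everything else is an assumption made for illustration: the variable and function names are hypothetical, the bands are applied after rounding to a whole percent, and the tool-level CVI is taken to be the proportion of items reaching adequate agreement, which is one reading that reproduces the reported 65% but is not confirmed by the text.

```python
from collections import Counter

# (item, CVI value, kappa coefficient), copied from Table 2.
TABLE_2 = [
    ("Age", 1.00, 0.714),
    ("Gender", 1.00, 0.714),
    ("Previous fractures to the hip, spine, or wrist", 1.00, 0.714),
    ("Diagnosis of thyroid disease", 1.00, 0.714),
    ("Use of thyroid replacement medication", 1.00, 0.714),
    ("Race", 1.00, 0.524),
    ("Previous diagnosis of osteoporosis", 1.00, 0.524),
    ("Use of estrogen replacement medication", 1.00, 0.524),
    ("Amount of weight-bearing exercises per week", 1.00, 0.524),
    ("History of previous bone densitometry", 0.86, 0.476),
    ("Use of steroid medication", 0.86, 0.476),
    ("Family history of fractures", 1.00, 0.429),
    ("Diagnosis of kidney stones", 0.57, 0.429),
    ("Age at onset of menopause", 1.00, 0.429),
    ("Use of calcium supplements", 1.00, 0.429),
    ("Alcohol consumption per day", 1.00, 0.429),
    ("Servings per day of foods rich in calcium", 1.00, 0.429),
    ("Smoking history", 1.00, 0.429),
    ("Use of heparin medication", 0.86, 0.333),
    ("Use of cyclosporine medication", 0.86, 0.333),
    ("Use of antacids", 0.86, 0.286),
    ("Use of barbiturates", 0.86, 0.286),
    ("Caffeine consumption per day", 0.86, 0.286),
]

KAPPA_CUTOFF = 0.476  # cutoff for questionable items stated in the text


def band(cvi):
    """Agreement bands from Tilden et al. (1990), applied to whole percents."""
    percent = round(cvi * 100)
    if percent >= 89:
        return "adequate"
    if percent >= 70:
        return "questionable"
    return "unacceptable"


# Tally the items by CVI band.
counts = Counter(band(cvi) for _, cvi, _ in TABLE_2)

# Assumption: tool-level CVI = share of items with adequate agreement.
scale_cvi = counts["adequate"] / len(TABLE_2)

# Response rate: 8 of the 12 invited experts returned the rating scale.
response_rate = 8 / 12

# Items whose kappa falls below the stated cutoff.
below_cutoff = [name for name, _, kappa in TABLE_2 if kappa < KAPPA_CUTOFF]

print(dict(counts))
print(f"tool-level CVI = {scale_cvi:.2f}")
print(f"response rate = {response_rate:.0%}")
print(f"items below kappa cutoff: {len(below_cutoff)}")
```

Under these assumptions the tally matches the counts given in the text: 15 adequate items, 7 questionable items, and 1 unacceptable item out of 23.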


Content validity of the ORAT was carefully reviewed through the use of a panel of eight experts knowledgeable about osteoporosis. Lynn's (1986) two-stage process for content validation was used to develop 23 items for the original instrument based on a review of literature. Next, the expert panel reviewed the ORAT and quantitatively rated the relevance of the instrument and each of its items to the domain of content. The CVI procedure as well as the kappa coefficient of agreement rated 15 items at values high enough to retain after revisions were made. Eight items were eliminated. Qualitative comments from the experts were used to improve the final 15 items, and a suggestion to focus the tool on assessment of osteoporotic fractures was accepted. The tool was renamed the OFRAT. The next step in the development of this important tool is to pilot the OFRAT with a sample of adult subjects. Internal consistency of items and predictive validity of fracture risk will be evaluated.

ACKNOWLEDGMENTS
The authors wish to acknowledge and extend gratitude for data entry assistance provided by graduate students Mary Dalpiaz, Karen Davis, Anne Gunther, and Brenda Fuller. The authors are also grateful to Dr. Bruce Schmidt, Director of Nursing Research and Staff Development, and Ms. Debra Anastasiadis, Coordinator of the Women's Center at Akron General Medical Center.

