Appendix 3: Methodology for Tables and Figures
Bryan Sayer, M.H.S.
This appendix provides information on the sources and computations for the tables and figures used in the chapters on digestive diseases.
I. Data Sources
The number of ambulatory care visits, hospital discharges, and deaths in the tables and figures came from four sources (see Appendix 2 for descriptions):
- Ambulatory care visits data in tables and figures came from the combined National Ambulatory Medical Care Survey (NAMCS)/National Hospital Ambulatory Medical Care Survey (NHAMCS), years 1992–2005.
- Data on hospital discharges in the figures came from the National Hospital Discharge Survey (NHDS), years 1979–2004.
- Hospital data in the tables came from the Healthcare Cost and Utilization Project Nationwide Inpatient Sample (HCUP NIS) for the year 2004.
- Mortality data came from the National Vital Statistics System Multiple Cause Mortality data, years 1979–2004, as prepared by the National Bureau of Economic Research.
For digestive cancers (Chapters 4–12), cancer incidence and survival were derived from the Surveillance, Epidemiology, and End Results (SEER) program of the National Cancer Institute (NCI). Data in the tables for 2004 came from the 17 registry sites that SEER used at that time. Data in the figures for 1979–2004 were from the nine sites in operation during the entire period. Population data corresponding to the definition of the SEER sites were provided by SEER.
II. Disease Definitions
Digestive diseases were coded into 1 of 29 digestive disease categories based on the International Classification of Diseases (ICD)-9 CM (Clinical Modification) code for morbidity, and either ICD-9 (1979–1998) or ICD-10 (1999–2004) for mortality. See Appendix 1 for the complete list of codes for each of the 29 diseases. The first-listed diagnosis was considered the primary diagnosis for tables and figures for primary digestive disease. All remaining diagnoses were considered secondary and were included under the category “All-Listed Diagnoses.” In the tables and figures for ambulatory care visits, hospital discharges, and mortality, diagnoses were counted only once under the all-listed category, irrespective of the number of actual diagnoses. For example, in the chapter on all digestive diseases, only one digestive disease diagnosis was counted, even though more than one could have been listed on a medical record or death certificate.
While the coding for digestive disease mortality is generally consistent between ICD-9 and ICD-10, the World Health Organization (WHO), which produces the ICD code definitions, advises that series are not necessarily comparable across versions of the ICD code book. The change from ICD-9 to ICD-10 was portrayed as a vertical line at 1999 on the mortality figures.
III. Demographic Categories
For the purpose of calculating rates for the U.S. population, population data were derived from the national population estimates program of the U.S. Census Bureau and the Centers for Disease Control and Prevention (CDC). Population counts were specific for each of the demographic subgroups shown in the tables.
|Demographic Subgroup||Population Count, 2004|
Race was coded as “White,” “Black,” or, if another category was specified, “Other.” Missing race data were not considered “Other.” The HCUP NIS data combine Hispanic origin with race, so it was impossible to know whether Hispanics were white or black. In order not to undercount the totals, we assumed all Hispanics were white. As a result, discharges for whites were slightly overstated and for blacks slightly understated.
HCUP NIS data came from the individual States, and 11 States did not report race in 2004. To adjust for this limitation, we created a separate weight for race: the existing weight multiplied by the inverse of the proportion of the U.S. total for each race that was found in the States that did report race. Note that this proportion was based on counts of persons from the 2004 mid-year population estimate, and not on the proportion of discharges. We did not report separate counts for “Other” race, because the definition in the HCUP NIS and the population counts may not be the same.
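As a rough illustration of the race weight described above, the following Python sketch scales an existing discharge weight by the inverse of the population coverage for one race group. The function name, argument names, and example numbers are hypothetical; they are not HCUP NIS variable names or values.

```python
def race_adjusted_weight(base_weight, reporting_states_population, us_population):
    """Scale a discharge weight to compensate for States that did not report race.

    All names here are illustrative assumptions, not HCUP NIS variables.
    """
    if reporting_states_population <= 0 or us_population <= 0:
        raise ValueError("population counts must be positive")
    # Proportion of the U.S. population of this race living in reporting States.
    coverage = reporting_states_population / us_population
    # Existing weight times the inverse of that proportion.
    return base_weight * (1.0 / coverage)
```

With a hypothetical base weight of 5.0 and half of a race group's population living in reporting States, the adjusted weight would be 10.0.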
IV. Age-Adjustment
Age-adjustment through direct standardization allowed for comparisons across race, sex, and time that were not influenced by differences in age distribution for the groups being compared. Year-specific population data in 19 age groups, plus the National Center for Health Statistics (NCHS) standard year 2000 population, were used for age-adjusting. Age-specific rates were calculated for each of the 19 age groups (age 0, age 1–4, 5-year age groups through age 84, and age 85 and older), and the results were multiplied by the year 2000 standard population proportion in each of the age groups. These results then were summed to arrive at the age-adjusted population rate estimate. Further details can be found in Anderson and Rosenberg.38
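The direct standardization steps above can be sketched in Python as follows. This is a minimal illustration, not the code used to produce the tables: the function name is hypothetical, and the usage note uses three made-up age groups instead of the 19 groups and the actual year 2000 standard population.

```python
def age_adjusted_rate(events, population, standard_population, per=100_000):
    """Directly standardized rate.

    events, population, and standard_population are parallel sequences with
    one entry per age group (illustrative inputs, not real data).
    """
    if not (len(events) == len(population) == len(standard_population)):
        raise ValueError("all inputs must have one entry per age group")
    standard_total = sum(standard_population)
    adjusted = 0.0
    for e, p, s in zip(events, population, standard_population):
        age_specific_rate = e / p            # rate within the age group
        standard_share = s / standard_total  # standard population proportion
        adjusted += age_specific_rate * standard_share
    return adjusted * per
```

For example, with events of 10, 20, and 30, populations of 1,000, 2,000, and 1,000, and standard population counts of 500, 300, and 200, the age-specific rates are 0.01, 0.01, and 0.03, and the age-adjusted rate is 1,400 per 100,000.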
1. Ambulatory Care Visits
Estimates in the tables for ambulatory care visits in 2004 were from combined NAMCS/NHAMCS files for the years 2003–2005. Multiple years were combined in order to have sufficient observations to meet the minimum threshold for reporting and for more stable estimates. The 3 years of data were averaged by dividing the sampling weight by 3, in accordance with the general instructions from NCHS. The combined file included visits to freestanding physician offices and physician offices at hospitals, and emergency room visits that did not result in an overnight stay in the hospital.
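A minimal sketch of the weight-division approach is shown below, assuming each survey year is represented simply as a list of visit sampling weights. The function name and data layout are hypothetical and do not reflect the structure of the NAMCS/NHAMCS files.

```python
def pooled_annual_estimate(visit_weights_by_year):
    """Average annual weighted visit count from pooled survey years.

    visit_weights_by_year maps a survey year to the sampling weights of the
    visits of interest in that year (illustrative layout).
    """
    n_years = len(visit_weights_by_year)
    if n_years == 0:
        raise ValueError("at least one survey year is required")
    total = 0.0
    for weights in visit_weights_by_year.values():
        for w in weights:
            # Dividing each weight by the number of pooled years turns the
            # pooled total into an average annual estimate.
            total += w / n_years
    return total
```

If the hypothetical weighted totals for 2003, 2004, and 2005 were each 3,000 visits, the pooled estimate would also be 3,000 visits per year.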
The primary diagnosis for an outpatient visit was the first diagnosis listed in the record. A visit was considered to have been for 1 of the 29 digestive diseases if the first of the diagnoses listed on the record fell into the subject category. Estimates for first-listed diagnosis for digestive diseases included the number of visits and the rate of visits per 100,000 of the population. The rate per 100,000 was the number of visits, not the number of individuals with a visit, divided by the number of persons (in 100,000s) in the population in the specific subgroup.
The weighted count of visits with a first-listed diagnosis of each of the digestive diseases was the count (in thousands) listed in the table under “Ambulatory Care Visits,” “First-Listed Diagnosis,” “Number in Thousands.” The “Rate per 100,000” was calculated by dividing the count of visits by the number of persons (in 100,000s) in the population in the specific subgroup.
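The two table columns can be illustrated with a short Python sketch. The function names and the numbers in the usage note are hypothetical; the same arithmetic applies to any weighted count and matching subgroup population.

```python
def number_in_thousands(weights):
    """Weighted count of visits, expressed in thousands."""
    return sum(weights) / 1_000


def rate_per_100k(weighted_count, population):
    """Visits per 100,000 persons in the demographic subgroup."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Express the population in units of 100,000 persons, then divide.
    return weighted_count / (population / 100_000)
```

For instance, a weighted count of 5,000 visits in a subgroup of 2,000,000 persons gives 5.0 in the "Number in Thousands" column and a rate of 250 per 100,000.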
Each outpatient record could have multiple diagnoses listed. A visit was considered to have been for a specific digestive disease if any of the diagnoses listed on the record fell into the subject category. Therefore, any individual record could be counted for more than one digestive disease. However, a given record was not counted more than once for a specific disease. For example, a record having the ICD-9-CM diagnostic codes of “001” and “002” was counted only once in the category of Gastrointestinal Infections. The weighted count of visits with all-listed diagnoses of each of the digestive diseases was the count (in thousands) listed in the table under “Ambulatory Care Visits,” “All-Listed Diagnoses,” “Number in Thousands.” The “Rate per 100,000” was calculated by dividing the count of visits by the number of persons in the population (in 100,000s) in the demographic subgroup.
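The all-listed counting rule can be sketched as follows. The code-to-category mapping is a placeholder for illustration only: the two Gastrointestinal Infections codes come from the example above, while the code "XXX" and its category are invented; the actual mapping is given in Appendix 1. The record layout is likewise hypothetical.

```python
from collections import defaultdict

# Illustrative mapping only; see Appendix 1 for the actual code lists.
CATEGORY_BY_CODE = {
    "001": "Gastrointestinal Infections",
    "002": "Gastrointestinal Infections",
    "XXX": "Hypothetical Other Category",  # invented placeholder
}


def all_listed_counts(records):
    """Weighted all-listed counts per disease category.

    Each record is a dict with a sampling 'weight' and a list of 'diagnoses'.
    A record contributes to every category it mentions, but only once each.
    """
    totals = defaultdict(float)
    for record in records:
        # A set removes duplicate categories within a single record.
        categories = {
            CATEGORY_BY_CODE[code]
            for code in record["diagnoses"]
            if code in CATEGORY_BY_CODE
        }
        for category in categories:
            totals[category] += record["weight"]
    return dict(totals)
```

A record listing both "001" and "002" adds its weight to Gastrointestinal Infections once, while a record listing "001" and "XXX" adds its weight to both categories.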
2. Hospital Discharges
Hospital discharges were based on inpatient stays of at least 1 night. Emergency room visits that did not result in an admission to the hospital with an overnight stay were not counted. Data in the tables came from the 2004 HCUP NIS file of hospital discharges from participating States. Sampling weights inflated the discharges to the U.S. total, based on information from the American Hospital Association. Data in the figures showing age-adjusted hospital discharges over time were based on the NHDS, 1979–2004.
The primary diagnosis for a hospital discharge was the first diagnosis listed in the record. Inpatient estimates for first-listed diagnosis for digestive diseases included the number of discharges and the rate of discharges per 100,000 of the population. The weighted count of hospital discharges with a primary diagnosis of each of the digestive diseases was the count (in thousands) listed in the table under “Hospital Discharges,” “First-Listed Diagnosis,” “Number in Thousands.” The “Rate per 100,000” was the number of discharges, not the number of individuals with an inpatient stay, divided by the number of persons (in 100,000s) in the population in the specific subgroup.
Each hospital discharge record could have multiple diagnoses listed. A discharge was considered to have been for a specific digestive disease if any of the diagnoses listed on the record fell into the subject category. Therefore, any individual record could be counted for more than one digestive disease. As with ambulatory care visits, a given record was not counted more than once for a specific disease. For example, a record having the ICD-9-CM diagnostic codes of “001” and “002” was counted only once in the category of Gastrointestinal Infections. The weighted count of hospital discharges with all-listed diagnoses of each of the digestive diseases was the count (in thousands) listed in the table under “Hospital Discharges,” “All-Listed Diagnoses,” “Number in Thousands.” The “Rate per 100,000” was calculated by dividing the count of hospital discharges by the number of persons in the population (in 100,000s) in the demographic subgroup.
Counts for 2004 for deaths from digestive disease were derived from the Multiple Cause-of-Death data files from the Division of Vital Statistics, CDC. These data are a complete accounting of all deaths in the United States (although not necessarily for all U.S. citizens). Cause of death is organized on a record axis, with a specific underlying cause of death and contributing causes for each decedent.
1. Underlying Cause of Death
The underlying cause of death was determined from the list of all causes on the death certificate by professional coders. Underlying cause is analogous to a first-listed diagnosis for morbidity. The “Number of Deaths” column for “Underlying Cause” was a count of the number of records in the file with each digestive disease as the underlying cause of death.
The “Rate per 100,000” column was determined by dividing the number of deaths with the underlying cause by the population (in 100,000s) in the demographic subgroup. The race- and sex-specific estimates were age-adjusted, while the age-specific rates and the total were not age-adjusted.
“Years of Potential Life Lost” assumed that individuals who died before age 75 would otherwise have lived to that age. Because age at death is reported in full years, we added 0.5 years to each age at death. Thus, for the purpose of calculating years of life lost, a person whose age at death was listed as 65 was counted as having been 65.5 years old. The age 65.5 represented the average age of all persons who died at age 65, and each contributed 9.5 years of potential life lost (75 − 65.5 = 9.5). The tables showed the total number of years of life lost to age 75 in thousands.
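The calculation can be sketched in Python as shown below. The function name is hypothetical, and treating deaths at or beyond the reference age as contributing zero years is an assumption made for this sketch.

```python
def years_of_potential_life_lost(ages_at_death, reference_age=75.0):
    """Total years of potential life lost before the reference age.

    ages_at_death holds ages in completed years; 0.5 is added to each so that
    the age represents the average age of persons dying at that reported age.
    """
    total = 0.0
    for age in ages_at_death:
        midpoint_age = age + 0.5
        # Assumption: deaths at or after the reference age contribute nothing.
        if midpoint_age < reference_age:
            total += reference_age - midpoint_age
    return total
```

A single death at a reported age of 65 contributes 9.5 years, matching the worked example above; deaths at reported ages of 65, 74, and 80 together contribute 10.0 years.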
2. Underlying or Other Cause of Death
The record axis of the death certificate can contain up to 20 contributing causes in addition to the underlying cause. A recording of any of the 29 unique digestive diseases was noted for each of the 21 total possible causes, and any duplicate digestive diseases were eliminated. A death was attributed to one of the digestive diseases if any of the unduplicated digestive diseases were recorded. Therefore, a death could appear under more than one of the digestive diseases in the “Underlying or Other Cause” column of the tables. Unlike the underlying cause, only the “Number of Deaths” and the “Rate per 100,000” were shown for “Underlying or Other Cause.” “Years of Potential Life Lost” were irrelevant.
“Number of Deaths” (in 100,000s) was the count of all deaths that had the specified digestive disease listed in any position on the record axis. A death could appear under more than one disease if any of the diagnoses were listed; however, no death appeared more than once for a given disease.
The “Rate per 100,000” column was determined by dividing the number of deaths for underlying or other cause by the population (in 100,000s) in the demographic subgroup. The race- and sex-specific estimates were age-adjusted, while the age-specific rates and the total were not age-adjusted.
Cancer incidence and 5-year survival rates in Chapters 4–12 were derived from SEER registry data. The registries did not cover the entire United States, nor did they necessarily represent the entire population. Instead, each registry covered a specific set of counties, usually statewide, across diverse sections of the country. (For more information on registries, see SEER.2, 3) Population counts used for rates and age-adjustment were also restricted to the counties covered by the registry. Only estimates based on unweighted counts of 17 or more cases were shown, following the reporting standard set by NCI.
Cancer incidence was estimated for the entire country from the rates for the 17 registries in 2004, multiplied by the 2004 U.S. population. This yielded an estimated number of new cases for the United States in 2004. The unadjusted and age-adjusted incidence rates were based only on the 17 registry areas. Unadjusted rates were calculated from the number of new cases in 2004 divided by the population in the demographic subgroup. Age-adjusted incidence rates were calculated from the age-specific rates within the demographic subgroup multiplied by the U.S. standard 2000 population as described in section IV. Age-Adjustment.
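The national estimate can be illustrated with a brief Python sketch. The function name and the numbers in the usage note are hypothetical, and the sketch applies a single overall rate; whether the extrapolation was carried out separately within demographic subgroups is not stated here.

```python
def estimated_us_cases(registry_cases, registry_population, us_population):
    """Extrapolate new cases from the registry areas to the United States."""
    if registry_population <= 0:
        raise ValueError("registry population must be positive")
    # Incidence rate observed in the registry areas (cases per person).
    registry_rate = registry_cases / registry_population
    # Apply that rate to the U.S. population.
    return registry_rate * us_population
```

With 500 hypothetical new cases in a registry population of 1,000,000, a U.S. population of 4,000,000 would yield an estimated 2,000 new cases.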
The figures showing trends in ambulatory care visits and hospital discharges for the period 1979–2004 used the all-listed diagnoses. The all-listed diagnoses were defined the same as for the tables. However, the data source for hospital discharges was the NHDS because HCUP NIS data were unavailable over the entire timeframe. Because of the smaller sample size for the ambulatory care surveys, estimates derived from NAMCS/NHAMCS files were 3-year averages, except for the 1992 estimates, which were averages of 1992 and 1993 data. This approach provided more stable estimates across time. The year 1992 was the starting point, because this was the first year of the NHAMCS. All rates were age-adjusted.
The figures showing mortality data for the period 1979–2004 used the multiple cause-of-death data for each year. Because these were observed counts for the United States and not samples, they were not considered estimates. The age-adjusted mortality rates were shown for both underlying cause and underlying or other cause for the total population per year. The vertical line at 1999 represented the change from ICD-9 to ICD-10.
Cancer Incidence and 5-Year Survival
For digestive cancers (Chapters 4–12), the figures for age-adjusted cancer incidence and 5-year survival were derived from data obtained by the nine registries that SEER used through the entire period 1979–2004. Five-year survival was the proportion of those diagnosed in a given year who were still known to be alive 5 years later. Five-year survival ended at 1999, because it was impossible to know the 5-year status of patients diagnosed after that year. Absolute survival is shown in these figures, whereas SEER typically publishes relative survival. Relative survival takes into account the expected survival of the population as a whole and is higher than absolute survival, especially for cancers that concentrate in groups with high underlying mortality, such as the elderly.
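The difference between the two survival measures can be sketched as follows. The function names and numbers are hypothetical, and the relative survival function uses the conventional ratio of observed to expected survival as an assumption; it is not necessarily the exact procedure SEER applies.

```python
def absolute_survival(n_diagnosed, n_alive_at_5_years):
    """Proportion of patients diagnosed in a year still alive 5 years later."""
    if n_diagnosed <= 0:
        raise ValueError("n_diagnosed must be positive")
    return n_alive_at_5_years / n_diagnosed


def relative_survival(observed_survival, expected_survival):
    """Observed survival divided by the survival expected in a comparable
    general population (conventional definition, assumed here)."""
    if not 0 < expected_survival <= 1:
        raise ValueError("expected_survival must be in (0, 1]")
    return observed_survival / expected_survival
```

If 50 of 200 hypothetical patients were alive after 5 years, absolute survival would be 0.25. If only half of a comparable general population were expected to survive those 5 years, relative survival would be 0.50, which is higher than the absolute figure.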