nogilnick.
Mortality in the United States and Its Causes

Sat, 24 Feb 2018

Data Science, Data Visualization, Death, Medicine, Statistics

In this chapter, vital statistics for the United States of America are explored. The Centers for Disease Control and Prevention (CDC) maintains several datasets containing vital statistics for the nation. These datasets contain records of deaths organized by year. Each record includes age, gender, race, cause of death, and other details. This chapter explores data for the year 2016.

The Human Lifespan

Figure 1 shows the distribution of age at death for all records. As expected, the distribution is left-skewed: most deaths occur at older ages, with a long tail toward younger ages. The leftmost bar stands out somewhat. This bar represents infant mortality.

Figure 1: Distribution of Age at Death


Considering only the data in this first bar, another histogram is constructed. This histogram shows the age in months for these records. Figure 2 shows that infant deaths occur most frequently in the first month after birth and decline sharply thereafter.

Figure 2: Infant Mortality


Next, the rightmost bars of the histogram are considered. These bars contain records for those who died at age 100 or older. The records are grouped by gender and race and displayed in a bar plot. The y-axis represents the percentage of centenarians for each race.

Figure 3: Percentage of Centenarians by Race


Figure 3 shows two things. The first is that the majority of people who live at least 100 years are women. In fact, women account for roughly 82% of centenarian records. The second is that people of some races are more likely to survive their first century. Japanese and Chinese people are notably more likely to do so.

Next, records of all ages are grouped by gender. The distributions for men and women are plotted as both a line chart and a bar chart. Figure 4 shows that men die earlier than women. This trend begins in the late teenage years and continues through adulthood. The count of female records exceeds the count of male records only near the end of the human lifespan.

Figure 4: Age Distribution at Death by Gender


The average lifespans of men and women are compared. On average, men die roughly 6.6 years younger than women. This difference is highly statistically significant: Welch's t-test for the difference of means gives \(t \approx -303\).
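As a minimal sketch of how a Welch's t statistic like the one above can be computed, the snippet below implements the standard formula using only the Python standard library. The function name `welch_t` and the two small samples are hypothetical and chosen for illustration; they are not the CDC records and will not reproduce the value reported above.

```python
import math
import statistics


def welch_t(a, b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    # Sample variance (n - 1 denominator) of each group divided by its size.
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (mean_a - mean_b) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical ages at death, for illustration only (not the CDC data).
men = [62, 71, 75, 58, 80, 69, 73]
women = [78, 84, 70, 88, 81, 76, 90]

t, df = welch_t(men, women)
print(f"t = {t:.2f}, df = {df:.1f}")
```

Unlike Student's t-test, Welch's version does not assume that the two groups have equal variances. A negative statistic here means the first group (men) has the lower mean.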

Next, the mean and standard deviation of age at death are computed for each race. The result is shown in a bar plot.

Figure 5: Age at Death by Race


Figure 5 shows that white and Asian people live longer than other races on average. Japanese people have the longest average lifespan together with the lowest standard deviation. The low standard deviation suggests fewer Japanese people die early in life.

Figure 6: Distribution of Age at Death by Race


This is confirmed by plotting the distribution of several races side by side. Figure 6 shows that relatively fewer Japanese people die before reaching the end of the human lifespan.

Manner of Death

Next, the manner of death is explored. The dataset classifies the manner of death into 7 categories. The categories and their counts are listed in Table 1.

| Description | Count |
|---|---|
| Natural | 2212118 |
| Unspecified | 294239 |
| Accident | 160768 |
| Suicide | 45155 |
| Homicide | 20544 |
| Unknown | 12467 |
| TBD | 4573 |

Table 1: Record Counts by Manner of Death
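To put the counts in Table 1 in proportion, the following sketch converts them to percentages of all records. The counts are copied from the table; the variable names are illustrative.

```python
# Counts copied from Table 1.
manner_counts = {
    "Natural": 2212118,
    "Unspecified": 294239,
    "Accident": 160768,
    "Suicide": 45155,
    "Homicide": 20544,
    "Unknown": 12467,
    "TBD": 4573,
}

total = sum(manner_counts.values())

# Share of all records accounted for by each manner of death, in percent.
shares = {name: 100 * count / total for name, count in manner_counts.items()}

for name, pct in shares.items():
    print(f"{name:<12}{pct:6.2f}%")
```

The seven counts sum to 2,749,864 records, of which natural deaths make up roughly 80%.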


The average age at death for each category is shown in Figure 7. Deaths from natural causes have the greatest average age. Homicides have the least.

Figure 7: Average Age at Death by Manner of Death


Next, the records are grouped by manner of death and race. Bar charts for accidents, suicides, and homicides are constructed. The y-axis represents the percentage of all deaths for each race accounted for by a specific manner.

Figure 8: Manner of Death by Race
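The per-race percentages in these charts could be computed along the following lines. This is a sketch under assumptions: the record layout as `(group, manner)` tuples, the function name `manner_share_by_group`, and the sample records are all hypothetical and are not taken from the dataset.

```python
from collections import Counter

# Hypothetical (group, manner) records, for illustration only.
records = [
    ("A", "Natural"), ("A", "Natural"), ("A", "Accident"), ("A", "Suicide"),
    ("B", "Natural"), ("B", "Homicide"), ("B", "Natural"), ("B", "Natural"),
    ("B", "Accident"),
]


def manner_share_by_group(records, manner):
    """Percentage of each group's deaths that are attributed to `manner`."""
    totals = Counter(group for group, _ in records)
    hits = Counter(group for group, m in records if m == manner)
    # Counter returns 0 for groups with no matching records.
    return {group: 100 * hits[group] / totals[group] for group in totals}


print(manner_share_by_group(records, "Accident"))
print(manner_share_by_group(records, "Homicide"))
```

Dividing by each group's own total, rather than by the overall total, makes groups of very different sizes comparable on one axis.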


The chart for homicides shows that Japanese and Chinese people have the lowest share of deaths due to homicide among all races. This likely contributes to the higher average age at death for these groups, as death by homicide typically occurs earlier in life. Conversely, the share of deaths due to homicide is highest among Black people. This likely contributes to the relatively lower average age at death for that group.

Underlying Cause of Death

Next, the underlying cause of death is explored. Each record is labeled with an ICD-10 code indicating the underlying cause of death. The cumulative percentage of records accounted for by the top causes is computed, and the result is shown in Figure 9. As can be seen, a small number of causes are responsible for a large share of deaths. Well over 60% of all deaths are the result of fewer than 50 causes of mortality.

Figure 9: Cumulative Percentage of Deaths by Top Diseases


Next, the records are grouped by ICD-10 code and the counts of each are computed. The result is shown in a bar chart in Figure 10. The corresponding ICD-10 codes are listed in Table 2.

Figure 10: Leading Causes of Death

| ICD-10 | Mean Age | Std. Dev. of Age | Count | Description |
|---|---|---|---|---|
| I251 | 79.9 | 12.9 | 161079 | Atherosclerotic Heart Disease |
| C349 | 71.7 | 11.0 | 146786 | Malignant Neoplasm: Bronchus or Lung |
| J449 | 77.2 | 11.0 | 116117 | Chronic Obstructive Pulmonary Disease |
| G309 | 86.9 | 7.7 | 113096 | Alzheimer Disease |
| I219 | 74.5 | 14.0 | 107594 | Acute Myocardial Infarction |
| F03 | 87.4 | 7.9 | 100901 | Dementia |
| I500 | 83.6 | 11.6 | 64439 | Congestive Heart Failure |
| I250 | 71.9 | 14.8 | 62909 | Atherosclerotic Cardiovascular Disease |
| I64 | 81.2 | 12.1 | 61818 | Stroke |
| J189 | 79.7 | 14.3 | 42189 | Pneumonia |

Table 2: Leading Causes of Death
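The cumulative percentage described for Figure 9 can be sketched with the ten counts in Table 2. One assumption is made here and should be read as such: the denominator is taken to be the sum of the counts in Table 1 (2,749,864), on the premise that Table 1 covers every record.

```python
from itertools import accumulate

# Counts copied from Table 2, in rank order.
top_counts = {
    "I251": 161079,
    "C349": 146786,
    "J449": 116117,
    "G309": 113096,
    "I219": 107594,
    "F03": 100901,
    "I500": 64439,
    "I250": 62909,
    "I64": 61818,
    "J189": 42189,
}

# ASSUMPTION: total record count taken as the sum of the Table 1 counts.
total_records = 2749864

# Running total of counts, expressed as a percentage of all records.
cumulative = [100 * c / total_records for c in accumulate(top_counts.values())]

for code, pct in zip(top_counts, cumulative):
    print(f"{code:<6}{pct:6.2f}%")
```

Under that assumption, the ten leading codes alone account for roughly 35.5% of all records.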


Heart disease accounts for the largest number of deaths. Atherosclerosis, the build-up of plaque on the arterial walls, is involved in several of the leading causes of death. Lung cancer and COPD are also responsible for a sizable portion of the records. Both of these pulmonary conditions are strongly associated with smoking.

Next, causes of death in those under the age of 50 are explored. A similar bar chart and table are constructed from these records.

Figure 11: Leading Causes of Death under 50

| ICD-10 | Mean Age | Std. Dev. of Age | Count | Description |
|---|---|---|---|---|
| X42 | 40.8 | 13.0 | 19167 | Accidental Poisoning by and Exposure to Narcotics |
| X44 | 42.2 | 13.5 | 16872 | Accidental Poisoning by and Exposure to Unspecified Drugs |
| X95 | 32.3 | 13.3 | 11466 | Assault by Unspecified Firearm Discharge |
| X70 | 40.0 | 16.7 | 8425 | Intentional Self-Harm by Hanging Strangulation and Suffocation |
| V892 | 43.0 | 21.5 | 7900 | Person Injured in a Motor-Vehicle Accident |
| X74 | 50.1 | 19.6 | 6826 | Intentional Self-Harm by Unspecified Firearm Discharge |
| I219 | 74.5 | 14.0 | 5375 | Acute Myocardial Infarction |
| C509 | 68.7 | 14.9 | 4880 | Malignant Neoplasm: Breast |
| R99 | 55.9 | 28.8 | 4873 | Other Ill-Defined and Unspecified Causes of Mortality |
| I250 | 71.9 | 14.8 | 4284 | Atherosclerotic Cardiovascular Disease |

Table 3: Leading Causes of Death Under 50


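The leading causes of death in those under 50 are mostly not disease processes. Drug overdose, homicide, and suicide lead. The only diseases present in the top 10 are breast cancer and heart disease. Note that the age columns for codes that also appear in Table 2 and Table 5 carry the same values there, so they appear to be computed over records of all ages; only the counts are restricted to those under 50.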

Next, deaths caused by cancer are considered for all ages. Lung cancer accounts for by far the largest share of cancer deaths, with more than three times the count of the next most common type. The large number of deaths due to pancreatic cancer is presumably due to the present difficulty of treating it. The prognosis for breast cancer is better, though it is a more common disease.

Figure 12: Leading Causes of Death by Cancer

| ICD-10 | Mean Age | Std. Dev. of Age | Count | Description |
|---|---|---|---|---|
| C349 | 71.7 | 11.0 | 146786 | Malignant Neoplasm: Bronchus or Lung |
| C259 | 71.8 | 11.9 | 42121 | Malignant Neoplasm: Pancreas |
| C509 | 68.7 | 14.9 | 41913 | Malignant Neoplasm: Breast |
| C189 | 72.0 | 14.0 | 39249 | Malignant Neoplasm: Colon |
| C61 | 78.6 | 10.5 | 30396 | Malignant Neoplasm: Prostate |
| C80 | 72.4 | 13.4 | 27845 | Malignant Neoplasm: Unspecified Site |
| C679 | 77.7 | 11.6 | 16586 | Malignant Neoplasm: Bladder |
| C719 | 64.1 | 16.1 | 15303 | Malignant Neoplasm: Brain |
| C159 | 69.5 | 11.9 | 15285 | Malignant Neoplasm: Esophagus |
| C56 | 69.8 | 13.0 | 14242 | Malignant Neoplasm: Ovary |

Table 4: Leading Causes of Death by Cancer


Next, methods of suicide are considered. A similar table and bar chart are constructed only from records due to suicide.

Figure 13: Most Common Methods of Suicide

| ICD-10 | Mean Age | Std. Dev. of Age | Count | Description |
|---|---|---|---|---|
| X74 | 50.1 | 19.6 | 13948 | Intentional Self-Harm by Unspecified Firearm Discharge |
| X70 | 40.0 | 16.7 | 11682 | Intentional Self-Harm by Hanging Strangulation and Suffocation |
| X72 | 50.3 | 19.8 | 6116 | Intentional Self-Harm by Handgun Discharge |
| X64 | 50.3 | 15.1 | 3241 | Intentional Self-Poisoning by and Exposure to Unspecified Drugs |
| X73 | 47.7 | 19.5 | 2892 | Intentional Self-Harm by Rifle, Shotgun and Larger Firearm Discharge |
| X67 | 47.9 | 16.7 | 1369 | Intentional Self-Poisoning by Exposure to Gases (CO2, Helium, etc) |
| X80 | 43.3 | 18.0 | 1123 | Intentional Self-Harm by Jumping from a High Place |
| X61 | 49.1 | 15.5 | 1064 | Intentional Self-Poisoning by Exposure to Sedatives |

Table 5: Most Common Methods of Suicide


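The most common method of suicide, by a wide margin, is by firearm: the three firearm codes (X74, X72, and X73) together account for roughly twice as many records as hanging, the next most common method. Intentional poisoning is a distant third.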

Education Levels

Finally, records are grouped by education level. Education level is recorded as a categorical variable with 9 categories based on different educational milestones. The descriptions for each of the categories are shown in Table 6.

| Category | Description |
|---|---|
| 1 | 8th Grade or Less |
| 2 | 9th - 12th Grade, No Diploma |
| 3 | High School Graduate or GED Completed |
| 4 | Some College Credit, but No Degree |
| 5 | Associate Degree |
| 6 | Bachelors Degree |
| 7 | Masters Degree |
| 8 | Doctorate or Professional Degree |
| 9 | Unknown |

Table 6: Education Levels with Categorical Labels


The categories increase with level of education. Records with unknown education level are discarded; they account for less than 2% of all records.

A bar chart is constructed of the average age of each group. The result is shown in Figure 14. People who complete at least a bachelor's degree live longer on average.

Figure 14: Average Age at Death by Education Level


The bar chart also suggests a modest increasing trend with education. To further explore this trend, a scatter plot is constructed from the data points. A trend line is fit to the data and the coefficient of determination is computed.

Figure 15: Relationship Between Education and Lifespan


The \(R^{2}\) of the fit is 0.641. The overall F-statistic of the model is 12.512, with a corresponding p-value of 0.008. The coefficient on education level is 0.907 and is significant. The coefficient indicates that average lifespan increases by roughly 1 year for each educational milestone completed.
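For readers who want to see how such a fit is obtained, here is a minimal sketch of an ordinary least-squares line with its \(R^{2}\) and F-statistic, written with the standard library only. The function name `fit_line` and the data points are hypothetical; they are not the values behind Figure 15 and will not reproduce the numbers above.

```python
import math


def fit_line(x, y):
    """Least-squares line y = intercept + slope * x, with R^2 and F statistic."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    # F statistic for a single predictor, with (1, n - 2) degrees of freedom.
    f_stat = math.inf if r2 == 1 else r2 / (1 - r2) * (n - 2)
    return slope, intercept, r2, f_stat


# Hypothetical (education level, average age at death) points, illustration only.
levels = [1, 2, 3, 4]
ages = [1.0, 3.0, 2.0, 4.0]

slope, intercept, r2, f_stat = fit_line(levels, ages)
print(f"slope = {slope:.3f}, R^2 = {r2:.3f}, F = {f_stat:.3f}")
```

With a single predictor, the F-statistic is a function of \(R^{2}\) and the number of points alone, so the two reported quantities are not independent pieces of evidence.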