
Causes of death in the U.S. population follow clear patterns. I plotted death rates for each state by year and by cause of death (c.o.d.). Three layers are visible in these graphs. The top layer consists of heart disease and cancer, which are by far the most frequent c.o.d. The middle layer comprises moderately frequent c.o.d., most commonly diseases of the respiratory, digestive, or nervous system, and external causes. The bottom layer comprises c.o.d. that are relatively rare. When plotted by single c.o.d., the death rates of several c.o.d. show well-defined curved trends, which suggests they may be predictable. However, the overall patterns in these single-c.o.d. graphs are not as marked or as consistent as those in the single-state graphs.
I built a dashboard from the dataset, so that viewers may explore the data themselves. The dashboard is available below (or here).
THE DATA
The U.S. Centers for Disease Control and Prevention (CDC) openly publishes a dataset on multiple causes of death in the U.S. population, covering the years 1999 to 2020. The information in the dataset was gathered by the CDC from death certificates, which provide certain details about each person’s passing. This is the data I analyze, in the hope of finding useful and actionable insights.
The dataset covers the 50 states and Washington, D.C. Apart from state names and years, it contains the category of the cause of death and the deaths per 100,000 population (death rates). The death rate is the total number of deaths divided by the total population of the respective state, expressed per 100,000 people. The causes of death are classified into 19 categories, namely:
• Certain infectious and parasitic diseases
• Neoplasms
• Diseases of the blood and blood-forming organs and certain disorders involving the immune mechanism
• Endocrine, nutritional and metabolic diseases
• Mental and behavioural disorders
• Diseases of the nervous system
• Diseases of the circulatory system
• Diseases of the respiratory system
• Diseases of the digestive system
• Diseases of the skin and subcutaneous tissue
• Diseases of the musculoskeletal system and connective tissue
• Diseases of the genitourinary system
• Pregnancy, childbirth and the puerperium
• Certain conditions originating in the perinatal period
• Congenital malformations, deformations and chromosomal abnormalities
• Symptoms, signs and abnormal clinical and laboratory findings, not elsewhere classified
• External causes of morbidity and mortality
• Codes for special purposes
• Diseases of the ear and mastoid process
It should be noted that information on multiple causes of death differs from information on the underlying cause of death. Multiple causes of death refer to the underlying cause of death plus other contributory and intermediate conditions leading up to the person’s passing. In this analysis, the two major c.o.d., namely diseases of the circulatory system and neoplasms, are assumed to be heart disease and cancer, respectively. It should also be noted that the c.o.d. in the data are diseases, injuries, or abnormalities that led to a person’s passing. These are different from diseases, injuries, or abnormalities that occurred but did not result in death.
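The death rate used throughout this analysis (deaths per 100,000 population) can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name `death_rate_per_100k` is hypothetical, and the numbers are made up rather than taken from the CDC data.

```python
def death_rate_per_100k(deaths: int, population: int) -> float:
    """Return deaths per 100,000 population for one state, year, and cause."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing so that integer inputs stay exact.
    return deaths * 100_000 / population


# Invented numbers, for illustration only (not CDC figures).
example_rate = death_rate_per_100k(deaths=2_500, population=1_000_000)
print(example_rate)  # 250.0
```

Expressing deaths relative to population is what makes a small state and a large state comparable on the same chart.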
THE PATTERNS


I inspected the data visually through two kinds of line graphs. The first kind plots death rates by individual state, with one graph per state. In this kind of graph, the death rates are shown along the dimensions of year and c.o.d., so we can see how the prevalence of every c.o.d. changes (or stays consistent) over the years in one particular state. The second kind plots death rates by individual c.o.d., with one graph per c.o.d. Here the death rates are shown along the dimensions of year and state, so we can see how the death rate of a particular c.o.d. in one state changes (or stays consistent) over the years in comparison with the other states.
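The two kinds of graphs are the same records grouped in two different ways. Below is a minimal Python sketch of that grouping, assuming each record is a (state, year, c.o.d., death rate) tuple. The helper `series_by`, the state names, and all values are hypothetical. The actual dashboard was built in Power BI, so this is not the method used there.

```python
from collections import defaultdict

# Hypothetical rows: (state, year, cause of death, deaths per 100,000).
# The values are invented for illustration and are not CDC figures.
rows = [
    ("State A", 1999, "Neoplasms", 200.0),
    ("State A", 2000, "Neoplasms", 198.0),
    ("State A", 1999, "Diseases of the circulatory system", 350.0),
    ("State B", 2000, "Neoplasms", 205.0),
    ("State B", 1999, "Neoplasms", 210.0),
]

STATE, COD = 0, 2  # column positions inside each row


def series_by(rows, panel_index, line_index):
    """Group rows into panels (one graph each) and lines within each panel.

    Returns {panel: {line: [(year, rate), ...]}} with points sorted by year.
    """
    panels = defaultdict(lambda: defaultdict(list))
    for row in rows:
        year, rate = row[1], row[3]
        panels[row[panel_index]][row[line_index]].append((year, rate))
    for lines in panels.values():
        for points in lines.values():
            points.sort()
    return {panel: dict(lines) for panel, lines in panels.items()}


# First kind: one graph per state, one line per c.o.d.
per_state = series_by(rows, panel_index=STATE, line_index=COD)
# Second kind: one graph per c.o.d., one line per state.
per_cod = series_by(rows, panel_index=COD, line_index=STATE)
```

Each inner list of (year, rate) points corresponds to one line in one graph, so either dictionary could be handed to a plotting tool.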
As mentioned above, three layers are visible in the single-state line graphs. The top layer comprises heart disease and cancer. It appears consistently far above the other c.o.d., which means these two are by far the most prevalent. Heart disease is almost always the most frequently occurring c.o.d. in all years and all states, and there is almost always a large gap between heart disease and cancer. There are exceptions, however. In Minnesota, Alaska, and Maine the gap between cancer and heart disease is very narrow. The death rate of cancer surpassed that of heart disease only in Alaska, in 2008 and 2013.
The middle layer is less well-defined. Its position indicates that its c.o.d. are moderately frequent compared with the top layer. The middle layer comprises various diseases, most commonly diseases of the respiratory, digestive, or nervous system, and external causes. There are significant gaps between c.o.d. within the middle layer. In several states, the lines cluster in a way that lets the middle layer be divided further into multiple layers. One peculiar pattern is that diseases of the respiratory system and external causes consistently appear together in the middle layer, somewhat like a pair. In Wyoming, these two c.o.d. even seem to overlap each other.
The bottom layer consists of the most rarely occurring c.o.d. The c.o.d. that form this layer appear to vary more than those in the middle layer. The lines in the bottom layer are densely packed because their death rates lie close together.

In the single-state line graphs, the general trends of all c.o.d. appear stable, because most of the lines are nearly flat and there is no conspicuously steep trend. However, as we will see, this impression differs from the one we get from the single-c.o.d. line graphs. I think the difference is due to the top layer, which stretches the vertical scale so that the gaps between c.o.d. in the bottom and middle layers shrink and those lines appear tightly clustered.
From visual inspection of the single-c.o.d. line graphs, the death rates of many c.o.d. seem to follow well-defined, curved patterns. Many of the graphs have jagged or abrupt lines, which I suspect are artifacts of data processing. Several graphs are relatively flat throughout.

The c.o.d. with curved, well-defined patterns would be interesting and feasible to model. I think it would be very insightful and useful to uncover the important features that predict these trends. Insights about such features could give policy-makers a basis to act upon. The c.o.d. that have curved, well-defined patterns in the data are:
• Diseases of the circulatory system
• Diseases of the digestive system
• Diseases of the nervous system
• Endocrine, nutritional and metabolic diseases
• External causes of morbidity and mortality
• Mental and behavioural disorders
Other c.o.d. trends are either relatively flat or probably artifacts of data processing.
In both the single-state and single-c.o.d. graphs there are outliers and genuine abrupt changes in trend. These are a cause for concern and may be worth investigating further. Here are some notable consistent outliers and abrupt changes at the upper end of the charts:
• A rapid rise in death rates from external causes is seen in the District of Columbia (since 2008) and Maryland (since 2014).
• Alaska and New Mexico have consistently high death rates from external causes compared with other c.o.d. in the same state.
• Alabama is always in the top two spots for death rates caused by diseases of the circulatory system.
• Wyoming, New Mexico, and West Virginia are in the top spots for death rates caused by diseases of the digestive system.
• West Virginia is always in the top three spots for death rates caused by diseases of the genitourinary system.
• South Dakota is always in the top three spots for death rates caused by diseases of the musculoskeletal system and connective tissue.
• West Virginia is always in the top spot for death rates caused by diseases of the respiratory system and by endocrine, nutritional and metabolic diseases, with a relatively large gap above the other states.
• New Mexico is always in the top two spots for death rates caused by external causes.
• Maine is always in the top spot for death rates caused by mental and behavioural disorders, except in 2016, when it is in second place.
• West Virginia is always in the top spot for death rates caused by neoplasms.
• West Virginia shows an abrupt rise in death rates from external causes since 2010. These rising death rates consistently become the highest of all, leading all other states by a large margin.
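Statements such as "always in the top two spots" in the list above were read off the charts, but they could also be checked programmatically. The following Python sketch shows one way to do that, assuming the rates for a single c.o.d. are stored as a mapping from year to {state: death rate}. The function names, state names, and values are hypothetical and are not taken from the CDC data.

```python
def years_in_top_k(rates, state, k):
    """Return the years in which `state` ranks within the top k by death rate.

    `rates` maps year -> {state: death rate} for a single cause of death.
    Ties are broken by the order in which states appear in the mapping.
    """
    hits = []
    for year in sorted(rates):
        ranked = sorted(rates[year], key=rates[year].get, reverse=True)
        if state in ranked[:k]:
            hits.append(year)
    return hits


def always_in_top_k(rates, state, k):
    """True if `state` is within the top k in every year present in `rates`."""
    return years_in_top_k(rates, state, k) == sorted(rates)


# Invented rates for one c.o.d., for illustration only.
example_rates = {
    2018: {"State A": 30.0, "State B": 25.0, "State C": 20.0},
    2019: {"State A": 31.0, "State B": 19.0, "State C": 22.0},
}

print(always_in_top_k(example_rates, "State A", 1))  # True
print(years_in_top_k(example_rates, "State B", 2))   # [2018]
```

Returning the list of years, rather than only a yes/no answer, also surfaces single-year exceptions like the 2016 one noted for Maine.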

Looking at the single-c.o.d. graphs, several c.o.d. show concerning general trends:
• Death rates caused by diseases of the circulatory system decline until 2012. After 2012 they rise steadily again.
• Death rates caused by diseases of the digestive system are always rising. They rise gradually until 2018. After 2018 many of the states show an abrupt rise.
• Death rates caused by diseases of the genitourinary system are always rising steadily.
• Death rates caused by diseases of the nervous system rise steadily, at a steeper slope than most other c.o.d. However, from 2013 onwards they change abruptly. Several states show a steep decline from 2017 to 2019, but after that the steep rise continues.
• Death rates caused by endocrine, nutritional and metabolic diseases rise gradually until 2018. Afterwards, they rise abruptly. West Virginia, the state in the uppermost position, shows an abrupt rise here since 2015.
• Death rates caused by external causes rise steadily. The rise has been getting steeper since 2013. In 2017 several states show a steep rise, but after 2019 most of them show an abrupt rise.
• Death rates caused by mental and behavioural disorders have generally always been rising steeply. Several states show a steep decline from 2013 to 2018, but many of them rise steeply again from 2019 onwards.
• Even though the death rate caused by neoplasms (cancer) has always been the second most dominant in every state, its overall trend is surprisingly one of steady decline.
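Trend descriptions like "decline until 2012, then rise" in the list above come from visual inspection. One simple way to put numbers on them is to fit a straight line on each side of a chosen break year and compare the slopes. The Python sketch below uses ordinary least squares for this. The function names and the series are invented for illustration, and this is not the procedure used for the analysis itself.

```python
def slope(points):
    """Ordinary least-squares slope of death rate against year.

    `points` is a list of (year, rate) pairs covering at least two
    distinct years.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("need at least two distinct years")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


def slopes_around(points, break_year):
    """Return (slope up to break_year, slope from break_year onwards)."""
    before = [p for p in points if p[0] <= break_year]
    after = [p for p in points if p[0] >= break_year]
    return slope(before), slope(after)


# Invented series for one state and one c.o.d. (not CDC figures).
series = [(2010, 104.0), (2011, 102.0), (2012, 100.0),
          (2013, 101.0), (2014, 102.0)]

before, after = slopes_around(series, break_year=2012)
print(before, after)  # -2.0 1.0
```

A negative slope before the break year and a positive slope after it would match the "decline, then rise" shape. The break year itself is an input here and would still have to be chosen by eye.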

To conclude this analysis, I would like to point out two things. First, West Virginia seems to appear in the top spots for death rates of many c.o.d. This finding should be investigated further, among other things by checking whether it is an effect of peculiarities in the data processing or a reflection of the actual conditions of the people.
Second, the high death rates from external causes in certain states are concerning, whether the rates are consistently high or rising abruptly. Such external causes, I assume, are accidents or violence (either inflicted by others or self-inflicted). Areas that show this pattern are Alaska, the District of Columbia, West Virginia, and New Mexico. This pattern is also very important to investigate further, so that we may see what is really happening in these areas. Insights from the investigation would hopefully help policy-makers and policy-enforcers act to reduce such external causes.
The dataset in this analysis was obtained from the CDC WONDER online database. The data were cleaned using R, and the dashboard was built using Microsoft Power BI.
REFERENCE
Centers for Disease Control and Prevention, National Center for Health Statistics. National Vital Statistics System, Mortality 1999-2020 on CDC WONDER Online Database, released in 2021. Data are from the Multiple Cause of Death Files, 1999-2020, as compiled from data provided by the 57 vital statistics jurisdictions through the Vital Statistics Cooperative Program. Accessed at http://wonder.cdc.gov/mcd-icd10.html on May 7, 2022 4:45:48 AM