
My Report:

This dataset comes from the Data.gov data catalog.

Link: https://catalog.data.gov/dataset/death-rates-for-suicide-by-sex-race-hispanic-origin-and-age-united-states-020c1

First, I removed all rows with ‘All ages’ listed in the AGES column. This value appeared 1,624 times, in 812 rows. For my project, I wanted a separate line for each age group, and ‘All ages’ is a summary category that already combines them.

I removed all rows with ‘65 years and over’ listed in the AGES column. This value appeared 2,048 times, in 1,024 rows. Some of these rows had values in ESTIMATE, but, as with ‘All ages,’ I wanted a separate line for each race and age group, and removing this broader category seems to achieve that.

I removed all rows with ‘85 years and over’ listed in the AGES column. This value appeared 424 times, in 212 rows. As with ‘65 years and over,’ some of these rows had values in ESTIMATE, but I removed them for the same reason.
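The three age filters above can be sketched in pandas roughly as follows. This is a minimal sketch, assuming the CSV has already been loaded into a DataFrame and that the age column is named AGES, as written in this report; the file name in the usage comment is hypothetical.

```python
import pandas as pd

# Column name as written in this report; adjust if the CSV header differs.
AGE_COL = "AGES"

# Age categories removed in this report.
AGGREGATE_AGES = ["All ages", "65 years and over", "85 years and over"]


def drop_aggregate_ages(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df without rows whose age category is in AGGREGATE_AGES."""
    # Strip stray whitespace so that "All ages " still matches "All ages".
    ages = df[AGE_COL].astype(str).str.strip()
    return df[~ages.isin(AGGREGATE_AGES)].copy()


# Usage (the file name is hypothetical):
# df = pd.read_csv("suicide_death_rates.csv")
# print(df[AGE_COL].value_counts())  # rows per age category before filtering
# df = drop_aggregate_ages(df)
```

Using `isin` with one list keeps all three removals in a single step, and printing `value_counts()` first is a quick way to double-check the row counts quoted above.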

I removed all rows with ‘Hispanic or Latino: All races:’ in the STUB_LABEL column. This value appeared 258 times, in 129 rows. I wanted to look at individual races, and many of these rows had no value in ESTIMATE anyway.

I removed all rows labeled ‘Sex,’ ‘Age,’ or ‘Sex and age’ in the STUB_NAME column, a combined total of 1,513 rows. I wanted every row to account for all three factors (sex, age, and race), so ‘Sex, age, and race’ was the only category I kept.
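Keeping a single STUB_NAME category could look like the sketch below. It assumes the category text is exactly ‘Sex, age, and race’ as written in this report; the real spelling in the CSV should be checked with `df["STUB_NAME"].unique()` first. The helper also returns how many rows were dropped per category, which can be compared with the 1,513 total above.

```python
import pandas as pd

NAME_COL = "STUB_NAME"
# Category text as written in the report; verify the exact spelling in the data.
KEEP_NAME = "Sex, age, and race"


def keep_single_stub_name(df: pd.DataFrame, keep: str = KEEP_NAME):
    """Keep only rows whose STUB_NAME equals `keep`.

    Returns the filtered copy and the number of rows dropped for each other STUB_NAME value.
    """
    names = df[NAME_COL].astype(str).str.strip()
    dropped_counts = names[names != keep].value_counts()
    return df[names == keep].copy(), dropped_counts


# Usage:
# df, dropped = keep_single_stub_name(df)
# print(dropped)        # rows removed per category
# print(dropped.sum())  # total rows removed
```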

I removed all 774 rows labeled ‘Not Hispanic or Latino: [race]’ in the STUB_LABEL column. Whether a group was Hispanic or Latino was not relevant to my analysis.
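Both STUB_LABEL removals (‘Hispanic or Latino: All races:’ and ‘Not Hispanic or Latino:’) are substring matches, so they can share one helper. This is a sketch, assuming the label fragments appear verbatim inside STUB_LABEL. `regex=False` makes pandas treat the colons literally, and `na=False` keeps rows whose label is missing. Note that a plain substring match for ‘Hispanic or Latino: All races:’ would also catch a ‘Not Hispanic or Latino: All races:’ label if one exists; that is harmless here because those rows are removed as well. The helper also counts how many removed rows had no ESTIMATE, the reason given for the first removal.

```python
import pandas as pd

LABEL_COL = "STUB_LABEL"
ESTIMATE_COL = "ESTIMATE"

# Label fragments removed in the report.
LABEL_FRAGMENTS_TO_DROP = ["Hispanic or Latino: All races:", "Not Hispanic or Latino:"]


def drop_label_fragments(df: pd.DataFrame, fragments=LABEL_FRAGMENTS_TO_DROP):
    """Drop rows whose STUB_LABEL contains any of `fragments`.

    Returns the filtered copy and the number of removed rows with a missing ESTIMATE.
    """
    mask = pd.Series(False, index=df.index)
    for fragment in fragments:
        # Literal (non-regex) match; missing labels count as "no match".
        mask |= df[LABEL_COL].str.contains(fragment, regex=False, na=False)
    n_missing_estimate = int(df.loc[mask, ESTIMATE_COL].isna().sum())
    return df[~mask].copy(), n_missing_estimate


# Usage:
# df, n_missing = drop_label_fragments(df)
```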

I also removed the FLAG column. I did not need it, and most rows had no data in it anyway.

This left me with 1,177 usable rows, with five categorical and six quantitative columns.
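Dropping FLAG and checking the final shape can be done in one small step. This sketch measures how much of FLAG is empty before removing it, then counts columns by dtype. The dtype-based split is an assumption: numeric codes stored as numbers would be counted as quantitative, so the result may not match the five/six split above exactly.

```python
import pandas as pd


def drop_flag_and_summarize(df: pd.DataFrame, flag_col: str = "FLAG"):
    """Drop the flag column and summarize what remains.

    Numeric dtypes count as quantitative; every other column counts as categorical.
    """
    # Share of rows with no flag, computed before the column is dropped.
    flag_missing = float(df[flag_col].isna().mean()) if flag_col in df.columns else None
    out = df.drop(columns=[flag_col], errors="ignore")
    n_quantitative = out.select_dtypes(include="number").shape[1]
    summary = {
        "rows": len(out),
        "categorical": out.shape[1] - n_quantitative,
        "quantitative": n_quantitative,
        "flag_missing_share": flag_missing,
    }
    return out, summary


# Usage:
# df, summary = drop_flag_and_summarize(df)
# print(summary)
```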

Question: Given the demographic data I kept (age, race, and sex), can we predict whether a group is at high or low risk of suicide? It seems that we can, to an extent.
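One simple way to frame the high/low question is sketched below, under two assumptions that are not in the report: ‘high risk’ means an ESTIMATE above the median, and sex, race, and age have already been split into their own columns (the names SEX, RACE, and AGE_GROUP are hypothetical). The "model" is only a majority-label-per-group baseline scored on the same rows it was built from, so it indicates how well the groups separate rather than giving a true prediction accuracy.

```python
import pandas as pd


def label_high_risk(df: pd.DataFrame, estimate_col: str = "ESTIMATE") -> pd.DataFrame:
    """Drop rows without an estimate and add HIGH_RISK: True where the rate exceeds the median.

    The median cutoff is an assumption; the report does not define "high" and "low".
    """
    out = df.dropna(subset=[estimate_col]).copy()
    out["HIGH_RISK"] = out[estimate_col] > out[estimate_col].median()
    return out


def majority_baseline_accuracy(df: pd.DataFrame, features: list) -> float:
    """In-sample accuracy of predicting HIGH_RISK from the majority label of each feature group."""
    # Share of high-risk rows within each group defined by `features`.
    share_high = df["HIGH_RISK"].astype(float).groupby([df[f] for f in features]).transform("mean")
    # Predict "high" when at least half of the group is high risk.
    predicted = share_high >= 0.5
    return float((predicted == df["HIGH_RISK"]).mean())


# Usage (SEX, RACE and AGE_GROUP are hypothetical columns that would first
# have to be parsed out of STUB_LABEL and AGES):
# labeled = label_high_risk(df)
# for cols in (["SEX"], ["RACE"], ["AGE_GROUP"], ["SEX", "RACE", "AGE_GROUP"]):
#     print(cols, majority_baseline_accuracy(labeled, cols))
```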

For a prediction of a quantitative outcome: Question: Can we predict the estimated suicide rate per 100,000 resident population from a single demographic factor, such as age, race, or sex? Would the plots for each factor look the same? I believe more factors have to be accounted for, which is why I removed the less descriptive categories.
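To put a number on the single-factor question, one option (a swapped-in, deliberately simple technique, not the report's method) is to predict each row's rate by the mean rate of its group and compute the share of variance explained, R². The column names passed in are hypothetical, as in the high/low sketch. Because this is measured on the same rows, a finer grouping can never score lower, so the combined score is optimistic; a held-out split would give a fairer comparison.

```python
import pandas as pd


def group_mean_r2(df: pd.DataFrame, features: list, target: str = "ESTIMATE") -> float:
    """R^2 of predicting `target` by its mean within each group defined by `features`."""
    data = df.dropna(subset=[target] + list(features))
    y = data[target]
    predicted = y.groupby([data[f] for f in features]).transform("mean")
    ss_res = ((y - predicted) ** 2).sum()
    ss_tot = ((y - y.mean()) ** 2).sum()
    return float(1 - ss_res / ss_tot)


def compare_factors(df: pd.DataFrame, factors: list, target: str = "ESTIMATE") -> dict:
    """R^2 for each single factor and for all factors combined."""
    scores = {f: group_mean_r2(df, [f], target) for f in factors}
    scores["combined"] = group_mean_r2(df, factors, target)
    return scores


# Usage (hypothetical column names):
# print(compare_factors(df, ["SEX", "RACE", "AGE_GROUP"]))
```

If the single-factor scores are well below the combined one, that would support the view that one demographic factor alone is not enough.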