import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Load the dataset
data = pd.read_csv("Death_rates_for_suicide__by_sex__race__Hispanic_origin__and_age__United_States.csv")
What this code snippet does:
It produces numerical summary statistics and a histogram for each numerical column in the dataset. It first calls describe(), which reports the count, mean, standard deviation, minimum, quartiles, and maximum of every numeric column. It then selects the int64 and float64 columns and plots a histogram of each one with 20 bins and a kernel density estimate (KDE) overlaid.
# Display numerical summary
numerical_summary = data.describe()
print("Numerical Summary:")
print(numerical_summary)
# Visualize numerical columns
numerical_columns = data.select_dtypes(include=['int64', 'float64']).columns
for column in numerical_columns:
    plt.figure(figsize=(8, 6))
    sns.histplot(data[column], bins=20, kde=True)
    plt.title(f"Distribution of {column}")
    plt.xlabel(column)
    plt.ylabel("Frequency")
    plt.show()
Numerical Summary:
       UNIT_NUM  STUB_NAME_NUM  STUB_LABEL_NUM         YEAR     YEAR_NUM  \
count    1176.0         1176.0     1176.000000  1176.000000  1176.000000
mean        2.0            5.0        5.169719  1996.214286    21.500000
std         0.0            0.0        0.051632    14.948365    12.126075
min         2.0            5.0        5.112000  1950.000000     1.000000
25%         2.0            5.0        5.123750  1987.000000    11.000000
50%         2.0            5.0        5.142500  1997.500000    21.500000
75%         2.0            5.0        5.223250  2008.000000    32.000000
max         2.0            5.0        5.244000  2018.000000    42.000000

           AGE_NUM     ESTIMATE
count  1176.000000  1012.000000
mean      3.307143    14.309585
std       1.067134    11.471759
min       2.000000     1.200000
25%       2.000000     5.400000
50%       3.000000    10.850000
75%       4.000000    20.725000
max       5.200000    61.900000
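The summary reports 1,012 non-null ESTIMATE values against 1,176 rows, so 164 estimates are missing. Because describe() only counts non-null values, it can help to check the gaps directly. The following is a minimal sketch of one way to do that; the `sample` frame and its values are a hypothetical stand-in, not rows from the real dataset.

```python
import pandas as pd

# Hypothetical stand-in for the real dataset: three rows, one missing ESTIMATE.
sample = pd.DataFrame({
    "YEAR": [1950, 1960, 1970],
    "ESTIMATE": [4.2, None, 5.1],
})

# isna() marks missing cells; summing gives the number of gaps per column,
# and the mean gives the share of rows that are missing.
missing_per_column = sample.isna().sum()
missing_share = sample.isna().mean()

print(missing_per_column)
print(missing_share.round(3))
```

Applied to the notebook's own `data` frame, the same two calls would show which columns have gaps before the histograms are read.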
What this code snippet does:
It summarizes the categorical (object-dtype) columns of the dataset. For each one, it prints the unique categories and their frequencies using value_counts(). If the column has five or fewer unique values, it also draws a pie chart in which each slice shows a category's share of the total count, labeled as a percentage.
# Display categorical summary
categorical_columns = data.select_dtypes(include=['object']).columns
for column in categorical_columns:
    print(f"Value counts for {column}:")
    print(data[column].value_counts())
    print()
    # Create a pie chart for categorical columns with at most 5 unique values
    if len(data[column].unique()) <= 5:
        plt.figure(figsize=(8, 6))
        counts = data[column].value_counts()
        plt.pie(counts, labels=counts.index, autopct='%1.1f%%', startangle=140)
        plt.title(f"Pie Chart of {column}")
        plt.axis('equal')  # Equal aspect ratio ensures that the pie is drawn as a circle.
        plt.show()
Value counts for INDICATOR:
INDICATOR
Death rates for suicide    1176
Name: count, dtype: int64

Value counts for UNIT:
UNIT
Deaths per 100,000 resident population, crude    1176
Name: count, dtype: int64

Value counts for STUB_NAME:
STUB_NAME
Sex, age and race    1176
Name: count, dtype: int64

Value counts for STUB_LABEL:
STUB_LABEL
Female: Black or African American: 45-64 years           43
Male: Black or African American: 45-64 years             43
Male: White: 15-24 years                                 42
Male: White: 25-44 years                                 42
Female: Asian or Pacific Islander: 25-44 years           42
Female: Asian or Pacific Islander: 15-24 years           42
Female: American Indian or Alaska Native: 25-44 years    42
Female: American Indian or Alaska Native: 15-24 years    42
Female: Black or African American: 25-44 years           42
Female: Black or African American: 15-24 years           42
Female: White: 45-64 years                               42
Female: White: 25-44 years                               42
Female: White: 15-24 years                               42
Male: Asian or Pacific Islander: 45-64 years             42
Male: Asian or Pacific Islander: 25-44 years             42
Male: Asian or Pacific Islander: 15-24 years             42
Male: American Indian or Alaska Native: 25-44 years      42
Male: American Indian or Alaska Native: 15-24 years      42
Male: Black or African American: 75-84 years             42
Male: Black or African American: 65-74 years             42
Male: Black or African American: 25-44 years             42
Male: Black or African American: 15-24 years             42
Male: White: 75-84 years                                 42
Male: White: 65-74 years                                 42
Male: White: 45-64 years                                 42
Female: Asian or Pacific Islander: 45-64 years           42
Male: American Indian or Alaska Native: 45-64 years      41
Female: American Indian or Alaska Native: 45-64 years    41
Name: count, dtype: int64

Value counts for AGE:
AGE
15-24 years    336
25-44 years    336
45-64 years    336
65-74 years     84
75-84 years     84
Name: count, dtype: int64
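The unique-value threshold decides which columns get a pie chart: with 28 distinct values, STUB_LABEL is skipped, while AGE, with 5, qualifies. As a rough sketch, that selection step could be separated from the plotting so the cutoff is easy to inspect and change. The helper name `columns_for_pie`, its `max_unique` parameter, and the `sample` frame are all hypothetical. The sketch also selects non-numeric columns with `exclude=["number"]` rather than the notebook's `include=['object']`, which is an assumption made so it does not depend on how string columns are typed.

```python
import pandas as pd

def columns_for_pie(df, max_unique=5):
    """Return non-numeric columns with at most max_unique distinct values."""
    # Assumption: "categorical" here means any non-numeric column.
    categorical = df.select_dtypes(exclude=["number"]).columns
    # dropna=False counts NaN as a value, matching len(df[col].unique()).
    return [col for col in categorical
            if df[col].nunique(dropna=False) <= max_unique]

# Hypothetical stand-in data: AGE has 3 distinct values, LABEL has 4.
sample = pd.DataFrame({
    "AGE": ["15-24 years", "25-44 years", "15-24 years", "45-64 years"],
    "LABEL": ["a", "b", "c", "d"],
    "YEAR": [1950, 1960, 1970, 1980],
})

selected = columns_for_pie(sample, max_unique=3)
print(selected)
```

The plotting loop could then iterate over the returned list instead of testing the threshold inline.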