In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
In [2]:
# Load the dataset
data = pd.read_csv("Death_rates_for_suicide__by_sex__race__Hispanic_origin__and_age__United_States.csv")

What this code snippet does:

It prints summary statistics for every numerical column using the describe() method, then draws one histogram per numerical column. Each histogram uses 20 bins and overlays a kernel density estimate (KDE) so the shape of the distribution is easier to see.

In [3]:
# Display numerical summary
numerical_summary = data.describe()
print("Numerical Summary:")
print(numerical_summary)

# Visualize numerical columns
numerical_columns = data.select_dtypes(include=['int64', 'float64']).columns
for column in numerical_columns:
    plt.figure(figsize=(8, 6))
    sns.histplot(data[column], bins=20, kde=True)
    plt.title(f"Distribution of {column}")
    plt.xlabel(column)
    plt.ylabel("Frequency")
    plt.show()
Numerical Summary:
       UNIT_NUM  STUB_NAME_NUM  STUB_LABEL_NUM         YEAR     YEAR_NUM  \
count    1176.0         1176.0     1176.000000  1176.000000  1176.000000   
mean        2.0            5.0        5.169719  1996.214286    21.500000   
std         0.0            0.0        0.051632    14.948365    12.126075   
min         2.0            5.0        5.112000  1950.000000     1.000000   
25%         2.0            5.0        5.123750  1987.000000    11.000000   
50%         2.0            5.0        5.142500  1997.500000    21.500000   
75%         2.0            5.0        5.223250  2008.000000    32.000000   
max         2.0            5.0        5.244000  2018.000000    42.000000   

           AGE_NUM     ESTIMATE  
count  1176.000000  1012.000000  
mean      3.307143    14.309585  
std       1.067134    11.471759  
min       2.000000     1.200000  
25%       2.000000     5.400000  
50%       3.000000    10.850000  
75%       4.000000    20.725000  
max       5.200000    61.900000  
[Figures: seven histograms titled "Distribution of <column>", one each for UNIT_NUM, STUB_NAME_NUM, STUB_LABEL_NUM, YEAR, YEAR_NUM, AGE_NUM, and ESTIMATE]
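
The summary above reports a standard deviation of 0 for UNIT_NUM and STUB_NAME_NUM, so their histograms collapse to a single bar, and ESTIMATE has only 1012 non-null values out of 1176 rows. The following is a minimal sketch, not part of the original notebook, of how constant columns could be skipped before plotting and how missing values could be counted. The helper name `informative_numeric_columns` and the small `demo` frame are hypothetical stand-ins for the real data.

```python
import pandas as pd


def informative_numeric_columns(df):
    """Return the numeric columns that take more than one distinct value."""
    numeric = df.select_dtypes(include="number")
    return [col for col in numeric.columns if numeric[col].nunique() > 1]


# Tiny made-up frame that mimics the shape of the real dataset (assumption).
demo = pd.DataFrame({
    "UNIT_NUM": [2, 2, 2, 2],                  # constant, like the real column
    "YEAR": [1950, 1960, 1970, 1980],
    "ESTIMATE": [1.2, None, 5.4, 10.8],        # one missing value
    "AGE": ["15-24 years", "25-44 years", "15-24 years", "25-44 years"],
})

# Columns worth a histogram: numeric and not constant.
columns_to_plot = informative_numeric_columns(demo)

# Number of missing entries in each column.
missing_per_column = demo.isna().sum()

print(columns_to_plot)
print(missing_per_column)
```

On the real dataset, the same calls would be made on `data` instead of `demo`, and the loop over `numerical_columns` could iterate over the filtered list.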

What this code snippet does:

It summarizes the categorical columns of the dataset. For each categorical column, it prints the unique categories and their counts using value_counts(). If the column has at most 5 unique values, it also draws a pie chart showing each category's share of the total count, with the percentage labeled on each slice.

In [4]:
# Display categorical summary
categorical_columns = data.select_dtypes(include=['object']).columns
for column in categorical_columns:
    print(f"Value counts for {column}:")
    print(data[column].value_counts())
    print()

    # Create a pie chart for categorical columns with at most 5 unique values
    if len(data[column].unique()) <= 5:
        plt.figure(figsize=(8, 6))
        counts = data[column].value_counts()
        plt.pie(counts, labels=counts.index, autopct='%1.1f%%', startangle=140)
        plt.title(f"Pie Chart of {column}")
        plt.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.
        plt.show()
Value counts for INDICATOR:
INDICATOR
Death rates for suicide    1176
Name: count, dtype: int64

[Figure: Pie Chart of INDICATOR]
Value counts for UNIT:
UNIT
Deaths per 100,000 resident population, crude    1176
Name: count, dtype: int64

[Figure: Pie Chart of UNIT]
Value counts for STUB_NAME:
STUB_NAME
Sex, age and race    1176
Name: count, dtype: int64

[Figure: Pie Chart of STUB_NAME]
Value counts for STUB_LABEL:
STUB_LABEL
Female: Black or African American: 45-64 years           43
Male: Black or African American: 45-64 years             43
Male: White: 15-24 years                                 42
Male: White: 25-44 years                                 42
Female: Asian or Pacific Islander: 25-44 years           42
Female: Asian or Pacific Islander: 15-24 years           42
Female: American Indian or Alaska Native: 25-44 years    42
Female: American Indian or Alaska Native: 15-24 years    42
Female: Black or African American: 25-44 years           42
Female: Black or African American: 15-24 years           42
Female: White: 45-64 years                               42
Female: White: 25-44 years                               42
Female: White: 15-24 years                               42
Male: Asian or Pacific Islander: 45-64 years             42
Male: Asian or Pacific Islander: 25-44 years             42
Male: Asian or Pacific Islander: 15-24 years             42
Male: American Indian or Alaska Native: 25-44 years      42
Male: American Indian or Alaska Native: 15-24 years      42
Male: Black or African American: 75-84 years             42
Male: Black or African American: 65-74 years             42
Male: Black or African American: 25-44 years             42
Male: Black or African American: 15-24 years             42
Male: White: 75-84 years                                 42
Male: White: 65-74 years                                 42
Male: White: 45-64 years                                 42
Female: Asian or Pacific Islander: 45-64 years           42
Male: American Indian or Alaska Native: 45-64 years      41
Female: American Indian or Alaska Native: 45-64 years    41
Name: count, dtype: int64

Value counts for AGE:
AGE
15-24 years    336
25-44 years    336
45-64 years    336
65-74 years     84
75-84 years     84
Name: count, dtype: int64

[Figure: Pie Chart of AGE]
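
In the output above, INDICATOR, UNIT, and STUB_NAME each hold a single value, so their pie charts are one 100% slice, while AGE is the only column whose chart shows a real split. The following is a rough sketch, not part of the original notebook, of one way to pick pie-chart columns with a named cutoff and a lower bound on the number of categories. The names `MAX_PIE_CATEGORIES` and `columns_for_pie`, the lower bound of 2, and the `demo` frame are all assumptions made for illustration. The sketch also selects non-numeric columns with `exclude="number"` rather than `include=['object']`, which is a substitution and would additionally pick up boolean or datetime columns if any were present.

```python
import pandas as pd

MAX_PIE_CATEGORIES = 5  # assumed cutoff, mirroring the `<= 5` check in the cell above


def columns_for_pie(df, min_categories=2, max_categories=MAX_PIE_CATEGORIES):
    """Return non-numeric columns whose number of distinct values is in range."""
    categorical = df.select_dtypes(exclude="number")
    return [
        col for col in categorical.columns
        if min_categories <= categorical[col].nunique() <= max_categories
    ]


# Made-up frame with one constant, one small, and one large categorical column.
demo = pd.DataFrame({
    "INDICATOR": ["Death rates for suicide"] * 6,
    "AGE": ["15-24 years", "25-44 years", "45-64 years"] * 2,
    "STUB_LABEL": [f"group {i}" for i in range(6)],
    "ESTIMATE": [1.2, 5.4, 10.8, 20.7, 14.3, 61.9],
})

pie_columns = columns_for_pie(demo)

# Proportions per category: the same shares a pie chart would label.
age_shares = demo["AGE"].value_counts(normalize=True)

print(pie_columns)
print(age_shares)
```

Keeping the cutoff in one named constant means the comment, the prose description, and the condition cannot drift apart the way a hard-coded number can.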