# We need pandas for data manipulation, sklearn for machine learning algorithms and metrics.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
import warnings
# Suppress runtime warnings
warnings.filterwarnings("ignore", category=RuntimeWarning)
# We use pandas to load the CSV file into a DataFrame.
df = pd.read_csv('crime_data.csv')
# We convert dates to datetime format and then to Unix timestamp (a numerical format).
# We also convert categorical variables to numeric ones using LabelEncoder.
# We drop rows with missing values as they can cause issues with many machine learning algorithms.
df['Date Rptd'] = pd.to_datetime(df['Date Rptd'])
df['DATE OCC'] = pd.to_datetime(df['DATE OCC'])
df['Date Rptd'] = df['Date Rptd'].apply(lambda x: x.timestamp())
df['DATE OCC'] = df['DATE OCC'].apply(lambda x: x.timestamp())
le = LabelEncoder()
df['Crm Cd Desc'] = le.fit_transform(df['Crm Cd Desc'].astype(str))
for col in df.columns:
    if df[col].dtype == 'object' and col != 'Crm Cd Desc':
        df[col] = LabelEncoder().fit_transform(df[col].astype(str))
df = df.dropna()
# We split the data into features (X) and target (y), and then into training set and test set.
X = df.drop('Crm Cd Desc', axis=1)
y = df['Crm Cd Desc']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# We initialize the RandomForestClassifier and train it on the training set.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
# We make predictions on the test set and print a classification report to evaluate the model's performance.
y_pred = model.predict(X_test)
# Create a dictionary that maps labels to descriptions
label_to_desc = dict(zip(le.transform(le.classes_), le.classes_))
# Replace labels with descriptions in y_test and y_pred
y_test_desc = [label_to_desc[label] for label in y_test]
y_pred_desc = [label_to_desc[label] for label in y_pred]
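# Note: since `le` was fitted on 'Crm Cd Desc' above, le.inverse_transform gives the
# same result as the dictionary lookup; the lines below are an equivalent alternative
# (shown commented out so only one version runs):
# y_test_desc = list(le.inverse_transform(y_test))
# y_pred_desc = list(le.inverse_transform(y_pred))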
# Print a classification report with actual descriptions
print(classification_report(y_test_desc, y_pred_desc, zero_division=1))
                                                          precision    recall  f1-score   support

          ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT       1.00      0.00      0.00         1
                                 BATTERY - SIMPLE ASSAULT       1.00      1.00      1.00         1
                                          BRANDISH WEAPON       1.00      1.00      1.00         4
                    INTIMATE PARTNER - AGGRAVATED ASSAULT       1.00      1.00      1.00         1
                                               KIDNAPPING       1.00      0.00      0.00         1
                                          ORAL COPULATION       1.00      0.00      0.00         2
                                                  ROBBERY       0.50      1.00      0.67         1
                      SEXUAL PENETRATION W/FOREIGN OBJECT       0.00      1.00      0.00         0
  SODOMY/SEXUAL CONTACT B/W PENIS OF ONE PERS TO ANUS OTH       1.00      0.00      0.00         1

                                                 accuracy                           0.58        12
                                                macro avg       0.83      0.56      0.41        12
                                             weighted avg       0.96      0.58      0.56        12
Here’s what each term means:
Precision: This is the ability of the classifier not to label a negative sample as positive. It’s calculated as the number of true positives (TP) divided by the number of true positives plus the number of false positives (FP).
Recall: This is the ability of the classifier to find all the positive samples. It can be calculated as the number of true positives (TP) over the number of true positives plus the number of false negatives (FN).
F1-Score: This is the harmonic mean of precision and recall, calculated as 2 × (precision × recall) / (precision + recall). The best possible F1-score is 1.0 and the worst is 0.0. The F1-score is a good way to summarize the evaluation of the model into a single number.
Support: This is the number of samples of the true response that lie in that class.
The ‘macro avg’ row calculates the metric for each class and then takes the average (hence treating all classes equally), whereas the ‘weighted avg’ row calculates the metric for each class and takes the average weighted by the number of true instances for each class.
The ‘accuracy’ row is the ratio of correct predictions to total predictions.
In your case, the model has an accuracy of 0.58, which means it made correct predictions for 58% of the input samples.
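If you want to check where these numbers come from, here is a minimal sketch that recomputes them with scikit-learn's individual metric functions, reusing y_test_desc and y_pred_desc from the code above (zero_division=1 mirrors the classification_report call):

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
# Overall accuracy: correct predictions / total predictions
print(accuracy_score(y_test_desc, y_pred_desc))
# Macro averages: unweighted mean of the per-class scores
print(precision_score(y_test_desc, y_pred_desc, average='macro', zero_division=1))
print(recall_score(y_test_desc, y_pred_desc, average='macro', zero_division=1))
print(f1_score(y_test_desc, y_pred_desc, average='macro', zero_division=1))
# Weighted average: per-class scores weighted by support
print(f1_score(y_test_desc, y_pred_desc, average='weighted', zero_division=1))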
The classification report you have here provides a detailed breakdown of your model's performance for each class in your dataset. Here's what we can interpret from the results:
Precision: This is the ratio of true positives to the sum of true positives and false positives. A high precision indicates a low false positive rate. For example, the model has a precision of 1.00 for 'BATTERY - SIMPLE ASSAULT', which means every instance it labeled as this class actually belonged to it; no instances of other classes were mistakenly assigned this label.
Recall: This is the ratio of true positives to the sum of true positives and false negatives. A high recall indicates a low false negative rate. For example, the model has a recall of 1.00 for 'BRANDISH WEAPON', which means it correctly identified all instances of this class and did not miss any.
F1-score: This is the harmonic mean of precision and recall, and it tries to balance the two. An F1 score of 1 is perfect, and 0 means that either the precision or the recall is zero. For example, the model has an F1 score of 1.00 for 'INTIMATE PARTNER - AGGRAVATED ASSAULT', indicating a good balance of precision and recall for this class.
Support: This is the number of actual occurrences of the class in the test data set. For example, there is only 1 instance of 'ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT' in your test data.
Accuracy: This is the ratio of correct predictions to total predictions. The overall accuracy of your model is 0.58, which means it made correct predictions for 58% of the instances in the test data.
Macro Avg: This is the unweighted mean of each metric across all labels, treating every class equally.
Weighted Avg: This is the mean of each metric across all labels, weighted by each label's support.
From the results, it seems like the model is performing well for some classes ('BATTERY - SIMPLE ASSAULT', 'BRANDISH WEAPON', 'INTIMATE PARTNER - AGGRAVATED ASSAULT'), but not for others ('ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT', 'KIDNAPPING', 'ORAL COPULATION'). This could be due to a variety of factors, such as class imbalance in your training data, or it could be that your model needs further tuning or a different approach.
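As a hedged starting point for the imbalance issue, here is a sketch (balanced_model is just an illustrative name, and whether this helps depends on how much data each class really has): inspect the class counts, then retrain with balanced class weights so rare classes are not drowned out during training.

# How many rows does each crime description actually have?
print(df['Crm Cd Desc'].map(label_to_desc).value_counts())
# class_weight='balanced' weights samples inversely to class frequency, so rare
# classes count more during training. (If every class had at least two rows, passing
# stratify=y to train_test_split would also keep the class mix similar across splits.)
balanced_model = RandomForestClassifier(n_estimators=100, random_state=42,
                                        class_weight='balanced')
balanced_model.fit(X_train, y_train)
print(classification_report(y_test, balanced_model.predict(X_test), zero_division=1))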