1. Create Dummy Data for Classification
2. Classify Data
3. Breakdown of Metrics Included in Classification Report
4. List of Other Classification Metrics Available in sklearn.metrics

1. Create Dummy Data for Classification

In [1]:
import seaborn as sns 
import matplotlib.pyplot as plt
%matplotlib inline

from sklearn.datasets import make_blobs

data, labels = make_blobs(n_samples=100, n_features=2, centers=2,
                          cluster_std=4, random_state=2)

plt.scatter(data[:, 0], data[:, 1], c=labels, cmap='coolwarm');

2. Classify Data

In [2]:
# Import LinearSVC
from sklearn.svm import LinearSVC

# Create an instance of the linear support vector classifier
svc = LinearSVC()

# Fit the estimator on the first 70% of the data
svc.fit(data[:70], labels[:70])

# Predict the final 30%
y_pred = svc.predict(data[70:])

# Establish the true y values for the same 30%
y_true = labels[70:]
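
Slicing the first 70 rows works here because make_blobs shuffles its samples by default. An alternative is train_test_split from sklearn.model_selection. The sketch below is illustrative only: test_size=0.3, stratify=labels and random_state=0 are assumptions, not settings from this post, so the resulting scores will generally differ from the ones printed in the following sections.

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Same dummy data as in section 1
data, labels = make_blobs(n_samples=100, n_features=2, centers=2,
                          cluster_std=4, random_state=2)

# Assumed settings: hold out 30% and keep the class balance in both parts
X_train, X_test, y_train, y_test = train_test_split(
    data, labels, test_size=0.3, stratify=labels, random_state=0)

svc_split = LinearSVC()
svc_split.fit(X_train, y_train)
y_pred_split = svc_split.predict(X_test)

print(X_train.shape, X_test.shape)
```

With stratify=labels, both classes keep the same proportion in the training and test parts, which keeps the per-class support in the report balanced.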

3. Breakdown of Metrics Included in Classification Report

Precision Score

TP – True Positives
FP – False Positives

Precision – Accuracy of positive predictions: the fraction of samples predicted positive that really are positive.
Precision = TP/(TP + FP)

In [3]:
from sklearn.metrics import precision_score

print("Precision score: {}".format(precision_score(y_true,y_pred)))
Precision score: 0.9285714285714286
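
The formula can be checked by hand. The sketch below is a minimal illustration, assuming stand-in label lists (not the notebook's actual arrays) built to match the counts in the confusion matrix printed later in this post: 14 true negatives, 1 false positive, 2 false negatives and 13 true positives.

```python
# Stand-in labels (an assumption) that reproduce the post's confusion matrix
y_true = [0] * 15 + [1] * 15
y_pred = [0] * 14 + [1] * 1 + [0] * 2 + [1] * 13

# Count true positives and false positives for the positive class (label 1)
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)

# Precision = TP / (TP + FP)
precision = tp / (tp + fp)
print("Manual precision: {}".format(precision))
```

With these counts the result is 13/14, which agrees with the value returned by precision_score above.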

Recall Score

FN – False Negatives

Recall (aka sensitivity or true positive rate) – Fraction of actual positives that were correctly identified.
Recall = TP/(TP+FN)

In [4]:
from sklearn.metrics import recall_score

print("Recall score: {}".format(recall_score(y_true,y_pred)))
Recall score: 0.8666666666666667
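
Recall can be reproduced the same way with NumPy boolean masks. As a hedged sketch, the arrays below are stand-ins (an assumption, not the notebook's data) chosen to match the confusion matrix printed later in this post.

```python
import numpy as np

# Stand-in labels (an assumption) that reproduce the post's confusion matrix
y_true = np.array([0] * 15 + [1] * 15)
y_pred = np.array([0] * 14 + [1] * 1 + [0] * 2 + [1] * 13)

# True positives: actual 1, predicted 1. False negatives: actual 1, predicted 0.
tp = np.sum((y_true == 1) & (y_pred == 1))
fn = np.sum((y_true == 1) & (y_pred == 0))

# Recall = TP / (TP + FN)
recall = tp / (tp + fn)
print("Manual recall: {}".format(recall))
```

Here the result is 13/15, in line with the recall_score output above.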

F1 Score

F1 Score (aka F-Score or F-Measure) – A helpful metric for comparing two classifiers. F1 Score takes into account both precision and recall: it is the harmonic mean of the two, so it is high only when both are high.

F1 = 2 x (precision x recall)/(precision + recall)

In [5]:
from sklearn.metrics import f1_score

print("F1 Score: {}".format(f1_score(y_true,y_pred)))
F1 Score: 0.896551724137931
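
To see the harmonic mean at work, the sketch below plugs in the precision (13/14) and recall (13/15) implied by this post's results and compares three equivalent ways of writing F1. The second pair of numbers (precision 1.0, recall 0.1) is a made-up example, included only to show how the harmonic mean punishes an imbalance that an arithmetic mean would hide.

```python
from statistics import harmonic_mean

# Precision and recall implied by the post's counts (TP=13, FP=1, FN=2)
precision = 13 / 14
recall = 13 / 15

# Three equivalent forms of the F1 score
f1_formula = 2 * (precision * recall) / (precision + recall)
f1_harmonic = harmonic_mean([precision, recall])
f1_counts = 2 * 13 / (2 * 13 + 1 + 2)  # 2TP / (2TP + FP + FN)
print(f1_formula, f1_harmonic, f1_counts)

# Hypothetical lopsided classifier: perfect precision, poor recall
lopsided_f1 = harmonic_mean([1.0, 0.1])
lopsided_arithmetic = (1.0 + 0.1) / 2
print(lopsided_f1, lopsided_arithmetic)
```

All three forms give 26/29 for this post's classifier, while the lopsided example scores far below its arithmetic mean of 0.55.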

Classification Report

A text report that lists precision, recall, F1 score and support (the number of true samples) for each class.

In [6]:
from sklearn.metrics import classification_report

print(classification_report(y_true,y_pred))
             precision    recall  f1-score   support

          0       0.88      0.93      0.90        15
          1       0.93      0.87      0.90        15

avg / total       0.90      0.90      0.90        30
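
The numbers behind the report can also be retrieved as arrays with precision_recall_fscore_support. The sketch below uses stand-in labels (an assumption) that match the confusion matrix in this post. The "avg / total" row above appears to be the support-weighted average of the per-class scores, which average="weighted" reproduces. Newer scikit-learn releases label the summary rows differently (for example "macro avg" and "weighted avg").

```python
from sklearn.metrics import precision_recall_fscore_support

# Stand-in labels (an assumption) that reproduce the post's confusion matrix
y_true = [0] * 15 + [1] * 15
y_pred = [0] * 14 + [1] * 1 + [0] * 2 + [1] * 13

# Default average=None returns one value per class, in label order
precision, recall, f1, support = precision_recall_fscore_support(y_true, y_pred)
for label in (0, 1):
    print("class {}: precision={:.2f} recall={:.2f} f1={:.2f} support={}".format(
        label, precision[label], recall[label], f1[label], support[label]))

# average="weighted" weights each class's score by its support
w_precision, w_recall, w_f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted")
print("weighted: precision={:.2f} recall={:.2f} f1={:.2f}".format(
    w_precision, w_recall, w_f1))
```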

Confusion Matrix

The confusion matrix counts how many samples of each true class (rows) were assigned to each predicted class (columns). It shows which kinds of errors the classifier makes and lets you perform any further calculations as desired.

In [7]:
from sklearn.metrics import confusion_matrix
import pandas as pd

confusion_df = pd.DataFrame(confusion_matrix(y_true,y_pred),
             columns=["Predicted Class " + str(class_name) for class_name in [0,1]],
             index = ["Class " + str(class_name) for class_name in [0,1]])

print(confusion_df)
         Predicted Class 0  Predicted Class 1
Class 0                 14                  1
Class 1                  2                 13
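
For a binary problem, the four cells can be unpacked directly and reused for further calculations. A minimal sketch, assuming stand-in labels that reproduce the matrix above (they are not the notebook's actual arrays):

```python
from sklearn.metrics import confusion_matrix

# Stand-in labels (an assumption) that reproduce the matrix printed above
y_true = [0] * 15 + [1] * 15
y_pred = [0] * 14 + [1] * 1 + [0] * 2 + [1] * 13

# ravel() flattens the 2x2 matrix row by row: TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = tp / (tp + fp)
recall = tp / (tp + fn)
accuracy = (tp + tn) / (tp + tn + fp + fn)
print("TN={} FP={} FN={} TP={}".format(tn, fp, fn, tp))
print("precision={:.4f} recall={:.4f} accuracy={:.4f}".format(
    precision, recall, accuracy))
```

The precision and recall computed this way agree with the scores in section 3, since they come from the same four counts.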

4. List of Other Classification Metrics Available in sklearn.metrics

  • accuracy_score
  • auc
  • average_precision_score
  • brier_score_loss
  • cohen_kappa_score
  • dcg_score
  • fbeta_score
  • hamming_loss
  • hinge_loss
  • jaccard_similarity_score (deprecated in later scikit-learn releases in favor of jaccard_score)
  • log_loss
  • matthews_corrcoef
  • ndcg_score
  • precision_recall_curve
  • precision_recall_fscore_support
  • roc_auc_score
  • roc_curve
  • zero_one_loss
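
Several of these functions take the same (y_true, y_pred) arguments as the scores in section 3. The sketch below tries a few of them on stand-in labels (an assumption) that match this post's confusion matrix. The choice of beta=2 for fbeta_score is arbitrary and only illustrates weighting recall more heavily than precision.

```python
from sklearn.metrics import (accuracy_score, cohen_kappa_score, fbeta_score,
                             matthews_corrcoef, zero_one_loss)

# Stand-in labels (an assumption) that reproduce the post's confusion matrix
y_true = [0] * 15 + [1] * 15
y_pred = [0] * 14 + [1] * 1 + [0] * 2 + [1] * 13

accuracy = accuracy_score(y_true, y_pred)    # fraction classified correctly
error = zero_one_loss(y_true, y_pred)        # fraction misclassified
mcc = matthews_corrcoef(y_true, y_pred)      # correlation-style score in [-1, 1]
kappa = cohen_kappa_score(y_true, y_pred)    # agreement corrected for chance
f2 = fbeta_score(y_true, y_pred, beta=2)     # F-score weighted toward recall

for name, value in [("accuracy", accuracy), ("zero_one_loss", error),
                    ("matthews_corrcoef", mcc), ("cohen_kappa", kappa),
                    ("fbeta (beta=2)", f2)]:
    print("{}: {:.4f}".format(name, value))
```

Other entries in the list, such as roc_curve, roc_auc_score, precision_recall_curve, average_precision_score and hinge_loss, expect continuous scores (for example the output of svc.decision_function) rather than hard class predictions, while log_loss and brier_score_loss expect probabilities.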

sklearn.metrics also offers regression metrics, model selection scorers, multilabel ranking metrics, clustering metrics, biclustering metrics, and pairwise metrics.

Categories: scikit-learn
