A *confusion matrix* tabulates a classifier's predictions against the true
labels; this post considers the binary case. It is an important starting
point for understanding how well a binary classifier is performing, and it
gives rise to a range of metrics that can be analysed and compared.

Here, I present an intuitive visualization, given that the definitions often get confusing.

Before we go ahead and read the visualization, let us remember the definitions.

*True Negatives* - All samples that were identified as negative labels and were truly negative

*False Negatives* - All samples that were identified as negative labels and were in fact positive

*True Positives* - All samples that were identified as positive labels and were truly positive

*False Positives* - All samples that were identified as positive labels and were in fact negative
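As a minimal sketch, the four counts can be tallied directly from a list of true labels and a list of predicted labels. The function name `confusion_counts`, the encoding of positive as `1` and negative as `0`, and the toy data are assumptions made for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels, assuming 1 = positive and 0 = negative."""
    tn = fp = fn = tp = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            tp += 1  # identified as positive, truly positive
        elif predicted == 1 and actual == 0:
            fp += 1  # identified as positive, in fact negative
        elif predicted == 0 and actual == 1:
            fn += 1  # identified as negative, in fact positive
        else:
            tn += 1  # identified as negative, truly negative
    return tn, fp, fn, tp


# Hypothetical toy data: eight samples, four of each class.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_counts(y_true, y_pred)
print(tn, fp, fn, tp)  # 3 1 1 3
```

The four counts always add up to the number of samples, which is a quick sanity check on any confusion matrix.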

Now, each ray in the visualization above is labelled with the name of the metric it measures. The start point of a ray marks the numerator of that metric, and the span of the ray marks the adjacent terms that are summed to form the denominator. Note that each metric is essentially a fraction.

Let us read the most popular ones from the visualization.

\[ \text{Recall} = \frac{TP}{TP+FN} \]

\[ \text{Precision} = \frac{TP}{TP+FP} \]
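The two fractions translate directly into code. This is a small sketch rather than a reference implementation: the function names and the example counts are made up, and returning `0.0` when a denominator is zero is an assumed convention (other tools handle that case differently).

```python
def recall(tp, fn):
    """TP / (TP + FN); returns 0.0 when there are no actual positives (assumed convention)."""
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0


def precision(tp, fp):
    """TP / (TP + FP); returns 0.0 when nothing was predicted positive (assumed convention)."""
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0


# Hypothetical counts: 3 true positives, 1 false positive, 2 false negatives.
tp, fp, fn = 3, 1, 2
r = recall(tp, fn)     # 3 / (3 + 2)
p = precision(tp, fp)  # 3 / (3 + 1)
print(r, p)  # 0.6 0.75
```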

These metrics come in handy when choosing the threshold that best separates the positive class from the negative class in a binary classification problem.

For instance, a popular trade-off is the *precision-recall trade-off*, which
is shown in the graph below. Precision tends to be the more wriggly of the
two: recall can only fall as the threshold rises, whereas precision can move
in either direction.
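The trade-off can be reproduced numerically by sweeping a threshold over predicted scores. The sketch below assumes that a sample is labelled positive when its score is greater than or equal to the threshold; the scores, the thresholds, and the choice to report a precision of `1.0` when nothing is predicted positive are illustrative assumptions.

```python
def precision_recall_at(y_true, scores, threshold):
    """Return (precision, recall) when samples with score >= threshold are predicted positive."""
    tp = fp = fn = 0
    for actual, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        elif predicted == 0 and actual == 1:
            fn += 1
    # Assumed conventions for empty denominators.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 1.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


# Hypothetical classifier scores, with 1 = positive and 0 = negative.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]

thresholds = [0.2, 0.5, 0.75, 0.85]
curve = [precision_recall_at(y_true, scores, t) for t in thresholds]
for t, (p, r) in zip(thresholds, curve):
    print(f"threshold={t:.2f}  precision={p:.3f}  recall={r:.3f}")
```

On this toy data recall goes 1.0, 0.75, 0.25, 0.25 as the threshold rises, while precision goes up, down, and up again, which is the wriggly behaviour described above.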

More simply, we might plot a *Precision v/s Recall Curve*. This curve
shows that there is still scope for improvement towards the right, where
precision dips suddenly as recall increases.

Another popular choice is the ROC curve, which plots the
*True Positive Rate* against the *False Positive Rate*. It can also be read
as *Sensitivity v/s 1-Specificity*. The closer this curve is
to the top-left corner, the better the classifier. Conversely,
the closer the curve is to the diagonal center line, the closer the
classifier is to random guessing.
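A rough sketch of how the points of such a curve can be computed by hand: sweep the threshold over every distinct score from high to low and record the (False Positive Rate, True Positive Rate) pair at each step. The function names, the toy data, and the trapezoidal area calculation are illustrative assumptions, and the code assumes the data contains at least one positive and one negative sample.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) pairs from sweeping the threshold over each distinct score, high to low."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for a, s in zip(y_true, scores) if s >= threshold and a == 1)
        fp = sum(1 for a, s in zip(y_true, scores) if s >= threshold and a == 0)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under a list of (x, y) points ordered by increasing x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical classifier scores, with 1 = positive and 0 = negative.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]

points = roc_points(y_true, scores)
auc = area_under_curve(points)
print(points[0], points[-1], auc)  # (0.0, 0.0) (1.0, 1.0) 0.75
```

An area of 0.5 corresponds to the diagonal, and an area of 1.0 to a curve that passes through the top-left corner, so the 0.75 here sits between a random and a perfect classifier.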

Which of these is most useful depends on the data at hand, but these curves should be a good starting point for analysing the first binary classifier that one builds.
