In this post we introduce some similarity and dissimilarity measures. Most of the content comes from the Data Mining material by Vipin Kumar at the University of Minnesota.

Distances

  • Euclidean Distance
  • Minkowski Distance
  • Mahalanobis Distance
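The three distances above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation. The function names are our own. `cov` is assumed to be the covariance matrix of the data set, for example `np.cov(data, rowvar=False)`. The Mahalanobis version takes the square root, as SciPy does; some texts define it without the square root.

```python
import numpy as np

def euclidean(x, y):
    """sqrt(sum_k (x_k - y_k)^2)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.sqrt(np.sum((x - y) ** 2)))

def minkowski(x, y, r):
    """(sum_k |x_k - y_k|^r)^(1/r); r = 1 is city-block (Manhattan), r = 2 is Euclidean."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.sum(np.abs(x - y) ** r) ** (1.0 / r))

def mahalanobis(x, y, cov):
    """sqrt((x - y)^T S^{-1} (x - y)), where S is the covariance matrix of the data."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    # Solve S z = diff rather than inverting S explicitly.
    z = np.linalg.solve(np.asarray(cov, dtype=float), diff)
    return float(np.sqrt(diff @ z))

p, q = [0.0, 0.0], [3.0, 4.0]
print(euclidean(p, q))         # 5.0
print(minkowski(p, q, r=1))    # 7.0
# With variance 4 along the first axis and 1 along the second,
# the first-coordinate difference is down-weighted:
print(mahalanobis(p, q, [[4.0, 0.0], [0.0, 1.0]]))  # about 4.272
```

With an identity covariance matrix, the Mahalanobis distance reduces to the Euclidean distance.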

Common Properties of a Distance
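A distance d that satisfies the following three properties is called a metric. The first is positivity: d(x, y) ≥ 0, with d(x, y) = 0 only when x = y. The second is symmetry: d(x, y) = d(y, x). The third is the triangle inequality: d(x, z) ≤ d(x, y) + d(y, z). The sketch below spot-checks these properties on sample points. The helper `check_metric_properties` is hypothetical, and passing the check is evidence that a function is a metric, not a proof. It uses `math.dist`, which requires Python 3.8 or later.

```python
import itertools
import math
import random

def check_metric_properties(dist, points, tol=1e-12):
    """Hypothetical helper: spot-check positivity, symmetry and the triangle
    inequality of `dist` on the given sample points."""
    for x in points:
        if abs(dist(x, x)) > tol:  # d(x, x) must be 0
            return False
    for x, y in itertools.permutations(points, 2):
        d = dist(x, y)
        if d < 0 or (x != y and d == 0):  # positivity
            return False
        if abs(d - dist(y, x)) > tol:  # symmetry
            return False
    for x, y, z in itertools.permutations(points, 3):
        if dist(x, z) > dist(x, y) + dist(y, z) + tol:  # triangle inequality
            return False
    return True

random.seed(0)
pts = [tuple(random.uniform(-1, 1) for _ in range(3)) for _ in range(20)]
print(check_metric_properties(math.dist, pts))  # True: Euclidean distance passes
# Squared Euclidean distance violates the triangle inequality: d(0, 2) = 4 > 1 + 1.
print(check_metric_properties(lambda a, b: math.dist(a, b) ** 2, [(0,), (1,), (2,)]))  # False
```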

Similarity Between Binary Vectors
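Similarity between two binary vectors x and y is usually built from four counts: M01 (positions where x is 0 and y is 1), M10 (x is 1, y is 0), M00 (both 0) and M11 (both 1). The Simple Matching Coefficient is SMC = (M11 + M00) / (M01 + M10 + M11 + M00), the fraction of positions that agree. The function names below are our own, and the vectors are illustrative.

```python
def match_counts(x, y):
    """Return (M01, M10, M00, M11), where Mab counts positions with x == a and y == b."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    counts = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
    for a, b in zip(x, y):
        counts[(a, b)] += 1
    return counts[(0, 1)], counts[(1, 0)], counts[(0, 0)], counts[(1, 1)]

def smc(x, y):
    """Simple Matching Coefficient: (M11 + M00) / (M01 + M10 + M11 + M00)."""
    m01, m10, m00, m11 = match_counts(x, y)
    return (m11 + m00) / (m01 + m10 + m11 + m00)

x = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
y = [0, 0, 0, 0, 0, 0, 1, 0, 0, 1]
print(match_counts(x, y))  # (2, 1, 7, 0)
print(smc(x, y))           # 0.7
```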

SMC versus Jaccard

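Jaccard: computes the proportion of overlap, i.e., the number of positions where both vectors are 1 divided by the number of positions where at least one of them is 1. Unlike SMC, it ignores 0-0 matches.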
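The difference between SMC and Jaccard shows up on sparse data. In the sketch below, two vectors share no 1s at all. SMC still rates them as 70% similar because of the many shared 0s, while Jaccard gives 0. The function name is hypothetical. Returning 1.0 for two all-zero vectors, where Jaccard is undefined, is an assumed convention.

```python
def smc_and_jaccard(x, y):
    """Return (SMC, Jaccard) for two equal-length 0/1 vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    m11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    m00 = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    n = len(x)
    smc = (m11 + m00) / n
    # Jaccard drops the 0-0 matches: M11 / (M01 + M10 + M11).
    not_both_zero = n - m00
    jaccard = m11 / not_both_zero if not_both_zero else 1.0  # assumed convention
    return smc, jaccard

x = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
y = [0, 0, 0, 0, 0, 0, 1, 0, 0, 1]
print(smc_and_jaccard(x, y))                        # (0.7, 0.0)
print(smc_and_jaccard([1, 1, 0, 1], [1, 1, 1, 1]))  # (0.75, 0.75)
```

For dense vectors without shared 0s the two measures coincide, as the second call shows.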

Others

Cosine Similarity: the cosine of the angle between two non-zero vectors of an inner product space, cos(x, y) = (x · y) / (‖x‖ ‖y‖). It depends only on the directions of the vectors, not on their magnitudes.
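A direct translation of the formula is shown below, using the dot product divided by the product of the Euclidean norms. The function name and the example vectors are illustrative. The second call shows that scaling a vector does not change the result.

```python
import math

def cosine_similarity(x, y):
    """cos(x, y) = (x . y) / (||x|| * ||y||) for two non-zero vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_x * norm_y)

d1 = [3, 2, 0, 5, 0, 0, 0, 2, 0, 0]
d2 = [1, 0, 0, 0, 0, 0, 0, 1, 0, 2]
print(round(cosine_similarity(d1, d2), 3))  # 0.315
print(cosine_similarity([1, 2], [2, 4]))    # 1 up to rounding: same direction
```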

Correlation (linear relationship)
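Pearson correlation is the covariance of two variables divided by the product of their standard deviations. The 1/(n - 1) factors cancel, so the sketch below works with centered sums directly. The function name is our own. The last example shows that correlation only captures linear relationships: y = x² is fully determined by x, yet on inputs symmetric around 0 its correlation with x is 0.

```python
import math

def pearson_correlation(x, y):
    """cov(x, y) / (std(x) * std(y)); the 1/(n - 1) factors cancel."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    sx = math.sqrt(sum(a * a for a in dx))
    sy = math.sqrt(sum(b * b for b in dy))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)

print(pearson_correlation([1, 2, 3, 4], [2, 4, 6, 8]))  # 1 up to rounding
print(pearson_correlation([1, 2, 3, 4], [8, 6, 4, 2]))  # -1 up to rounding
xs = [-3, -2, -1, 0, 1, 2, 3]
print(pearson_correlation(xs, [v * v for v in xs]))     # 0.0
```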

Information-based metrics

Mutual Information
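For two discrete variables, mutual information is I(X; Y) = Σ p(a, b) log[p(a, b) / (p(a) p(b))], summed over all value pairs (a, b). It is 0 exactly when X and Y are independent. The sketch below estimates it from paired samples by plugging in observed frequencies. The function name and the default base-2 logarithm (result in bits) are our own choices.

```python
import math
from collections import Counter

def mutual_information(x, y, base=2):
    """Plug-in estimate of I(X;Y) from paired samples of two discrete variables."""
    n = len(x)
    if n != len(y) or n == 0:
        raise ValueError("need two non-empty sequences of equal length")
    joint = Counter(zip(x, y))  # counts of each (a, b) pair
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        mi += p_ab * math.log(p_ab / ((px[a] / n) * (py[b] / n)), base)
    return mi

print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))  # 1.0 bit: y determines x
print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))  # 0.0: independent
```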

Maximal Information Coefficient
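The Maximal Information Coefficient (Reshef et al., 2011) overlays grids on the scatter plot of (x, y). For each grid size kx × ky below a bound B(n) = n^0.6, it takes the highest mutual information over all grid placements, normalizes it by log(min(kx, ky)), and reports the maximum. The sketch below is a deliberately simplified approximation, not the real MIC algorithm. It tries only one equal-frequency grid per size instead of searching over all placements, so it typically reports a lower value. All names here are hypothetical. For real work, a dedicated implementation such as the minepy package is the safer choice.

```python
import math
from collections import Counter

def equal_frequency_bins(values, k):
    """Map each value to one of k bins with roughly equal counts; tied values share a bin."""
    n = len(values)
    first_rank = {}
    for rank, v in enumerate(sorted(values)):
        first_rank.setdefault(v, rank)
    return [first_rank[v] * k // n for v in values]

def mutual_information_nats(a, b):
    """Plug-in estimate of I(A;B) in nats from paired discrete labels."""
    n = len(a)
    joint, pa, pb = Counter(zip(a, b)), Counter(a), Counter(b)
    return sum((c / n) * math.log(c * n / (pa[i] * pb[j]))
               for (i, j), c in joint.items())

def mic_equal_frequency(x, y, alpha=0.6):
    """Hypothetical rough MIC approximation: one equal-frequency grid per size
    kx * ky < n**alpha, MI normalized by log(min(kx, ky)), maximum returned."""
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have the same length")
    limit = n ** alpha  # grid-size bound B(n) = n**alpha
    if 2 * 2 >= limit:
        raise ValueError("too few points: even a 2x2 grid exceeds n**alpha")
    best = 0.0
    for kx in range(2, int(limit) + 1):
        bx = equal_frequency_bins(x, kx)
        for ky in range(2, int(limit) + 1):
            if kx * ky >= limit:
                break  # larger ky only makes the grid bigger
            by = equal_frequency_bins(y, ky)
            score = mutual_information_nats(bx, by) / math.log(min(kx, ky))
            best = max(best, score)
    return best

xs = list(range(100))
print(mic_equal_frequency(xs, xs))  # 1 up to rounding for a perfect relationship
```

Because mutual information on a kx × ky grid can never exceed log(min(kx, ky)), each normalized score lies between 0 and 1.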