t-SNE (t-distributed Stochastic Neighbor Embedding)

t-SNE maps high-dimensional data to a lower-dimensional (2D or 3D) space for visualization, while preserving significant structure in the original data.


Key Concepts

| Term | Definition |
| --- | --- |
| Similarity | The similarity of datapoint x_j to x_i is the conditional probability p(j\|i): x_i would pick x_j as its neighbor if neighbors were picked in proportion to their probability density under a Gaussian or Student's t-distribution centered at x_i. |
| Perplexity | A user-specified effective number of neighbors (a hyperparameter); usually set between 5 and 50; balances the local and global aspects of the data (see the sketch below for how it constrains each point's Gaussian bandwidth). |
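
To make the perplexity constraint concrete, here is a minimal Python sketch of how a single point's Gaussian bandwidth can be tuned by binary search so that the resulting conditional distribution p(j|i) hits a target perplexity. The function and variable names are illustrative, not taken from any particular library.

```python
import numpy as np

def conditional_probs(sq_dists, target_perplexity, tol=1e-5, max_iter=50):
    """sq_dists: squared Euclidean distances from point i to every
    other point (self excluded). Returns p(j|i) for all j != i."""
    beta, beta_lo, beta_hi = 1.0, 0.0, np.inf    # beta = 1 / (2 * sigma_i^2)
    target_entropy = np.log2(target_perplexity)  # Perp(P_i) = 2^H(P_i)
    for _ in range(max_iter):
        p = np.exp(-beta * sq_dists)             # Gaussian affinities
        p /= p.sum()
        entropy = -np.sum(p * np.log2(p + 1e-12))
        if abs(entropy - target_entropy) < tol:
            break
        if entropy > target_entropy:
            beta_lo = beta   # distribution too flat: shrink sigma_i
            beta = beta * 2.0 if np.isinf(beta_hi) else (beta + beta_hi) / 2.0
        else:
            beta_hi = beta   # distribution too peaked: grow sigma_i
            beta = (beta + beta_lo) / 2.0
    return p

# Example: distances from one point to four neighbors, perplexity 2
print(conditional_probs(np.array([1.0, 2.0, 4.0, 9.0]), target_perplexity=2))
```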

Implementation

| Step | Description |
| --- | --- |
| #1 | In the high-dimensional space, convert Euclidean distances into similarities under a Gaussian distribution, with each point's bandwidth constrained by the perplexity parameter. |
| #2 | In the low-dimensional space, compute the similarity between two points under a Student's t-distribution (df = 1) rather than a Gaussian, to avoid the "crowding problem". |
| #3 | To reduce the mismatch between the corresponding high- and low-dimensional conditional probabilities, minimize the sum of Kullback-Leibler (KL) divergences over all datapoints by gradient descent (see the sketch below); the cost function favors retaining nearby map points, preserving the local structure of the data. |
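
The following condensed sketch shows steps #2 and #3 for one gradient-descent update: Student's t (df = 1) similarities in the 2D map and the standard t-SNE gradient of the KL divergence. P is assumed to be the symmetrized similarity matrix produced by step #1; the names and the learning rate are illustrative.

```python
import numpy as np

def tsne_step(Y, P, learning_rate=200.0):
    """One gradient-descent update of the 2D embedding Y (n x 2),
    given the high-dimensional joint probabilities P (n x n)."""
    # Step #2: Student's t (df = 1) similarities in the low-dim map
    sq_dists = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    inv = 1.0 / (1.0 + sq_dists)
    np.fill_diagonal(inv, 0.0)          # q_ii is defined as 0
    Q = inv / inv.sum()
    # Step #3: gradient of KL(P || Q) with respect to y_i is
    # 4 * sum_j (p_ij - q_ij) (y_i - y_j) / (1 + ||y_i - y_j||^2)
    W = (P - Q) * inv
    grad = 4.0 * (np.diag(W.sum(axis=1)) - W) @ Y
    return Y - learning_rate * grad

# Usage: start from a small random layout and iterate
# Y = 1e-4 * np.random.randn(n, 2)
# for _ in range(1000): Y = tsne_step(Y, P)
```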

Example 1: Credit Card Transactions

Using a dataset of credit card transactions, I ran my own variation of an R implementation of t-SNE.

The dataset has 284,807 transactions (492 fraudulent and 284,315 legitimate) and 29 feature variables. Using perplexity = 29, 1,000 iterations, and a balanced subsample (492 fraudulent and 492 legitimate transactions), the t-SNE algorithm reduces the complicated high-dimensional relationships among the 29 feature variables to a 2D space:


Note: t-SNE preserves much of the distinction between the two classes using only the 29 feature variables, without access to the Class variable.
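
The original analysis was done in R; the sketch below is a rough Python equivalent using scikit-learn, assuming the Kaggle credit-card CSV with a binary Class column (1 = fraud). The file name and sampling details are assumptions for illustration, not the original code.

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

df = pd.read_csv("creditcard.csv")                   # assumed file name
fraud = df[df["Class"] == 1]                         # 492 fraud rows
legit = df[df["Class"] == 0].sample(n=len(fraud), random_state=42)
balanced = pd.concat([fraud, legit])                 # balanced subsample

X = balanced.drop(columns=["Class"]).values          # feature variables only
emb = TSNE(n_components=2, perplexity=29, n_iter=1000,
           random_state=42).fit_transform(X)

plt.scatter(emb[:, 0], emb[:, 1], c=balanced["Class"], s=6, cmap="coolwarm")
plt.show()
```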


Example 2: The Digits Dataset

Using the digits dataset, I ran my modified version of a t-SNE Python implementation. The results are as follows:
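
For reference, here is a minimal Python sketch of this experiment using scikit-learn's bundled digits dataset (1,797 8×8 images, 64 features); the modified script mentioned above may differ in its parameters, and the perplexity value here is an assumption.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

digits = load_digits()   # 1,797 samples, 64 pixel features, 10 classes
emb = TSNE(n_components=2, perplexity=30,
           random_state=42).fit_transform(digits.data)

plt.figure(figsize=(8, 6))
plt.scatter(emb[:, 0], emb[:, 1], c=digits.target, cmap="tab10", s=8)
plt.colorbar(label="digit")
plt.title("t-SNE embedding of the digits dataset")
plt.show()
```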


Comparison

| Algorithm | Implementation for visualization | Preserving structure in the original data |
| --- | --- | --- |
| PCA (principal component analysis) | 1. Identify the directional axis along which the high-dimensional data shows the highest variance (the 1st PC); 2. linearly project the high-dimensional dataset onto the lower-dimensional space spanned by the 1st PC and the orthogonal 2nd PC; 3. visualize the transformed data in 2D. | As a linear dimension-reduction technique, PCA is unable to preserve complex nonlinear relationships between features in the original data. |
| t-SNE | 1. Create a 2D embedding of the high-dimensional data; 2. place points in the 2D space using a probability distribution proportional to a similarity measure between pairs of points in the original data; 3. minimize the KL divergence between the probability distributions of the 2D embedding and the original data (see the sketch below). | Captures much of the local structure of the high-dimensional data, and also reveals global structure such as the presence of clusters at several scales. |
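
To see the contrast in practice, a short sketch running both algorithms on the same data; the dataset choice and parameters are illustrative.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)
pca_2d = PCA(n_components=2).fit_transform(X)    # linear: 1st and 2nd PCs
tsne_2d = TSNE(n_components=2, perplexity=30,
               random_state=0).fit_transform(X)  # nonlinear embedding

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))
ax1.scatter(pca_2d[:, 0], pca_2d[:, 1], c=y, cmap="tab10", s=8)
ax1.set_title("PCA (linear projection)")
ax2.scatter(tsne_2d[:, 0], tsne_2d[:, 1], c=y, cmap="tab10", s=8)
ax2.set_title("t-SNE (nonlinear embedding)")
plt.show()
```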
