UMAP (Uniform Manifold Approximation and Projection)
UMAP projects higher-dimensional data onto a lower-dimensional space (2D or 3D) for visualization, while preserving much of the global structure.
Note: Compared to t-SNE, UMAP "arguably preserves more of the global structure with superior run time performance".
1. Credit fraud data (my version of R code)
The dataset has 284,807 transactions (492 fraudulent and 284,315 legitimate) and 29 feature variables. Using a balanced subset (all 492 fraudulent transactions plus 492 legitimate transactions), UMAP projects the higher-dimensional relationships among the 29 feature variables onto a 2D space:
Note: UMAP keeps the two classes largely separated using only the 29 feature variables; the Class variable is never given to the algorithm.
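The original example was written in R; the Python sketch below shows one way the same two steps could look, and is not the code used for the result above. The helper names `undersample_balanced` and `embed_2d` are hypothetical, the majority class is assumed to be sampled at random, and the features are assumed to be standardized before projection. It relies on numpy, scikit-learn and the umap-learn package.

```python
import numpy as np


def undersample_balanced(X, y, seed=0):
    """Keep an equal number of rows per class, sized to the smallest class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_keep = counts.min()
    # For the smallest class this selects every row; larger classes are sampled.
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_keep, replace=False)
        for c in classes
    ])
    keep.sort()  # restore the original row order
    return X[keep], y[keep]


def embed_2d(X, seed=0):
    """Standardize the features, then project them to 2D with UMAP.

    The class labels are deliberately not an argument: the projection is
    unsupervised, and labels are only used afterwards to color the plot.
    """
    import umap  # installed as the umap-learn package
    from sklearn.preprocessing import StandardScaler

    X_scaled = StandardScaler().fit_transform(X)
    return umap.UMAP(n_components=2, random_state=seed).fit_transform(X_scaled)
```

With a feature matrix `X` and a 0/1 label vector `y`, `X_bal, y_bal = undersample_balanced(X, y)` followed by `embed_2d(X_bal)` should give one 2D point per retained transaction, which can then be plotted and colored by `y_bal`.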
2. Penguin data (Python code)
The input features (the higher dimensions) are:
- culmen (bill) length (mm)
- culmen (bill) depth (mm)
- flipper length (mm)
- body mass (g)
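Because these four measurements are on different scales (millimetres versus grams), it is common to standardize them before running UMAP so that body mass does not dominate the distances. The sketch below assumes a pandas DataFrame whose column names are `culmen_length_mm`, `culmen_depth_mm`, `flipper_length_mm` and `body_mass_g`; those names, and the helpers `prepare_features` and `embed_penguins`, are assumptions to adjust to your copy of the data.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Assumed column names; rename to match the file you are working with.
FEATURES = [
    "culmen_length_mm",
    "culmen_depth_mm",
    "flipper_length_mm",
    "body_mass_g",
]


def prepare_features(df, columns=FEATURES):
    """Drop rows with missing measurements and scale each column.

    Returns the filtered DataFrame (so other columns stay aligned with the
    embedding) and the standardized feature matrix.
    """
    complete = df.dropna(subset=columns)
    scaled = StandardScaler().fit_transform(complete[columns].to_numpy(dtype=float))
    return complete, scaled


def embed_penguins(scaled, seed=0):
    """Project the standardized measurements to 2D with UMAP."""
    import umap  # installed as the umap-learn package

    return umap.UMAP(n_components=2, random_state=seed).fit_transform(scaled)
```

Returning the filtered DataFrame alongside the scaled matrix keeps any label column (for example, species) row-aligned with the embedding for coloring the scatter plot.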
3. Digits data (Python code)
Each sample is an 8x8 image of integer pixel values in the range 0..16, so the higher-dimensional space has 64 dimensions (one per pixel).
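To make the shape of the input concrete, the sketch below assumes the data is the digits set bundled with scikit-learn (`sklearn.datasets.load_digits`) and flattens each 8x8 image into a 64-element vector before handing it to UMAP. The helper name `embed_digits` is hypothetical.

```python
from sklearn.datasets import load_digits

digits = load_digits()

# Flatten each image: (n_samples, 8, 8) -> (n_samples, 64).
X = digits.images.reshape(len(digits.images), -1)
y = digits.target  # digit labels 0-9, used only to color the plot


def embed_digits(X, seed=0):
    """Project the 64-dimensional pixel vectors to 2D with UMAP."""
    import umap  # installed as the umap-learn package

    return umap.UMAP(n_components=2, random_state=seed).fit_transform(X)
```

All pixels share the same 0..16 scale, so this sketch skips the standardization step; whether scaling helps here is a judgment call rather than a requirement.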