The analysis of high-dimensional data often relies on two-dimensional visualizations, commonly generated with t-distributed stochastic neighbor embedding (t-SNE). For large data sets, however, these visualizations can be suboptimal because hyperparameters such as the perplexity are set to values that do not suit the data size. Increasing these parameters makes the computation considerably more expensive. In this article, we propose a sampling-based embedding approach that addresses both problems. We show why the hyperparameters must be chosen with respect to the sampling rate and the desired final embedding, and how the approach both speeds up computation and improves the quality of the embeddings.
Enhancing perplexity tuning and computing sampling-based t-SNE embeddings
by instadatahelp | Aug 31, 2023 | AI Blogs
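
To make the link between sampling rate and perplexity concrete, here is a minimal Python sketch. It is not the method from the article: the helper names `sample_indices` and `scaled_perplexity` are hypothetical, and the linear rule (perplexity on the sample = target perplexity times sampling rate) is an assumption chosen for illustration. The intuition is only that a point's neighborhood should cover roughly the same fraction of the data before and after sampling.

```python
import random


def sample_indices(n_points, sampling_rate, seed=0):
    """Draw a uniform random subsample of row indices without replacement."""
    if not 0.0 < sampling_rate <= 1.0:
        raise ValueError("sampling_rate must be in (0, 1]")
    # Keep at least two points so an embedding is still meaningful.
    n_sample = min(n_points, max(2, round(n_points * sampling_rate)))
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_points), n_sample))


def scaled_perplexity(full_perplexity, sampling_rate, n_sample, floor=2.0):
    """Scale a perplexity chosen for the full data down to the sample.

    ASSUMPTION (illustrative, not taken from the article): perplexity
    shrinks linearly with the sampling rate.
    """
    perplexity = max(floor, full_perplexity * sampling_rate)
    # Perplexity has to stay below the number of sampled points.
    return min(perplexity, float(n_sample - 1))


# Example: embed a 25% sample of 1,000 points, aiming for the neighborhood
# size that a perplexity of 200 would give on the full data set.
idx = sample_indices(1000, 0.25)
perp = scaled_perplexity(200.0, 0.25, len(idx))
print(len(idx), perp)
```

The resulting value could then be passed as the `perplexity` argument of a t-SNE implementation, for example scikit-learn's `TSNE`, when embedding only the sampled rows. How the remaining points are placed afterwards is not covered by this sketch.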