Detection and removal of specific types of outliers in different data formats: contextual outliers in textual data using LOF (Local Outlier Factor), outliers in tabular numeric data using LOF, Gaussian noise in image data using NLM (Non-Local Means), and Gaussian-noisy image frames in video data using an autoencoder.
Synthetic textual data is generated from a prompt and contaminated with 40% outliers (non-contextual data). LOF is used to detect and remove the outlier sentences from the contaminated synthetic data.
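The sentence-level LOF step could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the TF-IDF featurization, the cosine metric, the n_neighbors value, and the function name remove_outlier_sentences are all assumptions. Only the use of LOF and the 40% contamination rate come from the description above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import LocalOutlierFactor


def remove_outlier_sentences(sentences, contamination=0.4, n_neighbors=5):
    """Split sentences into (kept, removed) using LOF on TF-IDF vectors.

    Hypothetical helper: TF-IDF and the cosine metric are assumed choices.
    """
    # Turn each sentence into a dense TF-IDF vector (fine for small inputs).
    vectors = TfidfVectorizer().fit_transform(sentences).toarray()
    lof = LocalOutlierFactor(
        n_neighbors=n_neighbors,
        contamination=contamination,  # expected share of outliers
        metric="cosine",
    )
    labels = lof.fit_predict(vectors)  # 1 = inlier, -1 = outlier
    kept = [s for s, lab in zip(sentences, labels) if lab == 1]
    removed = [s for s, lab in zip(sentences, labels) if lab == -1]
    return kept, removed


if __name__ == "__main__":
    # Made-up example sentences: six about one topic, four unrelated.
    demo = [
        "The solar panel converts sunlight into electricity.",
        "Solar energy is stored in the battery bank.",
        "The inverter feeds solar electricity to the grid.",
        "Sunlight intensity affects solar panel output.",
        "Solar panels need cleaning to keep output high.",
        "The battery stores electricity from the solar panel.",
        "My cat sleeps on the sofa all afternoon.",
        "The recipe calls for two cups of flour.",
        "He scored a late goal in the football final.",
        "Tickets for the concert sold out quickly.",
    ]
    kept, removed = remove_outlier_sentences(demo)
    print("kept:", kept)
    print("removed:", removed)
```

How well the flagged sentences match the injected ones depends heavily on the embedding and on n_neighbors, so both would need tuning on the real data.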
Synthetic tabular numeric data is generated in scaled and unscaled forms, stored in a data frame, and contaminated with 40% outliers. LOF is used to detect and remove the outlier rows from the contaminated synthetic data.
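A minimal sketch of the tabular case, under stated assumptions: the two-column synthetic data, the column names x1 and x2, the distributions used for inliers and outliers, and the helper names make_contaminated_frame and drop_lof_outliers are all hypothetical. Only the 40% contamination rate, the optional scaling, and the use of LOF come from the description above.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler


def make_contaminated_frame(n_inliers=60, n_outliers=40, seed=0):
    """Build a hypothetical data frame with 40% uniformly scattered outliers."""
    rng = np.random.default_rng(seed)
    inliers = rng.normal(loc=0.0, scale=1.0, size=(n_inliers, 2))
    outliers = rng.uniform(low=-10.0, high=10.0, size=(n_outliers, 2))
    df = pd.DataFrame(np.vstack([inliers, outliers]), columns=["x1", "x2"])
    df["is_outlier"] = [False] * n_inliers + [True] * n_outliers
    return df


def drop_lof_outliers(df, feature_cols, contamination=0.4,
                      n_neighbors=20, scale=True):
    """Return a copy of df without the rows that LOF labels as outliers."""
    X = df[feature_cols].to_numpy()
    if scale:
        # Standardize features so no single column dominates the distances.
        X = StandardScaler().fit_transform(X)
    lof = LocalOutlierFactor(n_neighbors=n_neighbors,
                             contamination=contamination)
    labels = lof.fit_predict(X)  # 1 = inlier, -1 = outlier
    return df[labels == 1].reset_index(drop=True)


if __name__ == "__main__":
    frame = make_contaminated_frame()
    cleaned = drop_lof_outliers(frame, ["x1", "x2"])
    print(len(frame), "rows before,", len(cleaned), "rows after")
    print("injected outliers remaining:", int(cleaned["is_outlier"].sum()))
```

Passing scale=False runs LOF on the raw values, which gives a simple way to compare the scaled and unscaled variants.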
Gaussian noise is added to image data in increments from 0 to 40% and filtered using a Gaussian filter and an NLM (Non-Local Means) filter. NLM shows superior performance in terms of the SSIM (Structural Similarity Index Measure) metric. The standard SSIM score has been taken as 0.77.
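The filter comparison could look roughly like the sketch below, which uses scikit-image. Several things here are assumptions rather than details from the description: the synthetic test image, reading the noise percentage as a standard deviation relative to a [0, 1] intensity range, the NLM parameters (h, patch_size, patch_distance), the Gaussian sigma, and the helper names synthetic_image and compare_filters. The zero-noise level is skipped in the demo because NLM needs a positive smoothing parameter.

```python
import numpy as np
from skimage.filters import gaussian
from skimage.metrics import structural_similarity
from skimage.restoration import denoise_nl_means, estimate_sigma


def synthetic_image(size=64):
    """Hypothetical grayscale image in [0, 1]: a gradient with a bright square."""
    img = np.tile(np.linspace(0.2, 0.8, size), (size, 1))
    img[size // 4: size // 2, size // 4: size // 2] = 1.0
    return img


def compare_filters(clean, noise_std, seed=0):
    """Add Gaussian noise, denoise with both filters, return SSIM scores."""
    rng = np.random.default_rng(seed)
    noisy = np.clip(clean + rng.normal(0.0, noise_std, clean.shape), 0.0, 1.0)

    # Estimate the noise level from the noisy image to set the NLM strength.
    sigma_est = float(estimate_sigma(noisy))
    nlm = denoise_nl_means(noisy, h=0.8 * sigma_est, sigma=sigma_est,
                           patch_size=5, patch_distance=6, fast_mode=True)
    smoothed = gaussian(noisy, sigma=1)

    # SSIM against the clean image; data_range is 1.0 for [0, 1] images.
    return {
        "noisy": structural_similarity(clean, noisy, data_range=1.0),
        "gaussian": structural_similarity(clean, smoothed, data_range=1.0),
        "nlm": structural_similarity(clean, nlm, data_range=1.0),
    }


if __name__ == "__main__":
    clean = synthetic_image()
    for level in (0.1, 0.2, 0.3, 0.4):
        scores = compare_filters(clean, noise_std=level)
        print(level, {name: round(val, 3) for name, val in scores.items()})
```

The scores from this toy image are not comparable to the 0.77 figure quoted above, which refers to the project's own data.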