
Summary of Noise Analysis

Anand Gavai edited this page Oct 22, 2015 · 6 revisions

Background information:

Here we considered only the noise from the image data. Our purpose is to investigate whether simple summary statistics can reveal patterns that would help us cluster similar images together. All of these data are in binary format.

The following steps were carried out:

Step 1. We visually inspected the data. The data appear to follow a continuous uniform distribution, and the variation is subtle. We therefore looked at the Fourier-transformed data to inspect visual patterns.
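As a rough illustration of the Fourier step, the sketch below computes the magnitude spectrum of a short one-dimensional signal with a naive discrete Fourier transform. The function name `dft_magnitudes` and the toy signals are illustrative assumptions; the actual analysis worked on image noise and may have used a different (two-dimensional, FFT-based) routine.

```python
import cmath

def dft_magnitudes(signal):
    """Naive O(n^2) discrete Fourier transform; returns |X[k]| for each k."""
    n = len(signal)
    mags = []
    for k in range(n):
        total = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        mags.append(abs(total))
    return mags

# A constant signal has all of its energy in the zero-frequency bin,
# while an alternating signal has it in the highest-frequency bin.
flat = dft_magnitudes([1.0, 1.0, 1.0, 1.0])
alternating = dft_magnitudes([1.0, -1.0, 1.0, -1.0])
```

Subtle variation that is hard to see in the raw values can show up as distinct peaks in such a spectrum, which is why the transformed data are easier to inspect visually.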

Step 2. In our initial analysis (without data transformation) we were not able to cluster the images together. Our main problem was therefore to extract meaningful features from this dataset.

Step 3. We focused on simple statistics such as minimum, maximum, mean, median, standard deviation, skewness and kurtosis. This approach has several advantages, including fast computation and reduced dimensionality.
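A minimal sketch of this feature extraction is shown here, reducing one noise sample to the seven statistics listed in Step 3. The function name `summary_features` is hypothetical, and the skewness and kurtosis use population moments (non-excess kurtosis), which is an assumption; the R code in the repository may use a different convention.

```python
import math
import statistics

def summary_features(values):
    """Reduce one noise sample to seven summary statistics."""
    n = len(values)
    mean = sum(values) / n
    # Population central moments (assumed convention, see lead-in).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    sd = math.sqrt(m2)
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "median": statistics.median(values),
        "sd": sd,
        # Guard against a constant sample, where both ratios are undefined.
        "skewness": m3 / sd ** 3 if sd > 0 else 0.0,
        "kurtosis": m4 / m2 ** 2 if sd > 0 else 0.0,
    }

features = summary_features([1.0, 2.0, 3.0, 4.0, 5.0])
```

Each image is thus represented by a vector of seven numbers regardless of its size, which is where the reduced dimensionality comes from.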

Step 4. We initially extracted these features only from the subset of the data that represents the Kodak camera. This subset consisted of 304 images.

Step 5. Ideally we would use HPC and perform spectrum analysis for feature selection. This would also enable us to extract the frequency and period of each spectrum from within the distribution. During this sprint we instead used approximate, though not exact, measures in the form of skewness and kurtosis.

Step 6. We used two clustering algorithms, namely hierarchical clustering and K-Means.
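To make the K-Means step concrete, here is a small sketch of Lloyd's algorithm applied to feature vectors. The function name `kmeans`, the random initialisation, and the toy points are assumptions for illustration only; the actual analysis was run with the R code in the repository, whose settings are not described here.

```python
import random
from collections import Counter

def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    # Assumed initialisation: k distinct input points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]

    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    labels = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = kmeans(points, k=2)
sizes = sorted(Counter(labels).values())  # number of points per cluster
```

Counting the labels per cluster, as in the last line, gives the kind of cluster-size summary that is typically reported for a K-Means run.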

Results of Hierarchical Clustering:

Image in GitHub
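For readers unfamiliar with the method, agglomerative hierarchical clustering can be sketched as follows: every point starts as its own cluster and the two closest clusters are merged repeatedly. The function name `agglomerative` and the use of average linkage with Euclidean distance are assumptions; the linkage actually used for the image is not stated on this page.

```python
import math

def agglomerative(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Returns a list of clusters, each a list of indices into points.
    """
    def dist(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

    def linkage(a, b):
        # Average linkage (assumed): mean pairwise distance between clusters.
        return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        pairs = (
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

groups = agglomerative([(0.0,), (0.1,), (5.0,), (5.2,), (9.9,)], n_clusters=3)
```

Recording the order and distance of each merge, rather than stopping at a fixed number of clusters, is what produces a dendrogram.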

Results of K-Means Clustering:

a. Assuming 4 clusters, the cluster sizes were: 87, 63, 43, 111

b. Assuming 6 clusters, the cluster sizes were: 15, 50, 63, 39, 75, 62

Future Direction:

So far we have not been able to investigate the spectrum statistics provided by the “seewave” package. Additional features from this package could enrich our dataset and allow more accurate clustering.

Partitioning the distribution and computing summary statistics on each partition could also be considered as a next step.

The R code to run this analysis is in the git repository.
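One possible reading of this idea is sketched below: sort the values, split them into equally sized parts, and summarise each part separately. The function name `partition_stats`, the equal-size split, and the choice of statistics are all assumptions, since the partitioning scheme has not been decided.

```python
import statistics

def partition_stats(values, n_parts):
    """Split the sorted values into n_parts chunks and summarise each one."""
    if not 1 <= n_parts <= len(values):
        raise ValueError("n_parts must be between 1 and len(values)")
    ordered = sorted(values)
    size = len(ordered) // n_parts
    rows = []
    for i in range(n_parts):
        start = i * size
        # Any leftover values go into the last partition.
        end = (i + 1) * size if i < n_parts - 1 else len(ordered)
        part = ordered[start:end]
        rows.append({
            "min": part[0],
            "max": part[-1],
            "mean": statistics.mean(part),
            "sd": statistics.pstdev(part),
        })
    return rows

rows = partition_stats([5, 1, 4, 2, 3, 6, 8, 7], n_parts=4)
```

Concatenating the per-partition rows would give a longer feature vector per image that describes the shape of the distribution in more detail than a single global summary.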
