Improve fake data generation #59

mrbrianevans · 2022-02-22T11:28:25Z

The fake data currently generated is sufficient to demonstrate the main visualisation for each supported file type (with a few exceptions, such as YouTube Search History).
However, it lacks the quality needed to demonstrate the analysis features: time series analysis, frequency analysis and topic extraction.
To demonstrate these features properly, at least some of the data needs to be based on real-world data.
For example, some of the tweet text could be randomly selected from a corpus of real-world statements. This would better demonstrate topic extraction and word frequency.
Time series analysis is poorly demonstrated by a uniform random distribution of dates. It would be better either to use real-world event dates or to generate dates from a non-uniform distribution that takes into account factors such as day of the week.
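
As a rough illustration of the non-uniform date idea, here is a minimal sketch in Python (not necessarily the language the generators are written in). The function name `fake_timestamps` and both weight tables are made up for this example and are not taken from the project; the weights only need to be uneven enough for day-of-week and hour-of-day patterns to show up in the charts.

```python
import random
from datetime import datetime, timedelta

# Hypothetical weights, Monday=0 ... Sunday=6: weekends are busier.
DAY_WEIGHTS = [1.0, 1.0, 1.0, 1.0, 1.5, 2.5, 2.0]
# Hypothetical hourly weights (24 entries): quiet overnight, evening peak.
HOUR_WEIGHTS = [0.2] * 7 + [1.0] * 10 + [3.0] * 5 + [1.0] * 2


def fake_timestamps(start, days, count, rng=None):
    """Return `count` sorted datetimes within `days` days after `start`.

    `start` is expected to be a datetime at midnight. Days and hours are
    drawn from weighted distributions rather than uniformly.
    """
    rng = rng or random.Random()
    dates = [start + timedelta(days=d) for d in range(days)]
    # Weight each calendar day by its day of the week.
    date_weights = [DAY_WEIGHTS[d.weekday()] for d in dates]
    chosen_days = rng.choices(dates, weights=date_weights, k=count)
    hours = rng.choices(range(24), weights=HOUR_WEIGHTS, k=count)
    stamps = [
        day.replace(hour=h, minute=rng.randrange(60), second=rng.randrange(60))
        for day, h in zip(chosen_days, hours)
    ]
    return sorted(stamps)
```

Passing a seeded `random.Random` makes the output reproducible, which may be useful if the fake data is also used in tests.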

Ideas for data sources:

public tweet corpus such as https://github.com/zfz/twitter_corpus/blob/master/full-corpus.csv
Kaggle SMS corpus for WhatsApp and Telegram
YouTube-8M data set of YouTube videos for search and watch history
most-followed accounts on Instagram for Instagram data
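
Drawing message text from one of these corpora could look something like the sketch below. It is only an illustration in Python: `load_texts` and `sample_texts` are hypothetical helpers, and the name of the text column is passed in as a parameter because the column layout of each corpus is not specified here.

```python
import csv
import random


def load_texts(csv_file, text_column):
    """Read the non-empty values of one column from an open CSV file.

    `text_column` is whatever header the chosen corpus uses for its
    message text; it is an assumption supplied by the caller.
    """
    reader = csv.DictReader(csv_file)
    texts = []
    for row in reader:
        value = (row.get(text_column) or "").strip()
        if value:
            texts.append(value)
    return texts


def sample_texts(texts, count, rng=None):
    """Pick `count` texts at random, with replacement."""
    rng = rng or random.Random()
    return rng.choices(texts, k=count)
```

Sampling with replacement keeps the real-world word frequencies of the corpus roughly intact, which is what the topic extraction and word frequency features need.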

mrbrianevans added the enhancement (New feature or request) label Feb 22, 2022

mrbrianevans added a commit that referenced this issue Apr 8, 2022

Added Spotify random test data generator GH-59

0d003a2

Signed-off-by: Brian Evans <[email protected]>

mrbrianevans added a commit that referenced this issue Apr 8, 2022

Fix Spotify random data times GH-59

fa77299

Switched to 24-hour clock cycle. Signed-off-by: Brian Evans <[email protected]>
