This repository has been archived by the owner on Dec 24, 2022. It is now read-only.

Improve fake data generation #59

Open
mrbrianevans opened this issue Feb 22, 2022 · 0 comments
Labels
enhancement New feature or request

Comments

@mrbrianevans
Owner

The generated fake data is currently sufficient to demonstrate the main visualisation for each supported file (with a few exceptions, such as YouTube Search History).
However, it lacks the quality needed to demonstrate the analysis features, such as time series analysis, frequency analysis and topic extraction.
To demonstrate these features properly, at least some of the data needs to be based on real-world data.
For example, some of the tweet text could be randomly selected from a corpus of real-world statements. This would better demonstrate the topic extraction and word frequency features.
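A minimal sketch of the corpus idea, in Python purely for illustration (this project's generator may be written in another language). The `CORPUS` list and the `fake_tweets` function are hypothetical names, and the sentences are placeholders for a real corpus that would be loaded from a file.

```python
import random

# Placeholder corpus (assumption): a real one would hold many real-world statements.
CORPUS = [
    "The train was delayed again this morning.",
    "Just finished reading a great book about astronomy.",
    "Looking forward to the football match this weekend.",
    "The new bakery on the high street is excellent.",
    "Heavy rain is forecast for the rest of the week.",
]

def fake_tweets(corpus, count, seed=None):
    """Return `count` tweet texts sampled with replacement from `corpus`."""
    rng = random.Random(seed)  # a seeded generator keeps demo data reproducible
    return [rng.choice(corpus) for _ in range(count)]

tweets = fake_tweets(CORPUS, 50, seed=1)
```

Because the text comes from coherent sentences rather than random words, repeated terms and topics should appear in the output, which is what word frequency and topic extraction need to show anything meaningful.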
Time series analysis is not well demonstrated by dates drawn from a uniform random distribution. It would be better either to use real-world event dates or to generate dates from a non-uniform distribution that takes into account factors such as day of week.
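One possible way to generate non-uniform dates is weighted sampling by weekday and hour, sketched below in Python for illustration. The weights are invented for the example rather than measured from real data, and `WEEKDAY_WEIGHTS`, `HOUR_WEIGHTS` and `fake_timestamps` are hypothetical names.

```python
import random
from datetime import date, datetime, timedelta

# Assumed relative activity per weekday (Monday=0 .. Sunday=6); illustrative only.
WEEKDAY_WEIGHTS = [1.0, 1.0, 1.0, 1.0, 1.2, 2.0, 1.8]
# Assumed relative activity per hour: quiet overnight, busiest in the evening.
HOUR_WEIGHTS = [0.2] * 7 + [1.0] * 10 + [2.0] * 5 + [0.6] * 2  # 24 entries

def fake_timestamps(start, end, count, seed=None):
    """Return `count` sorted datetimes between `start` and `end` (inclusive dates),
    sampled so that busier weekdays and hours occur more often."""
    rng = random.Random(seed)
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    day_weights = [WEEKDAY_WEIGHTS[d.weekday()] for d in days]
    chosen_days = rng.choices(days, weights=day_weights, k=count)
    hours = rng.choices(range(24), weights=HOUR_WEIGHTS, k=count)
    return sorted(
        datetime(d.year, d.month, d.day, h, rng.randrange(60), rng.randrange(60))
        for d, h in zip(chosen_days, hours)
    )

stamps = fake_timestamps(date(2022, 1, 1), date(2022, 12, 31), 5000, seed=0)
```

With these weights, weekend and evening events are over-represented, so a weekday or hour-of-day chart shows visible structure instead of a flat line. The weights could be replaced with frequencies derived from real event dates if such data becomes available.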

Ideas for data sources:

mrbrianevans added the enhancement label Feb 22, 2022
mrbrianevans added a commit that referenced this issue Apr 8, 2022
Switched to 24-hour clock cycle.

Signed-off-by: Brian Evans <[email protected]>