constraints of input datasets for each data-mining algorithm #13
Comments
@larjohn As described in the mail, I will add the constraints of each data-mining algorithm to dam.json.
I have changed dam.json to the following format.
The difficult part is how to interpret the semantics of "a time dimension with years as values". Should we only check the dimension title to make sure it contains "year", or should we focus on the values and guarantee they pass a regex filter like "^(19|20)\d{2}$"? On how to write and read this condition, we would like to hear your opinion.
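For illustration only, here is a minimal sketch of the value-based option, assuming the dimension members are available as plain strings; the helper name and the choice to require every member to match are assumptions, not anything defined in dam.json:

```python
import re

# Year filter discussed above: four-digit years from 1900 to 2099.
YEAR_PATTERN = re.compile(r"^(19|20)\d{2}$")

def looks_like_year_dimension(values):
    """Return True if every non-empty member of the dimension matches the year pattern."""
    members = [str(v) for v in values if v not in (None, "")]
    return bool(members) and all(YEAR_PATTERN.match(v) for v in members)

print(looks_like_year_dimension(["2015", "2016", "2017"]))  # True
print(looks_like_year_dimension(["2016-Q1", "2016-Q2"]))    # False: quarter labels, not years
```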
@wk0206 time dimensions should be found in the package query, with the datetime dimension type:
Regarding the constraints, please put them inside the configuration, as each configuration (usually facts vs aggregates) could have different starting points and require different things, so they can't have the same constraints every time. Also note that most of the constraints might be better applied at DAM level, so that instead of requesting the algorithms with a generic request, Indigo would request per dataset and only get the algorithms that apply. A good strategy would be to cache the constraints analysis to avoid overhead.
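A minimal sketch of that per-dataset, cached strategy in Python; the function names, the metadata shape (a dict of dimensions with a "dimensionType" field), and the cache size are all assumptions for illustration, not the actual DAM or Indigo API:

```python
from functools import lru_cache
from typing import Callable, Dict, List, Tuple

# A constraint is a predicate over the dataset's package metadata (a plain dict here).
Constraint = Callable[[dict], bool]

# Hypothetical constraint table; in practice this would come from dam.json,
# with separate entries per configuration (facts vs aggregates) as suggested above.
CONSTRAINTS: Dict[str, List[Constraint]] = {
    "time_series": [
        # The algorithm needs at least one dimension of datetime type.
        lambda meta: any(d.get("dimensionType") == "datetime"
                         for d in meta.get("dimensions", {}).values()),
    ],
}

def fetch_package_metadata(dataset_id: str) -> dict:
    """Placeholder for the package query; a real version would call the package endpoint."""
    return {"dimensions": {"fiscalPeriod": {"dimensionType": "datetime"}}}

@lru_cache(maxsize=1024)
def applicable_algorithms(dataset_id: str) -> Tuple[str, ...]:
    """Evaluate all constraints once per dataset; repeated requests hit the cache."""
    meta = fetch_package_metadata(dataset_id)
    return tuple(name for name, checks in CONSTRAINTS.items()
                 if all(check(meta) for check in checks))

print(applicable_algorithms("global__expenditure__0eba1"))  # ('time_series',)
```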
@larjohn Why is there a random tail on each global key? E.g. what is the function of "__0eba1" in "global__organization__0eba1"?
@larjohn Let us take the data-mining function 'time series' as an example. An applicable dataset must have a dimension 'fiscalPeriod', and that dimension must contain 3 different values. "time_series": {
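As an illustration of the rule, here is how such a check could look in Python, interpreting "3 different values" as "at least 3 distinct values" and assuming the observations arrive as dicts keyed by dimension name; nothing below is the actual dam.json format:

```python
# Hypothetical check for the 'time series' constraint described above.
def satisfies_time_series_constraint(observations) -> bool:
    """True if the observations carry a 'fiscalPeriod' dimension with at least 3 distinct values."""
    periods = {row["fiscalPeriod"] for row in observations if "fiscalPeriod" in row}
    return len(periods) >= 3

rows = [{"fiscalPeriod": "2014"}, {"fiscalPeriod": "2015"},
        {"fiscalPeriod": "2016"}, {"fiscalPeriod": "2016"}]
print(satisfies_time_series_constraint(rows))  # True: three distinct periods
```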
@HimmelStein Sorry for the delay, I have been sick since last week... The 'random' tail ensures that datasets from the same region whose URIs have similar last parts get different names. I can't recall exactly what led me to this, but here is an example: http://datasets.obeu.com/athens/2016/expenditure. To select a simple name for two such datasets (not containing dashes etc.) one would use the last URI part, but it is the same for both of them here. So creating a hash of the URI and taking a part of it minimizes name clashes. The restriction seems good; give me some time to implement it in Indigo.
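To illustrate the idea (not the exact rule Indigo uses), a short sketch that appends a hash-derived suffix to the last URI segment; the hash algorithm, suffix length, and separator are all assumptions:

```python
import hashlib

def global_key(uri: str, prefix: str = "global") -> str:
    """Build a simple name from the last URI segment plus a short hash of the full URI."""
    last_part = uri.rstrip("/").rsplit("/", 1)[-1]
    suffix = hashlib.md5(uri.encode("utf-8")).hexdigest()[:5]
    return f"{prefix}__{last_part}__{suffix}"

# Two datasets whose URIs end in the same segment still get distinct keys.
print(global_key("http://datasets.obeu.com/athens/2016/expenditure"))
print(global_key("http://datasets.obeu.com/athens/2015/expenditure"))
```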
@HimmelStein I can't find the updated dam.json. Can you check so that I can update the running instance on the Fraunhofer server? |
@larjohn We have not checked it in yet, as we are waiting for your feedback on the format (the conditions used for time series); see my last comment above (the JSON structure).
@larjohn I have updated dam.json, please check.
So I revisited the constraints, here are my comments:
When a user selects a dataset and moves on to the data-mining service, Indigo shall only display the data-mining algorithms which can be applied to the selected dataset.
So please describe the constraints on input datasets for the data-mining algorithm you developed (send them to me by email before this Thursday).