Hello, could you tell me how to calculate this value, std_split ? #1

Mr-Wu-H · 2022-04-13T08:31:17Z

std_split : float
the standard deviation from the mean to subdivide the time series of a class into subclasses.

benibaeumle · 2022-04-13T09:32:44Z

Hello,
in their paper the authors just write: "Next, we calculate the standard deviation value of these adjacent discrepancies. At last, we can separate the data into subclasses by splitting at the sequence that has difference larger than half of the computed standard deviation."
Unfortunately, I have no well-founded argument for how best to choose a value for std_split. What is clear is that with higher values you sample fewer time series, because more and more distinct time series get packed into the same subclass.
Maybe visualizing the adjacent discrepancies along with different split values might give you an indication.
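As a rough way to explore this, the sketch below counts how many subclasses a given split value would produce. It is only an illustration, not the repository's code: the function name `count_subclasses`, the example distances, and the rule that a split happens where a gap exceeds `std_split` times the standard deviation of the gaps are all assumptions made for this example.

```python
import numpy as np

def count_subclasses(sorted_distances, std_split):
    """Count the subclasses a given std_split would produce.

    Assumption: a split happens wherever the gap between two neighboring
    sorted distances exceeds std_split * std(gaps).
    """
    gaps = np.diff(np.asarray(sorted_distances, dtype=float))
    if gaps.size == 0:
        return 1  # a single time series forms a single subclass
    threshold = std_split * gaps.std()
    return int(np.sum(gaps > threshold)) + 1

# Hypothetical sorted distances with one medium and one large jump.
distances = [0.0, 0.1, 0.2, 1.2, 1.3, 1.4, 3.4]
for std_split in (0.5, 1.0, 1.5, 2.0, 3.0):
    print(std_split, count_subclasses(distances, std_split))
```

Printing the count for a range of values (or plotting the gaps against the thresholds) should show the number of subclasses shrinking as std_split grows.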

Mr-Wu-H · 2022-04-13T09:40:22Z

Thank you very much for your reply.

Mr-Wu-H · 2022-04-25T11:51:39Z

Hello, does the sentence "we can separate the data into subclasses by splitting at the sequence that has difference larger than half of the computed standard deviation" mean that we remove the part greater than the standard deviation and use the remaining time series as the sample time series? Am I right?

Mr-Wu-H · 2022-04-25T12:00:33Z

Hello, this method FastShapeletCandidates is to get shapelet candidates of one class, right?

benibaeumle · 2022-04-25T21:09:29Z

Hello, does the sentence "we can separate the data into subclasses by splitting at the sequence that has difference larger than half of the computed standard deviation" mean that we remove the part greater than the standard deviation and use the remaining time series as the sample time series? Am I right?

I am not sure if I understand you correctly. Please see chapter 3.1 of the paper for how this particular step is computed (I do not have LaTeX support when answering here, so looking at the paper should be more comfortable for you). But in words, what is computed is:

1. Calculate the sum of the time steps of each time series.
2. Calculate the mean over the sums.
3. Select the time series which is closest to the mean over the sums.
4. Calculate the Euclidean distance of each time series to the time series selected in step 3, and sort the resulting list of distances.
5. Calculate the standard deviation of the differences between each pair of neighboring distances.
6. For each pair of neighbors in the sorted list of distances, check whether the difference is larger than 1.5 times the standard deviation calculated in step 5.
7. If the difference is larger than 1.5 times the standard deviation, consider the time series between the last split point and the current split point as a subclass.
8. Repeat step 7 until we have iterated over all neighboring distance pairs.

The result after computing the 8 steps above is the set of subclasses.
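The steps above can be sketched in Python roughly as follows. This is a minimal sketch and not the repository's implementation: the function name `split_into_subclasses`, the `factor` parameter, and the example data are hypothetical, and it assumes that "closest to the mean over the sums" means the series whose own sum is nearest to that mean.

```python
import numpy as np

def split_into_subclasses(series, factor=1.5):
    """Group the time series of one class into subclasses (steps 1 to 8).

    series: 2-D array, one equal-length time series per row.
    factor: multiplier on the standard deviation (1.5 in the description).
    Returns a list of index arrays, one per subclass.
    """
    series = np.asarray(series, dtype=float)
    sums = series.sum(axis=1)                          # step 1
    mean_sum = sums.mean()                             # step 2
    # Step 3 (assumption: compare each series' sum to the mean sum).
    ref = series[np.argmin(np.abs(sums - mean_sum))]
    dists = np.linalg.norm(series - ref, axis=1)       # step 4
    order = np.argsort(dists)                          # sorted positions
    gaps = np.diff(dists[order])                       # neighbor differences
    threshold = factor * gaps.std() if gaps.size else 0.0  # step 5

    subclasses, last_split = [], 0
    for i, gap in enumerate(gaps):                     # steps 6 to 8
        if gap > threshold:
            # Everything from the last split point up to position i
            # (the current split point) forms one subclass.
            subclasses.append(order[last_split:i + 1])
            last_split = i + 1
    subclasses.append(order[last_split:])              # remaining series
    return subclasses

# Hypothetical data: three similar series and two clearly different ones.
example = [[1.0, 1.0], [1.1, 1.1], [1.2, 1.2], [5.0, 5.0], [5.1, 5.1]]
for group in split_into_subclasses(example):
    print(sorted(group.tolist()))
```

In this sketch, `last_split` marks where the previous subclass ended in the sorted list, and the current split point is the position at which the gap exceeds the threshold. Series after the final split point are collected as one last subclass, which is a choice made for this example.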

benibaeumle · 2022-04-25T21:09:48Z

Hello, this method FastShapeletCandidates is to get shapelet candidates of one class, right?

Yes.

Mr-Wu-H · 2022-04-26T00:16:46Z

Thanks a lot.

Mr-Wu-H · 2022-04-26T00:22:02Z

Hello, how should I understand "the last split point" and "the current split point" in step 7?

benibaeumle · 2022-04-27T08:11:55Z

See here.

Mr-Wu-H · 2022-04-27T13:23:38Z

Thanks a lot. In your demo, the data set fordA_sample generates 6300 shapelets. Do you know how to remove shapelets that may overlap, in order to reduce the time complexity?

benibaeumle closed this as completed Apr 23, 2022

benibaeumle reopened this Apr 25, 2022
