Help implementing "Multivariate Time Series Representation Learning" paper #2059
sc12752 started this conversation in Development · Replies: 1 comment
Problem got solved by adjusting the dropout rate in the encoder. I assume dropout on the attention probabilities is harmful when applied to time series, especially given that another dropout is applied once the self-attention results are flattened.
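In DJL terms, a minimal sketch of that kind of fix might look like the following. The exact rate used is cut off above, so the 0 here is an assumption, and the `TransformerEncoderBlock` constructor arguments are taken from DJL's transformer package:

```java
import java.util.function.Function;

import ai.djl.ndarray.NDList;
import ai.djl.nn.Activation;
import ai.djl.nn.transformer.TransformerEncoderBlock;

public final class EncoderFix {

    /**
     * Builds the self-attention encoder with its internal dropout disabled,
     * while the separate dropout on the flattened features stays in place.
     */
    public static TransformerEncoderBlock encoder(int embeddingSize, int headCount, int hiddenSize) {
        Function<NDList, NDList> activation = Activation::relu;
        return new TransformerEncoderBlock(
                embeddingSize, headCount, hiddenSize,
                0f, // dropout probability: assumed set to 0; the exact value is cut off above
                activation);
    }
}
```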
Hi,
I'm trying to implement the classifier part of the "Transformer-based Framework for Multivariate Time Series Representation Learning" paper using DJL. My current code can be found in this GitHub repo; the most important parts there are the SupervisedClassifier block and the dataset test.
While doing this I follow the tsai Python implementation, using the same dataset and parameters they chose in that Colab; the relevant tsai source code can be found here. The general idea of the paper is as follows: use the self-attention encoder part from the "Attention is all you need" paper, then flatten the resulting features and use a final feed-forward layer as a classifier.
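Roughly, this is the shape of what I'm building in DJL (a simplified sketch, not my actual SupervisedClassifier; the sizes, dropout rates, and the single encoder layer are placeholders):

```java
import ai.djl.nn.Activation;
import ai.djl.nn.Blocks;
import ai.djl.nn.SequentialBlock;
import ai.djl.nn.core.Linear;
import ai.djl.nn.norm.Dropout;
import ai.djl.nn.transformer.TransformerEncoderBlock;

public final class TstClassifier {

    /** Encoder -> flatten -> dropout -> linear head; all sizes are placeholders. */
    public static SequentialBlock build(int embeddingSize, int headCount,
                                        int hiddenSize, int numClasses) {
        SequentialBlock net = new SequentialBlock();
        // Self-attention encoder from "Attention is all you need".
        net.add(new TransformerEncoderBlock(
                embeddingSize, headCount, hiddenSize, 0.1f, Activation::relu));
        // Flatten the (seqLength, embeddingSize) features into one vector per example.
        net.add(Blocks.batchFlattenBlock());
        // The second dropout, applied to the flattened features.
        net.add(Dropout.builder().optRate(0.1f).build());
        // Final feed-forward layer acting as the classifier.
        net.add(Linear.builder().setUnits(numClasses).build());
        return net;
    }
}
```

A full implementation would also stack several encoder layers and add positional encoding, which this sketch leaves out.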
My problem is that it seems I have implemented everything right, but it does not work well enough. In particular, when training the tsai implementation, accuracy starts at 0.5 and monotonically rises to 0.7 after 100 epochs. In my current implementation, accuracy starts at 0.5, jumps between 0.5 and ~0.62 as training progresses, and then ends at about ~0.5 again after 100 epochs. At this point I have no idea what I'm doing wrong and am looking for tips on what I could be missing in my implementation.
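For context, this is a simplified sketch of the kind of DJL training loop I use to get those per-epoch accuracy numbers; the dataset setup is omitted, the `TstClassifier` reference is from the sketch above, and the hyperparameters shown are placeholders for the ones from the tsai notebook:

```java
import ai.djl.Model;
import ai.djl.ndarray.types.Shape;
import ai.djl.training.DefaultTrainingConfig;
import ai.djl.training.EasyTrain;
import ai.djl.training.Trainer;
import ai.djl.training.dataset.Dataset;
import ai.djl.training.evaluator.Accuracy;
import ai.djl.training.listener.TrainingListener;
import ai.djl.training.loss.Loss;

public final class TrainTst {

    public static void train(Dataset trainSet, Dataset validateSet) throws Exception {
        // Placeholder hyperparameters; the real ones come from the tsai notebook.
        int embeddingSize = 128, headCount = 8, hiddenSize = 256, numClasses = 2;
        int batchSize = 64, seqLength = 100;

        try (Model model = Model.newInstance("tst-classifier")) {
            model.setBlock(TstClassifier.build(embeddingSize, headCount, hiddenSize, numClasses));

            DefaultTrainingConfig config =
                    new DefaultTrainingConfig(Loss.softmaxCrossEntropyLoss())
                            .addEvaluator(new Accuracy()) // reports train/validation accuracy per epoch
                            .addTrainingListeners(TrainingListener.Defaults.logging());

            try (Trainer trainer = model.newTrainer(config)) {
                trainer.initialize(new Shape(batchSize, seqLength, embeddingSize));
                EasyTrain.fit(trainer, 100, trainSet, validateSet);
            }
        }
    }
}
```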
For reference, here's the tsai model printout, followed by my current classifier printout:
Thanks!