about temporal stream #19
Comments
Hi @nandiya, have you figured out the answer? If so, please share it here. Thank you.
You're confusing 20 channels with 20 frames. At time t, the current frame is sent into the spatial stream as is. The optical flow over frames t to t+10 is computed and stacked as 20 channels (10 flow fields × 2 for the x and y components). This 20-channel input is what the temporal stream uses. This produces a class score at each t, and the scores of the two streams are fused. The final video score is obtained by averaging over all frame scores. Have a look at this paper: https://arxiv.org/pdf/1406.2199v2.pdf
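To make the "20 channels, not 20 frames" point concrete, here is a minimal sketch of the stacking step using NumPy. It is not the repository's code: the names `stack_flow` and `video_score` are hypothetical, the flow arrays are random stand-ins, and the interleaved x/y channel order is an assumption.

```python
import numpy as np

L = 10  # number of stacked flow fields, as described in the comment above


def stack_flow(flow_x, flow_y, t, length=L):
    """Stack `length` consecutive flow fields starting at t.

    flow_x and flow_y have shape (num_flows, H, W): the horizontal and
    vertical displacement between each pair of consecutive frames.
    Returns an array of shape (2 * length, H, W).
    """
    if t + length > len(flow_x):
        raise ValueError("not enough flow fields after t")
    channels = []
    for k in range(t, t + length):
        channels.append(flow_x[k])  # x component (assumed channel order)
        channels.append(flow_y[k])  # y component
    return np.stack(channels, axis=0)


def video_score(per_step_scores):
    """Average the class scores over all sampled time steps."""
    return np.mean(np.asarray(per_step_scores, dtype=float), axis=0)


# Dummy data: 30 flow fields of size 4x5 (random values stand in for real flow).
rng = np.random.default_rng(0)
flow_x = rng.standard_normal((30, 4, 5))
flow_y = rng.standard_normal((30, 4, 5))

stack = stack_flow(flow_x, flow_y, t=0)
print(stack.shape)  # (20, 4, 5)
```

The same function can be called at several values of t in a longer video, so frames beyond the first stack are still used; each call yields one 20-channel input and one class score to average.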
Basically, we stack 10 consecutive optical flow images into a single input with 10*2 channels (x, y). If a video has fewer than 10 frames, we discard it, IMO. Can you confirm, @stillbreeze?
There's no video with <10 frames. Even at 30 fps, 10 frames is just a 0.33 s video!
I modified the code a little. wadhwasahil's code takes the optical flow (x and y) every 5 frames within a video, and I still couldn't figure out how to handle videos of different lengths. So I changed it for my thesis proposal, since my videos vary in length. I extract all the frames from each video without worrying about the length. Say I have 4 videos of 3 s, 3 s, 4 s, and 5 s; they might yield 124, 127, 143, and 150 frames. Say I want 20 optical flows (x and y) per video. Then n = (number of frames in the video) % 20. Example: 124 % 20 = 4, which leaves 120 frames; 120 / 20 = 6, so taking one optical flow every 6 frames gives 120 / 6 = 20 optical flows (x and y). That way I get the same number of optical flows (x and y) for every video and don't have to care about the different lengths. Next, I just need to feed the optical flow results to the CNN ^^.
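The commenter's code isn't shown, so the following is only one reading of the arithmetic above, written in plain Python. The function name `sample_indices` is hypothetical; it returns the frame indices at which an optical flow would be taken so that every video yields the same count.

```python
def sample_indices(total_frames, num_samples=20):
    """Pick `num_samples` evenly spaced frame indices from a video.

    Follows the arithmetic in the comment above: drop the remainder,
    then step through the usable frames at a fixed stride.
    """
    remainder = total_frames % num_samples  # e.g. 124 % 20 = 4
    usable = total_frames - remainder       # e.g. 124 - 4 = 120
    step = usable // num_samples            # e.g. 120 // 20 = 6
    if step == 0:
        raise ValueError("video has fewer frames than num_samples")
    return list(range(0, usable, step))


# Frame counts taken from the example in the comment above.
for n in (124, 127, 143, 150):
    idx = sample_indices(n)
    print(n, len(idx), idx[:3], idx[-1])
```

Every video with at least 20 frames produces exactly 20 indices, at the cost of ignoring up to 19 trailing frames; shorter videos raise an error and would need separate handling.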
Sorry, I still don't quite understand the temporal stream code.
The UCF videos differ in length, so they produce different numbers of frames; say video 1 produces 30 frames while video 2 produces 15 frames.
But the temporal stream code seems to take only 10 optical flow frames, which means only 20 frames. Does that mean the rest of the frames are unused? And what about a video that yields fewer than 20 frames?