about temporal stream #19

Open
nandiya opened this issue Jun 6, 2018 · 5 comments

@nandiya

nandiya commented Jun 6, 2018

Sorry, I still don't quite understand the temporal stream code.
UCF videos have different lengths, so they produce different numbers of frames; say video 1 produces 30 frames while video 2 produces 15 frames.
But it seems that in the temporal stream code you take only 10 optical flow frames, which means only 20 frames. Does that mean the rest of the frames are useless? And what about a video that generates fewer than 20 frames?

@sudonto

sudonto commented Jul 5, 2018

Hi @nandiya, have you figured out the answer? If so, please share it here. Thank you.

@stillbreeze
Collaborator

You're confusing 20 channels with 20 frames.

At time t, the current frame is sent into the spatial stream as is. The optical flow from frame t to frame t+10 is computed and stacked together as 20 channels (10*2 for the x and y axes). This 20-channel input is used for the temporal stream. This produces a class score at each t, and these scores are fused: the final video score is obtained by averaging over all frame scores.

Have a look at this paper: https://arxiv.org/pdf/1406.2199v2.pdf
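For concreteness, here is a minimal sketch of that stacking step (this is not the repo's code; it assumes dense Farneback flow from OpenCV, and the function and variable names are just illustrative):

```python
import numpy as np
import cv2

def stack_flow(frames, t, L=10):
    """Stack optical flow for frames t..t+L into one (H, W, 2*L) array."""
    channels = []
    for i in range(L):
        prev = cv2.cvtColor(frames[t + i], cv2.COLOR_BGR2GRAY)
        nxt = cv2.cvtColor(frames[t + i + 1], cv2.COLOR_BGR2GRAY)
        # Dense optical flow: shape (H, W, 2) holding (dx, dy) per pixel
        flow = cv2.calcOpticalFlowFarneback(prev, nxt, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        channels.append(flow[..., 0])  # x displacement
        channels.append(flow[..., 1])  # y displacement
    return np.stack(channels, axis=-1)  # (H, W, 20) temporal-stream input
```

Each such 20-channel block gives one temporal-stream class score at time t; the per-t scores are then averaged into the final video score.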

@wadhwasahil
Owner

Basically, we stack 10 consecutive optical flow images and form a single 10*2-channel input (x, y). If a video has fewer than 10 frames, then we discard that video, IMO. Can you confirm, @stillbreeze?

@stillbreeze
Collaborator

stillbreeze commented Jul 6, 2018

There's no video with <10 frames. Even at 30 fps, 10 frames just means a 0.33 s video!

@nandiya
Author

nandiya commented Jul 18, 2018

I modified the code a little bit. In wadhwasahil's code, the optical flow (x and y) is taken every 5 frames of a video, and I still couldn't figure out how to handle videos of different lengths. So I modified it a bit for my thesis proposal (since my videos vary in length). I first extract all frames from each video, ignoring the length differences (say I have 4 videos of lengths 3 s, 3 s, 4 s, 5 s; they might generate 124, 127, 143, 150 frames). Then, say I want 20 optical flow samples (x and y) per video; it works like this (take_optical_flow below is just a placeholder for the flow extraction step):

n = num_frames % 20                    # remainder, so the rest divides evenly into 20 samples
step = round((num_frames - n) / 20)    # stride between sampled frames
for j in range(1, num_frames - n, step):
    take_optical_flow(j)               # placeholder: extract flow (x and y) at frame j, more or
                                       # less the same as wadhwasahil's code (mine is a bit longer
                                       # since there were some problems with my OpenCV and PIL)

Explanation: 124 % 20 = 4 --> step = 120/20 = 6 --> 120/6 = 20 optical flow samples (x and y).
143 % 20 = 3 --> step = 140/20 = 7 --> 140/7 = 20 optical flow samples (x and y).
If you still don't understand, try working through the calculation yourself.

That way I get the same number of optical flow samples (x & y) for every video, and I don't have to care about the different lengths. Next I just need to feed the optical flow results into the CNN ^^.
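As a quick check, here is a small self-contained sketch (plain Python, illustrative names only) of the index selection described above; running it shows that each of the example lengths yields exactly 20 samples:

```python
def sample_indices(num_frames, num_samples=20):
    """Evenly spaced frame indices: drop the remainder, then stride through the rest."""
    n = num_frames % num_samples
    step = round((num_frames - n) / num_samples)
    return list(range(1, num_frames - n, step))

for num_frames in (124, 127, 143, 150):
    idx = sample_indices(num_frames)
    print(num_frames, len(idx))  # each prints 20, regardless of video length
```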
