Add dataloader and Semcova data #10

LittlePea13 · 2020-03-31T11:46:00Z

Added pytorch dataset and dataloader.
Added Semcova data.

Sample data

MalcolmMielle · 2020-03-31T12:27:45Z

Excellent! I have a couple of questions before validating the PR though.

Referencing this issue, you're saying that each video processing step takes around 3 min. What does that include exactly? Loading all the frames and calculating the PPG for a single video?

MalcolmMielle · 2020-03-31T13:19:44Z

Hi again,

I got a little time to look into the dataloader you've implemented. It's really nice. I have never used PyTorch for this, but it seems useful. However, since the function that calculates the PPG is written in NumPy and OpenCV, I could look into that part a little more.

One of the reasons the method is slow to load (I think) is the calculation of the mean and std with an axis argument. When I timed it, it was much slower than taking a view of each channel of the frame and calculating the mean and std on those views.
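
To make the comparison concrete, here is a minimal sketch of the two approaches on a synthetic frame. The function names `stats_with_axis` and `stats_per_channel` and the 480x640 frame size are assumptions for illustration, not code from this PR. It checks that both give the same (3, 2) array of per-channel mean and std, and times each one with `timeit`; the actual timings will depend on the machine and the frame size.

```python
import timeit

import numpy as np


def stats_with_axis(frame):
    """Original approach: flatten to (pixels, 3), then reduce along axis 0."""
    flat = frame.reshape(-1, 3)
    return np.array([flat.mean(axis=0), flat.std(axis=0)]).T


def stats_per_channel(frame):
    """Approach discussed here: reduce each channel view separately."""
    return np.array(
        [[frame[:, :, c].mean(), frame[:, :, c].std()] for c in range(3)]
    )


# Synthetic BGR frame (assumed size), standing in for a frame from cv2.VideoCapture.
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(480, 640, 3), dtype=np.uint8)

a = stats_with_axis(frame)
b = stats_per_channel(frame)
same = np.allclose(a, b)  # rows are channels, columns are [mean, std]

# Total seconds for 5 calls of each variant; values vary between machines.
t_axis = timeit.timeit(lambda: stats_with_axis(frame), number=5)
t_channel = timeit.timeit(lambda: stats_per_channel(frame), number=5)
print("axis: {:.4f} s, per channel: {:.4f} s".format(t_axis, t_channel))
```

Because both functions return the same layout, one can be swapped for the other without touching the code that consumes the result.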

Here is a version of the class where every function is timed. I've also implemented a transform_faster function, which runs around 140 ms faster per frame on my machine. Here are the times of the old function:

reshape function took 0.011 ms
mean_t function took 188.751 ms
transform function took 188.848 ms

And the times of the new one:

get_channels function took 0.016 ms
mean_fast function took 13.938 ms
std_fast function took 33.375 ms
transform_faster function took 47.533 ms

I've only tried it on one video file so far. Let me know what you think of it and, if you feel like it, add it to the pull request :)

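import functools
import json
import os
import time

import cv2
import numpy as np
import torch
from torch.utils.data import Dataset


def timing(f):
    """Print how long each call to the decorated function takes, in milliseconds."""
    @functools.wraps(f)
    def wrap(*args, **kwargs):
        time1 = time.time()
        ret = f(*args, **kwargs)
        time2 = time.time()
        print('{:s} function took {:.3f} ms'.format(f.__name__, (time2-time1)*1000.0))

        return ret
    return wrap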


class Spo2Dataset(Dataset):
    """Spo2Dataset dataset.
        It preprocesses the data in order to create a Dataset with the average and std of each channel per frame.
        The process is slow, so it may take a while to create the Dataset when first initiated.
    """
    @timing
    def reshape(self, frame):
        return frame.reshape(-1,3)
    
    @timing
    def mean_t(self, frame):
        return np.array([frame.mean(axis=0), frame.std(axis=0)]).T
    
    @timing
    def transform(self,frame):
        frame = self.reshape(frame)
        ret = self.mean_t(frame)
        return ret
    
    @timing
    def get_channels(self, frame, blue = 0, green = 1, red = 2):
        blue_channel = frame[:,:,blue]
        green_channel = frame[:,:,green]
        red_channel = frame[:,:,red]
        
        return blue_channel, green_channel, red_channel
    
    @timing
    def mean_fast(self, blue_channel, green_channel, red_channel):
        blue_channel_mean = blue_channel.mean()
        green_channel_mean = green_channel.mean()
        red_channel_mean = red_channel.mean()
        
        return blue_channel_mean, green_channel_mean, red_channel_mean
    
    @timing
    def std_fast(self, blue_channel, green_channel, red_channel):
        # Standard deviation of each channel view (names reflect std, not mean)
        blue_channel_std = blue_channel.std()
        green_channel_std = green_channel.std()
        red_channel_std = red_channel.std()
        
        return blue_channel_std, green_channel_std, red_channel_std
    
    @timing
    def transform_faster(self, frame):
        blue_channel, green_channel, red_channel = self.get_channels(frame)
        
        blue_channel_mean, green_channel_mean, red_channel_mean = self.mean_fast(blue_channel, green_channel, red_channel)
        blue_channel_std, green_channel_std, red_channel_std = self.std_fast(blue_channel, green_channel, red_channel)
        
        return np.array([[blue_channel_mean, blue_channel_std],
                         [green_channel_mean, green_channel_std],
                         [red_channel_mean, red_channel_std]])
    
    def __init__(self, data_path):
        """
        Args:
            data_path (string): Path to the data folder.
        """
        self.data_path = data_path
        self.video_folders = [folder for folder in os.listdir(data_path) if os.path.isdir(os.path.join(data_path,folder))]
        self.videos_ppg = []
        self.labels_list = []
        self.meta_list = []
        
        for video in self.video_folders:
            print("video")
            ppg = []
            video_path = os.path.join(self.data_path, video)
            video_file = os.path.join(video_path, [file_name for file_name in os.listdir(video_path) if file_name.endswith('mp4')][0])
            vidcap = cv2.VideoCapture(video_file)
            meta = {}
            meta['video_fps'] = vidcap.get(cv2.CAP_PROP_FPS)
            (grabbed, frame) = vidcap.read()
            frame_count = 0
            while grabbed:
                # Use the faster variant; swap in self.transform(frame) to time the original
                frame = self.transform_faster(frame)
                ppg.append(frame)
                (grabbed, frame) = vidcap.read()
                if(frame_count % 50 == 0):
                    print("Frame:", frame_count)
                frame_count += 1
            with open(os.path.join(video_path, 'gt.json'), 'r') as f:
                ground_truth = json.load(f)

            labels = torch.Tensor([int(ground_truth['SpO2']), int(ground_truth['HR'])])
            self.videos_ppg.append(torch.Tensor(np.array(ppg)))
            self.meta_list.append(meta)
            self.labels_list.append(labels)
            print("done")
    def __len__(self):
        return len(self.video_folders)

    def __getitem__(self, idx):
        if torch.is_tensor(idx):
            idx = idx.tolist()
        return [self.videos_ppg[idx],self.meta_list[idx],self.labels_list[idx]]
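
One thing to keep in mind when feeding this Dataset to a PyTorch DataLoader: each item holds a tensor of shape (frames, 3, 2), and videos usually differ in frame count, so the default collation cannot stack them for batch sizes above 1. Below is a minimal sketch of one way to handle that by padding. `ToyPpgDataset` and `pad_collate` are hypothetical names, and the random data, fps, and label values are placeholders; none of this is part of the PR.

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader, Dataset


class ToyPpgDataset(Dataset):
    """Hypothetical stand-in that mimics the [ppg, meta, labels] items of Spo2Dataset."""

    def __init__(self, frame_counts):
        # One random (n_frames, 3, 2) tensor per "video": [mean, std] per channel per frame.
        self.videos_ppg = [torch.rand(n, 3, 2) for n in frame_counts]
        self.meta_list = [{"video_fps": 30.0} for _ in frame_counts]  # placeholder fps
        self.labels_list = [torch.tensor([97.0, 60.0]) for _ in frame_counts]  # placeholder SpO2, HR

    def __len__(self):
        return len(self.videos_ppg)

    def __getitem__(self, idx):
        return [self.videos_ppg[idx], self.meta_list[idx], self.labels_list[idx]]


def pad_collate(batch):
    """Zero-pad every video in the batch to the longest one and keep the true lengths."""
    ppgs, metas, labels = zip(*batch)
    lengths = torch.tensor([ppg.shape[0] for ppg in ppgs])
    padded = pad_sequence(list(ppgs), batch_first=True)  # (batch, max_frames, 3, 2)
    return padded, lengths, list(metas), torch.stack(labels)


loader = DataLoader(ToyPpgDataset([120, 90, 150]), batch_size=3, collate_fn=pad_collate)
padded, lengths, metas, labels = next(iter(loader))
print(padded.shape, lengths.tolist(), labels.shape)
```

Returning the original lengths alongside the padded tensor lets a downstream model mask out the padding; whether padding or fixed-length windows is the better choice here is an open design question.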

LittlePea13 · 2020-03-31T13:44:46Z

That sounds amazing, thank you Malcolm. I tried it and it does speed up the process considerably. Also, by 3 minutes I meant the whole while loop for each video; it now takes 50 seconds on my machine (I am using the sample_data videos).

I probably won't be working on this for the rest of the day, so if you can update the PR with the changes, that would be great. We can also get rid of the threading version, as it won't be of much help.
Thanks again!

Do not calculate the mean and std using axis so that the function goes much faster

MalcolmMielle

ok should be all good

gianlucatruda · 2020-04-01T11:04:31Z

Nice work, guys! 👏

LittlePea13 added 10 commits March 29, 2020 18:42

dataloader with two sample folders

fd56b54

change names and add to healthwatcher main

5caae47

add dataloader

6e77407

Merge pull request #6 from gianlucatruda/sample-data

549a9eb

Sample data

revert healthwatcher and push changes in Dataset

0fb27a7

add semcova data

6f2fb65

Fix Readme

d2e42f1

move things and add Readme

61c1774

Thread version

0a3005d

revert main

f203ce6

LittlePea13 added the data Any task related to data obtainment, preprocessing, or loading label Mar 31, 2020

LittlePea13 added this to the Phase 1: Reimplement existing methods, ready for testing on collected data milestone Mar 31, 2020

LittlePea13 self-assigned this Mar 31, 2020

MalcolmMielle added 2 commits March 31, 2020 18:54

malcolm: updated loader with faster transform

10dfda7

Do not calculate the mean and std using axis so that the function goes much faster

malcolm: printing the number of videos during load

b34203f

MalcolmMielle approved these changes Mar 31, 2020

MalcolmMielle merged commit 466ed30 into master Mar 31, 2020
