Commit 9d93836

update readme
Committed Dec 17, 2020
1 parent: 29495ac

4 files changed: +19 -7 lines changed


README.md (+14 -5)
@@ -70,11 +70,20 @@ Train DSMIL on TCGA Lung Cancer dataset (precomputed features):
 ```
 
 ## Training on your own datasets
-You could modify train_tcga.py to easily let it work with your datasets. You will need to:
-1. For each bag, generate a .csv file where each row contains the feature of an instance. The .csv file should be named as "_bagID_.csv" and put into a folder named "_dataset-name_".
-2. Generate a "_dataset-name_.csv" file with two columns where the first column contains _bagID_, and the second column contains the class label.
-3. Replace the corresponding file path in the script with the file path of "_dataset_.csv" file, and change the data directory path in the dataloader to the path of the folder "_dataset-name_"
-4. Configure the number of class for creating the DSMIL model.
+You could modify train_tcga.py to easily let it work with your datasets. After you have trained your embedder, you will need to compute the features and organize them as:
+1. For each bag, generate a .csv file where each row contains the feature of an instance. The .csv file should be named as "_bagID_.csv" and put into a folder named "_dataset-name_".
+<div align="center">
+<img src="thumbnails/bag.png" width="400px" />
+</div>
+2. Generate a "_dataset-name_.csv" file with two columns where the first column contains the paths to all _bagID_.csv files, and the second column contains the bag labels.
+<div align="center">
+<img src="thumbnails/bags.png" width="400px" />
+</div>
+3. Replace the corresponding file path in the script with the file path of "_dataset_.csv".
+```
+bags_path = pd.read_csv(PATH_TO_[_dataset-name_.csv])
+```
+4. Configure the corresponding number of classes argument for creating the DSMIL model.
 
 ## Citation
 If you use the code or results in your research, please use the following BibTeX entry.
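
Note on the README hunk above: the layout it describes (one CSV per bag, plus a dataset-level CSV of bag paths and labels) could be produced roughly as follows. This is only a sketch using the standard library, not code from the repository. The helper names `write_bag` and `write_dataset_csv`, the dataset name `my-dataset`, the bag IDs, the feature values, and the column names are all hypothetical. Header rows are written on the assumption that the files are later read with `pd.read_csv` defaults, which treat the first line as column names.

```python
import csv
import os
import tempfile


def write_bag(dataset_dir, bag_id, instance_feats):
    """Write one bag as <dataset_dir>/<bag_id>.csv, one instance feature vector per row."""
    os.makedirs(dataset_dir, exist_ok=True)
    bag_path = os.path.join(dataset_dir, bag_id + ".csv")
    with open(bag_path, "w", newline="") as f:
        writer = csv.writer(f)
        # Header row (hypothetical names): pd.read_csv consumes the first line as column names.
        writer.writerow([f"feat_{i}" for i in range(len(instance_feats[0]))])
        writer.writerows(instance_feats)
    return bag_path


def write_dataset_csv(csv_path, bag_paths, labels):
    """Write the dataset-level CSV: first column = path to a bag CSV, second = bag label."""
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["path", "label"])  # hypothetical column names
        writer.writerows(zip(bag_paths, labels))


# Toy data: two bags with 3-dimensional instance features (made-up values).
root = tempfile.mkdtemp()
dataset_dir = os.path.join(root, "my-dataset")  # hypothetical dataset name
bags = {
    "bag_a": ([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], 0),
    "bag_b": ([[0.7, 0.8, 0.9]], 1),
}

bag_paths, labels = [], []
for bag_id, (feats, label) in bags.items():
    bag_paths.append(write_bag(dataset_dir, bag_id, feats))
    labels.append(label)

dataset_csv = os.path.join(root, "my-dataset.csv")
write_dataset_csv(dataset_csv, bag_paths, labels)
```

Storing full paths in the first column matches the train_tcga.py change in this commit, where `get_bag_feats` uses `csv_file_df.iloc[0]` directly as the bag file path in the `else` branch.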

thumbnails/bag.png (229 KB)

thumbnails/bags.png (130 KB)

train_tcga.py (+5 -2)
@@ -19,7 +19,7 @@ def get_bag_feats(csv_file_df, args):
     if args.simclr == 0:
         feats_csv_path = 'datasets/tcga-dataset/tcga_lung_data_feats/' + csv_file_df.iloc[0].split(os.sep)[1] + '.csv'
     else:
-        feats_csv_path = 'datasets/wsi-tcga-lung/' + os.path.join(csv_file_df.iloc[0].split(os.sep)[-2], csv_file_df.iloc[0].split(os.sep)[-1])
+        feats_csv_path = csv_file_df.iloc[0]
     df = pd.read_csv(feats_csv_path)
     feats = shuffle(df).reset_index(drop=True)
     feats = feats.to_numpy()
@@ -127,7 +127,7 @@ def main():
     scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, args.num_epoch, 0)
 
     if args.simclr == 0:
-        bags_path = pd.read_csv('datasets'+os.sep+'tcga-dataset'+os.sep+'TCGA.csv')
+        bags_csv = 'datasets/tcga-dataset/TCGA.csv'
     else:
         luad_list = glob.glob('datasets'+os.sep+'wsi-tcga-lung'+os.sep+'LUAD'+os.sep+'*.csv')
         lusc_list = glob.glob('datasets'+os.sep+'wsi-tcga-lung'+os.sep+'LUSC'+os.sep+'*.csv')
@@ -140,6 +140,9 @@ def main():
         bags_path = luad_df.append(lusc_df, ignore_index=True)
         bags_path = shuffle(bags_path)
         bags_path.to_csv('datasets/wsi-tcga-lung/TCGA.csv', index=False)
+        bags_csv = 'datasets/wsi-tcga-lung/TCGA.csv'
+
+    bags_path = pd.read_csv(bags_csv)
     train_path = bags_path.iloc[0:int(len(bags_path)*0.8), :]
     test_path = bags_path.iloc[int(len(bags_path)*0.8):, :]
 
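
Note on the last hunk: after this change both branches read the bag list from `bags_csv` and then split it by position, with the first `int(len * 0.8)` rows used for training and the rest for testing. The sketch below reproduces only that arithmetic with plain Python lists instead of a pandas DataFrame. The function name `split_bags` and the example rows are hypothetical.

```python
def split_bags(rows, train_fraction=0.8):
    """Positional split: the first int(len * fraction) rows train, the remainder test."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# Hypothetical (path, label) rows standing in for the dataset CSV contents.
rows = [(f"my-dataset/bag_{i}.csv", i % 2) for i in range(7)]

train_rows, test_rows = split_bags(rows)
print(len(train_rows), len(test_rows))  # 5 2, because int(7 * 0.8) == 5
```

Because the split is positional, the row order of the CSV decides which bags land in the test set. In the diff, only the `else` branch shuffles the rows before writing the CSV, so a custom dataset CSV may need to be shuffled when it is generated.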
