[framework update] multimodal draft #304

zzachw · 2024-10-28T15:15:12Z

No description provided.

jhnwu3 · 2024-10-28T15:56:47Z

pyhealth/datasets/featurizers/signal.py

Can we make a bandpass filter optional here? According to jathurshan:

IIIC data may not explicitly use a bandpass filter

ECoG data may use the raw signal or band-stop filtering
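To make the request concrete, here is a minimal sketch of what an optional filter step could look like. The function name `apply_optional_filter` and its parameters are hypothetical and are not taken from `signal.py`. The FFT mask is only a simple stand-in for whatever filter design the featurizer actually uses (for example a Butterworth filter).

```python
from typing import Optional, Tuple

import numpy as np


def apply_optional_filter(
    signal: np.ndarray,
    fs: float,
    band: Optional[Tuple[float, float]] = None,
    mode: str = "bandpass",
) -> np.ndarray:
    """Hypothetical helper: filter along the last axis only when `band` is given.

    band=None returns the raw signal, mode="bandpass" keeps [low, high] Hz,
    and mode="bandstop" removes [low, high] Hz.
    """
    signal = np.asarray(signal, dtype=float)
    if band is None:
        # No filtering requested, e.g. datasets that work on the raw signal.
        return signal
    if mode not in ("bandpass", "bandstop"):
        raise ValueError(f"unknown mode: {mode!r}")
    low, high = band
    if not 0 <= low < high:
        raise ValueError("band must satisfy 0 <= low < high")

    n = signal.shape[-1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    in_band = (freqs >= low) & (freqs <= high)
    keep = in_band if mode == "bandpass" else ~in_band

    # Brick-wall mask in the frequency domain (assumption: good enough for a sketch).
    spectrum = np.fft.rfft(signal, axis=-1)
    return np.fft.irfft(spectrum * keep, n=n, axis=-1)
```

With a default of `band=None`, existing callers that want a bandpass would pass the band explicitly, while IIIC or ECoG pipelines could skip it or switch to `mode="bandstop"`.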

jhnwu3

I just want to make sure the TextFeaturizer and the BioSignal Featurizer are not too heavy or restrictive.

I also added some suggestions for table processing for CXR and notes using MIMIC4. However, I think we need to figure out how we want to initialize the dataset and how to handle its paths, because CXR and Notes will live under different paths. We could ask the user to put everything in one directory, or we could add optional file path arguments and only call parse_{table} when the corresponding path is not None.
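A rough sketch of the second option, assuming hypothetical argument names `note_root` and `cxr_root` (neither exists in the current dataset class) and a `tables_dir` mapping like the one used in the parsing suggestions:

```python
from typing import Dict, List, Optional


def build_tables_dir(
    root: str,
    tables: List[str],
    note_root: Optional[str] = None,
    cxr_root: Optional[str] = None,
) -> Dict[str, str]:
    """Hypothetical helper: map each requested table to the directory holding it.

    Tables backed by an optional path are skipped when that path is None,
    so the matching parse_{table} method is never called for them.
    """
    optional_roots = {"discharge": note_root, "cxr": cxr_root}
    tables_dir: Dict[str, str] = {}
    for table in tables:
        if table in optional_roots:
            if optional_roots[table] is None:
                continue  # path not provided, skip this modality
            tables_dir[table] = optional_roots[table]
        else:
            tables_dir[table] = root  # regular EHR tables share the main root
    return tables_dir
```

The dataset could then loop over `tables_dir` and dispatch with something like `getattr(self, f"parse_{table}")(patients)`, so users who only have the EHR tables do not need to supply the extra paths.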

jhnwu3 · 2024-10-28T16:02:26Z

pyhealth/datasets/featurizers/text.py

I think this is mostly fine. It might be less heavy to give the user an option to pass in their own AutoTokenizer and AutoModel created outside the featurizer, because people sometimes want to plug in an existing fine-tuned model.
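Something along these lines is what I have in mind. This is only a sketch: the class name `TextFeaturizerSketch`, its arguments, and the default checkpoint name are assumptions, not the current `text.py` API. The `transformers` import is deferred so that nothing is downloaded when the user supplies both objects.

```python
from typing import Any, Callable, Optional


class TextFeaturizerSketch:
    """Hypothetical featurizer that accepts a user-provided tokenizer and model."""

    def __init__(
        self,
        model_name: str = "bert-base-uncased",  # placeholder default, an assumption
        tokenizer: Optional[Callable[..., Any]] = None,
        model: Optional[Callable[..., Any]] = None,
    ) -> None:
        if tokenizer is None or model is None:
            # Only load from the hub for whatever the user did not pass in.
            from transformers import AutoModel, AutoTokenizer

            if tokenizer is None:
                tokenizer = AutoTokenizer.from_pretrained(model_name)
            if model is None:
                model = AutoModel.from_pretrained(model_name)
        self.tokenizer = tokenizer
        self.model = model

    def encode(self, text: str) -> Any:
        # Return the raw model output; pooling is left to the caller.
        inputs = self.tokenizer(text, return_tensors="pt", truncation=True)
        return self.model(**inputs)
```

A user with a fine-tuned checkpoint would build the tokenizer and model themselves and pass them via `tokenizer=` and `model=`, while everyone else keeps the one-argument default.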

jhnwu3 · 2024-10-28T16:17:28Z

pyhealth/datasets/medical_transriptions.py

Looks good to me

jhnwu3 · 2024-10-28T16:36:12Z

pyhealth/datasets/mimic4.py

Hey Zhenbang, did we forget about MIMIC Note and CXR here?

def parse_discharge(self, patients: Dict[str, Patient]) -> Dict[str, Patient]:
    table = "discharge"
    # Hardcoded file name; the user may need to download the note files
    # into the same directory.
    df = pd.read_csv(
        os.path.join(self.tables_dir[table], f"{table}.csv"),
        dtype={"subject_id": str, "hadm_id": str},
    )
    df = df.dropna(subset=["subject_id", "hadm_id", "text", "charttime"])
    df = df.sort_values(["subject_id", "hadm_id"], ascending=True)
    group_df = df.groupby("subject_id")

    def discharge_unit(p_id, p_info):
        events = []
        for v_id, v_info in p_info.groupby("hadm_id"):
            for text in v_info["text"]:
                attr_dict = {
                    "text": text,
                    "vocabulary": "text",
                    "visit_id": v_id,
                    "patient_id": p_id,
                }
                event = Event(
                    attr_dict=attr_dict,
                    timestamp=strptime(v_info["charttime"].values[0]),
                )
                events.append(event)
        return events

    group_df = group_df.parallel_apply(
        lambda x: discharge_unit(x.subject_id.unique()[0], x)
    )
    patients = self._add_events_to_patient_dict(patients, group_df)
    return patients


def parse_cxr(self, patients: Dict[str, Patient]) -> Dict[str, Patient]:
    table = "cxr"
    # Hardcoded file name; we might need an explicit CXR path in __init__.
    cxr_file = "mimic-cxr-2.0.0-metadata"
    df = pd.read_csv(
        os.path.join(self.tables_dir[table], f"{cxr_file}.csv"),
        dtype={"subject_id": str, "hadm_id": str},
    )
    df = df.dropna(subset=["subject_id", "study_id", "dicom_id"])
    # Combine date and time to create a timestamp.
    df.StudyDate = df.StudyDate.astype(str)
    df.StudyTime = df.StudyTime.astype(str)
    df["StudyDateTime"] = df.apply(
        lambda row: self.transform_study_datetime(row["StudyDate"], row["StudyTime"]),
        axis=1,
    )
    df = df.sort_values(["subject_id", "study_id"], ascending=True)
    group_df = df.groupby("subject_id")

    def cxr_unit(p_id, p_info):
        events = []
        for v_id, v_info in p_info.groupby("study_id"):
            for dicom_id, timestamp in zip(v_info["dicom_id"], v_info["StudyDateTime"]):
                attr_dict = {
                    "dicom_id": dicom_id,  # used to build the DICOM file path
                    "vocabulary": "cxr",
                    "visit_id": v_id,
                    "patient_id": p_id,
                }
                # Pass the attributes the same way parse_discharge does.
                event = Event(attr_dict=attr_dict, timestamp=strptime(timestamp))
                events.append(event)
        return events

    group_df = group_df.parallel_apply(
        lambda x: cxr_unit(x.subject_id.unique()[0], x)
    )
    patients = self._add_events_to_patient_dict(patients, group_df)
    return patients
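The `parse_cxr` suggestion relies on a `transform_study_datetime` helper that is not shown in this thread. A possible sketch follows, written as a plain function for brevity. It assumes `StudyDate` looks like `YYYYMMDD` and `StudyTime` looks like `HHMMSS.fff` with leading zeros possibly dropped, and that a `"%Y-%m-%d %H:%M:%S"` string is an acceptable input for `strptime`; all three are assumptions to verify against the metadata file and the utility.

```python
from datetime import datetime


def transform_study_datetime(study_date: str, study_time: str) -> str:
    """Hypothetical helper: merge StudyDate and StudyTime into one timestamp string."""
    # Drop fractional seconds and restore leading zeros, e.g. "80556.875" -> "080556".
    whole_seconds = study_time.split(".")[0].zfill(6)
    stamp = datetime.strptime(study_date + whole_seconds, "%Y%m%d%H%M%S")
    return stamp.strftime("%Y-%m-%d %H:%M:%S")
```

Fractional seconds are discarded here; if sub-second ordering of images within a study matters, the format string would need to keep them.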

zzachw added 2 commits October 28, 2024 04:27

merged commits for multimodal pr

e52b7eb

fix small bug in Mortality30DaysMIMIC4

7f25966

jhnwu3 reviewed Oct 28, 2024

jhnwu3 requested changes Oct 28, 2024
