[REF] AIBL-to-BIDS : code refactoring for sessions file creation #1359
base: dev
Thanks @AliceJoubert !
I made a first pass and made some suggestions along the way. I also have a few questions.
Looking at the diff, it's pretty difficult to know if the old and new versions of the code are equivalent.
bids_id = bids_id_factory(StudyName.AIBL).from_original_study_id(str(rid))
# in general age metadata is present only for baseline session
sessions["age"] = sessions["age"].ffill()
I don't understand this. If you have the age at the baseline session, it won't necessarily be the age at later sessions, right?
You're right! I had not fully changed what was done before, i.e. storing the birth date in the age column and computing the age from it. It should be more explicit now.
Yep! Much better now! 👍
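For readers following this thread, here is a minimal sketch of the point being discussed: forward-filling a birth date (which is constant per subject) and deriving the age at each session from it, rather than forward-filling the baseline age itself. The column names `date_of_birth` and `examination_date` and the helper `compute_age_at_sessions` are assumptions made for illustration, not the names used in the PR.

```python
import pandas as pd


def compute_age_at_sessions(sessions: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `sessions` with an `age` column computed per session.

    Hypothetical column names: `date_of_birth` (often present only on the
    baseline row) and `examination_date` (one value per session).
    """
    out = sessions.copy()
    # The birth date does not change between sessions, so forward-filling it is safe.
    birth = pd.to_datetime(out["date_of_birth"]).ffill()
    exam = pd.to_datetime(out["examination_date"])
    # Age in years at each examination date, rounded to one decimal.
    out["age"] = ((exam - birth).dt.days / 365.25).round(1)
    return out


sessions = pd.DataFrame(
    {
        "session_id": ["ses-M000", "ses-M018", "ses-M036"],
        "date_of_birth": ["1940-06-15", None, None],
        "examination_date": ["2010-06-15", "2011-12-15", "2013-06-15"],
    }
)
ages = compute_age_at_sessions(sessions)["age"].tolist()
```

With this approach the age differs from one session to the next, which is what forward-filling the baseline age could not provide.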
visit_code = df.loc[(df["RID"] == rid), "VISCODE"]

for field in sessions_fields_to_read:
    if field in list(df.columns.values) and field == "MMSCORE":
I don't see these hardcoded values in your code.
I'm probably missing something, but how do you handle them in your version?
To quickly explain what my code does:
- I read the specifications file, which gives, for each metadata field of interest, its name in the study, its name for BIDS, and its location in the CSV files.
  --> I don't really need to hardcode anything for metadata extraction since I loop over what is in the spec file. The extraction process is the same for every field; you can find it in the _format_metadata_for_rid function.
- If needed, I perform some mapping similar to what is done here, but I use the BIDS name instead. For example, my "diagnosis" mapping corresponds to what was done for the "DXCURREN" field.
OK, thanks for the explanations @AliceJoubert !
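As an illustration of the spec-driven approach described in this thread, the sketch below loops over spec entries instead of hardcoding field names, then applies an optional mapping keyed on the BIDS name. Everything here is hypothetical: the spec rows, the file names, the BIDS name `MMS`, the code-to-label table, and the helpers `map_diagnosis` and `extract_session_metadata` are assumptions and do not reproduce `_format_metadata_for_rid` from the PR.

```python
from typing import Any, Callable, Dict, List

# Hypothetical spec entries: name in the study, BIDS name, and source CSV file.
SPEC: List[Dict[str, str]] = [
    {"study_name": "MMSCORE", "bids_name": "MMS", "location": "aibl_mmse.csv"},
    {"study_name": "DXCURREN", "bids_name": "diagnosis", "location": "aibl_pdxconv.csv"},
]

# Hypothetical code-to-label table; the real values live in the PR, not here.
DIAGNOSIS_LABELS = {1: "CN", 2: "MCI", 3: "AD"}


def map_diagnosis(code: Any) -> str:
    """Map a raw diagnosis code to a label, falling back to "n/a"."""
    return DIAGNOSIS_LABELS.get(code, "n/a")


# Mappings are keyed on the BIDS name, not on the study field name.
MAPPERS: Dict[str, Callable[[Any], Any]] = {"diagnosis": map_diagnosis}


def extract_session_metadata(
    spec: List[Dict[str, str]],
    tables: Dict[str, List[Dict[str, Any]]],
    rid: int,
    visit_code: str,
) -> Dict[str, Any]:
    """Build one session record by looping over the spec entries."""
    record: Dict[str, Any] = {}
    for entry in spec:
        rows = tables.get(entry["location"], [])
        # First value found for this subject and visit, or None if absent.
        value = next(
            (
                row.get(entry["study_name"])
                for row in rows
                if row["RID"] == rid and row["VISCODE"] == visit_code
            ),
            None,
        )
        mapper = MAPPERS.get(entry["bids_name"])
        record[entry["bids_name"]] = mapper(value) if mapper else value
    return record


# In-memory stand-ins for the CSV contents (made-up rows).
tables = {
    "aibl_mmse.csv": [{"RID": 1, "VISCODE": "bl", "MMSCORE": 29}],
    "aibl_pdxconv.csv": [{"RID": 1, "VISCODE": "bl", "DXCURREN": 2}],
}
record = extract_session_metadata(SPEC, tables, rid=1, visit_code="bl")
```

Adding a new metadata field then only requires a new spec row, plus an entry in `MAPPERS` if its raw values need translating.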
Just a few minor things. LGTM otherwise
@@ -1,6 +1,8 @@
from multiprocessing.managers import Value
Suggested change:
- from multiprocessing.managers import Value
@@ -168,263 +225,118 @@ def create_sessions_tsv_file(
    """
    import glob
Suggested change:
-    import glob
     if not exam_date.empty and exam_date.iloc[0].EXAMDATE != "-4":
         return exam_date.iloc[0].EXAMDATE
     return None


-def _get_csv_files(clinical_data_dir: Path) -> List[str]:
+def _get_csv_files_for_alternative_exam_date(clinical_data_dir: Path) -> Tuple[Path]:
We can improve this a bit. Currently, this function builds a tuple of Paths from a tuple of patterns so that the caller can iterate over the paths. We could directly return a generator object instead, which is a bit cleaner and more memory-efficient (not a major concern here, I admit):
from typing import Generator

def _get_csv_files_for_alternative_exam_date(clinical_data_dir: Path) -> Generator[Path, None, None]:
    """Return the paths to CSV files in which an alternative exam date could be found."""
    for pattern in (
        "aibl_mri3meta_*.csv",
        "aibl_mrimeta_*.csv",
        "aibl_cdr_*.csv",
        "aibl_flutemeta_*.csv",
        "aibl_mmse_*.csv",
        "aibl_pibmeta_*.csv",
    ):
        try:
            # Yield the first file matching the pattern; skip patterns with no match.
            yield next(clinical_data_dir.glob(pattern))
        except StopIteration:
            continue
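To make the behavior of this suggestion concrete, here is a small self-contained sketch of how a caller might consume such a generator, including the case where some patterns match nothing. It is a variant rather than the suggested code itself: the name `iter_alternative_exam_date_csvs` is hypothetical, `next(..., None)` replaces the `try`/`except StopIteration`, and the file names are made up for the demonstration.

```python
import tempfile
from pathlib import Path
from typing import Iterator

PATTERNS = (
    "aibl_mri3meta_*.csv",
    "aibl_mrimeta_*.csv",
    "aibl_cdr_*.csv",
    "aibl_flutemeta_*.csv",
    "aibl_mmse_*.csv",
    "aibl_pibmeta_*.csv",
)


def iter_alternative_exam_date_csvs(clinical_data_dir: Path) -> Iterator[Path]:
    """Lazily yield the first CSV file matching each pattern, skipping misses."""
    for pattern in PATTERNS:
        # A default of None avoids catching StopIteration explicitly.
        match = next(clinical_data_dir.glob(pattern), None)
        if match is not None:
            yield match


with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)
    # Only two of the six patterns have a matching file in this fake directory.
    for name in ("aibl_mmse_example.csv", "aibl_cdr_example.csv"):
        (data_dir / name).touch()
    # Results follow the order of PATTERNS, not the order of file creation.
    found = [path.name for path in iter_alternative_exam_date_csvs(data_dir)]
```

Since the caller only iterates over the result once, nothing is lost by yielding paths lazily instead of materializing a tuple.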
        (None, "n/a"),
    ],
)
def test_mapping_diagnosis(diagnosis, expected):
Suggested change:
- def test_mapping_diagnosis(diagnosis, expected):
+ def test_map_diagnosis(diagnosis, expected):
Close #1350