Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Medcat refresh #623

Merged
merged 10 commits into from
Dec 14, 2023
Merged

Medcat refresh #623

merged 10 commits into from
Dec 14, 2023

Conversation

eroell
Copy link
Collaborator

@eroell eroell commented Dec 12, 2023

PR Checklist

  • This comment contains a description of changes (with reason)
  • Referenced issue is linked
  • If you've fixed a bug or added code that should be tested, add tests!
  • Documentation in docs is updated

Can I look at examples of how to use this version?
Yes, in the parallel PR for the ehrapy tutorial here(to be inserted).

Description of changes
Resolves #503.

The goal is to simplify the use of medcat. In #503, the following TODOs have been outlined.

  • Look at how MedCAT currently works outside of ehrapy
  • Look at how we implemented it in ehrapy
  • Remove the custom class MedCATsomething object in ehrapy
  • ehrapy should just have as few functions as possible that take an AnnData object and maybe a MedCAT object that then annotate the text and store something in an AnnData slot. Maybe we have our own object but nothing that builds on top of the AnnData object like it does now. This first draft works without a MedCAT object. Something like this might be reintroduced if it enhances user experience and/or maintenance.
  • Then plot the stuff in the AnnData slot with ehrapy functions This first draft works by requiring to use add_medcat_annotation_to_obs before plotting, simplifying the overhead.
  • Implement everything.
  • Add tests
  • Update the tutorials

This draft removes the MedCAT and EhrapyMedcat classes and temporary moving-annotations-to-plotting.
Instead, it introduces scanpy-like methods

  • ep.tl.annotate_text
  • ep.tl.get_medcat_annotation_overview
  • ep.tl.add_medcat_annotation_to_obs

Technical details

This does not move the CAT instance out of sight from the user: e.g. vanilla tutorials for MedCAT as well as sophisticated user adjustment can be done all on the CAT instance, nothing special here.

The desired CAT can then be passed to ep.tl.annotate_text, and ehrapy takes care of shaping the output into the right fields in the right format: such that ep.tl.add_medcat_annotation_to_obs can be used, followed by any plotting function which can use .obs entries.

Questions

  • What do you think?
  • Better suggestions for naming?
  • Should the run_unsupervised_training convenience method be re-enabled?
  • Should temporary move-to-obs be re-enabled?

Todos if path is outlined

  • Add tests
  • Add examples into docstrings

Copy link
Member

@Zethson Zethson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this looks much better. It's going back to a design that we once had but then killed in favor of what we currently have.

  1. I saw your PR on the tutorials repository. I assume that you ran it?
  2. Cannot access medcat model #591 - are we resolving this issue with this? Basically, can we make it easier for people to download the models? Or is that MedCat's problem?
  3. Are there any new MedCat features that we could make use of? It's under active development and I only looked at it one year ago.
  4. Related to 3 - can we get hierarchies somehow?
  5. Is there a reasonable way for us to offer transferring the annotations into some kind of word2vec that people can append to a layer? Overkill?

Thank you so much!

ehrapy/tools/nlp/_medcat.py Outdated Show resolved Hide resolved
ehrapy/tools/nlp/_medcat.py Show resolved Hide resolved
ehrapy/tools/nlp/_medcat.py Outdated Show resolved Hide resolved
ehrapy/tools/nlp/_medcat.py Outdated Show resolved Hide resolved
ehrapy/tools/nlp/_medcat.py Show resolved Hide resolved
ehrapy/tools/nlp/_medcat.py Outdated Show resolved Hide resolved
ehrapy/tools/nlp/_medcat.py Outdated Show resolved Hide resolved
ehrapy/tools/nlp/_medcat.py Outdated Show resolved Hide resolved
ehrapy/tools/nlp/_medcat.py Outdated Show resolved Hide resolved
ehrapy/tools/nlp/_medcat.py Outdated Show resolved Hide resolved
@eroell
Copy link
Collaborator Author

eroell commented Dec 12, 2023

  1. I saw your PR on the tutorials repository. I assume that you ran it?

Yes, I ran it, using this branch of ehrapy: it used a stored medmen_wstatus_2021_oct.zip though, which is why it worked with an outdated link. This link has now been fixed there.

  1. Cannot access medcat model #591 - are we resolving this issue with this? Basically, can we make it easier for people to download the models? Or is that MedCat's problem?

Yes, we are resolving it. This was an outdated link, which has been updated in the tutorial now.

  1. Are there any new MedCat features that we could make use of? It's under active development and I only looked at it one year ago.

Did not check this in more detail yet.

  1. Related to 3 - can we get hierarchies somehow?

Did not check this in more detail yet.

  1. Is there a reasonable way for us to offer transferring the annotations into some kind of word2vec that people can append to a layer? Overkill?

Did not check this in more detail yet. Think this would be a very nice future addition, probably not doable in a clean way for the next release though?

@Zethson
Copy link
Member

Zethson commented Dec 13, 2023

Did not check this in more detail yet. Think this would be a very nice future addition, probably not doable in a clean way for the next release though?

Yeah, something for later

@eroell eroell marked this pull request as ready for review December 14, 2023 14:16
@Zethson Zethson merged commit 983a282 into theislab:main Dec 14, 2023
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Simplify medcat
2 participants