Identify "technical" metrics about evaluated NLP Tools that we can capture #4

Open
tschaffter opened this issue Oct 11, 2020 · 2 comments

tschaffter commented Oct 11, 2020

@thomasyu888 @gkowalski In the current design of the infrastructure, the Orchestrator gets a page (as in "paginated response") of clinical notes from a Data Node (e.g. 50 clinical notes), sends them to the NLP Tool being evaluated, receives the results, and repeats with the next page of clinical notes. In addition to letting us control the flow of information to the NLP Tool, which limits its memory needs, this design lets us evaluate and ideally report the following metrics (a minimal sketch of this loop follows the list below):

  • Completion rate: number of notes processed / number of notes in the dataset
  • Time required to process a clinical note (average, std)
    • the timer starts when the request has been sent (clinical notes sent to the NLP Tool)
    • the timer stops when all the responses for those clinical notes have been received from the NLP Tool

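As a rough illustration of how these metrics could be computed inside the pagination loop described above, here is a minimal Python sketch. The client methods, the tool's `annotate` call, and the page size are hypothetical placeholders, not the actual NLP Sandbox API.

```python
import time

PAGE_SIZE = 50  # hypothetical page size (e.g. 50 clinical notes per request)

def evaluate_tool(data_node, nlp_tool, total_notes):
    """Fetch pages of notes, send them to the NLP Tool, and record the metrics above."""
    notes_processed = 0
    page_durations = []  # seconds elapsed per page of notes

    for page in data_node.iter_note_pages(page_size=PAGE_SIZE):  # hypothetical client call
        start = time.monotonic()               # timer starts when the request is sent
        annotations = nlp_tool.annotate(page)  # hypothetical NLP Tool call
        page_durations.append(time.monotonic() - start)  # timer stops when all responses are back
        notes_processed += len(page)

        # Completion rate: number of notes processed / number of notes in the dataset
        print(f"completion rate: {notes_processed / total_notes:.1%}")

    # Average time per note; the std could be derived from page_durations as well
    avg_seconds_per_note = sum(page_durations) / max(notes_processed, 1)
    return notes_processed / total_notes, avg_seconds_per_note
```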
The motivation for reporting the completion rate to the user is that it allows them to better predict when the results will be available. It can also be used to track whether the tool is taking too much time to complete. For the staff maintaining a Data Hosting Site, it would be nice to have a report in ELK that shows the Tools that are being evaluated and their completion rate.

The motivation for reporting information about the processing time is that a hospital looking for a tool to use in production, for example by visiting a Leaderboard of the NLP Sandbox, may identify that a Tool would take too much time to process their volume of clinical notes. One option could be to extrapolate and show the time required to process 1 million notes. Note that any timing information depends heavily on the specs of the infrastructure used (number and frequency of CPU cores, etc.), so we should be able to provide information about the specs used when reporting timing information. These specs may vary from one Data Hosting Site to another, in which case we would probably want to report the time for each dataset / Data Hosting Site used to evaluate an NLP Tool.
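To illustrate the extrapolation idea with made-up numbers, the projected runtime for 1 million notes would follow directly from the measured average time per note on a given infrastructure spec:

```python
def projected_runtime_hours(avg_seconds_per_note, n_notes=1_000_000):
    """Linear extrapolation of the measured per-note processing time."""
    return avg_seconds_per_note * n_notes / 3600

# e.g. a tool averaging 0.2 s per note on a given spec would need ~56 hours for 1M notes
print(projected_runtime_hours(0.2))  # ~55.6
```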

tschaffter added the "help wanted" label Oct 11, 2020
thomasyu888 commented Oct 11, 2020

One important distinction to make is that the orchestrator currently doesn't do any of those things. The workflow you see in this repository would be in charge of them; the orchestrator is only responsible for connecting participant submissions with this workflow.

One of my biggest concerns is that there isn't an "elegant" way with CWL to

  1. Get 50 notes
  2. Process the 50 notes
  3. Annotate with metrics for those 50
  4. Repeat from step 1 until finished

Currently, the workflow would be:

  1. Get a million notes but split them into chunks of 50
  2. Process the chunks of 50 in parallel and annotate metrics

I think the above metrics are obtainable; it will just take an example submission to figure out what is and isn't possible.
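For illustration only, a minimal Python sketch of this chunked, parallel approach could look like the following (the `annotate` call and chunk size are placeholders; the actual workflow is expressed in CWL):

```python
import time
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 50  # chunk size used in the example above

def chunks(notes, size):
    """Split the full list of notes into fixed-size chunks."""
    for i in range(0, len(notes), size):
        yield notes[i:i + size]

def process_chunk(nlp_tool, chunk):
    """Annotate one chunk of notes and record how long it took."""
    start = time.monotonic()
    annotations = [nlp_tool.annotate(note) for note in chunk]  # hypothetical tool call
    return annotations, time.monotonic() - start

def run(nlp_tool, notes):
    """Process the chunks of 50 in parallel and collect per-chunk timing metrics."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(lambda c: process_chunk(nlp_tool, c), chunks(notes, CHUNK_SIZE)))
    durations = [seconds for _, seconds in results]
    return results, durations
```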

tschaffter commented

Step 3 would be out of the loop: we process all the clinical notes first, and then we evaluate the performance. The loop over steps 1-2 could be implemented in the NLP Sandbox Client as a single command.
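A hypothetical sketch of that separation (not the actual NLP Sandbox Client interface) would keep the annotation loop and the evaluation as two distinct phases:

```python
def annotate_all(data_node, nlp_tool):
    """Steps 1-2: fetch each page of notes and annotate it (one client command)."""
    predictions = []
    for page in data_node.iter_note_pages(page_size=50):  # hypothetical client call
        predictions.extend(nlp_tool.annotate(page))        # hypothetical tool call
    return predictions

def evaluate(predictions, gold_standard):
    """Step 3, run once after the loop: placeholder scoring against the gold standard."""
    return sum(p == g for p, g in zip(predictions, gold_standard)) / max(len(gold_standard), 1)

def main(data_node, nlp_tool, gold_standard):
    predictions = annotate_all(data_node, nlp_tool)  # loop over steps 1-2
    return evaluate(predictions, gold_standard)      # step 3, outside the loop
```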

tschaffter removed the "help wanted" label Jan 7, 2021