Interest in GSoC 2026 Project 1: Dashboard for tracking MD simulation progress + Queries for the same #5264
Replies: 6 comments 3 replies
I wouldn't say that MDAnalysis is shifting fundamentally. I think that for the foreseeable future the majority of use cases will remain the processing of "on disk" files. Rather, we want to broaden the way it can be used (see the image below).
The issue tracker has a few (but not many) issues marked. Given that we are really trying to figure out how you engage with us, I'd suggest you pick something reasonable and get started. We want to see that you can do distributed development with git/GitHub, code on your own, ask questions when you're stuck, and take advice on board. Don't use AI tools: we want to see you, and MDAnalysis does not accept AI-generated content in its code base.
Totally open for discussion. Convince us that your choice is the right choice. Weigh ease of use and development with maintainability.
Generally, anything that can be calculated from a single time frame and presented either as a time series or as a cumulative quantity. It also depends on what is being simulated. If it's a solvated biomolecule such as a protein, then RMSD, radius of gyration, or secondary structure (possibly cumulative or in a sliding window). If you're looking at a liquid, then the radial distribution function can be of interest (@HeydenLabASU @amruthesht could say more; see also the streaming workshop https://github.com/amruthesht/imd-workshop-2025 ). For more background on MD, look at the LiveCoMS article Best Practices for Foundations in Molecular Simulations https://doi.org/10.33011/livecoms.1.1.5957
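As a rough illustration of a quantity that is computed from a single frame and then shown as a time series or as a cumulative value, the sketch below calculates a mass-weighted radius of gyration per frame and keeps a running mean. It is plain Python on made-up coordinates; the function names `radius_of_gyration` and `track` and the toy frames are assumptions for illustration only, and in a real dashboard the coordinates and masses would come from MDAnalysis.

```python
import math


def radius_of_gyration(positions, masses):
    """Mass-weighted radius of gyration of a single frame."""
    total_mass = sum(masses)
    # Center of mass, one component at a time.
    com = [
        sum(m * p[k] for p, m in zip(positions, masses)) / total_mass
        for k in range(3)
    ]
    # Mass-weighted squared distance from the center of mass.
    weighted_sq = sum(
        m * sum((p[k] - com[k]) ** 2 for k in range(3))
        for p, m in zip(positions, masses)
    )
    return math.sqrt(weighted_sq / total_mass)


def track(frames, masses):
    """Yield (frame_number, rgyr, running_mean) for each incoming frame."""
    running_sum = 0.0
    for n, positions in enumerate(frames, start=1):
        rgyr = radius_of_gyration(positions, masses)
        running_sum += rgyr
        yield n, rgyr, running_sum / n


# Toy data: two equal-mass particles moving apart (hypothetical values).
toy_frames = [
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)],
]
for n, rgyr, mean in track(toy_frames, masses=[1.0, 1.0]):
    print(n, rgyr, mean)
```

The per-frame value feeds a time-series plot, while the running mean is an example of a cumulative quantity that needs no stored history.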
I'd say start simple with a single user.
@orbeckst @HeydenLabASU I will start by:
For the dashboard, I'll first focus on a single-user local prototype covering key quantities (such as RMSD and radial distribution functions), as suggested, with a suitable GUI. I look forward to sharing my progress, engaging with you to resolve my further queries, and getting your advice on drafting the proposal.
@orbeckst @HeydenLabASU @amruthesht Greetings for the day! Based on my research into MDAnalysis, IMDv3, and the technologies that could be used for the project, below is my proposed approach:
Feedback by Matthias: On point 3: IMDclient does have an internal ring buffer for stability, whose size can be set by the user. However, this buffer is not meant to support time-dependent analysis. Storing a time history would need to be implemented on the "consumer" end, e.g., in an MDAnalysis script.
4. Real-time analysis using MDAnalysis and IMD Reader
5. Backend using FastAPI to manage the analyzed data and stream it to the frontend over WebSocket for real-time updates
6. Frontend dashboard built using Dash + Plotly for better visualization, with support for event detection and relevant warnings
7. Advanced: 3D molecular visualization using Blender and Molecular Nodes
I chose FastAPI for the backend due to the following factors:
I chose Dash + Plotly for the frontend for the following reasons:
Feedback by Matthias: I actually do not have experience with web browser frontends, so your input here is more than welcome. One thing to note is that this should run natively and, most importantly, locally in someone's web browser, maybe requiring the user to install a Google Chrome "extension" but not much else. That is something to look out for. In particular, we are not planning to create a web server on our end to run the dashboard, since that would require sending the simulation data to us, possibly around the world, and then back to the user. As I said, I do not know whether Dash is a good option in this context or not. I just wanted to point out this limitation (or extra challenge).
Possible challenges (at this point in time):
Feedback by Matthias: This problem is partially handled on the IMDclient end, which sends a message to the MD engine to pause the simulation if its buffer is reaching capacity. It then sends a second message to resume the simulation once the buffer has been emptied (which happens through processing the data on the client side). However, you will of course want to avoid slowing down the simulation with a slow analysis process on the dashboard. Multiprocessing and adaptive frame skipping, if appropriate for a given analysis, seem like a good strategy.
2. Memory management for time-dependent analysis: Some parameters like RMSD require historical data. Allocating large amounts of memory will eventually cause the dashboard to crash.
Possible solution: circular buffering.
Feedback by Matthias: Circular buffering is indeed a good idea for time-dependent analyses such as time-correlation functions. However, for RMSDs this is not needed because only a single time point is used as a reference.
3. Network instability: Since this is a real-time analysis dashboard, network instability needs to be addressed.
Possible solution: developing an automated reconnection routine that attempts to re-establish the IMD v3 handshake without a full dashboard restart, which would lead to loss of data.
Feedback by Matthias: Certainly an issue, but in my opinion less critical. The dashboard is intended as a convenient way to have a quick peek into a running simulation. The actual data produced by the simulation will be stored elsewhere. Thus, preventing data loss during an interrupted connection is not a major priority.
I would appreciate your suggestions on moving ahead with the project.
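To illustrate the remark that RMSD needs only a single reference time point rather than a stored history, here is a minimal sketch in plain Python. It computes a simple RMSD without any superposition (no rotational or translational fit), which is a simplification; a real analysis would typically fit the structures first. The names `rmsd` and `rmsd_stream` and the toy frames are hypothetical.

```python
import math


def rmsd(positions, reference):
    """Plain RMSD between two frames, without superposition."""
    if len(positions) != len(reference):
        raise ValueError("frames must contain the same number of atoms")
    sq = sum(
        (a - b) ** 2
        for p, r in zip(positions, reference)
        for a, b in zip(p, r)
    )
    return math.sqrt(sq / len(positions))


def rmsd_stream(frames):
    """Yield the RMSD of each frame relative to the first frame seen.

    Only the reference frame is kept in memory; earlier frames are not stored.
    """
    reference = None
    for positions in frames:
        if reference is None:
            # Copy the first frame so later updates cannot change it.
            reference = [tuple(p) for p in positions]
        yield rmsd(positions, reference)


# Toy data: both atoms shift by 1 along z in the second frame.
toy_frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)],
]
print(list(rmsd_stream(toy_frames)))
```

Memory use stays constant no matter how long the stream runs, which is why a circular buffer is not required for this particular quantity.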
@HeydenLabASU Thank you for your valuable feedback. This gives a clear direction to move forward with the project.
As I now understand it, the IMDclient ring buffer is intended for network stability rather than for storing data for time-dependent analysis. Therefore, I will implement a separate history buffer on the consumer side to store a limited number of frames for time-dependent analyses such as time-correlation functions.
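A consumer-side history buffer of this kind could look roughly like the sketch below, which relies on `collections.deque` with `maxlen` so that the oldest frame is dropped automatically once the buffer is full. The class name `FrameHistory` and its methods are hypothetical and are not part of imdclient or MDAnalysis.

```python
from collections import deque


class FrameHistory:
    """Fixed-size history of per-frame values (hypothetical helper)."""

    def __init__(self, max_frames):
        if max_frames < 1:
            raise ValueError("max_frames must be at least 1")
        # A deque with maxlen discards the oldest entry when it is full.
        self._frames = deque(maxlen=max_frames)

    def append(self, step, values):
        # Store a copy, in case the caller reuses its per-frame container.
        self._frames.append((step, tuple(values)))

    def __len__(self):
        return len(self._frames)

    def steps(self):
        """Return the step numbers currently held, oldest first."""
        return [step for step, _ in self._frames]

    def window(self):
        """Return the stored (step, values) pairs, oldest first."""
        return list(self._frames)


history = FrameHistory(max_frames=3)
for step in range(5):
    history.append(step, [float(step)])
print(history.steps())
```

Copying on `append` is a deliberate choice here: it keeps the stored history valid even if the producer overwrites its per-frame data on the next step.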
As for the dashboard architecture, I confirm that the Dash application will run locally on the user's machine, so the simulation data will not be sent to a remote server.
It is helpful to know that IMDclient supports a pause/resume workflow to prevent buffer overflow. Beyond that, depending on the analysis being run in real time, we can use techniques such as multiprocessing and adaptive frame skipping.
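One possible shape for adaptive frame skipping is sketched below: the analysis stride grows while the incoming buffer is filling up and shrinks again once it drains. The `fill_fraction` argument is an assumption; whether and how the client reports its buffer fill level would need to be checked. The class name `AdaptiveSkipper` and the thresholds are likewise hypothetical.

```python
class AdaptiveSkipper:
    """Decide which frames to analyze based on how full the buffer is."""

    def __init__(self, high=0.75, low=0.25, max_stride=64):
        self.high = high            # above this fill level, skip more frames
        self.low = low              # below this fill level, skip fewer frames
        self.max_stride = max_stride
        self.stride = 1             # analyze every `stride`-th frame
        self._count = 0             # frames seen since the last analyzed one

    def should_process(self, fill_fraction):
        """Return True if the current frame should be analyzed."""
        if fill_fraction > self.high:
            self.stride = min(self.stride * 2, self.max_stride)
        elif fill_fraction < self.low:
            self.stride = max(self.stride // 2, 1)
        self._count += 1
        if self._count >= self.stride:
            self._count = 0
            return True
        return False


skipper = AdaptiveSkipper()
for fill in (0.1, 0.9, 0.5, 0.5, 0.1):
    print(fill, skipper.should_process(fill), skipper.stride)
```

This only suits analyses that tolerate uneven sampling, such as monitoring plots; quantities that need evenly spaced frames would require a different strategy.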
I will explore circular buffering as a good option for time-dependent analysis like time-correlation as you mentioned.
That makes sense: the dashboard serves as a monitoring and exploratory tool, since the simulation data is stored independently. In that case, I will focus on graceful handling of temporary connection interruptions rather than a complex recovery mechanism. Meanwhile, I am also looking for issues to contribute to in order to get more familiar with the codebase.
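Graceful handling of a temporary interruption can be as simple as a bounded retry loop with increasing delays, as in the sketch below. The `connect` callable stands in for whatever actually opens the IMD connection; it, the function name `connect_with_retry`, and the default delays are assumptions for illustration.

```python
import time


def connect_with_retry(connect, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call `connect()` until it succeeds or the attempts run out.

    Returns whatever `connect()` returns, or None if every attempt failed.
    The delay doubles after each failure (0.5 s, 1 s, 2 s, ...).
    """
    for attempt in range(max_attempts):
        try:
            return connect()
        except OSError:
            # ConnectionError and socket errors are subclasses of OSError.
            if attempt == max_attempts - 1:
                break
            sleep(base_delay * 2 ** attempt)
    return None


# Usage with a stand-in connection function that fails twice, then succeeds.
attempts = {"n": 0}


def flaky_connect():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionRefusedError("simulation not reachable yet")
    return "connected"


print(connect_with_retry(flaky_connect, sleep=lambda seconds: None))
```

Returning None instead of raising lets the dashboard show a "disconnected" state and keep its existing plots on screen, in line with data loss not being a major concern.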
@orbeckst, @HeydenLabASU, @amruthesht, I have submitted the pre-proposal form with the updated proposed solution. As you advised, I explored the open issues and subsequently raised PR #5294, aimed at fixing #5247. This involved implementing a helper function to centralize the residue name matching logic and refactoring several classes that previously implemented very similar logic independently, so that they now call this helper with the appropriate arguments, thereby reducing code duplication. The PR was successfully merged. Thank you for your guidance. I am currently looking for more issues to resolve and refining the project architecture we discussed.
Hello everyone 👋,
My name is Parth Uppal, and I am a B.E. student with strong experience in Python, data analysis workflows, and building structured software projects. I am very interested in contributing to MDAnalysis for GSoC 2026, particularly to Project 1: “Dashboard for tracking MD simulation progress with the new streaming interface.”
Over the past year, I have:
From my understanding, this project aims to shift MD analysis from a post-processing model to a real-time interactive workflow using IMDv3 streams via imdclient and MDAnalysis.
The idea of combining my software development skills with my passion for science (Chemistry and Astrophysics in particular) is really exciting for me.
As preparation, I am currently:
I have a few initial questions to help me prepare:
I am highly motivated to start contributing early, engage with the community, and shape a detailed proposal based on mentor feedback.
Looking forward to your suggestions and thank you for this exciting project idea!
Best regards,
Parth Uppal