Performance + Partitioning of Spaces Question #774
Hi @nworb999,
Hyperon-experimental codebase performance was significantly improved recently, but there is still room for improvement, of course. At the moment I am working on a better implementation of the in-memory atomspace, which uses a more compact representation and should work faster for large knowledge bases. It is worth mentioning two other implementations which were focused on performance from the very beginning: https://github.com/trueagi-io/metta-wam and https://github.com/trueagi-io/MORK. @TeamSPoon and @Adam-Vandervorst can say more about them.
Hyperon-experimental has a couple of performance tests (more microbenchmarks, really); you can find them at https://github.com/trueagi-io/hyperon-experimental/tree/main/lib/benches. I think I will be able to provide something more solid after finishing the new atomspace implementation.
Do you mean Redis would keep different atomspaces under different keys, and you want to load/unload them into memory before processing? I am not sure what goal you are pursuing here. Hyperon-experimental is not able to work with Redis out of the box. Do you mean that loading a knowledge base into memory from Redis should be faster than loading it from disk?
There is the DAS (Distributed AtomSpace) project, https://github.com/singnet/das, which is an implementation of distributed atomspace storage. It is integrated with hyperon-experimental to some extent. @andre-senna, who is the lead of the DAS project, and @CICS-Oleg, who is integrating it with hyperon-experimental, can say more about this.
Hi @nworb999, do you have some example MeTTa files and queries for your application? Would love to work through them with MORK.
Thank you so much for your responses!
Yes, the idea is to add a service that grabs relevant atomspaces at run-time for different users when a conversation starts -- different users will have different "memories" from past conversations, as well as a shared pool of general information that will be loaded on start-up. I know that loading from Redis won't necessarily be faster than loading from disk; I was more curious whether there is a more conventional way of partitioning atomspaces by user in a "database" for an application with multiple users. I've only worked with SQL/NoSQL databases in the past for production applications, and haven't used something like atomspaces before. If we have 100,000 users, I wouldn't normally keep 100,000 growing flat files locally at all times.
However, it seems like the level of complexity and scale of our data will be small by MeTTa's standards (compared to robotics datasets, for example). Here is some stubbed-out data; I haven't finished the queries for it yet, but they will cover pretty basic CRUD capabilities.
An example of something we would do is manage facts about objects; create a game, game rules, game roles, a game instance plus role instances, game actions, and game events; fetch each of these types of data; and update GameRoleInstances. The functionality is very basic, but scaled to a hundred thousand "rows", I was wondering whether there is an argument for thematically sharding atomspaces so that only the relevant shards are pulled into memory. That was what we had in mind for Redis as well, on top of storing each user's knowledge base separately.
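Roughly, the facts might look something like this as MeTTa atoms (all names here are illustrative placeholders, not our real schema):

```metta
; Illustrative facts only -- not the actual data model
(Game werewolf)
(GameRule werewolf (min-players 5))
(GameRole werewolf villager)
(GameRole werewolf seer)
(GameInstance game-42 werewolf)
(GameRoleInstance game-42 user-7 villager)
(GameAction game-42 (vote user-7 user-3))
(GameEvent game-42 (eliminated user-3))

; Basic reads are match queries, e.g. all roles defined for werewolf:
!(match &self (GameRole werewolf $role) $role)

; An "update" to a GameRoleInstance is a remove plus an add:
!(remove-atom &self (GameRoleInstance game-42 user-7 villager))
!(add-atom &self (GameRoleInstance game-42 user-7 seer))
```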
One way is to keep the database for each user in a separate atomspace. These atomspaces can be inserted into a main atomspace as atoms as well. There is a small example written in MeTTa to demonstrate the idea:
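A minimal sketch of the idea, using the standard `new-space`, `add-atom` and `match` stdlib operations (atom names are illustrative):

```metta
; Create a per-user space and fill it with that user's "memories"
!(bind! &alice-memory (new-space))
!(add-atom &alice-memory (likes Alice pizza))
!(add-atom &alice-memory (last-topic Alice travel))

; Insert the per-user space into the main space as an atom
!(add-atom &self (user-memory Alice &alice-memory))

; Query: find Alice's space in the main space, then match inside it
!(match &self (user-memory Alice $space)
    (match $space (likes Alice $what) $what))
```

The nested `match` first binds `$space` to the grounded space atom and then runs a second query against that space, so per-user data stays isolated while remaining reachable from the root atomspace.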
If the root atomspace is kept in DAS (which I mentioned above), it should provide persistence out of the box, and you should not have to worry about atomspace size. I think @andre-senna, someone else from the DAS team, or maybe @CICS-Oleg could explain whether this example works with the current DAS version. If the root atomspace is kept inside an in-memory space, then one needs to persist it manually. Unfortunately, I think persisting a nested atomspace doesn't work out of the box, but it should be relatively easy to write in Python, I believe.
Yes, that is a possible way to manage it if the size of the data is too big. One can write code similar to the example above in Python: it should load the atomspace with the user's data and add it as an atom into the root atomspace, for instance when the user opens a session, and delete the corresponding atomspace atom from the root atomspace when the user closes the session. This should work more correctly than
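Sketched in MeTTa, the session flow could look like this (names are illustrative; in practice the user's data would be loaded from disk, e.g. by importing into the freshly created space, rather than built up with `add-atom`):

```metta
; On session open: create (or load) the user's space and attach it
; to the root space as an atom
!(bind! &user-7 (new-space))
!(add-atom &user-7 (likes user-7 chess))
!(add-atom &self (session user-7 &user-7))

; While the session is open, queries can reach the user's data
!(match &self (session user-7 $s)
    (match $s (likes user-7 $what) $what))

; On session close: detach the user's space from the root space
!(remove-atom &self (session user-7 &user-7))
```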
We are testing the consistency of the current versions of MeTTa and DAS at the moment, but it seems like those examples should work.
Yes, I also think so. However, I should remark that DAS is still experimental code (even compared to the MeTTa interpreter), so its public API is very unstable. The main side effect of this is that its integration with the MeTTa interpreter is also unstable. We expect to have a reasonably stable version of the API by the end of the year. You can read more about the ideas behind DAS here: https://github.com/singnet/das, and our current planned development roadmap here: singnet/das#41
Thank you all!
Hello,
I'm considering using MeTTa for a conversational AI application and have some questions about its performance with large datasets.
In the OpenCog Atomspace Metagraphs paper, it's mentioned that:
This was in the context of discussing how "the arity of hyperedges is typically small." However, I couldn't find much information about performance considerations in the MeTTa documentation.
people_kb.metta
as a value). Would this approach be beneficial, or is it only necessary for exceptionally large datasets? Any insights or resources you can provide would be greatly appreciated before I start implementing my project.
Thank you for your time and assistance!