Clarification on intent handling / Remote Server needed #217
Comments
If your intents are already handled via the Snips MQTT Hermes protocol, then you won’t have to do much with the next version of Rhasspy (2.5), as it is already fully compatible with the Hermes protocol. It is even going to offer Snips NLU. Over the last few months, Rhasspy has gone through an intense restructuring of its services for improved modularity based on the MQTT Hermes protocol. I think the official release of version 2.5 is approaching rapidly. For more info, maybe this can help: |
Okay, so you're saying that the "remote server" / HTTP-based variant is going to be deprecated soon? |
I think the remote HTTP handler will live on as a separate Rhasspy service. As the Hermes MQTT protocol will be used as the underlying glue between all services, it might be simpler to interface directly with it instead of relying on an additional service just to forward intents and dialogue handling messages. If your intents are already handling Snips topics, then they should be completely compatible with the next version of Rhasspy (2.5). How did you handle your skills with Snips? Were you using snips-skill-server? |
Okay, I see. I wanted to implement both connectors anyway (HTTP + hermes), since it's not much of an effort. Yes, I was using the snips-skill-server previously, so basically I am trying to make a replacement for it since snips is dead. In the current version (2.4.x) I can see there are options for MQTT/snips/hermes already. Is there going to be a bigger change regarding this interface in the upcoming 2.5 release? |
Hi @patrickjane, the short answer is that the same intent JSON should be returned (in 2.4). My original idea was that an intent handler could alter the intent before it got passed to Home Assistant (maybe add some extra information). Going forward in 2.5, remote HTTP handling is fully supported. Like any other Hermes-compatible service, rhasspy-remote-http-hermes listens for intents via MQTT and POSTs them to some HTTP endpoint. It only expects a JSON object back with an optional "speech" property (with a "text" sub-property). If you're using NodeRED, you have many choices in 2.5 to handle intents: directly via MQTT (Hermes protocol), via WebSocket ( |
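To make the reply format described above concrete, here is a minimal Python sketch of a remote handler that echoes the incoming intent JSON (the 2.4 behaviour mentioned above) and adds the optional "speech" property with its "text" sub-property. The field names in the example payload, the handler class name and the fixed reply text are assumptions for illustration, not taken from the Rhasspy documentation.

```python
import json
from http.server import BaseHTTPRequestHandler
from typing import Optional


def build_response(intent: dict, speech_text: Optional[str] = None) -> dict:
    """Return the received intent JSON, optionally with a spoken reply added."""
    response = dict(intent)  # shallow copy so the incoming dict is not modified
    if speech_text is not None:
        # Optional "speech" object with a "text" sub-property, as described above.
        response["speech"] = {"text": speech_text}
    return response


class IntentHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: reads the POSTed intent and replies with JSON."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        intent = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(build_response(intent, "Done.")).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it, something like `HTTPServer(("0.0.0.0", 8000), IntentHandler).serve_forever()` (with `HTTPServer` imported from `http.server`; the port is arbitrary) should be enough, with Rhasspy's remote intent handling URL pointed at that address.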
Okay I see. Meanwhile I've switched to MQTT, and I am receiving the intents from rhasspy via I have used node-red before with home assistant to do automations, however at some point I have dropped node-red and decided to just do everything in home assistant. Also I liked the way snips did it, in that you could pull existing skills from their store and just plug them into your system without much effort. This is why I started working on a skill-server replacement. |
So in 2.4 rhasspy does not yet listen on
If you think this project might be useful for rhasspy I'd be happy to put it on github; so far I guess it's satisfying my own personal needs for the voice assistant. The idea of the skill server would be:
I might add a little cli tool for this to handle skill installation & setup (same as we had with snips). |
@synesthesiam https://github.com/patrickjane/hss-server I'd be happy to work on some kind of skill-platform/marketplace thingy, if you guys are up for it. |
This is pretty neat 😊👍 It’d be even better if languages other than Python could be used for skills. Like executing a command line and using stdin/stdout to communicate over a simple JSON protocol. A simple JSON/YAML file at the skill root level with skill properties (name, description, author, intents to handle, command to execute, etc.) maybe? Just thinking 🤔🤗 |
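The stdin/stdout idea above could look roughly like the following Python sketch. Everything in it is hypothetical: the manifest keys, the shape of the request and reply, and the tiny inline "skill" (run via the current Python interpreter only so the example is self-contained; a real skill could be any executable in any language).

```python
import json
import subprocess
import sys
from typing import Optional

# Stand-in for a skill written in any language: reads one JSON request on
# stdin and writes one JSON reply on stdout.
SKILL_SOURCE = (
    "import json, sys\n"
    "request = json.load(sys.stdin)\n"
    "reply = {'speech': {'text': 'You said ' + request['input']}}\n"
    "print(json.dumps(reply))\n"
)

# Hypothetical manifest, as it might be loaded from a JSON/YAML file at the skill root.
MANIFEST = {
    "name": "echo-skill",
    "description": "Repeats what was said",
    "intents": ["Echo"],
    "command": [sys.executable, "-c", SKILL_SOURCE],
}


def run_skill(manifest: dict, intent: dict) -> Optional[dict]:
    """Run the skill's command for a matching intent and return its JSON reply."""
    if intent["intent"] not in manifest["intents"]:
        return None  # this skill does not handle the intent
    result = subprocess.run(
        manifest["command"],
        input=json.dumps(intent),
        capture_output=True,
        text=True,
        check=True,
        timeout=10,
    )
    return json.loads(result.stdout)
```

Starting one process per intent keeps the sketch short; a longer-lived skill process exchanging newline-delimited JSON would be a natural variation.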
I think we have that simple JSON protocol already, which is hermes over MQTT. Introducing a similar transport-/language-agnostic protocol on top of it might not be that useful, since you could achieve that with something like node-red already, I guess. I think it boils down to what the overall workflow of "skills" should be, and whether it is meant more for developers and hackers or more for the average user. [edit] By the way, when it comes to sharing skills and installing skills from other developers, one half of it right now cannot be easily shared, which is the sentences & slots. Is there any idea/concept for this to enable easy sharing in the future of rhasspy? |
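As a rough illustration of why Hermes over MQTT already serves as that simple JSON protocol, the sketch below parses an intent message and builds the reply that ends the session. The topic names and payload fields follow the Hermes protocol as I understand it (hermes/intent/<intentName>, hermes/dialogueManager/endSession); the function names and the returned dictionary layout are my own, and the MQTT client itself is left out so the example stays library-free.

```python
import json
from typing import Optional, Tuple

INTENT_TOPIC_PREFIX = "hermes/intent/"


def parse_intent(topic: str, payload: bytes) -> Optional[dict]:
    """Extract intent name, session and slot values from a Hermes intent message."""
    if not topic.startswith(INTENT_TOPIC_PREFIX):
        return None  # not an intent message
    message = json.loads(payload)
    # Each slot carries its name and a nested value object.
    slots = {s["slotName"]: s["value"]["value"] for s in message.get("slots", [])}
    return {
        "intent": message["intent"]["intentName"],
        "session_id": message.get("sessionId"),
        "site_id": message.get("siteId"),
        "slots": slots,
    }


def end_session(session_id: str, text: str) -> Tuple[str, str]:
    """Build the (topic, payload) pair that ends the session with a spoken reply."""
    payload = json.dumps({"sessionId": session_id, "text": text})
    return "hermes/dialogueManager/endSession", payload
```

Any MQTT client in any language can do the same two steps, which is the language-agnostic decoupling referred to above.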
Fair enough ;) It boils down to how the sentences/slots are registered and forwarded to the ASR and NLU services. Maybe @synesthesiam could provide more insights on how Rhasspy 2.5 will handle the dataset. |
Just to feed back, @patrickjane (and others), I've been experimenting with your https://github.com/patrickjane/hss-server and rhasspy 2.5, and have managed to proof-of-concept (very roughly) a dialogue-based countdown timer skill, and a skill for adding reactions to RocketChat's most recent Gnome notification (happy to share once tidied a little) - for keeping those modular, I find your hss-skill pattern works quite well, and can see a couple of other itches I'll likely use it to scratch. However, I'm keen to know if you or others have had any more thoughts on trajectory. |
I am using the Yet,
Some thoughts on 2): |
Hi @patrickjane, impressive work! I have been thinking about the same functionality and implementing part of it too. Some remarks:
Are you active on the forum? I have discussed these and other topics here:
The result of my thoughts in the first forum post is a helper library for Rhasspy apps, rhasspy-hermes-app. This is just a wrapper library around rhasspy-hermes to make it as easy as possible to create Rhasspy apps. It's still a proof of concept, but already quite usable.
It seems to me that your hss-server and hss-skill are tightly coupled. I'm not sure that's the best way forward. Ultimately a skill server should be able to install skills developed in various languages, as @mathquis already remarked. That's why I'm not too fond of the idea of coupling a skill server to an app library. But even with Python alone it would be better to make the architecture more flexible. There are a couple of initiatives to create libraries for developing Rhasspy apps (@daniele-athome is also working on a proof of concept for Rhasspy Hermes apps in AppDaemon), and it would probably be better if we could share some parts of the API.
Because one of Rhasspy's strong points is its flexibility in which services you can use with it, I think we should try to keep our options open for the creation and distribution of Rhasspy apps, so it's nice that there are various app platform implementations. But it would be good if we could share some resources.
Another idea I have created a proof of concept for is running each Rhasspy app in a Docker container. You can see my thoughts about it in the third forum link mentioned above. Coupled with Mosquitto's access control list and a username and password for each app, we can precisely limit what an app is able to do. My goal is to work on this idea further, because I don't like the idea of apps being able to do what they want on my machine or my network. So an alternative "skill server" could just install Docker containers this way to add Rhasspy apps.
These are just some ideas :-) Don't let this dissuade you from working on hss-server and hss-skill, I think there's still a lot to explore in this domain and having multiple implementations for Rhasspy is good. |
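As a sketch of the per-app access control idea above, the following Python function generates a block for a Mosquitto ACL file (one `user` line followed by `topic read` / `topic write` lines). The username and topic choices are hypothetical; which topics an app really needs depends on the intents it handles.

```python
from typing import Iterable


def acl_entry(username: str, read_topics: Iterable[str], write_topics: Iterable[str]) -> str:
    """Render one user's block in Mosquitto ACL file syntax."""
    lines = [f"user {username}"]
    lines += [f"topic read {topic}" for topic in read_topics]
    lines += [f"topic write {topic}" for topic in write_topics]
    return "\n".join(lines) + "\n"


# Hypothetical timer app: may only see its own intent and end sessions.
TIMER_ACL = acl_entry(
    "timer_app",
    read_topics=["hermes/intent/StartTimer"],
    write_topics=["hermes/dialogueManager/endSession"],
)
```

Combined with a per-app username and password, a block like this would limit the container to the listed topics.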
Well mainly rhasspy would need to support the following workflow:
As I am still on rhasspy 2.4, I have no clue whether or not this already works.
Will do.
Agreed. Everything I have written in this thread should be considered support for rhasspy, and while I've already implemented a working skill server, it should be seen as merely a first draft, subject to change. I would be happy to contribute, and agree to share resources. When I first started using rhasspy, I found that there is no real intent handling in place, other than publishing intents via MQTT, HTTP or to home assistant, none of which were suitable for me.
Nope, didn't even know a forum exists :D
Well, I think both approaches have their pros and cons, and as I have stated earlier, it pretty much boils down to how rhasspy is meant to do intent handling. There is the option already to publish intents over MQTT, HTTP and send to home assistant. Let's ignore the HTTP stuff, then MQTT alone is already bringing the language agnostic decoupling, since it would allow anyone to just hook on the MQTT message using their favourite language. It would still work with a running When I've started working on Some of the reasons for using a server-approach over standalone app runtimes were:
So basically what I want to say is, if rhasspy decides to offer intent handling, but at the same time make it possible to use any given programming language, then it would get a bit hard to bring all this together. As I said, you've got the decoupling via MQTT already, I see no real benefit of further decoupling within the skill-server, only to enable developers to implement skills in other programming languages. Especially when we're talking about installing skills from other developers, skill marketplace/ecosystem, I think all this might get really complicated, when it shall be possible to install skills written in arbitrary programming languages. Just think about all the stuff that might need to be set up, like dependencies, tools, libraries. Right now,
Then, you would need some sort of protocol between the skill server and the docker container. And this would essentially mean you're back to zero, as hermes is that protocol already. So either your docker containers which contain the skills just need to implement the hermes protocol again, or we're talking about a second, non-standard protocol. However maybe the skill server could support two kinds of skills; docker based and python based, and docker based skills would have no interaction with the skill server at all. But even then, the user would need to configure MQTT connection parameters for every docker container, which is pretty much what I wanted to avoid in the first place.
No worries, all I want is to contribute to rhasspy's functionality. Although it is named BTW: you're using the term "app" for what I consider "skill", so in the above just read "skill" as "app" :) [edit] So maybe as some kind of rough requirements list for intent handling:
(to be continued) |
This should already work on the Rhasspy 2.5 pre-release :-) Have a look at Rhasspy Voltron. That's why I was a bit puzzled why you would need a skill server for this. |
I can confirm that this flow seems to work for me with hss-server and Rhasspy Voltron - my understanding (from up the issue) was that that was where you were targeting @patrickjane ?
I'm not sure if this is quite what you mean, but I have added a couple of small tweaks to my local version of hss-server and hss-skill to add a Working example:
Rhasspy seems to implement that fine, and afaiu the dialogue state handling parameters from the Hermes protocol are implemented (but haven't tried) - I have tested the "no-matching-intent" dialogue event too, and that can be picked up, for conversational-response misses. The main downside is that, as it still has to match the intent (even if filtered), the possible responses must be sentences for that intent in Rhasspy, just as the original command is. I do recall seeing a suggestion in the forums of modally switching the STT for follow-up, which would be nice, but at least if there was some way of making an intent, or certain sentences that trigger it, only matchable on follow-up dialogue (so words like "no" and "yes" wouldn't technically be valid opening commands), the next step, of switching speech-to-text from e.g. PocketSphinx to DeepSpeech in follow-up to give greater freedom, would be a bonus. |
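For reference, the filtered follow-up described above can be sketched as a Hermes continueSession message. The topic and the sessionId, text, intentFilter and sendIntentNotRecognized fields come from the Hermes dialogue API as I understand it; the function name and the intent names in the usage note are hypothetical.

```python
import json
from typing import Sequence, Tuple


def continue_session(
    session_id: str,
    question: str,
    intent_filter: Sequence[str],
    send_intent_not_recognized: bool = True,
) -> Tuple[str, str]:
    """Build the (topic, payload) pair that asks a follow-up question."""
    payload = {
        "sessionId": session_id,
        "text": question,
        # Only these intents are eligible answers to the follow-up question.
        "intentFilter": list(intent_filter),
        # Request a "not recognized" event instead of silently ending the session.
        "sendIntentNotRecognized": send_intent_not_recognized,
    }
    return "hermes/dialogueManager/continueSession", json.dumps(payload)
```

A timer skill might call `continue_session(session_id, "Cancel the running timer?", ["ConfirmYes", "ConfirmNo"])`. As noted above, the sentences for those intents still have to exist in Rhasspy's training data for them to match.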
On the language-independence as @koenvervloesem , I was thinking that too - I can see your reservations @patrickjane but if IMO a simple option from a skill-maker's perspective (which should be an almost drop-in replacement from skill-maker-flow perspective) would be WAMP with Autobahn - I have used this on a number of projects for near-transparent RPC between languages in a Python-native-feeling way (it also has the bonus of supporting event subscription ootb). Happy to PoC that, if it would be a potential option. That said, having MQTT already there, there's maybe an argument for RPC over MQTT, but those options don't seem nearly as mature as either RPyC or Autobahn. A second benefit of this is that it'll work fine with venv or dockerized processes (Python or otherwise), and not increase the code a skill-creator would write. To @koenvervloesem 's question about where a skill-server would fit - I think @patrickjane 's point about abstracting MQTT protocol interaction away is important. I probably wouldn't have bothered getting started with those if it wasn't just a case of "fill in this And of course the bullets @patrickjane mentioned sound like things that, given the modular nature of Rhasspy, it would want to defer to a handler such as |
(and a language-independent RPC framework would avoid every language having to have a Hermes implementation as a library for skill-makers) |
Yeah, that's exactly the idea. Although I would have named it I haven't worked on this since 2.5 is not yet released.
That's what I mean by "rhasspy needs to support this". Doing full/plain new intent recognition for a follow-up question is probably not more than a workaround, I guess.
See the above. I'm gonna check it out when 2.5 is released.
I think my point was not so much about the protocol between skill-server and skill (which can with no issues be language agnostic, e.g. HTTP/JSON), but more about the dependencies and different handling for different languages. For example, node.js based skills would require So while it's perfectly possible and fine to me to use a non-python RPC protocol, you would still have issues when installing the skill via |
Maybe we should close this issue, and move the discussion to the forums? I think we have some really good ideas, and we should continue to discuss? |
Fair! Like I say, tidying required :)
Indeed - given an intent filter is part of the conversation response API (which Rhasspy implements, afaict, and is a start), it does seem like that approach is not inconsistent with Hermes, at least. However, it would make sense for Rhasspy to do some minimal implementation here - even just to allow marking sentences as ineligible for initial intent matching. Conversely, a potential use-case for full intent recognition (by specifying more than one, or no, intents in the filter) would be to ask a question that could switch path to a different skill.
True, but perhaps it's a question of level-of-abstraction - if the decision is not made at the protocol level, but potential skill-family helper classes could be made, then language-specific-functionality is not quite so baked in and encapsulated to installation/provisioning functionality (a simple Python install-class for JS might use nodeenv, for instance).
Yes, I think this touches on some broader questions that would be great to get input from the Rhasspy architects on (as you'd suggested).
Agreed - I think it's safe to say this has turned into solution development rather than issue resolution! If you want to post a link, we can jump over - would be keen to have @koenvervloesem in that loop - have to say, from my brief look, I like the decorator skill syntax of Rhasspy Hermes App - wondering how hard it might be to use both hss-server and Rhasspy Hermes App together 🤔 |
Okay, so I have posted here: https://community.rhasspy.org/t/hermes-skill-server-for-intent-handling/1054/3 Currently I am working on a proper hermes dialogue implementation, and I also had an idea for a low-cost marketplace-thingy; I'll see if I can get this up and running by tomorrow. |
I want to use rhasspy to build a voice assistant which (for now) contains the following functionality:
As you can see, not all of those tasks are handled by home assistant. In fact, I have existing implementations (in python) for Snips.ai for all those tasks.
To reuse them, I am developing some kind of plugin-based skill server, where I can hook on my existing code from snips (with, of course, slight modifications).
Now, regarding rhasspy, I understand that it can have several endpoints for intent handling:
From the documentation I understand that rhasspy will HTTP POST any detected intent to my server. This works. I can see the intent JSON coming in at my skill server. However, it is unclear to me what kind of HTTP response is expected. From the documentation I can see it must be JSON, but I fail to find a detailed description of what this JSON should look like.
If I return an empty JSON, rhasspy complains:
TypeError: e.data.intent is undefined
(I'll get this error as a popup in the rhasspy browser, not in the rhasspy log files) The log looks like:
If I return
{ "intent": {}}
rhasspy complains:
TypeError: e.data.time_sec is undefined
(I'll get this error as a popup in the rhasspy browser, not in the rhasspy log files) The log looks like:
From the documentation I can see that in case of outputting speech, this should be given:
and in case of forwarding something (what?) to home assistant, this should be given:
(and in this case: what is 'rest of input JSON'?)
So, long story short, what do I need to send back to rhasspy after my remote server has successfully handled an intent which was detected by rhasspy and sent to my remote server?
And what is the idea of the "forward to home assistant" feature? I mean if my remote server shall handle the intent, why forward anything else to home assistant? Is this meant to be some kind of light-wrapper for the HA-API in order to enable the remote server to easily generate HA events in addition to its very own intent handling?