Add an intelligent smart home chat bot to the UI #2995

Open
1 of 4 tasks
florian-h05 opened this issue Jan 9, 2025 · 8 comments

Comments

@florian-h05
Contributor

florian-h05 commented Jan 9, 2025

I am thinking of having a smart home chatbot for openHAB 5, a bit like HABot but more intelligent, integrated into Main UI, and not limited only to smart home-related stuff.

This would require the following bits:

  • A powerful, LLM-based human language interpreter: something like [ChatGPT] Enhance binding openhab-addons#17320.
  • New core APIs that not only allow answering a prompt with text (like the current HLI interface), but also provide a payload for the UI that can e.g. make it render a card widget to show an Item state. / Extend the dialog processor to support text as well and take care of chat history. (See the sketch after this list.)
  • Probably: shared core logic to be used by LLM-based HLIs for Item control, so we don't have to duplicate code for the OpenAI API, Google Vertex API, etc.
  • The chatbot UI. Framework7 provides something we can use for that: https://framework7.io/docs/messages
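
To make the second point a bit more concrete, here is a minimal sketch of what such an interpretation result could look like. All names here are made up for illustration (including the widget name), nothing like this exists in core yet:

```java
// Purely hypothetical sketch - none of these types exist in openHAB core today.
// Idea: an interpretation returns not just text, but a result object that may
// carry a hint for Main UI, e.g. "render a card widget for these Items".
import java.util.List;
import java.util.Optional;

public record ChatResponse(
        String text,                     // textual answer shown (or spoken) to the user
        Optional<UiPayload> uiPayload) { // optional rendering instruction for Main UI

    /** Rendering instruction, e.g. a widget type plus the Items it should show. */
    public record UiPayload(String component, List<String> itemNames) { }

    /** Example: a text answer accompanied by a card for one Item (widget name illustrative). */
    public static ChatResponse example() {
        return new ChatResponse("It is 21.5 °C in the living room.",
                Optional.of(new UiPayload("oh-label-card", List.of("LivingRoom_Temperature"))));
    }
}
```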

Unfortunately, anything LLM-based is unlikely to work locally on embedded hardware like Pis or NAS devices; however, OpenAI's GPT-4o mini is relatively cheap, and I don't have large privacy concerns since it is only used to interpret a text prompt - it does not have direct access to openHAB.

@florian-h05
Contributor Author

And we can add the enhanced ChatGPT binding there, because yes, as far as I understand, we should be able to have an internal representation for the function call definitions and the conversation history and expose them to OpenAI or a different provider; I have seen other tools that allow switching between OpenAI and Ollama, for example.

Originally posted by @GiviMAD in #2275 (comment)

The enhanced ChatGPT binding PR is linked above.
I think that the function calling definitions and the chat history handling should be provided by core and exposed by the add-ons to the providers.
Wrt easily switching providers: I am currently working a lot with LangChain4j, which makes that easy, but I fear that a binding using LangChain4j might be (too) large.
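
To make the "provided by core, exposed by the add-ons" idea more concrete, here is a minimal sketch of provider-neutral chat types. All names are hypothetical, loosely inspired by the ChatMessage/ChatFunction classes in the enhanced ChatGPT binding; each add-on would translate these into its provider's wire format:

```java
// Hypothetical provider-neutral chat types that core could own; each LLM add-on
// (OpenAI, Ollama, Vertex, ...) would map these to its provider-specific request format.
import java.util.List;
import java.util.Map;

public final class ChatTypes {

    public enum Role { SYSTEM, USER, ASSISTANT, TOOL }

    /** One entry in the conversation history. */
    public record ChatMessage(Role role, String content) { }

    /** A callable tool/function the model may invoke, described by a JSON-schema-like map. */
    public record ChatFunction(String name, String description, Map<String, Object> parameters) { }

    /** Example: a tool definition an Item-control function might expose (illustrative only). */
    public static ChatFunction itemControlFunction() {
        return new ChatFunction(
                "send_command",
                "Send a command to an openHAB Item",
                Map.of(
                        "type", "object",
                        "properties", Map.of(
                                "item", Map.of("type", "string"),
                                "command", Map.of("type", "string")),
                        "required", List.of("item", "command")));
    }
}
```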

To me it seems like if we modify the dialog processor to persist a temporary history and to accept text instead of just voice, we can merge both functionalities and have something cool without creating too many new things; at least it seems like something worth exploring. Let me know what you think if you have a chance to take a look.

Originally posted by @GiviMAD in #2275 (comment)

Fully agreed, though I haven't had a look at the code yet.

@rkoshak

rkoshak commented Jan 9, 2025

New core APIs that not only allow answering a prompt with text (like the current HLI interface), but also provide a payload for the UI that can e.g. make it render a card widget to show an Item state. / Extend the dialog processor to support text as well and take care of chat history.

In the other thread I brought up HABot because it seems to already be rendering widgets using MainUI F7 widgets. If I ask to get a single Item, I get that Item's default stand-alone widget.

[screenshot: HABot rendering a single Item's default stand-alone widget]

However, if I ask for more than one Item I don't think I get my default list item widgets.

[screenshot: HABot rendering multiple Items without the default list item widgets]

But there has to be something there already.

From a usability perspective, if you do not intend to use the semantic model in any way, how does the model know where devices are located? Will there be yet another set of metadata to add to the Items to encode this information?

@digitaldan
Contributor

Very excited about all the work @GiviMAD has done on voice (really!), and thanks @florian-h05 for now also diving into this. If I were not already working on the Matter binding and the iOS client (plus some myopenhab stuff), this would be my next top priority.

About a year ago (err, maybe much longer) I spent several weeks prototyping an integration using an LLM to do basic control of openHAB. I ended up getting distracted and having to move on to other things, but I think there is a TON of opportunity there.

I know @rkoshak mentioned this in another thread, but one thing that quickly became apparent is the importance of the semantic model. It's not actually something I use on my home system, so I did not start with it. But..... it became very, very necessary when trying to describe my openHAB to an LLM. You quickly end up needing a concept of rooms and equipment, including the room the user is currently in (if they are speaking in the living room, then when they say "lights on" you want only those lights on). This also helps discard all the non-essential Items that are usually not included in the model (Items for rules, sensors, etc.).
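
As a rough illustration of why the semantic model helps here, a hypothetical helper (not existing core code) that scopes the Items handed to the LLM to the room the user is speaking in and drops anything outside the model:

```java
// Hypothetical sketch: keep only Items that belong to the room the user is in,
// so "lights on" affects just that room and non-modelled Items never reach the LLM.
import java.util.List;

public final class RoomScope {

    /** Minimal stand-in for an Item with semantic location metadata. */
    public record ItemInfo(String name, String type, String location) { }

    public static List<ItemInfo> itemsForRoom(List<ItemInfo> allItems, String currentRoom) {
        return allItems.stream()
                .filter(i -> i.location() != null)             // drop Items outside the semantic model
                .filter(i -> i.location().equals(currentRoom)) // keep only the user's current room
                .toList();
    }
}
```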

And I think keeping an eye on the emerging open-source voice hardware is also super important. I would have no problem dropping my 10+ Alexas if there were a viable on-prem solution through openHAB, even if that means spending a lot more on hardware (including some GPUs).

Very excited!

@florian-h05
Contributor Author

In the other thread I brought up HABot because it seems to already be rendering widgets using MainUI F7 widgets. If I ask to get a single Item, I get that Item's default stand-alone widget.

That's kind of interesting - I know HABot is built with Quasar (which also uses Vue), but I would expect it to be independent from Main UI, as it has been around since before Main UI IIRC.

From a usability perspective, if you do not intend to use the semantic model in any way, how does the model know where devices are located?

I just checked the ChatGPT binding code: the location is injected into the request prompt together with the Item name, type, label & state.
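
A small illustration of what such prompt injection can look like; the exact format used by the binding may differ, and the names here are just for the example:

```java
// Illustrative only: one line per Item, concatenated into the request prompt.
// The real ChatGPT binding may format this differently.
import java.util.List;

public final class PromptBuilder {

    public record ItemInfo(String name, String type, String label, String state, String location) { }

    public static String describeItems(List<ItemInfo> items) {
        StringBuilder sb = new StringBuilder("You can use the following Items:\n");
        for (ItemInfo i : items) {
            sb.append(String.format("- %s (%s): \"%s\", state %s, located in %s%n",
                    i.name(), i.type(), i.label(), i.state(), i.location()));
        }
        return sb.toString();
    }
}
```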

And I think keeping an eye on the emerging open-source voice hardware is also super important. I would have no problem dropping my 10+ Alexas if there were a viable on-prem solution through openHAB, even if that means spending a lot more on hardware (including some GPUs).

Same for me - I currently have one HomePod Mini that very often does not understand me. It is my only smart speaker, as I have privacy concerns with using Alexa or Google Assistant (mainly speech data being transmitted as well as controlling my devices over their cloud), so having TTS and STT locally and only the interpretation of the text in the cloud would be very nice (I am not planning to buy GPUs ;-)).

From my experience, the smaller 7B models are pretty usable for everyday stuff. (Chatting around with them locally on my laptop from time to time.) Unfortunately, the small Llama models in Ollama either always or never call tools, and the suggested system prompt didn't fix that for me, so I gave up on using a small Llama for the time being (it wouldn't run on "embedded" hardware like a Pi or NAS anyway, so there is no chance for me to run it 24/7).
I have been working a lot with LLMs in the last few weeks while building https://github.com/llamara-ai/llamara-backend for a university project, and I have noticed that they produce decent results when using RAG, but large models such as GPT-4o mini follow instructions much better than those 7B models.

@digitaldan
Contributor

so I gave up on using a small Llama for the time being

Yeah, I was also switching between local models and OpenAI. I ended up on a strategy of continuing with the bigger cloud models for development, making sure I was not using anything in the API that would not be portable to other models. I figured that by the time the functionality was complete, there would be better, smaller models available; they would only get better over time, and we would be able to offer them as a choice at some point. I have not played with the Llama 3.2 models for anything serious yet, but the 3B model seemed promising for such a use case.

@digitaldan
Contributor

Alexa or Google Assistant (mainly speech data being transmitted as well as controlling my devices over their cloud)

Not to keep promoting the Matter binding, but I am now using this for Alexa/Siri integration for everything, which means all control is local; it's just the speech data, as you mention, that's being sent back and forth to the cloud (control is noticeably snappier now). In any case, either my system is getting more complex and harder to process, or their systems are getting less accurate, but I feel like voice accuracy has been declining as of late.

@GiviMAD
Member

GiviMAD commented Jan 10, 2025

Hello,

As an aside, for those looking for a device to power AI locally: I bought an Nvidia Orin Nano some months ago, which can be found right now for over 300 euros. It only has 8 GB of RAM, but with the Whisper large turbo model and with Llama 3.2 3B through Ollama it works quite well. Just in case it matches what you are looking for.

I was looking at the ChatGPT binding PR and I think we could start by adding some classes from there to core (something similar to the ChatFunction and ChatMessage classes), and then also add a new interface derived from HumanLanguageInterpreter (maybe LLMInterpreter) with an interpret method that allows passing the chat messages and the chat tools on each execution.
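
To make the proposal a bit more tangible, a minimal sketch of what such an interface could look like. All names are hypothetical; in core it would presumably sit next to the existing HumanLanguageInterpreter/InterpretationException types rather than define its own stand-ins as done here:

```java
// Hypothetical sketch of the proposed LLM-aware interpreter interface.
// ChatMessage/ChatFunction stand for the provider-neutral types discussed above.
import java.util.List;

public interface LLMInterpreter {

    record ChatMessage(String role, String content) { }

    record ChatFunction(String name, String description, String jsonSchema) { }

    /**
     * Interpret the latest user input given the conversation so far and the
     * tools the core makes available; returns the assistant's reply as text.
     */
    String interpret(List<ChatMessage> history, List<ChatFunction> tools, String userInput)
            throws InterpretationException;

    /** Stand-in for the core's existing interpretation exception. */
    class InterpretationException extends Exception {
        public InterpretationException(String message) {
            super(message);
        }
    }
}
```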

After that I don't know what the best way to proceed is. The idea I like most is to allow different agents (an agent being the combination of system prompt + available tools + conversation expiration time + LLMInterpreter). I'll try to explain it with an example: you open the chat UI component and it lets you choose whether you talk with the built-in "openHAB agent" or with another agent that you have defined yourself, for example one with your workout routine in the system prompt and a longer expiration time so it covers the time you need for your training. And I think a future update could choose the agent automatically.
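
A tiny sketch of the "agent" idea as described, purely to illustrate the combination (all names hypothetical):

```java
// Hypothetical: an "agent" as the combination of system prompt, available tools,
// conversation expiration time and the interpreter that backs it.
import java.time.Duration;
import java.util.List;

public record Agent(
        String id,                // e.g. "openhab" (built-in) or "workout"
        String systemPrompt,      // e.g. the user's workout routine
        List<String> toolNames,   // tools this agent is allowed to call
        Duration conversationTtl, // how long the chat history is kept
        String interpreterId) {   // which LLMInterpreter instance to use
}
```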

Do you think something like that makes sense?

@ghys
Member

ghys commented Jan 12, 2025

Just chiming in to give some thoughts and context (this could be more relevant to #2275, but this one is newer) about my journey and what I've tried to achieve and build over the years. For the record, I don't use voice control at all these days; I might have been traumatized :)

It started when I figured I could try to use Snips (since acquired by Sonos), as they seemed to be making really great strides in local STT on cheap SOCs (I didn't want to use more than that for home automation); I added an MQTT broker and a "flows" UI I had built, and the goal was to bring voice control to my living room with an omni mic and a speaker for feedback.
https://community.openhab.org/t/integrate-a-snips-voice-assistant-with-openhab-walkthrough/34301
I remember making my first ever openHAB speech at the time about this.

It didn't work nearly well enough though, mostly due to the limited local STT and keyword spotting (and perhaps the mic quality, ambient noise, etc.). Amazon Alexa devices were just way ahead on all of these, but I insisted on something local, so cloud-based was a no-no.
The tech just wasn't there at the time (a recurring pattern in this area, in fact).

So then I made HABot, mostly for the use case of getting a piece of UI relevant to your situation and current need (e.g. I could pull my phone out of my pocket and ask "show me the lights in the living room" to get a bunch of switches and sliders to adjust with, instead of directing by voice, which would fail half of the time, and instead of navigating through pages of pre-made sitemaps which may not even be what I need).
The history part was also important - recalling previous queries - as you really keep asking the same things; for that there was a "card deck" registry, plus bookmarks and suggestions, so you could always have what you need at your fingertips.

Of course OpenNLP allowed HABot to have more skills than getting those pieces of UI: it could do things like sending various commands to Items. But then you hit the frustrating and sometimes embarrassing outcomes again - like when you ask "turn off the lights in X" and it doesn't understand "X", so the intended {Intent: TurnOff, {object: Lights, location: X}} becomes {Intent: TurnOff, {object: Lights}} and every single light in your home turns off... I assure you it's neither a good showcase for your dinner party guests (a completely random and hypothetical situation 😉) nor a confirmation that you can rely on it.
The culprit was the limited model training, because again, having it run on an RPi was a design goal (a lot of users do run openHAB on such hardware and have no desire to upgrade) and a limitation.

What I ultimately want to say is that I've been burned multiple times by expecting too much from local NLP, but I really do still want it to happen, and LLMs open a new frontier, so I will keep myself apprised and possibly get involved.

From my experience, the smaller 7B models are pretty usable for everyday stuff. (Chatting around with them locally on my laptop from time to time.)

Not to keep promoting the Matter binding, but I am now using this for Alexa/Siri integration for everything, which means all control is local

All of this is super interesting (I'll admit I'm behind the curve).
