enable camera functionality and call aliyun vision llm api for furthe… #240

Open
wants to merge 2 commits into
base: main
from

cyphercheng

…r chatting

  1. Apply for a DashScope API key at https://bailian.console.aliyun.com/?apiKey=1
  2. Export the key as a local environment variable, for example $env:DASHSCOPE_API_KEY="sk-xxx" (PowerShell) or
    export DASHSCOPE_API_KEY=sk-xxx
  3. Check that the key was exported successfully with echo $env:DASHSCOPE_API_KEY (PowerShell) or
    echo $DASHSCOPE_API_KEY
  4. Run idf.py menuconfig and enable CONFIG_ESP_TLS_INSECURE and CONFIG_ESP_TLS_SKIP_SERVER_CERT_VERIFY if they are not already selected.
  5. Run idf.py menuconfig and select the esp_sparkbot board, since the feature has only been tested on this hardware.
  6. Run idf.py flash monitor to try the demo.
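
Steps 2 and 3 can also be checked programmatically on the host. The following is a minimal sketch, not code from this PR: the function names are hypothetical, and the "sk-" prefix check is only an assumption based on the sk-xxx placeholder above. The thread does not show how the firmware build consumes the variable.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: returns true when the value is non-null, starts with
// the "sk-" prefix (assumed from the sk-xxx placeholder) and has at least one
// character after it.
bool LooksLikeDashscopeKey(const char* value) {
    if (value == nullptr) return false;
    const std::string key(value);
    return key.size() > 3 && key.compare(0, 3, "sk-") == 0;
}

// Hypothetical helper: checks the variable named in step 2 in the current
// process environment.
bool DashscopeKeyIsExported() {
    return LooksLikeDashscopeKey(std::getenv("DASHSCOPE_API_KEY"));
}
```

Calling DashscopeKeyIsExported() before a build would catch a key that was exported in a different shell session than the one running idf.py.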

…local machine for programming.

Run this script in the IDF virtual environment:
$ python scripts/esp32_remote_burn.py -p COMx
@yusuhua
Contributor

yusuhua commented Mar 1, 2025

Very interesting, I think I can test it for you on my LilyGo T-CameraPlus-S3 board.

@yusuhua
Contributor

yusuhua commented Mar 1, 2025

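Since the code comments are in Chinese, I'll switch from English to Chinese.
The feature works fine on the LilyGo T-CameraPlus-S3. However, as your own comment in the "take photo" command says, running it interferes with the conversation, and I hope this can be optimized.

/* TODO: need to move this block in another task rather than the background task in Application main loop, otherwise it will block the TTS playback task. */

It would be even better if the captured photo could be shown on the screen as a background image (it could be cleared at the next round of conversation), so the user can see what was actually captured.

Test steps:

  1. Update the local xiaozhi-esp32 checkout to the latest version (which supports the LilyGo T-CameraPlus-S3)
  2. Merge your code
  3. Modify config.h and lilygo-t-cameraplus-s3.cc under main/boards/lilygo-t-cameraplus-s3 (mine is the OV2640 version)
    Attachment: 改动后的文件.zip (the modified files)
  4. Build and flash
  5. Talk to xiaozhi using the commands defined in main/iot/things/camera.cc

One small detail, in case you have seen it too: after the very first flash, I turned on the camera and gave the "take photo" command, and the system rebooted. The next attempt worked. I did not have the serial monitor open the first time, so I saw no logs, and once I opened the serial monitor I could not reproduce it. My guess is that it happened because my Alibaba Bailian key was newly issued.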

@cyphercheng
Author

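Thanks for testing and for the feedback. I will discuss the main-loop blocking issue with the original repository author later; it will probably require changes to the logic in the main loop.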
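I have also run into the reboot on first use. The main cause is that on many open-source ESP32-S3 boards the PWDN and RESET pins of the camera interface are not wired individually to the ESP32-S3, so the first frame after initialization can fail with errors such as VSYNC being out of sync or "No SOI". Fixing the hardware would be the better solution.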
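
Where the hardware cannot be changed, a software guard against a bad first frame could look roughly like the sketch below. It assumes the camera delivers JPEG data and that a bad frame is returned to the caller rather than crashing the driver; it does not address the reboot itself. The capture callback is a hypothetical stand-in for the real driver call, and both function names are illustrative.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

using Frame = std::vector<std::uint8_t>;

// A JPEG stream begins with the SOI (start of image) marker 0xFF 0xD8.
bool StartsWithJpegSoi(const Frame& frame) {
    return frame.size() >= 2 && frame[0] == 0xFF && frame[1] == 0xD8;
}

// Calls `capture` up to `max_attempts` times and returns the first frame that
// starts with SOI. Returns an empty frame if every attempt fails.
// `capture` is a hypothetical stand-in for the real camera driver call.
Frame CaptureValidJpeg(const std::function<Frame()>& capture, int max_attempts) {
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        Frame frame = capture();
        if (StartsWithJpegSoi(frame)) return frame;
    }
    return Frame{};
}
```

Discarding frames this way only hides the symptom, which is consistent with the point above that a hardware fix is preferable.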

@tju-wang

tju-wang commented Mar 14, 2025

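Mark! So this already implements voice-controlled photo capture, with the captured image uploaded to aliyun for image content recognition? Would you consider using the ESP-WHO library for face detection and face recognition later on? It would need checking whether boards like the S3 have enough memory and other resources to run both xiaozhi and local face recognition.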
