
Contribute Guide on Deep Learning for Gesture Recognition on ESP32-S3 #350

Open · wants to merge 39 commits into base: main

Conversation

BlakeHansen130

Description

Hello Espressif team,

This PR adds an article discussing the implementation of deep learning-based gesture recognition on the ESP32-S3, covering the journey from training to deployment. The goal of this article is to provide developers with a complete guide, helping them understand:

  • How to train a deep learning model for gesture recognition.
  • How to optimize the model for embedded deployment on ESP32-S3.
  • How to deploy and run the model on the ESP32-S3.

I believe this article could serve as a valuable resource for developers looking to apply machine learning to embedded systems, and I'm eager to contribute to the Espressif community.

Related

This contribution is related to the discussion in issue #344. Please feel free to refer to it for more context.

Testing

The article has undergone thorough grammar and content reviews to ensure clarity and accuracy. I have also tested the example code provided in the guide and verified that the steps work properly in a typical setup.

Checklist

  • 🚨 This PR does not introduce breaking changes.
  • All CI checks (GH Actions) pass.
  • Documentation is updated as needed.
  • Tests are updated or added as necessary.
  • Code is well-commented, especially in complex areas.
  • Git history is clean — commits are squashed to the minimum necessary.

Please let me know if there are any areas where further improvements are needed or if there is additional information I can provide to make this article more helpful. I am open to any feedback and happy to make modifications as required.

Thank you very much for your time and consideration.

Best regards,
BlakeHansen130

@pedrominatel pedrominatel added the needs review Needs someone to be assigned to review label Dec 4, 2024
@pedrominatel (Member)

Thank you very much @BlakeHansen130 for your contribution!
We will start the review process, and as soon as it's approved, this article will be published.

@BlakeHansen130 (Author)

Thank you very much for starting the review process! I'll keep an eye out for any feedback and will be happy to make any changes necessary.

@@ -0,0 +1,382 @@
---
title: "Deep Learning for Gesture Recognition on ESP32-S3 from Training to Deployment"
Member

I suggest changing the title to:

"Deep Learning for Gesture Recognition on ESP32-S3: From Training to Deployment"

- ESP-DL
- Model Quantization
---
Integrating deep learning capabilities into embedded systems has become a crucial aspect of modern IoT applications. Although powerful deep learning models can achieve high recognition accuracy, deploying these models on resource-constrained devices poses considerable challenges. This article presents a gesture recognition system based on the ESP32-S3, detailing the entire workflow from model training to deployment on embedded hardware. The complete project implementation and code are available at [gesture-recognition-model](https://github.com/BlakeHansen130/gesture-recognition-model). By utilizing ESP-DL(master branch) and incorporating efficient quantization strategies with ESP-PPQ, this study demonstrates the feasibility of achieving gesture recognition on resource-limited devices while maintaining satisfactory accuracy. Additionally, insights and methodologies were inspired by the work described in [Espressif's blog on hand gesture recognition](https://developer.espressif.com/blog/hand-gesture-recognition-on-esp32-s3-with-esp-deep-learning/), which significantly influenced the approach taken in this article.
Member

Add an extra line.

```
include($ENV{IDF_PATH}/tools/cmake/project.cmake)
project(gesture_recognition)
```
**Note:** Ensure that CMake can locate the esp-dl library cloned from GitHub by using relative paths to reference the esp-dl directory.
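As a sketch of what such a top-level CMakeLists.txt could look like (the `../esp-dl/esp-dl` relative path is an assumption about where the cloned repository sits, not something stated in the article; adjust it to your layout):

```cmake
# Hypothetical top-level CMakeLists.txt for the gesture_recognition project.
# The relative path below assumes esp-dl was cloned next to the project
# directory; point EXTRA_COMPONENT_DIRS at wherever your clone actually is.
cmake_minimum_required(VERSION 3.16)

set(EXTRA_COMPONENT_DIRS "../esp-dl/esp-dl")

include($ENV{IDF_PATH}/tools/cmake/project.cmake)
project(gesture_recognition)
```

`EXTRA_COMPONENT_DIRS` is the standard ESP-IDF mechanism for pulling in components that live outside the project tree, which is what the note above is getting at.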
Member

Extra line.

@@ -0,0 +1,382 @@
---
Member

Could you add the featured image? If you have questions about the image, please ask here.
The file should be placed at the same level as this MD file and the name should contain features and the format MUST be Webp.

@@ -0,0 +1,382 @@
---
title: "Deep Learning for Gesture Recognition on ESP32-S3 from Training to Deployment"
date: 2024-11-30
Member

The date should be updated to the expected publication date. We will ask you to change it as soon as this PR is approved.

…By utilizing ESP-DL(master branch) and incorporating efficient quantization strategies with ESP-PPQ…
Member

Add a link to ESP-DL(master branch).

[ESP-DL](https://github.com/espressif/esp-dl)

…incorporating efficient quantization strategies with ESP-PPQ…
Member

Same for the esp-ppq.


The development process requires two distinct Conda environments to handle different stages of the workflow. The primary training environment, designated as 'dl_env', manages dataset preprocessing, model training, and basic evaluation tasks. A separate quantization environment, 'esp-dl', is specifically configured for model quantization, accuracy assessment, and ESP-DL format conversion.

For the deployment phase, ESP-IDF version 5.x is used, with specific testing conducted on v5.3.1. The implementation relies on the master branch of ESP-DL and ESP-PPQ for enhanced quantization capabilities. The specific versions used in this implementation can be obtained through:
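One way to capture the exact versions in use is sketched below. These commands are an illustration, not necessarily the ones the author used; they assume ESP-IDF is on the PATH and that esp-dl and esp-ppq are present as git clones in the current directory.

```shell
# Record the toolchain state for reproducibility (illustrative commands).
idf.py --version                       # ESP-IDF, e.g. ESP-IDF v5.3.1
git -C esp-dl rev-parse --short HEAD   # commit of the esp-dl master branch
git -C esp-ppq rev-parse --short HEAD  # commit of the esp-ppq master branch
```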
Member

To avoid confusion, please phrase it this way:

The ESP-IDF version 5.3.1 is used for the deployment and testing phase.

@@ -0,0 +1,382 @@
---
Member

The article folder should be placed in the following structure:

└── content
    └── blog
        └── 2024
            └── 12

So, move the article folder inside the blog/2024/12/ folder.
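The move described above can be sketched as follows; `<article-folder>` is a placeholder for the article's long folder name, not a literal path.

```shell
# Create the target structure for December 2024 articles.
mkdir -p content/blog/2024/12

# Move the article folder into place (run inside the repository root);
# <article-folder> stands for the article's real folder name.
# git mv <article-folder> content/blog/2024/12/

test -d content/blog/2024/12 && echo "target folder ready"
```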

…-from-Training-to-Deployment/index.md to content/blog/2024/12/Deep-Learning-for-Gesture-Recognition-on-ESP32-S3-from-Training-to-Deployment/index.md
…ESP32-S3-from-Training-to-Deployment/features.webp
@BlakeHansen130 (Author)

Thanks for the suggested fixes! I've updated the guide according to the review comments. Ready to merge after your approval.


🎉 A preview for this PR is available at: https://preview-developer.espressif.com/pr350/

@BlakeHansen130 (Author)

Sorry @pedrominatel, I accidentally clicked the refresh button on your review. I've already updated the date as well.


## Experimental Results

The gesture recognition system on ESP32-S3 showed strong performance across multiple metrics. Both quantitative and qualitative evaluations were conducted.
@horw (Member) commented Dec 20, 2024

If multiple metrics are mentioned here, they should also be specified somewhere in the text, along with an indication of what can be considered strong performance.

@BlakeHansen130 (Author)

Thank you for your review and valuable comments, @horw. I have addressed your feedback with the following updates:

  1. Updated Model Definition: I added the detailed modified code in the model definition section to provide more clarity.
  2. Adjusted Subheadings: Some subheading titles have been revised to make them more appropriate and consistent with the content.
  3. Clarification on UART: I explained the relevance of UART to the article, as it is related to the console output for debugging and configuring the ESP32-S3 during deployment.
  4. Additional Visuals: I included comparison charts of quantization results, images showing the model’s performance on the ESP32-S3, and a comprehensive workflow diagram.

Please let me know if there’s anything further I can clarify or improve. Thank you again for your insightful feedback!


…ESP32-S3-from-Training-to-Deployment/img/features.webp to content/blog/2024/12/Deep-Learning-for-Gesture-Recognition-on-ESP32-S3-from-Training-to-Deployment/features.webp

@horw (Member) commented Dec 24, 2024

@BlakeHansen130 Thank you for the update. Overall, it is better than the previous version. However, for your further contributions, I would suggest the following improvements in my opinion:

  • The title and content should be more related.
  • The structure should be easier to follow.
  • The content should address the problem. If it's not relevant, perhaps you don’t need to mention it.
  • When you mention something, you need to explain it, even if briefly.
  • Avoid using general terms like "strong," "best," or "significantly." Provide concrete numbers; otherwise, the information is not valuable.
  • Include code snippets only if they are very important. For example, if you're demonstrating a model's structure, it may be better to use an image.
  • When you mention parameters, such as for deep learning layers or hyperparameters, try to explain why these numbers work better than others.

Consider whether your article offers enough information to effectively help readers solve the problem. Consider how your code snippets and images can help. When your article contains enough information to solve the related problem and helps me fully understand the steps you took and why, it will be a very valuable article.

P.S. In the ML field, I like how Andrej Karpathy explains things; his explanation of transformer models is a good example, as are 3Blue1Brown's videos.

@f-hollow (Collaborator)

The folder name (URL slug) looks long; I suggest shortening it from deep-learning-for-gesture-recognition-on-esp32-s3-from-training-to-deployment/ to esp32-s3-deep-learning-gesture-recognition.

@f-hollow (Collaborator) commented Dec 24, 2024

Hi @BlakeHansen130 ,

I am Kirill - a technical writer at Espressif. I do not possess sufficient technical knowledge in your particular area, but I have some experience structuring knowledge, shaping content, and presenting it to readers, both in technical documentation and articles for developers.

I have read through your article and checked your project's repo. From what I understand, your article describes the methodology that you used to implement your project and gives some insights into the steps you took and techniques you employed.

However, the purpose of your article is still not very clear. Let me explain what I mean.

You state clearly the following:

This article provides an overview of the complete development process for a gesture recognition system, encompassing dataset preparation, model training, and deployment.

I see that you provide an overview of the development process. However, you cover only certain sides of it, and other sides are only briefly touched on with phrases such as "carefully designed to achieve optimal performance", "implements a sophisticated memory management strategy", and "strategically allocated to optimize performance".

Usually before you write content, you decide on your target audience and the goal you want to achieve. I can see that your article might have the following potential purposes:

  1. Science
    • Target audience: Scientific researchers looking for insights into optimizing the algorithms to enable running AI applications on Espressif SoCs.
    • Your goal: Share your achievements and benchmarks in optimizing algorithms and data processing.
  2. Teaching
    • Target audience: Embedded developers who are interested in implementing AI applications on Espressif SoCs.
    • Your goal: Provide an easy-to-follow (ideally, as much as possible) tutorial that a regular developer can follow and achieve the same results as you have achieved.
  3. Advertising
    • Target audience: Potential buyers or investors for your product.
    • Your goal: Demonstrate what you achieved and how it can be used commercially in order to earn income or attract investments.

Let's quickly go through all the purposes and how they compare against your current state of the article:

  1. Science: I am not very experienced with scientific and academic writing, but I can assume that you need to focus on certain achievements, present the current state-of-the-art benchmarks from reputable sources and share your benchmarks. In this way, it will be clear that you have proven methods to achieve better results that might be of interest to this target audience.
  2. Teaching: You only provide broad statements and a repo with data and code. A regular developer would have no idea what to do to train their own model and deploy it on an Espressif SoC. If this is what you want, you need to turn your article into a tutorial: a number of reproducible steps that a developer can follow to achieve the goal of having a binary on their ESP32 that does gesture recognition, without requiring too much research on the part of the developer (that's exactly what tutorials are for).
  3. Advertising: If you want to advertise your advances, first and foremost, you need to provide some evidence that demonstrates your claims: the ESP32 recognizing your hand gestures in real time. For that, you might need to provide a video demonstration and maybe a way to download your binary and flash it onto a chip, so that others can try it and confirm that it works.

Now, after this analysis, let's get back to your article. Can you please tell me who your target audience is and what your goal is?

  • If you don't want to look at it this way, please share your vision on what you want to achieve with your content.
  • If you simply didn't think about the purpose, it is never too late to decide on what you want and restructure your article accordingly. I will gladly help you with that as an editor if your purpose still fits the interests of the Espressif Developer Portal.

Now please let me know what you think.

@BlakeHansen130 (Author)

@f-hollow Thank you for the feedback! During my undergraduate design project, I was inspired by Bukharai's gesture recognition tutorial. After successfully implementing the complete workflow with newer versions of esp-idf, esp-dl and esp-ppq (as discussed in alibukharai/Blogs#10), I wanted to share this article to help others exploring similar implementations.

…s in the article, added some experimental data and model visualization picture
@BlakeHansen130 (Author)

@horw Thank you for your feedback. Here are my implemented changes:

  1. Updated the title to better match content
  2. Added detailed model structure images
  3. Replaced subjective terms with concrete metrics

Regarding the structure - I've maintained it as is because it directly mirrors my project's implementation workflow. Each section progresses naturally from data preprocessing through model development and quantization to deployment, just like the organized hierarchy in my code repository. This parallel between article structure and actual development process helps readers understand both the theoretical concepts and practical implementation steps simultaneously.

I'm also a fan of 3Blue1Brown's excellent teaching style and continue learning from his videos; I'm still working toward that level of clarity in my own explanations.


Labels: needs review · Projects: none yet · 4 participants