
Contribute Guide on Deep Learning for Gesture Recognition on ESP32-S3 #350

Open · wants to merge 39 commits into base: main

Conversation

BlakeHansen130

Description

Hello Espressif team,

This PR adds an article discussing the implementation of deep learning-based gesture recognition on the ESP32-S3, covering the journey from training to deployment. The goal of this article is to provide developers with a complete guide, helping them understand:

  • How to train a deep learning model for gesture recognition.
  • How to optimize the model for embedded deployment on ESP32-S3.
  • How to deploy and run the model on the ESP32-S3.

I believe this article could serve as a valuable resource for developers looking to apply machine learning to embedded systems, and I'm eager to contribute to the Espressif community.

Related

This contribution is related to the discussion in issue #344. Please feel free to refer to it for more context.

Testing

The article has undergone thorough grammar and content reviews to ensure clarity and accuracy. I have also tested the example code provided in the guide and verified that the steps work properly in a typical setup.

Checklist

  • 🚨 This PR does not introduce breaking changes.
  • All CI checks (GH Actions) pass.
  • Documentation is updated as needed.
  • Tests are updated or added as necessary.
  • Code is well-commented, especially in complex areas.
  • Git history is clean — commits are squashed to the minimum necessary.

Please let me know if there are any areas where further improvements are needed or if there is additional information I can provide to make this article more helpful. I am open to any feedback and happy to make modifications as required.

Thank you very much for your time and consideration.

Best regards,
BlakeHansen130

@pedrominatel pedrominatel added the needs review Needs someone to be assigned to review label Dec 4, 2024
@pedrominatel (Member)

Thank you very much @BlakeHansen130 for your contribution!
We will start the review process, and as soon as it's approved, this article will be published.

@BlakeHansen130 (Author)

Thank you very much for starting the review process! I'll keep an eye out for any feedback and will be happy to make any changes necessary.

@@ -0,0 +1,382 @@
---
title: "Deep Learning for Gesture Recognition on ESP32-S3 from Training to Deployment"
Member

I suggest changing the title to:

"Deep Learning for Gesture Recognition on ESP32-S3: From Training to Deployment"

- ESP-DL
- Model Quantization
---
Integrating deep learning capabilities into embedded systems has become a crucial aspect of modern IoT applications. Although powerful deep learning models can achieve high recognition accuracy, deploying these models on resource-constrained devices poses considerable challenges. This article presents a gesture recognition system based on the ESP32-S3, detailing the entire workflow from model training to deployment on embedded hardware. The complete project implementation and code are available at [gesture-recognition-model](https://github.com/BlakeHansen130/gesture-recognition-model). By utilizing ESP-DL(master branch) and incorporating efficient quantization strategies with ESP-PPQ, this study demonstrates the feasibility of achieving gesture recognition on resource-limited devices while maintaining satisfactory accuracy. Additionally, insights and methodologies were inspired by the work described in [Espressif's blog on hand gesture recognition](https://developer.espressif.com/blog/hand-gesture-recognition-on-esp32-s3-with-esp-deep-learning/), which significantly influenced the approach taken in this article.
Member

Add an extra line.

```
include($ENV{IDF_PATH}/tools/cmake/project.cmake)
project(gesture_recognition)
```
**Note:** Ensure that CMake can locate the esp-dl library cloned from GitHub by using relative paths to reference the esp-dl directory.
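As a sketch of what such a top-level CMakeLists.txt could look like (the `../esp-dl/esp-dl` relative path is an assumption about where the cloned repository sits, not something stated in the article; adjust it to your layout):

```cmake
# Hypothetical top-level CMakeLists.txt for the gesture_recognition project.
# The relative path below assumes esp-dl was cloned next to the project
# directory; point EXTRA_COMPONENT_DIRS at wherever your clone actually is.
cmake_minimum_required(VERSION 3.16)

set(EXTRA_COMPONENT_DIRS "../esp-dl/esp-dl")

include($ENV{IDF_PATH}/tools/cmake/project.cmake)
project(gesture_recognition)
```

`EXTRA_COMPONENT_DIRS` is the standard ESP-IDF mechanism for pulling in components that live outside the project tree, which is what the note above is getting at.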
Member

Extra line.

@@ -0,0 +1,382 @@
---
Member

Could you add the featured image? If you have questions about the image, please ask here.
The file should be placed at the same level as this MD file and the name should contain features and the format MUST be Webp.

@@ -0,0 +1,382 @@
---
title: "Deep Learning for Gesture Recognition on ESP32-S3 from Training to Deployment"
date: 2024-11-30
Member

The date should be updated to the expected publication date. We will ask you to change it as soon as this PR is approved.

…By utilizing ESP-DL(master branch) and incorporating efficient quantization strategies with ESP-PPQ…
Member

Add a link to ESP-DL(master branch).

[ESP-DL](https://github.com/espressif/esp-dl)

…incorporating efficient quantization strategies with ESP-PPQ…
Member

Same for the esp-ppq.


The development process requires two distinct Conda environments to handle different stages of the workflow. The primary training environment, designated as 'dl_env', manages dataset preprocessing, model training, and basic evaluation tasks. A separate quantization environment, 'esp-dl', is specifically configured for model quantization, accuracy assessment, and ESP-DL format conversion.

For the deployment phase, ESP-IDF version 5.x is used, with specific testing conducted on v5.3.1. The implementation relies on the master branch of ESP-DL and ESP-PPQ for enhanced quantization capabilities. The specific versions used in this implementation can be obtained through:
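One way to capture the exact versions in use is sketched below. These commands are an illustration, not necessarily the ones the author used; they assume ESP-IDF is on the PATH and that esp-dl and esp-ppq are present as git clones in the current directory.

```shell
# Record the toolchain state for reproducibility (illustrative commands).
idf.py --version                       # ESP-IDF, e.g. ESP-IDF v5.3.1
git -C esp-dl rev-parse --short HEAD   # commit of the esp-dl master branch
git -C esp-ppq rev-parse --short HEAD  # commit of the esp-ppq master branch
```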
Member

To avoid confusion, please phrase it this way:

The ESP-IDF version 5.3.1 is used for the deployment and testing phase.

@@ -0,0 +1,382 @@
---
Member

The article folder should be placed in the following structure:

└── content
    └── blog
        └── 2024
            └── 12

So, move the article folder inside the blog/2024/12/ folder.
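The move described above can be sketched as follows; `<article-folder>` is a placeholder for the article's long folder name, not a literal path.

```shell
# Create the target structure for December 2024 articles.
mkdir -p content/blog/2024/12

# Move the article folder into place (run inside the repository root);
# <article-folder> stands for the article's real folder name.
# git mv <article-folder> content/blog/2024/12/

test -d content/blog/2024/12 && echo "target folder ready"
```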

…-from-Training-to-Deployment/index.md to content/blog/2024/12/Deep-Learning-for-Gesture-Recognition-on-ESP32-S3-from-Training-to-Deployment/index.md
…ESP32-S3-from-Training-to-Deployment/features.webp
@BlakeHansen130 (Author)

Thanks for the suggested fixes! I've updated the guide according to the review comments. Ready to merge after your approval.


🎉 A preview for this PR is available at: https://preview-developer.espressif.com/pr350/

@BlakeHansen130 (Author)

Sorry @pedrominatel, I accidentally clicked the refresh button on your review. I've already updated the date as well.


## Experimental Results

The gesture recognition system on ESP32-S3 showed strong performance across multiple metrics. Both quantitative and qualitative evaluations were conducted.
@horw (Member) commented Dec 20, 2024

If multiple metrics are mentioned here, they should also be specified somewhere in the text, along with an indication of what can be considered strong performance.

@BlakeHansen130 (Author)

Thank you for your review and valuable comments, @horw. I have addressed your feedback with the following updates:

  1. Updated Model Definition: I added the detailed modified code in the model definition section to provide more clarity.
  2. Adjusted Subheadings: Some subheading titles have been revised to make them more appropriate and consistent with the content.
  3. Clarification on UART: I explained the relevance of UART to the article, as it is related to the console output for debugging and configuring the ESP32-S3 during deployment.
  4. Additional Visuals: I included comparison charts of quantization results, images showing the model’s performance on the ESP32-S3, and a comprehensive workflow diagram.

Please let me know if there’s anything further I can clarify or improve. Thank you again for your insightful feedback!


…ESP32-S3-from-Training-to-Deployment/img/features.webp to content/blog/2024/12/Deep-Learning-for-Gesture-Recognition-on-ESP32-S3-from-Training-to-Deployment/features.webp

@horw (Member) commented Dec 24, 2024

@BlakeHansen130 Thank you for the update. Overall, it is better than the previous version. However, for your further contributions, I would suggest the following improvements in my opinion:

  • The title and content should be more related.
  • The structure should be easier to follow.
  • The content should address the problem. If it's not relevant, perhaps you don’t need to mention it.
  • When you mention something, you need to explain it, even if briefly.
  • Avoid using general terms like "strong," "best," or "significantly." Provide concrete numbers; otherwise, the information is not valuable.
  • Include code snippets only if they are very important. For example, if you're demonstrating a model's structure, it may be better to use an image.
  • When you mention parameters, such as for deep learning layers or hyperparameters, try to explain why these numbers work better than others.

Consider whether your article offers enough information to effectively help readers solve the problem. Consider how your code snippets and images can help. When your article contains enough information to solve the related problem and helps me fully understand the steps you took and why, it will be a very valuable article.

P.S. In the ML field, I like how Andrej Karpathy explains things; his explanation of transformer models is a good example, as are 3Blue1Brown's videos.

@f-hollow (Collaborator)

The folder name (URL slug) looks long; I suggest shortening it from deep-learning-for-gesture-recognition-on-esp32-s3-from-training-to-deployment/ to esp32-s3-deep-learning-gesture-recognition.

@f-hollow (Collaborator) commented Dec 24, 2024

Hi @BlakeHansen130 ,

I am Kirill - a technical writer at Espressif. I do not possess sufficient technical knowledge in your particular area, but I have some experience structuring knowledge, shaping content, and presenting it to readers, both in technical documentation and articles for developers.

I have read through your article and checked your project's repo. From what I understand, your article describes the methodology that you used to implement your project and gives some insights into the steps you took and techniques you employed.

However, the purpose of your article is still not very clear. Let me explain what I mean.

You state clearly the following:

This article provides an overview of the complete development process for a gesture recognition system, encompassing dataset preparation, model training, and deployment.

I see that you provide an overview of the development process. However, you cover only certain sides of it, and other sides are only briefly touched on with phrases such as "carefully designed to achieve optimal performance", "implements a sophisticated memory management strategy", and "strategically allocated to optimize performance".

Usually before you write content, you decide on your target audience and the goal you want to achieve. I can see that your article might have the following potential purposes:

  1. Science
    • Target audience: Scientific researchers looking for insights into optimizing the algorithms to enable running AI applications on Espressif SoCs.
    • Your goal: Share your achievements and benchmarks in optimizing algorithms and data processing.
  2. Teaching
    • Target audience: Embedded developers who are interested in implementing AI applications on Espressif SoCs.
    • Your goal: Provide an easy-to-follow (ideally, as much as possible) tutorial that a regular developer can follow and achieve the same results as you have achieved.
  3. Advertising
    • Target audience: Potential buyers or investors for your product.
    • Your goal: Demonstrate what you achieved and how it can be used commercially in order to earn income or attract investments.

Let's quickly go through all the purposes and how they compare against your current state of the article:

  1. Science: I am not very experienced with scientific and academic writing, but I can assume that you need to focus on certain achievements, present the current state-of-the-art benchmarks from reputable sources and share your benchmarks. In this way, it will be clear that you have proven methods to achieve better results that might be of interest to this target audience.
  2. Teaching: You only provide broad statements and a repo with data and code. A regular developer would have no idea what to do to train their own model and deploy it on an Espressif SoC. If this is what you want, you need to turn your article into a tutorial: a number of reproducible steps that a developer can follow to achieve the goal of having a binary on their ESP32 that does gesture recognition, without requiring too much research on the part of the developer (that's exactly what tutorials are for).
  3. Advertising: If you want to advertise your advances, first and foremost, you need to provide some evidence that demonstrates your claims: the ESP32 recognizing your hand gestures in real time. For that, you might need to provide a video demonstration and maybe a way to download your binary and flash it onto a chip, so that others can try it and confirm that it works.

Now, after this analysis, let's get back to your article. Can you please tell me who your target audience is and what your goal is?

  • If you don't want to look at it this way, please share your vision on what you want to achieve with your content.
  • If you simply didn't think about the purpose, it is never too late to decide on what you want and restructure your article accordingly. I will gladly help you with that as an editor if your purpose still fits the interests of the Espressif Developer Portal.

Now please let me know what you think.

@BlakeHansen130 (Author)

@f-hollow Thank you for the feedback! During my undergraduate design project, I was inspired by Bukharai's gesture recognition tutorial. After successfully implementing the complete workflow with newer versions of esp-idf, esp-dl and esp-ppq (as discussed in alibukharai/Blogs#10), I wanted to share this article to help others exploring similar implementations.

…s in the article, added some experimental data and model visualization picture
@BlakeHansen130 (Author)

@horw Thank you for your feedback. Here are my implemented changes:

  1. Updated the title to better match content
  2. Added detailed model structure images
  3. Replaced subjective terms with concrete metrics

Regarding the structure - I've maintained it as is because it directly mirrors my project's implementation workflow. Each section progresses naturally from data preprocessing through model development and quantization to deployment, just like the organized hierarchy in my code repository. This parallel between article structure and actual development process helps readers understand both the theoretical concepts and practical implementation steps simultaneously.

I'm also a fan of 3Blue1Brown's excellent teaching style and continue learning from his videos; I'm still working toward that level of clarity in my own explanations.


Labels: needs review · Projects: none yet · 4 participants