Commit

Merge branch 'main' of https://github.com/SMIL-SPCRAS/AVCR-Net into main
MiSTeR1995 committed Feb 5, 2024
2 parents b4f5b2c + 213c646 commit fde6417
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -1,6 +1,6 @@
-# AVCR-Net
+# AVCR-Former

-The official repository for AVCR-Net
+The official repository for AVCR-Former

## Abstract
> The article presents a methodology and evaluation for audio-visual speech recognition in driver assistive systems. Driver assistive systems require continuous interaction with the driver, and for safety reasons this interaction should rely on voice control while driving. The article introduces the audio-visual command recognition transformer (AVCR-Former), specifically designed for robust audio-visual speech recognition (AVSR). We propose (1) a multimodal fusion strategy based on spatio-temporal fusion of audio and video feature matrices, (2) a regulated transformer based on an iterative model refinement module with multiple encoders, and (3) a classifier ensemble strategy based on multiple decoders. The spatio-temporal fusion strategy preserves the contextual information of both modalities and achieves their synchronization. The iterative model refinement module can bridge the gap between acoustic and visual data by compensating for the weaknesses of unimodal information. The proposed multi-prediction strategy demonstrates superior performance compared to the traditional single-prediction strategy, showcasing the model's adaptability across diverse audio-visual contexts. Our proposed transformer achieved the highest accuracy, reaching 98.87% and 98.81% on the RUSAVIC and LRW corpora, respectively. This research has significant implications for advancing human-computer interaction. The capabilities of AVCR-Former extend beyond AVSR, making it a valuable contribution to the intersection of audio-visual processing and artificial intelligence.