Audiovisual integration refers to the process by which visual and auditory information is integrated into a single multisensory event, and it can affect the encoding, maintenance, and retrieval processes of working memory. As multisensory research has deepened, multiple factors have been found to influence working memory, but the cognitive and neural mechanisms by which audiovisual integration affects working memory remain unclear. Audiovisual semantic congruency refers to the congruency of content and meaning between visual and auditory sensory inputs: the visual and auditory information complement and support each other rather than conflict with or contradict each other. Recent studies show that semantic congruency can improve multisensory working memory performance. In terms of cognitive mechanisms, future studies need to further explore the impact of audiovisual integration on working memory maintenance. They should also examine the effect of semantic congruency on multisensory working memory by jointly considering the properties of the materials and the effects of attention, and extend this research to people of different ages. In terms of neural mechanisms, future studies should further investigate the impact of attention on the working memory retrieval process and explore the roles of different brain regions in multisensory working memory, so as to establish a more complete theoretical model.