| dc.description.abstract |
Surveillance video has been widely applied in residential communities, serving as a critical basis for abnormal situation
detection, security early warning, and emergency assistance. However, existing machine learning techniques have
limitations in video understanding for complex scenarios, such as recognizing relationships between people and objects,
inferring individual motivations and psychological states, and detecting early signs of crowd gathering, which leads to
deficiencies in early risk warning and in the timely identification of the rescue needs of special individuals. Human observers
bring rich experience and subconscious perceptual abilities to video understanding, and this subconscious activity
can be reflected in electroencephalogram (EEG) signals. To address the gaps in existing video understanding technologies,
this study proposes a neurofeedback reinforcement learning method based on single-channel EEG information that can
be deployed in lightweight human-machine collaborative video surveillance systems. This research unfolds along
three aspects: (1) constructing a novel multimodal dataset by collecting neural responses to perceived visual stimuli,
synchronizing community monitoring videos with single-channel EEG recordings from human annotators; (2) developing
a neuro-cognitive mapping that translates implicit human neural responses into effectively computable reward signals for
reinforcement learning (RL); (3) using the neural-derived reward as a supervisory signal and training with calibrated data to
enhance sensitivity to latent, anticipatory risk cues. The neural activation is obtained by extracting causal Theta (4–8 Hz) and
Gamma (30–42 Hz) envelopes and fusing them via robust baseline normalization. We then convert the standardized neural activation
into a bounded, probability-like reward through a sigmoid mapping, providing dense guidance during the pre-incident
“gradient vacuum” where visual labels are sparse and delayed. Experiments on real community surveillance videos
show that the proposed method significantly outperforms standalone machine learning approaches and yields a
longer warning lead time than vision-only triggers. This work offers a cost-effective approach to improving video
understanding capabilities and provides a viable pathway for human-machine collaborative monitoring in residential
communities. |
en_US |