Abstract:
Student emotion recognition in classroom and field practice training scenarios is a frequently encountered problem in
neuroeducation. Multichannel EEG emotion recognition can achieve high reliability by exploiting spatial distribution
information; however, wearable and real-world deployments are typically constrained to single-electrode acquisition,
leading to pronounced performance degradation. To address both this degradation and the mismatch between training
and deployment inputs, given the limited expressiveness of a single channel,
this study proposes a deterministic sparse-gated framework for multichannel-to-single-channel knowledge transfer. A
multichannel teacher is first trained on full-channel data; a sparsemax-parameterized global sparse gate then forms a
convex combination of channels to produce a single-channel surrogate input and is progressively hardened to a
one-hot distribution, yielding deterministic channel compression without relying on stochastic sampling-based relaxations.
Beyond temperature-scaled logit distillation, we further introduce a physiology-aware band-consistency distillation
objective that, under a 4–45 Hz preprocessing setting, aligns θ/α/β/γ band-wise spectral characteristics aggregated in
the teacher with those observable from a single channel in the student, thereby mitigating performance loss caused by
insufficient access to global rhythmic information in the single-channel setting. Finally, the student model is retrained
under a fixed single-channel input to reduce discrepancies between the surrogate input used during training and the
real single-channel input encountered at deployment. Experiments on the DEAP dataset for valence and arousal binary
classification show that, on the Arousal task, the proposed method performs better and more stably overall, with a more
reasonable error structure, improving balanced accuracy and accuracy over fixed-channel and logit-only
baselines while producing a rapidly hardened, deployment-interpretable channel choice. The results also reveal
task-specific differences: for Valence, channel selection and prediction bias are more sensitive, and the proposed method
exhibits a more conservative “high TNR, low recall” pattern, suggesting that the single-channel information ceiling and class
imbalance can substantially affect minority-class recognition. The novelty lies in unifying deterministic sparse-gated
channel compression, physiology-informed band-level consistency constraints, and deployment-consistent retraining
within an end-to-end framework, providing a reproducible and deployable pathway for lightweight single-channel EEG
emotion recognition in neuroeducation.
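
As an illustration of the gating step described above, the following is a minimal NumPy sketch of how a sparsemax gate can form a convex combination of channels and collapse to a one-hot choice. It is not the authors' implementation. The function names (`sparsemax`, `gate_channels`), the toy scores, and the use of a `temperature` parameter to harden the gate are assumptions made for this example; the abstract does not specify the hardening schedule.

```python
import numpy as np

def sparsemax(z):
    """Euclidean projection of a 1-D score vector onto the probability simplex."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]                # scores in descending order
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    support = 1.0 + k * z_sorted > cumsum      # True for entries kept in the support
    k_max = k[support][-1]                     # size of the support
    tau = (cumsum[k_max - 1] - 1.0) / k_max    # threshold subtracted from every score
    return np.maximum(z - tau, 0.0)

def gate_channels(eeg, scores, temperature=1.0):
    """Collapse (channels, samples) EEG into one surrogate channel.

    `temperature` is a hypothetical hardening knob: smaller values push the
    gate weights toward a one-hot distribution.
    """
    weights = sparsemax(np.asarray(scores, dtype=float) / temperature)
    return weights @ eeg, weights

# Toy data: 3 channels, 4 samples each (values are arbitrary).
eeg = np.array([[1.0, 2.0, 3.0, 4.0],
                [0.0, 1.0, 0.0, 1.0],
                [5.0, 5.0, 5.0, 5.0]])
scores = [1.0, 0.5, -1.0]                      # assumed learnable gate scores

soft_signal, soft_w = gate_channels(eeg, scores, temperature=1.0)
hard_signal, hard_w = gate_channels(eeg, scores, temperature=0.1)
print(soft_w)   # sparse convex weights: [0.75 0.25 0.  ]
print(hard_w)   # hardened gate: [1. 0. 0.]
```

Unlike softmax, sparsemax can assign exactly zero weight to a channel, so the gate reaches a true one-hot selection deterministically; in this sketch the hardened surrogate signal equals the first channel exactly.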