Benefit of Distraction

Convolutional attention networks are the current state of the art for obtaining physiological signals from video. In addition to providing accurate physiological estimates, the attention masks also shed light on which regions of the video are clean and contribute to the signal. Although the regions ignored by the attention masks may not contain the pulsatile signal of interest, they may contain noise that is likely similar to the noise present in the attention region.

We use these ignored regions to obtain a noise estimate that resembles the noise present in the skin regions. We train an LSTM model jointly with a convolutional attention network to learn a denoising mapping that recovers cleaner signals. Our method significantly outperforms the state of the art, obtaining up to 40% lower mean absolute error in heart rate estimation and up to 30% lower mean absolute error in respiration rate estimation. Our model, trained only on RGB videos, also generalizes to near-infrared videos without any additional training.
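
The idea of reading a noise estimate from the ignored regions can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `masked_traces`, the array shapes, and the choice of `1 - attention` as the inverse mask are all assumptions made for illustration. It only shows how one trace could be pooled from the attended pixels and another from the ignored pixels.

```python
import numpy as np

def masked_traces(frames, attention, eps=1e-8):
    """Pool a signal trace and a noise trace from a video (hypothetical helper).

    frames:    array of shape (T, H, W), per-pixel intensity over time
    attention: array of shape (T, H, W), mask values in [0, 1]
    Returns two arrays of shape (T,): the attention-weighted spatial mean
    and the inverse-attention-weighted spatial mean.
    """
    # Assumption: the "ignored" regions are weighted by 1 - attention.
    inverse = 1.0 - attention

    # Weighted spatial average inside the attended region (signal plus noise).
    signal = (frames * attention).sum(axis=(1, 2)) / (attention.sum(axis=(1, 2)) + eps)

    # Weighted spatial average over the ignored region (noise estimate).
    noise = (frames * inverse).sum(axis=(1, 2)) / (inverse.sum(axis=(1, 2)) + eps)
    return signal, noise
```

In the method described above, the two traces would be passed to a learned LSTM denoiser rather than combined by a fixed rule such as subtraction; the sketch covers only the pooling step.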