Research

The Benefit of Distraction: Denoising Vitals using Inverse Attention

Convolutional attention networks are the current state of the art for obtaining physiological signals from video. In addition to providing accurate physiological estimates, the attention masks also shed light on which regions of the video are clean and contribute to the signal. Although the regions ignored by the attention masks may not contain the pulsatile signal of interest, they likely contain noise similar to the noise corrupting the attended skin regions.

We use these ignored regions to form an estimate of the noise corrupting the skin regions. We train an LSTM model, joined with a convolutional attention network, to learn a denoising mapping that recovers cleaner signals. Our method significantly outperforms the state of the art, obtaining up to 40% lower mean absolute error in heart rate estimation and up to 30% lower mean absolute error in respiration rate estimation. Our model, trained only on RGB videos, also generalizes to near-infrared videos without any additional training. [pdf]
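
As a rough illustration of the idea, the sketch below (not the paper's exact architecture) forms the noisy pulse estimate from the attended pixels, forms a noise estimate from the inverse-attention pixels, and feeds both into an LSTM denoiser. The attention network itself is assumed to exist elsewhere, and all shapes and sizes are illustrative assumptions.

```python
# Minimal inverse-attention sketch. Assumes `frames` is a (T, H, W) tensor of
# green-channel intensities and `attn` is a (T, H, W) attention mask in [0, 1]
# produced by a pretrained convolutional attention network (not shown here).
import torch
import torch.nn as nn

class InverseAttentionDenoiser(nn.Module):
    def __init__(self, hidden_size=128):
        super().__init__()
        # Input at each time step: [attended signal, noise estimate]
        self.lstm = nn.LSTM(input_size=2, hidden_size=hidden_size,
                            num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)  # one denoised sample per frame

    def forward(self, frames, attn):
        eps = 1e-6
        # Attention-weighted spatial average: the noisy pulse estimate.
        signal = (frames * attn).sum(dim=(1, 2)) / (attn.sum(dim=(1, 2)) + eps)
        # Inverse-attention average over the ignored regions: the noise estimate.
        inv = 1.0 - attn
        noise = (frames * inv).sum(dim=(1, 2)) / (inv.sum(dim=(1, 2)) + eps)
        x = torch.stack([signal, noise], dim=-1).unsqueeze(0)  # (1, T, 2)
        out, _ = self.lstm(x)
        return self.head(out).squeeze(-1).squeeze(0)  # (T,) denoised pulse

# Example with random stand-in data (T=300 frames, ~10 s at 30 fps).
frames = torch.rand(300, 64, 64)
attn = torch.rand(300, 64, 64)
denoised = InverseAttentionDenoiser()(frames, attn)
```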

Impact of Skin Type and Gender on Non-contact Photoplethysmography Measurements

Computer vision datasets are not balanced in terms of gender and ethnic diversity. Machine learning models trained on such datasets are likely to be biased towards certain genders and ethnic groups, potentially putting some groups of people at risk of inaccurate measurements. We performed a large meta-analysis to evaluate how much gender and skin tone affect vital signs estimation from video, for both signal-processing-based and supervised machine learning methods. We find that performance drops significantly on videos of people with very dark skin tones, especially for the machine learning algorithms. This work was done in collaboration with Daniel McDuff at Microsoft Research and was published in CVPR-CVPM 2020 [pdf].
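
The core of such an analysis reduces to stratifying estimation errors by group. Here is a minimal, hypothetical sketch; the column names, numbers, and Fitzpatrick-style grouping are illustrative assumptions, not the paper's protocol:

```python
# Compute heart-rate MAE separately for each skin-tone group (toy data).
import pandas as pd

df = pd.DataFrame({
    "hr_true":   [72, 80, 65, 90, 75, 68],
    "hr_est":    [73, 78, 66, 97, 76, 75],
    "skin_type": ["I-II", "III-IV", "I-II", "V-VI", "III-IV", "V-VI"],
})
df["abs_err"] = (df["hr_est"] - df["hr_true"]).abs()
print(df.groupby("skin_type")["abs_err"].mean())  # MAE per skin-tone group
```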

3D Face Tracking

We use 3D face tracking to estimate the position of facial landmarks with pixel-level accuracy, even in the presence of large head motion. We show that our method using 3D tracking performs better than standard 2D tracking. This work was done in collaboration with Prof. Hiroshi Kawasaki from Kyushu University and was published in EMBC 2020 [pdf].
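
To see why a 3D representation helps, the toy sketch below re-projects a 3D landmark into the image with a pinhole camera model under a head rotation, which stays consistent under large pose changes. The intrinsics, pose, and landmark coordinates are illustrative assumptions, not our tracker's actual parameters.

```python
import numpy as np

K = np.array([[800.0, 0.0, 320.0],    # focal length fx, principal point cx
              [0.0, 800.0, 240.0],    # fy, cy
              [0.0, 0.0, 1.0]])

def project(points_3d, R, t):
    """Project Nx3 points in head coordinates to Nx2 pixel coordinates."""
    cam = points_3d @ R.T + t          # rigid head pose: rotate, then translate
    uvw = cam @ K.T                    # apply camera intrinsics
    return uvw[:, :2] / uvw[:, 2:3]    # perspective divide

# One landmark 0.5 m in front of the camera with a 20-degree yaw rotation.
theta = np.deg2rad(20.0)
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
print(project(np.array([[0.0, 0.0, 0.0]]), R, t=np.array([0.0, 0.0, 0.5])))
```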

Overcoming video compression for camera-based vital signs

Video compression removes subtle spatial and temporal information to reduce file size. The removed information is often redundant and unimportant for visual video quality, but it contains crucial information about the minuscule intensity variations in the skin caused by varying blood volume. Therefore, the state-of-the-art imaging photoplethysmography (iPPG) methods fail to recover vital signs from highly compressed videos. We show that deep learning models can learn how noise at different video compression levels affects the iPPG signals and are able to reliably recover vital signs from highly compressed videos, even in the presence of large motion. This work was done in collaboration with Daniel McDuff at Microsoft Research and was published in ICCV-CVPM 2019 [pdf] [poster] and extended to a journal publication in Biomedical Optics Express in 2020 [pdf].
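
One plausible way to construct training data spanning many compression levels (an assumption for illustration, not necessarily the paper's exact pipeline) is to re-encode raw videos at a sweep of x264 constant-rate-factor (CRF) settings with ffmpeg, where a higher CRF means heavier compression:

```python
# Re-encode a raw source video at several compression levels with ffmpeg.
# Paths are hypothetical; requires ffmpeg with libx264 on the system.
import subprocess

SRC = "uncompressed_input.avi"  # hypothetical raw source video
for crf in (0, 12, 18, 24, 30, 36):  # CRF 0 is lossless, 36 is very lossy
    subprocess.run(
        ["ffmpeg", "-y", "-i", SRC,
         "-c:v", "libx264", "-crf", str(crf),
         f"train_crf{crf}.mp4"],
        check=True,
    )
```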

SparsePPG: camera-based vital signs in NIR for driver monitoring

MERL-Rice Near-Infrared Pulse (MR. NIRP) indoor and driving datasets: [download link].

The driver monitoring context poses several challenges for camera-based vital signs measurement that current remote photoplethysmography (rPPG) algorithms cannot account for: there are drastic illumination changes on the driver’s face, and the amount of motion during driving is significant.

We have built an active narrowband near-infrared (NIR) illumination system with a matching narrowband filter on the camera to significantly reduce the outside light variations reaching the driver’s face. Specifically, we have found that a 940 nm wavelength reduces sunlight effects the best. However, the SNR of rPPG signals is much lower in NIR than at visible wavelengths, making our system more prone to noise.

To account for the low SNR of rPPG signals recorded with NIR cameras, we developed an optimization-based rPPG signal tracking and denoising algorithm (SparsePPG) based on Robust Principal Component Analysis (RPCA) and sparse frequency spectrum estimation. We have collected data in the lab and in the car with both NIR and broadband RGB cameras, and we have shown that our NIR system performs better than RGB during driving and achieves accuracy comparable to the benchmark RGB camera in the lab.
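
To give a flavor of the sparse spectral estimation step, here is a toy sketch (not the published SparsePPG implementation): it fits a noisy pulse signal with a sparse combination of frequencies from a Fourier dictionary over the plausible heart-rate band, using ISTA to enforce sparsity. All signal and solver parameters are illustrative.

```python
import numpy as np

fps, T = 30.0, 300                      # 10 s of video at 30 fps
t = np.arange(T) / fps
y = np.sin(2 * np.pi * 1.2 * t) + 0.5 * np.random.randn(T)  # 72 bpm + noise

freqs = np.arange(0.7, 4.0, 0.01)       # 42-240 bpm candidate band
D = np.hstack([np.cos(2 * np.pi * np.outer(t, freqs)),
               np.sin(2 * np.pi * np.outer(t, freqs))])

def ista(D, y, lam=5.0, n_iter=500):
    """Iterative shrinkage-thresholding for the lasso problem."""
    L = np.linalg.norm(D, 2) ** 2       # Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        g = x - D.T @ (D @ x - y) / L   # gradient step on the data term
        x = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # soft threshold
    return x

x = ista(D, y)
amps = np.hypot(x[:len(freqs)], x[len(freqs):])  # amplitude per frequency
print(f"estimated heart rate: {freqs[np.argmax(amps)] * 60:.0f} bpm")
```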

I interned and collaborated with Tim Marks and Hassan Mansour at Mitsubishi Electric Research Laboratories (MERL) on this project. This work was initially published in CVPR-CVPM 2018 [pdf] and extended to a journal publication in Trans. on Intelligent Transportation Systems in 2020 [pdf].

We received the best graduate poster and demo award for this work at the 2019 ECE Corporate Affiliates Day at Rice [poster].

PPGSecure: Vital Signs for Liveness Detection and Face Antispoofing


Although biometrics-based authentication systems are already widely used, they are still vulnerable to spoofing attacks, in which an attacker gains access to a user’s biometric data and fools the system. For example, an image of the user can easily be obtained from their social media page and used to spoof the authentication system. I am working on vital signs extraction from a camera in order to verify the liveness of the face presented to the authentication system. Hemoglobin in the blood absorbs light in the green part of the spectrum; therefore, as blood volume changes over time with the cardiac cycle, the resulting small color changes can be measured with a regular webcam by observing intensity changes in the green channel. Careful spectral analysis of these signals makes it possible to distinguish between a live face and a fake one. [poster] [pdf]
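
A minimal sketch of this pipeline, with an illustrative spectral score and a random stand-in input in place of real face crops:

```python
# Average the green channel over the face region in each frame, then check
# whether the spectrum has a dominant peak in the plausible heart-rate band.
import numpy as np

def liveness_score(face_frames_green, fps=30.0):
    """face_frames_green: (T, H, W) green-channel face crops."""
    sig = face_frames_green.mean(axis=(1, 2))          # spatial average per frame
    sig = sig - sig.mean()
    spec = np.abs(np.fft.rfft(sig)) ** 2
    f = np.fft.rfftfreq(len(sig), d=1.0 / fps)
    band = (f >= 0.7) & (f <= 4.0)                     # 42-240 bpm
    # Ratio of the strongest in-band peak to total in-band power: a live face
    # shows a clear pulsatile peak, while a photo or screen replay does not.
    return spec[band].max() / (spec[band].sum() + 1e-12)

video = np.random.rand(300, 64, 64)                    # stand-in input
print(liveness_score(video))
```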

Engagement Measurement During Online Learning

Students’ engagement is a key element in successful learning; however, it is common for students to subconsciously lose focus during reading. Because losing focus happens subconsciously and involuntarily, it is difficult to accurately determine when a person stopped paying attention and for how long. Using computer vision and machine learning algorithms, we can extract changes in eye and physiological parameters from a webcam recording. Previous studies have shown that changes in parameters such as blinking rate, gaze, pupil dilation, heart rate, and skin conductance may be linked to a person’s engagement level. Currently, researchers rely on participants self-reporting their engagement levels or require expensive equipment, such as professional eye trackers, making these studies not scalable to everyday home use. I worked with OpenStax on developing a system that automatically measures a person’s engagement level during online learning.
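
As an example of one such feature, the sketch below estimates blink rate from the common 6-point eye aspect ratio (EAR); the landmark source (e.g., a face tracker) and the threshold are assumptions:

```python
import numpy as np

def eye_aspect_ratio(eye):
    """eye: (6, 2) landmarks ordered around the eye contour."""
    v1 = np.linalg.norm(eye[1] - eye[5])   # vertical distances
    v2 = np.linalg.norm(eye[2] - eye[4])
    h = np.linalg.norm(eye[0] - eye[3])    # horizontal distance
    return (v1 + v2) / (2.0 * h)

def blink_rate(ear_series, fps=30.0, thresh=0.2):
    """Count falling edges where EAR drops below the blink threshold."""
    below = ear_series < thresh
    blinks = np.sum(below[1:] & ~below[:-1])
    return blinks * 60.0 * fps / len(ear_series)   # blinks per minute

ear = np.full(900, 0.3); ear[100:104] = 0.1; ear[500:505] = 0.1  # toy series
print(blink_rate(ear))  # 4 blinks per minute over this 30 s window
```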

FLASH – Family Level Assessment Of Screen Usage In The Home


In collaboration with the Children’s Nutrition Research Center at Baylor College of Medicine, I worked on developing a system to accurately and unobtrusively measure screen usage in the home, ranging from TVs to handheld devices such as tablets. This system will be used to understand how watching TV and other forms of screen usage can increase the likelihood of a child becoming overweight in the future. Through face recognition, head pose estimation, and gaze estimation on videos from stationary cameras and mobile devices, we were able to infer whether a person was looking at the screen and who was watching at a given time. The videos captured on mobile devices were especially challenging to analyze due to large motion and blur, so we are currently working on analyzing motion sensor data in addition to video from mobile devices to improve the estimate.
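
A toy sketch of the screen-attention decision itself (the cone thresholds are illustrative assumptions, and the per-frame pose estimates would come from a head-pose model not shown here):

```python
import numpy as np

def watching_screen(yaw_deg, pitch_deg, yaw_limit=25.0, pitch_limit=20.0):
    """Return a boolean per frame: head oriented toward the screen or not."""
    return (np.abs(yaw_deg) < yaw_limit) & (np.abs(pitch_deg) < pitch_limit)

yaw = np.array([3.0, -40.0, 10.0, 55.0])     # per-frame head yaw estimates
pitch = np.array([-5.0, 2.0, 30.0, 1.0])
print(watching_screen(yaw, pitch))           # [ True False False False]
```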