Cristyan Rufino Gil Morales

Drowsy Detection Using Facial Landmarks Extraction and Deep Neural Networks

How can you detect a drowsy person using facial landmarks as an input of a neural network?

Introduction

The idea is to extract a group of frames from a webcam, extract the facial landmarks from each of them, specifically the positions of both eyes, and then pass these coordinates to a neural model to obtain a final classification that tells us whether the user is awake or falling asleep.

Methodology

Recent work has shown that activity recognition can be achieved with 3D convolutional neural networks, or Conv3D, because of their capacity to analyze not a single frame but a group of them; this group of frames is a short video in which the activity is contained.

Considering drowsiness as an activity that can be contained in a video, it makes sense to use Conv3D to try to predict drowsiness.

The first step is to extract a frame from a camera, in our case a webcam. Once we have the frame, we use a Python library called dlib, which includes a facial landmark detector; the result is a collection of (x, y) coordinates indicating where the facial landmarks are.

Figure 1: Facial Landmarks

Even though we get a full collection of points, we are only interested in the position of the eyes, so we keep only the twelve points that belong to them.
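In dlib's standard 68-point annotation, the eyes occupy indices 36–47 (six points per eye), so keeping only the twelve eye points can be sketched as:

```python
# Indices of the eye landmarks in dlib's 68-point model
# (left eye: 36-41, right eye: 42-47); this indexing follows the
# iBUG 300-W annotation scheme that dlib's predictor uses.
EYE_INDICES = list(range(36, 48))  # 12 points in total

def keep_eye_points(landmarks):
    """Keep only the twelve eye landmarks from a full 68-point detection."""
    return [landmarks[i] for i in EYE_INDICES]
```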

Figure 2: Region of interest of the facial landmarks

Now we have the facial landmarks of a single frame. Nevertheless, we want to give our system a sense of sequence, so we do not make the final prediction from single frames; we take a group of them.

We consider that analyzing one second of video at a time is enough to make good drowsiness predictions. Hence, we keep ten facial landmark detections, which is equivalent to one second of video; then we concatenate them into a single pattern, an array with shape (10, 12, 12): 10 frames with 12 x-coordinates and 12 y-coordinates. This array is the input our Conv3D model uses to produce the final classification.
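The buffering of ten detections can be sketched as below. Note that the article stores the pattern with shape (10, 12, 12); since the exact layout is not spelled out, this sketch uses a simpler (10, 12, 2) encoding (frames × points × x/y) as an assumption:

```python
from collections import deque

import numpy as np

SEQUENCE_LEN = 10  # one second of video at ~10 fps

class LandmarkBuffer:
    """Accumulates per-frame eye landmarks until a full sequence is ready."""

    def __init__(self):
        self._frames = deque(maxlen=SEQUENCE_LEN)

    def push(self, eye_points):
        """eye_points: list of twelve (x, y) tuples from one frame."""
        self._frames.append(eye_points)

    def sequence(self):
        """Return the stacked sequence, or None until ten frames are buffered."""
        if len(self._frames) < SEQUENCE_LEN:
            return None
        return np.array(self._frames, dtype=np.float32)  # shape (10, 12, 2)
```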

Figure 3: Neural Model

The first hidden layer of our model is a 3D convolutional layer, followed by a max pooling layer and a flatten layer, which results in a vector of eight hundred units. The next layer is a dense layer of ten units with a ReLU activation function. The output layer uses a softmax activation and is composed of two neurons, one per class.
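A sketch of this architecture in Keras; the text does not give the filter count or kernel size, but 8 filters with a 3×3×3 kernel and 2×2×2 pooling on a (10, 12, 12, 1) input reproduce the flatten vector of eight hundred units mentioned above (4 × 5 × 5 × 8 = 800):

```python
# Assumed hyperparameters: 8 filters, 3x3x3 kernel, 2x2x2 pooling,
# chosen so the flattened vector has the 800 units stated in the text.
from tensorflow.keras import layers, models

def build_model():
    model = models.Sequential([
        layers.Input(shape=(10, 12, 12, 1)),      # (frames, x, y, channel)
        layers.Conv3D(8, kernel_size=(3, 3, 3), activation="relu"),
        layers.MaxPooling3D(pool_size=(2, 2, 2)),
        layers.Flatten(),                          # 4 * 5 * 5 * 8 = 800 units
        layers.Dense(10, activation="relu"),
        layers.Dense(2, activation="softmax"),     # one neuron per class
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```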

Architecture

Once we have a group of eye points (x, y coordinates), we pass them to our neural model to get a classification; the result can be [1, 0], which represents “awake,” or [0, 1], which represents “drowsy.” In other words, we analyze the webcam stream in small chunks to get a drowsiness prediction every second.
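Decoding the model's softmax output into one of the two labels can be sketched as:

```python
import numpy as np

# One-hot order follows the article: [1, 0] = "awake", [0, 1] = "drowsy".
LABELS = ["awake", "drowsy"]

def decode_prediction(probs):
    """Map a two-element softmax output to its class label."""
    return LABELS[int(np.argmax(probs))]
```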


Implementation

  • The system was implemented in Python 3.5
  • Frames were captured from the webcam using OpenCV for Python
  • The facial landmarks were extracted with the dlib library
  • The model was built with Keras
  • The front-end was deployed with the help of Flask
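A minimal sketch of how the Flask front-end could expose the latest classification; the route name and the shared `latest_prediction` state are assumptions for illustration, not the original code:

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Updated by the analysis loop each time a new sequence is classified.
latest_prediction = {"label": "awake"}

@app.route("/prediction")
def prediction():
    """Return the most recent drowsiness classification as JSON."""
    return jsonify(latest_prediction)
```

The page showing the webcam stream can then poll this endpoint once per second and display the label under the video.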

Results

The final result of this work is a front-end showing the user’s webcam. The stream is analyzed every second, and the prediction, “drowsy” or “awake,” is shown under the video. As an additional feature in our final example, if the user is detected as “drowsy,” the system fires an audible alarm.

Figure 5: Front-end result

For further experiments, the solution proposed here can easily be extended to work on a smartphone or even on an embedded system running an Ubuntu distribution, such as the well-known Raspberry Pi.
