Use face tracking to analyze videos and draw boxes

Eugene Musebe

Introduction

Google's Video Intelligence API is a powerful tool for video analysis, and there are many ways to leverage it for greater productivity. In this article, we will explore its face detection feature. We will analyze videos stored on Cloudinary using the Google Video Intelligence API and draw bounding boxes over the detected faces.

TL;DR

Here's a glimpse of what we will be going through in this article.

  1. Log in to Cloudinary and obtain the credentials needed to access its API.

  2. Log in to the Google Cloud Platform and obtain credentials that will allow us to make calls to the API.

  3. Upload videos to Cloudinary storage.

  4. Analyze the uploaded video using Google's Video Intelligence API.

  5. Show the video on a webpage.

  6. Extract annotations from the Google video intelligence analysis.

  7. Use HTML Canvas API to draw bounding boxes over annotated faces.

The final project can be viewed on Codesandbox.


You can find the full source code on my GitHub repository.

Getting Started

Prerequisites

This article assumes that you have already installed Node.js and NPM on your development environment and have a code editor ready. Check out the official Node.js website if you have not yet installed Node. You will also need a video to analyze; this can be any video with people's faces in it. For the sake of this article, I have included one here.

Obtaining Cloudinary API Keys

Cloudinary is an amazing service that offers a wide range of solutions for media storage and optimization. You can get started with their API for free immediately.

Head over to Cloudinary and sign in or sign up for a free account. Navigate to your Cloudinary console and take note of your Cloud name, API Key, and API Secret. These are displayed at the top left of the console page.

Creating a Google Cloud Platform project and obtaining credentials

Google has a wide range of APIs. These are all part of the GCP, and you will need a GCP project to access any one of them. Let's see how we can do that now. You might find this a bit overwhelming, especially if it's your first time. Don't worry, though; I will walk you through it.

There's a brief explanation of how to get started on the quickstart guide page. Create an account if you do not already have one, then navigate to the project selector page. You will need to select an existing project or create a new one. The next step is to ensure that billing is activated for the selected project. Don't worry, you won't be charged for accessing the APIs immediately. Most of the APIs, including the Video Intelligence API, have a free tier with a monthly limit; as long as you stay within those limits, you will not be charged. Even though it is highly unlikely that you will exceed them in a development environment, you should still use the API sparingly. Confirm that billing is enabled.

Our project is now ready to use. We need to enable the APIs that we will be using. In our case, this is the Video Intelligence API. Once that is enabled, proceed to create a service account. This is what we will use to authenticate our application with the API. Make your way to the Create a new service account page and select the project you created earlier.

You will be required to provide a name for your service account; give it something sensible. For this article, I used the name face-tracking-with-next-js.

Leave the other defaults and go ahead and create the service account. When you navigate back to the service accounts dashboard, your new service account will now be listed. Under the more actions button, click on Manage keys.

Click on Add key and then on Create new key.

In the pop-up dialog, make sure to choose the JSON option.

Once you're done, a .json file will be downloaded to your computer. Take note of this file's location as we will be using it later.
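If you prefer the command line, the same setup can also be done with the gcloud CLI. Here's a rough sketch, assuming you have the CLI installed and authenticated; the project ID and service account name are placeholders:

gcloud config set project YOUR_PROJECT_ID

# Enable the Video Intelligence API for the project
gcloud services enable videointelligence.googleapis.com

# Create the service account
gcloud iam service-accounts create face-tracking-with-next-js

# Create and download a JSON key for the service account
gcloud iam service-accounts keys create credentials.json \
  --iam-account=face-tracking-with-next-js@YOUR_PROJECT_ID.iam.gserviceaccount.com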

The fun part - Implementation

This article assumes you are familiar with the basics of Next.js. Let's create a new project. Open your terminal and run the following command in your desired project folder:

npx create-next-app

This will scaffold a basic Next.js project for you. For more advanced installation options, please take a look at the official docs. Give your project an appropriate name, such as face-tracking-with-next-js. Finally, change directory into your new project and we can begin coding.

cd face-tracking-with-next-js

Upload video to Cloudinary storage

Let's install a few dependencies first. We will need the cloudinary NPM package.

npm install --save cloudinary

Create a folder at the root of your project and name it lib. Inside this folder, create a new file named cloudinary.js. In this file, we will initialize the Cloudinary SDK and create a function to upload videos. Paste the following inside lib/cloudinary.js:

// lib/cloudinary.js

// Import the v2 api and rename it to cloudinary
import { v2 as cloudinary } from "cloudinary";

// Initialize the SDK with cloud_name, api_key, and api_secret
cloudinary.config({
  cloud_name: process.env.CLOUD_NAME,
  api_key: process.env.API_KEY,
  api_secret: process.env.API_SECRET,
});

export const handleCloudinaryUpload = (path) => {
  // Create and return a new Promise
  return new Promise((resolve, reject) => {
    // Use the SDK to upload media
    cloudinary.uploader.upload(
      path,
      {
        // Folder to store video in
        folder: "videos/",
        // Type of resource
        resource_type: "video",
      },
      (error, result) => {
        if (error) {
          // Reject the promise with an error if any
          return reject(error);
        }

        // Resolve the promise with a successful result
        return resolve(result);
      }
    );
  });
};

At the top of the file, we import the v2 API from the package we just installed and rename it to cloudinary. This is just for readability; you can leave it as v2, just remember to change it wherever we use it in our code. We then initialize the SDK by calling the config method and passing it the cloud_name, api_key, and api_secret. We also have a handleCloudinaryUpload function that will handle uploads for us. We pass in a path to the file we want to upload, then call the upload method on the SDK with the path and a few options. Read more about the upload media API and the options you can pass in the official documentation. We then resolve or reject a promise depending on the success or failure of the upload.
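To make the flow concrete, here's a minimal, hypothetical way you could call the helper on its own; the path is only an example, and secure_url is the field we will later use to play the video:

// Hypothetical standalone usage of the upload helper
import { handleCloudinaryUpload } from "./lib/cloudinary";

handleCloudinaryUpload("public/videos/people.mp4")
  .then((result) => {
    // result.secure_url points to the uploaded video on Cloudinary
    console.log(result.secure_url);
  })
  .catch((error) => console.error(error));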

One thing you might notice is that we've assigned our cloud_name, api_key, and api_secret to environment variables that we have not created yet. Let us create those now.

Create a new file called .env.local at the root of your project. Paste the following inside this file.

CLOUD_NAME=YOUR_CLOUD_NAME
API_KEY=YOUR_API_KEY
API_SECRET=YOUR_API_SECRET

Make sure to replace YOUR_CLOUD_NAME, YOUR_API_KEY, and YOUR_API_SECRET with the appropriate values that we got from the Obtaining Cloudinary API Keys section above.

Read more about environment variables with Next.js here.

Now that we have that in place, let's create an API route where we can post our video and upload it to Cloudinary. Read more about Next.js API routes in the official docs if you're not familiar with them.

Create a new file called videos.js inside the pages/api/ folder. Paste the following code inside the file:

// pages/api/videos.js

// Next.js API route support: https://nextjs.org/docs/api-routes/introduction
import { promises as fs } from "fs";
import { annotateVideoWithLabels } from "../../lib/google";
import { handleCloudinaryUpload } from "../../lib/cloudinary";

const videosController = async (req, res) => {
  // Check the incoming HTTP method. Handle the POST request method and reject the rest.
  switch (req.method) {
    // Handle the POST request method
    case "POST": {
      try {
        const result = await handlePostRequest();

        // Respond to the request with a status code 201(Created)
        return res.status(201).json({
          message: "Success",
          result,
        });
      } catch (error) {
        // In case of an error, respond to the request with a status code 400(Bad Request)
        return res.status(400).json({
          message: "Error",
          error,
        });
      }
    }
    // Reject other HTTP methods with a status code 405
    default: {
      return res.status(405).json({ message: "Method Not Allowed" });
    }
  }
};

const handlePostRequest = async () => {
  // Path to the file you want to upload
  const pathToFile = "public/videos/people.mp4";

  // Upload your file to cloudinary
  const uploadResult = await handleCloudinaryUpload(pathToFile);

  // Read the file using fs. This results in a Buffer
  const file = await fs.readFile(pathToFile);

  // Convert the file to a base64 string in preparation for analyzing the video with google's video intelligence api
  const inputContent = file.toString("base64");

  // Analyze the video using Google's video intelligence api
  const annotations = await annotateVideoWithLabels(inputContent);

  // Return an object with the cloudinary upload result and the video analysis result
  return { uploadResult, annotations };
};

export default videosController;

Let's go over this. We're defining a function that will handle the HTTP requests. This is all covered in the official docs. Inside this function, we switch on the request method and delegate POST requests to a handler called handlePostRequest. We then return a failure response for all other methods, since we will not be using them in this tutorial.

In the handlePostRequest function, we define a path to the file we want to upload. For brevity and simplicity, we're using a static path to a locally stored file, held in the variable pathToFile. You can change this path to point to your own video. In the real world, you would want the user to select a video on their computer and post it to the backend, then upload that file to Cloudinary.
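For completeness, the client side of such a flow might look something like the sketch below: a file input posted as FormData. This is only an illustration and not part of this tutorial's code; the API route here ignores the request body, so you would also need to parse the incoming multipart data on the server (for example with a form-parsing library) before handing the file to handleCloudinaryUpload.

// Hypothetical client-side handler: post a user-selected file as FormData
const handleFileChange = async (event) => {
  const file = event.target.files[0];
  if (!file) {
    return;
  }

  const formData = new FormData();
  formData.append("video", file);

  // The /api/videos route would need to parse this multipart body on the server
  await fetch("/api/videos", {
    method: "POST",
    body: formData,
  });
};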

We then delegate the actual upload to cloudinary to the handleCloudinaryUpload function that we created earlier inside the lib/cloudinary.js file. Once the upload is complete, we want to analyze that same video using Google Video Intelligence. We delegate that to a function called annotateVideoWithLabels that we haven't created yet. We will do that shortly. We then return the upload result and also the video analysis result. What we're missing at this point is the annotateVideoWithLabels function. Let's create that.

Analyze the uploaded video using Google Video Intelligence

First things first, we need to install the required dependency: the Video Intelligence NPM package.

npm install --save @google-cloud/video-intelligence

Once we have that, we need to define a few environment variables that we can reference in our code. Open the .env.local file that we created earlier and add the following below the existing variables.

GCP_PROJECT_ID=YOUR_GCP_PROJECT_ID
GCP_PRIVATE_KEY=YOUR_GCP_PRIVATE_KEY
GCP_CLIENT_EMAIL=YOUR_GCP_CLIENT_EMAIL

Open the .json file we downloaded in the Creating a Google Cloud Platform project and obtaining credentials section in a text editor. Inside it, you will find the appropriate values for the project ID, private key, and client email. Replace YOUR_GCP_PROJECT_ID, YOUR_GCP_PRIVATE_KEY, and YOUR_GCP_CLIENT_EMAIL with the appropriate values from the .json file.

Be careful with that file. Do not commit it to version control, since anyone with this file can access your GCP project.
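For reference, the mapping from the key file to the environment variables looks roughly like the placeholders below. Keep the private key on a single line with \n escapes, which is why lib/google.js will later call .replace(/\\n/gm, "\n") on it; the values shown here are made up:

GCP_PROJECT_ID=my-project-123456
GCP_CLIENT_EMAIL=face-tracking-with-next-js@my-project-123456.iam.gserviceaccount.com
GCP_PRIVATE_KEY="-----BEGIN PRIVATE KEY-----\nMIIEvQIBADANBg...\n-----END PRIVATE KEY-----\n"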

Next, create a file called google.js under the lib/ folder. Paste the following code inside this file.

// lib/google.js

import {
  VideoIntelligenceServiceClient,
  protos,
} from "@google-cloud/video-intelligence";

const client = new VideoIntelligenceServiceClient({
  // Google cloud platform project id
  projectId: process.env.GCP_PROJECT_ID,
  credentials: {
    client_email: process.env.GCP_CLIENT_EMAIL,
    private_key: process.env.GCP_PRIVATE_KEY.replace(/\\n/gm, "\n"),
  },
});

/**
 *
 * @param {string | Uint8Array} inputContent
 * @returns {Promise<protos.google.cloud.videointelligence.v1.VideoAnnotationResults>}
 */
export const annotateVideoWithLabels = async (inputContent) => {
  // Grab the operation using array destructuring. The operation is the first object in the array.
  const [operation] = await client.annotateVideo({
    // Input content
    inputContent: inputContent,
    // Video Intelligence features
    features: ["FACE_DETECTION"],
    // Options for context of the video being analyzed
    videoContext: {
      // Options for the face detection feature
      faceDetectionConfig: {
        includeBoundingBoxes: true,
        includeAttributes: true,
      },
    },
  });

  const [operationResult] = await operation.promise();

  // Gets annotations for video
  const [annotations] = operationResult.annotationResults;

  return annotations;
};

We first import VideoIntelligenceServiceClient from the package we just installed and create a new client. The client takes in the project ID and a credentials object containing the client email and private key. There are many different ways of authenticating Google APIs; have a read through the official documentation. To learn more about the method we've used above, take a look at these GitHub docs. The reason we use this approach is to avoid having to ship our application with the sensitive .json file we downloaded.
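As an aside, if you'd rather not copy values out of the key file, the client can also be pointed at the file directly. This is a sketch of that alternative, assuming the key file sits at the example path shown; keep in mind the warning above about never committing or shipping that file:

// Alternative (sketch): authenticate using the downloaded key file directly.
// Setting the GOOGLE_APPLICATION_CREDENTIALS environment variable to the file's
// path achieves a similar result without any constructor options.
import { VideoIntelligenceServiceClient } from "@google-cloud/video-intelligence";

const client = new VideoIntelligenceServiceClient({
  keyFilename: "./credentials.json", // example path; keep this file out of version control
});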

After we have a client ready, we proceed to define the annotateVideoWithLabels function, which will handle the video analysis. We pass a base64 string or a Uint8Array to the function and then call the client's annotateVideo method with a few options. The official documentation contains more information on this. Allow me to touch on a few of the options.

  • inputContent - This is a base64 string or buffer of your video file. If your video is hosted on Google Cloud Storage, you'll want to use the inputUri field instead (see the sketch after this list). Unfortunately, only Google Cloud Storage URIs are supported there; otherwise, you will have to use inputContent.

  • features - This is an array of the Video intelligence features that should be run on the video. Read more in the documentation. For this tutorial, we only need the FACE_DETECTION feature which identifies people's faces.

  • videoContext.faceDetectionConfig.includeBoundingBoxes - This instructs the analyzer to include bounding boxes/coordinates for where the faces are located in the frame.

  • videoContext.faceDetectionConfig.includeAttributes - This instructs the analyzer to include facial attributes, e.g. smiling, wearing glasses, etc. We won't really be using this for the tutorial, but it's still a nice addition to have.
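As mentioned in the first option above, here's a rough sketch of what the call could look like for a video hosted on Google Cloud Storage; the bucket and object names are hypothetical:

// Sketch: analyzing a video that already lives in Google Cloud Storage
const [operation] = await client.annotateVideo({
  // gs:// URI instead of base64 content
  inputUri: "gs://my-bucket/videos/people.mp4",
  features: ["FACE_DETECTION"],
  videoContext: {
    faceDetectionConfig: {
      includeBoundingBoxes: true,
      includeAttributes: true,
    },
  },
});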

We wait for the operation to complete by calling promise() on it and awaiting the result. We then grab the operation result using JavaScript's array destructuring. To understand the structure of the resulting data, take a look at the official documentation. The structure looks something like this:

{
  annotationResults: [
    {
      segment: {
        startTimeOffset: {},
        endTimeOffset: {
          seconds: string,
          nanos: number,
        },
      },
      faceDetectionAnnotations: [
        {
          tracks: [
            {
              segment: {
                startTimeOffset: {
                  seconds: string,
                  nanos: number,
                },
                endTimeOffset: {
                  seconds: string,
                  nanos: number,
                },
              },
              timestampedObjects: [
                {
                  normalizedBoundingBox: {
                    left: number,
                    top: number,
                    right: number,
                    bottom: number,
                  },
                  timeOffset: {
                    seconds: string,
                    nanos: number,
                  },
                  attributes: [
                    {
                      name: string,
                      confidence: number,
                    },
                  ],
                },
              ],
              attributes: [
                {
                  name: string,
                  confidence: number,
                },
              ],
              confidence: number,
            },
          ],
          thumbnail: string,
          version: string,
        },
      ],
    },
  ],
}

We'll only need the first item in the array, so again we use JavaScript's ES6 destructuring to get the first element. With this, we have everything we need to upload the video to Cloudinary and analyze it for faces. Now we need to show the video on the client side.

Render the video on a webpage

Open pages/index.js and replace it with the following code.

import Head from "next/head";
import Image from "next/image";
import { useRef, useState, MutableRefObject } from "react";

export default function Home() {
  /**
   * This stores a reference to the video HTML Element
   * @type {MutableRefObject<HTMLVideoElement>}
   */
  const playerRef = useRef(null);

  /**
   * This stores a reference to the HTML Canvas
   * @type {MutableRefObject<HTMLCanvasElement>}
   */
  const canvasRef = useRef(null);

  const [video, setVideo] = useState();

  const [loading, setLoading] = useState(false);

  const handleUploadVideo = async () => {
    try {
      // Set loading to true
      setLoading(true);

      // Make a POST request to the `api/videos/` endpoint
      const response = await fetch("/api/videos", {
        method: "post",
      });

      const data = await response.json();

      // Check if the response is successful
      if (response.status >= 200 && response.status < 300) {
        const result = data.result;

        // Update our videos state with the results
        setVideo(result);
      } else {
        throw data;
      }
    } catch (error) {
      // TODO: Handle error
      console.error(error);
    } finally {
      // Set loading to false once a response is available
      setLoading(false);
    }
  };

  return [
    <div key="main div">
      <Head>
        <title>Face Tracking Using Google Video Intelligence</title>
        <meta
          name="description"
          content="Face Tracking Using Google Video Intelligence"
        />
        <link rel="icon" href="/favicon.ico" />
      </Head>

      <header>
        <h1>Face Tracking Using Google Video Intelligence</h1>
      </header>
      <main className="container">
        <div className="wrapper">
          <div className="actions">
            <button onClick={handleUploadVideo} disabled={loading}>
              Upload
            </button>
          </div>
          <hr />
          {loading
            ? [
                <div className="loading" key="loading div">
                  Please be patient as the video uploads...
                </div>,
                <hr key="loading div break" />,
              ]
            : null}

          {video ? (
            <div className="video-wrapper">
              <div className="video-container">
                <video
                  width={1000}
                  height={500}
                  src={video.uploadResult.secure_url}
                  ref={playerRef}
                  onTimeUpdate={onTimeUpdate}
                ></video>
                <canvas ref={canvasRef} height={500} width={1000}></canvas>
                <div className="controls">
                  <button
                    onClick={() => {
                      playerRef.current.play();
                    }}
                  >
                    Play
                  </button>
                  <button
                    onClick={() => {
                      playerRef.current.pause();
                    }}
                  >
                    Pause
                  </button>
                </div>
              </div>

              <div className="thumbnails-wrapper">
                Thumbnails
                <div className="thumbnails">
                  {video.annotations.faceDetectionAnnotations.map(
                    (annotation, annotationIndex) => {
                      return (
                        <div
                          className="thumbnail"
                          key={`annotation${annotationIndex}`}
                        >
                          <Image
                            className="thumbnail-image"
                            src={`data:image/jpg;base64,${annotation.thumbnail}`}
                            alt="Thumbnail"
                            layout="fill"
                          ></Image>
                        </div>
                      );
                    }
                  )}
                </div>
              </div>
            </div>
          ) : (
            <div className="no-videos">
              No video yet. Get started by clicking on upload above
            </div>
          )}
        </div>
      </main>
    </div>,
  ];
}

This is just a basic React component, but let's go over what's happening. Inside the component, we use useRef hooks to store references to our HTML video element and to a canvas element, playerRef and canvasRef respectively. Next, we have useState hooks that store the video upload result and the loading/uploading state. We then define a handleUploadVideo function that posts to the /api/videos endpoint so that the video can be uploaded to Cloudinary and analyzed. Remember that we're only using a locally stored file; in a real-world application, you would instead handle a form submission and post the selected video file.

Moving on to the HTML, we have an upload button that triggers handleUploadVideo, a div that only shows when the loading state is true, and a div that holds our video and canvas elements once the video state contains data. Make sure to give the video and canvas elements a width and height, otherwise you might run into some weird issues with the Canvas API.

The canvas element is placed above the video element on the Z plane. This means it also covers the native video controls, and we won't be able to play or pause the video. We get around this by adding our own play and pause buttons above the canvas. Finally, we have a div that shows thumbnails of all the detected faces.

Extract annotations/detected faces and draw bounding boxes using Canvas

We have our video and analysis results ready. Now all we need to do is figure out where the faces are and draw boxes over them. How do we do that? The HTML video element has an ontimeupdate event that we can listen to. Whenever the video's current time updates, we iterate over our detected faces and check whether the time on a detected face matches the current time of the video. If it does, we get the bounding box coordinates and use them to draw on the canvas. There are other ways you could structure this to improve performance; for example, you could iterate over the detected faces up front and register a listener that draws for each face, instead of walking every face on every ontimeupdate event. For simplicity, we'll go with the first approach.
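If you ever need to cut down the per-update work, one rough idea, sketched below under the assumption that the annotations keep the shape described earlier, is to precompute a lookup of bounding boxes keyed by timestamp once the analysis arrives, so the ontimeupdate handler does a single lookup instead of three nested loops. This is only an illustration, not part of the tutorial's code:

// Sketch: build a Map of "time rounded to one decimal place" -> bounding boxes
const buildBoxLookup = (annotations) => {
  const lookup = new Map();

  for (const annotation of annotations.faceDetectionAnnotations) {
    for (const track of annotation.tracks) {
      for (const face of track.timestampedObjects) {
        // Convert the time offset to seconds, same as in onTimeUpdate below
        const seconds =
          parseInt(face.timeOffset.seconds ?? 0) +
          (face.timeOffset.nanos ?? 0) / 1000000000;
        const key = seconds.toFixed(1);

        const boxes = lookup.get(key) ?? [];
        boxes.push(face.normalizedBoundingBox);
        lookup.set(key, boxes);
      }
    }
  }

  return lookup;
};

// Inside onTimeUpdate you would then only need something like:
// const boxes = lookup.get(currentTime.toFixed(1)) ?? [];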

You will notice that in the HTML that we wrote in the pages/index.js, our video element has an onTimeUpdate event listener but we haven't yet defined the handler.

{/*...*/}
<video
  // ...
  onTimeUpdate={onTimeUpdate}
></video>
{/*...*/}

Let's do that now. Add the following function to our Home component in pages/index.js, just above the handleUploadVideo function.

const onTimeUpdate = (ev) => {
  // Video element scroll height and scroll width. We use the scroll height and width instead of the video height and width because we want to ensure the dimensions match the canvas elements.
  const videoHeight = playerRef.current.scrollHeight;
  const videoWidth = playerRef.current.scrollWidth;

  // Get the 2d canvas context
  const ctx = canvasRef.current.getContext("2d");

  // Whenever the video time updates make sure to clear any drawings on the canvas
  ctx.clearRect(0, 0, videoWidth, videoHeight);

  // The video's current time
  const currentTime = playerRef.current.currentTime;

  // Iterate over detected faces
  for (const annotation of video.annotations.faceDetectionAnnotations) {
    // Each detected face may have different tracks
    for (const track of annotation.tracks) {
      // Get the timestamps for all bounding boxes
      for (const face of track.timestampedObjects) {
        // Get the timestamp in seconds
        const timestamp =
          parseInt(face.timeOffset.seconds ?? 0) +
          (face.timeOffset.nanos ?? 0) / 1000000000;

        // Check if the timestamp and video's current time match. We convert them to fixed-point notations of 1 decimal place
        if (timestamp.toFixed(1) == currentTime.toFixed(1)) {
          // Get the x coordinate of the origin of the bounding box
          const x = (face.normalizedBoundingBox.left || 0) * videoWidth;

          // Get the y coordinate of the origin of the bounding box
          const y = (face.normalizedBoundingBox.top || 0) * videoHeight;

          // Get the width of the bounding box
          const width =
            ((face.normalizedBoundingBox.right || 0) -
              (face.normalizedBoundingBox.left || 0)) *
            videoWidth;

          // Get the height of the bounding box
          const height =
            ((face.normalizedBoundingBox.bottom || 0) -
              (face.normalizedBoundingBox.top || 0)) *
            videoHeight;

          ctx.lineWidth = 4;
          ctx.strokeStyle = "#800080";
          ctx.strokeRect(x, y, width, height);
        }
      }
    }
  }
};

Let's go over that. This function runs every time the video's time updates. We first get the video element's height and width. Remember, we're getting the element's dimensions and not the actual video's. This is important because these dimensions need to match those of the canvas placed above the video element. Next, we get the 2d context of the canvas; this is where the drawing happens. We also clear the context to prepare it for new drawings and, finally, get the current time of the video.

Once we have all of these, we iterate over the detected faces, their tracks, and then the timestamped objects for each face. A timestamped object contains the time offset of the face in relation to the video time, and also a bounding box with left, right, top, and bottom coordinates in relation to the video's x and y planes. Read about this here. A timestamp is an object with seconds and nanos fields, so we first convert the nanoseconds to seconds and add up the total time in seconds.

Next, we convert the timestamp (in seconds) and the current video time to fixed-point notations of 1 decimal place and check whether they match. If they do, the face is in the current frame of the video and we should draw a bounding box over it. We round to 1 decimal place so that we compare a rough estimate instead of a precise point in time; depending on the frame rate of the video, a precise timestamp and the video's current time might never match exactly.
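As a concrete example of the conversion, a timeOffset of { seconds: "2", nanos: 500000000 } works out to 2.5 seconds, and toFixed(1) turns that into the string "2.5" for comparison; the values here are made up:

// Worked example of the timestamp conversion used above
const timeOffset = { seconds: "2", nanos: 500000000 };

const timestamp =
  parseInt(timeOffset.seconds ?? 0) + (timeOffset.nanos ?? 0) / 1000000000;

console.log(timestamp); // 2.5
console.log(timestamp.toFixed(1)); // "2.5"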

Once we know that a face is in the frame, we get the x and y coordinates of the face, as well as the width and height of its bounding box. These values are normalized relative to the video's width and height, so we multiply the left and right values by the video width and the top and bottom values by the video height. See this for more info. We finally set a line width and stroke style and draw a rectangle on the canvas using the coordinates and dimensions we just computed.
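For instance, with the 1000 by 500 element used in this tutorial, a hypothetical box with left 0.25, right 0.5, top 0.2, and bottom 0.6 would be drawn at x = 250, y = 100 with a width of 250 and a height of 200:

// Worked example of scaling a normalized bounding box (numbers are made up)
const box = { left: 0.25, right: 0.5, top: 0.2, bottom: 0.6 };
const videoWidth = 1000;
const videoHeight = 500;

const x = box.left * videoWidth; // 250
const y = box.top * videoHeight; // 100
const width = (box.right - box.left) * videoWidth; // 250
const height = (box.bottom - box.top) * videoHeight; // 200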

And there we have it. Here's the full code for pages/index.js

// pages/index.js

import Head from "next/head";
import Image from "next/image";
import { useRef, useState, MutableRefObject } from "react";

export default function Home() {
  /**
   * @type {MutableRefObject<HTMLVideoElement>}
   */
  const playerRef = useRef(null);

  /**
   * @type {MutableRefObject<HTMLCanvasElement>}
   */
  const canvasRef = useRef(null);

  const [video, setVideo] = useState();

  const [loading, setLoading] = useState(false);

  const onTimeUpdate = (ev) => {
    // Video element scroll height and scroll width. We use the scroll height and width instead of the video height and width because we want to ensure the dimensions match the canvas elements.
    const videoHeight = playerRef.current.scrollHeight;
    const videoWidth = playerRef.current.scrollWidth;

    // Get the 2d canvas context
    const ctx = canvasRef.current.getContext("2d");

    // Whenever the video time updates make sure to clear any drawings on the canvas
    ctx.clearRect(0, 0, videoWidth, videoHeight);

    // The video's current time
    const currentTime = playerRef.current.currentTime;

    // Iterate over detected faces
    for (const annotation of video.annotations.faceDetectionAnnotations) {
      // Each detected face may have different tracks
      for (const track of annotation.tracks) {
        // Get the timestamps for all bounding boxes
        for (const face of track.timestampedObjects) {
          // Get the timestamp in seconds
          const timestamp =
            parseInt(face.timeOffset.seconds ?? 0) +
            (face.timeOffset.nanos ?? 0) / 1000000000;

          // Check if the timestamp and video's current time match. We convert them to fixed-point notations of 1 decimal place
          if (timestamp.toFixed(1) == currentTime.toFixed(1)) {
            // Get the x coordinate of the origin of the bounding box
            const x = (face.normalizedBoundingBox.left || 0) * videoWidth;

            // Get the y coordinate of the origin of the bounding box
            const y = (face.normalizedBoundingBox.top || 0) * videoHeight;

            // Get the width of the bounding box
            const width =
              ((face.normalizedBoundingBox.right || 0) -
                (face.normalizedBoundingBox.left || 0)) *
              videoWidth;

            // Get the height of the bounding box
            const height =
              ((face.normalizedBoundingBox.bottom || 0) -
                (face.normalizedBoundingBox.top || 0)) *
              videoHeight;

            ctx.lineWidth = 4;
            ctx.strokeStyle = "#800080";
            ctx.strokeRect(x, y, width, height);
          }
        }
      }
    }
  };

  const handleUploadVideo = async () => {
    try {
      // Set loading to true
      setLoading(true);

      // Make a POST request to the `api/videos/` endpoint
      const response = await fetch("/api/videos", {
        method: "post",
      });

      const data = await response.json();

      // Check if the response is successful
      if (response.status >= 200 && response.status < 300) {
        const result = data.result;

        // Update our videos state with the results
        setVideo(result);
      } else {
        throw data;
      }
    } catch (error) {
      // TODO: Handle error
      console.error(error);
    } finally {
      // Set loading to false once a response is available
      setLoading(false);
    }
  };

  return [
    <div key="main div">
      <Head>
        <title>Face Tracking Using Google Video Intelligence</title>
        <meta
          name="description"
          content="Face Tracking Using Google Video Intelligence"
        />
        <link rel="icon" href="/favicon.ico" />
      </Head>

      <header>
        <h1>Face Tracking Using Google Video Intelligence</h1>
      </header>
      <main className="container">
        <div className="wrapper">
          <div className="actions">
            <button onClick={handleUploadVideo} disabled={loading}>
              Upload
            </button>
          </div>
          <hr />
          {loading
            ? [
                <div className="loading" key="loading div">
                  Please be patient as the video uploads...
                </div>,
                <hr key="loading div break" />,
              ]
            : null}

          {video ? (
            <div className="video-wrapper">
              <div className="video-container">
                <video
                  width={1000}
                  height={500}
                  src={video.uploadResult.secure_url}
                  ref={playerRef}
                  onTimeUpdate={onTimeUpdate}
                ></video>
                <canvas ref={canvasRef} height={500} width={1000}></canvas>
                <div className="controls">
                  <button
                    onClick={() => {
                      playerRef.current.play();
                    }}
                  >
                    Play
                  </button>
                  <button
                    onClick={() => {
                      playerRef.current.pause();
                    }}
                  >
                    Pause
                  </button>
                </div>
              </div>

              <div className="thumbnails-wrapper">
                Thumbnails
                <div className="thumbnails">
                  {video.annotations.faceDetectionAnnotations.map(
                    (annotation, annotationIndex) => {
                      return (
                        <div
                          className="thumbnail"
                          key={`annotation${annotationIndex}`}
                        >
                          <Image
                            className="thumbnail-image"
                            src={`data:image/jpg;base64,${annotation.thumbnail}`}
                            alt="Thumbnail"
                            layout="fill"
                          ></Image>
                        </div>
                      );
                    }
                  )}
                </div>
              </div>
            </div>
          ) : (
            <div className="no-videos">
              No video yet. Get started by clicking on upload above
            </div>
          )}
        </div>
      </main>
    </div>,
    <style key="style tag" jsx="true">{`
      * {
        box-sizing: border-box;
      }

      header {
        height: 100px;
        background-color: purple;
        display: flex;
        justify-content: center;
        align-items: center;
      }

      header h1 {
        padding: 0;
        margin: 0;
        color: white;
      }

      .container {
        min-height: 100vh;
        background-color: white;
      }

      .container .wrapper {
        max-width: 1000px;
        margin: 0 auto;
      }

      .container .wrapper .actions {
        display: flex;
        justify-content: center;
        align-items: center;
      }

      .container .wrapper .actions button {
        margin: 10px;
        padding: 20px 40px;
        width: 80%;
        font-weight: bold;
        border: none;
        border-radius: 2px;
      }

      .container .wrapper .actions button:hover {
        background-color: purple;
        color: white;
      }

      .container .wrapper .video-wrapper {
        display: flex;
        flex-flow: column;
      }

      .container .wrapper .video-wrapper .video-container {
        position: relative;
        width: 100%;
        height: 500px;
        background: red;
      }

      .container .wrapper .video-wrapper .video-container video {
        position: absolute;
        object-fit: cover;
      }

      .container .wrapper .video-wrapper .video-container canvas {
        position: absolute;
        z-index: 1;
      }

      .container .wrapper .video-wrapper .video-container .controls {
        left: 0;
        bottom: 0;
        position: absolute;
        z-index: 1;
        background-color: #ffffff5b;
        width: 100%;
        height: 40px;
        display: flex;
        justify-items: center;
        align-items: center;
      }

      .container .wrapper .video-wrapper .video-container .controls button {
        margin: 0 5px;
      }

      .container .wrapper .video-wrapper .thumbnails-wrapper {
      }

      .container .wrapper .video-wrapper .thumbnails-wrapper .thumbnails {
        display: flex;
        flex-flow: row wrap;
      }

      .container
        .wrapper
        .video-wrapper
        .thumbnails-wrapper
        .thumbnails
        .thumbnail {
        position: relative;
        flex: 0 0 20%;
        height: 200px;
        border: solid;
      }

      .container
        .wrapper
        .video-wrapper
        .thumbnails-wrapper
        .thumbnails
        .thumbnail
        .thumbnail-image {
        width: 100%;
        height: 100%;
      }

      .container .wrapper .no-videos,
      .container .wrapper .loading {
        display: flex;
        justify-content: center;
        align-items: center;
      }
    `}</style>,
  ];
}

At this point, you're ready to run your application. Open your terminal and run the following at the root of your project.

npm run dev

Conclusion

At the moment, major technology companies, such as Apple, are very interested in and adopting facial recognition technology. AI startups are becoming unicorns as well. Without a doubt, facial recognition will play an increasingly important role in society in the coming years. Regardless of privacy concerns, facial recognition makes our streets, homes, banks, and shops safer—and more efficient.

Eugene Musebe

Software Developer

I'm a full-stack software developer, content creator, and tech community builder based in Nairobi, Kenya. I am addicted to learning new technologies and love working with like-minded people.