Introduction
Businesses around the world are looking to drive customer engagement through automation and AI. Today, we will look at how to use Google's Video Intelligence API to analyze videos and store them on Cloudinary. You can apply this to any use case; to keep things simple, we will focus on navigating real estate videos.
TL;DR
Here's a quick overview of what will be covered in this tutorial:
- Obtain necessary credentials from Cloudinary
- Obtain necessary credentials from Google Cloud Platform
- Upload media to Cloudinary
- Analyze video using Google's video intelligence API
- Render the video on the client-side
- Extract video markers from video analysis results
- Navigate video using extracted markers
To test the final product, visit the CodeSandbox below:
The corresponding GitHub repository can be found here.
Getting Started
Prerequisites
Installing Node.js and NPM
There are tons of tutorials on how to do this. You can check out the official Node.js website for instructions on installation and adding the path to your environment variables. You could also check out NVM, a version manager for Node.js. If you are a power user who often switches between Node versions, I would recommend the latter; a quick example is shown below.
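For example, if you go with NVM, installing and switching to the latest LTS release of Node.js typically looks something like this (a rough sketch; check the NVM docs for your platform):

```bash
# Assuming you chose nvm: install and switch to the latest LTS release of Node.js
nvm install --lts
nvm use --lts

# Verify that Node.js and npm are available
node --version
npm --version
```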
A code editor
You will need a code editor. Any code editor will do. Check out Visual Studio Code, which has great support for JavaScript and Node.js.
Sample video
We're going to need a sample real estate video to work with. There are numerous sources for this type of video. One way would be to download a bunch of royalty-free images and then turn them into a video where each photo spans a couple of seconds. I used this approach on https://moviemakeronline.com and was able to quickly create a short video. Here's the link to the video if you'd like to reuse it.
Cloudinary account and API keys
Cloudinary provides a number of media solutions. These include programmable media, media optimization, dynamic asset management, and more.
You will need some API credentials before making requests to Cloudinary. Luckily, you can get started with a free account immediately. Head over to Cloudinary and sign in or sign up for a free account. Once done with that, head over to your console. At the top left corner of your console, you'll notice your account details. Take note of your Cloud name, API Key, and API Secret. We will need those later.
Google Cloud Platform credentials
If you haven't worked with GCP before, this might be a bit intimidating, so follow along closely.
Navigate to the quickstart guide. You will first need to create an account if you do not have one. If you already have a Google account, you can use that to authenticate.
You will then navigate to the project selector dashboard and select an existing project or create a new one. After you have selected or created a project, you need to ensure that billing is enabled for it. Now, don't panic. Some of the GCP APIs are offered on a free tier with a monthly limit on how many times you can call the API, so GCP requires a billing account in case you exceed those limits. For development environments, you will almost never exceed them. Learn how to confirm that billing is enabled for your project.
After all this is done, you can proceed to enable the Video Intelligence API for your newly created project. The next step is to create a service account. You can think of this as API keys, but for your project's environment. Head over to the Create a service account page and select your project.
In the service account name field, input any sensible name. I named mine video-intelligence-nextjs. This also automatically fills in the Service account ID.
Click on Create and Continue and then click on Done. Now, on your service accounts dashboard, you'll see the newly created service account. Click on the More actions button under Actions and navigate to Manage keys.
Click on Add key and choose Create new key.
In the pop-up dialog, choose the JSON option.
This will download a .json file. Rename it to credentials.json and take note of its location; we will use it later. We named it credentials.json just so we can refer to it easily, but you can give it any name you want.
The Implementation
Creating a new Next.js project
Let's go ahead and initialize a new project. You can check out different installation options on the official docs.
Open up your terminal/command line and navigate to your desired project folder. Run the following command:
```bash
npx create-next-app
```
The terminal will ask for your project name. Give it any sensible name. I'm going to name mine google-video-intelligence. The command installs a few React dependencies and scaffolds our project for us.
Change the directory into your newly created project and open the folder in your code editor.
```bash
cd google-video-intelligence
```
Upload media to Cloudinary
The first step is to install the necessary dependencies. Run the following command in your terminal at the root of your project:
```bash
npm install --save cloudinary
```
Next, we set up the Cloudinary SDK and initialize it. At the root of your project, create a new folder and name it lib. Inside the lib folder, create a new file and name it cloudinary.js, then paste the following code inside.
```js
// lib/cloudinary.js

import { v2 as cloudinary } from "cloudinary";

cloudinary.config({
  cloud_name: process.env.CLOUD_NAME,
  api_key: process.env.API_KEY,
  api_secret: process.env.API_SECRET,
});

export default cloudinary;
```
We first import the v2 API from the Cloudinary package that we just installed and rename it to cloudinary for better readability. Calling the config method on the API initializes it with the cloud_name, api_key, and api_secret. Notice the use of environment variables to store the sensitive keys. We've referenced the keys as environment variables, but we have not defined them yet. Let's do that now.
At the root of your project, create a new file and name it .env.local. Inside the file, paste the following:
```
CLOUD_NAME=YOUR_CLOUD_NAME
API_KEY=YOUR_API_KEY
API_SECRET=YOUR_API_SECRET
```
Replace YOUR_CLOUD_NAME, YOUR_API_KEY, and YOUR_API_SECRET with the appropriate values from the Prerequisites > Cloudinary account and API keys section.
We now have our API ready to use. Let's use it. We will be using Next.js API routes to handle the upload and analysis of the videos. Read more about API routes in the official docs. Navigate to pages/api in your code editor and create a new file called videos.js. This file will be the entry point for our /api/videos endpoint. Paste the following piece of code inside.
```js
// pages/api/videos.js

export default async (req, res) => {
  // Check the incoming http method. Handle the POST request method and reject the rest.
  switch (req.method) {
    // Handle the POST request method
    case "POST": {
      try {
        const result = await handlePostRequest();

        // Respond to the request with a status code 201(Created) and the result
        return res.status(201).json({
          message: "Success",
          result,
        });
      } catch (error) {
        // In case of an error, respond to the request with a status code 400(Bad Request)
        return res.status(400).json({
          message: "Error",
          error,
        });
      }
    }
    // Reject other http methods with a status code 405
    default: {
      return res.status(405).json({ message: "Method Not Allowed" });
    }
  }
};
```
You will quickly notice that we're missing the handlePostRequest method. Let's create that now. Inside the same file, add the following method.
```js
// pages/api/videos.js

const handlePostRequest = async () => {
  // Path to the file you want to upload
  const pathToFile = "public/videos/house.mp4";

  // Upload your file to cloudinary
  const uploadResult = await handleCloudinaryUpload(pathToFile);
};
```
We're defining the path to the file we want to upload and analyze. For the sake of simplicity, we're just using a locally stored file. Ideally, you would want to upload a file from the user's device and use that instead. Next, we delegate the upload to Cloudinary to a function called handleCloudinaryUpload. Let's create that in the same file, pages/api/videos.js.
At the top of the file, import the cloudinary instance that we set up earlier, along with Node's fs promises API, which we'll use shortly to read the video file:
```js
// pages/api/videos.js
import { promises as fs } from "fs";
import cloudinary from "../../lib/cloudinary";
import { annotateVideoWithLabels } from "../../lib/google";
```
Just below the handlePostRequest function, add the following:
```js
// pages/api/videos.js

const handleCloudinaryUpload = (path) => {
  // Create and return a new Promise
  return new Promise((resolve, reject) => {
    cloudinary.uploader.upload(
      path,
      {
        // Folder to store video in
        folder: "videos/",
        // Type of resource
        resource_type: "video",
      },
      (error, result) => {
        if (error) {
          // Reject the promise with an error if any
          return reject(error);
        }

        // Resolve the promise with a successful result
        return resolve(result);
      }
    );
  });
};
```
With that, we have our upload code complete. Read more about the upload media API and the options you can pass in the official documentation. We also imported a function called annotateVideoWithLabels in preparation for the next section.
Analyze video using Google video intelligence
Once done with our upload to Cloudinary, we need to analyze the video. Let's first install the Video Intelligence Node.js SDK. Run the following command in your terminal/command line:
```bash
npm install --save @google-cloud/video-intelligence
```
We are still inside the pages/api/videos.js file. Update the handlePostRequest function to the following.
```js
// pages/api/videos.js

const handlePostRequest = async () => {
  // Path to the file you want to upload
  const pathToFile = "public/videos/house.mp4";

  // Upload your file to cloudinary
  const uploadResult = await handleCloudinaryUpload(pathToFile);

  // Read the file using fs. This results in a Buffer
  const file = await fs.readFile(pathToFile);

  // Convert the file to a base64 string in preparation of analysing the video with google's video intelligence api
  const inputContent = file.toString("base64");

  // Analyze the video using Google's video intelligence api
  const annotations = await annotateVideoWithLabels(inputContent);

  // Return an object with the cloudinary upload result and the video analysis result
  return { uploadResult, annotations };
};
```
After the Cloudinary upload, we read our file into a Buffer, convert it into a base64 string that we will pass to Google, and finally delegate the analysis to a function called annotateVideoWithLabels. Let's create that function now.
Inside our lib folder, create a new file and name it google.js, then paste the following code inside lib/google.js:
```js
// lib/google.js

import {
  VideoIntelligenceServiceClient,
} from "@google-cloud/video-intelligence";

// Create a new Video intelligence service client
const client = new VideoIntelligenceServiceClient({
  // Google cloud platform project id
  projectId: process.env.GCP_PROJECT_ID,
  credentials: {
    client_email: process.env.GCP_CLIENT_EMAIL,
    private_key: process.env.GCP_PRIVATE_KEY.replace(/\\n/gm, "\n"),
  },
});

/**
 *
 * @param {string | Uint8Array} inputContent
 * @returns
 */
export const annotateVideoWithLabels = async (inputContent) => {
  // Grab the operation using array destructuring. The operation is the first object in the array.
  const [operation] = await client.annotateVideo({
    // Input content
    inputContent: inputContent,
    // Video Intelligence features
    features: ["LABEL_DETECTION"],
    // Options for context of the video being analyzed
    videoContext: {
      // Options for the label detection feature
      labelDetectionConfig: {
        labelDetectionMode: "SHOT_AND_FRAME_MODE",
        stationaryCamera: true,
        frameConfidenceThreshold: 0.6,
        videoConfidenceThreshold: 0.6,
      },
    },
  });

  // Grab the result using array destructuring. The result is the first object in the array.
  const [operationResult] = await operation.promise();

  // Gets annotations for video. This is the first item in the annotationResults array
  const annotations = operationResult.annotationResults[0];

  return annotations;
};
```
Let's go over this. At the top, we import the VideoIntelligenceServiceClient from the SDK. We then proceed to initialize the client.
We've referenced some environment variables. Let's define those. Open the .env.local file at the root of your project and add the following below the existing variables:
```
GCP_PROJECT_ID=YOUR_GCP_PROJECT_ID
GCP_PRIVATE_KEY=YOUR_GCP_PRIVATE_KEY
GCP_CLIENT_EMAIL=YOUR_GCP_CLIENT_EMAIL
```
Let's go over where you can find your project ID, private key, and client email. Remember the file we downloaded in the Prerequisites > Google Cloud Platform (GCP) credentials section. Open that file in a text editor. Inside the credentials.json file, you will find the appropriate values. Replace YOUR_GCP_PROJECT_ID, YOUR_GCP_PRIVATE_KEY, and YOUR_GCP_CLIENT_EMAIL with the appropriate values from credentials.json.
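If you're unsure which fields to look for, here is a rough, truncated sketch of what a service account key file generally looks like; all values below are placeholders:

```json
{
  "type": "service_account",
  "project_id": "your-project-id",
  "private_key_id": "...",
  "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
  "client_email": "your-service-account@your-project-id.iam.gserviceaccount.com",
  "client_id": "..."
}
```

In other words, project_id maps to GCP_PROJECT_ID, private_key to GCP_PRIVATE_KEY, and client_email to GCP_CLIENT_EMAIL.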
Make sure not to commit the credentials.json file into version control, as it contains sensitive keys.
There are many different ways of authenticating Google APIs. Check out the official documentation. The method I have used here is a bit unorthodox, but I chose it so that I can use environment variables without needing to include the credentials.json file. Read more about this method in these docs on GitHub.
Next, we have our annotateVideoWithLabels function. The function takes in a base64 string or a Uint8Array. We call the VideoIntelligenceServiceClient's annotateVideo method and pass in our input content and a few options. Have a look at the official documentation for more information. Let's go over some of the options briefly:
- inputContent - This is a base64 string or buffer array of your video file. If your video is hosted on Google Cloud Storage, you'll want to use the inputUri field instead. Unfortunately, only Google Cloud Storage URLs are supported; otherwise, you will have to use inputContent.
- features - This is an array of the Video Intelligence features that should be run on the video. Read more in the documentation. For this tutorial, we only need the LABEL_DETECTION feature, which identifies objects, locations, activities, animal species, products, and more.
- videoContext.labelDetectionConfig.labelDetectionMode - The mode to use to identify labels. We chose SHOT_AND_FRAME_MODE, which analyzes the video frame by frame and also by shots/segments. Check out the official documentation.
- videoContext.labelDetectionConfig.stationaryCamera - This will depend on the video that you are analyzing. It informs the client whether the video camera is stationary or moving.
- videoContext.labelDetectionConfig.frameConfidenceThreshold - Confidence threshold for frame analysis. Check out the official documentation.
- videoContext.labelDetectionConfig.videoConfidenceThreshold - Confidence threshold for video segment analysis. Check out the official documentation.
The annotateVideo method of VideoIntelligenceServiceClient returns an operation. We convert that into a promise by calling .promise() on the operation and wait for the promise to resolve.
We then get the result using JavaScript array destructuring. It's important that we understand the structure of the analysis/annotation result. Take a look at the official documentation for detailed information. Here's what the structure of operationResult might look like:
```js
// Structure of operationResult
{
  annotationResults: [
    {
      segmentLabelAnnotations: [
        {
          entity: {
            entityId: string,
            description: string,
            languageCode: string,
          },
          categoryEntities: [
            {
              entityId: string,
              description: string,
              languageCode: string,
            },
          ],
          segments: [
            {
              segment: {
                startTimeOffset: string,
                endTimeOffset: string,
              },
              confidence: number,
            },
          ],
          frames: [
            {
              timeOffset: string,
              confidence: number,
            },
          ],
        },
      ],
      frameLabelAnnotations: [
        {
          entity: {
            entityId: string,
            description: string,
            languageCode: string,
          },
          categoryEntities: [
            {
              entityId: string,
              description: string,
              languageCode: string,
            },
          ],
          segments: [
            {
              segment: {
                startTimeOffset: string,
                endTimeOffset: string,
              },
              confidence: number,
            },
          ],
          frames: [
            {
              timeOffset: string,
              confidence: number,
            },
          ],
        },
      ],
    },
  ],
}
```
The annotation results are an array, and we only need the first item, which we then return. With all that in place, we're finally done with the backend and can now move on to the frontend.
Render the video on the client-side
Open pages/index.js and replace the code inside with the following.
```jsx
// pages/index.js

import { useRef, useState, MutableRefObject } from "react";

export default function Home() {
  /**
   * @type {MutableRefObject<HTMLVideoElement>}
   */
  const playerRef = useRef(null);

  // Our annotated video
  const [video, setVideo] = useState();

  const [loading, setLoading] = useState(false);

  return [
    <div key="main div">
      <header>
        <h1>Navigating auto tagged videos</h1>
      </header>

      <main className="container">
        <div className="wrapper">
          <div className="actions">
            <button onClick={handleUploadVideo} disabled={loading}>
              Upload
            </button>
          </div>
          <hr />
          {loading
            ? [
                <div className="loading" key="loading div">
                  Please be patient as the video uploads...
                </div>,
                <hr key="loading div break" />,
              ]
            : null}
          {video ? (
            <div className="videos-wrapper">
              <div className="video-wrapper">
                <video
                  ref={playerRef}
                  controls
                  src={video.uploadResult.secure_url}
                ></video>
                <div className="navigation">
                </div>
              </div>
              <p>{video.uploadResult.secure_url}</p>
            </div>
          ) : (
            <div className="no-videos">
              No video yet. Get started by clicking on upload above
            </div>
          )}
        </div>
      </main>
    </div>,
    <style key="style tag" jsx>
    </style>
  ];
}
```
We now have our barebones structure with a video element where the video will be rendered once the video state is no longer null. We keep a reference to the video element's DOM node in the playerRef useRef hook. Read more about the useRef hook in the official documentation.
We also have a div with the className navigation. This div will hold our navigation markers; we'll work on that in the next section. For now, all we need is to define the handleUploadVideo method inside our Home component. Just above the return statement, add the following method.
```jsx
const handleUploadVideo = async () => {
  try {
    // Set loading to true
    setLoading(true);

    // Make a POST request to the `api/videos/` endpoint
    const response = await fetch("/api/videos", {
      method: "post",
    });

    const data = await response.json();

    // Check if the response is successful
    if (response.status >= 200 && response.status < 300) {
      /**
       * @type {UploadVideoResult}
       */
      const result = data.result;

      // Update our videos state with the results
      setVideo(result);
    } else {
      throw data;
    }
  } catch (error) {
    // TODO: Handle error
    console.error(error);
  } finally {
    // Set loading to false once a response is available
    setLoading(false);
  }
};
```
Inside the method, we first set the loading state to true. We then make a POST request to our /api/videos endpoint and extract the JSON body from the response. Finally, we check if the response is successful and update our video state with the result. Again, it's important to note that it would be better to have a form with a file input and send the selected file to the backend for upload and analysis; a rough sketch of that approach follows.
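For illustration, here's a minimal sketch of what that could look like on the client. It is not part of the final code, and it assumes the backend is changed to parse multipart form data (for example, with a library such as formidable):

```jsx
// Sketch only: assumes a backend that accepts multipart form data,
// which the /api/videos route in this tutorial does not do yet.
const handleFileChange = async (event) => {
  const file = event.target.files[0];

  if (!file) {
    return;
  }

  // Package the selected file as multipart form data
  const formData = new FormData();
  formData.append("video", file);

  // Send the file to the backend for upload and analysis
  const response = await fetch("/api/videos", {
    method: "POST",
    body: formData,
  });

  const data = await response.json();
  console.log(data);
};
```

You would then render a file input such as `<input type="file" accept="video/*" onChange={handleFileChange} />` in place of the hard-coded Upload button.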
The final piece of the puzzle is the navigation markers.
Extract video markers from the video analysis result
Remember the empty div with the className navigation? Let's modify that. Look for the following div
```jsx
<div className="navigation">
</div>
```
and replace this with
```jsx
<div className="navigation">
  <h2>Rooms</h2>
  {video.annotations.frameLabelAnnotations
    .filter((annotation) =>
      annotation.categoryEntities?.some((entity) =>
        entity?.description?.includes("room")
      )
    )
    .map((annotation, index) => {
      return [
        <details key={`entity-${index}`}>
          <summary>{annotation.entity.description}</summary>
          <ul>
            {annotation.frames.map((frame, frameIndex) => {
              const seconds = frame.timeOffset.seconds ?? 0;

              return (
                <li
                  key={`frame-${frameIndex}`}
                  onClick={() => {
                    playerRef.current.currentTime = seconds;
                  }}
                >
                  Seek to{" "}
                  {new Date(seconds * 1000)
                    .toISOString()
                    .substr(11, 8)}
                </li>
              );
            })}
          </ul>
        </details>,
        <hr key={`entity-break-${index}`} />,
      ];
    })}
</div>
```
At this point, it's important to understand the structure of the data stored in the video state. It might be helpful to console.log it to get a good understanding. We also saw the structure of the analysis result at the end of the Analyze video using Google video intelligence section.
We first take video.annotations.frameLabelAnnotations and filter for annotations whose category entities have the word room in their description. We do this because the result contains a number of annotations/labels that we won't need; what you filter for will depend on your use case. We filtered using room since we're focusing on real estate for this tutorial.
Next, we map over the remaining annotations and return a details element for every annotation, since each annotation/label may have been spotted in more than one frame. For the summary, we show the description of the entity identified in the annotation/label. For the actual details, we return a list item for every frame that contains the entity.
Navigate video using extracted markers
For this, you need to understand how the HTML video element works. To seek the currently playing video, we set its currentTime field. This is the main reason we kept a reference to the element in the playerRef useRef hook. To navigate to a certain part of the video, we set the currentTime of the video element to the timeOffset of the frame where the annotation/label shows up. The timeOffset field has a seconds field and a nanos field. In our example, we just use the former (timeOffset.seconds).
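If you ever need finer-grained seeking, you could factor in the nanos as well. Here's a minimal sketch of a hypothetical seekToFrame helper (not part of the tutorial code), assuming the same playerRef and frame objects used above:

```jsx
// Sketch only: combine seconds and nanos into fractional seconds before seeking
const seekToFrame = (frame) => {
  const seconds = Number(frame.timeOffset.seconds ?? 0);
  const nanos = Number(frame.timeOffset.nanos ?? 0);

  // nanos is the sub-second part of the offset, expressed in nanoseconds
  playerRef.current.currentTime = seconds + nanos / 1e9;
};
```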
And that's it. You can find the full code, along with the CSS, on CodeSandbox or on GitHub.
Something to note
This tutorial just shows a simple way to get started. In a real-world application, you would want to optimize a few things. For example, the videos may take a long time to upload or analyze. It wouldn't be ideal to wait for this to finish. Have a look at Cloudinary notifications and Google video intelligence long-running operations. You might also want to store the resulting information in some sort of database.
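As a quick illustration of the first point, Cloudinary's upload API accepts a notification_url option, so the result can be delivered to a webhook instead of keeping the request waiting. Here's a minimal sketch, assuming a hypothetical, publicly reachable webhook route at /api/notifications:

```js
// Sketch only: the notification URL below is a hypothetical example.
// Cloudinary will POST the upload result to it once processing completes,
// so the request handler doesn't have to block while waiting.
const uploadResult = await cloudinary.uploader.upload(path, {
  folder: "videos/",
  resource_type: "video",
  notification_url: "https://your-domain.com/api/notifications",
});
```

A similar idea applies on the Google side, where the long-running annotation operation can be checked on later instead of being awaited inline.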