Create Video Captions with Next.js


Eugene Musebe


Video captioning makes videos accessible to a wider audience, improves their return on investment, and gets more people to start watching them. This article demonstrates how Next.js can be used to generate a caption transcript from a video and store it online. Both features are powered by Cloudinary's online services.


Check out the sandbox demo on CodeSandbox.

The project source code is also available on GitHub.


This tutorial assumes entry-level JavaScript and React/Next.js knowledge.

Setting Up the Sample Project

In your chosen folder, create a new Next.js app by running npx create-next-app videocaptions in your terminal, then move into the project root directory with cd videocaptions.

We will use the Next.js server-side backend (API routes) and set up Cloudinary there. Start by creating a Cloudinary account and logging in to it. Each Cloudinary account has a dashboard showing the environment variable keys needed to integrate Cloudinary into the project.

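Add Cloudinary to your project dependencies with npm install cloudinary. Then create a file named .env.local in the root directory and fill in the values from your dashboard as CLOUDINARY_NAME, CLOUDINARY_API_KEY and CLOUDINARY_API_SECRET. Because these names have no NEXT_PUBLIC_ prefix, Next.js exposes them only to server-side code, which keeps the API secret out of the browser bundle.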


To load the variables, restart your project: npm run dev.
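A missing or misspelled variable usually shows up only as a failed upload later on. The sketch below is a hypothetical startup check (missingCloudinaryVars is not part of Next.js or the Cloudinary SDK); it simply lists which of the three variable names read in pages/api/cloudinary.js are absent or empty in an environment object such as process.env.

```javascript
// Hypothetical helper: the variable names match the ones the API route reads.
const REQUIRED_CLOUDINARY_VARS = [
  "CLOUDINARY_NAME",
  "CLOUDINARY_API_KEY",
  "CLOUDINARY_API_SECRET",
];

// Returns the names that are missing, treating undefined and blank strings as missing.
function missingCloudinaryVars(env) {
  return REQUIRED_CLOUDINARY_VARS.filter(
    (name) => typeof env[name] !== "string" || env[name].trim() === ""
  );
}

// Usage (server side): fail fast instead of on the first upload.
// const missing = missingCloudinaryVars(process.env);
// if (missing.length > 0) throw new Error(`Missing env vars: ${missing.join(", ")}`);
```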

In the pages/api folder, create a new file, pages/api/cloudinary.js. Start by importing the library and configuring it with the environment keys.

// pages/api/cloudinary.js: configure the SDK with the keys from .env.local
const cloudinary = require("cloudinary").v2;
cloudinary.config({
  cloud_name: process.env.CLOUDINARY_NAME,
  api_key: process.env.CLOUDINARY_API_KEY,
  api_secret: process.env.CLOUDINARY_API_SECRET,
});

Next, create a handler function for the POST request. Cloudinary has its own set of video transformation capabilities, and we use this request to generate a speech-to-text transcript: it only takes adding the raw_convert: "google_speech" parameter to the upload call in the handler below. This relies on Cloudinary's Google speech-to-text add-on, which must be enabled on your account. Once a file is uploaded, the captioned transcript becomes available in the account's media library.

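// Next.js parses request bodies up to 1 MB by default, which is too small for most videos.
// The '50mb' value is an assumption: pick a limit that fits your files.
export const config = {
  api: {
    bodyParser: { sizeLimit: '50mb' },
  },
};

export default async function handler(req, res) {
  // Only POST is supported; answer everything else instead of leaving it hanging.
  if (req.method !== 'POST') {
    return res.status(405).json({ error: 'Method not allowed' });
  }

  // Assumption: the client POSTs JSON of the form { data: "<base64 data URI>" }.
  const fileStr = req.body.data;

  try {
    const uploadedResponse = await cloudinary.uploader.upload_large(fileStr, {
      resource_type: "video",
      chunk_size: 6000000,
      // Ask Cloudinary to produce a speech-to-text transcript of the video.
      raw_convert: "google_speech"
    });
    console.log(uploadedResponse);
    res.status(200).json({ data: uploadedResponse.secure_url });
  } catch (error) {
    console.log(error);
    res.status(500).json({ error: 'Upload failed' });
  }
}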
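The handler forwards whatever string arrives in the request to Cloudinary. A cheap pre-check can reject obviously wrong input before it costs an upload. The sketch below is hypothetical (parseBase64DataUri and isVideoDataUri are illustrative names, not SDK functions): it parses a data:<type>;base64,<payload> string, reports the MIME type, and estimates the decoded size from the base64 length minus padding.

```javascript
// Hypothetical helper: parse "data:video/mp4;base64,AAAA..." into its MIME type
// and the approximate number of decoded bytes. Returns null for anything else.
function parseBase64DataUri(uri) {
  if (typeof uri !== "string" || !uri.startsWith("data:")) return null;
  const comma = uri.indexOf(",");
  if (comma === -1) return null;
  const header = uri.slice(5, comma); // e.g. "video/mp4;base64"
  if (!header.endsWith(";base64")) return null;
  const mimeType = header.split(";")[0];
  const payload = uri.slice(comma + 1).replace(/\s/g, "");
  // Every 4 base64 characters encode 3 bytes; trailing "=" marks padding.
  const padding = payload.endsWith("==") ? 2 : payload.endsWith("=") ? 1 : 0;
  const byteLength = Math.floor((payload.length * 3) / 4) - padding;
  return { mimeType, byteLength };
}

// True only for base64 data URIs whose MIME type is a video type.
function isVideoDataUri(uri) {
  const parsed = parseBase64DataUri(uri);
  return parsed !== null && parsed.mimeType.startsWith("video/");
}
```

In the handler, a line such as if (!isVideoDataUri(fileStr)) return res.status(400).json({ error: 'Expected a base64-encoded video' }); placed before the upload would short-circuit bad requests.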

Now we complete the front end. The UI only needs to let the user pick a video file and send it to the backend for captioning. Paste the following code into the component's return statement. The CSS files are available in the GitHub repository.

return (
  <div className="container">
    <h2>Nextjs Video Captioning</h2>
    <div className="row">
      <div className="column">
        {/* The visible button opens the hidden file input */}
        <button onClick={() => inputRef.current.click()}>Select video</button>
        <input
          ref={inputRef}
          type="file"
          hidden
          onChange={onChange}
        /><br />
        {video ? (
          <video ref={videoRef} className="Video" controls src={URL.createObjectURL(video)} autoPlay loop />
        ) : (
          <video title="video shows here" controls />
        )}<br />
      </div>
    </div>
    {sampleselected ?
      "caption complete! Check the captioned text in your Cloudinary media library"
      :
      <button onClick={captionHandler}>Click</button>
    }
  </div>
)

Once you add the CSS, the UI should look like this:

(Screenshot: complete UI)

Next, wire up the buttons. In the Home component, import and declare the following refs and state hooks. The refs give our functions access to the video and file input elements, and the state holds the selected file and the upload status.

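import { useRef, useState } from 'react';

export default function Home() {
  const videoRef = useRef();
  const inputRef = useRef();
  const [video, setVideo] = useState();
  const [sampleselected, setSampleSelected] = useState(false);

  // Store the file picked in the hidden <input type="file">.
  const onChange = async (e) => {
    const file = e.target.files[0];
    setVideo(file);
  };

  // Encode the selected video as a base64 data URI and POST it to the API route.
  // Assumption: the API route reads the data URI from the `data` field of a JSON body.
  const captionHandler = () => {
    if (!video) return;
    const reader = new FileReader();
    reader.onloadend = async () => {
      const response = await fetch('/api/cloudinary', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ data: reader.result }),
      });
      // Show the completion message only if the upload succeeded.
      if (response.ok) setSampleSelected(true);
    };
    reader.readAsDataURL(video);
  };

  return (
    <div className="container">
      <h2>Nextjs Video Captioning</h2>
      <div className="row">
        <div className="column">
          {/* The visible button opens the hidden file input */}
          <button onClick={() => inputRef.current.click()}>Select video</button>
          <input
            ref={inputRef}
            type="file"
            hidden
            onChange={onChange}
          /><br />
          {video ? (
            <video ref={videoRef} className="Video" controls src={URL.createObjectURL(video)} autoPlay loop />
          ) : (
            <video title="video shows here" controls />
          )}<br />
        </div>
      </div>
      {sampleselected ?
        "caption complete! Check the captioned text in your Cloudinary media library"
        :
        <button onClick={captionHandler}>Click</button>
      }
    </div>
  );
}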

The Select video button opens the hidden file input; choosing a file there fires the onChange function, which loads the local video into the video element. The setVideo state setter stores the selected file so the component's other functions can access it.
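As written, onChange accepts any file. Adding accept="video/*" to the input narrows the picker, and a small check like the hypothetical isAcceptableVideo below (an illustrative name, not a React or browser API) can guard setVideo as well. The browser usually derives File.type from the file extension, so treat this as a convenience check rather than a security boundary; the server should still validate.

```javascript
// Hypothetical guard for onChange: accept only files that report a video MIME
// type and, optionally, stay under a size cap given in bytes.
function isAcceptableVideo(file, maxBytes = Infinity) {
  return (
    file != null &&
    typeof file.type === "string" &&
    file.type.startsWith("video/") &&
    file.size <= maxBytes
  );
}

// Usage inside onChange (the 50 MB cap is an arbitrary example):
// const file = e.target.files[0];
// if (isAcceptableVideo(file, 50 * 1024 * 1024)) setVideo(file);
```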

Finally, captionHandler uses a FileReader to encode the selected file as a base64 data URI and sends it to the backend for upload to Cloudinary. Once the upload completes, the user is prompted to check their account's media library for the transcript.
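For reference, the encoding step can also be done without FileReader. The sketch below swaps in a different technique (it is not the article's method): it reads the file's bytes with blob.arrayBuffer() and base64-encodes them with btoa, producing the same kind of data:<type>;base64,<payload> string that readAsDataURL yields. blobToDataUri is a hypothetical name, and the request shape in the usage comment assumes the API route reads a data field.

```javascript
// Hypothetical alternative to FileReader.readAsDataURL: encode a Blob (a File
// is a Blob) as a base64 data URI. Works in browsers and in Node 18+.
async function blobToDataUri(blob) {
  const bytes = new Uint8Array(await blob.arrayBuffer());
  // Build the binary string in chunks so large files don't exceed the
  // engine's limit on the number of function arguments.
  let binary = "";
  const CHUNK_SIZE = 0x8000;
  for (let i = 0; i < bytes.length; i += CHUNK_SIZE) {
    binary += String.fromCharCode(...bytes.subarray(i, i + CHUNK_SIZE));
  }
  const type = blob.type || "application/octet-stream";
  return `data:${type};base64,${btoa(binary)}`;
}

// Usage inside an async click handler:
// const data = await blobToDataUri(video);
// await fetch("/api/cloudinary", {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body: JSON.stringify({ data }),
// });
```

In the browser, FileReader remains the simpler choice; the promise-based version mainly helps where FileReader is unavailable or when you prefer async/await.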

Your media library will contain a transcript file like this:

(Screenshot: captioned transcript)
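If you download the transcript and want plain text from it, a sketch like the one below could help. It assumes the file is JSON holding an array of segments, each with a transcript string alongside fields such as confidence; inspect an actual file from your media library before relying on that shape. transcriptToText is a hypothetical helper, not a Cloudinary API.

```javascript
// Hypothetical helper. ASSUMPTION: the transcript file is a JSON array of
// segments shaped like { "transcript": "...", "confidence": 0.9, ... }.
function transcriptToText(json) {
  const segments = JSON.parse(json);
  if (!Array.isArray(segments)) {
    throw new TypeError("Expected a JSON array of transcript segments");
  }
  return segments
    // Keep only string transcripts, trimmed of surrounding whitespace.
    .map((segment) =>
      typeof segment.transcript === "string" ? segment.transcript.trim() : ""
    )
    .filter((text) => text.length > 0)
    .join(" ");
}
```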

That completes the project. Enjoy experimenting with it!

Eugene Musebe

Software Developer

I’m a full-stack software developer, content creator, and tech community builder based in Nairobi, Kenya. I am addicted to learning new technologies and love working with like-minded people.