Video Transcription Using Cloudinary


Ekene Eze

Accessibility is one of the most important parts of the modern web. That is why we use transcriptions to enhance the usability of video files online. Transcriptions are one of the most accessible ways to deliver video content because they cater to a wide variety of web users. In this post, we'll look at how to add transcriptions to videos rendered in a Next.js application with Cloudinary.

By the end, we'll have built a web application that uses the Cloudinary API to transcribe a user-uploaded video and return a downloadable URL for the transcribed video.

The Cloudinary API uses the Google AI Video Transcription add-on to generate a subtitle file for the uploaded video, and then we add a transformation that overlays this subtitle file on the video.


To follow along with this tutorial, you will need:

  • A free Cloudinary account.

  • Experience with JavaScript and React.js.

  • Next.js is not a requirement, but it's good to have.


If you'd like to get a head start by looking at the finished demo, I've set it up on CodeSandbox for you. Fork and run it to get started quickly.

To test the demo successfully, make sure the video you upload is smaller than 1MB.

Setup and Installations

First, we will create a Next.js boilerplate with the following command:

```bash
npx create-next-app video-transcription
```

Next, let's navigate into the project's root folder:

```bash
cd video-transcription
```

Next, install the following packages:

  • Cloudinary — a Node.js SDK for interacting with the Cloudinary APIs.

  • File-saver — to help us save the transcribed video locally.

  • Axios — to make HTTP requests.

  • Dotenv — to store our API keys safely.

  • Multiparty — to parse the form data containing the uploaded video.

The following command will install all the above packages:

```bash
npm i cloudinary file-saver axios dotenv multiparty
```
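With `dotenv` installed, the Cloudinary credentials can live in a `.env` file at the project root. A sketch of what that file might look like, with placeholder values (copy the real ones from your Cloudinary dashboard; the variable names here match the ones used later in this post):

```bash
# .env — placeholder values; use your own Cloudinary credentials
CLOUD_NAME=your-cloud-name
API_KEY=your-api-key
API_SECRET=your-api-secret
```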

Set Up the Cloudinary Transcription Add-on

To enable the transcription feature on Cloudinary, we need to follow the process shown below:

Navigate to the Add-ons tab on your Cloudinary account and select the Google AI Video Transcription add-on.

Next, select the free plan, which offers 120 monthly units. For a larger project, you should probably select a paid plan with more units, but this will be sufficient for our demo.

Navigate back into the project folder and start the development server with the command below:

```bash
npm run dev
```

The above command starts a development server at http://localhost:3000. Open that address in the browser to see our demo app running. Next, create a transcribe.js file in the pages/api folder and add the following snippet to it:

```js
// pages/api/transcribe.js
const multiparty = require("multiparty");
const Cloudinary = require("cloudinary").v2;
const pth = require("path");

const uploadVideo = async (req, res) => {
  const form = new multiparty.Form();
  const data = await new Promise((resolve, reject) => {
    form.parse(req, async function (err, fields, files) {
      if (err) reject({ err });
      // "file" is the form field the client posts the video under
      const path = files.file[0].path;
      const filename = pth.parse(files.file[0].originalFilename).name;
      try {
        // config Cloudinary
        // rest of the code here
      } catch (error) {
        console.log(error);
      }
    });
  });
  res.status(200).json({ success: true, data });
};

export default uploadVideo;
```
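One detail worth flagging: Next.js parses request bodies in API routes by default, which can interfere with `multiparty` reading the raw request stream. If you hit parsing issues, API routes let you opt out of the built-in body parser by exporting a `config` object. A minimal sketch, added to the same route file:

```javascript
// pages/api/transcribe.js — disable Next.js's built-in body parsing
// so multiparty can consume the raw request stream itself.
export const config = {
  api: {
    bodyParser: false,
  },
};
```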

In the snippet above, we:

  • Import Cloudinary and other necessary packages.
  • Create an uploadVideo() function to receive the video file from the client
  • Parse the request data with multiparty to retrieve the video's path and filename.

Next, we need to upload the retrieved video file to Cloudinary and transcribe it using the Cloudinary Video Transcription add-on we enabled. Replace the placeholder comments inside the try block with the following:

```js
try {
  // config Cloudinary
  Cloudinary.config({
    cloud_name: process.env.CLOUD_NAME,
    api_key: process.env.API_KEY,
    api_secret: process.env.API_SECRET,
    secure: true
  });

  // upload the video and request an SRT transcript from the add-on
  const VideoTranscribe = Cloudinary.uploader.upload(
    path,
    {
      resource_type: "video",
      public_id: `videos/${filename}`,
      raw_convert: "google_speech:srt"
    },
    function (error, result) {
      if (result) {
        return result;
      }
      return error;
    }
  );

  let { public_id } = await VideoTranscribe;

  // build a delivery URL that overlays the generated subtitle file
  const transcribedVideo = Cloudinary.url(`${public_id}`, {
    resource_type: "video",
    fallback_content: "Your browser does not support HTML5 video tags",
    transformation: [
      {
        overlay: {
          resource_type: "subtitles",
          public_id: `${public_id}.srt`
        }
      }
    ]
  });

  resolve({ transcribedVideo });
} catch (error) {
  console.log(error);
}
```
In the snippet above, we set up a Cloudinary instance to enable communication between our Next.js project and our Cloudinary account. Next, we upload the video to Cloudinary with the `google_speech` raw-convert option, which transcribes it and returns the result. Lastly, we destructure the `public_id` of the uploaded video and use it to build a delivery URL with a Cloudinary transformation that overlays the subtitle file on the video, thereby achieving complete video transcription functionality for the originally uploaded video.
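For intuition, the `overlay` transformation above is serialized into the delivery URL as an `l_subtitles` component, with any `/` in the subtitle file's public ID encoded as `:`. A rough sketch of the shape, where the cloud name and public ID are placeholders and the URL Cloudinary actually generates may include additional components:

```javascript
// Placeholders for illustration only; not a real account or asset.
const cloudName = "demo";
const publicId = "videos/my-clip";

// The subtitle overlay appears as an "l_subtitles" component in the URL path.
const url = `https://res.cloudinary.com/${cloudName}/video/upload/l_subtitles:${publicId.replace(/\//g, ":")}.srt/${publicId}.mp4`;
console.log(url);
// https://res.cloudinary.com/demo/video/upload/l_subtitles:videos:my-clip.srt/videos/my-clip.mp4
```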
> **Note**: The Cloudinary Video Transcription feature can only be triggered during an `upload` or `update` call.
> Also, every 15 seconds of video you transcribe takes 1 unit from your allocated 120 units in the free plan.
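To estimate how far the free plan goes, assuming usage is billed per started 15-second increment, here is a quick back-of-the-envelope helper (hypothetical, just for the arithmetic):

```javascript
// 1 unit per started 15-second increment of video (free plan: 120 units/month).
function unitsFor(durationSeconds) {
  return Math.ceil(durationSeconds / 15);
}

console.log(unitsFor(90));    // 6 — a 90-second clip costs 6 units
console.log((120 * 15) / 60); // 30 — the 120 free units cover ~30 minutes of video
```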
With this, we are finished with our transcription logic.

Next, let's implement the frontend aspect of this application. For this part, we will create a JSX form with an input field of type *file* and a submit button.

Navigate to the `index.js` file in the `pages` folder and add the following code:
```js
// pages/index.js
import Head from 'next/head'
import axios from 'axios'
import { useState } from 'react'
import { saveAs } from 'file-saver'

export default function Home() {
  const [selected, setSelected] = useState(null)
  const [videoUrl, setVideoUrl] = useState('')
  const [downloaded, setDownloaded] = useState(false)

  const handleChange = (e) => {
    if ( &&[0]) {
      const i =[0]
      let reader = new FileReader()
      reader.onload = () => {
        let base64String = reader.result
        setSelected(base64String)
      }
      reader.readAsDataURL(i)
    }
  }

  const handleSubmit = async (e) => {
    e.preventDefault()
    try {
      const body = JSON.stringify(selected)
      const config = {
        headers: {
          "Content-Type": "application/json"
        }
      }
      const response = await'/api/transcribe', body, config)
      const { data } = response
      setVideoUrl(data)
    } catch (error) {
      console.error(error)
    }
  }

  // return statement goes here
}
```

In the snippet above, we've set up the handleChange() and handleSubmit() functions to handle the interaction and submission of our form. You can click the Choose file button to select a video from your local filesystem, and the Upload button to submit the selected video to our Next.js /api/transcribe API route for transcription.
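Since the demo expects uploads under 1MB, you could guard the selected file's size before reading it in handleChange(). A small hypothetical helper (`MAX_BYTES` and `isUnderLimit` are not part of the original code; check `file.size` before handing the file to FileReader):

```javascript
// Hypothetical guard for the ~1MB demo upload limit.
const MAX_BYTES = 1024 * 1024; // 1MB

function isUnderLimit(sizeInBytes) {
  return sizeInBytes <= MAX_BYTES;
}

console.log(isUnderLimit(500 * 1024));      // true — a 500KB file is fine
console.log(isUnderLimit(2 * 1024 * 1024)); // false — 2MB is too large
```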

Next, let's set up the return statement of our index.js file to render the JSX form for choosing and uploading videos for transcription:

```jsx
return (
  <div>
    <Head>
      <title>Create Next App</title>
      <meta name="description" content="Generated by create next app" />
      <link rel="icon" href="/favicon.ico" />
    </Head>
    <header>
      <h1>Video transcription with Cloudinary</h1>
    </header>
    <main>
      <section>
        <form onSubmit={handleSubmit}>
          <label>
            <span>Choose your video file</span>
            <input type="file" onChange={handleChange} required />
          </label>
          <button type='submit'>Upload</button>
        </form>
      </section>
      <section id="video-output">
        {videoUrl ?
          <div>
            <div>
              <video controls width={480}>
                <source src={`${videoUrl}.webm`} type='video/webm' />
                <source src={`${videoUrl}.mp4`} type='video/mp4' />
                <source src={`${videoUrl}.ogv`} type='video/ogg' />
              </video>
            </div>
            <button
              onClick={() => {
                saveAs(videoUrl, "transcribed-video")
                setDownloaded(true)
              }}
              disabled={downloaded}>
              {downloaded ? 'Downloaded' : 'Download'}
            </button>
          </div> :
          <p>Please upload a video file to be transcribed</p>
        }
      </section>
    </main>
  </div>
)
```

And with that, we should be able to upload and transcribe videos. As a bonus, I've added a Download button that lets you save the transcribed video for local use. If you enjoyed this, be sure to come back for more. I look forward to all the things you'll build with this feature.

Ekene Eze

Director of DX at Plasmic

I work at Plasmic as the Director of Developer Experience and I love executing projects that help other software engineers. Passionate about sharing with the community, I often write, present workshops, create courses, and speak at conferences about web development concepts and best practices.