Video Transcription in Nuxt.js

Eugene Musebe

Introduction

Accessibility is important now more than ever, especially for content-heavy platforms. In this tutorial, we learn how to automatically generate transcripts for our videos using Google's Speech-To-Text API.

Codesandbox

The final project can be viewed on Codesandbox.

For demo purposes, you can transcribe this video.

You can find the full source code on my Github repository.

Prerequisites

To be able to follow along with this tutorial, entry-level knowledge of HTML, CSS, and JavaScript is required. Knowledge of Vue.js will be a valuable addition but is not required.

Setup

Nuxt.js

Nuxt.js is an intuitive Vue.js framework. Its main value propositions are that it is modular and performant while providing an enjoyable developer experience. To set it up, make sure you have npx (shipped by default since npm 5.2.0), npm v6.1+, or yarn installed.

To get started, open the terminal and run the following command in your preferred working directory:

yarn create nuxt-app nuxtjs-video-transcription
# OR
npx create-nuxt-app nuxtjs-video-transcription
# OR
npm init nuxt-app nuxtjs-video-transcription

The above command will result in a series of setup questions. Here are our recommended defaults:

Project name: nuxtjs-video-transcription

Programming language: JavaScript

Package manager: Yarn

UI Framework: Tailwind CSS

Nuxt.js modules: Axios - Promise based HTTP client

Linting tools: N/A

Testing frameworks: None

Rendering mode: Universal (SSR/SSG)

Deployment target: Server (Node.js hosting)

Development tools: N/A

What is your Github username: <your-github-username>

Version control system: Git

After the setup is complete, feel free to enter the project and run it:

cd nuxtjs-video-transcription

yarn dev
# OR
npm run dev

Cloudinary

We will be using Cloudinary to store our videos as well as perform the video-to-audio conversion. Cloudinary is a media management platform that allows us to unleash the full potential of our media. Proceed to the sign-up page to create a new account. Once logged in, check your console to view your Cloud name. We will use this during the configuration step.

We will first install the recommended Nuxt.Js plugin: @nuxtjs/cloudinary. To install it, run the following command in the project folder:

yarn add @nuxtjs/cloudinary
# OR
npm install @nuxtjs/cloudinary

Once installation is complete, add the plugin to the modules section of the nuxt.config.js file. This is the default configuration file in our Nuxt.js projects.

// nuxt.config.js
export default {
  ...
  modules: [
    ...
    '@nuxtjs/cloudinary'
  ],
  ...
}

Add a cloudinary section to the bottom of the nuxt.config.js file. Here we will configure the instance:

// nuxt.config.js
export default {
  ...
  cloudinary: {
    cloudName: process.env.NUXT_ENV_CLOUDINARY_CLOUD_NAME,
    useComponent: true
  }
}

The cloudName is being loaded from an environment variable. Environment variables depend on where the project is being run and thus do not need to be included in the codebase; they can also hold sensitive keys. To set them, we'll create a .env file.

touch .env

We will then add the variable to our .env file. Variables that should be loaded into our Nuxt.js app need to be prefixed with NUXT_ENV_:

<!-- .env -->
NUXT_ENV_CLOUDINARY_CLOUD_NAME=<your-cloudinary-cloud-name>

We will also need to create an upload preset. This is a predetermined set of instructions on how file uploads should be handled. To create one, proceed to the upload settings page, scroll down to the upload presets section, and create one with the following recommended defaults:

Name: default-preset

Mode: unsigned

Unique filename: true

Delivery type: upload

Access mode: public

Google Cloud Storage

Audio files sent to the Google Speech-To-Text API have to be stored on Google Cloud Storage; the service does not accept external URLs. To be able to upload audio files from our app to Google Cloud Storage, we will need two things:

  • Google Cloud Bucket

  • Google Service Account Key

To set up a Google Cloud bucket, proceed to the Cloud Storage Browser and create a bucket. If you do not have a Google account, feel free to create one here.

Once you have created your bucket, add it to the .env file:

<!-- .env -->
GCS_BUCKET_NAME=

To create a service account key, proceed to the Service account section. Create a service account and give it Storage Account Admin access to the project. This will allow the service account to authenticate the requests that upload our files.

Once the service account is created, proceed to the keys section and create a .json service account key. Download it and store it in a secure location. Add the path of the key to our .env file:

<!-- .env -->
GOOGLE_APPLICATION_CREDENTIALS=

By setting the GOOGLE_APPLICATION_CREDENTIALS variable, the key will be used to authenticate all our Google Cloud API requests.
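As a quick illustration, the Google Cloud client libraries read this variable automatically via Application Default Credentials, so no key needs to be passed in code. A minimal sketch (the same pattern appears in our server code later):

// With GOOGLE_APPLICATION_CREDENTIALS set, the client picks up the key file
// automatically -- no explicit credentials in code.
const { Storage } = require('@google-cloud/storage');

const storage = new Storage(); // authenticated via the environment variable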

Google Speech-To-Text API

We need to enable the above API in order to access it. Proceed to the Google Speech-To-Text API page and enable it.

It will use the already set up service account to authenticate as well.

Express Server

We are going to utilize server-side requests to interact with the above Google APIs. To do this, we will install the following dependencies:

  • express - A fast, unopinionated, minimalist web framework for Node.js

  • request - A simplified HTTP client for Node.js

  • shortid - Amazingly short non-sequential url-friendly unique id generator.

To install the above, we'll open our terminal and run the following commands:

yarn add express request shortid
# OR
npm install --save express request shortid

The server code we write later also relies on dotenv, node-fetch, @google-cloud/storage, and @google-cloud/speech, so install those the same way if they are not already in your project.

We are now going to create the file to hold our express API:

mkdir server-middleware
touch server-middleware/api.js

To link the above file to our Nuxt.js project, we will add a serverMiddleware section to nuxt.config.js and link it to the /api path:

// nuxt.config.js
export default {
  ...
  serverMiddleware: [
    { path: "/api", handler: "~/server-middleware/api.js" },
  ],
  ...
}
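At this point, server-middleware/api.js can start out as a bare Express app, which we will extend with endpoints in the following sections. This skeleton is simply the frame of the full file shown later:

// server-middleware/api.js
const app = require('express')()
const express = require('express')

// Parse incoming JSON request bodies
app.use(express.json())

// Endpoints such as /gcs-store and /trascribe will be added here

module.exports = app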

Uploading the video file

In order to upload our video file to Cloudinary, we need to create a form to select the file:

<!-- pages/index.vue -->
<template>
  ...
  <form @submit.prevent="process">
    <input
      type="file"
      accept="video/*"
      name="file"
      v-on:change="handleFile"
    />
    <input type="submit" value="upload" />
  </form>
  ...
</template>

The above form will call handleFile when the selected file changes. This method stores the selected file in our component state. We will use the process method to upload and process this file. The readData method simply reads the selected file and resolves with its data URL in preparation for the upload.

// pages/index.vue
<script>
export default {
  data() {
    return {
      file: null,
      cloudinaryInstance: null,
      ...
    };
  },

  methods: {
    async process() {
      this.cloudinaryInstance = await this.upload();
      ...
    },

    async handleFile(e) {
      this.file = e.target.files[0];
    },

    async readData(f) {
      return new Promise((resolve) => {
        const reader = new FileReader();
        reader.onloadend = () => resolve(reader.result);
        reader.readAsDataURL(f);
      });
    },

    async upload() {
      const fileData = await this.readData(this.file);

      return await this.$cloudinary.upload(fileData, {
        upload_preset: "default-preset",
        folder: "nuxtjs-video-transcription",
      });
    },
  },
};
</script>

Audio conversion and GCS upload

At this point, we have the cloudinaryInstance of our uploaded video. We are going to obtain the URL of this video, but specify that we want the mp3 format. We will remove the f_auto,q_auto/ section of the resulting URL, as these are video transformations we no longer need. We will then send the MP3 URL to our server for the GCS upload.

// pages/index.vue
<script>
export default {
  data() {
    return {
      ...
      gcsUrl: null,
      ...
    };
  },

  methods: {
    async process() {
      this.cloudinaryInstance = await this.upload();
      this.gcsUrl = await this.uploadAudio();
      ...
    },
    ...
    async uploadAudio() {
      // Build the MP3 URL of the uploaded video and strip the video transformations
      const url = this.$cloudinary.video
        .url(this.cloudinaryInstance.public_id, {
          format: "mp3",
        })
        .replace("f_auto,q_auto/", "");

      // Ask our server to copy the audio into Google Cloud Storage
      const { gcsUrl } = await this.$axios.$post("/api/gcs-store", { url });
      return gcsUrl;
    },
    ...
  },
};
</script>

On the server-side, we will generate a unique filename, create a file in our Google Cloud Storage bucket, and then stream the audio data into it. We stream because uploads can only be made from file data or a local path, and downloading the file just to re-upload it would take up unnecessary space.

// server-middleware/api.js
import { parse } from "path";
import { generate } from "shortid";

require('dotenv').config()

const app = require('express')()
const express = require('express')

app.use(express.json())

app.all('/gcs-store', async (req, res) => {
  const url = req.body.url;

  const fetch = require('node-fetch');
  const { Storage } = require('@google-cloud/storage');

  const storage = new Storage();
  const bucket = storage.bucket(process.env.GCS_BUCKET_NAME);

  // Create a unique filename
  const pathname = new URL(url).pathname;
  const { ext } = parse(pathname);
  const shortId = generate();
  const filename = `${shortId}${ext}`;

  // Create a writable stream for the bucket file
  const file = bucket.file(filename);
  const writeStream = file.createWriteStream();

  // Stream the audio from Cloudinary into the bucket file
  fetch(url)
    .then(res => {
      res.body.pipe(writeStream);
    });

  // Build the gs:// URI from the public URL
  const gcsUrl = file.publicUrl()
    .replace("https://storage.googleapis.com/", "gs://");

  return res.json({ gcsUrl });
})

...

module.exports = app

We use the publicUrl() method to get the file's URL. This does not return the gs:// URI we need, so we replace the https://storage.googleapis.com/ substring with gs://.

We return the gs:// URL, as this is all we need to transcribe the audio.
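Note that this endpoint responds without waiting for the stream to finish; the client-side delay in the next section compensates for that. If you would rather wait on the server, one hedged variation (not part of the original flow) is to resolve only once the GCS write stream emits its finish event:

// Hypothetical helper: stream a remote file into a GCS File object and
// resolve only when the upload has finished.
const streamToGcs = (url, file) =>
  new Promise((resolve, reject) => {
    const fetch = require('node-fetch');
    fetch(url)
      .then((response) => {
        response.body
          .pipe(file.createWriteStream())
          .on('finish', resolve) // the write stream emits 'finish' when the upload completes
          .on('error', reject);
      })
      .catch(reject);
  });

// Inside the /gcs-store handler one could then do:
// await streamToGcs(url, file);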

Audio transcription

In this step, we send the audio file to Google's Speech-to-Text API for transcription. We are first going to trigger a call to our server-side API which receives the transcription output:

<script>
export default {
  data() {
    return {
      ...
      transcription: null,
    };
  },

  methods: {
    async process() {
      ...
      await new Promise((r) => setTimeout(r, 5000));
      this.transcription = await this.transcribe();
    },

    async transcribe() {
      return await this.$axios.$post("/api/trascribe", { url: this.gcsUrl });
    },
  },
};
</script>

In the above code, we wait 5 seconds after the upload to our GCS bucket before requesting the transcription. This gives GCS time to register and release the file; immediate access attempts will fail with a 404 error.

On the server-side, we call the recognize method of the Speech API client with the relevant data.

// server-middleware/api.js
...
app.all('/trascribe', async (req, res) => {
  const speech = require('@google-cloud/speech').v1p1beta1;

  // Creates a client
  const client = new speech.SpeechClient();

  const config = {
    languageCode: "en-US",
    enableSpeakerDiarization: true,
  };

  const url = req.body.url;

  const audio = {
    uri: url
  };

  const request = {
    config,
    audio,
  };

  // Detects speech in the audio file.
  const [response] = await client.recognize(request);

  return res.json(response);
})

module.exports = app

The above code will return the entire transcription output.
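The response is roughly of the following shape (the field names match what we render in the table below; the values here are purely illustrative):

{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "hello and welcome to this tutorial",
          "confidence": 0.93
        }
      ],
      "languageCode": "en-us"
    }
  ]
}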

Rendering the video

To render the uploaded video, we will simply use the cld-video component.

<!-- pages/index.vue -->
<template>
  ....
  <cld-video
    v-if="cloudinaryInstance"
    :public-id="cloudinaryInstance.public_id"
    width="500"
    crop="scale"
    quality="auto"
    controls="true"
  />
  ....
</template>

The above configuration enables our controls, manages the quality automatically, scales down the video to match the width, and only shows once the video has been uploaded.

Displaying the transcription results

Once the transcription is done, we will want to present the output to our users in a user-friendly way. To do this, we can use a simple table:

<template>
  ....
  <table v-if="transcription">
    <thead>
      <tr>
        <th scope="col"> Language </th>
        <th scope="col"> Confidence </th>
        <th scope="col"> Text </th>
      </tr>
    </thead>
    <tbody>
      <tr
        v-for="(result, index) in transcription.results"
        :key="index"
      >
        <td> {{ result.languageCode }} </td>
        <td> {{ result.alternatives[0].confidence }}</td>
        <td> {{ result.alternatives[0].transcript }}</td>
      </tr>
    </tbody>
  </table>
  ....
</template>

Conclusion

The above tutorial shows us how we can take a video, convert it to audio, upload the audio to Google Cloud Storage, and transcribe it using the Google Speech-To-Text API. This only scratches the surface of what the API can do; to improve the transcription output, we can customize many options, a few of which are sketched below.
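For example, the recognition config accepts options such as automatic punctuation, word-level time offsets, and a profanity filter. A hedged sketch (the values are illustrative, not the settings used above):

const config = {
  languageCode: "en-US",
  enableSpeakerDiarization: true,
  enableAutomaticPunctuation: true, // add punctuation to the transcript
  enableWordTimeOffsets: true,      // include per-word timestamps
  profanityFilter: true,            // mask detected profanity
};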

Feel free to review the API documentation comprehensively to make the most out of it.

Eugene Musebe

Software Developer

I’m a full-stack software developer, content creator, and tech community builder based in Nairobi, Kenya. I am addicted to learning new technologies and love working with like-minded people.