Introduction
Accessibility is more important now than ever, especially for content-heavy platforms. In this tutorial, we will learn how to automatically generate transcripts for our videos using Google's Speech-To-Text API.
Codesandbox
The final project can be viewed on Codesandbox.
For demo purposes, you can transcribe this video.
You can find the full source code on my GitHub repository.
Prerequisites
To be able to follow along with this tutorial, entry-level knowledge of HTML, CSS, and JavaScript is required. Knowledge of Vue.js will be a valuable addition but is not required.
Setup
NuxtJs
Nuxt.js is an intuitive Vue.js framework. Its main value propositions are that it is modular and performant while providing an enjoyable developer experience. To set it up, make sure you have npx installed. Npx has shipped by default with npm since v5.2.0, and is also available via yarn.
To get started, open the terminal and run the following command in your preferred working directory:
```bash
yarn create nuxt-app nuxtjs-video-transcription
# OR
npx create-nuxt-app nuxtjs-video-transcription
# OR
npm init nuxt-app nuxtjs-video-transcription
```
The above command will result in a series of setup questions. Here are our recommended defaults:
Project name: nuxtjs-video-transcription
Programming language: JavaScript
Package manager: Yarn
UI Framework: Tailwind CSS
Nuxt.js modules: Axios - Promise based HTTP client
Linting tools: N/A
Testing frameworks: None
Rendering mode: Universal (SSR/SSG)
Deployment target: Server (Node.js hosting)
Development tools: N/A
GitHub username: <your-github-username>
Version control system: Git
After the setup is complete, feel free to enter the project and run it:
```bash
cd nuxtjs-video-transcription

yarn dev
# OR
npm run dev
```
Cloudinary
We will be using Cloudinary to store our videos as well as perform the video-to-audio conversion. Cloudinary is a media management platform that allows us to unleash the full potential of our media. Proceed to the sign up page to create a new account. Once logged in, check your console to view your Cloud name. We will use this during the configuration step.
We will first install the recommended Nuxt.js plugin: @nuxtjs/cloudinary. To install it, run the following command in the project folder:
```bash
yarn add @nuxtjs/cloudinary
# OR
npm install @nuxtjs/cloudinary
```
Once installation is complete, add the plugin to the modules section of the nuxt.config.js file. This is the default configuration file in our Nuxt.js projects.
```js
// nuxt.config.js
export default {
  ...
  modules: [
    ...
    '@nuxtjs/cloudinary'
  ],
  ...
}
```
Add a cloudinary section to the bottom of the nuxt.config.js file. Here we will configure the instance:
```js
// nuxt.config.js
export default {
  ...
  cloudinary: {
    cloudName: process.env.NUXT_ENV_CLOUDINARY_CLOUD_NAME,
    useComponent: true
  }
}
```
The cloudName is being loaded from an environment variable. These are variables that depend on where the project is being run and thus do not need to be included in the codebase. They can also contain sensitive keys. To set them, we'll create a .env file.
```bash
touch .env
```
We will then add the variable to our .env file. The variables we want to be loaded into our Nuxt.js app need to be prefixed with NUXT_ENV:
```bash
# .env
NUXT_ENV_CLOUDINARY_CLOUD_NAME=<your-cloudinary-cloud-name>
```
We will also need to create an upload preset. This is a predetermined set of instructions on how file uploads should be handled. To create one, proceed to the upload settings page. Scroll down to "create upload preset" and create one with the following recommended defaults:
Name: default-preset
Mode: unsigned
Unique filename: true
Delivery type: upload
Access mode: public
Google Cloud storage
Audio files sent to Google Speech-To-Text API have to be stored on Google Cloud storage. The service does not accept external URLs. To be able to upload audio files from our app to Google Cloud Storage, we will need two things:
Google Cloud Bucket
Google Service Account Key
To set up a Google Cloud Bucket, proceed to the Cloud Storage Browser and create a bucket. If you do not have a Google account, feel free to create one here.
Once you have created your bucket, add it to the .env file:
```bash
# .env
GCS_BUCKET_NAME=<your-bucket-name>
```
To create a service account key, proceed to the Service account section. Create a service account and give it Storage Admin access to the project. This will allow the service account to be used to authenticate requests meant to upload a file.
Once the service account is created, proceed to the keys section and create a .json service account key. Download it and store it in a secure location. Add the path of the key to our .env file:
```bash
# .env
GOOGLE_APPLICATION_CREDENTIALS=<path-to-your-service-account-key>
```
By setting it as GOOGLE_APPLICATION_CREDENTIALS, the key will be used to authenticate all our Google Cloud API requests.
Google Speech-To-Text API
We need to enable the above API in order to access it. Proceed to the Google Speech-To-Text API page and enable it.
It will use the already set up service account to authenticate as well.
Express Server
We are going to utilize server-side requests to interact with the above Google APIs. To do this, we will install the following dependencies:
express - A fast, unopinionated, minimalist web framework for Node.js
shortid - Amazingly short non-sequential url-friendly unique id generator
node-fetch - A light-weight module that brings the Fetch API to Node.js
dotenv - Loads environment variables from our .env file
@google-cloud/storage and @google-cloud/speech - The official Node.js clients for Google Cloud Storage and the Speech-To-Text API
To install the above, we'll open our terminal and run the following commands:
```bash
yarn add express shortid node-fetch dotenv @google-cloud/storage @google-cloud/speech
# OR
npm install --save express shortid node-fetch dotenv @google-cloud/storage @google-cloud/speech
```
We are now going to create the file to hold our express API:
```bash
mkdir server-middleware
touch server-middleware/api.js
```
To link the above file to our Nuxt.js project, we will add a serverMiddleware section in the nuxt.config.js and link it to the /api path:
```js
// nuxt.config.js
export default {
  ...
  serverMiddleware: [
    { path: "/api", handler: "~/server-middleware/api.js" },
  ],
  ...
}
```
Uploading the video file
In order to upload our video file to Cloudinary, we need to create a form to select the file:
```html
<!-- pages/index.vue -->
<template>
  ...
  <form @submit.prevent="process">
    <input
      type="file"
      accept="video/*"
      name="file"
      v-on:change="handleFile"
    />
    <input type="submit" value="upload" />
  </form>
  ...
</template>
```
The above form will call handleFile when the file changes. This method stores the selected file. We will use the process function to upload and process this file. The readData method simply opens the selected file and obtains the file data in preparation for the upload.
```js
// pages/index.vue
<script>
export default {
  data() {
    return {
      file: null,
      cloudinaryInstance: null,
      ...
    };
  },

  methods: {
    async process() {
      this.cloudinaryInstance = await this.upload();
      ...
    },

    async handleFile(e) {
      this.file = e.target.files[0];
    },

    async readData(f) {
      return new Promise((resolve) => {
        const reader = new FileReader();
        reader.onloadend = () => resolve(reader.result);
        reader.readAsDataURL(f);
      });
    },

    async upload() {
      const fileData = await this.readData(this.file);

      return await this.$cloudinary.upload(fileData, {
        upload_preset: "default-preset",
        folder: "nuxtjs-video-transcription",
      });
    }
  },
};
</script>
```
Audio conversion and GCS upload
At this point, we have the cloudinaryInstance of our video file. We are going to obtain the URL of this video but specify that we want the mp3 format. We will remove the f_auto,q_auto/ section of the resultant URL, as these are video transformations we no longer need. We will then send the mp3 URL to our server for the GCS upload.
```js
// pages/index.vue
<script>
export default {
  data() {
    return {
      ...
      gcsUrl: null,
      ...
    };
  },

  methods: {
    async process() {
      this.cloudinaryInstance = await this.upload();
      this.gcsUrl = await this.uploadAudio();
      ...
    },
    ...
    async uploadAudio() {
      const url = this.$cloudinary.video
        .url(this.cloudinaryInstance.public_id, {
          format: "mp3",
        })
        .replace("f_auto,q_auto/", "");

      const { gcsUrl } = await this.$axios.$post("/api/gcs-store", { url });

      return gcsUrl;
    },
    ...
  },
};
</script>
```
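In isolation, the URL cleanup step can be sketched as a plain function. The sample delivery URL below is hypothetical and only illustrates the string manipulation, not the actual URL the SDK generates:

```javascript
// Hypothetical sketch of the cleanup performed in uploadAudio():
// swap the extension for mp3 and drop the transformation segment.
function toAudioUrl(videoUrl) {
  return videoUrl
    .replace(/\.\w+$/, ".mp3")       // request the mp3 format instead
    .replace("f_auto,q_auto/", "");  // remove transformations we no longer need
}

// Example with a made-up Cloudinary delivery URL:
const sample =
  "https://res.cloudinary.com/demo/video/upload/f_auto,q_auto/v1/nuxtjs-video-transcription/abc123.mp4";
console.log(toAudioUrl(sample));
```

In the actual component, the SDK's format option handles the extension for us; only the transformation segment needs the manual replace.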
On the server-side, we will generate a unique filename, create a file in our Google Cloud Storage bucket, then stream the file data into it. We do this because uploads can only be made from raw file data or a local file path, and downloading the file just to re-upload it would take up unnecessary space.
```js
// server-middleware/api.js
import { parse } from "path";
import { generate } from "shortid";

require('dotenv').config()

const app = require('express')()
const express = require('express')

app.use(express.json())

app.all('/gcs-store', async (req, res) => {
  const url = req.body.url;

  const fetch = require('node-fetch');
  const { Storage } = require('@google-cloud/storage');

  const storage = new Storage();
  const bucket = storage.bucket(process.env.GCS_BUCKET_NAME);

  // Create unique filename
  const pathname = new URL(url).pathname;
  const { ext } = parse(pathname);
  const shortId = generate();
  const filename = `${shortId}${ext}`;

  // Create a WritableStream from the File
  const file = bucket.file(filename);
  const writeStream = file.createWriteStream();

  fetch(url)
    .then(res => {
      res.body.pipe(writeStream);
    });

  const gcsUrl = file.publicUrl()
    .replace("https://storage.googleapis.com/", "gs://");

  return res.json({ gcsUrl });
})

...

module.exports = app
```
Once the data streaming is complete, we will use the publicUrl() method to get the file's URL. However, this does not return the gs:// Google Service URL we need. To get it, we will replace the https://storage.googleapis.com/ substring with gs://. We will return the gs:// URL, as this is all we need to transcribe the audio.
Audio transcription
In this step, we send the audio file to Google's Speech-to-Text API for transcription. We are first going to trigger a call to our server-side API which receives the transcription output:
```js
// pages/index.vue
<script>
export default {
  data() {
    return {
      ...
      transcription: null,
    };
  },

  methods: {
    async process() {
      ...
      await new Promise((r) => setTimeout(r, 5000));
      this.transcription = await this.transcribe();
    },
    async transcribe() {
      return await this.$axios.$post("/api/trascribe", { url: this.gcsUrl });
    },
  },
};
</script>
```
In the above code, we wait 5 seconds after the upload to our GCS bucket completes. This gives GCS time to register and release the file; immediate access attempts will fail with a 404 error.
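A fixed delay works, but a more robust alternative is to poll until the file is visible. The following is a minimal, generic sketch of such a helper; wiring checkFn to file.exists() from @google-cloud/storage is an assumption and not part of the tutorial code:

```javascript
// Generic polling helper: retry an async check until it passes or we give up.
async function waitFor(checkFn, { attempts = 10, delayMs = 500 } = {}) {
  for (let i = 0; i < attempts; i++) {
    if (await checkFn()) return true;                 // resource is ready
    await new Promise((r) => setTimeout(r, delayMs)); // wait before retrying
  }
  return false; // gave up after `attempts` tries
}

// Example usage with a stand-in check; a real check might be
// `async () => (await file.exists())[0]`:
waitFor(async () => true).then((ready) => console.log(ready));
```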
On the server-side, we trigger the recognize method in the API client with the relevant data:
```js
// server-middleware/api.js
...
app.all('/trascribe', async (req, res) => {
  const speech = require('@google-cloud/speech').v1p1beta1;

  // Creates a client
  const client = new speech.SpeechClient();

  const config = {
    languageCode: "en-US",
    enableSpeakerDiarization: true,
  };

  const url = req.body.url;

  const audio = {
    uri: url
  };

  const request = {
    config,
    audio,
  };

  // Detects speech in the audio file.
  const [response] = await client.recognize(request);

  return res.json(response);
})

module.exports = app
```
The above code will return the entire transcription output.
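For orientation, the fields our UI reads from that output look roughly like this. The values below are illustrative only, not real API output, and the real response contains many more fields:

```javascript
// Illustrative shape of the Speech-To-Text response our page consumes.
// Only the fields used later (languageCode, confidence, transcript) are shown.
const sampleResponse = {
  results: [
    {
      languageCode: "en-us",
      alternatives: [
        { transcript: "hello and welcome to the tutorial", confidence: 0.92 },
      ],
    },
  ],
};

// The table we build later reads the first (best) alternative of each result:
const best = sampleResponse.results[0].alternatives[0];
console.log(best.transcript, best.confidence);
```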
Rendering the video
To render the uploaded video, we will simply use the cld-video component.
```html
<!-- pages/index.vue -->
<template>
  ...
  <cld-video
    v-if="cloudinaryInstance"
    :public-id="cloudinaryInstance.public_id"
    width="500"
    crop="scale"
    quality="auto"
    controls="true"
  />
  ...
</template>
```
The above configuration enables our controls, manages the quality automatically, scales down the video to match the width, and only shows once the video has been uploaded.
Displaying the transcription results
Once the transcription is done, we will want to present the output to our users in a user-friendly way. To do this, we can use a simple table:
```html
<template>
  ...
  <table v-if="transcription">
    <thead>
      <tr>
        <th scope="col">Language</th>
        <th scope="col">Confidence</th>
        <th scope="col">Text</th>
      </tr>
    </thead>
    <tbody>
      <tr
        v-for="(result, index) in transcription.results"
        :key="index"
      >
        <td>{{ result.languageCode }}</td>
        <td>{{ result.alternatives[0].confidence }}</td>
        <td>{{ result.alternatives[0].transcript }}</td>
      </tr>
    </tbody>
  </table>
  ...
</template>
```
Conclusion
The above tutorial shows us how we can take a simple video, convert it to audio, upload it to Google Cloud Storage, and transcribe it using the Google Speech-To-Text API. This just scratches the surface of what the API can do; many options can be customized to improve the transcription output.
Feel free to review the API comprehensively to make the most out of it.