Extracting Webcam Text with Next.js

Eugene Musebe

Introduction

This article demonstrates how text can be extracted from a webcam feed in a Next.js app using the Tesseract.js library.

Codesandbox

Check out the sandbox demo on CodeSandbox.

You can also find the full project in the GitHub repository.

Prerequisites

Entry-level knowledge of JavaScript and React/Next.js.

Setting Up the Sample Project

Create a new Next.js app by running npx create-next-app webcamtext in your terminal, then head to the project root directory with cd webcamtext.

We will begin by setting up our backend, which handles the Cloudinary integration.

Create your own Cloudinary account using this link and log into it. Cloudinary will provide you with a dashboard containing the environment variables necessary for integrating it into your project.

To integrate it, start by adding Cloudinary to your project dependencies with npm install cloudinary. Then, in your project root directory, create a new file named .env and paste in the following:

1".env"
2
3
4CLOUDINARY_CLOUD_NAME =
5
6CLOUDINARY_API_KEY =
7
8CLOUDINARY_API_SECRET =

Fill in the blanks with the environment variables from your Cloudinary dashboard, then restart your project with npm run dev.

In the pages/api folder, create a new file named upload.js and begin by requiring the Cloudinary library and configuring it with the environment keys:

var cloudinary = require("cloudinary").v2;

cloudinary.config({
  cloud_name: process.env.CLOUDINARY_CLOUD_NAME,
  api_key: process.env.CLOUDINARY_API_KEY,
  api_secret: process.env.CLOUDINARY_API_SECRET,
});

A Next.js API route exports a handler function to process the request. Ours will receive the media file data from a POST request, upload it to Cloudinary, and send the file's Cloudinary URL back in the response.

export default async function handler(req, res) {
  if (req.method === "POST") {
    let url = "";
    try {
      let fileStr = req.body.data;
      const uploadedResponse = await cloudinary.uploader.upload_large(
        fileStr,
        {
          // "auto" lets Cloudinary accept both images and videos
          resource_type: "auto",
          chunk_size: 6000000,
        }
      );
      url = uploadedResponse.url;
    } catch (error) {
      // return here so we don't attempt to send a second response below
      return res.status(500).json({ error: "Something wrong" });
    }

    res.status(200).json({ data: url });
  }
}
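One caveat not covered above: base64-encoded media can easily exceed the default 1 MB body size limit of Next.js API routes. If uploads fail with a 413 error, you can raise the limit by exporting a config object from the same file (the 20mb value here is an arbitrary choice):

// pages/api/upload.js (same file): raise the request body size limit
export const config = {
  api: {
    bodyParser: {
      sizeLimit: "20mb",
    },
  },
};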

The code above concludes our backend. Let us now move to the front end and extract some text.

Start by adding tesseract.js to your dependencies: npm install tesseract.js

Create a file named components/Main.client.js and include tesseract.js in your imports:

"components/Main.client.js"

import Tesseract from "tesseract.js";
import { createWorker } from "tesseract.js";

import React, { useRef, useEffect, useState } from "react";
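The snippets that follow also reference a shared Tesseract worker and a few mutable variables (localStream, video_in, mediaRecorder, recordedChunks, options) that are never declared explicitly. A minimal way to declare them, assuming module scope just below the imports (the mimeType value is an assumption), is:

// create the Tesseract worker once so every handler can reuse it
const worker = createWorker();

// shared between the handlers declared inside Main
let localStream; // the webcam MediaStream
let video_in; // the <video> element we draw snapshots from
let mediaRecorder; // records the webcam stream
let recordedChunks = []; // data emitted by the recorder
const options = { mimeType: "video/webm" }; // assumed MediaRecorder setting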

Notice the React hooks included in the imports. We will use them as we move on.

Inside your Main function, start by declaring the following refs and state hooks, then fill in the return statement:

export default function Main() {
  const processedVid = useRef();
  const rawVideo = useRef();
  const startBtn = useRef();
  const closeBtn = useRef();
  const snapBtn = useRef();
  const text_canvas = useRef();

  const [model, setModel] = useState(null);
  const [output, setOutput] = useState(null);

  useEffect(() => {
    if (model) return;
    const start_time = Date.now() / 1000;
    worker.load().then((m) => {
      setModel(m);
      const end_time = Date.now() / 1000;
      console.log(`model loaded successfully, ${end_time - start_time}`);
    });
  }, []);

  return (
    <>
      {model && (
        <>
          <div className="card">
            <div className="videos">
              <video
                className="display"
                width={800}
                height={450}
                ref={rawVideo}
                autoPlay
                playsInline
              />
            </div>

            <canvas
              className="display"
              width={800}
              height={450}
              ref={processedVid}
            ></canvas>
          </div>

          {output && (
            <div className="display" ref={text_canvas}>
              {output}
            </div>
          )}

          <div className="buttons">
            <button className="button" onClick={startCamHandler} ref={startBtn}>
              Start Webcam
            </button>
            <button className="button" onClick={stopCamHandler} ref={closeBtn}>
              Close camera
            </button>

            <button className="button" onClick={captureSnapshot} ref={snapBtn}>
              Capture snapshot and save
            </button>
          </div>
        </>
      )}
      {!model && <div>Loading machine learning models...</div>}
    </>
  );
}

The code above declares the useState and useRef constants that we use to link to the DOM elements. In the DOM we have a video element for the webcam feed, a canvas for viewing a snapshot of the captured frame that we decode text from, and a div for displaying the decoded text (text placed inside a canvas tag would only render as fallback content, so a plain div works better here). We also include a useEffect hook so the Tesseract model is loaded only once, when the component mounts.

Below the useEffect, create the following function:

const startCamHandler = async () => {
  console.log("Starting webcam ..... ");
  localStream = await navigator.mediaDevices.getUserMedia({
    video: true,
    audio: false,
  });

  // populate the video element with the stream
  rawVideo.current.srcObject = localStream;
  video_in = rawVideo.current;
  rawVideo.current.addEventListener("loadeddata", (ev) => {
    console.log("loaded data.");
  });

  mediaRecorder = new MediaRecorder(localStream, options);
  mediaRecorder.ondataavailable = (event) => {
    console.log("data-available");
    if (event.data.size > 0) {
      recordedChunks.push(event.data);
    }
  };
  mediaRecorder.start();
};

The code above first requests access to the user's webcam (audio is disabled in the getUserMedia constraints), populates the video element with the resulting stream, and starts a MediaRecorder to capture the feed. Below it, add the following:

const stopCamHandler = async () => {
  console.log("Closing the webcam ...");
  localStream.getTracks().forEach((track) => track.stop());
};

The function above stops the local media stream when the user is finished. Proceed with the following:

const captureSnapshot = async () => {
  const c_out = processedVid.current;

  // draw the current video frame onto the snapshot canvas
  c_out
    .getContext("2d")
    .drawImage(video_in, 0, 0, video_in.videoWidth, video_in.videoHeight);

  let img_url = c_out.toDataURL("image/png");

  await worker.loadLanguage("eng");
  await worker.initialize("eng");

  // pass the image data to tesseract
  const {
    data: { text },
  } = await worker.recognize(img_url);
  console.log(text, " retrieved text");

  setOutput(text.replace(/[^a-zA-Z ]/g, " "));

  // send the snapshot to the backend for upload
  await uploadVideo(img_url);
  await stopCamHandler();
};

In the code above, we reference the snapshot canvas through the processedVid ref and draw the current webcam frame onto it with the drawImage method. The frame is then passed to Tesseract, where the text is decoded, cleaned of non-alphabetic characters with a regular expression, and stored in the output state variable. The function is async because it awaits the Tesseract worker, the upload, and the stopCamHandler function.

At this point we pass the snapshot, an image containing all the text the webcam decoded, to Cloudinary through our backend:

const uploadVideo = async (base64) => {
  console.log("uploading to backend...");
  try {
    fetch("/api/upload", {
      method: "POST",
      body: JSON.stringify({ data: base64 }),
      headers: { "Content-Type": "application/json" },
    }).then((response) => {
      console.log("successful session", response.status);
    });
  } catch (error) {
    console.error(error);
  }
};
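Notice that the recordedChunks collected by the MediaRecorder are never consumed in the snippets above. If you also want to upload the recorded clip itself, one optional sketch (uploadRecording is a hypothetical helper, not part of the original project) assembles the chunks into a blob once recording stops and reuses uploadVideo:

// hypothetical helper: turn the recorded chunks into a base64 data
// URL and reuse uploadVideo to send it to the backend
const uploadRecording = () => {
  const blob = new Blob(recordedChunks, { type: "video/webm" });
  const reader = new FileReader();
  reader.readAsDataURL(blob); // produces a "data:video/webm;base64,..." string
  reader.onloadend = () => uploadVideo(reader.result);
};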

That's it! The final step is to render the component on a page.
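The snippets above do not show how Main is mounted on a page. A minimal sketch for pages/index.js, assuming the default Next.js project layout (the import path and ssr option are my assumptions), loads the component on the client only, since it relies on browser APIs:

// pages/index.js: a minimal sketch, assuming the default project layout
import dynamic from "next/dynamic";

// load Main on the client only; it uses browser-only APIs
// such as navigator.mediaDevices and MediaRecorder
const Main = dynamic(() => import("../components/Main.client"), {
  ssr: false,
});

export default function Home() {
  return <Main />;
}

Run npm run dev and open http://localhost:3000 to enjoy the experience.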

Eugene Musebe

Software Developer

I’m a full-stack software developer, content creator, and tech community builder based in Nairobi, Kenya. I am addicted to learning new technologies and love working with like-minded people.