Introduction
This article demonstrates how text can be extracted from a webcam feed using Next.js and the Tesseract.js library.
Codesandbox
Check out the sandbox demo on CodeSandbox.
You can also find the full project in the GitHub repository.
Prerequisites
Entry-level JavaScript and React/Next.js knowledge.
Setting Up the Sample Project
Create a new Next.js app by running npx create-next-app webcamtext in your terminal, then head to your project root directory: cd webcamtext
We will begin by setting up the backend, which handles the Cloudinary integration.
Create a free Cloudinary account using this link and log in. Cloudinary will provide you with a dashboard containing the environment variables needed to integrate it into your project.
To integrate, start by adding Cloudinary to your project dependencies: npm install cloudinary
In your project root directory, create a new file named .env and paste in the following:
1".env"234CLOUDINARY_CLOUD_NAME =56CLOUDINARY_API_KEY =78CLOUDINARY_API_SECRET =
Fill in the blanks with the environment variables from your Cloudinary dashboard, then restart your project using npm run dev.
In the pages/api folder, create a new file named upload.js and begin by configuring the environment keys and libraries:
```js
// pages/api/upload.js
var cloudinary = require("cloudinary").v2;

cloudinary.config({
  cloud_name: process.env.CLOUDINARY_CLOUD_NAME,
  api_key: process.env.CLOUDINARY_API_KEY,
  api_secret: process.env.CLOUDINARY_API_SECRET,
});
```
Next.js API routes use a handler function to process requests. Ours handles the POST request: it receives the media file data, uploads it to Cloudinary, then captures the file's Cloudinary URL and sends it back in the response.
```js
export default async function handler(req, res) {
  if (req.method === "POST") {
    let url = "";
    try {
      let fileStr = req.body.data;
      // upload in chunks; resource_type "video" matches the original
      // large-upload setup
      const uploadedResponse = await cloudinary.uploader.upload_large(fileStr, {
        resource_type: "video",
        chunk_size: 6000000,
      });
      url = uploadedResponse.url;
    } catch (error) {
      // return here so we don't attempt to send a second response below
      return res.status(500).json({ error: "Something wrong" });
    }

    res.status(200).json({ data: url });
  }
}
```
The code above concludes our backend. Let us now extract the text.
Start by adding tesseract.js to your dependencies: npm install tesseract.js
Create a file named components/Main.client.js and include tesseract.js in your imports, along with a few module-level variables that the handlers below will rely on:
```js
// components/Main.client.js
import Tesseract from "tesseract.js";
import { createWorker } from "tesseract.js";
import React, { useRef, useEffect, useState } from "react";

// a tesseract.js worker, created once at module level
const worker = createWorker();

// module-level handles shared by the handlers below
let localStream, video_in, mediaRecorder, c_out;
let recordedChunks = [];
// MediaRecorder options (an assumption; the original never shows this object)
const options = { mimeType: "video/webm; codecs=vp9" };
```
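If you want progress feedback while Tesseract downloads and loads its models, createWorker also accepts a logger callback. This is optional and not part of the original code; a minimal sketch of the alternative:

```js
// optional: log tesseract.js progress events (status text, progress 0..1)
const worker = createWorker({
  logger: (m) => console.log(m),
});
```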
Notice the React hooks in the imports; we will use them as we move on.
Inside your Main function, start by declaring the following ref and state constants, and fill in the return statement:
```jsx
export default function Main() {
  const processedVid = useRef();
  const rawVideo = useRef();
  const startBtn = useRef();
  const closeBtn = useRef();
  const snapBtn = useRef();
  const text_canvas = useRef();

  const [model, setModel] = useState(null);
  const [output, setOutput] = useState(null);

  useEffect(() => {
    // captureOutput();
    if (model) return;
    const start_time = Date.now() / 1000;
    worker.load().then((m) => {
      setModel(m);
      const end_time = Date.now() / 1000;
      console.log(`model loaded successfully, ${end_time - start_time}`);
    });
  }, []);

  return (
    <>
      {model && (
        <>
          <div className="card">
            <div className="videos">
              <video
                className="display"
                width={800}
                height={450}
                ref={rawVideo}
                autoPlay
                playsInline
              />
            </div>

            <canvas
              className="display"
              width={800}
              height={450}
              ref={processedVid}
            ></canvas>
          </div>

          {output && (
            <canvas width={800} height={450} ref={text_canvas}>
              {output}
            </canvas>
          )}

          <div className="buttons">
            <button className="button" onClick={startCamHandler} ref={startBtn}>
              Start Webcam
            </button>
            <button className="button" onClick={stopCamHandler} ref={closeBtn}>
              Close camera
            </button>
            <button className="button" onClick={captureSnapshot} ref={snapBtn}>
              Capture snapshot and save
            </button>
          </div>
        </>
      )}
      {!model && <div>Loading machine learning models...</div>}
    </>
  );
}
```
The code above declares useState and useRef constants that link to the DOM elements. In the DOM we have a video element for the webcam feed and two canvases: one for viewing a screenshot of the captured frame that we decode the text from, and one for viewing the decoded text. We also include a useEffect hook so that the Tesseract worker's models are loaded only once, when the component mounts.
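Nothing renders this component yet. The original doesn't show this step, so the file name and import path below are our assumptions; a minimal pages/index.js sketch could look like this:

```jsx
// pages/index.js — a minimal sketch; the path assumes the component
// lives at components/Main.client.js as created above
import dynamic from "next/dynamic";

// tesseract.js and the MediaRecorder API are browser-only, so skip SSR
const Main = dynamic(() => import("../components/Main.client"), { ssr: false });

export default function Home() {
  return <Main />;
}
```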
Back in the component, below the useEffect, create the following function:
```js
const startCamHandler = async () => {
  console.log("Starting webcam and mic ..... ");
  localStream = await navigator.mediaDevices.getUserMedia({
    video: true,
    audio: false,
  });

  // populate the video element with the webcam stream
  rawVideo.current.srcObject = localStream;
  video_in = rawVideo.current;
  rawVideo.current.addEventListener("loadeddata", (ev) => {
    console.log("loaded data.");
  });

  mediaRecorder = new MediaRecorder(localStream, options);
  mediaRecorder.ondataavailable = (event) => {
    console.log("data-available");
    if (event.data.size > 0) {
      recordedChunks.push(event.data);
    }
  };
  mediaRecorder.start();
};
```
The code above first asks the user to activate their webcam (audio is disabled), populates the video element with the camera stream, and starts a MediaRecorder that collects the recorded chunks. Below it, add the following:
```js
const stopCamHandler = async () => {
  console.log("Hanging up the call ...");
  // stop every track on the local stream to release the camera
  localStream.getTracks().forEach((track) => track.stop());
};
```
The function above stops the local media stream when the user is finished. Proceed with the following:
```js
const captureSnapshot = async () => {
  c_out = processedVid.current;

  // draw the current webcam frame onto the snapshot canvas
  c_out
    .getContext("2d")
    .drawImage(video_in, 0, 0, video_in.videoWidth, video_in.videoHeight);

  let img_url = c_out.toDataURL("image/png");

  await worker.loadLanguage("eng");
  await worker.initialize("eng");

  // pass the image data to tesseract
  const {
    data: { text },
  } = await worker.recognize(img_url);
  console.log(text, " retrieved text");

  // keep only letters and spaces before displaying the result
  setOutput(text.replace(/[^a-zA-Z ]/g, " "));

  // uploadVideo(to_cloudinary);
  await stopCamHandler();
};
```
In the code above, we reference the snapshot canvas through the processedVid ref and draw the captured frame onto it with the drawImage method. The image is then passed to Tesseract, where it is decoded, cleaned of non-letter characters, and assigned to the output state variable. The function is async because it awaits the Tesseract worker calls and the stopCamHandler function.
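To see what the cleanup step does, here is a small illustration (the input string is made up):

```js
// every character that is not a letter or a space becomes a space
const raw = "Hello, World 42!";
console.log(raw.replace(/[^a-zA-Z ]/g, " ")); // "Hello  World    "
```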
At this point, we can pass the result to Cloudinary by uploading the snapshot image containing all the text the webcam has decoded:
```js
const uploadVideo = async (base64) => {
  console.log("uploading to backend...");
  try {
    // post the base64 data URL to the upload API route created earlier
    fetch("/api/upload", {
      method: "POST",
      body: JSON.stringify({ data: base64 }),
      headers: { "Content-Type": "application/json" },
    }).then((response) => {
      console.log("successful session", response.status);
    });
  } catch (error) {
    console.error(error);
  }
};
```
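Notice that captureSnapshot above leaves the upload call commented out (uploadVideo(to_cloudinary)). One way to wire it up, assuming you want to send the snapshot's data URL, would be to replace that commented line with:

```js
// inside captureSnapshot, after img_url is built (our assumption;
// the original leaves this call commented out)
await uploadVideo(img_url);
```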
That's it! Go through the article again to enjoy the experience.