Handle Quick AI Text Detection in Images in Next.js

Divine Orji

Artificial intelligence (AI) has improved dramatically over the years. Before AI technology was widely accessible, copying text from a document or image meant typing or writing it out by hand, which could be stressful and time-consuming. Today, you can automate this task with Optical Character Recognition (OCR).

Optical Character Recognition is the process of detecting text in digital images and documents and converting it into machine-readable text. Modern OCR software typically uses AI to improve its results by detecting language, writing style, and context.

Cloudinary, a cloud-based media management service that lets you store, transform, manipulate, and deliver images and videos, also provides an OCR add-on for text detection and extraction.

This article will teach you how to implement Cloudinary’s OCR add-on in your Next.js project to detect and extract text from images.

CodeSandbox & GitHub Repo


Click the link below to view a complete demo of this article on CodeSandbox:

https://codesandbox.io/embed/inspiring-mountain-41wj8y?fontsize=14&hidenavigation=1&theme=dark

The source code is also available on GitHub.

Prerequisites

To understand the concepts in this article, you will need the following:

  • Experience with JavaScript and React
  • Node.js (which comes with npm) and Yarn installed on your computer
  • A Cloudinary account (you can create one for free)
  • Familiarity with Next.js, which is helpful but not strictly required

Set up the project

Open your terminal, navigate to your preferred directory and run the command below to quickly set up the project:

```bash
yarn create next-app nextjs-ocr-demo -e https://github.com/dpkreativ/nextjs-ocr-starter
```

The command above will create a new Next.js project named nextjs-ocr-demo, download the starter files for this demo from GitHub and install its dependencies.

Once the installation completes, open the project in your preferred code editor and run the command below in its terminal:

```bash
yarn dev
```

You can now view the project in your browser at http://localhost:3000.

Register Cloudinary’s OCR add-on

In your browser, go to your Cloudinary dashboard, click the “Add-ons” tab, and scroll down until you find “OCR Text Detection and Extraction”.

Click the card, then click “Free” to get 50 OCR detections per month at no cost.

Install Cloudinary’s SDK

In your project’s terminal, run the command below to install Cloudinary’s SDK, which you will use to implement the OCR add-on:

```bash
yarn add cloudinary
```

Set up environment variables

In the root folder of your project, create a .env.local file and add the following code to it:

```
CLOUDINARY_NAME=<YOUR CLOUDINARY CLOUD NAME COMES HERE>
CLOUDINARY_KEY=<YOUR CLOUDINARY API KEY COMES HERE>
CLOUDINARY_SECRET=<YOUR CLOUDINARY API SECRET COMES HERE>
```

Copy your “Cloud Name”, “API Key”, and “API Secret” from your Cloudinary dashboard and paste them into the corresponding lines of this file, replacing the placeholders.


In your pages/api/ folder, create a cloudinaryApi.js file and add the code below:

```javascript
import cloudinary from 'cloudinary';

cloudinary.config({
  cloud_name: process.env.CLOUDINARY_NAME,
  api_key: process.env.CLOUDINARY_KEY,
  api_secret: process.env.CLOUDINARY_SECRET,
});
```

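This code imports the cloudinary module and configures it with the credentials stored in your environment variables. Next.js automatically loads the variables from .env.local into process.env for server-side code such as API routes.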
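
If a variable is missing or misspelled in .env.local, cloudinary.config() still runs and the problem typically only shows up later, when an upload fails. A minimal sketch, assuming you want to fail fast instead: the hypothetical readCloudinaryConfig helper below checks that all three variables are set and maps them to the option names used in the config call above (cloud_name, api_key, api_secret).

```javascript
// Hypothetical helper: validates the .env.local variables used in this article
// and returns the object to pass to cloudinary.config().
const REQUIRED_VARS = ['CLOUDINARY_NAME', 'CLOUDINARY_KEY', 'CLOUDINARY_SECRET'];

function readCloudinaryConfig(env) {
  // Collect every variable that is absent or empty
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  return {
    cloud_name: env.CLOUDINARY_NAME,
    api_key: env.CLOUDINARY_KEY,
    api_secret: env.CLOUDINARY_SECRET,
  };
}

// Usage in pages/api/cloudinaryApi.js would be:
//   cloudinary.config(readCloudinaryConfig(process.env));
try {
  readCloudinaryConfig({ CLOUDINARY_NAME: 'demo' });
} catch (err) {
  console.log(err.message); // Missing environment variables: CLOUDINARY_KEY, CLOUDINARY_SECRET
}
```

Throwing at module load makes a misconfigured deployment fail on the first request with a clear message, rather than with a less specific upload error.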

Write API logic to handle text detection

Update your pages/api/cloudinaryApi.js file with the code below:

```javascript
import cloudinary from 'cloudinary';

cloudinary.config({
  cloud_name: process.env.CLOUDINARY_NAME,
  api_key: process.env.CLOUDINARY_KEY,
  api_secret: process.env.CLOUDINARY_SECRET,
});

export default (request, response) => {
  // The client sends the image as a data URL string in the request body
  const image = request.body;

  // Upload the image and ask Cloudinary's OCR add-on to detect text in it
  return cloudinary.v2.uploader.upload(
    image,
    { ocr: 'adv_ocr' },
    (error, result) => {
      if (error) return response.status(500).json({ error });

      // The OCR results live under result.info.ocr.adv_ocr
      const { textAnnotations } = result.info.ocr.adv_ocr.data[0];

      // Skip the first annotation (the full text block), keep only the
      // letters and digits of each word, and join the words with spaces
      const extractedText = textAnnotations
        .map((anno, i) => i > 0 && anno.description.replace(/[^0-9a-z]/gi, ''))
        .filter((entry) => typeof entry === 'string')
        .join(' ');

      return response.status(200).json({ data: extractedText });
    }
  );
};
```
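
Because the image travels to this route as a base64 data URL, the request body is roughly a third larger than the image file itself, and Next.js API routes in the pages directory parse request bodies of up to 1 MB by default. If larger images fail with a body-size error, you can raise the limit by exporting a config object from the same file. A sketch, assuming the pages-directory API routes used in this article; the 4mb value is only an example:

```javascript
// pages/api/cloudinaryApi.js, next to the default export.
// Raises the API route's body parser limit (Next.js default: 1mb).
// '4mb' is an arbitrary example; choose a limit that suits your images.
export const config = {
  api: {
    bodyParser: {
      sizeLimit: '4mb',
    },
  },
};
```

Keep the limit as small as practical, since every request body up to that size is buffered in memory before the handler runs.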

Let’s break this code into bits to understand it:

```javascript
export default (request, response) => {
  const image = request.body;
};
```

Here you created and exported an arrow function that takes two parameters:

  • The request parameter holds the incoming API request. Its body contains the image data sent from the client, which you store in an image variable
  • The response parameter lets you send the result back to the client after the API has processed the image
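
The handler trusts whatever arrives in request.body. A minimal sketch of a guard you could call at the top of the handler, assuming the client always POSTs a base64 image data URL, as the client code in this article does. isImageDataUrl, readImageFromRequest, and createMockResponse are hypothetical names; the mock only imitates the status().json() chaining of the Next.js response object so the guard can run outside Next.js.

```javascript
// Hypothetical helper: rough check that a value looks like a base64 image data URL.
function isImageDataUrl(value) {
  return typeof value === 'string' && /^data:image\/[a-z0-9.+-]+;base64,/i.test(value);
}

// Hypothetical guard: returns the image string, or sends an error response and returns null.
function readImageFromRequest(request, response) {
  if (request.method !== 'POST') {
    response.status(405).json({ error: 'Only POST requests are allowed' });
    return null;
  }
  if (!isImageDataUrl(request.body)) {
    response.status(400).json({ error: 'Expected an image data URL in the request body' });
    return null;
  }
  return request.body;
}

// Minimal stand-in for the Next.js response object, for running the guard outside Next.js.
function createMockResponse() {
  const res = { statusCode: 200, payload: undefined };
  res.status = (code) => {
    res.statusCode = code;
    return res; // allow res.status(...).json(...) chaining
  };
  res.json = (payload) => {
    res.payload = payload;
    return res;
  };
  return res;
}

const res = createMockResponse();
const image = readImageFromRequest({ method: 'GET', body: '' }, res);
console.log(image, res.statusCode); // null 405
```

In the route, you would call `const image = readImageFromRequest(request, response); if (!image) return;` before uploading, so malformed requests never reach Cloudinary or count against your OCR quota.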

```javascript
return cloudinary.v2.uploader.upload(
  image, { ocr: 'adv_ocr' }, (error, result) => { /* handle the OCR result */ }
);
```

The function returns the result of calling the upload method from Cloudinary’s uploader, which takes three arguments:

  • image: the data from your image variable, which is uploaded to Cloudinary so the OCR add-on can process it
  • An options object whose ocr parameter is set to adv_ocr, which detects text in images. To detect dense text in documents such as PDFs, set it to adv_ocr:document instead
  • A callback function that handles the response once Cloudinary’s OCR add-on has processed your image. It receives two parameters: error, which is set if the request fails, and result, a JSON object with the details of the uploaded image, including its extracted text
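
If you prefer async/await over callbacks, the same call can be wrapped in a Promise. Recent versions of the cloudinary SDK can also return a Promise from upload when you omit the callback; check the SDK documentation for your version. The sketch below shows the general wrapping pattern without relying on that. uploadWithOcr and fakeUpload are hypothetical names, and the upload function is injected so the wrapper can be exercised without a Cloudinary account.

```javascript
// Wraps a callback-style upload(file, options, callback) function in a Promise.
// In the route you could pass (...args) => cloudinary.v2.uploader.upload(...args).
function uploadWithOcr(upload, image, ocrMode = 'adv_ocr') {
  return new Promise((resolve, reject) => {
    upload(image, { ocr: ocrMode }, (error, result) => {
      if (error) reject(error);
      else resolve(result);
    });
  });
}

// Fake uploader with the same (file, options, callback) signature, for illustration only.
function fakeUpload(file, options, callback) {
  setTimeout(() => callback(null, { received: file, options }), 0);
}

uploadWithOcr(fakeUpload, 'data:image/png;base64,aGk=', 'adv_ocr:document').then((result) => {
  console.log(result.options.ocr); // adv_ocr:document
});
```

With this wrapper, the handler could be declared async and use `const result = await uploadWithOcr(...)` inside a try/catch, which keeps the success and error paths in one flat block.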

```javascript
if (error) return response.status(500).json({ error });
```

If the upload or the OCR request fails, the handler returns a response with a 500 status code and a JSON object containing the error.
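
Sending { error } works when the error is a plain object, but if it is a native JavaScript Error instance, its message and stack are non-enumerable properties, so JSON serialization drops them and the client receives an empty object. A minimal sketch with a hypothetical toErrorBody helper that copies the message into a plain object first:

```javascript
// Native Error properties are non-enumerable, so they vanish when serialized.
console.log(JSON.stringify({ error: new Error('Upload failed') })); // {"error":{}}

// Hypothetical helper: build a serializable error payload.
function toErrorBody(error) {
  const message =
    error && typeof error.message === 'string' ? error.message : String(error);
  return { error: { message } };
}

console.log(JSON.stringify(toErrorBody(new Error('Upload failed')))); // {"error":{"message":"Upload failed"}}
```

In the route, that would look like `return response.status(500).json(toErrorBody(error));`. Sending only the message also avoids leaking other error details to the browser.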

```javascript
const { textAnnotations } = result.info.ocr.adv_ocr.data[0];
```

The result object contains an ocr node inside its info property:

  • Object destructuring pulls textAnnotations out of adv_ocr.data[0], the first element of the adv_ocr engine’s data array
  • textAnnotations is an array of objects, each with a description key. The first object’s description holds all of the detected text as a single block, and each following object’s description holds one detected word
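
To see how that path maps onto the data, here is a trimmed-down, illustrative result object (not a real Cloudinary response; actual responses contain many more fields) together with a hypothetical getTextAnnotations helper. It performs the same lookup as the route but uses optional chaining, so it returns an empty array instead of throwing if any level of the path is missing.

```javascript
// Illustrative only: a trimmed-down object in the shape the route reads.
const sampleResult = {
  info: {
    ocr: {
      adv_ocr: {
        data: [
          {
            textAnnotations: [
              { description: 'Hello, world!\n' }, // first entry: all detected text
              { description: 'Hello,' }, // following entries: one word each
              { description: 'world!' },
            ],
          },
        ],
      },
    },
  },
};

// Hypothetical helper: same path as the route, but returns [] if any level is missing.
function getTextAnnotations(result) {
  return result?.info?.ocr?.adv_ocr?.data?.[0]?.textAnnotations ?? [];
}

const annotations = getTextAnnotations(sampleResult);
console.log(annotations[0].description.trim()); // Hello, world!
console.log(annotations.slice(1).map((a) => a.description)); // [ 'Hello,', 'world!' ]
console.log(getTextAnnotations({ info: {} })); // []
```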

```javascript
const extractedText = textAnnotations
  .map((anno, i) => i > 0 && anno.description.replace(/[^0-9a-z]/gi, ''))
  .filter((entry) => typeof entry === 'string')
  .join(' ');
```

Here you:

  • Mapped through textAnnotations, returning false for the first element (because i > 0 is false) and, for every other element, removing each character from its description that is not a letter or digit
  • Filtered the resulting array to keep only strings, which drops the false value produced for the first element
  • Joined the remaining words with spaces and stored the result in the extractedText variable
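
The snippet below runs the same map/filter/join pipeline on sample annotations, so you can see its output without calling Cloudinary. It also shows a hypothetical extractWordsCompact variant that skips words left empty after stripping (such as a lone hyphen), which the original pipeline turns into a double space. The sample data is made up for illustration.

```javascript
// The route's pipeline, extracted into a function.
function extractWords(textAnnotations) {
  return textAnnotations
    .map((anno, i) => i > 0 && anno.description.replace(/[^0-9a-z]/gi, ''))
    .filter((entry) => typeof entry === 'string')
    .join(' ');
}

// Hypothetical variant: skip the full-text entry explicitly and drop empty words.
function extractWordsCompact(textAnnotations) {
  return textAnnotations
    .slice(1) // the first annotation holds the full text block
    .map((anno) => anno.description.replace(/[^0-9a-z]/gi, ''))
    .filter((word) => word.length > 0)
    .join(' ');
}

const annotations = [
  { description: 'Hello - world!\n' },
  { description: 'Hello' },
  { description: '-' },
  { description: 'world!' },
];

console.log(JSON.stringify(extractWords(annotations))); // "Hello  world" (two spaces)
console.log(JSON.stringify(extractWordsCompact(annotations))); // "Hello world"
console.log(annotations[0].description.trim()); // Hello - world!
```

Because the regular expression keeps only ASCII letters and digits, it also removes punctuation and accented characters. If you want the text exactly as detected, textAnnotations[0].description already holds it as one block, as the last line shows.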

```javascript
return response.status(200).json({ data: extractedText });
```

Finally, the handler returns a response with a 200 status code, indicating success, and a JSON object whose data key holds extractedText.

Update the handleOnSubmit function

In your pages/index.js file, locate the handleOnSubmit function and update it with the code below:

```javascript
// OnSubmit function
const handleOnSubmit = async (e) => {
  e.preventDefault();
  setLoading(true);

  try {
    const response = await fetch('/api/cloudinaryApi', {
      method: 'POST',
      body: imageSrc, // the image as a data URL string
    });
    const { data, error } = await response.json();

    // The API route responds with a 500 status and { error } when the upload fails
    if (!response.ok) throw new Error(JSON.stringify(error));

    console.log(data);
    setExtractedText(data);
    setLoading(false);
  } catch (error) {
    console.log(error);
    setLoading(false);
  }
};
```

In the code above, you:

  • Used setLoading(true) to trigger the loading state of your Detect text button
  • Implemented a try-catch block to handle your API call and any errors you might encounter on the client side
  • In the try section:
    • Used the fetch method to send your image data URL to your API, and object destructuring to get data from response.json()
    • Called setExtractedText(data) to update the extractedText state with the detected text
    • Used setLoading(false) to turn off the loading state of your Detect text button
  • In the catch section, logged the error to the console and set the loading state to false
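
The request logic can also live in a small helper outside the component, which makes it easier to check response.ok and to test without a browser. A minimal sketch; detectText and its fetchImpl parameter are hypothetical names, not part of the starter project:

```javascript
// Hypothetical helper: POSTs the image data URL to the API route and returns the extracted text.
// fetchImpl defaults to the global fetch (browsers, Node 18+) and can be swapped for a fake.
// Note: a relative URL like this one only resolves in the browser.
async function detectText(imageSrc, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl('/api/cloudinaryApi', {
    method: 'POST',
    body: imageSrc,
  });
  const payload = await response.json();

  // The route answers with a 500 status and { error } on failure
  if (!response.ok) {
    throw new Error(`Text detection failed with status ${response.status}`);
  }
  return payload.data;
}

// Fake fetch that mimics the route's success response, for illustration only.
const fakeFetch = async (url, options) => ({
  ok: true,
  status: 200,
  json: async () => ({ data: 'Hello world' }),
});

detectText('data:image/png;base64,aGk=', fakeFetch).then((text) => {
  console.log(text); // Hello world
});
```

Inside handleOnSubmit, the try block could then call `setExtractedText(await detectText(imageSrc));`, leaving the loading-state handling unchanged.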

You can now test the demo app in your browser: upload an image and click the Detect text button to extract its text.

Conclusion

This article showed you how to detect and extract text from images using Cloudinary’s OCR add-on. For an in-depth look at how the add-on works, see its official documentation.
