Extract and Transform Text from Images


Ifeoma Imoh

Manually extracting and transforming text content from images is a time-consuming and tedious task. Common use cases include auto-tagging or categorizing an image based on its text content, blurring or adding an overlay to the text on an image, etc.

Cloudinary provides an OCR (Optical Character Recognition) text detection and extraction add-on, powered by Google's Vision API, that integrates smoothly with its image upload and transformation capabilities. The add-on makes it easy to capture text elements in an image and apply Cloudinary transformation effects to them.

This post will demonstrate how to use the Cloudinary OCR text detection and extraction add-on. We will create a simple application to demonstrate the process of extracting detected text from an image, blurring or pixelating detected text, and adding an image over text in an image.

Here is a link to the demo CodeSandbox.

Project Setup

Create a Next.js app using the following command:

npx create-next-app ocr-demo

Next, run this command to change into the newly created directory:

cd ocr-demo

Now, add the project dependencies using the following command:

npm install cloudinary axios

The Node Cloudinary SDK will provide easy-to-use methods to interact with the Cloudinary APIs, while axios will serve as the HTTP client for communicating with our serverless functions.

Run this command to preview the running application:

npm run dev

Setting up Cloudinary

To use Cloudinary's provisioned services, you need to first sign up for a free Cloudinary account if you don’t have one already. Displayed on your account’s Management Console (aka Dashboard) are important details: your cloud name, API key, etc.

Next, let’s create environment variables to hold the details of our Cloudinary account. Create a new file called .env at the root of your project and add the following to it:

CLOUD_NAME=YOUR_CLOUD_NAME
API_KEY=YOUR_API_KEY
API_SECRET=YOUR_API_SECRET

This will be used as a default when the project is set up on another system. To update your local environment, create a copy of the .env file using the following command:

cp .env .env.local

By default, this local file is listed in .gitignore, mitigating the security risk of inadvertently exposing secret credentials to the public. You can update the .env.local file with your Cloudinary credentials.

Access to the Cloudinary add-ons is not provided out of the box when you create an account. You need to register for the OCR Text Detection and Extraction add-on to access this feature. Each add-on offers several plans and associated prices. Thankfully, most of them also offer free plans, and since this is a demo application, we'll go with the free plan here, which gives us access to 50 monthly OCR detections.

Extracting Detected Text from an Image

Using the Cloudinary OCR text detection and extraction add-on, we can extract all detected text from an image by setting the ocr parameter in an image upload or update method call to adv_ocr or adv_ocr:document for text-heavy images.

cloudinary.v2.uploader.upload(
  "your-image.jpg",
  { ocr: "adv_ocr" },
  function (error, result) {
    // some other code
  }
);

When the ocr parameter is set to adv_ocr or adv_ocr:document, Cloudinary attaches an ocr node, nested in an info section, to the JSON response. The ocr node contains detailed information about the full detected text as well as a breakdown of each individual text element, such as the default language, the bounding box coordinates, etc.
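To make the shape concrete, here is a trimmed, hypothetical sketch of the relevant part of such a response (the field values are made up; real responses contain many more fields, such as per-word bounding boxes and confidence scores):

```javascript
// Hypothetical, heavily trimmed upload response when `ocr: "adv_ocr"` is set.
const result = {
  public_id: "ocr-demo/sample",
  info: {
    ocr: {
      adv_ocr: {
        status: "complete",
        data: [
          {
            // The first annotation holds the full text; the rest are per-word.
            textAnnotations: [
              { locale: "en", description: "HELLO WORLD" },
              { description: "HELLO" },
              { description: "WORLD" },
            ],
            fullTextAnnotation: { text: "HELLO WORLD" },
          },
        ],
      },
    },
  },
};

// Pull the detected text out of the nested structure,
// preferring the full-text annotation when present.
const { textAnnotations, fullTextAnnotation } = result.info.ocr.adv_ocr.data[0];
const text = fullTextAnnotation.text || textAnnotations[0].description;
console.log(text); // "HELLO WORLD"
```

This is the same traversal our client-side helper performs later in the post.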

Now let's create an API route in our Next.js application that accepts an image, extracts the image’s text content, and sends the response back to the client. To achieve that, create an extractText.js file in the pages/api folder of the project and add the following to it:

const cloudinary = require("cloudinary").v2;

cloudinary.config({
  cloud_name: process.env.CLOUD_NAME,
  api_key: process.env.API_KEY,
  api_secret: process.env.API_SECRET,
  secure: true,
});

export default async function handler(req, res) {
  const { baseImage } = req.body;
  try {
    await cloudinary.uploader.upload(
      baseImage,
      {
        ocr: "adv_ocr",
        folder: "ocr-demo",
      },
      async function (error, result) {
        res.status(200).json(result);
      }
    );
  } catch (error) {
    res.status(500).json(error);
  }
}

export const config = {
  api: {
    bodyParser: {
      sizeLimit: "4mb",
    },
  },
};

In the code above, we import Cloudinary and configure it with an object containing our Cloudinary credentials. We then define a route handler function that expects a base64 image attached to the request's body. The image is uploaded to a folder called ocr-demo in your Cloudinary account, and its text content is extracted.

We also exported a Next.js API route config object to raise the request payload size limit to 4MB.

Let's create a file that will hold helper functions used to make Axios requests to our API routes. At the root level of your application, create a folder called util, and inside it, create a file called axiosReq.js. Add the following to the axiosReq.js file:

import axios from "axios";

export const extractText = async (baseImage, setStatus, setOutputData) => {
  setStatus("loading");
  try {
    const extractedImage = await axios.post("/api/extractText", { baseImage });
    const { textAnnotations, fullTextAnnotation } =
      extractedImage.data.info.ocr.adv_ocr.data[0];
    setOutputData({
      type: "text",
      data: fullTextAnnotation.text || textAnnotations[0].description,
    });
    setStatus("");
  } catch (error) {
    setStatus("error");
  }
};

The file exports a function called extractText that takes in an image, a function to set the loading state, and a function to set the response. The function makes an Axios call to the extractText API route and processes the response to extract the text annotations, which will then be set to state.

Next, let's use this function. Replace the content of your pages/index.js file with the following:

import { useState, useRef } from "react";
import { extractText } from "../util/axiosReq";
import styles from "../styles/Home.module.css";

export default function Home() {
  const [baseImage, setBaseImage] = useState();
  const [outputData, setOutputData] = useState();
  const [status, setStatus] = useState();
  const baseFileRef = useRef();

  const handleSelectImage = (e, setStateFn) => {
    const reader = new FileReader();
    reader.readAsDataURL(e.target.files[0]);
    reader.onload = function (e) {
      setStateFn(e.target.result);
    };
  };

  const handleExtractText = async () => {
    extractText(baseImage, setStatus, setOutputData);
  };

  const isBtnDisabled = !baseImage || status === "loading";

  return (
    <main className={styles.app}>
      <h1>Cloudinary OCR demo App</h1>
      <div>
        <div className={styles.input}>
          <div
            className={`${styles.image} ${styles.flex}`}
            onClick={() => baseFileRef.current.click()}
          >
            <input
              type="file"
              ref={baseFileRef}
              style={{ display: "none" }}
              onChange={(e) => handleSelectImage(e, setBaseImage)}
            />
            {baseImage ? (
              <img src={baseImage} alt="selected image" />
            ) : (
              <h2>Click to select image</h2>
            )}
            <div>
              <h2>Click to select image</h2>
            </div>
          </div>
          <div className={styles.actions}>
            <button onClick={handleExtractText} disabled={isBtnDisabled}>
              Extract text
            </button>
          </div>
        </div>
        <div className={styles.output}>
          {status ? (
            <h4>{status}</h4>
          ) : (
            outputData &&
            (outputData.type === "text" ? (
              <div>
                <span>{outputData.data}</span>
              </div>
            ) : (
              ""
            ))
          )}
        </div>
      </div>
    </main>
  );
}

In the code above, we defined three state variables to hold the image to be processed, the request status, and the output data. We also created a baseFileRef ref to work around dynamically opening the image file picker. The handleSelectImage function handles file selection changes and converts the file to its base64 equivalent.
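FileReader.readAsDataURL is browser-only, but the data URL it produces has a simple, predictable format. This Node sketch (not part of the app; the bytes are a made-up stand-in) just illustrates what our API route receives:

```javascript
// Illustrative only: mimic the data-URL format that
// FileReader.readAsDataURL produces for an image file.
function toDataUrl(buffer, mimeType) {
  return `data:${mimeType};base64,${buffer.toString("base64")}`;
}

// Three bytes of JPEG magic-number prefix, standing in for real image data.
const fakeJpegBytes = Buffer.from([0xff, 0xd8, 0xff]);
const dataUrl = toDataUrl(fakeJpegBytes, "image/jpeg");
console.log(dataUrl); // data:image/jpeg;base64,/9j/
```

Cloudinary's upload method accepts a data URL like this directly, which is why the handler can pass baseImage straight through.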

Next, we rendered a div element that can be clicked to select and preview a file and a button that triggers the handleExtractText function when clicked.

We used the status state to set loading feedback and disable the button when required.

Before previewing the application in the browser, let's update our styles/Home.module.css file with the styles from this CodeSandbox link to give our application a decent look.

Now, save the changes, and you should be able to select an image and extract text contents from the image to the screen.

Blurring Detected Text Contents

Cloudinary lets us combine the add-on's text detection capability with its built-in blur_region and pixelate_region image transformation effects. To blur all detected text in an image, set the effect parameter on the image transformation method to blur_region:<blur-value> and the gravity parameter to ocr_text.

The blur value can be any number between 0 and 2000; the higher the value, the stronger the blur.

cloudinary.image("your-image-public_id.jpg", {
  effect: "blur_region:800",
  gravity: "ocr_text",
});
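If the blur strength comes from user input, a small guard can keep it inside the accepted range before the effect string is built. This helper is purely hypothetical, not part of the Cloudinary SDK:

```javascript
// Hypothetical helper: clamp a blur strength into the 0-2000 range
// accepted by blur_region, then build the effect string.
function blurRegionEffect(strength) {
  const clamped = Math.min(2000, Math.max(0, Math.round(strength)));
  return `blur_region:${clamped}`;
}

console.log(blurRegionEffect(800));  // "blur_region:800"
console.log(blurRegionEffect(9999)); // "blur_region:2000"
```

The resulting string can be passed as the effect option shown above.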

To add this feature to our demo application, create a blurImage.js file in the pages/api folder and add the following to it:

const cloudinary = require("cloudinary").v2;

cloudinary.config({
  cloud_name: process.env.CLOUD_NAME,
  api_key: process.env.API_KEY,
  api_secret: process.env.API_SECRET,
  secure: true,
});

export default async function handler(req, res) {
  const { baseImage } = req.body;
  try {
    await cloudinary.uploader.upload(
      baseImage,
      { folder: "ocr-demo" },
      async function (error, result) {
        const response = await cloudinary.image(`${result.public_id}.jpg`, {
          effect: "blur_region:800",
          gravity: "ocr_text",
          sign_url: true,
        });
        res.status(200).json(response);
      }
    );
  } catch (error) {
    res.status(500).json(error);
  }
}

export const config = {
  api: {
    bodyParser: {
      sizeLimit: "4mb",
    },
  },
};

The code is similar to what we did in the extractText.js file, except that we now first upload the image passed from the client to Cloudinary and extract its public ID, which is then passed as a parameter to Cloudinary's image transformation method.

We also set the gravity parameter to ocr_text and the effect parameter to blur_region with a value of 800.

We also signed the generated URL by setting the sign_url parameter to true. Cloudinary requires signed URLs here because dynamically applying the OCR text detection or extraction functionality to arbitrary URLs could incur unplanned costs.

Now, open the util/axiosReq.js file and add the function below to the bottom of the file. The function will be used to make a request and get back some response from our api/blurImage route.

export const blurImage = async (baseImage, setStatus, setOutputData) => {
  setStatus("loading");
  try {
    const blurredImage = await axios.post("/api/blurImage", { baseImage });
    const url = /'(.+)'/.exec(blurredImage.data);
    setOutputData({
      type: "imgUrl",
      data: url[1],
    });
    setStatus("");
  } catch (error) {
    setStatus("error");
  }
};

The function expects the image and functions needed to set the request status and response data state. It then makes a request to the API route, extracts the blurred image URL, and sets the states accordingly.
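The regex is needed because cloudinary.image() returns an HTML image tag string rather than a bare URL. The sketch below shows the extraction step in isolation; the tag string is an illustrative example, not real API output:

```javascript
// cloudinary.image() returns something like "<img src='...' />",
// so the helper pulls the URL out from between the single quotes.
const tag =
  "<img src='https://res.cloudinary.com/demo/image/upload/e_blur_region:800,g_ocr_text/sample.jpg' />";

const match = /'(.+)'/.exec(tag);
console.log(match[1]);
// https://res.cloudinary.com/demo/image/upload/e_blur_region:800,g_ocr_text/sample.jpg
```

The captured group (match[1]) is what gets stored in state and rendered as the image source.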

Next, update your pages/index.js file with the following:

import { useState, useRef } from "react";

// import blurImage
import { extractText, blurImage } from "../util/axiosReq";
import styles from "../styles/Home.module.css";

export default function Home() {
  //...

  const handleSelectImage = (e, setStateFn) => {
    //...
  };

  const handleExtractText = async () => {
    //...
  };

  // Add this
  const handleBlurImage = async () => {
    blurImage(baseImage, setStatus, setOutputData);
  };

  const isBtnDisabled = !baseImage || status === "loading";

  return (
    <main className={styles.app}>
      <h1>Cloudinary OCR demo App</h1>
      <div>
        <div className={styles.input}>
          {/* ... */}
          <div className={styles.actions}>
            <button onClick={handleExtractText} disabled={isBtnDisabled}>
              Extract text
            </button>
            {/* Add this */}
            <button onClick={handleBlurImage} disabled={isBtnDisabled}>
              Blur text content
            </button>
          </div>
        </div>
        <div className={styles.output}>
          {status ? (
            <h4>{status}</h4>
          ) : (
            outputData &&
            (outputData.type === "text" ? (
              <div>
                <span>{outputData.data}</span>
              </div>
            ) : (
              <img src={outputData.data} alt="" />
            ))
          )}
        </div>
      </div>
    </main>
  );
}

We updated the code by adding a new button that triggers the handleBlurImage function when clicked. The function calls the blurImage function and passes it the expected arguments. We reformatted the output div to render an image if the output data is a URL and not text. Save the changes and test the application in your browser.

Overlaying Text with Images

Instead of blurring detected text in an image, we can add an image overlay. Achieving this with the add-on is similar to the default way of adding an overlay to images. The only difference is that we now set the gravity parameter to ocr_text, as seen below.

cloudinary.image("your-image-public_id.jpg", {
  transformation: [
    { overlay: "overlay-public_id" },
    { flags: "region_relative", width: "1.1", crop: "scale" },
    { flags: "layer_apply", gravity: "ocr_text" },
  ],
});

Notice how we didn’t set a fixed value for the width in the code above. Cloudinary allows us to set values relative to the width of the detected text in the image.
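To make the relative sizing concrete: with the region_relative flag, the width is a multiplier of the detected text region's size rather than a pixel value. A quick back-of-the-envelope sketch (the region width here is made up for illustration):

```javascript
// With flags: "region_relative", width "1.1" means 110% of the
// detected text region's width, not 1.1 pixels.
const regionWidth = 200;   // hypothetical detected text width, in px
const widthFactor = 1.1;   // the value used in the transformation above
const overlayWidth = Math.round(regionWidth * widthFactor);
console.log(overlayWidth); // 220
```

A factor slightly above 1, like 1.1, gives the overlay a small margin so it fully covers the text it is placed over.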

Create a file called addOverlay.js in your pages/api folder and add the following to it:

const cloudinary = require("cloudinary").v2;

cloudinary.config({
  cloud_name: process.env.CLOUD_NAME,
  api_key: process.env.API_KEY,
  api_secret: process.env.API_SECRET,
  secure: true,
});

export default async function handler(req, res) {
  const { baseImage, overlay } = req.body;
  try {
    await cloudinary.uploader.upload(
      baseImage,
      { folder: "ocr-demo" },
      async function (error, baseImageCld) {
        await cloudinary.uploader.upload(
          overlay,
          { folder: "ocr-demo" },
          async function (error, overlayImageCld) {
            const overlayedImage = await cloudinary.image(
              `${baseImageCld.public_id}.jpg`,
              {
                transformation: [
                  {
                    overlay: `${overlayImageCld.public_id}`.replace(/\//g, ":"),
                  },
                  { flags: "region_relative", width: "1.1", crop: "scale" },
                  { flags: "layer_apply", gravity: "ocr_text" },
                ],
                sign_url: true,
              }
            );
            res.status(200).json(overlayedImage);
          }
        );
      }
    );
  } catch (error) {
    res.status(500).json(error);
  }
}

export const config = {
  api: {
    bodyParser: {
      sizeLimit: "4mb",
    },
  },
};

In the code above, we configure Cloudinary and define the API route handler to extract the base image and the overlay from the request body; both are uploaded to Cloudinary. We then use the public IDs extracted from the responses to build the overlay transformation.

Let's create a function we can use to make a request to this newly created route. Open the util/axiosReq.js file and add the following to the bottom of the file:

export const addOverlay = async (
  baseImage,
  overlay,
  setStatus,
  setOutputData
) => {
  setStatus("loading");
  try {
    const overlayedImage = await axios.post("/api/addOverlay", {
      baseImage,
      overlay,
    });
    const url = /'(.+)'/.exec(overlayedImage.data);
    setOutputData({ type: "imgUrl", data: url[1] });
    setStatus("");
  } catch (error) {
    setStatus("error");
  }
};

The function accepts an image overlay in addition to the base image and the state functions. It manages the request status state and makes the request to our API route.

To conclude this section, update your pages/index.js file with the following:

import { useState, useRef } from "react";
// import addOverlay
import { extractText, blurImage, addOverlay } from "../util/axiosReq";
import styles from "../styles/Home.module.css";

export default function Home() {
  const [baseImage, setBaseImage] = useState();
  const [outputData, setOutputData] = useState();
  const [status, setStatus] = useState();
  // Add this
  const [overlay, setOverlay] = useState();
  const baseFileRef = useRef();
  // Add this
  const overlayFileRef = useRef();

  const handleSelectImage = (e, setStateFn) => {
    //...
  };

  const handleExtractText = async () => {
    //...
  };

  const handleBlurImage = async () => {
    //...
  };

  // Add this
  const handleAddOverlay = async () => {
    addOverlay(baseImage, overlay, setStatus, setOutputData);
  };

  const isBtnDisabled = !baseImage || status === "loading";

  return (
    <main className={styles.app}>
      <h1>Cloudinary OCR demo App</h1>
      <div>
        <div className={styles.input}>
          <div
            className={`${styles.image} ${styles.flex}`}
            onClick={() => baseFileRef.current.click()}
          >
            <input
              type="file"
              ref={baseFileRef}
              style={{ display: "none" }}
              onChange={(e) => handleSelectImage(e, setBaseImage)}
            />
            {baseImage ? (
              <img src={baseImage} alt="selected image" />
            ) : (
              <h2>Click to select image</h2>
            )}
            <div>
              <h2>Click to select image</h2>
            </div>
          </div>
          <div className={styles.actions}>
            <button onClick={handleExtractText} disabled={isBtnDisabled}>
              Extract text
            </button>
            <button onClick={handleBlurImage} disabled={isBtnDisabled}>
              Blur text content
            </button>

            {/* Add this */}
            <button
              onClick={handleAddOverlay}
              disabled={!overlay || isBtnDisabled}
            >
              Add overlay
            </button>
            <div
              className={`${styles.overlay} ${styles.flex}`}
              onClick={() => overlayFileRef.current.click()}
            >
              <input
                type="file"
                ref={overlayFileRef}
                onChange={(e) => handleSelectImage(e, setOverlay)}
                style={{ display: "none" }}
              />
              {overlay ? (
                <img src={overlay} alt="overlay" />
              ) : (
                <p>Click to select overlay</p>
              )}
              <div>
                <p>Click to select overlay</p>
              </div>
            </div>
          </div>
        </div>

        <div className={styles.output}>{/* ... */}</div>
      </div>
    </main>
  );
}

In the updated code, we added an overlay state to hold the image selected by the user to be used as an overlay. Next, we worked around opening the image file picker by adding a ref to a hidden input element and calling its click method dynamically. We previewed the selected image to be used as an overlay image and added a button that calls our request function.

After that, you can finally overlay any text detected in a selected image with an overlay image.

Find the complete project here on GitHub.

Conclusion

So far, we've covered how we can use Cloudinary's OCR text detection and extraction add-on to extract and transform text from images. However, the add-on is not limited to just the three functionalities explained in this article. A lot more can be achieved with it. For example, it can be used for text-based image cropping, ensuring that text in an image is preserved during a crop transformation.


Ifeoma Imoh

Software Developer

Ifeoma is a software developer and technical content creator in love with all things JavaScript.