Manually extracting and transforming text contents from images can be a time-consuming and tedious task. Some examples include attempting to auto-tag or categorize an image based on its text content, blurring or adding an overlay to the text on an image, etc.
Cloudinary provides an OCR (Optical character recognition) text detection and extraction add-on powered by Google's vision API that integrates smoothly with its image upload and transformation capability. The add-on makes it easier to capture text elements in an image and apply some Cloudinary transformation effects to the image.
This post will demonstrate how to use the Cloudinary OCR text detection and extraction add-on. We will create a simple application to demonstrate the process of extracting detected text from an image, blurring or pixelating detected text, and adding an image over text in an image.
Here is a link to the demo CodeSandbox.
Project Setup
Create a Next.js app using the following command:
```bash
npx create-next-app ocr-demo
```
Next, run this command to change into the newly created directory:
```bash
cd ocr-demo
```
Now, add the project dependencies using the following command:
```bash
npm install cloudinary axios
```
The Node Cloudinary SDK will provide easy-to-use methods to interact with the Cloudinary APIs, while axios will serve as the HTTP client for communicating with our serverless functions.
Run this command to preview the running application:
```bash
npm run dev
```
Setting up Cloudinary
To use Cloudinary's provisioned services, you need to first sign up for a free Cloudinary account if you don’t have one already. Displayed on your account’s Management Console (aka Dashboard) are important details: your cloud name, API key, etc.
Next, let's create environment variables to hold the details of our Cloudinary account. Create a new file called `.env` at the root of your project and add the following to it:

```
CLOUD_NAME=YOUR_CLOUD_NAME
API_KEY=YOUR_API_KEY
API_SECRET=YOUR_API_SECRET
```
This file will serve as a default template when the project is set up on another system. To update your local environment, create a copy of the `.env` file using the following command:

```bash
cp .env .env.local
```

By default, `.env.local` is covered by the project's `.gitignore` file, mitigating the security risk of inadvertently exposing secret credentials to the public. Update the `.env.local` file with your Cloudinary credentials.
Access to Cloudinary add-ons is not provided out of the box when you create an account; you need to register for the OCR Text Detection and Extraction add-on to use this feature. Each add-on offers several plans with associated prices. Thankfully, most of them also offer a free plan, and since this is a demo application, we'll go with the free plan here, which gives us access to 50 monthly OCR detections.
Extracting Detected Text from an Image
Using the Cloudinary OCR text detection and extraction add-on, we can extract all detected text from an image by setting the `ocr` parameter in an image `upload` or `update` method call to `adv_ocr`, or to `adv_ocr:document` for text-heavy images.
```javascript
cloudinary.v2.uploader.upload(
  "your-image.jpg",
  { ocr: "adv_ocr" },
  function (error, result) {
    // some other code
  }
);
```
When the `ocr` parameter is set to `adv_ocr` or `adv_ocr:document`, Cloudinary attaches an `ocr` node nested in an `info` section of the JSON response. The `ocr` node contains detailed information about all the returned text as a group and a breakdown of the individual text elements, such as the default language, the bounding box coordinates, etc.
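To make that nesting concrete, here is an abbreviated, illustrative sketch of the response shape. The field values are made up, and the exact set of fields comes from the underlying Vision API, so treat this as a rough guide rather than a contract:

```javascript
// Illustrative shape of the upload response when ocr is set (values are fake).
const result = {
  public_id: "ocr-demo/sample",
  info: {
    ocr: {
      adv_ocr: {
        status: "complete",
        data: [
          {
            // The whole detected text block, with line breaks preserved.
            fullTextAnnotation: { text: "HELLO\nWORLD\n" },
            // One entry for the full text, then one per detected word,
            // each with a locale and bounding box coordinates.
            textAnnotations: [
              {
                description: "HELLO WORLD",
                locale: "en",
                boundingPoly: { vertices: [{ x: 10, y: 20 }] },
              },
            ],
          },
        ],
      },
    },
  },
};

// The extracted text lives several levels deep:
const { fullTextAnnotation } = result.info.ocr.adv_ocr.data[0];
console.log(fullTextAnnotation.text); // prints the full extracted text
```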
Now let's create an API route in our Next.js application that accepts an image, extracts the image's text content, and sends the response back to the client. To achieve that, create an `extractText.js` file in the `pages/api` folder of the project and add the following to it:
```javascript
const cloudinary = require("cloudinary").v2;

cloudinary.config({
  cloud_name: process.env.CLOUD_NAME,
  api_key: process.env.API_KEY,
  api_secret: process.env.API_SECRET,
  secure: true,
});

export default async function handler(req, res) {
  const { baseImage } = req.body;
  try {
    await cloudinary.uploader.upload(
      baseImage,
      {
        ocr: "adv_ocr",
        folder: "ocr-demo",
      },
      async function (error, result) {
        res.status(200).json(result);
      }
    );
  } catch (error) {
    res.status(500).json(error);
  }
}

export const config = {
  api: {
    bodyParser: {
      sizeLimit: "4mb",
    },
  },
};
```
In the code above, we import Cloudinary and configure it with an object consisting of our Cloudinary credentials. We then define a route handler function that expects a base64 image attached to the request's body. The image is uploaded to a folder called `ocr-demo` in your Cloudinary account, and its text content is extracted from it.
We also export the Next.js route config object to raise the payload size limit to 4MB.
Let's create a file that will hold helper functions used to make Axios requests to our API routes. At the root level of your application, create a folder called `util`, and inside it, create a file called `axiosReq.js`. Add the following to the `axiosReq.js` file:
```javascript
import axios from "axios";

export const extractText = async (baseImage, setStatus, setOutputData) => {
  setStatus("loading");
  try {
    const extractedImage = await axios.post("/api/extractText", { baseImage });
    const { textAnnotations, fullTextAnnotation } =
      extractedImage.data.info.ocr.adv_ocr.data[0];
    setOutputData({
      type: "text",
      data: fullTextAnnotation.text || textAnnotations[0].description,
    });
    setStatus("");
  } catch (error) {
    setStatus("error");
  }
};
```
The file exports a function called `extractText` that takes in an image, a function to set the loading state, and a function to set the response. The function makes an Axios call to the `extractText` API route and processes the response to extract the text annotations, which are then set to state.
Next, let's use this function. Replace the content of your `pages/index.js` file with the following:
```javascript
import { useState, useRef } from "react";
import { extractText } from "../util/axiosReq";
import styles from "../styles/Home.module.css";

export default function Home() {
  const [baseImage, setBaseImage] = useState();
  const [outputData, setOutputData] = useState();
  const [status, setStatus] = useState();
  const baseFileRef = useRef();

  const handleSelectImage = (e, setStateFn) => {
    const reader = new FileReader();
    reader.readAsDataURL(e.target.files[0]);
    reader.onload = function (e) {
      setStateFn(e.target.result);
    };
  };

  const handleExtractText = async () => {
    extractText(baseImage, setStatus, setOutputData);
  };

  const isBtnDisabled = !baseImage || status === "loading";

  return (
    <main className={styles.app}>
      <h1>Cloudinary OCR demo App</h1>
      <div>
        <div className={styles.input}>
          <div
            className={`${styles.image} ${styles.flex}`}
            onClick={() => baseFileRef.current.click()}
          >
            <input
              type="file"
              ref={baseFileRef}
              style={{ display: "none" }}
              onChange={(e) => handleSelectImage(e, setBaseImage)}
            />
            {baseImage ? (
              <img src={baseImage} alt="selected image" />
            ) : (
              <h2>Click to select image</h2>
            )}
            <div>
              <h2>Click to select image</h2>
            </div>
          </div>
          <div className={styles.actions}>
            <button onClick={handleExtractText} disabled={isBtnDisabled}>
              Extract text
            </button>
          </div>
        </div>
        <div className={styles.output}>
          {status ? (
            <h4>{status}</h4>
          ) : (
            outputData &&
            (outputData.type === "text" ? (
              <div>
                <span>{outputData.data}</span>
              </div>
            ) : (
              ""
            ))
          )}
        </div>
      </div>
    </main>
  );
}
```
In the code above, we defined three state variables to hold the image to be processed, the request status, and the output data. We also created a `baseFileRef` ref to work around dynamically opening the image file picker. The `handleSelectImage` function handles file selection changes and converts the selected file to its base64 equivalent.
Next, we rendered a `div` element that can be clicked to select and preview a file, and a button that triggers the `handleExtractText` function when clicked.
We used the `status` state to show loading feedback and disable the button when required.
Before previewing the application in the browser, let's update our `styles/Home.module.css` file with the styles from this CodeSandbox link to give our application a decent look.
Now, save the changes, and you should be able to select an image and extract text contents from the image to the screen.
Blurring Detected Text Contents
Leveraging the add-on's text detection capability, Cloudinary lets us combine it with its built-in `blur_region` and `pixelate_region` image transformation effects. To blur all detected text in an image, set the `effect` parameter on the image transformation method to `blur_region:<blur-value>` and the `gravity` parameter to `ocr_text`.
The `blur-value` can be any value from 1 to 2000; the higher the value, the stronger the blur.
```javascript
cloudinary.image("your-image-public_id.jpg", {
  effect: "blur_region:800",
  gravity: "ocr_text",
});
```
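For reference, the call above renders an image tag whose delivery URL encodes the transformation as URL components. A sketch of what that URL looks like, where the `demo` cloud name and the public ID are placeholders:

```javascript
// Illustrative Cloudinary delivery URL for the blur_region transformation.
const url =
  "https://res.cloudinary.com/demo/image/upload" +
  "/e_blur_region:800,g_ocr_text" + // blur only the regions gravity selects
  "/your-image-public_id.jpg";
console.log(url);
```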
To add this feature to our demo application, create a `blurImage.js` file in the `pages/api` folder and add the following to it:
```javascript
const cloudinary = require("cloudinary").v2;

cloudinary.config({
  cloud_name: process.env.CLOUD_NAME,
  api_key: process.env.API_KEY,
  api_secret: process.env.API_SECRET,
  secure: true,
});

export default async function handler(req, res) {
  const { baseImage } = req.body;
  try {
    await cloudinary.uploader.upload(
      baseImage,
      { folder: "ocr-demo" },
      async function (error, result) {
        const response = await cloudinary.image(`${result.public_id}.jpg`, {
          effect: "blur_region:800",
          gravity: "ocr_text",
          sign_url: true,
        });
        res.status(200).json(response);
      }
    );
  } catch (error) {
    res.status(500).json(error);
  }
}

export const config = {
  api: {
    bodyParser: {
      sizeLimit: "4mb",
    },
  },
};
```
The code is similar to what we did in the `extractText.js` file, except that we now first upload the image passed from the client to Cloudinary and extract its public ID, which is then passed as a parameter to Cloudinary's image transformation method.
We also set the `gravity` parameter to `ocr_text` and the `effect` parameter to `blur_region` with a value of 800.
We also signed the generated URL by setting the `sign_url` parameter to `true`. Cloudinary requires signed URLs for transformations that apply the OCR text detection or extraction functionality, due to the potential cost of serving unplanned dynamic URLs.
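A signed delivery URL carries a signature component directly after the `/upload` part of the path. An illustrative sketch, with a made-up signature value and placeholder cloud name:

```javascript
// Illustrative signed delivery URL; the s--...-- component is a signature
// Cloudinary computes from the transformation string and your API secret.
const signed =
  "https://res.cloudinary.com/demo/image/upload" +
  "/s--AbCdEfGh--" + // fake signature, for illustration only
  "/e_blur_region:800,g_ocr_text/sample.jpg";
console.log(signed);
```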
Now, open the `util/axiosReq.js` file and add the function below to the bottom of the file. The function will be used to make a request to our `api/blurImage` route and process the response.
```javascript
export const blurImage = async (baseImage, setStatus, setOutputData) => {
  setStatus("loading");
  try {
    const blurredImage = await axios.post("/api/blurImage", { baseImage });
    const url = /'(.+)'/.exec(blurredImage.data);
    setOutputData({
      type: "imgUrl",
      data: url[1],
    });
    setStatus("");
  } catch (error) {
    setStatus("error");
  }
};
```
The function expects the image and functions needed to set the request status and response data state. It then makes a request to the API route, extracts the blurred image URL, and sets the states accordingly.
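The regex deserves a note: `cloudinary.image` returns an HTML image tag string rather than a bare URL, which is why the helper pulls the quoted `src` value out of the response. A minimal sketch with a made-up tag string:

```javascript
// cloudinary.image() responses look like an HTML tag, e.g. (fake URL):
const tag =
  "<img src='https://res.cloudinary.com/demo/image/upload/e_blur_region:800,g_ocr_text/sample.jpg' />";

// Same regex as in blurImage: capture whatever sits between the single quotes.
const url = /'(.+)'/.exec(tag);
console.log(url[1]); // the bare delivery URL
```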
Next, update your `pages/index.js` file with the following:
```javascript
import { useState, useRef } from "react";

// import blurImage
import { extractText, blurImage } from "../util/axiosReq";
import styles from "../styles/Home.module.css";

export default function Home() {
  //...

  const handleSelectImage = (e, setStateFn) => {
    //...
  };

  const handleExtractText = async () => {
    //...
  };

  // Add this
  const handleBlurImage = async () => {
    blurImage(baseImage, setStatus, setOutputData);
  };

  const isBtnDisabled = !baseImage || status === "loading";

  return (
    <main className={styles.app}>
      <h1>Cloudinary OCR demo App</h1>
      <div>
        <div className={styles.input}>
          {/* ... */}
          <div className={styles.actions}>
            <button onClick={handleExtractText} disabled={isBtnDisabled}>
              Extract text
            </button>
            {/* Add this */}
            <button onClick={handleBlurImage} disabled={isBtnDisabled}>
              Blur text content
            </button>
          </div>
        </div>
        <div className={styles.output}>
          {status ? (
            <h4>{status}</h4>
          ) : (
            outputData &&
            (outputData.type === "text" ? (
              <div>
                <span>{outputData.data}</span>
              </div>
            ) : (
              <img src={outputData.data} alt="" />
            ))
          )}
        </div>
      </div>
    </main>
  );
}
```
We updated the code by adding a new button that triggers the `handleBlurImage` function when clicked. The function calls the `blurImage` helper and passes it the expected arguments. We also reworked the output `div` to render an image when the output data is a URL rather than text.
Save the changes and test the application in your browser.
Overlaying Text with Images
Instead of blurring detected text in an image, we can add an image overlay over it. Achieving this with the add-on is similar to the default way of adding an overlay to images; the only difference is that we set the `gravity` parameter to `ocr_text`, as seen below.
```javascript
cloudinary.image("your-image-public_id.jpg", {
  transformation: [
    { overlay: "overlay-public_id" },
    { flags: "region_relative", width: "1.1", crop: "scale" },
    { flags: "layer_apply", gravity: "ocr_text" },
  ],
});
```
Notice how we didn't set a fixed value for the width in the code above. The `region_relative` flag lets us size the overlay relative to the detected text region, so a width of `1.1` scales the overlay to 110% of the detected text's width.
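Under the hood, those transformation objects serialize into chained components of the delivery URL. An illustrative sketch of the resulting URL, with a placeholder cloud name and public IDs:

```javascript
// Illustrative delivery URL for the relative-width overlay transformation.
const url =
  "https://res.cloudinary.com/demo/image/upload" +
  "/l_overlay-public_id" +              // the overlay layer
  "/c_scale,fl_region_relative,w_1.1" + // 1.1x the detected text region
  "/fl_layer_apply,g_ocr_text" +        // place the layer over detected text
  "/your-image-public_id.jpg";
console.log(url);
```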
Create a file called `addOverlay.js` in your `pages/api` folder and add the following to it:
```javascript
const cloudinary = require("cloudinary").v2;

cloudinary.config({
  cloud_name: process.env.CLOUD_NAME,
  api_key: process.env.API_KEY,
  api_secret: process.env.API_SECRET,
  secure: true,
});

export default async function handler(req, res) {
  const { baseImage, overlay } = req.body;
  try {
    await cloudinary.uploader.upload(
      baseImage,
      { folder: "ocr-demo" },
      async function (error, baseImageCld) {
        await cloudinary.uploader.upload(
          overlay,
          { folder: "ocr-demo" },
          async function (error, overlayImageCld) {
            const overlayedImage = await cloudinary.image(
              `${baseImageCld.public_id}.jpg`,
              {
                transformation: [
                  {
                    overlay: `${overlayImageCld.public_id}`.replace(/\//g, ":"),
                  },
                  { flags: "region_relative", width: "1.1", crop: "scale" },
                  { flags: "layer_apply", gravity: "ocr_text" },
                ],
                sign_url: true,
              }
            );
            res.status(200).json(overlayedImage);
          }
        );
      }
    );
  } catch (error) {
    res.status(500).json(error);
  }
}

export const config = {
  api: {
    bodyParser: {
      sizeLimit: "4mb",
    },
  },
};
```
In the code above, we configure Cloudinary and define an API route handler that extracts the base image and the overlay from the request body and uploads both to Cloudinary. We then use the public IDs extracted from the responses to build the overlay transformation.
Let's create a function we can use to make a request to this newly created route. Open the `util/axiosReq.js` file and add the following to the bottom of the file:
```javascript
export const addOverlay = async (
  baseImage,
  overlay,
  setStatus,
  setOutputData
) => {
  setStatus("loading");
  try {
    const overlayedImage = await axios.post("/api/addOverlay", {
      baseImage,
      overlay,
    });
    const url = /'(.+)'/.exec(overlayedImage.data);
    setOutputData({ type: "imgUrl", data: url[1] });
    setStatus("");
  } catch (error) {
    setStatus("error");
  }
};
```
The function accepts an image overlay in addition to the base image and the state functions. It manages the request status state and makes the request to our API route.
To conclude this section, update your `pages/index.js` file with the following:
```javascript
import { useState, useRef } from "react";
// import addOverlay
import { extractText, blurImage, addOverlay } from "../util/axiosReq";
import styles from "../styles/Home.module.css";

export default function Home() {
  const [baseImage, setBaseImage] = useState();
  const [outputData, setOutputData] = useState();
  const [status, setStatus] = useState();
  // Add this
  const [overlay, setOverlay] = useState();
  const baseFileRef = useRef();
  // Add this
  const overlayFileRef = useRef();
  const handleSelectImage = (e, setStateFn) => {
    //...
  };
  const handleExtractText = async () => {
    //...
  };
  const handleBlurImage = async () => {
    //...
  };
  // Add this
  const handleAddOverlay = async () => {
    addOverlay(baseImage, overlay, setStatus, setOutputData);
  };
  const isBtnDisabled = !baseImage || status === "loading";

  return (
    <main className={styles.app}>
      <h1>Cloudinary OCR demo App</h1>
      <div>
        <div className={styles.input}>
          <div
            className={`${styles.image} ${styles.flex}`}
            onClick={() => baseFileRef.current.click()}
          >
            <input
              type="file"
              ref={baseFileRef}
              style={{ display: "none" }}
              onChange={(e) => handleSelectImage(e, setBaseImage)}
            />
            {baseImage ? (
              <img src={baseImage} alt="selected image" />
            ) : (
              <h2>Click to select image</h2>
            )}
            <div>
              <h2>Click to select image</h2>
            </div>
          </div>
          <div className={styles.actions}>
            <button onClick={handleExtractText} disabled={isBtnDisabled}>
              Extract text
            </button>
            <button onClick={handleBlurImage} disabled={isBtnDisabled}>
              Blur text content
            </button>

            {/* Add this */}
            <button
              onClick={handleAddOverlay}
              disabled={!overlay || isBtnDisabled}
            >
              Add overlay
            </button>
            <div
              className={`${styles.overlay} ${styles.flex}`}
              onClick={() => overlayFileRef.current.click()}
            >
              <input
                type="file"
                ref={overlayFileRef}
                onChange={(e) => handleSelectImage(e, setOverlay)}
                style={{ display: "none" }}
              />
              {overlay ? (
                <img src={overlay} alt="overlay" />
              ) : (
                <p>Click to select overlay</p>
              )}
              <div>
                <p>Click to select overlay</p>
              </div>
            </div>
          </div>
        </div>

        <div className={styles.output}>{/* ... */}</div>
      </div>
    </main>
  );
}
```
In the updated code, we added an `overlay` state to hold the image selected by the user to be used as an overlay. Next, we worked around opening the image file picker by adding a `ref` to a hidden input element and calling its `click` method dynamically. We previewed the selected overlay image and added a button that calls our request function.
After that, you can finally overlay any text detected in a selected image with an overlay image.
Find the complete project here on GitHub.
Conclusion
So far, we've covered how we can use Cloudinary's OCR text detection and extraction add-on to extract and transform text from images. However, the add-on is not limited to just the three functionalities explained in this article. A lot more can be achieved with it. For example, it can be used for text-based image cropping, ensuring that text in an image is preserved during a crop transformation.
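As a taste of that, a text-preserving crop could use `ocr_text` as the gravity of a crop transformation so the detected text stays in frame. A hypothetical delivery URL for such a crop, with placeholder cloud name and public ID:

```javascript
// Illustrative delivery URL for a crop that keeps detected text in frame.
const url =
  "https://res.cloudinary.com/demo/image/upload" +
  "/c_thumb,g_ocr_text,h_400,w_400" + // crop centered on the detected text
  "/your-image-public_id.jpg";
console.log(url);
```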
Resources you may find helpful: