Ollama-OCR for High-Precision OCR with Ollama

Linda Hamilton

Release: 2024-11-27 08:16:09

Llama 3.2-Vision is a multimodal large language model available in 11B and 90B sizes. It accepts text and image input and generates text output. The model is tuned for visual recognition, image reasoning, image captioning, and answering questions about images, and it outperforms many open-source and closed multimodal models on common industry benchmarks.

Llama 3.2-Vision Examples

Handwriting

[Image: handwriting example]

Optical Character Recognition (OCR)

[Image: OCR example]

In this article I will describe how to call the Llama 3.2-Vision 11B model served by Ollama and implement image text recognition (OCR) using Ollama-OCR.

Features of Ollama-OCR

- High-accuracy text recognition using the Llama 3.2-Vision model
- Preserves original text formatting and structure
- Supports multiple image formats: JPG, JPEG, PNG
- Customizable recognition prompts and models
- Markdown output format option
- Robust error handling
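
Since only JPG, JPEG, and PNG are listed as supported, it can help to reject other files before sending anything to the model. The following is a minimal sketch of such a check. `isSupportedImage` is a hypothetical helper of my own, not part of ollama-ocr, and it only inspects the file extension, not the file contents.

```javascript
// Extensions taken from the feature list above (JPG, JPEG, PNG).
const SUPPORTED_EXTENSIONS = ["jpg", "jpeg", "png"];

// Hypothetical helper (not part of ollama-ocr): returns true when the
// file name ends in one of the supported extensions, case-insensitively.
function isSupportedImage(filePath) {
  const dot = filePath.lastIndexOf(".");
  // No dot at all, or a trailing dot, means there is no extension.
  if (dot === -1 || dot === filePath.length - 1) return false;
  const ext = filePath.slice(dot + 1).toLowerCase();
  return SUPPORTED_EXTENSIONS.includes(ext);
}

console.log(isSupportedImage("./handwriting.jpg")); // true
console.log(isSupportedImage("./scan.PNG"));        // true
console.log(isSupportedImage("./notes.pdf"));       // false
```

Checking the extension is cheap but not a guarantee: a renamed file would still pass, so the OCR call itself should still be wrapped in error handling.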

Installing Ollama

Before you can start using Llama 3.2-Vision, you need to install Ollama, a platform that supports running multimodal models locally. Follow the steps below to install it:

Download Ollama: Visit the official Ollama website to download the installation package for your operating system.
Install Ollama: Run the downloaded installer and follow its prompts to complete the installation.

Install Llama 3.2-Vision 11B

After installing Ollama, you can install the Llama 3.2-Vision 11B model with the following command. It downloads the model the first time it runs and then opens an interactive session:

ollama run llama3.2-vision
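
To confirm from code that the model is actually available, one option is to ask the local Ollama server which models it has. The sketch below uses Ollama's `/api/tags` endpoint and assumes the server is listening on its default address, `http://localhost:11434`. The function names `hasModel` and `isModelInstalled` are my own and purely illustrative.

```javascript
// Pure helper: does a /api/tags response list the given model?
// Ollama reports names with a tag suffix, e.g. "llama3.2-vision:latest",
// so both the exact name and the "name:tag" form are accepted.
function hasModel(tagsResponse, modelName) {
  const models = tagsResponse.models ?? [];
  return models.some(
    (m) => m.name === modelName || m.name.startsWith(modelName + ":")
  );
}

// Assumption: Ollama is running locally on its default port.
async function isModelInstalled(modelName, baseUrl = "http://localhost:11434") {
  const res = await fetch(`${baseUrl}/api/tags`);
  if (!res.ok) throw new Error(`Ollama responded with HTTP ${res.status}`);
  return hasModel(await res.json(), modelName);
}

// Usage (requires a running Ollama server):
// isModelInstalled("llama3.2-vision").then(console.log).catch(console.error);
```

Keeping the matching logic in a separate pure function makes it easy to test without a server.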

How to use Ollama-OCR

npm install ollama-ocr
# or using pnpm
pnpm add ollama-ocr

1. OCR

Code

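import { ollamaOCR, DEFAULT_OCR_SYSTEM_PROMPT } from "ollama-ocr";

// Recognize the text in a local image using the default OCR prompt.
async function runOCR() {
  const text = await ollamaOCR({
    filePath: "./handwriting.jpg",
    systemPrompt: DEFAULT_OCR_SYSTEM_PROMPT,
  });
  console.log(text);
}

// Run it and report any failure (for example, if Ollama is not running).
runOCR().catch(console.error);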
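
For readers curious about what happens underneath, here is a rough sketch of the kind of request that can be sent to a local Ollama server to run OCR on an image. I have not verified how ollama-ocr does this internally, so treat it as an illustration of Ollama's `/api/chat` endpoint rather than the library's actual implementation. The helper names, the user message text, and the default base URL are assumptions.

```javascript
// Build a non-streaming chat request with one base64-encoded image.
// The user message text is a hypothetical placeholder.
function buildOcrRequest(imageBytes, systemPrompt, model = "llama3.2-vision") {
  return {
    model,
    stream: false,
    messages: [
      { role: "system", content: systemPrompt },
      {
        role: "user",
        content: "Transcribe the text in this image.",
        // Ollama expects images as base64 strings.
        images: [Buffer.from(imageBytes).toString("base64")],
      },
    ],
  };
}

// Assumption: Ollama is running locally on its default port.
async function sendOcrRequest(request, baseUrl = "http://localhost:11434") {
  const res = await fetch(`${baseUrl}/api/chat`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(request),
  });
  if (!res.ok) throw new Error(`Ollama responded with HTTP ${res.status}`);
  const data = await res.json();
  return data.message.content; // the recognized text
}

// Usage (requires a running Ollama server and image bytes read from disk):
// sendOcrRequest(buildOcrRequest(bytes, "You are an OCR engine.")).then(console.log);
```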

Input Image:

[Image: handwriting.jpg]

Output:
The Llama 3.2-Vision collection of multimodal large language models (LLMs) is a collection of instruction-tuned image reasoning generative models in 118 and 908 sizes (text images in / text out). The Llama 3.2-Vision instruction-tuned models are optimized for visual recognition, image reasoning, captioning, and answering general questions about an image. The models outperform many of the available open source and closed multimodal models on common industry benchmarks.

2. Markdown Output

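import { ollamaOCR, DEFAULT_MARKDOWN_SYSTEM_PROMPT } from "ollama-ocr";

// Recognize a receipt and ask the model to return the result as Markdown.
async function runOCR() {
  const text = await ollamaOCR({
    filePath: "./trader-joes-receipt.jpg",
    systemPrompt: DEFAULT_MARKDOWN_SYSTEM_PROMPT,
  });
  console.log(text);
}

// Run it and report any failure (for example, if Ollama is not running).
runOCR().catch(console.error);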
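
When several images need to be processed, one failed file should not stop the whole run. The sketch below shows one way to do that. `ocrMany` is a hypothetical helper, not part of ollama-ocr; it takes the OCR function as a parameter, assuming it accepts the same `{ filePath, systemPrompt }` options used in the examples above.

```javascript
// Hypothetical batch helper: runs OCR on each file in turn and records
// either the recognized text or the error, without aborting the batch.
async function ocrMany(ocrFn, filePaths, systemPrompt) {
  const results = [];
  for (const filePath of filePaths) {
    try {
      const text = await ocrFn({ filePath, systemPrompt });
      results.push({ filePath, ok: true, text });
    } catch (err) {
      results.push({ filePath, ok: false, error: String(err) });
    }
  }
  return results;
}

// Usage with the library, assuming ollamaOCR and a prompt are imported:
// ocrMany(ollamaOCR, ["./a.jpg", "./b.png"], DEFAULT_MARKDOWN_SYSTEM_PROMPT)
//   .then((results) => console.log(results));
```

The files are processed one at a time on purpose, since a local model usually handles a single image at once; running them in parallel is unlikely to help.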

Input Image:

[Image: trader-joes-receipt.jpg]

Output:

[Image: output screenshot]

ollama-ocr uses a local vision model. If you want to use the online Llama 3.2-Vision model instead, try the llama-ocr library.
