Arabic OCR

From the Arabeyes wiki
Revision as of 16:12, 20 January 2007 by user Hosny (Software)

Optical Character Recognition

OCR is the process of taking a scanned document (or a PDF file) and running a program over it that produces, through optical recognition and approximation, an editable text file. For an introduction to OCR, see http://www.students.cs.uu.nl/people/mjkammer/Work/intro_2_OCR.html
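To make "optical recognition and approximation" concrete, here is a toy sketch in Python. It is not how any of the programs listed on this page work: the 3x3 glyph templates and the function names are invented for illustration, and the input is assumed to be already cut into one bitmap per character. Real engines do far more, and cursive scripts such as Arabic are harder still because letters join and change shape.

```python
# Toy illustration only: the 3x3 templates below are made up for this example.
TEMPLATES = {
    "I": ("010",
          "010",
          "010"),
    "L": ("100",
          "100",
          "111"),
    "T": ("111",
          "010",
          "010"),
}


def distance(a, b):
    """Count the pixels where two equally sized bitmaps differ."""
    return sum(pa != pb
               for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b))


def recognize(glyph):
    """Return the template character closest to the scanned glyph."""
    return min(TEMPLATES, key=lambda ch: distance(glyph, TEMPLATES[ch]))


def ocr(glyphs):
    """Turn a sequence of glyph bitmaps into an editable string."""
    return "".join(recognize(g) for g in glyphs)


# A noisy "L" (one extra pixel set) followed by a clean "I" and "T".
noisy_l = ("100",
           "110",
           "111")
scanned = [noisy_l, TEMPLATES["I"], TEMPLATES["T"]]
text = ocr(scanned)
print(text)
```

The noisy glyph matches no template exactly, so the program approximates by picking the nearest one. This is why OCR output normally needs proofreading.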

Current Status of Arabic OCR software

I (MuhammadAlkarouri) know of no actually working Arabic OCR software that is open source. Any additions are certainly welcome.

Resources

Arabic OCR Links

Papers

Software

  • Readiris - Supports Arabic and Persian
  • NovoDynamics VERUS - Focuses on high-performance OCR and image enhancement for Arabic-based scripts, including Arabic, Persian, Pashto, and Urdu.

FOSS (no Arabic support yet)

  • Tesseract is an open source OCR engine, initially developed by HP and released under the Apache License.
  • OOCR - an OCR program still in development, released under the GPL.
  • GOCR - included in Debian and other distributions.
  • GNU Ocrad "is an OCR [...] program based on a feature extraction method".

Other Links