"Arabic OCR": Difference between revisions
From Arabeyes Wiki
Revision as of 16:12, 20 January 2007
Optical Character Recognition
Optical character recognition (OCR) takes a scanned document (or a PDF file) and generates an editable text file from it, based on optical recognition and approximation. For an introduction to OCR, see http://www.students.cs.uu.nl/people/mjkammer/Work/intro_2_OCR.html
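As a rough illustration of what "optical recognition and approximation" can mean, here is a minimal Python sketch that matches each glyph of a tiny "scanned" line against stored templates and picks the closest one. Everything in it is an assumption made for illustration: the 3x5 glyph shapes, the fixed-width segmentation, and the names `TEMPLATES`, `distance`, `recognize_glyph` and `recognize_line` are invented here and are not taken from any of the programs listed on this page. Real OCR engines, especially for cursive scripts such as Arabic, need far more sophisticated segmentation and features.

```python
# Toy glyph templates: 3 columns x 5 rows, '#' = ink, '.' = blank.
# These shapes are invented for illustration only.
TEMPLATES = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}


def distance(a, b):
    """Count the pixels where two equally sized glyph bitmaps differ."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def recognize_glyph(glyph):
    """Return the template character closest to the glyph (the approximation step)."""
    return min(TEMPLATES, key=lambda ch: distance(glyph, TEMPLATES[ch]))


def recognize_line(rows, width=3, gap=1):
    """Cut a line image into fixed-width glyph cells and recognize each one."""
    text = []
    for start in range(0, len(rows[0]), width + gap):
        glyph = [row[start:start + width] for row in rows]
        text.append(recognize_glyph(glyph))
    return "".join(text)


# A "scanned" line reading T I L, with one missing pixel in the first glyph.
scan = [
    "##..###.#..",
    ".#...#..#..",
    ".#...#..#..",
    ".#...#..#..",
    ".#..###.###",
]

print(recognize_line(scan))  # prints TIL
```

The first glyph does not match any template exactly, yet it is still read as "T" because that template differs from it in the fewest pixels. This is the sense in which the output is an approximation rather than an exact lookup.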
Current Status of Arabic OCR software
I (MuhammadAlkarouri) know of no actually working Arabic OCR software that is open source. Any additions are certainly welcome.
Resources
Arabic OCR Links
Papers
- Automatic Recognition Using Zernike Moments As A Feature Extractor (Paper)
- Graph Based Segmentation .. (Paper)
- Structural Features Of Cursive Arabic Scripts (Paper)
- Multilingual Machine Printed OCR (Paper)
- Test of two Arabic OCR programs
- Performance Evaluation of two Arabic OCR products
Software
- Readiris - Supports Arabic and Persian
- NovoDynamics VERUS - Focuses on high-performance OCR and image enhancement for Arabic-based scripts, including Arabic, Persian, Pashto, Urdu.
FOSS "no Arabic support yet"
- Tesseract - An open source OCR engine, initially developed by HP and released under the Apache License.
- OOCR - An OCR program still in development, released under the GPL.
- GOCR - included in Debian and other distributions.
- GNU Ocrad "is an OCR [...] program based on a feature extraction method".
Other Links
- How to encode image produced by a recognition system (mailing thread) http://lists.arabeyes.org/archives/general/2002/March/msg00001.html
- Rapidly Retargetable Translingual Detection http://tides.umiacs.umd.edu/description.html
- Sibawayhi Project http://www.hf.uio.no/east/sibawayhi/HomePage/