"Arabic OCR": Difference between revisions
From the Arabeyes Wiki
(←Software) |
m |
||
(5 intermediate revisions by two other users not shown) | |||
Line 1: | Line 1: | ||
− | <div class=english> |
||
=Optical Character Recognition= |
=Optical Character Recognition= |
||
OCR is the ability to scan a document (or grab a PDF file) and run an OCR program on it and it will generate, based on optical recognition and approximation, an editable text file. |
OCR is the ability to scan a document (or grab a PDF file) and run an OCR program on it and it will generate, based on optical recognition and approximation, an editable text file. |
||
Line 5: | Line 4: | ||
= Current Status of Open Source Arabic OCR software = |
= Current Status of Open Source Arabic OCR software = |
||
+ | The only FOSS OCR system with Arabic support is Tesseract, help is needed in testing and training it. |
||
− | Actually (2007-08-27) The principal GPL OCR active utilities (OCRAD, GOCR, OCRE) doesn't support arabic printed text recognition. [http://directory.fsf.org/claraocr.html ClaraOCR] project seems to be inactive, but its documentation promises to work well with any horitzontal-writing language. |
||
− | |||
− | ==Siragi-OCR== |
||
− | [http://siragi.sourceforge.net/ SIRAGI] is an open source software designed to help blind and partially sighted people working with their computer. Visually impaired people can use this program to "listen" the content of their screen under windows or Linux/KDE. The main advantage of using SIRAGI is the support of arabic language for braille language and for speech synthesis in arabic. |
||
− | |||
− | As a part of This, Siragi's developer started devoloping a FOSS OCR that should support Arabic. |
||
− | |||
− | [http://www.arabeyes.org/project.php?proj=Siragi Siragi's Arabeyes Page] |
||
= Resources = |
= Resources = |
||
− | * [http://www.linux-ocr.ekitap.gen.tr/ List of Linux OCR applications] |
||
* [http://en.wikipedia.org/wiki/Optical_character_recognition OCR from Wikipedia] |
* [http://en.wikipedia.org/wiki/Optical_character_recognition OCR from Wikipedia] |
||
Line 28: | Line 19: | ||
* [http://www.ai.mit.edu/~gremio/publications/Kanungo-etal-AIPR98.pdf Performance Evaluation of two Arabic OCR products] |
* [http://www.ai.mit.edu/~gremio/publications/Kanungo-etal-AIPR98.pdf Performance Evaluation of two Arabic OCR products] |
||
− | ===Software=== |
+ | ===Software (FOSS)=== |
⚫ | |||
− | * [http://www.abbyy.com/ ABBYY] - ABBYY provides various solutions to turn scans, PDFs and digital photographs into searchable and editable documents. Unmatched recognition accuracy and conversion capabilities virtually eliminates retyping and reformatting. Intuitive use and one-click automated tasks let you do more in fewer steps. ABBYY can recognize Arabic and 188 other languages. ABBYY's Arabic recognition accuracy is the best available. ABBYY's core product is [http://finereader.abbyy.com/arabic_ocr_software/ ABBYY FineReader 11] |
||
⚫ | |||
− | * [http://www.irislink.com/c2-532/OCR-Software---Product-list.aspx Readiris] - Supports Arabic and Persian |
||
⚫ | |||
− | * [http://www.novodynamics.com NovoDynamics VERUS] - High-performance Optical Character Recognition and image enhancement for Arabic-based scripts, including Farsi, Pashto, Urdu and Arabic OCR. |
||
− | |||
− | '''FOSS''' "no Arabic support yet" |
||
⚫ | |||
− | *[http://oocr.sourceforge.net OOCR] OOCR is an OCR program still in development, under the GPL. |
||
⚫ | |||
⚫ | |||
== Other Links == |
== Other Links == |
||
− | <!-- |
||
− | * Software from SA http://ceri.kacst.edu.sa/webpage/software_a_3.htm |
||
− | --> |
||
* How to encode image produced by a recognition system (mailing thread) http://lists.arabeyes.org/archives/general/2002/March/msg00001.html |
* How to encode image produced by a recognition system (mailing thread) http://lists.arabeyes.org/archives/general/2002/March/msg00001.html |
||
* Rapidly Retargetable Translingual Detection http://tides.umiacs.umd.edu/description.html |
* Rapidly Retargetable Translingual Detection http://tides.umiacs.umd.edu/description.html |
||
− | * Sibawayhi Project http://www.hf.uio.no/east/sibawayhi/HomePage/ |
Current revision as of 02:02, 26 January 2017
Optical Character Recognition
OCR is the process of taking a scanned document (or a PDF file) and running an OCR program on it to generate, based on optical recognition and approximation, an editable text file. For an introduction to OCR, see http://www.students.cs.uu.nl/people/mjkammer/Work/intro_2_OCR.html
Current Status of Open Source Arabic OCR software
The only FOSS OCR system with Arabic support is Tesseract; help is needed in testing and training it.
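One common way to test an OCR engine is to compare its output against a hand-typed reference transcription and report the character error rate (edit distance divided by reference length). The sketch below is only an illustration of that idea, not a procedure prescribed by this page; the function names are hypothetical, and it counts Unicode code points, so Arabic diacritics are treated as separate characters.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, counted in code points."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def char_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalised by the length of the reference text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

For example, `char_error_rate(reference_text, ocr_text)` could be computed for each test page, where both strings are read from UTF-8 files.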
Resources
Arabic OCR Links
Papers
- Automatic Recognition Using Zernike Moments As A Feature Extractor (Paper)
- Graph Based Segmentation .. (Paper)
- Structural Features Of Cursive Arabic Scripts (Paper)
- Multilingual Machine Printed OCR (Paper)
- Test of two Arabic OCR programs
- Performance Evaluation of two Arabic OCR products
Software (FOSS)
- Tesseract is an open source OCR engine, initially developed by HP and released under the Apache License. The 3.x versions have Arabic support.
- GOCR - included in Debian and other distributions. No Arabic support.
- GNU Ocrad "is an OCR [...] program based on a feature extraction method". No Arabic support.
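As a rough sketch of trying Tesseract on an Arabic page, the Python wrapper below calls the `tesseract` command-line tool with the Arabic language code `ara`. It assumes that the `tesseract` binary is on the PATH and that the Arabic language data is installed; the function names and file names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="ara"):
    """Return the argument list: tesseract <image> <outputbase> -l <lang>."""
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def ocr_image(image_path, output_base, lang="ara"):
    """Run Tesseract and return the recognised text.

    Tesseract writes its result to <output_base>.txt, which is read back here.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("the tesseract executable was not found on PATH")
    subprocess.run(build_tesseract_command(image_path, output_base, lang),
                   check=True)
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")
```

A call such as `ocr_image("page.png", "page_out")` would then return the text Tesseract recognised in the hypothetical scan `page.png`.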
Other Links
- How to encode image produced by a recognition system (mailing thread) http://lists.arabeyes.org/archives/general/2002/March/msg00001.html
- Rapidly Retargetable Translingual Detection http://tides.umiacs.umd.edu/description.html