«Arabic OCR»: Difference between revisions

From the Arabeyes wiki
 
(16 intermediate revisions by 4 users not shown)
Line 1: Line 1:
= Optical Character Recognition =
+
=Optical Character Recognition=
 
OCR (optical character recognition) takes a scanned document or a PDF file and, through optical recognition and approximation, generates an editable text file.
 
For an introduction to OCR, see http://www.students.cs.uu.nl/people/mjkammer/Work/intro_2_OCR.html
 
= Current Status of Arabic OCR software =
+
= Current Status of Open Source Arabic OCR software =
  +
Tesseract is the only FOSS OCR system with Arabic support; help is needed in testing and training it.
I (MuhammadAlkarouri) know of no actually working Arabic OCR software that is open source. Any additions are certainly welcome.
 
   
 
= Resources =
 
  +
* [http://en.wikipedia.org/wiki/Optical_character_recognition OCR from Wikipedia]
* List of Linux OCR applications: http://www.linux-ocr.ekitap.gen.tr/
 
   
 
== Arabic OCR Links ==
 
  +
===Papers===
* Automatic Recognition Using Zernike Moments As A Feature Extractor (Paper) http://www.ici.ro/ici/revista/sic2001_3/art4.html
+
* [http://www.ici.ro/ici/revista/sic2001_3/art4.html Automatic Recognition Using Zernike Moments As A Feature Extractor (Paper)]
* Graph Based Segmentation .. (Paper) http://ceri.kacst.edu.sa/webpage/software_a_3.htm
+
* [http://ceri.kacst.edu.sa/webpage/software_a_3.htm Graph Based Segmentation .. (Paper)]
* Structural Features Of Cursive Arabic Scripts (Paper) http://www.bmva.ac.uk/bmvc/1999/papers/42.pdf
 
  +
* [http://www.bmva.ac.uk/bmvc/1999/papers/42.pdf Structural Features Of Cursive Arabic Scripts (Paper)]
* Multilingual Machine Printed OCR (Paper) http://portal.acm.org/citation.cfm?id=505744&dl=ACM&coll=GUIDE
+
* [http://portal.acm.org/citation.cfm?id=505744&dl=ACM&coll=GUIDE Multilingual Machine Printed OCR (Paper)]
* Test of two Arabic OCR programs http://www.hf.uib.no/smi/ksv/arabocr.html
 
  +
* [http://www.hf.uib.no/smi/ksv/arabocr.html Test of two Arabic OCR programs]
* Performance Evaluation of two Arabic OCR products http://www.ai.mit.edu/~gremio/publications/Kanungo-etal-AIPR98.pdf
+
* [http://www.ai.mit.edu/~gremio/publications/Kanungo-etal-AIPR98.pdf Performance Evaluation of two Arabic OCR products]
  +
  +
===Software (FOSS)===
  +
*[http://code.google.com/p/tesseract-ocr/ Tesseract] is an open source OCR engine, initially developed by HP and released under the Apache License. ''The 3.x versions have Arabic support''.
  +
*[http://jocr.sourceforge.net/ GOCR] - included in Debian and other distributions. ''No Arabic support''.
  +
*[http://www.gnu.org/software/ocrad/ocrad.html GNU Ocrad] "is an OCR [...] program based on a feature extraction method". ''No Arabic support''.
  +
 
== Other Links ==
 
* Software from SA http://ceri.kacst.edu.sa/webpage/software_a_3.htm
 
 
* How to encode image produced by a recognition system (mailing thread) http://lists.arabeyes.org/archives/general/2002/March/msg00001.html
 
* Rapidly Retargetable Translingual Detection http://tides.umiacs.umd.edu/description.html
 
* Sibawayhi Project http://www.hf.uio.no/east/sibawayhi/HomePage/
 

Current revision as of 02:02, 26 January 2017

Optical Character Recognition

OCR (optical character recognition) takes a scanned document or a PDF file and, through optical recognition and approximation, generates an editable text file. For an introduction to OCR, see http://www.students.cs.uu.nl/people/mjkammer/Work/intro_2_OCR.html
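
The scan-to-text step described above can be sketched with the Tesseract command line, which this page names as the FOSS engine with Arabic support. This is a minimal sketch, not a recommended pipeline: it assumes the `tesseract` binary and its Arabic language data (language code `ara`) are installed, and the helper names `build_tesseract_command` and `ocr_to_text` are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="ara"):
    """Return the argv list for a basic Tesseract run.

    The Tesseract CLI takes the input image, an output base name
    (it appends ".txt" itself), and "-l" to select the language data.
    "ara" is assumed to be installed as the Arabic language data.
    """
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def ocr_to_text(image_path, output_base, lang="ara"):
    """Run Tesseract on one image and return the recognized text.

    Raises RuntimeError if the tesseract binary is not on PATH.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not installed or not on PATH")
    subprocess.run(
        build_tesseract_command(image_path, output_base, lang),
        check=True,  # fail loudly if Tesseract reports an error
    )
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")
```

For a scanned page saved as `page.png` (a placeholder file name), `ocr_to_text("page.png", "page")` would write `page.txt` and return its contents. A PDF would first need to be converted to an image format that the installed Tesseract build accepts.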

Current Status of Open Source Arabic OCR software

Tesseract is the only FOSS OCR system with Arabic support; help is needed in testing and training it.
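
One possible way to help with testing is to compare OCR output against a hand-typed transcription of the same page and report a character error rate. The sketch below is an illustration under assumptions, not a method this page prescribes: it uses plain Levenshtein distance over Unicode code points, so Arabic diacritics count as separate characters, and the function names are hypothetical.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two strings, by code point."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(
                min(
                    previous[j] + 1,         # character missing from the OCR output
                    current[j - 1] + 1,      # extra character in the OCR output
                    previous[j - 1] + cost,  # match or substitution
                )
            )
        previous = current
    return previous[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance divided by the length of the reference text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

For example, if the reference word is "كتاب" and the OCR output is "كتب", one of four characters is missing, so `character_error_rate` returns 0.25.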

Resources

  • OCR from Wikipedia: http://en.wikipedia.org/wiki/Optical_character_recognition

Arabic OCR Links

Papers

  • Automatic Recognition Using Zernike Moments As A Feature Extractor (Paper): http://www.ici.ro/ici/revista/sic2001_3/art4.html
  • Graph Based Segmentation .. (Paper): http://ceri.kacst.edu.sa/webpage/software_a_3.htm
  • Structural Features Of Cursive Arabic Scripts (Paper): http://www.bmva.ac.uk/bmvc/1999/papers/42.pdf
  • Multilingual Machine Printed OCR (Paper): http://portal.acm.org/citation.cfm?id=505744&dl=ACM&coll=GUIDE
  • Test of two Arabic OCR programs: http://www.hf.uib.no/smi/ksv/arabocr.html
  • Performance Evaluation of two Arabic OCR products: http://www.ai.mit.edu/~gremio/publications/Kanungo-etal-AIPR98.pdf

Software (FOSS)

  • Tesseract is an open source OCR engine, initially developed by HP and released under the Apache License. The 3.x versions have Arabic support.
  • GOCR - included in Debian and other distributions. No Arabic support.
  • GNU Ocrad "is an OCR [...] program based on a feature extraction method". No Arabic support.

Other Links