【龍】甲辰年戊辰月壬子日 / 三月十一日
Thursday April 18, 2024

Optical Character Recognition (OCR)

OCR stands for Optical Character Recognition. It is the conversion of scanned images of text (handwritten, typewritten or printed) into machine-encoded (digital) text.

The clearer and larger the characters are, the better the system will recognise them. The problem with Chinese characters is that every character has to be matched against thousands of individual characters (compared to fewer than 100 Latin characters), and these are rather complex in structure.
Chinese punctuation such as '。' (full stop) can be misread as 'o' (letter o) or '0' (zero).
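
The punctuation misreading described above can sometimes be repaired after recognition. The following is a minimal Python sketch of one possible clean-up step, not a feature of any particular OCR product. The function name fix_full_stops and the rule it applies are assumptions made for illustration: an 'o', 'O' or '0' that sits directly after a Chinese character, and is followed by the end of the text, whitespace, a closing quote or another Chinese character, is treated as a misread '。'.

```python
import re

# Basic CJK Unified Ideographs block. This range is an assumption that
# covers the most common Chinese characters, not every possible one.
_CJK = r"\u4e00-\u9fff"

# Hypothetical heuristic: an 'o', 'O' or '0' directly after a Chinese
# character, followed by end of text, whitespace, a closing quote, or
# another Chinese character, is probably a misread full stop.
_MISREAD_STOP = re.compile(
    rf"(?<=[{_CJK}])[oO0](?=$|\s|[」』”’]|[{_CJK}])"
)


def fix_full_stops(text: str) -> str:
    """Replace a likely misread 'o'/'O'/'0' with the Chinese full stop."""
    return _MISREAD_STOP.sub("。", text)


if __name__ == "__main__":
    print(fix_full_stops("你好o我很好0"))
```

Latin words and ordinary numbers are left alone because they are not preceded by a Chinese character. The rule is still only a heuristic: a genuine zero placed between two Chinese characters would also be replaced, so the result should be proofread.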

Google Drive can OCR uploaded PDFs and image files in Simplified and Traditional Chinese.
Microsoft Office has a similar feature if you have the 'Document Imaging' function installed.
