Clean up OCR

Hi Sam, I am so glad that you are working on the Google OCR issue, which is helping our community a lot. Your Google OCR JavaScript works fine on Bengali Wikisource. Is it possible to add the Clean up OCR feature to this script by default, so that the output is already cleaned up? -- বোধিসত্ত্ব (talk) 09:19, 7 September 2016 (UTC)

Terrific idea! Is there any situation in which one would not want to run that on the returned text? I can't think of any. I'll add the cleanup script in, and if anyone down the track figures out a reason it's not good, we can revisit it. :-) Does the progress-indication thing work okay for you? I'm not quite happy with it, but perhaps it's good enough. Samwilson (talk) 10:15, 7 September 2016 (UTC)
Only with poems might one not want to remove the trailing spaces at the end of each line; in all other cases we do need the feature, so it's better to add the cleanup script. Also, the progress-indication thing is OK and serves the purpose, but if you could add a progress percentage, it would look cool. -- বোধিসত্ত্ব (talk) 13:54, 7 September 2016 (UTC)
See comments at https://phabricator.wikimedia.org/T142770#2620108. Kaldari (talk) 18:20, 8 September 2016 (UTC)
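
For illustration only, here is a minimal sketch of the trade-off raised above: stripping trailing spaces from OCR output by default, with an opt-out for poems. The function name `cleanUpOcrText` and the `keepTrailingSpaces` option are hypothetical, and this is not the actual Clean up OCR script, which may well do more than this.

```javascript
// Hypothetical helper, NOT the real Wikisource "Clean up OCR" script.
// It only covers the one cleanup step mentioned in this thread:
// removing trailing spaces at the end of each line.
function cleanUpOcrText(text, options = {}) {
  // Assumed opt-out for poems, where trailing spaces may be wanted.
  const keepTrailingSpaces = options.keepTrailingSpaces === true;

  // Normalise Windows (\r\n) and bare \r line endings to \n, then split.
  const lines = text.replace(/\r\n?/g, "\n").split("\n");

  const cleaned = keepTrailingSpaces
    ? lines
    : lines.map((line) => line.replace(/[ \t]+$/, "")); // strip trailing spaces and tabs

  return cleaned.join("\n");
}

// Default: trailing whitespace is removed from every line.
console.log(JSON.stringify(cleanUpOcrText("first line  \nsecond line\t")));

// Poem: the caller opts out and the line endings are left untouched.
console.log(JSON.stringify(cleanUpOcrText("a verse  \nanother verse", { keepTrailingSpaces: true })));
```

Making the cleanup the default and the poem case an explicit opt-out matches the suggestion in the thread that almost every page benefits from it.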