Wednesday, July 28, 2021

Improving the Tellsis language translator: Update 3

Following the previous post, this post consolidates my thoughts on the way ahead for the app's OCR.

So far, I have tried training a Tesseract OCR model using my Tellsis font. The resulting model only works on computer-generated images (rendered with the Tellsis font) and fails badly on real-world data, such as actual images from the Violet Evergarden series or handwritten text. Since there isn't much real-world data available to train on in the first place, this route will be difficult to pursue.

Therefore, I thought of going back to basics: train a handwriting recognition model using TensorFlow, then deploy it to the Flutter app using TensorFlow Lite. A package for the deployment side, tflite_flutter, is already available.

A quick Google search turned up the following:
I am looking at this mainly for the deployment aspect, as well as for generic TensorFlow handwriting recognition training.
 
Similarly, I am looking at this mainly for the deployment aspect, as well as for generic TensorFlow handwriting recognition training.

GitHub repo Handwriting Recognition System, which uses lines of handwritten text as training data. This looks promising if I can figure out the deployment aspect.

GitHub repo Siamese-Networks-for-One-Shot-Learning, which uses single-character samples as training data. This one looks promising for my use case.
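As I understand the general idea behind Siamese networks for one-shot learning, the same shared-weight network turns each character image into an embedding vector. Training pushes embeddings of the same character close together and embeddings of different characters far apart, and a new image is classified by comparing it against one reference example per character. The sketch below is not taken from the repo above; it only illustrates the loss and the one-shot comparison step in plain Python. The embeddings are hypothetical hand-picked vectors standing in for the output of a trained network, and the margin is an arbitrary choice. I have swapped in contrastive loss because it is short to write; some Siamese one-shot setups instead score pairs with a learned similarity and binary cross-entropy.

```python
import math
from typing import Dict, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb_a: Sequence[float], emb_b: Sequence[float],
                     same: bool, margin: float = 1.0) -> float:
    """Contrastive loss for a single pair of embeddings.

    Matching pairs are penalised by their squared distance (pulled together);
    non-matching pairs are penalised only if they are closer than `margin`
    (pushed apart). `margin=1.0` is an arbitrary example value.
    """
    d = euclidean(emb_a, emb_b)
    if same:
        return d ** 2
    return max(0.0, margin - d) ** 2


def one_shot_classify(query: Sequence[float],
                      support: Dict[str, Sequence[float]]) -> str:
    """Return the label whose single reference embedding is closest to the query."""
    return min(support, key=lambda label: euclidean(query, support[label]))


if __name__ == "__main__":
    # Hypothetical embeddings: one reference example per character.
    support = {"A": [0.0, 0.0], "B": [1.0, 1.0]}
    print(one_shot_classify([0.1, 0.2], support))
    print(contrastive_loss([0.0, 0.0], [3.0, 4.0], same=True))
```

The appeal for Tellsis is that adding a new character class only needs one reference sample in `support`, not retraining, which matters when real-world samples are scarce.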

I can probably rewrite the Python script version of the Tellsis translator to generate single-line or single-character output using my Tellsis font. The current Flutter version of the translator can also generate sample text, which I can write out by hand and then photograph or scan for use as training data. This takes care of the training data issue.
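One possible shape for the data-generation step is sketched below: it builds a list of single-character and single-line labels, each paired with the image filename it will eventually be rendered (or handwritten and scanned) into, and writes them out as a CSV manifest. Everything here is hypothetical and not taken from my existing translator script: the function names, the filename pattern, the word list, and the assumption that the Tellsis font maps its glyphs onto the plain Latin letters a-z so that labels can be stored as ordinary text. Actually rendering each label with the Tellsis font is a separate step not shown here.

```python
import csv
import io
import random
import string
from typing import List, TextIO, Tuple

# Assumption: Tellsis glyphs in the font correspond to lowercase Latin letters,
# so labels can be stored as plain text.
ALPHABET = string.ascii_lowercase


def make_samples(words: List[str], n_chars: int, n_lines: int,
                 words_per_line: int = 4, seed: int = 0) -> List[Tuple[str, str]]:
    """Return (filename, label) pairs for single-character and single-line samples."""
    rng = random.Random(seed)  # fixed seed so the same dataset can be regenerated
    samples: List[Tuple[str, str]] = []
    for i in range(n_chars):
        samples.append((f"char_{i:05d}.png", rng.choice(ALPHABET)))
    for i in range(n_lines):
        line = " ".join(rng.choice(words) for _ in range(words_per_line))
        samples.append((f"line_{i:05d}.png", line))
    return samples


def write_manifest(samples: List[Tuple[str, str]], stream: TextIO) -> None:
    """Write a CSV manifest with one row per image to be rendered or handwritten."""
    writer = csv.writer(stream)
    writer.writerow(["filename", "label"])
    writer.writerows(samples)


if __name__ == "__main__":
    # Hypothetical word list; any English word list would do.
    samples = make_samples(["letter", "violet", "auto", "memory", "doll"],
                           n_chars=3, n_lines=2)
    buffer = io.StringIO()
    write_manifest(samples, buffer)
    print(buffer.getvalue())
```

The same manifest could serve both data sources: rendered images from the font and handwritten copies of the same labels, so real and synthetic samples share one label file.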

Next is to find the time to do so... 😅
 
Does anyone want to do TensorFlow training on my behalf? 🥰
