Friday, January 15, 2021

Tellsis language ("Nunkish") translator written in Python3

(Update: There is a version that uses Flutter for the UI and can run on Windows, Android, and Linux. See this post for more information and download location.)
 
Update 6 September 2022: I updated the script slightly because the PyPI version of google_trans_new is outdated. Instead of being installed via pip, a local copy is now used. I also fixed the handling of names. Of course, the Flutter version (mentioned at the start of the post) is recommended, since that is the one I am more likely to work on and maintain.
 
I noticed a lot of views on my updated "Nunkish" script, but I kind of felt bad as that script was really written by someone else. I only updated it with an alternative Google Translate API library to allow it to work.

So I embarked on a "quest" to make my own version: a true bidirectional translator that can translate to and from the Tellsis language. (I also wrote about this script on my Japanese-language blog.)

And v0.1 is now ready, after a day of coding during my free time. The Tellsis language translator can be found at the GitHub repository here. (Update 28 August 2023: I just found official sources that use the spelling "Tellsis", and have changed the text of this blog post from "Telsis" to "Tellsis". The software itself will not be updated, as renaming it may affect others who have already downloaded it.)

It works from the command line.
$ ./telsistrans.py -t "I love Major \\Gilbert\\" -sl en
Nun posuk Gilbert ui gikapmarikon
$ ./telsistrans.py -t "Posuk \\Gilbert\\ nunki." -sl telsis
Thank you Major Gilbert.

It works in interactive mode.
$ ./telsistrans.py -i
Source language: en
Input source text: I love you
Target language:       
In Tamil script: நான் உன்னை நேசிக்கிறேன்
Pronunciation: Nāṉ uṉṉai nēcikkiṟēṉ
In unaccented characters: Nan unnai necikkiren
In target language: Nun annui noyirrikon
Source language: telsis
Input source text: Nunki posuk
Target language: ja
In Tamil script: நன்றி மேஜர்
Pronunciation:
In unaccented characters: Nanri mejar
In target language: ありがとう少佐


It works as a library.
from telsistrans import telsis_translator
translator = telsis_translator()
srctext = "I love you"
srclang = 'en'
translator.lang2telsis(srctext, srclang)
print(translator.results['tgt_text'])  # Print out results of translation
srctext = "Nunki posuk"
tgtlang = 'ja'
translator.telsis2lang(srctext, tgtlang)
print(translator.results['tgt_text'])  # Print out results of translation

 
Output of the above example:
Nun annui noyirrikon
ありがとう少佐
 
It can even output in the actual Tellsis alphabet if you supply a font file.


Other improvements include the use of backslashes to mark names that should not be processed by the substitution cipher. The script can translate whole phrases rather than just individual words, but I have not yet tested it with full pages of text (this may not work because of issues with handling punctuation).
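To illustrate the idea of protecting names, here is a minimal sketch. It is not the code from telsistrans.py. It assumes that a name arrives in Python wrapped in single backslashes (inside double quotes, the shell turns \\ into \), and it uses the letter pairs listed in the mapping in this post. The names build_table, NAME_PATTERN and cipher_except_names are made up for this example.

```python
import re

# Letter pairs from the mapping described in this post (each pair is swapped).
PAIRS = ["AU", "CY", "EO", "GV", "HT", "KR", "MP", "LQ", "JS"]


def build_table(pairs):
    """Build a str.translate table that swaps each pair, in both cases."""
    table = {}
    for first, second in pairs:
        for src, dst in ((first, second), (second, first)):
            table[ord(src)] = dst
            table[ord(src.lower())] = dst.lower()
    return table


TABLE = build_table(PAIRS)

# Hypothetical pattern: a name wrapped in single backslashes, e.g. \Gilbert\
NAME_PATTERN = re.compile(r"\\([^\\]+)\\")


def cipher_except_names(text):
    """Apply the letter swap to everything except backslash-wrapped names."""
    # With one capture group, re.split alternates: text, name, text, name, ...
    pieces = NAME_PATTERN.split(text)
    result = []
    for index, piece in enumerate(pieces):
        if index % 2 == 1:
            result.append(piece)                   # a name: keep as-is
        else:
            result.append(piece.translate(TABLE))  # normal text: swap letters
    return "".join(result)


print(cipher_except_names(r"Posuk \Gilbert\ nunki."))  # Mejar Gilbert nanri.
```

Without the backslashes, "Gilbert" would be run through the letter swap like any other word and come out as "Viqbokh".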
 
A lot of details about how the Tellsis language (called Nunkish by fans) was decoded can be found in this Reddit post. During the production staff event at Shinjuku Piccadilly Cinema, Suzuki Takaaki (who created this language) also talked about the process, but did not disclose the intermediate language used. We know that language is Tamil.

The substitution cipher is more or less what the Reddit post says. However, based on my tinkering, I have made the following changes to the mapping.
L <-> Q
J <-> S

The rest of the mapping:
A <-> U
C <-> Y 
E <-> O
G <-> V
H <-> T
K <-> R
M <-> P
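
The mapping above can be sketched in Python as a simple two-way letter swap. This is only an illustration under my own assumptions (case is preserved, and letters not in the list are left unchanged); the function names make_table and swap_letters are invented for this example and are not the ones used in the script.

```python
# Each pair is swapped in both directions, so the same function
# converts unaccented Tamil to Tellsis and Tellsis back again.
PAIRS = [
    ("L", "Q"), ("J", "S"),
    ("A", "U"), ("C", "Y"), ("E", "O"), ("G", "V"),
    ("H", "T"), ("K", "R"), ("M", "P"),
]


def make_table(pairs):
    """Return a translation table covering upper- and lower-case letters."""
    src = "".join(a + b for a, b in pairs)
    dst = "".join(b + a for a, b in pairs)
    return str.maketrans(src + src.lower(), dst + dst.lower())


TABLE = make_table(PAIRS)


def swap_letters(text):
    """Apply the substitution; unmapped characters pass through unchanged."""
    return text.translate(TABLE)


print(swap_letters("Nan unnai necikkiren"))  # Nun annui noyirrikon
print(swap_letters("Nunki posuk"))           # Nanri mejar
```

The two inputs are the "unaccented characters" and Tellsis strings from the interactive example earlier in this post, and applying the swap twice gives back the original text.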

The script requires Python3 to run, with the following libraries:
google_trans_new (Python library to use Google Translate)
unidecode (for converting to unaccented characters)
requests (for conversion to Tamil script)
Pillow (for rendering in Tellsis font)
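
As a rough illustration of the "unaccented characters" step, the sketch below strips the diacritics from the romanized Tamil using only Python's standard unicodedata module. The script itself uses unidecode for this; the standard-library approach here is a stand-in that happens to give the same result for this example, and strip_accents is a name made up for the sketch.

```python
import unicodedata


def strip_accents(text):
    """Remove combining marks after decomposing accented letters."""
    # NFD splits a letter such as "ā" into "a" plus a combining macron.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks and keep the base letters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("Nāṉ uṉṉai nēcikkiṟēṉ"))  # Nan unnai necikkiren
```

Note that unidecode also transliterates characters that have no simple base letter, which this stand-in does not attempt to do.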

Details can be found in the README.md of the repository, and explanation.md contains information about the conversion process and how the script works.

I have yet to do comprehensive tests on the translation results to make sure they are consistent with what can be found in the anime. If anyone is willing to do the testing, please report back on your findings in the comments here, or file an issue in the GitHub repository. Punctuation also seems to cause erratic behaviour. I am not sure why, but it could be due to differences in punctuation between English and Tamil. Finally, my dream is to create a GUI for this using Tk, and maybe even an Android app using Kivy. But don't get your hopes up... I really hate GUI programming, so the GUI and Android app may never happen.
 
Please feel free to leave feedback in the comments. But please be civil and forgiving; this was, after all, something I put together in a couple of hours.

By the way, my review of the 2020 Violet Evergarden movie (VIOLET EVERGARDEN the Movie) can be found here.
 
Update January 20, 2021: I made a simple GUI using the PySimpleGUI framework.

I tried a Kivy version too, but the default theme is a bit dark and I still haven't figured out how to change the theme, so it will be shelved for a while.
 
 
 
I also created my own font file for the Tellsis language because I do not have the rights to distribute the font files that I found. Instead of trying to contact the authors to seek permission for distribution, I decided to learn how to create my own fonts using FontForge and managed to come up with something. It is VERY rudimentary but it serves the purpose of displaying the output in the Tellsis alphabet. After placing the font file in the ~/.fonts directory, execute
sudo fc-cache -fv
to refresh the font cache.

Update 25 January 2021: I added a simple video to demonstrate use of the GUI.

Update 14 March 2021: I am working on converting the command-line version to Dart, with a future GUI in Flutter. So far, the command-line version in Dart seems to be working. I also managed to solve the issue with the use of commas. However, the Dart command-line version won't be able to display the results in Tellsis font, as Dart does not have a package like Python's PIL. Therefore, displaying results in Tellsis font will have to go through Flutter.

Update 30 March 2021: After trying to learn Dart and Flutter, and a lot of trial and error in getting the Flutter layout and such, I have a working app that can run in Linux and in an Android emulator. Multiple sentences work as long as everything is enclosed within double quotation marks.




3 comments:

Jamie said...

I just stumbled across this whilst looking for images of the stamps used in the series, I'd never have imagined something like this to have existed, its honestly so cool thank you!

Anonymous said...

Can you make it offline?

Teck said...

As the language conversion requires translating to and from Tamil, if you can recommend an offline dictionary that can translate to and from Tamil to other languages and is free for distribution, I can take a look to see how it can be used instead of Google Translate.