Few thousand documents written (1995 to 1999) using TamNet.ttf from the old irdu.nus.sg archives [Advise Needed] #15

Open
ashbeats opened this issue Oct 26, 2023 · 9 comments
@ashbeats
ashbeats commented Oct 26, 2023

Hi,

My name is John, and I have been trying to convert a few thousand documents and articles written in the bilingual TamNet.ttf font, which was released in 1995. The original authors are no longer around, so I have been looking for information about the keyboard mapping in order to write a converter, or for an existing converter, that brings the text into standard Unicode encoding.

Do you know of a converter? I tried your Open-Tamil lib, but it either failed to recognise the text or produced conversions that were not fully accurate.

I understand that the encoding is also questionable, since the documents were moved between various formats over the years, such as ANSI. I have therefore preserved copies of the originals, and have been inspecting them at the byte level and comparing them with the same texts written in the TamNet99 and Murasu formats.
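A byte-level comparison of this kind can be sketched roughly as follows. This is only an illustration using the Python standard library; the function names `byte_histogram` and `bytes_unique_to_each` are made up for this example, and the `min_byte` cut-off is an assumption for skipping plain ASCII, not a known property of TamNet.

```python
from collections import Counter


def byte_histogram(data: bytes, min_byte: int = 0) -> Counter:
    """Count how often each byte value >= min_byte occurs in a sample."""
    return Counter(b for b in data if b >= min_byte)


def bytes_unique_to_each(a: bytes, b: bytes, min_byte: int = 0):
    """Return the byte values that appear in only one of the two samples.

    Values used by one encoding but never by the other are good
    candidates for the edge cases where two mappings differ.
    """
    seen_a = set(byte_histogram(a, min_byte))
    seen_b = set(byte_histogram(b, min_byte))
    return sorted(seen_a - seen_b), sorted(seen_b - seen_a)
```

Running this over the same passage saved in two encodings gives a short list of byte values to look up in each font, which is usually quicker than reading hex dumps side by side.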

Based on Google searches and papers, the closest match seems to be TamNet99; however, there may be edge cases that elude me.
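For reference, the general shape of a table-driven converter is sketched below. The byte values in `GLYPH_TO_UNICODE` are placeholders invented for this example and are NOT the real TamNet or TamNet99 layout; the real table would have to be read off the font. The sketch also assumes that the legacy text stores the vowel signs ெ, ே and ை before their consonant (visual order), as other legacy Tamil font encodings such as TSCII do, whereas Unicode stores them after the consonant.

```python
import unicodedata

# HYPOTHETICAL table: placeholder byte values, not the real TamNet layout.
GLYPH_TO_UNICODE = {
    0xA1: "\u0B95",  # க
    0xA2: "\u0BAE",  # ம
    0xB1: "\u0BC6",  # ெ (written before the consonant in visual order)
    0xB2: "\u0BC7",  # ே (written before the consonant in visual order)
    0xB3: "\u0BC8",  # ை (written before the consonant in visual order)
    0xB4: "\u0BBE",  # ா (written after the consonant in both orders)
}
PREBASE_SIGNS = {"\u0BC6", "\u0BC7", "\u0BC8"}


def is_tamil_consonant(ch: str) -> bool:
    """Tamil consonants occupy U+0B95..U+0BB9 in Unicode."""
    return "\u0B95" <= ch <= "\u0BB9"


def legacy_to_unicode(data: bytes, table=GLYPH_TO_UNICODE) -> str:
    """Map bytes through the table and move pre-base vowel signs after their consonant."""
    out = []
    pending = None  # a pre-base vowel sign waiting for its consonant
    for b in data:
        ch = table.get(b)
        if ch is None:
            # Unmapped: pass ASCII through, flag anything else for review.
            ch = chr(b) if b < 0x80 else "\uFFFD"
        if ch in PREBASE_SIGNS:
            if pending is not None:
                out.append(pending)
            pending = ch
            continue
        if pending is not None and not is_tamil_consonant(ch):
            # No consonant followed the sign; emit it where it was found.
            out.append(pending)
            pending = None
        out.append(ch)
        if pending is not None:
            out.append(pending)
            pending = None
    if pending is not None:
        out.append(pending)
    # NFC composes sequences such as ெ + ா into the single code point ொ.
    return unicodedata.normalize("NFC", "".join(out))
```

Unmapped high bytes come out as U+FFFD, so the positions that still need a table entry are easy to find in the converted text.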

Any insights or direction would be most appreciated.

Best Regards,
John

@ashbeats ashbeats changed the title Few thousand documents written using TamNet.ttf in from 1995 to 1999. [Advise Needed] Few thousand documents written (1995 to 1999) using TamNet.ttf from the old irdu.nus.sg archives [Advise Needed] Oct 26, 2023
@arcturusannamalai arcturusannamalai self-assigned this Oct 28, 2023
@arcturusannamalai
Contributor

Hi there- Interesting topic; I'll post to my Twitter. Perhaps NUS can announce a bug bounty and have some engineers take a look.

In the past what has helped is the following:

  • provide a copy of the font you have
  • provide a copy of several sample documents
  • provide a copy of standard documents (like திருக்குறள்) encoded in this font

I'm sure someone can crack this problem with sufficient effort and motivation.
Thanks

@arcturusannamalai arcturusannamalai added the help wanted Extra attention is needed label Oct 28, 2023
@tshrinivasan

@ashbeats I can explore this. Please share the font's .ttf file and a few sample documents.

@gchandra10

This HTML/JS script should help. Please try it:

https://www.suratha.com/reader.htm

@ashbeats
Author

ashbeats commented Oct 28, 2023

Hi,

Thank you for responding.

The documents have just been restored to the original website:
https://kanian.com

Another archived site holds a bit more information:
https://ccat.sas.upenn.edu/plc/tamilweb/

The fonts are available for download here:
https://ccat.sas.upenn.edu/plc/tamilweb/download.html

and GPT4 had this to add...

... The TAMNET.ttf font is based on the TAM encoding system, which stands for Tamilnet. It was developed by Mr. Naa Govindasamy, an expert in Tamil encoding, and was released in 1995 by the Institute of Research in Digital Units (IRDU) in Singapore.

TAMNET.ttf is a TrueType font that uses a unique encoding scheme to represent Tamil characters. It deviates from the traditional Tamil encoding systems like TSCII (Tamil Standard Code for Information Interchange) or TAM (Tamil Monolingual Keyboard). Instead, it introduces a new layout that is optimized for ease of use and compatibility with the ASCII character set.

In TAMNET.ttf, the Tamil characters are mapped to the traditional QWERTY keyboard layout, where each key represents one Tamil character. For example, pressing the 'a' key outputs the Tamil character 'அ', 'b' outputs 'ப', 'c' outputs 'ச', and so on. This layout made it convenient for users familiar with the English keyboard layout to type Tamil characters without the need for any additional hardware or input methods.

TAMNET.ttf gained popularity during the late 1990s and early 2000s as it provided an easy-to-use encoding system ...

@tshrinivasan

tshrinivasan commented Oct 31, 2023 via email

@arcturusannamalai
Contributor

Thanks, folks. @tshrinivasan, if you find a fix, please also post a PR to open-tamil.

@tshrinivasan

tshrinivasan commented Nov 2, 2023 via email

@arcturusannamalai
Contributor

@ashbeats - do you still need this feature? Did you make any progress?

@ashbeats
Author

@arcturusannamalai I do, but the project is on hold.
