Automatic language detection #14

Open
markusressel opened this issue Jan 22, 2019 · 4 comments
Labels
enhancement New feature or request

Comments

@markusressel
Owner

Is your feature request related to a problem? Please describe.
Currently the dev has to know which syntax highlighter to use for a given text.

Describe the solution you'd like
The KodeEditor (or a layer in between) should be able to detect what language is most likely used and apply syntax highlighting automatically. This behaviour should be optional so that the dev can still force a specific language if desired.
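
A minimal sketch of how the optional behaviour could look, assuming a detector is passed in as a plain function. `LanguageMode` and `resolveLanguage` are hypothetical names for illustration and are not part of the current KodeEditor API:

```kotlin
// Hypothetical: lets the dev choose between automatic detection and a forced language.
sealed class LanguageMode {
    object Auto : LanguageMode()
    data class Forced(val language: String) : LanguageMode()
}

// Returns the forced language if one is set; otherwise asks the detector.
// The detector may return null when it cannot decide.
fun resolveLanguage(
    text: String,
    mode: LanguageMode,
    detector: (String) -> String?
): String? = when (mode) {
    is LanguageMode.Forced -> mode.language
    LanguageMode.Auto -> detector(text)
}
```

With this shape, the detector is never called when a language is forced, so forcing a language has no detection overhead.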

@markusressel markusressel self-assigned this Jan 22, 2019
@markusressel
Owner Author

markusressel commented Jan 22, 2019

Using something like this would be an option, although the trained models are pretty big (approx. 150 MB):
https://github.com/aliostad/deep-learning-lang-detection

Integrating this seems to be relatively easy:
https://medium.com/capital-one-tech/using-a-pre-trained-tensorflow-model-on-android-part-2-153ebdd4c465


@markusressel markusressel transferred this issue from markusressel/KodeEditor Jun 27, 2020
@markusressel
Owner Author

A more naive approach would be to count the number of rule matches for each available rule book and use the one with the highest count.
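
A rough sketch of the counting idea, assuming each rule book can be reduced to a list of regular expressions. The `RuleBook` class here is a hypothetical stand-in, not the actual highlighter type, and `detectByRuleMatches` is an illustrative name:

```kotlin
// Hypothetical stand-in for a syntax highlighter's rule set.
data class RuleBook(val language: String, val rules: List<Regex>)

// Counts every match of every rule in the given text.
fun countMatches(text: String, ruleBook: RuleBook): Int =
    ruleBook.rules.sumOf { rule -> rule.findAll(text).count() }

// Returns the rule book with the most matches,
// or null if no rule of any rule book matched at all.
fun detectByRuleMatches(text: String, ruleBooks: List<RuleBook>): RuleBook? =
    ruleBooks
        .map { ruleBook -> ruleBook to countMatches(text, ruleBook) }
        .filter { (_, count) -> count > 0 }
        .maxByOrNull { (_, count) -> count }
        ?.first
```

Raw counts favour rule books with many or very general rules, so normalising by the number of rules or by text length may be needed in practice.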

@markusressel
Owner Author

It would also be nice to include common file extensions in the rule book, so that the language can be detected from the file name alone.

Both detection variants should be usable independently.
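
A possible sketch of the file name variant, kept separate from any content-based detection so that it can be used on its own. `ExtensionRuleBook` and `detectByFileName` are hypothetical names, and the extension sets are only examples:

```kotlin
// Hypothetical: a rule book that also lists the file extensions it commonly applies to.
data class ExtensionRuleBook(val language: String, val fileExtensions: Set<String>)

// Looks up the rule book by the part of the file name after the last dot.
// Returns null if there is no extension or no rule book claims it.
fun detectByFileName(
    fileName: String,
    ruleBooks: List<ExtensionRuleBook>
): ExtensionRuleBook? {
    val extension = fileName
        .substringAfterLast('.', missingDelimiterValue = "")
        .lowercase()
    if (extension.isEmpty()) return null
    return ruleBooks.firstOrNull { extension in it.fileExtensions }
}
```

Since this function only needs a file name and the other variant only needs the text, a caller could try one and fall back to the other, or use either one alone.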

@markusressel markusressel added the enhancement New feature or request label Apr 18, 2021