Incorrect language detected (C++ as C, XML as TypeScript, etc.) #26
Comments
But why ? |
Is that what Onefetch currently uses? It detects C++ as C in the case of Godot, and it didn't detect anything for the repo of a Godot project (while GitHub detects GDScript). |
It only detects the languages that are currently supported by onefetch (WIP):
Also, tokei ignores all commented lines, which is why the language distribution sometimes differs from GitHub's. Languages supported by tokei: https://github.com/Aaronepower/tokei#supported-languages |
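To illustrate why skipping comment lines shifts a language distribution, here is a minimal sketch. It is not tokei's implementation: `count_lines` is a hypothetical helper that only recognizes `//` line comments, which is enough to show that a heavily commented file contributes far fewer counted lines than its raw size suggests.

```rust
/// Counts (code, comment, blank) lines.
/// Assumption: only `//` line comments are treated as comments.
fn count_lines(source: &str) -> (usize, usize, usize) {
    let mut code = 0;
    let mut comments = 0;
    let mut blanks = 0;
    for line in source.lines() {
        let trimmed = line.trim();
        if trimmed.is_empty() {
            blanks += 1;
        } else if trimmed.starts_with("//") {
            comments += 1;
        } else {
            code += 1;
        }
    }
    (code, comments, blanks)
}

fn main() {
    // Two files with four lines each, but very different amounts of code.
    let commented = "// setup\n// notes\n// more notes\nint x = 1;\n";
    let plain = "int a = 1;\nint b = 2;\nint c = 3;\nint d = 4;\n";
    let (code_a, comments_a, _) = count_lines(commented);
    let (code_b, comments_b, _) = count_lines(plain);
    println!("commented file: {code_a} code, {comments_a} comment lines");
    println!("plain file: {code_b} code, {comments_b} comment lines");
}
```

A counter that weighs languages by code lines would rank the second file four times higher than the first, while a size-based measure would treat them roughly equally.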
Upstream issues: XAMPPRocky/tokei#305 and XAMPPRocky/tokei#67. We can leave this closed though if you want. |
Ok, with the new title it makes more sense to keep this open. We'll wait for tokei to fix it then. Thx @aaronfranke |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
This issue still exists, though it is likely seen by the devs as low priority, so I'll probably have to bump this again later to please the |
Hi, I added the C header and C++ header to the language.rs file in my fork. https://github.com/Aaronepower/tokei already detects the C header and C++ header; only the mapping in onefetch is missing. Here is a PR: #365 |
I'm not very fond of this idea of having separate entries for header files (CHeader and C++Header). I personally prefer the GitHub Linguist approach of extending the C and C++ detection scope to include their respective header files:
C++:
  type: programming
  tm_scope: source.c++
  ace_mode: c_cpp
  codemirror_mode: clike
  codemirror_mime_type: text/x-c++src
  color: "#f34b7d"
  aliases:
  - cpp
  extensions:
  - ".cpp"
  - ".c++"
  - ".cc"
  - ".cp"
  - ".cxx"
  - ".h"
  - ".h++"
  - ".hh"
  - ".hpp"
  - ".hxx"
  - ".inc"
  - ".inl"
  - ".ino"
  - ".ipp"
  - ".re"
  - ".tcc"
  - ".tpp"
C:
  type: programming
  color: "#555555"
  extensions:
  - ".c"
  - ".cats"
  - ".h"
  - ".idc"
  interpreters:
  - tcc
  tm_scope: source.c
  ace_mode: c_cpp
  codemirror_mode: clike
  codemirror_mime_type: text/x-csrc
  language_id: 41
I doubt the people over at tokei would be ready to make that shift... So, either we stick to tokei's detection rules and merge @mapau's PR, or we override the logic in Onefetch, or... |
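The practical consequence of the Linguist-style data above is that one extension (here `.h`) can belong to more than one language. A minimal sketch of that lookup follows. `LANGUAGES` and `index_by_extension` are hypothetical names for illustration, the table holds only a few of the extensions listed above, and none of this is Linguist's or tokei's actual code.

```rust
use std::collections::BTreeMap;

/// A small excerpt of the extension lists quoted above (illustrative only).
const LANGUAGES: &[(&str, &[&str])] = &[
    ("C", &[".c", ".h"]),
    ("C++", &[".cpp", ".hpp", ".h"]),
];

/// Builds an index from extension to every language that claims it.
fn index_by_extension<'a>(
    languages: &[(&'a str, &[&'a str])],
) -> BTreeMap<&'a str, Vec<&'a str>> {
    let mut index: BTreeMap<&'a str, Vec<&'a str>> = BTreeMap::new();
    for (language, extensions) in languages {
        for extension in extensions.iter() {
            index.entry(*extension).or_default().push(*language);
        }
    }
    index
}

fn main() {
    let index = index_by_extension(LANGUAGES);
    for (extension, candidates) in &index {
        // More than one candidate means the extension alone is not enough.
        let note = if candidates.len() > 1 { " (ambiguous)" } else { "" };
        println!("{extension}: {candidates:?}{note}");
    }
}
```

With separate header entries the index would have exactly one language per extension. With shared extensions, something has to choose among the candidates for `.h`.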
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
This issue still exists, though it is seen by the devs as low priority, so I'll probably have to bump this again later to please the |
Incorrect detection of Verilog using tokei. tokei uses file extension When considering a new approach, please consider verilog file identification as a useful test case. |
@o2sh We might want to create a |
I'd be happy to do so, but do we actually have any workaround for this? 🤔 As far as I know, tokei still doesn't provide an option to allow users to override the extensions - as suggested here |
Sorry, I incorrectly assumed that tokei allowed language overrides, but I guess that's not implemented yet. Well, the only workaround that I know of is renaming all Verilog files to |
Coming back to a really old issue to document a potential solution: It might be worth creating a new crate that acts as a wrapper for tokei. This wrapper would provide its own function for getting languages, adding the following:
Also, it should probably re-export the rest of tokei's public interface to make usage easier. Such a crate should probably be in a separate repository, as I anticipate releases occurring on a very different schedule from onefetch. Additionally, this crate would probably need a lot of community support to provide the heuristics and code samples. I might attempt to do this sometime, but I can't promise that it will be soon. If someone else wants to take this on, I'll be happy to help and discuss this further. |
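One thing such a wrapper could plausibly offer is the user-supplied extension override mentioned earlier in the thread. The sketch below is an assumption about the shape of that idea, not a design anyone here has committed to. `Detector` and `upstream_detect` are hypothetical, and `upstream_detect` is only a stand-in for whatever the wrapped library reports, not tokei's API.

```rust
use std::collections::HashMap;
use std::path::Path;

/// Hypothetical stand-in for the wrapped library's extension-based detection.
fn upstream_detect(extension: &str) -> Option<&'static str> {
    match extension {
        "c" | "h" => Some("C"),
        "rs" => Some("Rust"),
        _ => None,
    }
}

/// Hypothetical wrapper: user overrides win, otherwise defer to upstream.
struct Detector {
    /// Maps an extension (without the dot) to a language name.
    overrides: HashMap<String, String>,
}

impl Detector {
    fn detect(&self, path: &Path) -> Option<String> {
        let extension = path.extension()?.to_str()?;
        if let Some(language) = self.overrides.get(extension) {
            return Some(language.clone());
        }
        upstream_detect(extension).map(str::to_string)
    }
}

fn main() {
    let mut overrides = HashMap::new();
    // A C++ project could declare that its `.h` files are C++.
    overrides.insert("h".to_string(), "C++".to_string());
    let detector = Detector { overrides };
    println!("{:?}", detector.detect(Path::new("scene/node.h")));
    println!("{:?}", detector.detect(Path::new("src/main.rs")));
}
```

Keeping the override table separate from the upstream call means the wrapper can track upstream releases without touching user configuration.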
You mean exposing the same set of APIs? or is it gonna reuse some of tokei's code? If I'm understanding correctly, the new project will be similar to Regardless, it's definitely an intriguing challenge. If executed well, it could gain a lot of traction, especially given the current state/limitations of existing solutions 😢. I'd be happy to help 👍 |
Mostly exposing the same API. Basically a bunch of This could also be a fork of tokei. I was just thinking about "wrapping" tokei since AFAIK Also, since we're mentioning github-linguist, I should note that linguist actually analyzes the So I guess the first question is: do we want to improve tokei for our purposes, or port linguist to Rust? |
Hey everyone following this 👋 There's been a bit of discussion here, but to keep you all up to date: I went ahead and started a project called gengo that should be more linguist-like, to hopefully improve our language detection eventually. Unlike tokei, there can be file extension collisions, and gengo will try to pick the right language using heuristics. For example, for this comment, it would need to register But right now, gengo doesn't support nearly enough languages. While I could just grab the data from linguist (and maybe I eventually will), right now I'm hoping that language support grows more organically, with discussion for each added language. So if you'd like to contribute, please do! I'll definitely need help with languages that I'm unfamiliar with, especially when it comes to adding heuristics, for example for C and C++. Edit: See spenserblack/gengo#34 |
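As a rough illustration of what a content heuristic for a colliding extension might look like, here is a sketch that guesses whether a `.h` file is C or C++. It is not gengo's API or its actual rules: `guess_header_language` and the marker list are assumptions chosen for the example, and real heuristics would need to be much more careful (for instance, these markers can also appear inside comments or strings).

```rust
#[derive(Debug, PartialEq)]
enum Language {
    C,
    Cpp,
}

/// Guesses the language of a `.h` file from its contents.
/// Assumption: any of these substrings is taken as a sign of C++.
fn guess_header_language(contents: &str) -> Language {
    const CPP_MARKERS: &[&str] = &["class ", "namespace ", "template<", "template <", "std::"];
    let looks_like_cpp = contents
        .lines()
        .any(|line| CPP_MARKERS.iter().any(|marker| line.contains(*marker)));
    if looks_like_cpp {
        Language::Cpp
    } else {
        // Fall back to C when nothing C++-specific was found.
        Language::C
    }
}

fn main() {
    let cpp_header = "#pragma once\nnamespace engine {\nclass Node;\n}\n";
    let c_header = "#ifndef ADD_H\n#define ADD_H\nint add(int a, int b);\n#endif\n";
    println!("{:?}", guess_header_language(cpp_header));
    println!("{:?}", guess_header_language(c_header));
}
```

Defaulting to C when no marker matches is itself a design choice. A header containing only declarations that are valid in both languages cannot be decided from content alone, so a real tool might instead look at the other files in the repository.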
https://github.com/github/linguist
Linguist is a tool developed by GitHub for the specific purpose of detecting languages. It's a very mature tool that gets it right the majority of the time by using complex rules.