.gitattributes linguist-language (language detection override) isn't taken into account #352

Open
ell1e opened this issue Oct 7, 2022 · 8 comments
Labels
enhancement New feature or request

Comments

@ell1e

ell1e commented Oct 7, 2022

Describe the bug
A .gitattributes file's linguist-language entries seem to be ignored. These are language detection overrides based on specified files or file patterns, for use e.g. when a language in the repository is often detected wrong or is unlikely to be known by any tools looking at it. This might for example be the case if it's a small, domain-specific language made just for that specific project.

To Reproduce

  1. Clone or create a repo that uses .gitattributes to override language detection. For testing, you can put this example line into .gitattributes in your repo root:
    *.mylang linguist-language=MyLang
    
    Then make sure any non-empty stuff.mylang file is present.
  2. Run scc on the repository.

Expected behavior
With the above steps to reproduce, MyLang (or whatever override is specified) should show up in the listing. At least when I tried it, this wasn't the case.
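
To illustrate the lookup I have in mind, here is a minimal Python sketch. It is not scc code: the function names `parse_overrides` and `language_override` are made up for this example, and patterns are matched against the file name only with `fnmatch`, which is a simplification of git's real pattern rules. The one rule taken from gitattributes itself is that a later matching line overrides an earlier one.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath


def parse_overrides(gitattributes_text):
    """Return (pattern, language) pairs for every linguist-language entry."""
    overrides = []
    for raw in gitattributes_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        pattern, *attrs = line.split()
        for attr in attrs:
            name, sep, value = attr.partition("=")
            if name == "linguist-language" and sep:
                overrides.append((pattern, value))
    return overrides


def language_override(path, overrides):
    """Return the overriding language for path, or None if nothing matches.

    Simplification (assumption of this sketch): only the file name is
    matched, not the full path. Later entries win over earlier ones.
    """
    name = PurePosixPath(path).name
    result = None
    for pattern, language in overrides:
        if fnmatch(name, pattern):
            result = language
    return result


overrides = parse_overrides("*.mylang linguist-language=MyLang\n")
print(language_override("src/stuff.mylang", overrides))  # MyLang
print(language_override("src/main.c", overrides))        # None
```

A tool would call something like `language_override` before its own extension-based detection and only fall through to the latter when the result is `None`.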

Desktop (please complete the following information):

  • OS: Fedora Linux
  • OS Version: 26
  • SCC Version: 3.1.0
@boyter boyter closed this as completed Oct 10, 2022
@boyter boyter reopened this Oct 10, 2022
@boyter
Owner

boyter commented Oct 10, 2022

Dammit wrong issue.

@boyter boyter added the enhancement New feature or request label Oct 10, 2022
@spenserblack
Contributor

IMO supporting linguist-language could be troublesome, because then SCC would have to follow Linguist's language data more closely. Some languages might be in SCC but not Linguist, some might be in Linguist but not SCC, they may have different names, etc.

However, I think the linguist-generated and linguist-vendored boolean attributes are good candidates for SCC.
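
As a rough illustration of why the boolean attributes are simpler, here is a small Python sketch that reads the state of one attribute from the attribute tokens of a single .gitattributes line. The function name `attribute_state` is hypothetical. The `name`, `-name`, and `!name` forms follow the gitattributes syntax; treating `name=true` / `name=false` as booleans is an assumption of this sketch.

```python
def attribute_state(attrs, name):
    """Return True (set), False (unset), or None (unspecified)."""
    state = None
    for attr in attrs:
        if attr == name:
            state = True            # `name` sets the attribute
        elif attr == "-" + name:
            state = False           # `-name` unsets it
        elif attr == "!" + name:
            state = None            # `!name` resets it to unspecified
        elif attr.startswith(name + "="):
            # Assumption: only the literal value "true" counts as set;
            # any other value is treated as False in this sketch.
            state = attr.split("=", 1)[1].lower() == "true"
    return state


line = "generated/*.go linguist-generated -linguist-vendored"
pattern, *attrs = line.split()
print(attribute_state(attrs, "linguist-generated"))      # True
print(attribute_state(attrs, "linguist-vendored"))       # False
print(attribute_state(attrs, "linguist-documentation"))  # None
```

No language name mapping is involved here, only a yes/no decision on whether to count the file.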

@ell1e
Author

ell1e commented Oct 1, 2024

Would you really need to, for it to be an improvement? If you just took the name from linguist-language, I think it would already be more correct in many cases. I can't speak on linguist-generated except that it just doesn't seem to provide what I suggested, and it wouldn't correct the incorrect stats on the repos that I work on while linguist-language would.

@spenserblack
Contributor

I'm thinking, for example, how linguist has TSX as a separate language, while SCC treats TSX and TypeScript as the same language.

So if a user had something like this:

*.ts linguist-language=TSX

Then that's an unknown language for SCC. So it raises the question of what SCC should do in that case. Should SCC also split TSX into a separate language? Should TSX be an alias of TypeScript in SCC?

And as another example, SCC considers Docker Ignore to be its own language, while Linguist puts it with other Ignore List files. So if I try to tell GitHub/Linguist that my generic RubyOnRails.dockerignore is an ignore file, SCC stats would lose specificity by treating it as a generic ignore file.


My comment about linguist-generated may not be appropriate for this exact issue, but I was thinking in terms of a more general "SCC can use linguist-* attributes" idea.

@ell1e
Author

ell1e commented Oct 1, 2024

Oh I see, interesting. I admit I didn't realize there were so many differences, and I get that you're not looking forward to maintaining a mapping table at all times to untangle this. Maybe it could be an option for scc to process linguist-language which defaults to off?

@boyter
Owner

boyter commented Oct 1, 2024

If this were to be done, I would be inclined to have scc use its defaults but defer to the linguist ones where they are known, falling back to the defaults (with a warning) where they are not.
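
Roughly, and only as a sketch of the idea rather than anything that exists in scc, that policy could look like the following Python. The names `KNOWN_LANGUAGES`, `ALIASES`, and `resolve_language` are made up, the language table is illustrative data, and mapping TSX to TypeScript is just one of the options raised above.

```python
import warnings

# Hypothetical data for illustration; not scc's real language table.
KNOWN_LANGUAGES = {"TypeScript", "Go", "Plain Text"}
# Linguist name -> name used by the counting tool (one possible choice).
ALIASES = {"TSX": "TypeScript"}


def resolve_language(default_language, override):
    """Prefer a known override; otherwise warn and keep the default."""
    if override is None:
        return default_language
    name = ALIASES.get(override, override)
    if name in KNOWN_LANGUAGES:
        return name
    warnings.warn(
        f"unknown linguist-language {override!r}; using {default_language!r}"
    )
    return default_language


print(resolve_language("Plain Text", "Go"))   # Go
print(resolve_language("TypeScript", "TSX"))  # TypeScript
print(resolve_language("Plain Text", None))   # Plain Text
```

Note that under this policy a project-specific name such as MyLang would still fall back to the default, with a warning, unless unknown names were accepted as new languages.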

@spenserblack
Contributor

> I get that you're not looking forward to maintaining a mapping table at all times to untangle this.

Oh, I'm just a random user browsing the issues and offering my thoughts, I don't represent the SCC project or its maintainers 😅 Sorry if I made it sound that way.

@boyter
Owner

boyter commented Oct 1, 2024

> > I get that you're not looking forward to maintaining a mapping table at all times to untangle this.
>
> Oh, I'm just a random user browsing the issues and offering my thoughts, I don't represent the SCC project or its maintainers 😅 Sorry if I made it sound that way.

No, I like hearing others' ideas.

Generally though, I opt for the path of least surprise. What I have described is how I would approach it, based on how I would expect it to work. If someone else has different expectations, I want to hear them.
