Gitattributes #50

Open
wants to merge 9 commits into
base: master
2 changes: 1 addition & 1 deletion Makefile
Original file line number Diff line number Diff line change
@@ -26,7 +26,7 @@ test-coverage: $(LINGUIST_PATH)
tail -n +2 $(COVERAGE_PROFILE) >> $(COVERAGE_REPORT); \
rm $(COVERAGE_PROFILE); \
fi; \
done;
done;

code-generate: $(LINGUIST_PATH)
mkdir -p data
35 changes: 34 additions & 1 deletion README.md
@@ -118,7 +118,7 @@ Note that even if enry's CLI is compatible with linguist's, its main point is th
Development
------------

*enry* re-uses parts of original [linguist](https://github.com/github/linguist) especially data in `languages.yml` to generate internal data structures. In oreder to update to latest upstream run
*enry* re-uses parts of original [linguist](https://github.com/github/linguist) especially data in `languages.yml` to generate internal data structures. In order to update to latest upstream run

make clean code-generate

@@ -140,6 +140,7 @@ Using [linguist/samples](https://github.com/github/linguist/tree/master/samples)
* all files for SQL language fall to the classifier because we don't parse this [disambiguator expression](https://github.com/github/linguist/blob/master/lib/linguist/heuristics.rb#L433) for `*.sql` files correctly. This expression doesn't follow the pattern used in the rest of the [heuristics.rb](https://github.com/github/linguist/blob/master/lib/linguist/heuristics.rb) file.



Benchmarks
------------

@@ -172,6 +173,38 @@ to get time averages for main detection function and strategies for the whole sa
if you want see measures by sample file



.gitattributes
--------------

As in linguist, you can override the detection strategies with a `.gitattributes` file.
Add a `.gitattributes` file to the directory and use the same matchers that you would use in linguist (`linguist-documentation`, `linguist-language` or `linguist-vendored`) to apply the override.

#### Vendored code

Use the `linguist-vendored` attribute to vendor or un-vendor paths.

```
$ cat .gitattributes
this-is-a-vendor-directory/ linguist-vendored
this-is-not/ linguist-vendored=false
```
#### Documentation

Documentation works the same way as vendored code but using `linguist-documentation` and `linguist-documentation=false`.

#### Language assignment

If you want some files to be classified as a certain language, use `linguist-language=[language]`.

```
$ cat .gitattributes
.*\.go linguist-language=MyFavouriteLanguage
```

Review comment (Contributor): If the regex must be compatible with Golang regexp, does that mean a `.gitattributes` file used in linguist has a different syntax? Does that make `.gitattributes` files for enry incompatible with linguist ones?

Note that the regular expression that matches the file name must be compatible with Go's regexp syntax; see [Golang regexp](https://golang.org/pkg/regexp/).
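
To illustrate the compatibility question, the sketch below compares a git-style glob with the regexp form, using only the standard `regexp` package. The helpers `compiles` and `matches` are hypothetical, and treating patterns as unanchored is an assumption about how they would be applied, not something this PR confirms.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles reports whether pattern is accepted by Go's regexp package.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

// matches reports whether pattern compiles and matches somewhere in name.
func matches(pattern, name string) bool {
	re, err := regexp.Compile(pattern)
	return err == nil && re.MatchString(name)
}

func main() {
	// A git-style glob: the leading "*" has nothing to repeat,
	// so Go rejects the pattern.
	fmt.Println(compiles(`*.go`)) // false

	// The regexp form matches the file name.
	fmt.Println(matches(`.*\.go`, "main.go")) // true

	// An unanchored regexp also matches inside a longer name,
	// which a glob such as *.go would not.
	fmt.Println(matches(`.*\.go`, "main.go.bak"))  // true
	fmt.Println(matches(`.*\.go$`, "main.go.bak")) // false
}
```

If patterns are compiled this way, a line written as a glob for linguist would need rewriting before enry could use it, and a trailing `$` is needed to pin the match to the end of the name.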


Why Enry?
------------

37 changes: 21 additions & 16 deletions cli/enry/main.go
@@ -29,6 +29,12 @@ func main() {
log.Fatal(err)
}

gitAttributes := enry.NewGitAttributes()
reader, err := os.Open(".gitattributes")
if err == nil {
// Release the file handle when main returns.
defer reader.Close()
gitAttributes.LoadGitAttributes("", reader)
}

errors := false
out := make(map[string][]string, 0)
err = filepath.Walk(root, func(path string, f os.FileInfo, err error) error {
@@ -53,8 +59,9 @@ func main() {
relativePath = relativePath + "/"
}

if enry.IsVendor(relativePath) || enry.IsDotFile(relativePath) ||
enry.IsDocumentation(relativePath) || enry.IsConfiguration(relativePath) {
if gitAttributes.IsVendor(relativePath) || enry.IsDotFile(relativePath) ||
gitAttributes.IsDocumentation(relativePath) || enry.IsConfiguration(relativePath) ||
gitAttributes.IsGenerated(path) {
if f.IsDir() {
return filepath.SkipDir
}
@@ -66,20 +73,18 @@ func main() {
return nil
}

language, ok := enry.GetLanguageByExtension(path)
if !ok {
if language, ok = enry.GetLanguageByFilename(path); !ok {
content, err := ioutil.ReadFile(path)
if err != nil {
errors = true
log.Println(err)
return nil
}

language = enry.GetLanguage(filepath.Base(path), content)
if language == enry.OtherLanguage {
return nil
}
content, err := ioutil.ReadFile(path)
if err != nil {
errors = true
log.Println(err)
return nil
}

language := gitAttributes.GetLanguage(filepath.Base(path))
if language == enry.OtherLanguage {
language = enry.GetLanguage(filepath.Base(path), content)
if language == enry.OtherLanguage {
return nil
}
}
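
The lookup order in this hunk (the `.gitattributes` override first, content-based detection only as a fallback) can be sketched in isolation. The names `resolve`, `override` and `detect` are hypothetical, and the empty string is only a stand-in for `enry.OtherLanguage`, whose real value is not shown in this diff.

```go
package main

import "fmt"

// otherLanguage stands in for enry.OtherLanguage; the empty string
// is an assumption made for this sketch.
const otherLanguage = ""

// resolve consults the override first and runs the content-based
// detector only when the override has no answer for name.
func resolve(name string, override func(string) string, detect func() string) (string, bool) {
	lang := override(name)
	if lang == otherLanguage {
		lang = detect()
	}
	return lang, lang != otherLanguage
}

func main() {
	overrides := map[string]string{"main.go": "MyFavouriteLanguage"}
	override := func(name string) string { return overrides[name] }

	// detect is a closure so that an expensive step, such as reading
	// the file, runs only when the override gives no answer.
	calls := 0
	detect := func() string { calls++; return "Python" }

	fmt.Println(resolve("main.go", override, detect)) // MyFavouriteLanguage true
	fmt.Println(resolve("tool.py", override, detect)) // Python true
	fmt.Println(calls)                                // 1
}
```

Passing the detector as a closure is one way to avoid reading a file whose language the override has already decided; the hunk above reads the content before checking the override.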

26 changes: 26 additions & 0 deletions common.go
@@ -3,6 +3,7 @@ package enry
import (
"bufio"
"bytes"
"os"
"path/filepath"
"regexp"
"strings"
@@ -95,6 +96,12 @@ func GetLanguageByClassifier(content []byte, candidates []string) (language stri
return getLanguageByStrategy(GetLanguagesByClassifier, "", content, candidates)
}

// GetLanguageByGitattributes returns the language assigned to a file by a matching regular expression in .gitattributes.
// The strategy reads the .gitattributes file in the current working directory on each call.
func GetLanguageByGitattributes(filename string) (language string, safe bool) {
return getLanguageByStrategy(GetLanguagesByGitAttributes, filename, nil, nil)
}

func getLanguageByStrategy(strategy Strategy, filename string, content []byte, candidates []string) (string, bool) {
languages := strategy(filename, content, candidates)
return getFirstLanguageAndSafe(languages)
@@ -407,6 +414,25 @@ func GetLanguagesBySpecificClassifier(content []byte, candidates []string, class
return classifier.Classify(content, mapCandidates)
}

// GetLanguagesByGitAttributes returns either a string slice with the language
// if the filename matches with a regExp in .gitattributes or returns an empty slice
// in case no regExp matches the filename. It complies with the signature to be a Strategy type.
func GetLanguagesByGitAttributes(filename string, content []byte, candidates []string) []string {
gitAttributes := NewGitAttributes()
reader, err := os.Open(".gitattributes")
if err != nil {
return nil
}
defer reader.Close()

gitAttributes.LoadGitAttributes("", reader)
lang := gitAttributes.GetLanguage(filename)
// OtherLanguage means no pattern matched, so there is no language to report.
if lang == OtherLanguage {
return []string{}
}

return []string{lang}
}

// GetLanguageExtensions returns the different extensions being used by the language.
func GetLanguageExtensions(language string) []string {
return data.ExtensionsByLanguage[language]