This repository has been archived by the owner on Jun 10, 2022. It is now read-only.

FOSSlim

FOSSlim stands for Free Open Source Software LIcense Matcher. It matches the text of an OSS license to its SPDX id, and users can easily change and update the training data with additional EULAs and license texts.

It is designed to be modular and to provide many low-level, high-speed utilities that libraries written in high-level languages such as Ruby and JavaScript can benefit from. You can take advantage of the various models implemented here, but on their own they are not enough to produce a high-confidence match. That task is left to the RubyGem and NPM packages, which clean up the raw text and combine the results of multiple models to increase the confidence of the match.
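
The combination step itself lives in the RubyGem and NPM packages, not in this crate. Purely as an illustration of the idea, here is a minimal Rust sketch that fuses per-label scores from several models by averaging them. The function name `combine_scores`, the averaging rule and the made-up input scores are hypothetical; they are not taken from FOSSlim or the LicenseMatcher gem.

```rust
use std::collections::HashMap;

/// Hypothetical score fusion: average each SPDX label's score across models.
/// A label that a model did not return counts as 0.0 for that model.
/// Scores are assumed to be finite (no NaN).
fn combine_scores(model_results: &[Vec<(&str, f64)>]) -> Vec<(String, f64)> {
    let n_models = model_results.len() as f64;
    let mut totals: HashMap<String, f64> = HashMap::new();
    for results in model_results {
        for (label, score) in results {
            *totals.entry(label.to_string()).or_insert(0.0) += *score;
        }
    }
    let mut combined: Vec<(String, f64)> = totals
        .into_iter()
        .map(|(label, total)| (label, total / n_models))
        .collect();
    // Highest combined score first; ties broken by label so the order is deterministic.
    combined.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then_with(|| a.0.cmp(&b.0)));
    combined
}

fn main() {
    // Made-up outputs of two models for the same input text.
    let naive_tf = vec![("MIT", 0.92), ("X11", 0.88)];
    let finger_ngram = vec![("MIT", 0.81), ("ISC", 0.40)];
    for (label, score) in combine_scores(&[naive_tf, finger_ngram]) {
        println!("{label}: {score:.3}");
    }
}
```

Averaging rewards labels that several models agree on: a label only one model returns is pulled down by the missing scores from the others.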

It is still under active development, but it will be released as:

  1. Rust library (milestone.1, milestone.3)
  2. RoR gem with example API (milestone.2) - LicenseMatcher gem
  3. sample RoR application using the gem - Fosslim.com
  4. NodeJS library with example AWS Lambda function, TBD
  5. Rust microservice, TBD
  6. command-line tool to scan files, TBD

TBD = release time unknown; priority depends on interest from the community.

Models

  • NaiveTF - uses a simple bag-of-words model and ranks results by Jaccard similarity
  • FingerNgram - splits the text into overlapping n-grams and hashes a selection of them into a fingerprint
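
To make the two matching ideas above concrete, here is a minimal Rust sketch, not FOSSlim's actual code: a bag-of-words Jaccard score in the spirit of NaiveTF, and an n-gram fingerprint in the spirit of FingerNgram. The tokenizer, the use of word (rather than character) n-grams, the `hash % modulus == 0` selection rule and all function names are assumptions made for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Lowercase the text and split it into alphanumeric word tokens.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|w| !w.is_empty())
        .map(|w| w.to_lowercase())
        .collect()
}

/// Jaccard similarity |A ∩ B| / |A ∪ B| (0.0 when both sets are empty).
fn jaccard<T: Eq + Hash>(a: &HashSet<T>, b: &HashSet<T>) -> f64 {
    let union = a.union(b).count();
    if union == 0 {
        return 0.0;
    }
    a.intersection(b).count() as f64 / union as f64
}

/// Word-bag similarity: Jaccard over the sets of distinct words.
fn word_bag_similarity(x: &str, y: &str) -> f64 {
    let a: HashSet<String> = tokenize(x).into_iter().collect();
    let b: HashSet<String> = tokenize(y).into_iter().collect();
    jaccard(&a, &b)
}

/// Fingerprint: hash every overlapping word n-gram and keep only the hashes
/// divisible by `modulus`. Requires n > 0 and modulus > 0.
fn fingerprint(text: &str, n: usize, modulus: u64) -> HashSet<u64> {
    tokenize(text)
        .windows(n)
        .map(|gram| {
            let mut h = DefaultHasher::new();
            gram.hash(&mut h);
            h.finish()
        })
        .filter(|h| h % modulus == 0)
        .collect()
}

fn main() {
    let mit = "Permission is hereby granted, free of charge, to any person obtaining a copy";
    let query = "permission is hereby granted free of charge to any person";
    println!("word-bag Jaccard: {:.3}", word_bag_similarity(mit, query));
    let (fa, fb) = (fingerprint(mit, 3, 1), fingerprint(query, 3, 1));
    println!("fingerprint Jaccard: {:.3}", jaccard(&fa, &fb));
}
```

A larger `modulus` keeps fewer hashes, giving a smaller fingerprint at the cost of some sensitivity; `modulus = 1` keeps every n-gram. Note that `DefaultHasher`'s algorithm is not guaranteed to stay the same across Rust releases, so a persisted index would want a fixed hash function.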

... planned for the near future:

  • TF-IDF models with cosine similarity
  • Okapi BM25 model
  • Winnowing model
  • Simple probabilistic ML models, such as Naive Bayes, HMM, ...
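
TF-IDF with cosine similarity is only planned, so nothing below reflects FOSSlim's eventual design. This is a minimal sketch assuming word tokens, raw term counts for TF, and the smoothed IDF ln((1 + N) / (1 + df)) + 1 (the variant scikit-learn uses by default). The `TfIdf` type, its methods and the abbreviated license snippets are hypothetical.

```rust
use std::collections::{HashMap, HashSet};

/// Lowercase alphanumeric word tokens.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|w| !w.is_empty())
        .map(|w| w.to_lowercase())
        .collect()
}

/// Hypothetical TF-IDF model: stores one IDF weight per term seen in the corpus.
struct TfIdf {
    idf: HashMap<String, f64>,
}

impl TfIdf {
    /// idf(t) = ln((1 + N) / (1 + df(t))) + 1, with df(t) = number of documents containing t.
    fn fit(corpus: &[&str]) -> Self {
        let n = corpus.len() as f64;
        let mut df: HashMap<String, usize> = HashMap::new();
        for doc in corpus {
            let unique: HashSet<String> = tokenize(doc).into_iter().collect();
            for term in unique {
                *df.entry(term).or_insert(0) += 1;
            }
        }
        let idf = df
            .into_iter()
            .map(|(term, d)| (term, ((1.0 + n) / (1.0 + d as f64)).ln() + 1.0))
            .collect();
        TfIdf { idf }
    }

    /// Sparse TF-IDF vector (term count * idf); terms unseen in the corpus are dropped.
    fn vectorize(&self, text: &str) -> HashMap<String, f64> {
        let mut v: HashMap<String, f64> = HashMap::new();
        for t in tokenize(text) {
            if let Some(idf) = self.idf.get(&t) {
                *v.entry(t).or_insert(0.0) += *idf;
            }
        }
        v
    }
}

/// Cosine similarity of two sparse vectors (0.0 if either has zero norm).
fn cosine(a: &HashMap<String, f64>, b: &HashMap<String, f64>) -> f64 {
    let dot: f64 = a.iter().filter_map(|(t, x)| b.get(t).map(|y| x * y)).sum();
    let norm = |v: &HashMap<String, f64>| v.values().map(|x| x * x).sum::<f64>().sqrt();
    let (na, nb) = (norm(a), norm(b));
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

fn main() {
    // Abbreviated, illustrative snippets, not the full SPDX texts.
    let licenses = [
        ("MIT", "permission is hereby granted free of charge to any person obtaining a copy of this software"),
        ("ISC", "permission to use copy modify and or distribute this software for any purpose with or without fee is hereby granted"),
        ("Apache-2.0", "licensed under the apache license version 2 0 you may not use this file except in compliance with the license"),
    ];
    let texts: Vec<&str> = licenses.iter().map(|(_, t)| *t).collect();
    let model = TfIdf::fit(&texts);
    let query = model.vectorize("Permission is hereby granted, free of charge, to any person");
    let mut ranked: Vec<(&str, f64)> = licenses
        .iter()
        .map(|(id, t)| (*id, cosine(&query, &model.vectorize(t))))
        .collect();
    ranked.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    for (id, score) in &ranked {
        println!("{id}: {score:.3}");
    }
}
```

Unlike plain Jaccard, words shared by many licenses ("this", "software") get lower weights, so rarer phrases such as "free of charge" dominate the score. Here the Apache snippet shares no words with the query, so its score is 0.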

Usage

```rust
use fosslim::index;
use fosslim::document::Document;
use fosslim::naive_tf; // simple word-bag model with Jaccard similarity

// ...
let idx_file_path = "data/index.msgpack"; // pre-built index from SPDX data; includes ~300 licenses

// A raw string keeps backslashes literally, so lines must not end with `\`
// continuation characters (they would become part of the license text).
let mit_txt = r#"
Permission is hereby granted, free of charge, to any person obtaining a copy of this software
and associated documentation files (the "Software"), to deal in the Software without restriction,
including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense,
and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so,
subject to the following conditions:
"#;

let doc1 = Document::new(0, "mit".to_string(), mit_txt.to_string());

// match the document against the SPDX-labelled index
if let Ok(idx) = index::load(idx_file_path) {
    let mdl = naive_tf::from_index(&idx);

    // `mdl` is a value, so its method is called with `.`, not `::`
    mdl.match_document(&doc1);
}
// ...
```

Check the tests folder for more usage examples.

And yes, you can build your own index with the index::build_from_path() function; your license files just need the same structure as the JSON files in the data/licenses folder.

Current alternatives

Here are some alternatives you can already use today: