This repository was archived by the owner on Jun 6, 2024. It is now read-only.

schoenbergerb/noscrape







Project Goal

This project should help you prevent anyone from scraping your content.




Concept

The key idea is to take any TrueType font and let noscrape generate a new version of it with shuffled Unicode code points and nothing that can be used to calculate them back. Strings and integers are obfuscated and are only readable with the generated obfuscation font.
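The shuffling idea can be sketched as follows. This is only an illustration under assumptions, not noscrape's actual implementation: `buildMapping` and `obfuscateText` are hypothetical names, and the sketch covers the text side only. A real obfuscation font would additionally have to map each shuffled code point to the glyph of the original character.

```javascript
// Hypothetical sketch, not noscrape's real code.
const PUA_START = 0xe000; // first code point of the Unicode Private Use Area
const PUA_END = 0xf8ff; // last code point of the Private Use Area (BMP block)

// Build a random one-to-one mapping from every distinct character in `text`
// to a code point in the Private Use Area.
function buildMapping(text, random = Math.random) {
  const pool = [];
  for (let cp = PUA_START; cp <= PUA_END; cp++) pool.push(cp);

  // Fisher-Yates shuffle, so the assignment carries no information
  // about the original code points.
  for (let i = pool.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }

  const mapping = new Map();
  for (const ch of new Set(text)) {
    mapping.set(ch, String.fromCodePoint(pool[mapping.size]));
  }
  return mapping;
}

// Replace every character with its shuffled counterpart.
function obfuscateText(text, mapping) {
  return Array.from(text, (ch) => mapping.get(ch)).join("");
}
```

`Math.random` keeps the sketch short; for anything that should resist analysis, a cryptographically secure random source would probably be the better choice.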



What we cannot remove from the font are the glyph paths. At the moment the paths are obfuscated by shifting them randomly by a small amount (see the obfuscation strength multiplier), which makes them hard to calculate back, but not impossible, and they may still be "guessable" by an ML algorithm.
It would be nice if someone came up with a better solution or helped to improve this 😅
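A minimal sketch of such a random shift, assuming a glyph path is available as a list of `{ x, y }` points in font units. The function name `jitterPath`, the point representation, and the way `strength` scales the shift are assumptions for illustration; noscrape's real path handling may differ.

```javascript
// Hypothetical sketch: shift every point of a glyph path by a random offset.
// Each coordinate moves by at most `strength` font units in either direction.
function jitterPath(points, strength = 1, random = Math.random) {
  return points.map(({ x, y }) => ({
    x: x + (random() * 2 - 1) * strength,
    y: y + (random() * 2 - 1) * strength,
  }));
}

// Example: a triangle shifted with the default strength.
const triangle = [
  { x: 0, y: 0 },
  { x: 500, y: 0 },
  { x: 250, y: 700 },
];
const shifted = jitterPath(triangle);
```

A larger `strength` moves the points further from the original outline, which is harder to reverse but distorts the rendered glyph more.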




IMPORTANT NOTE

Bots such as search-engine crawlers cannot process obfuscated text, and obfuscation can lead to unpredictable analytics results.
So please be careful about using this technology on relevant content of indexed pages!

The whole obfuscation process takes time (around 50-60 ms on my machine 😉).
This should not be a problem for prerendered pages. For API requests, consider moving the obfuscation logic into a cronjob-like task and reusing the result multiple times instead of calculating everything again for every request.
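One way to reuse a result across requests is a small time-based cache. The sketch below is an assumption about how this could look, not part of noscrape: `createCached` is a hypothetical helper, and the 60-second lifetime in the usage comment is an arbitrary example value.

```javascript
// Hypothetical helper: wraps an expensive function and recomputes its result
// at most once per `ttlMs` milliseconds.
function createCached(compute, ttlMs, now = Date.now) {
  let cached;
  let expiresAt = -Infinity;

  return function get() {
    const t = now();
    if (t >= expiresAt) {
      cached = compute(); // only runs when the cached value has expired
      expiresAt = t + ttlMs;
    }
    return cached;
  };
}

// Usage idea (obfuscate comes from noscrape, see the example section):
// const getObfuscated = createCached(
//   () => obfuscate(object, 'path/to/your/font.ttf'),
//   60000
// );
// Every request handler then calls getObfuscated() instead of obfuscate().
```

The trade-off is that all requests within one lifetime receive the same font and mapping, so a shorter lifetime gives scrapers less time to learn a mapping at the cost of more CPU work.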


Example

// server-side obfuscation
const object = { title: "noscrape", text: "obfuscation" }
const { font, value }  = obfuscate(object, 'path/to/your/font.ttf')

⬇⬇⬇⬇     provide data     ⬇⬇⬇⬇


// font will be provided as buffer
const b64 = font.toString(`base64`)
<!-- client-side visualization-->


<style> 
    @font-face {        
        font-family: 'noscrape-obfuscated';        
        src: url('data:font/truetype;charset=utf-8;base64,${b64}');    
    }
</style>

...

<span style="font-family: noscrape-obfuscated">
    <div>{ value.title }</div>
    <div>{ value.text }</div>
</span>    
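When the page is rendered on the server, the `@font-face` rule from the example can be assembled from the font buffer in one place. This is a sketch under assumptions: `fontFaceCss` is a hypothetical helper, and it only presumes that `font` is a Node.js `Buffer`, as stated in the example above.

```javascript
// Hypothetical helper: turn the obfuscated font buffer into an @font-face rule
// that embeds the font as a base64 data URI.
function fontFaceCss(fontBuffer, family = "noscrape-obfuscated") {
  const b64 = fontBuffer.toString("base64");
  return [
    "@font-face {",
    `    font-family: '${family}';`,
    `    src: url('data:font/truetype;charset=utf-8;base64,${b64}');`,
    "}",
  ].join("\n");
}

// The returned string can be placed inside a <style> element of the page.
```

Embedding the font inline avoids a second request, but it also means the font is sent again with every page that includes it.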

example-code

live demo


Options

strength

obfuscation strength multiplier (default: 1)
values below 0.1 make no sense (paths can simply be calculated back)
values above 10 make no sense (looks like 💩)


characterRange

character range used for obfuscation
PRIVATE_USE_AREA (DEFAULT)
LATIN
GREEK
CYRILLIC
HIRAGANA
KATAKANA

lowMemory

use only if you do not have a lot of memory and noscrape cannot load the given font file
DEFAULT: false
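The documented defaults and sensible limits can be summarized in a small normalization step. This is a hypothetical sketch and not noscrape's API: the name `normalizeOptions`, the use of plain strings for `characterRange`, and the decision to clamp `strength` to the range 0.1 to 10 are all assumptions. How options are actually passed to `obfuscate` is not shown in this document.

```javascript
// Hypothetical sketch: apply the documented defaults and keep values
// inside the ranges the option descriptions call sensible.
const CHARACTER_RANGES = [
  "PRIVATE_USE_AREA",
  "LATIN",
  "GREEK",
  "CYRILLIC",
  "HIRAGANA",
  "KATAKANA",
];

function normalizeOptions(options = {}) {
  const {
    strength = 1,
    characterRange = "PRIVATE_USE_AREA",
    lowMemory = false,
  } = options;

  if (typeof strength !== "number" || Number.isNaN(strength)) {
    throw new TypeError("strength must be a number");
  }
  if (!CHARACTER_RANGES.includes(characterRange)) {
    throw new RangeError(`unknown characterRange: ${characterRange}`);
  }

  return {
    // below 0.1 the paths are easy to restore, above 10 the glyphs degrade
    strength: Math.min(10, Math.max(0.1, strength)),
    characterRange,
    lowMemory: Boolean(lowMemory),
  };
}
```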



Contributions

Contributions, issues and feature requests are very welcome. If you are using this package and have fixed a bug for yourself, please consider submitting a PR!




License

MIT @ Bernhard Schönberger