The key idea is to take any TrueType font and let noscrape generate a new version of it with shuffled Unicode code points and nothing inside that could be used to calculate the original mapping back. Strings and integers are obfuscated and are only readable when rendered with the generated obfuscation font.
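To make the shuffling idea concrete, here is a minimal sketch, not noscrape's actual code. It assumes the obfuscated characters are moved into the Unicode Private Use Area (U+E000 to U+F8FF); the helper names `buildMapping` and `translate` are hypothetical. In a real setup the generated font would draw the original glyph at each new code point, and the mapping itself would be thrown away.

```javascript
// Illustrative sketch only, not noscrape's real implementation.
// Builds a random one-to-one mapping from each distinct character in `text`
// to a code point in the Private Use Area (an assumption made for this example).
function buildMapping(text, random = Math.random) {
  const PUA_START = 0xe000;
  const PUA_SIZE = 0xf8ff - PUA_START + 1; // 6400 code points
  const chars = [...new Set(text)];
  if (chars.length > PUA_SIZE) {
    throw new Error("too many distinct characters for the target range");
  }
  const used = new Set();
  const mapping = new Map();
  for (const ch of chars) {
    let codePoint;
    do {
      // Draw random code points until an unused one is found.
      codePoint = PUA_START + Math.floor(random() * PUA_SIZE);
    } while (used.has(codePoint));
    used.add(codePoint);
    mapping.set(ch, String.fromCodePoint(codePoint));
  }
  return mapping;
}

// Replaces every mapped character; unmapped characters pass through unchanged.
function translate(text, mapping) {
  return [...text].map((ch) => mapping.get(ch) ?? ch).join("");
}

const mapping = buildMapping("noscrape");
const obfuscated = translate("noscrape", mapping);
```

Without the mapping (or the matching font), `obfuscated` is just a run of meaningless private-use characters.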
What cannot be removed from the font are the glyph paths. At the moment the paths are obfuscated by shifting them randomly by a small amount (see the obfuscation strength multiplier). This makes it hard, but not impossible, to calculate them back, and they may still be "guessable" by an ML algorithm.
It would be nice if someone came up with a better solution or helped to improve this 😅
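The random shifting of glyph paths could look roughly like the sketch below. This is an assumption about the general technique, not noscrape's real code: the function name `jitterPath`, the `{ x, y }` point shape, and the meaning of `strength` (maximum shift per axis, in font units) are all hypothetical.

```javascript
// Illustrative sketch only, not noscrape's real implementation.
// Shifts every point of a glyph outline by a random offset in
// [-strength, +strength] on each axis and returns a new array.
function jitterPath(points, strength, random = Math.random) {
  return points.map(({ x, y }) => ({
    x: x + (random() * 2 - 1) * strength,
    y: y + (random() * 2 - 1) * strength,
  }));
}

// A hypothetical square outline, jittered by at most 5 units per axis.
const outline = [
  { x: 0, y: 0 },
  { x: 100, y: 0 },
  { x: 100, y: 100 },
  { x: 0, y: 100 },
];
const jittered = jitterPath(outline, 5);
```

A larger `strength` makes two renderings of the same glyph differ more, at the cost of visibly distorted letters, which is why the shift has to stay small.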
Bots, including search engine crawlers, cannot process obfuscated text, and it can lead to unpredictable analytics results.
So please be careful about using this technique on relevant content of indexed pages!
The whole obfuscation process takes time (around 50-60 ms on my machine 😉).
This should not be a problem with prerendered pages. For API requests, consider moving the obfuscation logic into a cronjob-like task and reusing the result multiple times instead of recalculating everything for every request.
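One way to reuse a result across requests is a small time-based cache. The sketch below is only an illustration: `createCachedObfuscator` is a hypothetical helper, and `obfuscateFn` stands in for whatever function performs the actual obfuscation call in your code.

```javascript
// Illustrative sketch: caches the result of an expensive function for `ttlMs`
// milliseconds so it is not recomputed on every request.
function createCachedObfuscator(obfuscateFn, ttlMs) {
  let cached = null;
  let expiresAt = 0;
  // `now` is injectable to make the helper easy to test.
  return function getObfuscated(now = Date.now()) {
    if (cached === null || now >= expiresAt) {
      cached = obfuscateFn(); // the expensive step runs only here
      expiresAt = now + ttlMs;
    }
    return cached;
  };
}

// Usage with a stand-in function; replace it with the real obfuscation call.
let runs = 0;
const getObfuscated = createCachedObfuscator(() => {
  runs += 1;
  return { font: "font-bytes-placeholder", value: { title: "placeholder" } };
}, 60000);
```

Every request handler then calls `getObfuscated()` and receives the same font and values until the entry expires. A shorter lifetime rotates the mapping more often at the cost of more obfuscation runs.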
// server-side obfuscation
const object = { title: "noscrape", text: "obfuscation" }
const { font, value } = obfuscate(object, 'path/to/your/font.ttf')
⬇⬇⬇⬇ provide data ⬇⬇⬇⬇
// font will be provided as buffer
const b64 = font.toString(`base64`)
<!-- client-side visualization-->
<style>
@font-face {
font-family: 'noscrape-obfuscated';
src: url('data:font/truetype;charset=utf-8;base64,${b64}');
}
</style>
...
<span style="font-family: noscrape-obfuscated">
<div>{ value.title }</div>
<div>{ value.text }</div>
</span>
character range used for encryption
Contributions, issues and feature requests are very welcome. If you are using this package and fixed a bug for yourself, please consider submitting a PR!
MIT @ Bernhard Schönberger