This repository was archived by the owner on Jun 6, 2024. It is now read-only.

schoenbergerb/noscrape


@noscrape/noscrape

Concept

noscrape starts from any TrueType font and generates a new version of it in which the Unicode code points are randomly shuffled, so the mapping cannot be reverse-calculated from the text alone. Both strings and integers are obfuscated this way, and they can only be read back by rendering them with the generated obfuscation font.
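
To illustrate the substitution step, here is a minimal sketch in plain Node.js. It is not noscrape's implementation: the helpers `buildMapping`, `obfuscateString`, and `obfuscateObject` are hypothetical, and the sketch only produces the character mapping. In noscrape, the equivalent mapping lives inside the generated font, whose glyphs are reassigned to the shuffled code points.

```javascript
// Hypothetical sketch of the substitution idea, NOT noscrape's actual code:
// every distinct character gets a random, unique code point in the Unicode
// Private Use Area (U+E000-U+F8FF).
const { randomInt } = require("node:crypto");

const PUA_START = 0xe000;
const PUA_END = 0xf8ff; // inclusive, 6400 code points

// Build a random one-to-one mapping: original character -> PUA character.
function buildMapping(text) {
  const unique = [...new Set(text)]; // strings iterate by code point
  const pool = [];
  for (let cp = PUA_START; cp <= PUA_END; cp++) pool.push(cp);
  if (unique.length > pool.length) throw new Error("too many distinct characters");
  // Fisher-Yates shuffle with a cryptographically strong RNG.
  for (let i = pool.length - 1; i > 0; i--) {
    const j = randomInt(i + 1); // 0 <= j <= i
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  const mapping = new Map();
  unique.forEach((ch, i) => mapping.set(ch, String.fromCodePoint(pool[i])));
  return mapping;
}

// Replace each character of `text` with its mapped character.
function obfuscateString(text, mapping) {
  return Array.from(text, (ch) => mapping.get(ch)).join("");
}

// Obfuscate all values of a flat object with one shared mapping;
// numbers are converted to strings first.
function obfuscateObject(obj) {
  const strings = Object.entries(obj).map(([key, v]) => [key, String(v)]);
  const mapping = buildMapping(strings.map(([, s]) => s).join(""));
  const value = Object.fromEntries(
    strings.map(([key, s]) => [key, obfuscateString(s, mapping)])
  );
  return { mapping, value };
}

// Usage: demo.value.title holds 8 PUA characters that only a font
// carrying the same mapping can draw as "noscrape".
const demo = obfuscateObject({ title: "noscrape", text: "obfuscation" });
```

One mapping is shared across all values so that a single font can render every one of them; generating a fresh mapping per run is what keeps a previously captured mapping from being reused.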

The glyph paths inside the font cannot be removed, because the browser needs them to draw the text, so noscrape obfuscates them by shifting them slightly at random. This makes them harder to reverse-calculate, but not impossible, especially for machine learning algorithms. The developers are open to suggestions for improving this aspect.
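
The path jitter can be sketched as adding a small random integer offset to every outline point. This is an illustrative assumption, not noscrape's code: the `{ x, y, onCurve }` point format, the `jitterGlyph` name, and the base shift of 10 font units are all hypothetical; only the idea of a strength multiplier comes from the options described below.

```javascript
// Hypothetical sketch of glyph-path jitter, NOT noscrape's actual code.
// A glyph is an array of contours; each contour is an array of
// { x, y, onCurve } points in integer font units (as in TrueType outlines).
const { randomInt } = require("node:crypto");

const BASE_MAX_SHIFT = 10; // assumed max shift in font units at strength 1

// Return a new outline with every point moved by a random integer offset
// in [-maxShift, maxShift] on each axis; the input is left unchanged.
function jitterGlyph(contours, strength = 1) {
  const maxShift = Math.max(0, Math.round(BASE_MAX_SHIFT * strength));
  const offset = () => randomInt(-maxShift, maxShift + 1);
  return contours.map((contour) =>
    contour.map(({ x, y, onCurve }) => ({
      x: x + offset(),
      y: y + offset(),
      onCurve,
    }))
  );
}

// Usage: a 100x100 square; each call yields a slightly different outline.
const square = [[
  { x: 0, y: 0, onCurve: true },
  { x: 100, y: 0, onCurve: true },
  { x: 100, y: 100, onCurve: true },
  { x: 0, y: 100, onCurve: true },
]];
const shifted = jitterGlyph(square, 1);
```

Because fresh offsets are drawn for every generated font, a glyph never matches a reference outline exactly, which defeats naive shape lookups. A classifier that recognizes approximate shapes can still read it, which is the weakness described above.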

Installation

To install the @noscrape/noscrape package, simply run the following command in your project directory:

```shell
npm install @noscrape/noscrape
```

Basic Usage

Server-side
```javascript
const { obfuscate } = require('@noscrape/noscrape');

// Sample object to obfuscate
const object = { title: "noscrape", text: "obfuscation" };

// Server-side obfuscation: returns the generated font and the obfuscated values
const { font, value } = obfuscate(object, 'path/to/your/font.ttf');
```

Client-side
```html
<style>
    /* ${...} is a template placeholder, filled in on the server with the font buffer */
    @font-face {
        font-family: 'noscrape-obfuscated';
        src: url('data:font/ttf;base64,${font.toString("base64")}') format('truetype');
    }
</style>
```

The font is returned as a Node.js Buffer. To use it in our web pages, we convert it into a base64 data URL and embed it in a custom @font-face declaration. We can then display the obfuscated data by applying that font-family in our styles.

```html
<div style="font-family: 'noscrape-obfuscated'">
    <div>{ value.title }</div>
    <div>{ value.text }</div>
</div>
```
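
For server-side rendering, the two snippets above can be combined into one page. The sketch below assumes, as in the server-side example, that `font` is a Node.js Buffer and `value` is an object of obfuscated strings; `fontFaceCss`, `escapeHtml`, and `renderPage` are hypothetical helpers, not part of the package. Each value is HTML-escaped in case a non-default character range produces characters such as `<` or `&`.

```javascript
// Hypothetical helpers, NOT part of @noscrape/noscrape.
// Assumes `font` is a Buffer and `value` maps keys to obfuscated strings.

// Build an @font-face rule that embeds the font as a base64 data URL.
function fontFaceCss(font, family = "noscrape-obfuscated") {
  const dataUrl = `data:font/ttf;base64,${font.toString("base64")}`;
  return `@font-face { font-family: '${family}'; src: url('${dataUrl}') format('truetype'); }`;
}

// Escape the five HTML-significant characters before inserting text.
function escapeHtml(text) {
  const entities = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
  return text.replace(/[&<>"']/g, (ch) => entities[ch]);
}

// Render the style block plus one <div> per obfuscated value.
function renderPage(font, value) {
  const rows = Object.values(value)
    .map((v) => `    <div>${escapeHtml(v)}</div>`)
    .join("\n");
  return [
    "<style>",
    fontFaceCss(font),
    "</style>",
    "<div style=\"font-family: 'noscrape-obfuscated'\">",
    rows,
    "</div>",
  ].join("\n");
}
```

With the server-side example, `renderPage(font, value)` returns an HTML string that can be sent as the response body.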

IMPORTANT NOTE

Bots, such as search engine crawlers, may not be able to read obfuscated text, which can lead to unpredictable analytics and indexing results. Avoid using this technology on content that needs to be indexed. Obfuscation also takes time (around 50-60 ms on a standard machine), so for API requests it is better to run the obfuscation in a scheduled task and reuse the result than to recompute it on every request.
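
A minimal sketch of that caching approach, assuming an in-process timer is acceptable (a cron job or job queue works the same way). `createObfuscationCache` is a hypothetical helper, and `compute` stands in for something like `() => obfuscate(object, 'path/to/your/font.ttf')`. The font and the values are cached together as one result, because values from one run can only be read with the font from the same run.

```javascript
// Hypothetical sketch: run the slow obfuscation on a schedule and serve the
// cached result to every request. `compute` is injected so the sketch does
// not depend on the package; it should return { font, value } from one run.
function createObfuscationCache(compute, intervalMs) {
  let current = compute(); // compute once up front
  const timer = setInterval(() => {
    current = compute(); // refresh in the background on a fixed schedule
  }, intervalMs);
  timer.unref(); // don't keep the Node.js process alive just for refreshes
  return {
    get: () => current, // cheap: no obfuscation work per request
    refresh: () => {
      current = compute(); // force an immediate recomputation
      return current;
    },
    stop: () => clearInterval(timer),
  };
}
```

If `obfuscate` returns a Promise in your version of the package, `get()` simply returns that Promise and request handlers can `await` it.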

Options

  • Strength (obfuscation strength multiplier): Default is 1. Values below 0.1 are not recommended, because the glyph paths can then be reverse-calculated easily. Values above 10 may visibly distort the glyphs.

  • Character Range: The Unicode range that obfuscated characters are mapped into. Options include:
    • PRIVATE_USE_AREA (default)
    • LATIN
    • GREEK
    • CYRILLIC
    • HIRAGANA
    • KATAKANA

  • Low Memory: For environments with limited memory, where noscrape cannot load the provided font file. Default is false.
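
The strength thresholds and character ranges above can be expressed as a small validation helper. This is a hypothetical sketch: the option names `strength`, `characterRange`, and `lowMemory` and the `checkOptions` function are assumptions rather than noscrape's documented API, and the code point bounds are standard Unicode blocks (printable ASCII for LATIN) that noscrape may define differently. It also surfaces a practical constraint: a range needs at least as many code points as the content has distinct characters, and the Hiragana block, for example, has only 96.

```javascript
// Hypothetical sketch of option validation, NOT noscrape's actual API.
// Option names and range bounds are assumptions; bounds are standard
// Unicode blocks (printable ASCII for LATIN).
const CHARACTER_RANGES = {
  PRIVATE_USE_AREA: [0xe000, 0xf8ff],
  LATIN: [0x0021, 0x007e],
  GREEK: [0x0370, 0x03ff], // Greek and Coptic block
  CYRILLIC: [0x0400, 0x04ff],
  HIRAGANA: [0x3040, 0x309f],
  KATAKANA: [0x30a0, 0x30ff],
};

// Apply defaults, collect warnings for the documented strength thresholds,
// and check that the range can hold `distinctChars` different characters.
function checkOptions(options = {}, distinctChars = 0) {
  const { strength = 1, characterRange = "PRIVATE_USE_AREA", lowMemory = false } = options;
  const range = CHARACTER_RANGES[characterRange];
  if (!range) throw new Error(`unknown character range: ${characterRange}`);
  const warnings = [];
  if (strength < 0.1) warnings.push("strength < 0.1: glyph paths are easy to reverse-calculate");
  if (strength > 10) warnings.push("strength > 10: glyphs may look distorted");
  const size = range[1] - range[0] + 1; // bounds are inclusive
  if (distinctChars > size) {
    warnings.push(`${characterRange} has ${size} code points, need ${distinctChars}`);
  }
  return { strength, range, lowMemory, warnings };
}
```

For example, `checkOptions({ characterRange: "HIRAGANA" }, 120).warnings` contains the single message `HIRAGANA has 96 code points, need 120`.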

Contributions

The developers welcome contributions, issues, and feature requests. If you've used this package and fixed a bug, they encourage you to submit a PR.

License

The package is licensed under the MIT License by Bernhard Schönberger.