Skip to content

Generate an interactive heatmap matrix based on text reuse data

Notifications You must be signed in to change notification settings

broadwell/ReuseMapper

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

ReuseMapper

Generate an interactive heatmap matrix based on text reuse data

Requirements: Uses built-in Python 2.7 modules, with the exception of slugify, which must be installed separately.

The Plotly.js Javascript library renders the heatmaps, and is included in the repository along with its various dependencies (D3.js, jQuery, etc.).

How to run

Running ReuseMapper.py (see below for usage) generates the requisite output files (itext_sim.txt, bin_labels.txt, and the files in the binsJSON/ folder) to populate the interactive heatmap, which can be viewed by opening itextmap.html in a web browser, provided it's in a web-accessible folder (e.g., http://localhost/~yourname/ReuseMapper/itextmap.html) or is being served via a simple server like node.js's simple-server module.

Note: Changing the --bins parameter can have a major effect on the resulting heatmap (see screenshots below).

  usage: ReuseMapper.py [-h] [--version] [--bins BINS] [--url URL] [--no_cache]
                        [--get_texts]

  optional arguments:
    -h, --help   show this help message and exit
    --version    show program's version number and exit
    --bins BINS  The number of "bins" on each dimension of the matrix. Affects
                 the resolution of the heatmap. Default=1000.
    --url URL    The URL (with port number) of the running Intertextualitet
                 service. Default=http://localhost:8080/
    --no_cache   Do not use the previous results from the API that are stored
                 locally. Will re-download all data. Default behavior is to use
                 the cache.
    --get_texts  Use to download the full text of each document. Can take a long
                 time. Default behavior is not to download the texts.

ReuseMapper.py also generates a summary of "commonly used phrases" in the output file itext_phrases.txt.

MergeMatrices.py

This script can be used to merge two matrix files, for example itext_sim.txt produced by ReuseHeatmapper.py and a dist_sim.txt output file from TextHeatmapper, for example, provided they have the same dimensions. It produces a merged matrix file, imerged_sim.txt that can be used with the Plotly.js viewer (see for example Screenshot 3), provided the link in the source code of itextmap.js is changed from "itext_sim.txt" to "imerged_sim.txt".

Screenshots

Screenshot 1 (100 bins per axis)

Screenshot 1

Screenshot 2 (1,000 bins per axis, zoomed in)

Screenshot 2

Screenshot 3 (3,555 bins per axis, merged with an n-gram cosine similarity matrix)

Screenshot 3

About

Generate an interactive heatmap matrix based on text reuse data

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published