SCP Data API

Welcome to the SCP Data API!

This is a static data dump of the SCP Wiki, broken down by article type. The data is crawled and updated on a daily basis.

There are two ways to use this data:

  1. Download it directly from the links below.
  2. Pull it in from this GitHub repository.

Universal fields

  • history fields contain an ordered list of objects, each containing:
    • author - the display name of the author of the revision.
    • author_url - a link to the author's Wikidot page.
    • comment - the comment for the revision. This is often blank, but the automated messages can be interesting.
    • date - the date and time of the revision in the format "2020-02-20T19:10:00".
  • link fields are the primary keys for the data, generated from the path fragment of the URL.
  • page_id - the Wikidot ID for the page itself. This can be used to query the Wikidot API for additional information about the page.
  • created_at - the creation date of the article in the format "2020-02-20T19:10:00".
  • created_by - the Wikidot username of the author of the first revision.
  • url - a direct link to the crawled page.
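
As a rough illustration of working with the universal fields, the sketch below parses the documented timestamp format and picks the most recent entry from a history list. The helper names and the sample entry (authors, comments, dates) are made up for the example and are not taken from the dataset.

```python
from datetime import datetime
from typing import Optional


def parse_timestamp(value: str) -> datetime:
    """Parse a timestamp such as "2020-02-20T19:10:00" into a naive datetime."""
    return datetime.fromisoformat(value)


def latest_revision(entry: dict) -> Optional[dict]:
    """Return the history object with the most recent date, or None if there is none."""
    history = entry.get("history", [])
    if not history:
        return None
    return max(history, key=lambda revision: parse_timestamp(revision["date"]))


# Hypothetical entry shaped like the universal fields described above.
SAMPLE_ENTRY = {
    "link": "example-page",
    "created_at": "2020-02-20T19:10:00",
    "history": [
        {"author": "ExampleUserA", "comment": "", "date": "2020-02-20T19:10:00"},
        {"author": "ExampleUserB", "comment": "typo fix", "date": "2021-05-01T08:30:00"},
    ],
}

if __name__ == "__main__":
    print(parse_timestamp(SAMPLE_ENTRY["created_at"]).year)
    print(latest_revision(SAMPLE_ENTRY)["author"])
```

The timestamps carry no timezone offset, so the parsed values are naive datetimes.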

Content and Metadata

Each SCP Item and Tale contains both metadata about the article and the content of the article itself. This project splits those into separate files so that the metadata can be used directly without downloading the entire wiki.

The content is also available for those who want it. Content is additionally broken up into two types: for each page, the #page-content element is stored as raw_content, alongside the raw_source wikitext that generated it. The raw_content contains the story itself but excludes navigation elements, ads, and header data. Other than running it through a simple beautifier (beautifulsoup), no changes have been made.
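
Since raw_content is HTML, a consumer that only wants the prose may need to strip the tags. The sketch below does this with Python's standard-library HTMLParser rather than beautifulsoup; the function name and the sample markup are illustrative assumptions, not part of the dataset.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects the text nodes of an HTML fragment, ignoring the tags."""

    def __init__(self) -> None:
        super().__init__()
        self.parts = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def html_to_text(raw_content: str) -> str:
    """Return the text of an HTML fragment with runs of whitespace collapsed."""
    collector = _TextCollector()
    collector.feed(raw_content)
    collector.close()
    # Text nodes are concatenated as-is, so words are only separated where the
    # markup itself contains whitespace (as prettified HTML normally does).
    return " ".join("".join(collector.parts).split())


if __name__ == "__main__":
    # Hypothetical fragment, not a real article.
    sample = '<div id="page-content">\n<p>Item #: <strong>SCP-0000</strong></p>\n</div>'
    print(html_to_text(sample))
```

This keeps the text of every element, including any script or style content, so a stricter filter may be needed for pages that embed those.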

If an item has a content_file field then that file is where you can get the content from. Otherwise the raw_content and raw_source fields will contain the article contents.
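
The lookup rule above could be sketched as follows. The function name is hypothetical, and the code assumes that a content file is a JSON object keyed by link (mirroring the metadata file) and that content_file is resolved relative to the directory holding the index file.

```python
import json
from pathlib import Path


def load_article_content(entry: dict, index_path: Path) -> dict:
    """Return the raw_content and raw_source for one metadata entry.

    Assumption: the file named by content_file is a JSON object keyed by
    link, and its path is relative to the directory containing index_path.
    """
    if "content_file" not in entry:
        # Small datasets carry the content inline.
        article = entry
    else:
        content_path = index_path.parent / entry["content_file"]
        with content_path.open(encoding="utf-8") as handle:
            article = json.load(handle)[entry["link"]]
    return {
        "raw_content": article.get("raw_content"),
        "raw_source": article.get("raw_source"),
    }
```

Loading a whole content file to read a single article is wasteful when many articles are needed; caching the parsed file per content_file value would avoid repeated reads.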

SCP Main Wiki Data

Hub

The SCP Wiki Hubs group articles together based on theme, canon, subject matter, or just whim. The Hub Dataset contains all of the Hubs with a list of the articles (Items, GOI, and Tales) that are part of that Hub.

The Hub data is relatively small so it all exists in a single file. It is formatted as an object where the key is the link and the value is an object with these fields (in addition to the universal fields):

  • title - the user-friendly name of the hub.
  • references - a list of link strings representing articles and tales that are in this hub.
  • tags - the tags for the hub page.
  • raw_content
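
One use of the hub structure is to invert it, so that each article link maps to the hubs that list it. This is a minimal sketch with a hypothetical function name and made-up links; it relies only on the references field described above.

```python
from collections import defaultdict


def hubs_by_article(hubs: dict) -> dict:
    """Map each referenced article link to the sorted hub links that list it."""
    reverse = defaultdict(set)
    for hub_link, hub in hubs.items():
        for article_link in hub.get("references", []):
            reverse[article_link].add(hub_link)
    return {link: sorted(hub_links) for link, hub_links in reverse.items()}


# Hypothetical hub data: the links are placeholders, not real pages.
SAMPLE_HUBS = {
    "hub-a": {"title": "Hub A", "references": ["scp-0001", "tale-x"]},
    "hub-b": {"title": "Hub B", "references": ["scp-0001"]},
}

if __name__ == "__main__":
    print(hubs_by_article(SAMPLE_HUBS))
```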

Items

The SCP Items are perhaps the most well known part of the wiki. As such the Item Dataset is the largest dataset available.

The item metadata file (index.json) contains the metadata for all of the SCP Items. It is an object with the link as the key and the item data as the value.

In addition to the universal fields, each item contains:

  • content_file - the file, relative to the index.json file, that contains the content for the article.
  • references - a list of link strings for the articles and tales that this item links to.
  • tags - the tags for the item page.
  • title - the user-friendly name of the item.
  • hubs - a list of link strings for all of the hubs the item is in.
  • images - a list of URLs for the images on each page.
  • rating - the rating of the article based off of Wikidot votes.
  • scp - the full SCP label.
  • scp_number - the SCP number (SCP-682 would be 682).
  • series - the SCP Series that Item is part of. Includes the numbered series as well as joke, archive, and other categories.
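
Because the metadata file is a plain object keyed by link, simple queries need no extra tooling. The sketch below ranks the items of one series by rating; the function name, the series values, and the sample entries are all hypothetical, since the exact strings used in the series field are not listed here.

```python
def top_rated(items: dict, series: str, limit: int = 3) -> list:
    """Return (scp, rating) pairs for the highest-rated items in one series."""
    in_series = [item for item in items.values() if item.get("series") == series]
    # Items without a rating are treated as 0 so they sort last.
    in_series.sort(key=lambda item: item.get("rating", 0), reverse=True)
    return [(item.get("scp"), item.get("rating", 0)) for item in in_series[:limit]]


# Hypothetical entries: labels, series names, and ratings are placeholders.
SAMPLE_ITEMS = {
    "scp-0001": {"scp": "SCP-0001", "series": "series-1", "rating": 45},
    "scp-0002": {"scp": "SCP-0002", "series": "series-1", "rating": 120},
    "scp-0003": {"scp": "SCP-0003", "series": "joke", "rating": 300},
}

if __name__ == "__main__":
    print(top_rated(SAMPLE_ITEMS, "series-1", limit=2))
```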

Content Index File - data/scp/items/content_index.json

The content index file is a key-value object where the key is the name of the series and the value is the name of the file containing the content for that series. The filename is relative to the content_index.json file itself.

The content files themselves are identical to the items in the Metadata File above with the exception that the content_file field is replaced with the raw_content and raw_source fields.
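
Walking the content index might look like the sketch below, which loads one series at a time so that the full content never has to sit in memory at once. The function name is hypothetical, and the code assumes the index and content files have already been downloaded locally with their relative layout intact.

```python
import json
from pathlib import Path
from typing import Iterator, Tuple


def iter_series_content(content_index_path: Path) -> Iterator[Tuple[str, dict]]:
    """Yield (series, articles) pairs, loading each content file lazily.

    Filenames in the index are resolved relative to the index file itself.
    """
    with content_index_path.open(encoding="utf-8") as handle:
        content_index = json.load(handle)
    for series, filename in content_index.items():
        content_path = content_index_path.parent / filename
        with content_path.open(encoding="utf-8") as handle:
            yield series, json.load(handle)


# Usage, assuming a local copy of the data:
# for series, articles in iter_series_content(Path("content_index.json")):
#     print(series, len(articles))
```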

Tales

Tales are short stories in the SCP universe. This dataset contains all of the tales that are not part of the GOI dataset.

Tale Index File - data/scp/tales/index.json

  • content_file - the file, relative to the index.json file, that contains the content for the article.
  • created_at - the creation date of the tale in the format "2020-02-20T19:10:00".
  • created_by - the Wikidot username of the page creator (which is normally the author).
  • link - this is the same as the key.
  • page_id - the Wikidot ID for the page.
  • references - a list of link strings for the articles and tales that this tale links to.
  • tags - the tags for the tale page.
  • title - the user-friendly name of the tale.
  • url - a direct URL to the tale page.
  • hubs - a list of link strings for all of the hubs the tale is in.
  • images - a list of URLs for the images on each page.
  • rating - the rating of the article based off of Wikidot votes.
  • history

Content Index File - data/scp/tales/content_index.json

The content index file is a key-value object where the key is the year the article was created and the value is the name of the file containing the content for that year. The filename is relative to the content_index.json file itself.

The content files themselves are identical to the entries in the Tale Index File above with the exception that the content_file field is replaced with the raw_content and raw_source fields.

GOI

GOI, or "Groups of Interest", articles are typically created in special formats to match the GOI they are portraying. This dataset contains all of the GOI articles; the specific formats used can be inferred from the tags.

GOI Metadata File - data/scp/goi/index.json

The GOI structure is the same as the Tale structure.

The GOI dataset is small enough to fit in a single file, so there is no separate content file. Each entry carries the raw_content and raw_source fields directly in place of a content_file field.

Licensing

This project is not affiliated with the SCP Wiki or any of its admins.

All content from the wiki is subject to the license of the wiki.