# Prismatic Interest Graph API

- What does this do?
- How do I use the service?
- How stable is the service?
- What are all of the supported API endpoints?
- I think the system made a mistake, where can I report it?
- Do you have the topic I care about?
- What aspects do you currently model?
- You don’t currently model my interest. Where can I submit a request for you to model a new interest?
- My question is not listed here.

## What does this do?

### Topic Tagging

This service automatically analyzes the content of a document or piece of text and reports the interests present in it. An interest is a non-hierarchical, single-phrase summary of the thematic content of a piece of text; examples include Functional Programming, Celebrity Gossip, and Flowers. At Prismatic, we’ve been using interests to automatically analyze text in order to help connect people with the content they find interesting.

### Topic Similarity

This service provides an endpoint that returns the set of topics similar to a given query topic.

### Aspect Tagging

This service analyzes the content and DOM of a webpage and reports its aspects, which describe the structure or function of the page.

### Feeds API

This service provides recent, high-quality documents from all over the web for a given query (which can include both topics and aspects), along with extracted metadata for each URL. The Interest Graph Explorer includes an interactive demo of the Feeds API.

### What's new?

We are working hard to continually extend and improve the functionality of the Interest Graph API. Stay up to date by reading the change log.

## How do I use the service?

### Step 1: Acquire an access token

Head over to http://interest-graph.getprismatic.com and enter your email address along with some information about how you plan to use the service, and we will email you an API access token for our free tier.

Our free tier offers a limited number of calls to each API. For more details about the free tier and our paid plans, please visit our Developer page.

### Step 2: Make a query

Once you have your access token, you can try tagging a URL or piece of text via our web interface: the link in the email containing your token opens an interface where you can explore the API and make queries.

You can also make requests programmatically. For example, to run the tagging service on the Wikipedia article about Machine Learning, we can call it with curl:

```shell
curl -H "X-API-TOKEN: <API-TOKEN>" 'http://interest-graph.getprismatic.com/url/topic' --data 'url=http%3A%2F%2Fen.wikipedia.org%2Fwiki%2FMachine_learning'
```

where `<API-TOKEN>` is a placeholder for your access token string.
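The same request can be built from Python's standard library. This is a minimal sketch, not an official client: `build_url_topic_request` is a hypothetical helper name, and it only constructs the request (sending it is a separate `urllib.request.urlopen` call). It mirrors the curl example above, a form-encoded POST to `/url/topic` with the token in the `X-API-TOKEN` header.

```python
import urllib.parse
import urllib.request

API_ROOT = "http://interest-graph.getprismatic.com"


def build_url_topic_request(page_url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST to /url/topic, like the curl example.

    Hypothetical helper for illustration; not part of any published client.
    """
    # urlencode percent-encodes the page URL, producing the same body that
    # the hand-encoded curl --data argument sends.
    body = urllib.parse.urlencode({"url": page_url}).encode("ascii")
    # With a bytes body, urllib adds Content-Type:
    # application/x-www-form-urlencoded when the request is sent,
    # which matches what curl --data does.
    return urllib.request.Request(
        API_ROOT + "/url/topic",
        data=body,
        headers={"X-API-TOKEN": token},
        method="POST",
    )
```

To actually call the service, pass the result to `urllib.request.urlopen(...)` and read the JSON body from the response.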

### Step 3: Interpret the response

The response is a JSON map whose key `topics` holds a list of topic tags. Each topic tag has a numeric `id` identifying the topic in the system, a human-readable topic name `topic`, and a `score`. The score is a real value between 0 and 1 that represents the degree to which a significant part of the article is about the corresponding topic.

As a Schema (Clojure):

```clojure
(require '[schema.core :as s])

{:topics [{:id    s/Int
           :topic s/Str
           :score s/Num}]}
```
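A minimal Python sketch of reading that response, assuming the JSON keys mirror the schema (`topics`, `id`, `topic`, `score`). `TopicTag`, `parse_topics`, and its `min_score` cutoff are illustrative names, not part of the API; the function checks the documented 0 to 1 score range and returns tags highest score first.

```python
import json
from typing import List, NamedTuple


class TopicTag(NamedTuple):
    id: int
    topic: str
    score: float


def parse_topics(body: str, min_score: float = 0.0) -> List[TopicTag]:
    """Parse {"topics": [{"id", "topic", "score"}, ...]} into TopicTags.

    Keeps tags with score >= min_score, sorted by descending score.
    """
    tags = [
        TopicTag(int(t["id"]), str(t["topic"]), float(t["score"]))
        for t in json.loads(body)["topics"]
    ]
    # The documentation states scores lie between 0 and 1.
    for tag in tags:
        if not 0.0 <= tag.score <= 1.0:
            raise ValueError(f"score out of range: {tag}")
    return sorted(
        (t for t in tags if t.score >= min_score),
        key=lambda t: t.score,
        reverse=True,
    )
```

For instance, `parse_topics(body, min_score=0.5)` keeps only tags scoring at least 0.5. The right cutoff depends on your application; the API itself does not prescribe one.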

## How stable is the service?

We are committed to offering a stable, robust, and reliable API for our customers. The change log documents important changes to the API, and will include a clear migration path when breaking changes are required.

Free tier rate limits are subject to change; please see our Developer page for the latest information.

## What are all of the supported API endpoints?

The Interest Graph API Swagger documentation lists all of the supported endpoints, their descriptions, and their input/output specifications.

We have a number of endpoints that analyze a piece of text or a URL and return its aspects or topics. We also have a feeds endpoint that returns a feed of recent documents about given aspects and/or topics.

You will need an access token to access the API programmatically. Pass the token in the `X-API-TOKEN` header; if you have trouble passing headers, you can instead pass it in the query parameter `?api-token=<API-TOKEN>`. If the token is omitted from both the header and the query parameter, the server responds with a 401 status code.
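The two ways of passing the token can be sketched as one small Python helper. `attach_token` is a hypothetical name; it returns the URL and headers to use, appending `api-token` to any existing query string when the header route is not available.

```python
import urllib.parse
from typing import Dict, Tuple


def attach_token(url: str, token: str, in_header: bool = True) -> Tuple[str, Dict[str, str]]:
    """Return (url, headers) carrying the API token.

    Hypothetical helper: uses the X-API-TOKEN header by default, or the
    api-token query parameter as a fallback.
    """
    if in_header:
        return url, {"X-API-TOKEN": token}
    parts = urllib.parse.urlsplit(url)
    # Keep any existing query parameters (including blank ones) and add the token.
    query = urllib.parse.parse_qsl(parts.query, keep_blank_values=True)
    query.append(("api-token", token))
    new_url = urllib.parse.urlunsplit(parts._replace(query=urllib.parse.urlencode(query)))
    return new_url, {}
```

Either way, leaving the token out entirely yields the 401 described above.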

Requests are rate limited based on your service package. Please see our Developer page for the latest information, or contact [email protected] with questions about rate limits.
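Because requests are rate limited per service package, a client may want to pace itself. The following is a generic sliding-window throttle in Python, not part of the API: `Throttle`, `max_calls`, and `period` are hypothetical, and you would set the numbers from your plan's limits. The clock and sleep functions are injectable so the logic can be exercised without real waiting.

```python
import time
from collections import deque
from typing import Callable, Deque


class Throttle:
    """Allow at most max_calls calls in any window of `period` seconds."""

    def __init__(self, max_calls: int, period: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.calls: Deque[float] = deque()  # timestamps of recent calls

    def wait(self) -> None:
        """Block until another call is allowed, then record it."""
        now = self.clock()
        # Forget calls that have left the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            # Sleep until the oldest call in the window expires.
            self.sleep(self.period - (now - self.calls[0]))
            now = self.clock()
            self.calls.popleft()
        self.calls.append(now)
```

Call `throttle.wait()` immediately before each API request. Client-side pacing only reduces the chance of hitting the server's limits; it does not replace checking the Developer page.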

## I think the system made a mistake, where can I report it?

Our approach to topic modeling is inherently data-driven, and like all data-driven models, ours are subject to some noise. No model achieves 100% precision and recall on every query: some articles may be tagged with incorrect interests, and our models may fail to detect a topic that an article's content reflects. On the whole, these models do a good job, but errors are inevitable. We record all reported errors and feed them back into our training pipeline so that it improves over time. To report an error, visit our Topic Classification Error Reporting Page.

## Do you have the topic I care about?

We model more than 5,000 interests, and while we try to model the most popular interests that apply across a wide range of applications, we do not currently model everything. To check whether your topic is modeled, visit our Topic Search Page. We strongly encourage exploring the available topics via search, which returns results even when there is no substring match, but the full list of topics is also available.
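If you have downloaded the full topic list, you can approximate a "no exact substring needed" lookup locally. This sketch uses Python's `difflib` fuzzy matching, which is a stand-in technique and not how the Topic Search Page works; `suggest_topics` is a hypothetical name, and it assumes the list has been loaded as plain topic-name strings.

```python
import difflib
from typing import List


def suggest_topics(query: str, topics: List[str], n: int = 3) -> List[str]:
    """Return up to n topic names that approximately match query.

    Matching is case-insensitive; results are ordered from best match down.
    """
    by_lower = {t.lower(): t for t in topics}
    matches = difflib.get_close_matches(query.lower(), list(by_lower), n=n, cutoff=0.6)
    return [by_lower[m] for m in matches]
```

Lowering `cutoff` returns looser matches; an empty result suggests the topic may not be modeled, which the Topic Search Page can confirm.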

## What aspects do you currently model?

The Aspect Hierarchy organizes the web into a taxonomy of classes. It is structured from general to specific, where each class (e.g. Article) can be further refined into subclasses based on more specific attributes (e.g. News vs. Interview).

In the Aspect Hierarchy diagram, each oval represents a class of webpages, and each diamond is an attribute that further partitions the webpages of its parent into mutually exclusive subclasses. For example, every webpage has exactly one type (e.g. Image, Article, Commerce, or Other), and every Article is further classified into a single content type. Therefore, a webpage can’t be both an Event and a Review, because that would require it to have both type Commerce and type Article, but it can be both an Event and Risque.

Currently, there are two top-level classifications: `type` and `flag_nsfw`.

The type attribute partitions webpages into mutually exclusive sets of content types according to the primary focus of the webpage.

| Content Type | Primary Focus of Webpage | Example URL |
| --- | --- | --- |
| Image | image | example |
| Article | textual content | example |
| Audio | audio file, such as a song or podcast | example |
| Video | video | example |
| Commerce | offer of a product or other entity | example |

The Article class is further refined according to the primary focus of the content of the text.

| Type of Content | Primary Focus of Content | Example URL |
| --- | --- | --- |
| Review | review of a product, piece of media, or app | example |
| News | story about a recent or significant event | example |
| Recipe | instructions for preparing a dish | example |
| Deal | timely savings on a product or service (not a page where the product can be purchased directly) | example |
| Interview | content presented in a question-and-answer format | example |
| Listicle | content presented as a numbered or bulleted list | example |

Each webpage in the Commerce class is further partitioned according to the entity it offers.

| Entity Offered | Description of Entity | Example URL |
| --- | --- | --- |
| Product | a tangible item for purchase | example |
| Job | a paid position of employment | example |
| Event | tickets for purchase to a show, concert, or other event | example |

Each of the preceding subdivisions also contains the subclass Other, which applies to all webpages that do not fall into one of the listed sets.

The top-level `flag_nsfw` attribute partitions webpages into those that are safe for work and those that are not. Those that are not safe for work are divided into Porn, Softcore, and Risque. Porn applies to content containing nudity published by the sex industry. Softcore applies to articles that are not Porn but whose primary focus is imagery that objectifies people in sexual ways. Risque covers content that is sexually suggestive but falls into neither of the first two categories. Note: at the moment, NSFW aspects are classified solely from the text and metadata of the web page, not its imagery.
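The partitioning rule described above (two classes can co-occur unless they require different values of the same attribute) can be sketched in Python. This is a hypothetical local model, not the API's representation: the attribute names `content_type`, `entity_offered`, and `nsfw_kind` and the class names `SFW` and `NSFW` are made up for illustration, and the `Other` subclasses are omitted so that every class name is unique.

```python
from typing import Dict, Tuple

# (parent class, attribute) -> mutually exclusive subclasses.
# Attribute names other than "type" and "flag_nsfw" are hypothetical.
HIERARCHY = {
    ("Webpage", "type"): ["Image", "Article", "Audio", "Video", "Commerce"],
    ("Article", "content_type"): ["Review", "News", "Recipe", "Deal", "Interview", "Listicle"],
    ("Commerce", "entity_offered"): ["Product", "Job", "Event"],
    ("Webpage", "flag_nsfw"): ["SFW", "NSFW"],
    ("NSFW", "nsfw_kind"): ["Porn", "Softcore", "Risque"],
}

# child class -> (parent class, attribute that selected it)
PARENT: Dict[str, Tuple[str, str]] = {
    child: key for key, children in HIERARCHY.items() for child in children
}


def path_choices(cls: str) -> Dict[Tuple[str, str], str]:
    """Map each (parent, attribute) on the path from the root to the child chosen."""
    choices = {}
    while cls in PARENT:
        key = PARENT[cls]
        choices[key] = cls
        cls = key[0]
    return choices


def compatible(a: str, b: str) -> bool:
    """True unless a and b pick different subclasses for the same attribute."""
    ca, cb = path_choices(a), path_choices(b)
    return all(ca[key] == cb[key] for key in ca.keys() & cb.keys())
```

With this model, `compatible("Event", "Review")` is false because the two classes disagree on `type` (Commerce vs. Article), while `compatible("Event", "Risque")` is true because they constrain different top-level attributes.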

You can also use the `/aspects` endpoint to list all currently supported aspects programmatically, in the format expected by the `/doc/search` endpoint.

## You don’t currently model my interest. Where can I submit a request for you to model a new interest?

Currently, the set of interests is fixed. Given our resources, we are limited in how many interests we can reliably model. While we do plan to expand the set of modeled interests, we will prioritize additions based on aggregate demand. If you would like to request a new topic, please visit our Interest Submission Page.

## My question is not listed here.

If there is a question or issue that you don't see addressed here, please email us at [email protected].
