Search API (v2)
Noah Santacruz edited this page Dec 5, 2023 · 9 revisions
A simpler alternative to Search API (v1) that exposes the main search functionality available on the site.
Your POST request should include the header `Content-Type: application/json; charset=utf-8`.
In the POST body, you can include the following keys:
Parameter | Type | Description |
---|---|---|
`query` | `str` (required) | Your search query. |
`type` | `one_of('text', 'sheet')` (required) | The index you want to query. See Search API (v2)#Index Types for an explanation of each index. |
`field` | `str` (default=`exact`) | The field you want to query. Common fields to query are `exact` or `naive_lemmatizer` for the text and merged indexes. For the sheets index, you will usually query the `content` field. |
`source_proj` | `bool`, `str` or `list(str)` (default=`false`) | By default, the Elasticsearch document is not returned. Specifying `true` returns the entire document. Specifying a `str` or `list(str)` returns a projection of the document containing only the specified fields. |
`slop` | `int` (default=`0`) | The maximum distance, in words, between query words in a matching document. `0` means an exact match must be found. |
`start` | `int` (default=`0`) | For paginating results. The zero-based index of the first result to return. `0` means start at the first result. |
`size` | `int` (default=`100`) | For paginating results. The number of results to return, starting from `start`. |
`filters` | `list(str)` (default=`[]`) | A list of filters to apply to the results. Filters cannot include RegEx; any RegEx characters will be escaped. Each filter is applied to the corresponding field in the `filter_fields` list. E.g. if `filters` is `["Passover", "Torah Talks"]` and `filter_fields` is `["topics_en", "collections"]`, then the "Passover" filter is applied to the `topics_en` field and the "Torah Talks" filter to the `collections` field. For text queries, `filters` always applies to the `path` field of documents, which essentially corresponds to the category path of the book in Sefaria's table of contents (there are some differences with regard to commentary paths). For sheet queries, `filters` can be applied to `collections`, `topics_en` or `topics_he`. These fields are explained under `filter_fields` below. |
`filter_fields` | `list(str)` (default=`[]`; required if `filters` is specified) | Must be the same length as `filters`. Each entry names the field that the corresponding filter in `filters` applies to. For queries of type `text` this has no effect, since text queries can only be filtered on one field (`path`, explained under `filters` above). For sheet queries, the following fields can appear in `filter_fields`: `collections` (the collections that the sheet is in), `topics_en` (the topics for this sheet, translated into English), `topics_he` (the topics for this sheet, translated into Hebrew). |
`aggs` | `list(str)` (default=`[]`) | List of fields to aggregate on. Common fields are `path` for the text type and `group` or `topics` for the sheet type. |
`sort_method` | `one_of('sort', 'score')` (default=`sort`) | How to sort results. If `sort`, results are sorted according to `sort_fields`. If `score`, the value of the field in `sort_fields` is multiplied by the default Elasticsearch score. |
`sort_fields` | `list(str)` | List of fields to sort on. If `sort_method` is `score`, this list should have exactly one item. Common fields to sort on are `comp_date`, `order`, `pagesheetrank`, `dateCreated` and `views`. |
`sort_reverse` | `bool` (default=`false`) | Whether to reverse the sort applied on `sort_fields`. |
`sort_score_missing` | `float` (default=`0`) | The value used when a document has no value for the field in `sort_fields`. |
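The constraints in the table can be checked on the client before a request is sent. The following is a minimal Python sketch, not code from Sefaria: `build_search_payload` and `index_type` are hypothetical names, and the checks only restate the rules listed above (required keys, equal lengths for `filters` and `filter_fields`, and a single sort field when `sort_method` is `score`). The server remains the authority on what it accepts.

```python
from typing import Any


def build_search_payload(query: str, index_type: str, **options: Any) -> dict:
    """Assemble a POST body for /api/search-wrapper (hypothetical helper)."""
    if not query:
        raise ValueError("query is required")
    if index_type not in ("text", "sheet"):
        raise ValueError("type must be 'text' or 'sheet'")

    filters = options.get("filters", [])
    filter_fields = options.get("filter_fields", [])
    if filters and len(filters) != len(filter_fields):
        raise ValueError("filter_fields must be the same length as filters")

    sort_fields = options.get("sort_fields", [])
    if options.get("sort_method") == "score" and len(sort_fields) != 1:
        raise ValueError("sort_method 'score' expects exactly one sort field")

    # Keys that are left out fall back to the defaults listed in the table.
    return {"query": query, "type": index_type, **options}
```

Calling `build_search_payload("Moshe", "sheet", field="content")` yields a body equivalent to the sheet example further down this page.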
POST /api/search-wrapper
{
"query": "Moshe",
"type": "text"
}
In cURL:
curl -X POST "https://www.sefaria.org/api/search-wrapper" -H "Content-Type: application/json; charset=utf-8" -d '{"query": "Moshe","type": "text"}'
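For readers calling the API from a script, the same request can be sketched in Python using only the standard library. The function names `make_search_request` and `search` are illustrative, not part of any Sefaria client; the URL and the `Content-Type` header are the ones given on this page.

```python
import json
import urllib.request

SEARCH_URL = "https://www.sefaria.org/api/search-wrapper"


def make_search_request(payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the JSON payload."""
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        SEARCH_URL,
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


def search(payload: dict) -> dict:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(make_search_request(payload)) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (performs a network request):
# results = search({"query": "Moshe", "type": "text"})
```

Keeping request construction separate from sending makes it possible to inspect the body and headers without touching the network.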
POST /api/search-wrapper
{
"query": "Moshe",
"type": "sheet",
"field": "content" // NOTE: must specify field as 'content' when querying sheets
}
In cURL:
curl -X POST "https://www.sefaria.org/api/search-wrapper" -H "Content-Type: application/json; charset=utf-8" -d '{"query": "Moshe","type": "sheet", "field": "content"}'
In the following inexact search, search terms may be separated by up to 10 words in a matching result. Search terms can also carry prefixes and can be spelled either מלא or חסר. Inexact search only works for Hebrew queries.
POST /api/search-wrapper
{
"query": "משה רבנו",
"type": "text",
"field": "naive_lemmatizer",
"slop": 10 // Maximum distance b/w search terms is 10 words
}
In cURL:
curl -X POST "https://www.sefaria.org/api/search-wrapper" -H "Content-Type: application/json; charset=utf-8" -d '{"query": "משה רבנו","type": "text", "field": "naive_lemmatizer", "slop": 10}'
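When building this body in code, note that the `//` comments in the request example above are annotations only: JSON has no comment syntax, so they must not be sent. A small Python sketch of serializing the Hebrew query follows; the variable names are illustrative.

```python
import json

# Payload mirroring the inexact-search example above.
payload = {
    "query": "משה רבנו",
    "type": "text",
    "field": "naive_lemmatizer",
    "slop": 10,  # search terms may be up to 10 words apart
}

# ensure_ascii=False keeps the Hebrew readable instead of \uXXXX escapes.
# Either form is valid JSON; the body is then encoded as UTF-8 bytes to
# match the charset declared in the Content-Type header.
body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
```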
POST /api/search-wrapper
{
"query": "Moshe",
"type": "text",
"filters": ["Talmud/Bavli/Seder Zeraim/Berakhot", "Midrash"],
"filter_fields": ["path", "path"]
}
In cURL:
curl -X POST "https://www.sefaria.org/api/search-wrapper" -H "Content-Type: application/json; charset=utf-8" -d '{"query": "Moshe","type": "text", "filters": ["Talmud/Bavli/Seder Zeraim/Berakhot", "Midrash"], "filter_fields": ["path", "path"]}'
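Because `filters` and `filter_fields` are parallel lists, it can be less error-prone to write each filter next to its field and split the pairs afterwards. The sketch below does this for the sheet example from the parameter table; `split_filters` is a hypothetical helper, not part of the API.

```python
def split_filters(pairs):
    """Turn (field, value) pairs into the parallel lists the API expects."""
    return {
        "filters": [value for _, value in pairs],
        "filter_fields": [field for field, _ in pairs],
    }


sheet_filters = split_filters([
    ("topics_en", "Passover"),
    ("collections", "Torah Talks"),
])
payload = {"query": "Moshe", "type": "sheet", "field": "content", **sheet_filters}
```

Keeping the pairs together guarantees that the two lists have the same length and order.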
The API returns results in the standard Elasticsearch format. See Search API (v1) for a brief explanation.
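As a rough illustration of reading such a response, the Python sketch below pulls the total hit count and document IDs out of a response dictionary and computes the next `start` value for pagination. It assumes the usual Elasticsearch layout (`hits.total` and `hits.hits[*]._id`); whether every key is present in this API's responses is an assumption, and both function names are hypothetical.

```python
from typing import List, Optional, Tuple


def summarize_hits(response: dict) -> Tuple[int, List[str]]:
    """Return (total, ids) from an Elasticsearch-style search response."""
    hits = response.get("hits", {})
    total = hits.get("total", 0)
    # Newer Elasticsearch versions report total as {"value": n, "relation": ...}.
    if isinstance(total, dict):
        total = total.get("value", 0)
    ids = [hit["_id"] for hit in hits.get("hits", [])]
    return total, ids


def next_start(start: int, size: int, total: int) -> Optional[int]:
    """Offset of the next page, or None when the results are exhausted."""
    following = start + size
    return following if following < total else None
```

A paging loop would send a request with the current `start`, call `summarize_hits` on the response, and continue with `next_start(start, size, total)` until it returns `None`.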