diff --git a/LICENSE b/LICENSE
index a16a937..e3e2233 100644
--- a/LICENSE
+++ b/LICENSE
@@ -1,6 +1,6 @@
 MIT License
 
-Copyright (c) 2018-2021 SerpApi
+Copyright (c) 2018-2022 SerpApi
 
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
diff --git a/README.md b/README.md
index 05b9df1..91136da 100644
--- a/README.md
+++ b/README.md
@@ -3,442 +3,378 @@
 [![Package](https://badge.fury.io/py/google-search-results.svg)](https://badge.fury.io/py/google-search-results)
 [![Build](https://github.com/serpapi/google-search-results-python/actions/workflows/python-package.yml/badge.svg)](https://github.com/serpapi/google-search-results-python/actions/workflows/python-package.yml)
 
-This Python package is meant to scrape and parse search results from Google, Bing, Baidu, Yandex, Yahoo, Home Depot, eBay and more, using [SerpApi](https://serpapi.com).
-The following services are provided:
-- [Search API](https://serpapi.com/search-api)
-- [Search Archive API](https://serpapi.com/search-archive-api)
-- [Account API](https://serpapi.com/account-api)
-- [Location API](https://serpapi.com/locations-api) (Google Only)
+## About
 
-SerpApi provides a [script builder](https://serpapi.com/demo) to get you started quickly.
+[`google-search-results`](https://pypi.org/project/google-search-results) is a SerpApi API wrapper for Python that is meant to [scrape](https://en.wikipedia.org/wiki/Web_scraping) search results from Google, Bing, Baidu, Yahoo and [10+ more search engines](#supported-engines) with a [SerpApi](https://serpapi.com) backend. SerpApi provides a [Playground](https://serpapi.com/playground) to get you started quickly by testing the API interactively.
 
-## Installation
+Find SerpApi documentation at: https://serpapi.com/search-api
+
+Find the SerpApi PyPI package at: https://pypi.org/project/google-search-results
+
+
+Table of Contents + +- [About](#about) +- [Installation](#installation---python-37) +- [Quick Start](#quick-start) +- [Search Google Images](#search-google-images) +- [Search Google Images with Pagination](#search-google-images-with-pagination) +- [Supported Engines](#supported-engines) +- [Google Search API Capability](#google-search-api-capability) +- [How to set SerpApi key](#how-to-set-serpapi-key) +- [Example by Specification](#example-by-specification) +- [Extra APIs](#extra-apis) +- [Generic search with SerpApiClient](#generic-search-with-serpapiclient) +- [Google Search By Location](#google-search-by-location) +- [Batch Asynchronous Searches](#batch-asynchronous-searches) +- [Python object as a result](#python-object-as-a-result) +- [Pagination using iterator](#pagination-using-iterator) +- [Error management](#error-management) +- [Changelog](#change-log) +
+ +## Installation - Python 3.7+ -Python 3.7+ ```bash pip install google-search-results ``` -[Link to the python package page](https://pypi.org/project/google-search-results/) +## Quick Start + +[Open example on Replit](https://replit.com/@serpapi/google-search-results-quick-start#main.py) (online IDE). -## Quick start +The following example runs a search for `"coffee"` using your secret API key which you can find at [SerpApi Dashboard](https://serpapi.com/manage-api-key) page. ```python from serpapi import GoogleSearch + search = GoogleSearch({ "q": "coffee", "location": "Austin,Texas", - "api_key": "" - }) + "api_key": "" +}) + result = search.get_dict() ``` -This example runs a search for "coffee" using your secret API key. - -The SerpApi service (backend) -- Searches Google using the search: q = "coffee" -- Parses the messy HTML responses -- Returns a standardized JSON response -The GoogleSearch class -- Formats the request -- Executes a GET http request against SerpApi service -- Parses the JSON response into a dictionary - -Et voilà... - -Alternatively, you can search: -- Bing using BingSearch class -- Baidu using BaiduSearch class -- Yahoo using YahooSearch class -- DuckDuckGo using DuckDuckGoSearch class -- eBay using EbaySearch class -- Yandex using YandexSearch class -- HomeDepot using HomeDepotSearch class -- GoogleScholar using GoogleScholarSearch class -- Youtube using YoutubeSearch class -- Walmart using WalmartSearch -- Apple App Store using AppleAppStoreSearch class -- Naver using NaverSearch class - - -See the [playground to generate your code.](https://serpapi.com/playground) - -## Summary -- [Google Search Results in Python](#google-search-results-in-python) - - [Installation](#installation) - - [Quick start](#quick-start) - - [Summary](#summary) - - [Google Search API capability](#google-search-api-capability) - - [How to set SERP API key](#how-to-set-serp-api-key) - - [Example by specification](#example-by-specification) - - [Location API](#location-api) - - [Search Archive API](#search-archive-api) - - [Account API](#account-api) - - [Search Bing](#search-bing) - - [Search Baidu](#search-baidu) - - [Search Yandex](#search-yandex) - - [Search Yahoo](#search-yahoo) - - [Search Ebay](#search-ebay) - - [Search Home depot](#search-home-depot) - - [Search Youtube](#search-youtube) - - [Search Google Scholar](#search-google-scholar) - - [Generic search with SerpApiClient](#generic-search-with-serpapiclient) - - [Search Google Images](#search-google-images) - - [Search Google News](#search-google-news) - - [Search Google Shopping](#search-google-shopping) - - [Google Search By Location](#google-search-by-location) - - [Batch Asynchronous Searches](#batch-asynchronous-searches) - - [Python object as a result](#python-object-as-a-result) - - [Python paginate using iterator](#pagination-using-iterator) - - [Error management](#error-management) - - [Change log](#change-log) - - [Conclusion](#conclusion) - -### Google Search API capability -Source code. 
-```python
-params = {
-  "q": "coffee",
-  "location": "Location Requested",
-  "device": "desktop|mobile|tablet",
-  "hl": "Google UI Language",
-  "gl": "Google Country",
-  "safe": "Safe Search Flag",
-  "num": "Number of Results",
-  "start": "Pagination Offset",
-  "api_key": "Your SERP API Key",
-  # To be match
-  "tbm": "nws|isch|shop",
-  # To be search
-  "tbs": "custom to be search criteria",
-  # allow async request
-  "async": "true|false",
-  # output format
-  "output": "json|html"
-}
+How the SerpApi backend works:
+- Searches Google using the query `q` = `"coffee"`.
+- Parses the messy HTML responses.
+- Returns a standardized JSON response.
 
-# define the search search
-search = GoogleSearch(params)
-# override an existing parameter
-search.params_dict["location"] = "Portland"
-# search format return as raw html
-html_results = search.get_html()
-# parse results
-# as python Dictionary
-dict_results = search.get_dict()
-# as JSON using json package
-json_results = search.get_json()
-# as dynamic Python object
-object_result = search.get_object()
-```
-[Link to the full documentation](https://serpapi.com/search-api)
 
+The `GoogleSearch()` class:
+- Formats the request.
+- Executes a `GET` HTTP request against the SerpApi service.
+- Parses the JSON response into a dictionary via the [`get_dict()`](https://github.com/serpapi/google-search-results-python/blob/56447f2ac39f202fa663233c87fa7c7b9ca1e6b2/serpapi/serp_api_client.py#L98) function.
 
-See below for more hands-on examples.
+## Search Google Images
 
-### How to set SERP API key
+This code prints all the image links and downloads the images if you uncomment the line with [`wget`](https://pypi.org/project/wget/) (a small Python package to download files; add `import wget` if you enable it).
 
-You can get an API key here if you don't already have one: https://serpapi.com/users/sign_up
+A tutorial that covers more ground on this topic: [`serpapi/showcase-serpapi-tensorflow-keras-image-training`](https://github.com/serpapi/showcase-serpapi-tensorflow-keras-image-training).
 
-The SerpApi `api_key` can be set globally:
-```python
-GoogleSearch.SERP_API_KEY = "Your Private Key"
-```
-The SerpApi `api_key` can be provided for each search:
 ```python
-query = GoogleSearch({"q": "coffee", "serp_api_key": "Your Private Key"})
-```
+from serpapi import GoogleSearch
 
-### Example by specification
+search = GoogleSearch({"q": "coffee", "tbm": "isch", "api_key": ""})
 
-We love true open source, continuous integration and Test Driven Development (TDD).
- We are using RSpec to test [our infrastructure around the clock](https://travis-ci.org/serpapi/google-search-results-python) to achieve the best Quality of Service (QoS).
-
-The directory test/ includes specification/examples.
+for image_result in search.get_dict()['images_results']:
+    link = image_result["original"]
 
-Set your API key.
-```bash
-export API_KEY="your secret key"
+    try:
+        print("link: " + link)
+        # wget.download(link, '.')
+    except Exception:
+        pass
 ```
 
-Run test
-```python
-make test
-```
+## Search Google Images with Pagination
 
-### Location API
+[Open example on Replit](https://replit.com/@serpapi/google-search-results-google-images-pagination-example#main.py) (online IDE).
 
 ```python
 from serpapi import GoogleSearch
-search = GoogleSearch({})
-location_list = search.get_location("Austin", 3)
-print(location_list)
-```
-This prints the first 3 locations matching Austin (Texas, Texas, Rochester).
-```python -[ { 'canonical_name': 'Austin,TX,Texas,United States', - 'country_code': 'US', - 'google_id': 200635, - 'google_parent_id': 21176, - 'gps': [-97.7430608, 30.267153], - 'id': '585069bdee19ad271e9bc072', - 'keys': ['austin', 'tx', 'texas', 'united', 'states'], - 'name': 'Austin, TX', - 'reach': 5560000, - 'target_type': 'DMA Region'}, - ...] -``` +params = { + "engine": "google", # parsing engine + "q": "coffee", # search query + "tbm": "isch", # image results + "num": "100", # number of images per page + "ijn": 0, # page number: 0 -> first page, 1 -> second... + "api_key": "" # your serpapi api key + # other query parameters: hl (lang), gl (country), etc +} -### Search Archive API +search = GoogleSearch(params) # where data extraction happens -The search results are stored in a temporary cache. -The previous search can be retrieved from the cache for free. +image_results = [] -```python -from serpapi import GoogleSearch -search = GoogleSearch({"q": "Coffee", "location": "Austin,Texas"}) -search_result = search.get_dictionary() -assert search_result.get("error") == None -search_id = search_result.get("search_metadata").get("id") -print(search_id) -``` +while True: + results = search.get_dict() # JSON -> Python dictionary -Now let's retrieve the previous search from the archive. + # checks for "Google hasn't returned any results for this query." + if "error" not in results: + for image in results["images_results"]: + if image["original"] not in image_results: + image_results.append(image["original"]) + + # update to the next page + params["ijn"] += 1 + else: + print(results["error"]) + break + +print(image_results) +``` +
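+
+To keep the collected links around for later processing, a minimal follow-up sketch (the output file name is just an illustration):
+
+```python
+# Persist the image links gathered by the loop above, one per line.
+# "image_links.txt" is an illustrative file name.
+with open("image_links.txt", "w") as f:
+    f.write("\n".join(image_results))
+```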

+💡 Pagination examples for the currently supported APIs can be found in the [`examples/pagination`](examples/pagination) directory. 💡
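+
+Most of those pagination examples share one pattern: request a page, then feed the query string of the `serpapi_pagination` `next` URL back into the search parameters. A condensed sketch of that pattern (the engine and query here are placeholders):
+
+```python
+from serpapi import GoogleSearch
+from urllib.parse import parse_qsl, urlsplit
+
+search = GoogleSearch({"api_key": "", "engine": "google", "q": "coffee"})
+
+while True:
+    results = search.get_dict()  # JSON -> Python dict
+
+    if "error" in results:
+        print(results["error"])
+        break
+
+    for result in results.get("organic_results", []):
+        print(result.get("title"))
+
+    # stop when no further page is advertised
+    if "next" not in results.get("serpapi_pagination", {}):
+        break
+
+    # reuse the query string of the next-page URL as the new search parameters
+    search.params_dict.update(dict(parse_qsl(urlsplit(results["serpapi_pagination"]["next"]).query)))
+```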

+
+## Supported Engines
+
+| Engine                                                                        | Class name              |
+|-------------------------------------------------------------------------------|-------------------------|
+| [Google Search Engine](https://serpapi.com/search-api)                        | `GoogleSearch()`        |
+| [Google Maps](https://serpapi.com/google-maps-api)                            | `GoogleSearch()`        |
+| [Google Jobs](https://serpapi.com/google-jobs-api)                            | `GoogleSearch()`        |
+| [Google Trends](https://serpapi.com/google-trends-api)                        | `GoogleSearch()`        |
+| [Google Autocomplete](https://serpapi.com/google-autocomplete-api)            | `GoogleSearch()`        |
+| [Google Related Questions](https://serpapi.com/google-related-questions-api)  | `GoogleSearch()`        |
+| [Google Scholar](https://serpapi.com/google-scholar-api)                      | `GoogleScholarSearch()` |
+| [Google Play Store](https://serpapi.com/google-play-api)                      | `GoogleSearch()`        |
+| [Google Product](https://serpapi.com/google-product-api)                      | `GoogleSearch()`        |
+| [Google Immersive Product](https://serpapi.com/google-immersive-product-api)  | `GoogleSearch()`        |
+| [Google Reverse Image](https://serpapi.com/google-reverse-image)              | `GoogleSearch()`        |
+| [Google Events](https://serpapi.com/google-events-api)                        | `GoogleSearch()`        |
+| [Google Local Services](https://serpapi.com/google-local-services-api)        | `GoogleSearch()`        |
+| [Bing](https://serpapi.com/bing-search-api)                                   | `BingSearch()`          |
+| [Baidu](https://serpapi.com/baidu-search-api)                                 | `BaiduSearch()`         |
+| [DuckDuckGo](https://serpapi.com/duckduckgo-search-api)                       | `DuckDuckGoSearch()`    |
+| [Yahoo](https://serpapi.com/yahoo-search-api)                                 | `YahooSearch()`         |
+| [Yandex](https://serpapi.com/yandex-search-api)                               | `YandexSearch()`        |
+| [eBay](https://serpapi.com/ebay-search-api)                                   | `EbaySearch()`          |
+| [Youtube](https://serpapi.com/youtube-search-api)                             | `YoutubeSearch()`       |
+| [Walmart](https://serpapi.com/walmart-search-api)                             | `WalmartSearch()`       |
+| [HomeDepot](https://serpapi.com/home-depot-search-api)                        | `HomeDepotSearch()`     |
+| [Apple App Store](https://serpapi.com/apple-app-store)                        | `AppleAppStoreSearch()` |
+| [Naver](https://serpapi.com/naver-search-api)                                 | `NaverSearch()`         |
+| [Yelp](https://serpapi.com/yelp-search-api)                                   | `YelpSearch()`          |
+
+
+## Google Search API Capability
 
 ```python
 params = {
+  "api_key": "asdewqe1231241asm",             # Your SerpApi API key.
+  "q": "coffee",                              # Search query.
+  "google_domain": "google.com",              # Google domain to use.
+  "location": "Austin, Texas, United States", # Location requested for the search.
+  "uule": "w+CAIQICINVW5pdGVkIFN0YXRlcw",     # Google encoded location you want to use for the search.
+  "ludocid": "CID ID",                        # ID (CID) of the Google My Business listing you want to scrape.
+  "lsig": "AB86z5W5r155sIcs3jqfYkm9Y8Fp",     # Force the knowledge graph map view to show up.
+  "device": "desktop|mobile|tablet",          # Device used when making a search.
+  "hl": "en",                                 # Language of the search.
+  "gl": "us",                                 # Country of the search.
+  "lr": "lang_en|lang_fr",                    # One or multiple languages to limit the search to.
+  "safe": "active|off",                       # Level of filtering for adult content.
+  "nfpr": "1|0",                              # Exclusion of results from an auto-corrected query that is spelled wrong.
+  "num": "100",                               # Number of results per page.
+  "start": "20",                              # Pagination offset.
+  "ijn": "1",                                 # Page number for Google Images.
+  "tbm": "nws|isch|shop|lcl|vid",             # Type of search: news, images, shopping, local and video results.
+ "tbs": "custom to be search criteria", # Advanced search for patents, dates, news, videos, images, apps, or text contents + "async": True|False, # Allow async request. + "no_cache": True|False # Force SerpApi to fetch the results even if a cached version is already present +} + +search = GoogleSearch(params) # Define the search. +search.params_dict["location"] = "Portland" # Override an existing parameter. +search.get_html() # Search format return as raw html (parse results). +search.get_dict() # Search format return as Python dictionary. +search.get_json() # Search format return as JSON using json package. +search.get_object() # Search format return as dynamic Python object. +``` + +## How to set SerpApi key -### Account API -```python -from serpapi import GoogleSearch -search = GoogleSearch({}) -account = search.get_account() -``` -This prints your account information. +The SerpApi `api_key` can be set globally: -### Search Bing ```python -from serpapi import BingSearch -search = BingSearch({"q": "Coffee", "location": "Austin,Texas"}) -data = search.get_dict() +GoogleSearch.SERP_API_KEY = "" ``` -This code prints Bing search results for coffee as a Dictionary. - -https://serpapi.com/bing-search-api -### Search Baidu -```python -from serpapi import BaiduSearch -search = BaiduSearch({"q": "Coffee"}) -data = search.get_dict() -``` -This code prints Baidu search results for coffee as a Dictionary. -https://serpapi.com/baidu-search-api +The SerpApi `api_key` can be provided for each search: -### Search Yandex ```python -from serpapi import YandexSearch -search = YandexSearch({"text": "Coffee"}) -data = search.get_dict() +query = GoogleSearch({"q": "coffee", "api_key": ""}) ``` -This code prints Yandex search results for coffee as a Dictionary. -https://serpapi.com/yandex-search-api +Using [Python system operating interface `os`](https://docs.python.org/3/library/os.html): -### Search Yahoo ```python -from serpapi import YahooSearch -search = YahooSearch({"p": "Coffee"}) -data = search.get_dict() +import os +query = GoogleSearch({"q": "coffee", "api_key": os.getenv("")}) ``` -This code prints Yahoo search results for coffee as a Dictionary. - -https://serpapi.com/yahoo-search-api +## Example by Specification -### Search eBay -```python -from serpapi import EbaySearch -search = EbaySearch({"_nkw": "Coffee"}) -data = search.get_dict() -``` -This code prints eBay search results for coffee as a Dictionary. +We love open source, continuous integration and [Test Driven Development](https://en.wikipedia.org/wiki/Test-driven_development) (TDD). We are using [RSpec](http://rspec.info/) to test [our infrastructure around the clock](https://travis-ci.org/serpapi/google-search-results-python) to achieve the best [Quality of Service](https://en.wikipedia.org/wiki/Quality_of_service) (QoS). + +The directory `test/` includes specification/examples. -https://serpapi.com/ebay-search-api +Set your API key: -### Search Home Depot -```python -from serpapi import HomeDepotSearch -search = HomeDepotSearch({"q": "chair"}) -data = search.get_dict() +```bash +export API_KEY="" ``` -This code prints Home Depot search results for chair as Dictionary. -https://serpapi.com/home-depot-search-api +Run test: -### Search Youtube ```python -from serpapi import HomeDepotSearch -search = YoutubeSearch({"q": "chair"}) -data = search.get_dict() +make test ``` -This code prints Youtube search results for chair as Dictionary. 
-https://serpapi.com/youtube-search-api +## Extra APIs -### Search Google Scholar -```python -from serpapi import GoogleScholarSearch -search = GoogleScholarSearch({"q": "Coffee"}) -data = search.get_dict() -``` -This code prints Google Scholar search results. +### [Location API](https://serpapi.com/locations-api) -### Search Walmart ```python -from serpapi import WalmartSearch -search = WalmartSearch({"query": "chair"}) -data = search.get_dict() -``` -This code prints Walmart search results. +from serpapi import GoogleSearch -### Search Youtube -```python -from serpapi import YoutubeSearch -search = YoutubeSearch({"search_query": "chair"}) -data = search.get_dict() +search = GoogleSearch({}) +location_list = search.get_location("Austin", 3) +print(location_list) ``` -This code prints Youtube search results. -### Search Apple App Store -```python -from serpapi import AppleAppStoreSearch -search = AppleAppStoreSearch({"term": "Coffee"}) -data = search.get_dict() -``` -This code prints Apple App Store search results. +This prints the first 3 locations matching Austin (Texas, Texas, Rochester): -### Search Naver ```python -from serpapi import NaverSearch -search = NaverSearch({"query": "chair"}) -data = search.get_dict() -``` -This code prints Naver search results. +[ + { + "id":"585069bdee19ad271e9bc072", + "google_id":200635, + "google_parent_id":21176, + "name":"Austin, TX", + "canonical_name":"Austin,TX,Texas,United States", + "country_code":"US", + "target_type":"DMA Region", + "reach":5560000, + "gps":[ + -97.7430608, + 30.267153 + ], + "keys":[ + "austin", + "tx", + "texas", + "united", + "states" + ] + }, ... 2 other locations +] +``` + +### [Search Archive API](https://serpapi.com/search-archive-api) + +The search results are stored in a temporary cache. The previous search can be retrieved from the cache for free up to 31 days after the search has been completed. -### Generic search with SerpApiClient ```python -from serpapi import SerpApiClient -query = {"q": "Coffee", "location": "Austin,Texas", "engine": "google"} -search = SerpApiClient(query) -data = search.get_dict() +from serpapi import GoogleSearch + +search = GoogleSearch({"q": "Coffee", "location": "Austin,Texas"}) +search_result = search.get_dictionary() +assert search_result.get("error") == None +search_id = search_result.get("search_metadata").get("id") +print(search_id) ``` -This class enables interaction with any search engine supported by SerpApi.com -### Search Google Images +Now let's retrieve the previous search from the archive which prints the search result from the archive. ```python -from serpapi import GoogleSearch -search = GoogleSearch({"q": "coffe", "tbm": "isch"}) -for image_result in search.get_dict()['images_results']: - link = image_result["original"] - try: - print("link: " + link) - # wget.download(link, '.') - except: - pass +archived_search_result = GoogleSearch({}).get_search_archive(search_id, 'json') +print(archived_search_result.get("search_metadata").get("id")) ``` -This code prints all the image links, - and downloads the images if you un-comment the line with wget (Linux/OS X tool to download files). +### [Account API](https://serpapi.com/account-api) -This tutorial covers more ground on this topic. 
-https://github.com/serpapi/showcase-serpapi-tensorflow-keras-image-training - -### Search Google News +This prints your account information: ```python from serpapi import GoogleSearch -search = GoogleSearch({ - "q": "coffe", # search search - "tbm": "nws", # news - "tbs": "qdr:d", # last 24h - "num": 10 -}) -for offset in [0,1,2]: - search.params_dict["start"] = offset * 10 - data = search.get_dict() - for news_result in data['news_results']: - print(str(news_result['position'] + offset * 10) + " - " + news_result['title']) + +search = GoogleSearch({}) +account = search.get_account() ``` -This script prints the first 3 pages of the news headlines for the last 24 hours. +```json +{ + "account_id":"account-id", + "api_key":"api-key", + "account_email":"email", + "plan_id":"free", + "plan_name":"Free Plan", + "searches_per_month":100, + "plan_searches_left":0, + "extra_credits":100, + "total_searches_left":0, + "this_month_usage":0, + "this_hour_searches":0, + "last_hour_searches":0, + "account_rate_limit_per_hour":200000 +} +``` -### Search Google Shopping +## Generic search with SerpApiClient + +This class enables interaction with any search engine supported by SerpApi. ```python -from serpapi import GoogleSearch -search = GoogleSearch({ - "q": "coffe", # search search - "tbm": "shop", # news - "tbs": "p_ord:rv", # last 24h - "num": 100 -}) -data = search.get_dict() -for shopping_result in data['shopping_results']: - print(shopping_result['position']) + " - " + shopping_result['title']) +from serpapi import SerpApiClient +query = {"q": "Coffee", "location": "Austin,Texas", "engine": "google"} +search = SerpApiClient(query) +data = search.get_dict() +print(data) ``` -This script prints all the shopping results, ordered by review order. - -### Google Search By Location +## Google Search By Location With SerpApi, we can build a Google search from anywhere in the world. This code looks for the best coffee shop for the given cities. ```python from serpapi import GoogleSearch + for city in ["new york", "paris", "berlin"]: location = GoogleSearch({}).get_location(city, 1)[0]["canonical_name"] + search = GoogleSearch({ "q": "best coffee shop", # search search "location": location, "num": 1, "start": 0 }) + data = search.get_dict() top_result = data["organic_results"][0]["title"] ``` -### Batch Asynchronous Searches - -We offer two ways to boost your searches thanks to the`async` parameter. - - Blocking - async=false - more compute intensive because the search needs to maintain many connections. (default) -- Non-blocking - async=true - the way to go for large batches of queries (recommended) +## Batch Asynchronous Searches -```python -# Operating system -import os +We offer two ways to boost your searches thanks to the `async` parameter. -# regular expression library -import re +- Blocking: `async=false` - more compute intensive because the search needs to maintain many connections. 
(default) +- Non-blocking: `async=true` - the way to go for large batches of queries (recommended) -# safe queue (named Queue in python2) -from queue import Queue +This code shows how to run searches asynchronously: -# Time utility -import time - -# SerpApi search -from serpapi import GoogleSearch +```python +import os # Operating system +import re # regular expression library +from queue import Queue # safe queue (named Queue in python2) +import time # Time utility +from serpapi import GoogleSearch # SerpApi search -# store searches -search_queue = Queue() +search_queue = Queue() # store searches # SerpApi search search = GoogleSearch({ @@ -450,12 +386,16 @@ search = GoogleSearch({ # loop through a list of companies for company in ['amd', 'nvidia', 'intel']: print("execute async search: q = " + company) + search.params_dict["q"] = company result = search.get_dict() + if "error" in result: print("oops error: ", result["error"]) continue + print("add search to the queue where id: ", result['search_metadata']) + # add search to the search_queue search_queue.put(result) @@ -468,13 +408,16 @@ while not search_queue.empty(): # retrieve search from the archive - blocker print(search_id + ": get search from archive") + search_archived = search.get_search_archive(search_id) + print(search_id + ": status = " + search_archived['search_metadata']['status']) # check status if re.search('Cached|Success', search_archived['search_metadata']['status']): + print(search_id + ": search done with q = " + search_archived['search_parameters']['q']) else: @@ -488,26 +431,28 @@ while not search_queue.empty(): print('all searches completed') ``` -This code shows how to run searches asynchronously. -The search parameters must have {async: True}. This indicates that the client shouldn't wait for the search to be completed. +The search parameters must have `{async: True}`. This indicates that the client shouldn't wait for the search to be completed. The current thread that executes the search is now non-blocking, which allows it to execute thousands of searches in seconds. The SerpApi backend will do the processing work. -The actual search result is deferred to a later call from the search archive using get_search_archive(search_id). -In this example the non-blocking searches are persisted in a queue: search_queue. -A loop through the search_queue allows it to fetch individual search results. + +The actual search result is deferred to a later call from the search archive using `get_search_archive(search_id)`. +In this example the non-blocking searches are persisted in a queue: `search_queue`. +A loop through the `search_queue` allows it to fetch individual search results. + This process can easily be multithreaded to allow a large number of concurrent search requests. To keep things simple, this example only explores search results one at a time (single threaded). [See example.](https://github.com/serpapi/google-search-results-python/blob/master/tests/test_example.py) -### Python object as a result +## Python object as a result -The search results can be automatically wrapped in dynamically generated Python object. -This solution offers a more dynamic, fully Oriented Object Programming approach over the regular Dictionary / JSON data structure. +The search results can be automatically wrapped in dynamically generated Python object. 
+This solution offers a more dynamic, fully [Object-Oriented Programming](https://en.wikipedia.org/wiki/Object-oriented_programming) approach over the regular Dictionary / JSON data structure.
 
 ```python
 from serpapi import GoogleSearch
+
 search = GoogleSearch({"q": "Coffee", "location": "Austin,Texas"})
 r = search.get_object()
+
 assert type(r.organic_results), list
 assert r.organic_results[0].title
 assert r.search_metadata.id
@@ -516,41 +461,36 @@ assert r.search_parameters.q, "Coffee"
 assert r.search_parameters.engine, "google"
 ```
 
-### Pagination using iterator
-Let's collect links across multiple search results pages.
+## Pagination using iterator
+
+[Open on Replit](https://replit.com/@serpapi/google-search-results-pagination-using-iterator-example?v=1) (online IDE).
+
 ```python
-# to get 2 pages
+from serpapi import GoogleSearch
+
+# parameters to get 2 pages
 start = 0
-end = 40
+end = 20
 page_size = 10
 
-# basic search parameters
 parameter = {
-  "q": "coca cola",
-  "tbm": "nws",
-  "api_key": os.getenv("API_KEY"),
-  # optional pagination parameter
-  # the pagination method can take argument directly
-  "start": start,
-  "end": end,
-  "num": page_size
+  "q": "coca cola",  # search query
+  "tbm": "nws",      # news tab
+  "api_key": "",     # serpapi api key
+  "start": start,    # optional pagination parameter; the pagination method can also take it directly
+  "end": end,        # explicit parameter to stop pagination at the 2nd page
+  "num": page_size   # number of results per page
 }
 
-# as proof of concept
-# urls collects
-urls = []
+search = GoogleSearch(parameter)  # initialize a search
 
-# initialize a search
-search = GoogleSearch(parameter)
+urls = []  # collects the result links
 
-# create a python generator using parameter
-pages = search.pagination()
-# or set custom parameter
-pages = search.pagination(start, end, page_size)
+pages = search.pagination()                       # create a python generator using the parameters above
+pages = search.pagination(start, end, page_size)  # or set custom parameters
 
 # fetch one search result per iteration
-# using a basic python for loop
-# which invokes python iterator under the hood.
+# using a basic python for loop which invokes the python iterator under the hood
 for page in pages:
     print(f"Current page: {page['serpapi_pagination']['current']}")
     for news_result in page["news_results"]:
@@ -565,26 +505,29 @@ if len(urls) == len(set(urls)):
     print("all search results are unique!")
 ```
 
-Examples to fetch links with pagination: [test file](https://github.com/serpapi/google-search-results-python/blob/master/tests/test_example_paginate.py), [online IDE](https://replit.com/@DimitryZub1/Scrape-Google-News-with-Pagination-python-serpapi)
+This example fetches links with pagination. See the [pagination test file](https://github.com/serpapi/google-search-results-python/blob/master/tests/test_example_paginate.py) for a complete, runnable version.
+
+## Error management
 
-### Error management
+SerpApi keeps error management simple:
 
-SerpApi keeps error management simple.
- - backend service error or search fail
- - client error
+- backend service error or search failure.
+- client error.
 
 If it's a backend error, a simple error message is returned as string in the server response.
+
 ```python
 from serpapi import GoogleSearch
-search = GoogleSearch({"q": "Coffee", "location": "Austin,Texas", "api_key": ""})
+
+search = GoogleSearch({"q": "Coffee", "location": "Austin,Texas", "api_key": ""})
 data = search.get_json()
 assert data["error"] == None
 ```
 
-In some cases, there are more details available in the data object.
-If it's a client error, then a SerpApiClientException is raised.
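+
+A minimal sketch of guarding a search against a client-side failure (the import path below is assumed from the package layout; adjust it if your version exports the exception at the top level):
+
+```python
+from serpapi import GoogleSearch
+from serpapi.serp_api_client_exception import SerpApiClientException  # import path is an assumption
+
+try:
+    search = GoogleSearch({"q": "coffee", "api_key": ""})
+    data = search.get_dict()
+except SerpApiClientException as e:
+    # raised on client-side failures (e.g. an unsupported output format)
+    print("client error:", e)
+```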
+In some cases, there are more details available in the data object. If it's a client error, then a `SerpApiClientException` is raised. ## Change log + 2021-12-22 @ 2.4.1 - add more search engine - youtube @@ -629,18 +572,4 @@ If it's a client error, then a SerpApiClientException is raised. - Support for Bing and Baidu 2019-06-25 @ 1.6 - - New search engine supported: Baidu and Bing - -## Conclusion -SerpApi supports all the major search engines. Google has the more advance support with all the major services available: Images, News, Shopping and more.. -To enable a type of search, the field tbm (to be matched) must be set to: - - * isch: Google Images API. - * nws: Google News API. - * shop: Google Shopping API. - * any other Google service should work out of the box. - * (no tbm parameter): regular Google search. - -The field `tbs` allows to customize the search even more. - -[The full documentation is available here.](https://serpapi.com/search-api) + - New search engine supported: Baidu and Bing \ No newline at end of file diff --git a/examples/pagination/appstore-reviews-pagination-example.py b/examples/pagination/appstore-reviews-pagination-example.py new file mode 100644 index 0000000..96096f4 --- /dev/null +++ b/examples/pagination/appstore-reviews-pagination-example.py @@ -0,0 +1,37 @@ +# docs: https://serpapi.com/apple-app-store + +from serpapi import AppleAppStoreSearch +from urllib.parse import (parse_qsl, urlsplit) + +params = { + "api_key": "...", # your serpapi api key + "engine": "apple_reviews", # search engine + "product_id": "479516143" # product query ID +} + +search = AppleAppStoreSearch(params) # where data extraction happens + +# show the page number +page_num = 0 + +# iterate over all pages +while True: + results = search.get_dict() # JSON -> Python dict + + if "error" in results: + print(results["error"]) + break + + page_num += 1 + print(f"Current page: {page_num}") + + # iterate over organic results and extract the data + for result in results.get("reviews", []): + print(result["position"], result["title"], result["review_date"], sep="\n") + + # check if the next page key is present in the JSON + # if present -> split URL in parts and update to the next page + if "next" in results.get("serpapi_pagination", {}): + search.params_dict.update(dict(parse_qsl(urlsplit(results.get("serpapi_pagination", {}).get("next")).query))) + else: + break \ No newline at end of file diff --git a/examples/pagination/appstore-search-pagination-example.py b/examples/pagination/appstore-search-pagination-example.py new file mode 100644 index 0000000..7aab11f --- /dev/null +++ b/examples/pagination/appstore-search-pagination-example.py @@ -0,0 +1,40 @@ +# docs: https://serpapi.com/apple-app-store + +from serpapi import AppleAppStoreSearch +from urllib.parse import (parse_qsl, urlsplit) + +params = { + "api_key": "...", # your serpapi api key + "engine": "apple_app_store", # search engine + "term": "minecraft", # search query + "country": "us", # country of to search from + "lang": "en-us", # language + "num": "200" # number of results per page +} + +search = AppleAppStoreSearch(params) # where data extraction happens + +# to show the page number +page_num = 0 + +# iterate over all pages +while True: + results = search.get_dict() # JSON -> Python dict + + if "error" in results: + print(results["error"]) + break + + page_num += 1 + print(f"Current page: {page_num}") + + # iterate over organic results and extract the data + for result in results.get("organic_results", []): + 
print(result["position"], result["title"], sep="\n") + + # check if the next page key is present in the JSON + # if present -> split URL in parts and update to the next page + if "next" in results.get("serpapi_pagination", {}): + search.params_dict.update(dict(parse_qsl(urlsplit(results.get("serpapi_pagination", {}).get("next")).query))) + else: + break \ No newline at end of file diff --git a/examples/pagination/baidu-search-pagination-example.py b/examples/pagination/baidu-search-pagination-example.py new file mode 100644 index 0000000..b2cacba --- /dev/null +++ b/examples/pagination/baidu-search-pagination-example.py @@ -0,0 +1,38 @@ +# docs: https://serpapi.com/baidu-search-api + +from serpapi import BaiduSearch +from urllib.parse import (parse_qsl, urlsplit) + +params = { + "api_key": "...", # your serpapi api key + "engine": "baidu", # search engine + "q": "minecraft redstone" # search query + # other parameters +} + +search = BaiduSearch(params) # where data extraction happens + +# to show the page number +page_num = 0 + +# iterate over all pages +while True: + results = search.get_dict() # JSON -> Python dict + + if "error" in results: + print(results["error"]) + break + + page_num += 1 + print(f"Current page: {page_num}") + + # iterate over organic results and extract the data + for result in results["organic_results"]: + print(result["position"], result["title"], sep="\n") + + # check if the next page key is present in the JSON + # if present -> split URL in parts and update to the next page + if "next" in results.get("serpapi_pagination", {}): + search.params_dict.update(dict(parse_qsl(urlsplit(results.get("serpapi_pagination", {}).get("next")).query))) + else: + break \ No newline at end of file diff --git a/examples/pagination/bing-search-pagination-example.py b/examples/pagination/bing-search-pagination-example.py new file mode 100644 index 0000000..4a2a9fe --- /dev/null +++ b/examples/pagination/bing-search-pagination-example.py @@ -0,0 +1,33 @@ +# docs: https://serpapi.com/bing-search-api + +from serpapi import BingSearch +from urllib.parse import (parse_qsl, urlsplit) + +params = { + "api_key": "...", # your serpapi api key + "engine": "bing", # parsing engine + "q": "brabus", # search query + "device": "desktop", # device used for search + "mkt": "en-us", # language of the search + "count": "50" # number of results per page. 
50 is the maximum
+}
+
+search = BingSearch(params)  # where data extraction happens
+
+while True:
+    results = search.get_dict()  # JSON -> Python dict
+
+    if "error" in results:
+        print(results["error"])
+        break
+
+    # iterate over organic results and extract the data
+    for result in results["organic_results"]:
+        print(result["position"], result["title"], sep="\n")
+
+    # check if the next page key is present in the JSON
+    # if present -> split URL in parts and update to the next page
+    if "next" in results.get("serpapi_pagination", {}):
+        search.params_dict.update(dict(parse_qsl(urlsplit(results.get("serpapi_pagination", {}).get("next")).query)))
+    else:
+        break
\ No newline at end of file
diff --git a/examples/pagination/duckduck-go-pagination-example.py b/examples/pagination/duckduck-go-pagination-example.py
new file mode 100644
index 0000000..de24faf
--- /dev/null
+++ b/examples/pagination/duckduck-go-pagination-example.py
@@ -0,0 +1,29 @@
+# docs: https://serpapi.com/duckduckgo-search-api
+
+from serpapi import GoogleSearch  # will be changed to DuckDuckGoSearch later
+
+params = {
+    "api_key": "...",           # your serpapi api key
+    "engine": "duckduckgo",     # search engine
+    "q": "minecraft redstone",  # search query
+    "kl": "us-en"               # language
+}
+
+search = GoogleSearch(params)  # where data extraction happens
+pages = search.pagination()    # paginating over all pages
+
+# to show the page number
+page_num = 0
+
+for page in pages:
+
+    # checks for "DuckDuckGo hasn't returned anything for this query..."
+    if "error" in page:
+        print(page["error"])
+        break
+
+    page_num += 1
+    print(f"Page number: {page_num}")
+
+    for result in page["organic_results"]:
+        print(result["title"])
\ No newline at end of file
diff --git a/examples/pagination/ebay-search-pagination-example.py b/examples/pagination/ebay-search-pagination-example.py
new file mode 100644
index 0000000..7c16108
--- /dev/null
+++ b/examples/pagination/ebay-search-pagination-example.py
@@ -0,0 +1,48 @@
+# Docs: https://serpapi.com/ebay-search-api
+
+from serpapi import EbaySearch
+from urllib.parse import (parse_qsl, urlsplit)
+
+params = {
+    "api_key": "...",              # serpapi api key
+    "engine": "ebay",              # search engine
+    "ebay_domain": "ebay.com",     # ebay domain
+    "_nkw": "minecraft redstone",  # search query
+    # other params
+}
+
+search = EbaySearch(params)  # where data extraction happens
+
+page_num = 0
+
+while True:
+    results = search.get_dict()  # JSON -> Python dict
+
+    if "error" in results:
+        print(results["error"])
+        break
+
+    for organic_result in results.get("organic_results", []):
+        title = organic_result.get("title")
+        print(title)
+
+    page_num += 1
+    print(page_num)
+
+    # next page URL; its query string parses to e.g.
+    # {'_nkw': 'minecraft redstone', '_pgn': '19', 'engine': 'ebay'}
+    next_page_url = results.get("serpapi_pagination", {}).get("next")
+    current_page = results.get("serpapi_pagination", {}).get("current")  # 1,2,3...
+
+    # looks for the next page data (_pgn):
+    # {'_nkw': 'minecraft redstone', '_pgn': '19', 'engine': 'ebay'}
+    if next_page_url:
+        next_page_query_dict = dict(parse_qsl(urlsplit(next_page_url).query))
+
+        # if current_page = 20 and next_page_query_dict["_pgn"] = 20: break
+        if int(current_page) == int(next_page_query_dict["_pgn"]):
+            break
+
+        # update next page data
+        search.params_dict.update(next_page_query_dict)
+    else:
+        break
+
\ No newline at end of file
diff --git a/examples/pagination/google-images-pagination-example.py b/examples/pagination/google-images-pagination-example.py
new file mode 100644
index 0000000..a596f42
--- /dev/null
+++ b/examples/pagination/google-images-pagination-example.py
@@ -0,0 +1,40 @@
+# Docs: https://serpapi.com/images-results
+
+from serpapi import GoogleSearch
+from urllib.parse import (parse_qsl, urlsplit)
+
+params = {
+    "api_key": "...",           # your serpapi api key
+    "engine": "google",         # search engine
+    "q": "minecraft redstone",  # search query
+    "gl": "us",                 # country of the search
+    "hl": "en",                 # language
+    "ijn": 0,                   # page number: 0 -> first page, 1 -> second...
+    "tbm": "isch"               # image results
+}
+
+search = GoogleSearch(params)  # where data extraction happens
+
+# to show the page number
+page_num = 0
+
+image_results = []
+
+while True:
+    results = search.get_dict()  # JSON -> Python dictionary
+
+    page_num += 1
+    print(f"Current page: {page_num}")
+
+    # checks for "Google hasn't returned any results for this query."
+    if "error" not in results:
+        for image in results.get("images_results", []):
+            if image["original"] not in image_results:
+                print(image["original"])
+                image_results.append(image["original"])
+
+        # update to the next page
+        params["ijn"] += 1
+    else:
+        print(results["error"])
+        break
\ No newline at end of file
diff --git a/examples/pagination/google-jobs-pagination-example.py b/examples/pagination/google-jobs-pagination-example.py
new file mode 100644
index 0000000..d979ced
--- /dev/null
+++ b/examples/pagination/google-jobs-pagination-example.py
@@ -0,0 +1,42 @@
+# docs: https://serpapi.com/google-jobs-api
+
+from serpapi import GoogleSearch
+import json
+
+params = {
+    "api_key": "...",               # serpapi api key
+    "engine": "google_jobs",        # parsing engine
+    "google_domain": "google.com",  # google domain for the search
+    "q": "Barista",                 # search query
+    "start": 0                      # page number
+}
+
+search = GoogleSearch(params)  # where data extraction happens on the backend
+
+jobs_data = []
+
+# to show page number
+page_num = 0
+
+while True:
+    results = search.get_dict()  # JSON -> Python dict
+
+    # checks for "Google hasn't returned any results for this query."
+ if "error" in results: + print(results["error"]) + break + + page_num += 1 + print(f"Current page: {page_num}") + + # iterate over organic results and extract the data + for result in results["jobs_results"]: + jobs_data.append({ + "title": result["title"], + "company_name": result["company_name"], + "location": result["location"] + }) + + params["start"] += 10 + +print(json.dumps(jobs, indent=2)) \ No newline at end of file diff --git a/examples/pagination/google-local-results-pagination-example.py b/examples/pagination/google-local-results-pagination-example.py new file mode 100644 index 0000000..571a615 --- /dev/null +++ b/examples/pagination/google-local-results-pagination-example.py @@ -0,0 +1,38 @@ +# docs: https://serpapi.com/local-results + +from serpapi import GoogleSearch +from urllib.parse import (parse_qsl, urlsplit) + +params = { + "api_key": "...", # serpapi api key + "engine": "google", # search engine + "q": "minecraft", # search query + "gl": "us", # country of the search + "hl": "en", # language + "tbm": "lcl" # local results +} + +search = GoogleSearch(params) # where data extraction happens + +# to show the page number +page_num = 0 + +# iterate over all pages +while True: + results = search.get_dict() # JSON -> Python dict + + if "error" in results: + print(results["error"]) + break + + page_num += 1 + print(f"Current page: {page_num}") + + # iterate over organic results and extract the data + for result in results.get("local_results", []): + print(result.get("position"), result.get("title"), result.get("address"), sep="\n") + + if results.get("serpapi_pagination", {}).get("next"): + search.params_dict.update(dict(parse_qsl(urlsplit(results.get("serpapi_pagination").get("next")).query))) + else: + break \ No newline at end of file diff --git a/examples/pagination/google-maps-place-photos-pagination-example.py b/examples/pagination/google-maps-place-photos-pagination-example.py new file mode 100644 index 0000000..6bc4e2b --- /dev/null +++ b/examples/pagination/google-maps-place-photos-pagination-example.py @@ -0,0 +1,51 @@ +# docs: https://serpapi.com/google-maps-photos-api + +# For the given URL we want DATA ID parameter. Look where 👇 starts and ends. That's what we need. 
+ +# www.google.com +# maps +# place +# GameStop +# @40.70605,-74.183206,10z +# data= 👇 👇 +# !4m9!1m2!2m1!1sgamestop+!3m5!1s0x89c2632baa1222a1:0xc8ccef76fb4dc917!8m2!3d40.7060247!4d-73.6558696 +# 0x89c2632baa1222a1:0xc8ccef76fb4dc917 +# !15sCghnYW1lc3RvcCIDiAEBWgoiCGdhbWVzdG9wkgEQdmlkZW9fZ2FtZV9zdG9yZQ + +from serpapi import GoogleSearch +from urllib.parse import (parse_qsl, urlsplit) + +params = { + "api_key": "...", # your serpapi api key + "engine": "google_maps_photos", # parsing engine + "hl": "en", # language of the search + "data_id": "0x89c2632baa1222a1:0xc8ccef76fb4dc917" # place id +} + +search = GoogleSearch(params) # where data extraction happens + +# to show the page number +page_num = 0 + +# iterate over all pages +results_is_present = True +while results_is_present: + results = search.get_dict() # JSON -> Python dict + + if "error" in results: + print(results["error"]) + break + + page_num += 1 + print(f"Current page: {page_num}") + + # iterate over organic results and extract the data + for result in results.get("photos", []): + print(result.get("thumbnail"), result.get("image"), sep="\n") + + # check if the next page key is present in the JSON + # if present -> split URL in parts and update to the next page + if "next" in results.get("serpapi_pagination", {}): + search.params_dict.update(dict(parse_qsl(urlsplit(results.get("serpapi_pagination").get("next")).query))) + else: + break \ No newline at end of file diff --git a/examples/pagination/google-maps-place-reviews-pagination-example.py b/examples/pagination/google-maps-place-reviews-pagination-example.py new file mode 100644 index 0000000..7de288d --- /dev/null +++ b/examples/pagination/google-maps-place-reviews-pagination-example.py @@ -0,0 +1,48 @@ +# Docs: https://serpapi.com/maps-place-results + +# For the given URL we want DATA ID parameter. Look where 👇 starts and ends. That's what we need. 
+ +# www.google.com +# maps +# place +# GameStop +# @40.70605,-74.183206,10z +# data= 👇 👇 +# !4m9!1m2!2m1!1sgamestop+!3m5!1s0x89c2632baa1222a1:0xc8ccef76fb4dc917!8m2!3d40.7060247!4d-73.6558696 +# 0x89c2632baa1222a1:0xc8ccef76fb4dc917 +# !15sCghnYW1lc3RvcCIDiAEBWgoiCGdhbWVzdG9wkgEQdmlkZW9fZ2FtZV9zdG9yZQ + +from serpapi import GoogleSearch +from urllib.parse import (parse_qsl, urlsplit) + +params = { + "api_key": "...", # your serpapi api key + "engine": "google_maps_reviews", # parsing engine + "hl": "en", # language of the search + "data_id": "0x89c251dd1776880d:0xa3ea85dca1b55baf" # place id +} + +search = GoogleSearch(params) # where data extraction happens on the backend + +# to show the page number +page_num = 0 + +# iterate over all pages +while True: + results = search.get_dict() # JSON -> Python dict + + if "error" in results: + print(results["error"]) + break + + page_num += 1 + print(f"Current page: {page_num}") + + # iterate over organic results and extract the data + for result in results.get("reviews", []): + print(result.get("user"), result.get("date"), sep="\n") + + if results.get("serpapi_pagination").get("next") and results.get("serpapi_pagination").get("next_page_token"): + search.params_dict.update(dict(parse_qsl(urlsplit(results.get("serpapi_pagination").get("next")).query))) + else: + break \ No newline at end of file diff --git a/examples/pagination/google-maps-search-pagination-example.py b/examples/pagination/google-maps-search-pagination-example.py new file mode 100644 index 0000000..53563c3 --- /dev/null +++ b/examples/pagination/google-maps-search-pagination-example.py @@ -0,0 +1,41 @@ +# docs: https://serpapi.com/maps-local-results + +from serpapi import GoogleSearch +from urllib.parse import (parse_qsl, urlsplit) + +params = { + "api_key": "...", # serpapi api key + "engine": "google_maps", # search engine + "type": "search", # type is set to search. there's also place search type. + "q": "gamestop", # search query + "hl": "en", # language + "ll": "@40.7107323,-74.1542422,10z" # GPS coordinates of location where you want your q (query) to be applied. 
+} + +search = GoogleSearch(params) # where data extraction happens + +# to show the page number +page_num = 0 + +# iterate over all pages +results_is_present = True +while results_is_present: + results = search.get_dict() # JSON -> Python dict + + if "error" in results: + print(results["error"]) + break + + page_num += 1 + print(f"Current page: {page_num}") + + # iterate over organic results and extract the data + for result in results.get("local_results", []): + print(result.get("position"), result.get("title"), result.get("gps_coordinates"), result.get("address"), sep="\n") + + # check if the next page key is present in the JSON + # if present -> split URL in parts and update to the next page + if "next" in results.get("serpapi_pagination", {}): + search.params_dict.update(dict(parse_qsl(urlsplit(results.get("serpapi_pagination").get("next")).query))) + else: + break \ No newline at end of file diff --git a/examples/pagination/google-play-product-reviews-pagination-example.py b/examples/pagination/google-play-product-reviews-pagination-example.py new file mode 100644 index 0000000..ec6f687 --- /dev/null +++ b/examples/pagination/google-play-product-reviews-pagination-example.py @@ -0,0 +1,41 @@ +# docs: https://serpapi.com/google-play-product-reviews + +from serpapi import GoogleSearch +from urllib.parse import (parse_qsl, urlsplit) + +params = { + "api_key": "...", # serpapi api key + "engine": "google_play_product", # search engine + "store": "apps", # app results. there's also books, movies... + "gl": "us", # country of the search + "product_id": "com.my.minecraftskins", # app name + "all_reviews": "true" # shows all reviews +} + +search = GoogleSearch(params) # where data extraction happens + +# to show the page number +page_num = 0 + +# iterate over all pages +results_is_present = True +while results_is_present: + results = search.get_dict() # JSON -> Python dict + + if "error" in results: + print(results["error"]) + break + + page_num += 1 + print(f"Current page: {page_num}") + + # iterate over organic results and extract the data + for result in results.get("reviews", []): + print(result.get("title"), result.get("date"), sep="\n") + + # check if the next page key is present in the JSON + # if present -> split URL in parts and update to the next page + if "next" in results.get("serpapi_pagination", {}): + search.params_dict.update(dict(parse_qsl(urlsplit(results.get("serpapi_pagination").get("next")).query))) + else: + break \ No newline at end of file diff --git a/examples/pagination/google-play-search-pagination-example.py b/examples/pagination/google-play-search-pagination-example.py new file mode 100644 index 0000000..d8c7a18 --- /dev/null +++ b/examples/pagination/google-play-search-pagination-example.py @@ -0,0 +1,43 @@ +# docs: https://serpapi.com/google-play-api + +from serpapi import GoogleSearch +from urllib.parse import (parse_qsl, urlsplit) + + +params = { + "api_key": "...", # serpapi api key + "engine": "google_play", # search engine + "q": "minecraft", # search query + "hl": "en", # language + "store": "apps", # app results. there's also books, movies... 
+ "gl": "us" # country +} + +search = GoogleSearch(params) # where data extraction happens + +# to show page number +page_num = 0 + +# iterate over all pages +results_is_present = True +while results_is_present: + results = search.get_dict() # JSON -> Python dict + + if "error" in results: + print(results["error"]) + break + + page_num += 1 + print(f"Current page: {page_num}") + + # iterate over organic results and extract the data + for result in results.get("organic_results", []): + for item in result.get("items", []): + print(item.get("title"), item.get("product_id"), sep="\n") + + # check if the next page key is present in the JSON + # if present -> split URL in parts and update to the next page + if "next" in results.get("serpapi_pagination", {}): + search.params_dict.update(dict(parse_qsl(urlsplit(results.get("serpapi_pagination").get("next")).query))) + else: + break \ No newline at end of file diff --git a/examples/pagination/google-scholar-author-publications-pagination-example.py b/examples/pagination/google-scholar-author-publications-pagination-example.py new file mode 100644 index 0000000..4f843c0 --- /dev/null +++ b/examples/pagination/google-scholar-author-publications-pagination-example.py @@ -0,0 +1,39 @@ +# docs: https://serpapi.com/google-scholar-author-articles + +from serpapi import GoogleScholarSearch +from urllib.parse import (parse_qsl, urlsplit) + +params = { + "api_key": "...", # serpapi api key + "engine": "google_scholar_author", # search engine + "hl": "en", # language + "author_id": "jlas1aMAAAAJ" # search query +} + +search = GoogleScholarSearch(params) # where data extraction happens + +# to show page number +page_num = 0 + +# iterate over all pages +results_is_present = True +while results_is_present: + results = search.get_dict() # JSON -> Python dict + + if "error" in results: + print(results["error"]) + break + + page_num += 1 + print(f"Current page: {page_num}") + + # iterate over organic results and extract the data + for result in results.get("articles", []): + print(result.get("title"), result.get("authors"), sep="\n") + + # check if the next page key is present in the JSON + # if present -> split URL in parts and update to the next page + if "next" in results.get("serpapi_pagination", {}): + search.params_dict.update(dict(parse_qsl(urlsplit(results.get("serpapi_pagination").get("next")).query))) + else: + break \ No newline at end of file diff --git a/examples/pagination/google-scholar-profiles-pagination-example.py b/examples/pagination/google-scholar-profiles-pagination-example.py new file mode 100644 index 0000000..7eb9329 --- /dev/null +++ b/examples/pagination/google-scholar-profiles-pagination-example.py @@ -0,0 +1,39 @@ +# docs: https://serpapi.com/google-scholar-profiles-api + +from serpapi import GoogleScholarSearch +from urllib.parse import (parse_qsl, urlsplit) + +params = { + "api_key": "...", # serpapi api key + "engine": "google_scholar_profiles", # search engine + "hl": "en", # language + "mauthors": "Blizzard" # search query +} + +search = GoogleScholarSearch(params) # where data extraction happens + +# to show page number +page_num = 0 + +# iterate over all pages +results_is_present = True +while results_is_present: + results = search.get_dict() # JSON -> Python dict + + if "error" in results: + print(results["error"]) + break + + page_num += 1 + print(f"Current page: {page_num}") + + # iterate over organic results and extract the data + for result in results.get("profiles", []): + print(result.get("name"), result.get("email"), 
sep="\n")
+
+    # check if the next page key is present in the JSON
+    # if present -> split URL in parts and update to the next page
+    if "next" in results.get("pagination", {}):
+        search.params_dict.update(dict(parse_qsl(urlsplit(results.get("pagination").get("next")).query)))
+    else:
+        break
\ No newline at end of file
diff --git a/examples/pagination/google-scholar-search-pagination-example.py b/examples/pagination/google-scholar-search-pagination-example.py
new file mode 100644
index 0000000..e301f25
--- /dev/null
+++ b/examples/pagination/google-scholar-search-pagination-example.py
@@ -0,0 +1,39 @@
+# docs: https://serpapi.com/google-scholar-api
+
+from serpapi import GoogleSearch
+from urllib.parse import (parse_qsl, urlsplit)
+
+params = {
+    "api_key": "...",            # serpapi api key
+    "engine": "google_scholar",  # search engine
+    "q": "minecraft redstone",   # search query
+    "hl": "en"                   # language
+}
+
+search = GoogleSearch(params)  # where data extraction happens
+
+# to show page number
+page_num = 0
+
+# iterate over all pages
+results_is_present = True
+while results_is_present:
+    results = search.get_dict()  # JSON -> Python dict
+
+    if "error" in results:
+        print(results["error"])
+        break
+
+    page_num += 1
+    print(f"Current page: {page_num}")
+
+    # iterate over organic results and extract the data
+    for result in results.get("organic_results", []):
+        print(result.get("position"), result.get("title"), sep="\n")
+
+    # check if the next page key is present in the JSON
+    # if present -> split URL in parts and update to the next page
+    if "next" in results.get("serpapi_pagination", {}):
+        search.params_dict.update(dict(parse_qsl(urlsplit(results.get("serpapi_pagination").get("next")).query)))
+    else:
+        break
\ No newline at end of file
diff --git a/examples/pagination/google-search-news-pagination-example.py b/examples/pagination/google-search-news-pagination-example.py
new file mode 100644
index 0000000..0912a64
--- /dev/null
+++ b/examples/pagination/google-search-news-pagination-example.py
@@ -0,0 +1,39 @@
+# docs: https://serpapi.com/news-results
+
+from serpapi import GoogleSearch
+from urllib.parse import (parse_qsl, urlsplit)
+
+params = {
+    "api_key": "...",    # serpapi api key
+    "engine": "google",  # search engine
+    "q": "minecraft",    # search query
+    "gl": "us",          # country of the search
+    "hl": "en",          # language
+    "num": "100",        # number of news per page
+    "tbm": "nws"         # news results
+}
+
+search = GoogleSearch(params)  # where data extraction happens
+
+# to show page number
+page_num = 0
+
+# iterate over all pages
+while True:
+    results = search.get_dict()  # JSON -> Python dict
+
+    if "error" in results:
+        print(results["error"])
+        break
+
+    page_num += 1
+    print(f"Current page: {page_num}")
+
+    # iterate over news results and extract the data
+    for result in results.get("news_results", []):
+        print(result.get("position"), result.get("title"), sep="\n")
+
+    if "next" in results.get("serpapi_pagination", {}):
+        search.params_dict.update(dict(parse_qsl(urlsplit(results.get("serpapi_pagination").get("next")).query)))
+    else:
+        break
\ No newline at end of file
diff --git a/examples/pagination/google-search-pagination-example.py b/examples/pagination/google-search-pagination-example.py
new file mode 100644
index 0000000..77ae245
--- /dev/null
+++ b/examples/pagination/google-search-pagination-example.py
@@ -0,0 +1,31 @@
+# docs: https://serpapi.com/organic-results
+
+from serpapi import GoogleSearch
+
+params = {
+    "api_key": "...",    # serpapi api key
+    "engine": "google",  # search engine
+ "q": "minecraft redstone", # search query + "gl": "us", # country of the search + "hl": "en" # language +} + +search = GoogleSearch(params) # where data extraction happens +pages = search.pagination() # paginates over all pages + +# to show page number +page_num = 0 + +# iterate over all pages +for page in pages: + + if "error" in page: + print(page["error"]) + break + + page_num += 1 + print(f"Current page: {page_num}") + + # iterate over organic results and extract the data + for result in page["organic_results"]: + print(result["position"], result["title"], sep="\n") \ No newline at end of file diff --git a/examples/pagination/google-shopping-search-pagination-example.py b/examples/pagination/google-shopping-search-pagination-example.py new file mode 100644 index 0000000..44398be --- /dev/null +++ b/examples/pagination/google-shopping-search-pagination-example.py @@ -0,0 +1,39 @@ +# docs: https://serpapi.com/shopping-results + +from serpapi import GoogleSearch +from urllib.parse import (parse_qsl, urlsplit) + +params = { + "api_key": "...", # serpapi api key + "engine": "google", # search engine + "q": "minecraft", # search query + "gl": "us", # country of the search + "hl": "en", # language + "num": "100", # number of results per page + "tbm": "shop" # shopping results +} + +search = GoogleSearch(params) # where data extraction + +# to show page number +page_num = 0 + +# iterate over all pages +while True: + results = search.get_dict() # JSON -> Python dict + + if "error" in results: + print(results["error"]) + break + + page_num += 1 + print(f"Current page: {page_num}") + + # iterate over organic results and extract the data + for result in results.get("shopping_results", []): + print(result.get("position"), result.get("title"), result.get("price"), sep="\n") + + if "next" in results.get("serpapi_pagination", {}): + search.params_dict.update(dict(parse_qsl(urlsplit(results.get("serpapi_pagination").get("next")).query))) + else: + break \ No newline at end of file diff --git a/examples/pagination/homedepot-pagination-example.py b/examples/pagination/homedepot-pagination-example.py new file mode 100644 index 0000000..257e88e --- /dev/null +++ b/examples/pagination/homedepot-pagination-example.py @@ -0,0 +1,38 @@ +# docs: https://serpapi.com/home-depot-search-api + +from serpapi import HomeDepotSearch +from urllib.parse import (parse_qsl, urlsplit) + +params = { + "api_key": "...", # serpapi api key + "engine": "home_depot", # search engine + "q": "hammer" # search query +} + +search = HomeDepotSearch(params) # where data extraction happens + +# # just to show the page number +page_num = 0 + +# iterate over all pages +while True: + results = search.get_dict() # JSON -> Python dict + + if "error" in results: + print(results["error"]) + break + + page_num += 1 + print(f"Current page: {page_num}") + + # iterate over organic results and extract the data + for result in results.get("products", []): + print(result["position"], result["title"], sep="\n") + + # check if the next page key is present in the JSON + # if present -> split URL in parts and update to the next page + if "next" in results.get("serpapi_pagination", {}): + search.params_dict.update(dict(parse_qsl(urlsplit(results.get("serpapi_pagination", {}).get("next")).query))) + else: + break + diff --git a/examples/pagination/naver-search-pagination-example.py b/examples/pagination/naver-search-pagination-example.py new file mode 100644 index 0000000..08d826c --- /dev/null +++ b/examples/pagination/naver-search-pagination-example.py 
@@ -0,0 +1,37 @@
+# docs: https://serpapi.com/naver-search-api
+
+from serpapi import NaverSearch
+from urllib.parse import (parse_qsl, urlsplit)
+
+params = {
+    "api_key": "...", # serpapi api key
+    "engine": "naver", # search engine
+    "query": "minecraft redstone" # search query
+}
+
+search = NaverSearch(params) # where data extraction happens
+
+# just to show the page number
+page_num = 0
+
+# iterate over all pages
+while True:
+    results = search.get_dict() # JSON -> Python dict
+
+    if "error" in results:
+        print(results["error"])
+        break
+
+    page_num += 1
+    print(f"Current page: {page_num}")
+
+    # iterate over organic results and extract the data
+    for result in results.get("organic_results", []):
+        print(result.get("position"), result.get("title"), sep="\n")
+
+    # check if the next page key is present in the JSON
+    # if present -> split URL in parts and update to the next page
+    if "next" in results.get("serpapi_pagination", {}):
+        search.params_dict.update(dict(parse_qsl(urlsplit(results.get("serpapi_pagination", {}).get("next")).query)))
+    else:
+        break
\ No newline at end of file
diff --git a/examples/pagination/walmart-search-pagination-example.py b/examples/pagination/walmart-search-pagination-example.py
new file mode 100644
index 0000000..1aa62df
--- /dev/null
+++ b/examples/pagination/walmart-search-pagination-example.py
@@ -0,0 +1,36 @@
+from serpapi import WalmartSearch
+from urllib.parse import (parse_qsl, urlsplit)
+
+params = {
+    "api_key": "...", # serpapi api key
+    "engine": "walmart", # search engine
+    "device": "desktop", # device type
+    "query": "minecraft redstone" # search query
+}
+
+search = WalmartSearch(params) # where data extraction happens
+
+# just to show the page number
+page_num = 0
+
+# iterate over all pages
+while True:
+    results = search.get_dict() # JSON -> Python dict
+
+    if "error" in results:
+        print(results["error"])
+        break
+
+    page_num += 1
+    print(f"Current page: {page_num}")
+
+    # iterate over organic results and extract the data
+    for result in results.get("organic_results", []):
+        print(result["seller_name"], result["title"], sep="\n")
+
+    # check if the next page key is present in the JSON
+    # if present -> split URL in parts and update to the next page
+    if "next" in results.get("pagination", {}):
+        search.params_dict.update(dict(parse_qsl(urlsplit(results.get("pagination", {}).get("next")).query)))
+    else:
+        break
\ No newline at end of file
diff --git a/examples/pagination/yahoo-search-pagination-example.py b/examples/pagination/yahoo-search-pagination-example.py
new file mode 100644
index 0000000..11caaed
--- /dev/null
+++ b/examples/pagination/yahoo-search-pagination-example.py
@@ -0,0 +1,37 @@
+# docs: https://serpapi.com/yahoo-search-api
+
+from serpapi import YahooSearch
+from urllib.parse import (parse_qsl, urlsplit)
+
+params = {
+    "api_key": "...", # serpapi api key
+    "engine": "yahoo", # search engine
+    "p": "minecraft redstone" # search query
+}
+
+search = YahooSearch(params) # where data extraction happens
+
+# just to show the page number
+page_num = 0
+
+# iterate over all pages
+while True:
+    results = search.get_dict() # JSON -> Python dict
+
+    if "error" in results:
+        print(results["error"])
+        break
+
+    page_num += 1
+    print(f"Current page: {page_num}")
+
+    # iterate over organic results and extract the data
+    for result in results.get("organic_results", []):
+        print(result["position"], result["title"], sep="\n")
+
+    # check if the next page key is present in the JSON
+    # if present -> split URL in parts and update to the next page
+    if "next" in results.get("serpapi_pagination", {}):
+        search.params_dict.update(dict(parse_qsl(urlsplit(results.get("serpapi_pagination", {}).get("next")).query)))
+    else:
+        break
\ No newline at end of file
diff --git a/examples/pagination/yahoo-shopping-pagination-example.py b/examples/pagination/yahoo-shopping-pagination-example.py
new file mode 100644
index 0000000..eab7936
--- /dev/null
+++ b/examples/pagination/yahoo-shopping-pagination-example.py
@@ -0,0 +1,26 @@
+# docs: https://serpapi.com/yahoo-shopping-search-api
+
+from serpapi import YahooSearch
+
+params = {
+    "api_key": "...", # serpapi api key
+    "engine": "yahoo_shopping", # search engine
+    "p": "redstone" # search query
+}
+
+search = YahooSearch(params) # where data extraction happens
+pages = search.pagination() # paginates over all pages
+
+page_num = 0
+
+for page in pages:
+
+    if "error" in page:
+        print(page["error"])
+        break
+
+    page_num += 1
+    print(f"Current page: {page_num}")
+
+    for result in page.get("shopping_results", []):
+        print(result["title"])
\ No newline at end of file
diff --git a/examples/pagination/yelp-search-pagination-example.py b/examples/pagination/yelp-search-pagination-example.py
new file mode 100644
index 0000000..e0cb0b5
--- /dev/null
+++ b/examples/pagination/yelp-search-pagination-example.py
@@ -0,0 +1,38 @@
+# docs: https://serpapi.com/yelp-search-api
+
+from serpapi import GoogleSearch # will be changed to YelpSearch in the future
+from urllib.parse import (parse_qsl, urlsplit)
+
+params = {
+    "api_key": "...", # serpapi api key
+    "engine": "yelp", # search engine
+    "find_loc": "Austin, TX, United States", # location of the search
+    "find_desc": "Spa" # search query
+}
+
+search = GoogleSearch(params) # where data extraction happens
+
+# to show page number
+page_num = 0
+
+# iterate over all pages
+while True:
+    results = search.get_dict() # JSON -> Python dict
+
+    if "error" in results:
+        print(results["error"])
+        break
+
+    page_num += 1
+    print(f"Current page: {page_num}")
+
+    # iterate over organic results and extract the data
+    for result in results.get("organic_results", []):
+        print(result.get("position"), result.get("title"), sep="\n")
+
+    # check if the next page key is present in the JSON
+    # if present -> split URL in parts and update to the next page
+    if "next" in results.get("serpapi_pagination", {}):
+        search.params_dict.update(dict(parse_qsl(urlsplit(results.get("serpapi_pagination", {}).get("next")).query)))
+    else:
+        break
\ No newline at end of file
diff --git a/examples/pagination/youtube-search-pagination-example.py b/examples/pagination/youtube-search-pagination-example.py
new file mode 100644
index 0000000..91aa850
--- /dev/null
+++ b/examples/pagination/youtube-search-pagination-example.py
@@ -0,0 +1,37 @@
+# docs: https://serpapi.com/youtube-search-api
+
+from serpapi import YoutubeSearch
+from urllib.parse import (parse_qsl, urlsplit)
+
+params = {
+    "api_key": "...", # serpapi api key
+    "engine": "youtube", # search engine
+    "search_query": "minecraft redstone" # search query
+}
+
+search = YoutubeSearch(params) # where data extraction happens
+
+# just to show the page number
+page_num = 0
+
+# iterate over all pages
+while True:
+    results = search.get_dict() # JSON -> Python dict
+
+    if "error" in results:
+        print(results["error"])
+        break
+
+    page_num += 1
+    print(f"Current page: {page_num}")
+
+    # iterate over video results and extract the data
+    for result in results.get("video_results", []):
+        print(result["position_on_page"], result["title"], sep="\n")
+
+    # check if the next page key is present in the JSON
+    # if present -> split URL in parts and update to the next page
+    if "next" in results.get("serpapi_pagination", {}):
+        search.params_dict.update(dict(parse_qsl(urlsplit(results.get("serpapi_pagination", {}).get("next")).query)))
+    else:
+        break
\ No newline at end of file