Serp API Code Challenge #282

Open: wants to merge 6 commits into master
3 changes: 3 additions & 0 deletions .gitignore
@@ -49,3 +49,6 @@ build-iPhoneSimulator/
# unless supporting rvm < 1.11.0 or doing something fancy, ignore this:
.rvmrc
.DS_Store

#other
venv/
49 changes: 49 additions & 0 deletions Makefile
@@ -0,0 +1,49 @@
# Makefile

# Python interpreter to use
PYTHON = python3

# Path to the main script
MAIN_SCRIPT = src/scrape.py

# Directory containing tests
TEST_DIR = tests

# Virtual environment directory
VENV = venv

# Virtual environment activation
VENV_ACTIVATE = $(VENV)/bin/activate

# Install dependencies and set up virtual environment
.PHONY: install
install:
	$(PYTHON) -m venv $(VENV)
	. $(VENV_ACTIVATE) && pip install -r requirements.txt


# Run the main script
.PHONY: run
run:
	. $(VENV_ACTIVATE) && $(PYTHON) $(MAIN_SCRIPT)

# Run all tests
.PHONY: test
test:
	. $(VENV_ACTIVATE) && $(PYTHON) -m unittest discover $(TEST_DIR)

.PHONY: clean
clean:
	find . -type f -name "*.pyc" -delete
	find . -type d -name "__pycache__" -delete
	rm -rf $(VENV)
	rm -f files/generated_array.json

# Help command to show available commands
.PHONY: help
help:
	@echo "Available commands:"
	@echo "  make install   Install dependencies and set up virtual environment"
	@echo "  make run       Run the main script"
	@echo "  make test      Run all tests"
	@echo "  make clean     Clean up pyc files, virtual environment, and generated_array.json"
100 changes: 84 additions & 16 deletions README.md
@@ -1,28 +1,96 @@
# Extract Van Gogh Paintings Code Challenge

-Goal is to extract a list of Van Gogh paintings from the attached Google search results page.
## Prerequisites

-![Van Gogh paintings](https://github.com/serpapi/code-challenge/blob/master/files/van-gogh-paintings.png?raw=true "Van Gogh paintings")
- Python 3.7+
- Make

-## Instructions
### Checking Prerequisites

-This is already fully supported on SerpApi. ([relevant test], [html file], [sample json], and [expected array].)
-Try to come up with your own solution and your own test.
-Extract the painting `name`, `extensions` array (date), and Google `link` in an array.
1. Check the Python version:
<code>python3 --version</code>
This should print version 3.7 or higher.
2. Check that Make is installed:
<code>make --version</code>
This should print the installed version of Make.
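The Python check can also be done from inside Python. A minimal sketch; `python_is_supported` and `MIN_VERSION` are hypothetical names, with the minimum taken from the 3.7+ requirement above:

```python
import sys

# Minimum interpreter version from the Prerequisites list (Python 3.7+).
MIN_VERSION = (3, 7)


def python_is_supported(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the given interpreter version meets the minimum."""
    # Compare only (major, minor) so patch releases do not matter.
    return tuple(version_info[:2]) >= minimum
```

For example, `python_is_supported()` checks the running interpreter, while `python_is_supported((3, 6, 9))` returns `False`.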

-Fork this repository and make a PR when ready.
### Installing Prerequisites

-Programming language wise, Ruby (with RSpec tests) is strongly suggested but feel free to use whatever you feel like.
- Python: If not installed, download and install from [python.org](https://www.python.org/downloads/)
- Make: If not installed:
  - On Ubuntu/Debian: <code>sudo apt-get install make</code>
  - On macOS: Install Xcode Command Line Tools by running <code>xcode-select --install</code>

-Parse directly the HTML result page ([html file]) in this repository. No extra HTTP requests should be needed for anything.
## Installation

-[relevant test]: https://github.com/serpapi/test-knowledge-graph-desktop/blob/master/spec/knowledge_graph_claude_monet_paintings_spec.rb
-[sample json]: https://raw.githubusercontent.com/serpapi/code-challenge/master/files/van-gogh-paintings.json
-[html file]: https://raw.githubusercontent.com/serpapi/code-challenge/master/files/van-gogh-paintings.html
-[expected array]: https://raw.githubusercontent.com/serpapi/code-challenge/master/files/expected-array.json
To set up the project, run:
<code>make install</code>

-Add also to your array the painting thumbnails present in the result page file (not the ones where extra requests are needed).
This command creates a virtual environment in `venv/` and installs the dependencies listed in `requirements.txt`.

-Test against 2 other similar result pages to make sure it works against different layouts. (Pages that contain the same kind of carrousel. Don't necessarily have to be paintings.)
## Usage

-The suggested time for this challenge is 4 hours. But, you can take your time and work more on it if you want.
### Running the Script

To parse the HTML page and generate the results array:
<code>make run</code>

This command will:
1. Parse the configured HTML page (`files/van-gogh-paintings.html` by default)
2. Extract the carousel items from the page
3. Write them as a JSON array to `files/generated_array.json`
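The three steps can be sketched with only the standard library. This is not the PR's `src/scrape.py`: the markup it looks for (`<a class="klitem">` anchors carrying an `aria-label` and `href`, a descendant with class `klmeta` holding the date, and an `<img src>` thumbnail) is an assumption about Google's carousel HTML that may not match these saved pages, and `CarouselParser`, `extract` and the `image` key are hypothetical names.

```python
import json
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urljoin

# ASSUMPTION: relative carousel hrefs are resolved against this base URL.
GOOGLE_BASE = "https://www.google.com"


class CarouselParser(HTMLParser):
    """Collects carousel items from a saved Google results page.

    ASSUMED markup (hypothetical, verify against the real HTML):
      <a class="klitem" aria-label="NAME" href="/search?...">
        <img src="THUMBNAIL"> <div class="klmeta">DATE</div>
      </a>
    """

    def __init__(self):
        super().__init__()
        self.items = []        # finished items, in page order
        self._current = None   # item being built while inside a carousel anchor
        self._in_meta = False  # True while inside the "klmeta" date element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "klitem" in classes:
            self._current = {
                "name": attrs.get("aria-label") or "",
                "extensions": [],
                "link": urljoin(GOOGLE_BASE, attrs.get("href") or ""),
            }
        elif self._current is not None:
            if "klmeta" in classes:
                self._in_meta = True
            elif tag == "img" and attrs.get("src") and "image" not in self._current:
                # Keep only the first thumbnail embedded in the page itself.
                self._current["image"] = attrs["src"]

    def handle_data(self, data):
        if self._in_meta and data.strip():
            self._current["extensions"].append(data.strip())

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self.items.append(self._current)
            self._current = None
            self._in_meta = False
        elif tag == "div":
            # Simplification: assumes the date element has no nested <div>s.
            self._in_meta = False


def extract(html_path, output_path):
    """Parse one saved results page and write the items as a JSON array."""
    parser = CarouselParser()
    parser.feed(Path(html_path).read_text(encoding="utf-8"))
    parser.close()
    Path(output_path).write_text(
        json.dumps(parser.items, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return parser.items
```

Calling `extract("files/van-gogh-paintings.html", "files/generated_array.json")` would mirror step 3; running it against the other sample pages is a quick way to see whether the assumed markup holds across layouts.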

### Running Tests

To run the test suite:

<code>make test</code>

This command runs all tests in the `tests` directory.
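For illustration, a test module in `tests/` could check the shape of each extracted item. This is a hypothetical sketch, not one of the PR's tests: `validate_item` and `REQUIRED_KEYS` are invented here, and the key names follow the challenge's `name`, `extensions` and `link` fields. Saved under a name like `tests/test_items.py`, it would be picked up by `unittest`'s default `test*.py` discovery pattern.

```python
import unittest

# Keys the challenge asks for on every extracted item.
REQUIRED_KEYS = {"name", "extensions", "link"}


def validate_item(item):
    """Return a list of problems found in one extracted item (empty if valid)."""
    problems = []
    missing = REQUIRED_KEYS - item.keys()
    if missing:
        problems.append("missing keys: " + ", ".join(sorted(missing)))
    if not isinstance(item.get("extensions", []), list):
        problems.append("extensions must be a list")
    if not str(item.get("link", "")).startswith("https://www.google.com/"):
        problems.append("link must be an absolute Google URL")
    return problems


class ValidateItemTest(unittest.TestCase):
    def test_valid_item(self):
        item = {"name": "The Starry Night", "extensions": ["1889"],
                "link": "https://www.google.com/search?q=The+Starry+Night"}
        self.assertEqual(validate_item(item), [])

    def test_missing_link(self):
        problems = validate_item({"name": "X", "extensions": []})
        self.assertIn("missing keys: link", problems)
```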

### Cleaning Up

To remove generated files, cached Python files, and the virtual environment:

<code>make clean</code>
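Where Make is unavailable, the `clean` target can be approximated in Python. A sketch, assuming it is run from the project root; `clean` is a hypothetical helper, and unlike `find -delete` it removes `__pycache__` directories together with anything left inside them.

```python
import shutil
from pathlib import Path


def clean(root="."):
    """Approximate `make clean`; returns the paths that were removed."""
    root = Path(root)
    removed = []
    # Compiled bytecode files anywhere under the project (like `find -name "*.pyc"`).
    for pyc in list(root.rglob("*.pyc")):
        if pyc.is_file():
            pyc.unlink()
            removed.append(pyc)
    # __pycache__ directories, deleted together with anything left inside them.
    for cache in list(root.rglob("__pycache__")):
        if cache.is_dir():
            shutil.rmtree(cache)
            removed.append(cache)
    # The virtual environment and the generated output, as in the Makefile.
    venv_dir = root / "venv"
    if venv_dir.is_dir():
        shutil.rmtree(venv_dir)
        removed.append(venv_dir)
    generated = root / "files" / "generated_array.json"
    if generated.is_file():
        generated.unlink()
        removed.append(generated)
    return removed
```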

## Running Without Make

If you prefer not to use Make, or if it's not available on your system, you can run the project directly using Python commands. Here's how:

1. Create a virtual environment:

<code>python3 -m venv venv</code>

2. Activate the virtual environment:

<code>source venv/bin/activate</code>

On Windows, use <code>venv\Scripts\activate</code> instead.

3. Install dependencies:

<code>pip install -r requirements.txt</code>
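The environment setup can also be scripted with the standard-library `venv` module. A minimal sketch; `create_env` is a hypothetical helper, and calling the environment's own interpreter directly avoids the activation step.

```python
import os
import venv
from pathlib import Path


def create_env(env_dir="venv", with_pip=True):
    """Create a virtual environment like `python3 -m venv venv`.

    Returns the path to the environment's Python interpreter.
    """
    # Symlink the interpreter on POSIX and copy it on Windows,
    # matching the defaults of the `python -m venv` command line.
    venv.create(env_dir, with_pip=with_pip, symlinks=(os.name != "nt"))
    bin_dir = "Scripts" if os.name == "nt" else "bin"
    exe = "python.exe" if os.name == "nt" else "python"
    return Path(env_dir) / bin_dir / exe
```

For example, `subprocess.run([str(create_env()), "-m", "pip", "install", "-r", "requirements.txt"], check=True)` would install the dependencies into the new environment.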

### Running the Script

Run the main script:

<code>python3 src/scrape.py</code>

### Running Tests

To run the test suite:

<code>python3 -m unittest discover tests</code>


## Project Structure

- `src/`: Main script (`scrape.py`) for parsing the HTML and generating the JSON output
- `files/`: Supporting files, including the input HTML pages and the generated JSON
- `tests/`: Directory containing test files
- `Makefile`: Commands for installing dependencies, running the script, running tests, and cleaning up
- `.env`: (Not in repository) Contains the API URL and API key

## Other HTML files

To parse a different HTML page, change the file name in `files/config.ini` to `49ers-players.html` or `warriors.html`.
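Selecting the page could look like this with `configparser`. The section and key names (`settings` / `file`) and the fallback to `van-gogh-paintings.html` are hypothetical, since the config file's contents are not shown in this PR; the sketch also assumes the config and the HTML pages live together in the `files/` directory.

```python
import configparser
from pathlib import Path

# HYPOTHETICAL layout: the real section and key names in the config may differ.
SECTION = "settings"
KEY = "file"


def input_html_path(config_path="files/config.ini"):
    """Resolve the HTML page to parse from the config file."""
    config = configparser.ConfigParser()
    # read() returns the list of files it managed to parse.
    if not config.read(config_path, encoding="utf-8"):
        raise FileNotFoundError(config_path)
    # fallback covers both a missing section and a missing key.
    name = config.get(SECTION, KEY, fallback="van-gogh-paintings.html")
    # Assumes the HTML pages sit next to the config file.
    return Path(config_path).parent / name
```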