Merged
52 commits
9d19a23
re-number existing sections
honzajavorek Feb 24, 2026
4ea1d7a
kick off a stub of the AI course
honzajavorek Feb 24, 2026
8855f71
start working on the first lesson
honzajavorek Feb 24, 2026
78085fa
add todo marks
honzajavorek Feb 24, 2026
da2f454
few edits
honzajavorek Feb 24, 2026
44df7d5
continue with the first lesson
honzajavorek Feb 25, 2026
36bf716
remove line numbers
honzajavorek Feb 25, 2026
720c613
rename the course
honzajavorek Feb 25, 2026
e322b2d
rename lessons
honzajavorek Feb 25, 2026
f672475
lure Vale into ignoring crawlee.dev
honzajavorek Feb 25, 2026
5cfb90f
rename lesson files
honzajavorek Feb 25, 2026
428a9d7
make Vale happy
honzajavorek Feb 25, 2026
9bacf7a
ooops
honzajavorek Feb 25, 2026
4836f05
rework the installation
honzajavorek Feb 25, 2026
c4d7d23
finish the first lesson
honzajavorek Feb 26, 2026
86ec2fa
make markdownlint happy
honzajavorek Feb 26, 2026
133e83d
better writing
honzajavorek Feb 26, 2026
39277bf
we, not you
honzajavorek Feb 26, 2026
8f51f6f
improve the Apify paragraph
honzajavorek Feb 26, 2026
34914bb
re-number lessons
honzajavorek Mar 31, 2026
fae1e36
in progress first lesson
honzajavorek Mar 31, 2026
c43bfa5
wrap up the draft of the first lesson
honzajavorek Apr 1, 2026
2a01afa
make Vale happier
honzajavorek Apr 1, 2026
7b899bc
fix language and other improvements
honzajavorek Apr 1, 2026
c18d413
make Vale happier
honzajavorek Apr 1, 2026
4483aff
repurpose the lesson to agentic development
honzajavorek Apr 2, 2026
ed4d864
language improvements
honzajavorek Apr 2, 2026
8c8e660
better wording
honzajavorek Apr 2, 2026
e4d307d
change the new actor flow
honzajavorek Apr 7, 2026
5d74a60
better grammar and flow
honzajavorek Apr 7, 2026
fabcddc
polish the very intro to the course
honzajavorek Apr 7, 2026
3bb3983
refine info about back-and-forth between ChatGPT and Web IDE
honzajavorek Apr 7, 2026
3ca0f88
add admonition about why ChatGPT
honzajavorek Apr 7, 2026
8b2e17f
create new folder
honzajavorek Apr 8, 2026
77d6a6f
add other OS variants
honzajavorek Apr 8, 2026
5b4c635
simplify creating a folder
honzajavorek Apr 8, 2026
19afa70
progress with the second lesson
honzajavorek Apr 8, 2026
bb73f7f
remove noise
honzajavorek Apr 8, 2026
986751b
remove more noise
honzajavorek Apr 8, 2026
26d54ec
progress further with the second lesson
honzajavorek Apr 8, 2026
028c606
specify language
honzajavorek Apr 8, 2026
7798a94
add cursor installation
honzajavorek Apr 8, 2026
b54f085
make the images nicer and put them to webp
honzajavorek Apr 8, 2026
1673090
finish the second lesson (no proofreading yet)
honzajavorek Apr 9, 2026
39ed529
fix typo
honzajavorek Apr 9, 2026
cedbfda
proofreading
honzajavorek Apr 10, 2026
a3d5c5e
add description to the front matter
honzajavorek Apr 10, 2026
c31cd95
apply suggestions from code review
honzajavorek May 13, 2026
a64943a
fix word order
honzajavorek May 13, 2026
0da7fd5
hopefully improvements to flow of the lesson
honzajavorek May 13, 2026
2bf5e45
improving flow
honzajavorek May 13, 2026
63afd1a
update sources/academy/platform/scraping_with_apify_and_ai/01_develop…
honzajavorek May 15, 2026
1 change: 1 addition & 0 deletions .github/styles/config/vocabularies/Docs/accept.txt
@@ -1,6 +1,7 @@
SDK(s)
[Ss]torages
Crawlee
+crawlee.dev
[Aa]utoscaling
CU

2 changes: 1 addition & 1 deletion sources/academy/platform/apify_platform.md
@@ -1,7 +1,7 @@
---
title: Introduction to the Apify platform
description: Learn all about the Apify platform, all of the tools it offers, and how it can improve your overall development experience.
-sidebar_position: 7
+sidebar_position: 1
category: apify platform
slug: /apify-platform
---
2 changes: 1 addition & 1 deletion sources/academy/platform/deploying_your_code/index.md
@@ -1,7 +1,7 @@
---
title: Deploying your code to Apify
description: In this course learn how to take an existing project of yours and deploy it to the Apify platform as an Actor.
-sidebar_position: 9
+sidebar_position: 3
category: apify platform
slug: /deploying-your-code
---
@@ -1,7 +1,7 @@
---
title: Expert scraping with Apify
description: After learning the basics of Actors and Apify, learn to develop pro-level scrapers on the Apify platform with this advanced course.
-sidebar_position: 13
+sidebar_position: 6
category: apify platform
slug: /expert-scraping-with-apify
---
2 changes: 1 addition & 1 deletion sources/academy/platform/getting_started/index.md
@@ -1,7 +1,7 @@
---
title: Getting started
description: Get started with the Apify platform by creating an account and learning about Apify Console, which is where all Apify Actors are born!
-sidebar_position: 8
+sidebar_position: 2
category: apify platform
slug: /getting-started
---
@@ -0,0 +1,193 @@
---
title: Developing a scraper with AI chat
description: Use ChatGPT and Apify to build a price-tracking web scraper with no coding knowledge. Learn how AI-generated code runs automatically in the cloud.
slug: /scraping-with-apify-and-ai/developing-scraper-with-ai-chat
unlisted: true
---

**In this lesson, we'll use ChatGPT and the Apify platform to create an app for tracking prices on an e-commerce website.**

---

Want to extract data from a website? Even without knowing how to code, we can open [ChatGPT](https://chatgpt.com/) and have a scraper ready in minutes. Let's say we want to track prices from [this Sales page](https://warehouse-theme-metal.myshopify.com/collections/sales). We'd type something like:

```text
Create a scraper in JavaScript which downloads
https://warehouse-theme-metal.myshopify.com/collections/sales,
extracts all the products in Sales and saves a CSV file,
which contains:

- Product name
- Product detail page URL
- Price
```

Try it! The generated code will most likely work out of the box, but the resulting program will still have a few caveats. Some are usability issues:

- _User-operated:_ We have to run the scraper ourselves. If we're tracking price trends, we need to remember to run it daily. If we want, for example, alerts for big discounts, manually running the program isn't much better than just checking the site in a browser every day.
- _Manual data management:_ Tracking prices over time means figuring out how to organize the exported data ourselves. Processing the data could also be tricky, since different analysis tools often require different formats.

Some are technical challenges:

- _No monitoring:_ Even if we knew how to set up a server or home installation to run our scraper regularly, we'd have little insight into whether it ran successfully, what errors or warnings occurred, how long it took, or what resources it used.
- _Anti-scraping risks:_ If the target website detects our scraper, it can rate-limit or block us. Sure, we could run it from a coffee shop's Wi-Fi, but eventually the site would block that too, and we'd seriously annoy our barista.
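
To make the data-management caveat concrete, here's a rough sketch of the bookkeeping we'd otherwise have to write ourselves: turning scraped products into CSV lines stamped with the date of the run, so that results from different days can be appended to one file. The helper names `csvField` and `toCsvLines` and the date are made up for illustration. Generated code will likely look different:

```js
// Hypothetical helpers for illustration, not part of any library.

// Quote a single CSV field: wrap it in double quotes when it contains
// a quote, comma, or line break, and double any quotes inside it.
function csvField(value) {
    const text = String(value);
    return /[",\n\r]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Turn scraped products into CSV lines, each prefixed with the run date.
function toCsvLines(products, date) {
    return products.map(({ name, url, price }) =>
        [date, name, url, price].map(csvField).join(','));
}

const lines = toCsvLines(
    [{
        name: 'Sony SACS9 10" Active Subwoofer',
        url: 'https://warehouse-theme-metal.myshopify.com/products/sony-sacs9-10-inch-active-subwoofer',
        price: '$158.00',
    }],
    '2026-04-01', // example date only
);
console.log(lines.join('\n'));
```

Note that the product name contains a double quote, so even this tiny example needs careful escaping. A platform that stores and exports the data for us removes this kind of chore.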

To overcome these limitations, we'll use [Apify](https://apify.com/), a platform where our scraper can run independently of our computer.

:::info Why ChatGPT

We use ChatGPT from OpenAI in this course only because it's the most widely used AI chat. Any similar tool, such as Google Gemini or Claude by Anthropic, will do.

:::

## Creating an Apify account

First, let's [create a new Apify account](https://console.apify.com/sign-up). The signup flow takes us through a few checks to confirm we're human and that our email is valid. It adds a few steps, but it's necessary to prevent abuse of the platform.

Once we have an active account, we can start working on our scraper. Using the platform's resources costs money, but worry not: everything we cover here fits within [Apify's free tier](https://apify.com/pricing).

## Creating a new Actor

After logging in, we land on a page called **Apify Store**. Apify serves both as infrastructure where we can privately deploy and run our own scrapers, and as a marketplace where anyone can offer ready-made scrapers to others for rent. But let's hold off on exploring Apify Store for now. We'll navigate to **My Actors** under the **Development** menu:

![Apify Store welcome screen with Development menu highlighted](images/apify-nav-store.webp)

Your phone runs apps, Apify runs Actors. If we want Apify to run something for us, it must be wrapped in the Actor structure. Conveniently, the platform provides ready-made templates we can use. In **My Actors**, we'll click **Use template**:

![My Actors page with Use template button](images/apify-nav-my-actors.webp)

This opens the template selection screen. There are several templates to choose from, each for a different programming language or use case. We'll pick the first template, **Crawlee + Cheerio**. It has a yellow logo with the letters **JS**, which stands for JavaScript. That's the programming language our scraper will be written in:

![Template selection screen with Crawlee + Cheerio highlighted](images/apify-nav-templates.webp)

This opens a preview of the template, where we'll confirm our choice:

![Template preview screen with Use template button](images/apify-nav-template.webp)

And just like that, we have our first Actor! It's only a sample scraper that walks through a website and extracts page titles, but it's something we can already run, and it'll work.

## Running the sample Actor

The Actor's detail page has plenty of tabs and settings, but for now we'll stay at **Source** → **Code**. That's where the **Web IDE** is.

IDE stands for _integrated development environment_. Fear not, it's just jargon for “an app for editing code, somewhat comfortably”. In the Web IDE, we can browse the files the Actor is made of, and change their contents.

![Web IDE](images/apify-web-ide.webp)

But for now, we'll hold off on changing anything. First, let's check that the Actor works. We'll hit the **Build** button, which tells the platform to take all the Actor files and prepare the program so we can run it.

The _build_ takes approximately one minute to finish. When done, the button becomes a **Start** button. Finally, we are ready. Let's press it!

The scraper starts running, and after another short wait, the first rows start to appear in the output table.

![Sample Actor output](images/apify-output-sample.webp)

In the end, we should get around 100 results, which we can immediately export to several formats suitable for data analysis, including ones that MS Excel or Google Sheets can open.

## Modifying the code with ChatGPT

Of course, we don't want page titles. We want a scraper that tracks e-commerce prices. Let's prompt ChatGPT to change the code so that it scrapes the [Sales page](https://warehouse-theme-metal.myshopify.com/collections/sales).

:::info The Warehouse store

In this course, we'll scrape a real e-commerce site instead of artificial playgrounds or sandboxes. Shopify, a major e-commerce platform, has a demo store at [warehouse-theme-metal.myshopify.com](https://warehouse-theme-metal.myshopify.com/). It strikes a good balance between being realistic and stable enough for a tutorial.

:::

We'll open **New chat** in [ChatGPT](https://chatgpt.com/) and prepare the beginning of a prompt like this:

```text
I'm building an Apify Actor that will run on the Apify platform.
I need to modify a sample template project so it downloads
https://warehouse-theme-metal.myshopify.com/collections/sales,
extracts all products in Sales, and returns data with
the following information for each product:

- Product name
- Product detail page URL
- Price

Before the program ends, it should log how many products it collected.
Code from routes.js follows. Reply with a code block containing
a new version of that file.
```

Now let's switch back to Apify. In **Source** → **Code**, where we have the Web IDE, we'll select a file called `routes.js` inside the `src` folder. We'll see code similar to this:

```js
import { createCheerioRouter } from '@crawlee/cheerio';

export const router = createCheerioRouter();

router.addDefaultHandler(async ({ enqueueLinks, request, $, log, pushData }) => {
    log.info('enqueueing new URLs');
    await enqueueLinks();

    // Extract title from the page.
    const title = $('title').text();
    log.info(`${title}`, { url: request.loadedUrl });

    // Save url and title to Dataset - a table-like storage.
    await pushData({ url: request.loadedUrl, title });
});
```

We'll select the full contents of the `routes.js` file and copy them to our clipboard. Then we'll switch back to ChatGPT, use <kbd>Shift+↵</kbd> to add a few empty lines below our unfinished prompt, and paste the copied code.

After we submit it, ChatGPT should return a large code block with a new version of `routes.js`. We'll copy it, switch back to the Web IDE, and replace the original `routes.js` content.
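
ChatGPT's answer will differ from chat to chat, but to give a rough idea of what to expect, here's a minimal sketch of the core logic such a file might contain. Everything specific in it is an assumption: the CSS selectors are guesses about the store's HTML, and the names `SELECTORS`, `extractProducts`, and `handleSalesPage` are made up for illustration. The sketch also leaves out the `import` and `export` lines from the template and only notes in a comment where the handler would be registered:

```js
// A sketch only: the CSS selectors below are assumptions about the store's HTML,
// and the code ChatGPT actually returns will likely look different.
const SELECTORS = {
    product: '.product-item',
    title: '.product-item__title',
    price: '.price',
};

// Collect name, URL, and price for every product on the page.
// `$` is the Cheerio object that Crawlee passes to the handler.
function extractProducts($, pageUrl) {
    return $(SELECTORS.product).toArray().map((element) => {
        const title = $(element).find(SELECTORS.title).first();
        const price = $(element).find(SELECTORS.price).first();
        return {
            name: title.text().trim(),
            // Links in the HTML may be relative, so resolve them against the page URL.
            url: new URL(title.attr('href') ?? '', pageUrl).href,
            price: price.text().trim(),
        };
    });
}

// The handler itself. In routes.js, it would be registered on the template's
// router with router.addDefaultHandler(handleSalesPage).
async function handleSalesPage({ $, request, log, pushData }) {
    const products = extractProducts($, request.loadedUrl);
    await pushData(products);
    log.info(`Collected ${products.length} products`);
}
```

Compared to the sample code, there's no `enqueueLinks()` call, because we only want the single Sales page rather than a walk through the whole website.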

And that's it, our scraper is ready!

## Changing Actor input

Almost ready… Before we test whether the new code works, we should also change what the Actor takes as input. The sample scraper walked through whatever website it got in the **Start URLs** input field, but we want our new scraper to use the Warehouse store Sales URL:

```text
https://warehouse-theme-metal.myshopify.com/collections/sales
```

Let's navigate through the tabs to **Source** → **Input**, change the URL, and click the **Save** button, which is somewhat hidden below the form:

![Actor input](images/apify-input.webp)

Now we're finally all set.
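
Behind the form, the Actor's input is just structured data. As a hedged illustration, assuming the template reads a field named `startUrls` that holds a list of objects with a `url` property, the value we've just saved would look roughly like what this sketch prints. The `buildInput` helper is hypothetical and only here to show the shape:

```js
// Hypothetical helper; assumes the template reads an input field named `startUrls`.
function buildInput(startUrl) {
    // new URL() throws a TypeError on malformed URLs, which catches typos early.
    const { href } = new URL(startUrl);
    return { startUrls: [{ url: href }] };
}

const input = buildInput('https://warehouse-theme-metal.myshopify.com/collections/sales');
console.log(JSON.stringify(input, null, 2));
```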

## Scraping products

After our changes, the main button we previously used for building and running conveniently became a **Save, Build & Start** button. Let's press it and see what happens!

Our project will automatically go through all phases, and then, in a minute or so, we should see the results appearing in the output area.

![Warehouse scraper output](images/apify-output-warehouse.webp)

At this point, we haven't told the platform much about the data we expect, so the **Overview** pane lists only product URLs. But if we go to **All fields**, we'll see that it really scraped everything we asked for:

| name | url | price |
| --- | --- | --- |
| JBL Flip 4 Waterproof Portable Bluetooth Speaker | https://warehouse-theme-metal.myshopify.com/products/jbl-flip-4-waterproof-portable-bluetooth-speaker | Sale price$74.95 |
| Sony XBR-950G BRAVIA 4K HDR Ultra HD TV | https://warehouse-theme-metal.myshopify.com/products/sony-xbr-65x950g-65-class-64-5-diag-bravia-4k-hdr-ultra-hd-tv | Sale priceFrom $1,398.00 |
| Sony SACS9 10" Active Subwoofer | https://warehouse-theme-metal.myshopify.com/products/sony-sacs9-10-inch-active-subwoofer | Sale price$158.00 |

…and so on. Looks good!

Well, does it? If we look closely, the prices include extra text, which isn't ideal. We'll improve this in the next lesson.
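
To see what kind of cleanup is involved, here's a small sketch of one possible approach. It isn't the solution from the next lesson. The `parsePrice` helper is hypothetical, and it assumes prices always use the US format shown in the table above:

```js
// Hypothetical helper for illustration only.
function parsePrice(rawText) {
    // "From" signals that the number is the lowest price among several variants.
    const isMinimum = rawText.includes('From');
    // Grab the first thing that looks like an amount:
    // digits, optional thousands commas, optional decimals.
    const match = rawText.match(/\d[\d,]*(?:\.\d+)?/);
    if (!match) return null;
    const amount = Number(match[0].replace(/,/g, ''));
    return { amount, isMinimum };
}

console.log(parsePrice('Sale price$74.95')); // { amount: 74.95, isMinimum: false }
console.log(parsePrice('Sale priceFrom $1,398.00')); // { amount: 1398, isMinimum: true }
```

Storing the amount as a number and the "From" flag separately is what would later let us compare prices between runs.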

:::tip If output doesn't appear

If the scraper doesn't produce any rows, make sure you changed the input URL and applied all code changes.

If that doesn't help, check the **Log** next to **Output**. You can copy the whole log, paste it into ChatGPT, and let it figure out what went wrong.

If you're still stuck, open a clean new chat in ChatGPT and try the same prompt for `routes.js` again.

:::

## Wrapping up

Despite a few flaws, we've successfully created our first working prototype of a price-watching app with no coding knowledge.

And thanks to Apify, our scraper can [run automatically on a weekly basis](https://docs.apify.com/platform/schedules), we have its output [ready to download in a variety of formats](https://docs.apify.com/platform/storage/dataset), we can [monitor its runs](https://docs.apify.com/platform/monitoring), and we can [work around anti-scraping measures](https://docs.apify.com/platform/proxy).

To improve our project further, we'd copy the code, ask ChatGPT to refine it, paste it back into the Web IDE, and rebuild.

Sounds tedious? In the next lesson, we'll take a look at how we can get the Actor code onto our computer and use the Cursor IDE with a built-in AI agent instead of the Web IDE, so we can develop our scraper faster and with less back-and-forth.