Our mission is to enable all people to do the best work of their lives—the first act in achieving that mission is to help companies automate tedious but critical business processes. This RPA challenge showcases your ability to build a bot for process automation.
My challenge was to automate the process of extracting data from a news site. The goal was to demonstrate the ability to build an RPA bot that can perform a series of automated actions to retrieve, process, and store news data.
For this challenge, I used only the Al Jazeera news website: https://www.aljazeera.com/
The process handles two main parameters via the Robocloud work item:
- news_topic: A list of search phrases (in Python list format).
- period_months: The number of months for which you need to receive news.
Example values:
- news_topic: ["climate change", "politics", "technology"]
- period_months: 3
These parameters were provided via a Robocloud work item, allowing dynamic control over the bot's search criteria and the time frame for retrieving news articles.
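As a minimal sketch of how these two parameters might be validated and turned into a date window, the snippet below works on a plain dictionary shaped like the work item payload. The helper names `parse_parameters` and `earliest_date` are hypothetical, and the reading of `period_months` (0 or 1 means the current month only, 2 adds the previous month, and so on) is an assumption rather than something this README specifies. Fetching the payload from the work item itself is left out.

```python
from datetime import date


def parse_parameters(payload: dict) -> tuple[list[str], int]:
    """Validate and normalize the work item payload (hypothetical helper)."""
    topics = payload.get("news_topic", [])
    # Accept a single phrase as well as a list of phrases.
    if isinstance(topics, str):
        topics = [topics]
    topics = [t.strip() for t in topics if isinstance(t, str) and t.strip()]
    if not topics:
        raise ValueError("news_topic must contain at least one search phrase")

    months = int(payload.get("period_months", 1))
    if months < 0:
        raise ValueError("period_months must not be negative")
    return topics, months


def earliest_date(period_months: int, today: date) -> date:
    """Return the first day of the oldest month to include.

    Assumption: 0 or 1 means the current month only, 2 means the current
    and the previous month, and so on.
    """
    months_back = max(period_months, 1) - 1
    # Work with a running month index so year boundaries need no special case.
    total = today.year * 12 + (today.month - 1) - months_back
    return date(total // 12, total % 12 + 1, 1)
```

Articles dated on or after the value returned by `earliest_date` would then count as inside the requested period.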
1. Open the site by navigating to https://www.aljazeera.com/.
2. Enter a phrase in the search field and initiate the search.
3. On the result page:
   - Select a news category or section from the available options, if applicable.
   - Choose the latest news articles.
4. Extract the following values for each article:
   - Title
   - Date
   - Description
   - Picture filename
   - Count of search phrases in the title and description
   - True or False, depending on whether the title or description contains any amount of money
   Possible formats for amounts of money:
   - $11.1
   - $111,111.11
   - 11 dollars
   - 11 USD
5. Store the extracted data in an Excel file with columns:
   - Title
   - Date
   - Description
   - Picture filename
   - Count of search phrases in the title and description
   - True or False for the presence of monetary amounts
6. Download the news picture and specify the file name in the Excel file.
7. Repeat steps 4-6 for all news articles that fall within the required time period.
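The two text checks performed on each article (counting the search phrase and detecting an amount of money) could be sketched as follows. This is an illustration and not necessarily the bot's actual code: the names `MONEY_PATTERN`, `contains_money`, and `count_phrase` are hypothetical, the regular expression only targets the four money formats listed above, and the count is a case-insensitive substring count, which is an assumption about how matches should be tallied.

```python
import re

# Targets the four listed formats: $11.1, $111,111.11, 11 dollars, 11 USD.
MONEY_PATTERN = re.compile(
    r"\$\s?\d[\d,]*(?:\.\d+)?"                       # $11.1 or $111,111.11
    r"|\b\d[\d,]*(?:\.\d+)?\s+(?:dollars?|USD)\b",   # 11 dollars or 11 USD
    re.IGNORECASE,
)


def contains_money(title: str, description: str) -> bool:
    """True if the title or description mentions an amount of money."""
    return bool(MONEY_PATTERN.search(f"{title} {description}"))


def count_phrase(phrase: str, title: str, description: str) -> int:
    """Count case-insensitive occurrences of the phrase in both fields.

    Substring counting is an assumption: "politics" would also be counted
    inside "geopolitics".
    """
    text = f"{title} {description}".lower()
    return text.count(phrase.lower())
```

For example, `count_phrase("climate change", title, description)` gives the value for the count column, and `contains_money(title, description)` gives the True or False column.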
1. Clone the Repository:
   git clone [your-public-repo-link]
   cd [your-repo-directory]
2. Create a Robocorp Control Room Process:
   - Follow the Robocorp Control Room setup guide.
   - Create a new process in Robocorp Control Room.
   - Upload your code to the process.
3. Configure Parameters in Robocorp:
   - Define the news_topic and period_months parameters within the Robocloud work item.
   - Example configuration:
     { "news_topic": ["climate change", "politics", "technology"], "period_months": 3 }
4. Ensure Successful Run:
   - Run the process and ensure it completes successfully.
   - Write the output files to the /output directory to make them visible in the artifacts list.
5. Invite Reviewers:
   - Once completed, invite [email protected] to your Robocorp Org.
The output is an Excel file stored in the /output directory containing the extracted news data with the following columns:
- Title
- Date
- Description
- Picture filename
- Count of search phrases in the title and description
- True or False for the presence of monetary amounts
Additionally, the images downloaded from the news articles are stored in the same directory.
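Because the picture file name is recorded in the Excel file and also used for the saved image, it helps to derive it in one place. The sketch below shows one possible way to do that from an image URL; `picture_filename`, `picture_path`, the fallback name, and the relative `output` path are all hypothetical choices made for illustration.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import unquote, urlparse

OUTPUT_DIR = Path("output")  # assumed location of the artifacts directory


def picture_filename(image_url: str) -> str:
    """Derive a file name from the image URL, ignoring any query string."""
    url_path = unquote(urlparse(image_url).path)
    name = PurePosixPath(url_path).name
    # Fallback name is a hypothetical choice for URLs without a file part.
    return name or "unnamed.jpg"


def picture_path(image_url: str, output_dir: Path = OUTPUT_DIR) -> Path:
    """Full path where the downloaded picture would be written."""
    return output_dir / picture_filename(image_url)
```

The string returned by `picture_filename` is what would go into the picture filename column, while `picture_path` gives the location of the downloaded file next to the Excel file.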
This challenge demonstrates the ability to automate data extraction from the Al Jazeera news site using RPA tools and techniques. The solution handles parameterized inputs, processes the data, and stores the results efficiently.
Feel free to reach out for any clarifications or further assistance.