README.Rmd

---
output: github_document
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# newscatcheR

<!-- badges: start -->

[![CRAN\_Status\_Badge](https://www.r-pkg.org/badges/version/newscatcheR)](https://cran.r-project.org/package=newscatcheR)
[![CRAN\_Download\_Badge](https://cranlogs.r-pkg.org/badges/newscatcheR)](https://CRAN.R-project.org/package=newscatcheR)
[![CRAN\_Download\_Badge](https://cranlogs.r-pkg.org/badges/grand-total/newscatcheR)](https://CRAN.R-project.org/package=newscatcheR)
[![R-CMD-check](https://github.com/discindo/newscatcheR/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/discindo/newscatcheR/actions/workflows/R-CMD-check.yaml)
![pkgdown](https://github.com/discindo/newscatcheR/workflows/pkgdown/badge.svg)
<!-- [![codecov](https://codecov.io/gh/discindo/newscatcheR/graph/badge.svg?token=u3BlKDT8b3)](https://codecov.io/gh/discindo/newscatcheR)>

<!-- badges: end -->

Programmatically collect normalized news from (almost) any website using R

newscatcheR is an R clone of the python package [newscatcher](https://github.com/kotartemiy/newscatcher).  

The package provides a dataset of news sites and their rss feeds, together with some characteristics of the websites such as the topic, country or language of the website, and few functions explore and access the feeds from `R`.

Two functions that work as a wrapper around [tidyRSS](https://github.com/RobertMyles/tidyRSS) can be used to fetch the feed from a given website. Two additional functions can be used to conveniently browse the websites dataset.

## Installation

You can install the released version of newscatcheR from [CRAN](https://CRAN.R-project.org) with:

``` r
install.packages("newscatcheR")
```

And the development version from [GitHub](https://github.com/) with:

```r
# install.packages("remotes")
remotes::install_github("discindo/newscatcheR")
```

Or

``` r
# install.packages("devtools")
devtools::install_github("discindo/newscatcheR")
```
## Overview

`get_news(website)` returns the contents of a rss feed of a website. 

```{r example_1}
library(newscatcheR)
# adding a small time delay to avoid simultaneous posts to the API
Sys.sleep(3)
get_news("ycombinator.com")
```

`get_headlines(website)` returns just the headlines of the website's rss feed.

```{r example_2}
library(newscatcheR)
# adding a small time delay to avoid simultaneous posts to the API
Sys.sleep(3)  
get_headlines("ycombinator.com")
```

`describe_url(website)` returns the topics of a given web site. 

```{r example_3}
describe_url("bbc.com")
```

`filter_urls(topic, country, language )` can be used to browse the dataset by topic, country or language.

```{r example_4}
filter_urls(topic = "tech", country = "IT", language = "it")
```

## Use Case

This package can be convenient if you need to fetch news from various websites for further analysis and you don't want to search manually for the URL of their RSS feeds.

Assuming we have the news sites we want to follow:

```{r example5, eval = FALSE}
sites = c("bbc.com", "spiegel.de", "washingtonpost.com")
```

We can get a list of data frames with:

```{r example_6, eval = FALSE}
library(newscatcheR)
lapply(sites, get_news)
```