A high-performance, generic RSS parser for Rust that supports streaming parsing from any async input source including files, TCP streams, HTTP responses, and more.
- 🚀 Generic AsyncRead Support: Parse RSS from files, TCP streams, HTTP responses, or any AsyncRead source
- 🔄 Streaming Interface: Implements tokio_stream::Stream for memory-efficient processing of large feeds
- 📊 Gradual Parsing: Process RSS items one at a time without loading the entire feed into memory
- 🏷️ CDATA Support: Handles both regular text content and CDATA sections
- 🔤 Case Insensitive: Robust parsing of RSS feeds with inconsistent tag casing
- ⚡ Async/Await: Built on tokio for high-performance async I/O
- 🛡️ Type Safe: Leverage Rust's type system with custom RSS item structures
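As a rough illustration of the CDATA and case-insensitivity points above, the sketch below shows one way a tag name could be normalized and a text value chosen. Both `normalize_tag` and `text_or_cdata` are hypothetical helpers written for this example, not part of the crate's API; they only mirror the behavior the feature list describes.

```rust
/// Hypothetical helper: lowercases a tag name so that `pubDate`, `PUBDATE`
/// and `pubdate` all compare equal.
fn normalize_tag(raw: &str) -> String {
    raw.trim().to_ascii_lowercase()
}

/// Hypothetical helper: prefers plain text content and falls back to the
/// CDATA content when no plain text is present.
fn text_or_cdata(value: Option<String>, cdata: Option<String>) -> Option<String> {
    value.or(cdata)
}
```

Under this assumption, code that matches on tag names would compare against the lowercase form, for example "pubdate" rather than "pubDate".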
Add this to your Cargo.toml:
[dependencies]
rss-parser = "0.1.0"
tokio = { version = "1.0", features = ["full"] }
quick-xml = "0.31"
tokio-stream = "0.1"
use rss_parser::{GradualRssItem, XmlNode};
#[derive(Debug)]
struct Article {
    title: Option<String>,
    description: Option<String>,
    link: Option<String>,
    pub_date: Option<String>,
}
impl GradualRssItem for Article {
    fn init() -> Self {
        Article {
            title: None,
            description: None,
            link: None,
            pub_date: None,
        }
    }
    fn populate(&mut self, node: XmlNode) {
        // Tag names arrive lowercased, so <pubDate> is matched as "pubdate"
        match node.tag.as_str() {
            "title" => self.title = node.value.or(node.cdata),
            "description" => self.description = node.value.or(node.cdata),
            "link" => self.link = node.value.or(node.cdata),
            "pubdate" => self.pub_date = node.value.or(node.cdata),
            _ => {}
        }
    }
}
use rss_parser::RssParser;
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut parser = RssParser::<Article, _>::from_file("feed.xml").await?;
    while let Some(article) = parser.next().await {
        println!("Title: {:?}", article.title);
        println!("Link: {:?}", article.link);
    }
    Ok(())
}
// Requires reqwest with the "stream" feature and tokio-util with the "io" feature,
// in addition to the dependencies listed above.
use rss_parser::RssParser;
use tokio_stream::StreamExt; // provides map() on the byte stream
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let response = reqwest::get("https://example.com/feed.xml").await?;
    let stream = response.bytes_stream();
    // Convert bytes stream to AsyncRead
    let reader = tokio_util::io::StreamReader::new(
        stream.map(|result| result.map_err(std::io::Error::other))
    );
    let mut parser = RssParser::<Article, _>::new(reader).await?;
    while let Some(article) = parser.next().await {
        println!("Article: {:?}", article);
    }
    Ok(())
}
use tokio::net::TcpStream;
use rss_parser::RssParser;
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Assumes the peer starts sending RSS XML as soon as the connection opens;
    // an HTTP server on port 80 would expect a request first.
    let stream = TcpStream::connect("example.com:80").await?;
    let parser = RssParser::<Article, _>::from_tcp(stream).await?;
    // Use as stream
    use tokio_stream::StreamExt;
    let articles: Vec<Article> = parser.collect().await;
    println!("Parsed {} articles", articles.len());
    Ok(())
}
use tokio_stream::StreamExt;
use rss_parser::RssParser;
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut parser = RssParser::<Article, _>::from_file("feed.xml").await?;
    // Process articles as they're parsed.
    // tokio_stream::StreamExt has no for_each, so drive the parser with next().
    while let Some(article) = parser.next().await {
        println!("Processing: {:?}", article.title);
        // Process article...
    }
    Ok(())
}
The parser accepts any type implementing AsyncRead + Unpin:
use std::io::Cursor;
use rss_parser::RssParser;
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let rss_data = r#"<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <item>
      <title>Example Article</title>
      <description>This is an example</description>
    </item>
  </channel>
</rss>"#;
    let cursor = Cursor::new(rss_data.as_bytes());
    let mut parser = RssParser::<Article, _>::new(cursor).await?;
    if let Some(article) = parser.next().await {
        println!("Parsed: {:?}", article);
    }
    Ok(())
}
use tokio_stream::StreamExt;
let parser = RssParser::<Article, _>::from_file("feed.xml").await?;
let recent_articles: Vec<Article> = parser
    .filter(|article| {
        // Filter articles based on some criteria
        article.title.as_ref().map_or(false, |title| title.contains("Rust"))
    })
    .take(10) // Take only first 10 matching articles
    .collect()
    .await;
The main parser struct, generic over:
- T: Your RSS item type implementing GradualRssItem
- R: The input source implementing AsyncRead + Unpin

Methods:
- new(input: R) -> Result<Self, std::io::Error>: Create parser from any AsyncRead source
- from_file(path: &str) -> Result<Self, std::io::Error>: Convenience constructor for files
- from_tcp(stream: TcpStream) -> Result<Self, std::io::Error>: Convenience constructor for TCP streams
- next(&mut self) -> Option<T>: Parse and return the next RSS item
- Implements Stream<Item = T> for use with tokio-stream
Implement this trait for your RSS item structures:
pub trait GradualRssItem {
    fn init() -> Self;
    fn populate(&mut self, node: XmlNode);
}
Represents a parsed XML node:
pub struct XmlNode {
    pub tag: String,           // The XML tag name (lowercase)
    pub value: Option<String>, // Text content
    pub cdata: Option<String>, // CDATA content
}
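To make the relationship between the trait and the node type more concrete, here is a minimal, self-contained sketch of how an item might be assembled from nodes. The `XmlNode` and `GradualRssItem` definitions are local copies of the ones shown above so the example compiles without the crate. `Headline`, `build_item`, and `sample_headline` are hypothetical names, and the call order (one `init()`, then one `populate()` per child element) is an assumption about how the parser drives the trait, not a description of its internals.

```rust
// Local copies of the public types shown above, so this sketch stands alone.
pub struct XmlNode {
    pub tag: String,
    pub value: Option<String>,
    pub cdata: Option<String>,
}

pub trait GradualRssItem {
    fn init() -> Self;
    fn populate(&mut self, node: XmlNode);
}

// Hypothetical item type that only cares about two fields.
#[derive(Debug, Default)]
struct Headline {
    title: Option<String>,
    link: Option<String>,
}

impl GradualRssItem for Headline {
    fn init() -> Self {
        Headline::default()
    }

    fn populate(&mut self, node: XmlNode) {
        match node.tag.as_str() {
            "title" => self.title = node.value.or(node.cdata),
            "link" => self.link = node.value.or(node.cdata),
            _ => {} // unknown tags are ignored
        }
    }
}

// Hypothetical driver: creates an empty item, then feeds it each node in turn.
fn build_item<T: GradualRssItem>(nodes: Vec<XmlNode>) -> T {
    let mut item = T::init();
    for node in nodes {
        item.populate(node);
    }
    item
}

// Builds a Headline from three hand-written nodes; the "guid" node is ignored.
fn sample_headline() -> Headline {
    build_item(vec![
        XmlNode { tag: "title".to_string(), value: Some("Hello".to_string()), cdata: None },
        XmlNode { tag: "link".to_string(), value: None, cdata: Some("https://example.com/a".to_string()) },
        XmlNode { tag: "guid".to_string(), value: Some("ignored".to_string()), cdata: None },
    ])
}
```

Because `populate` takes the node by value, the item can move the strings out of it instead of cloning them.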
The parser is designed for high performance and low memory usage:
- Streaming: Processes RSS items one at a time without loading the entire feed into memory
- Zero-copy: Minimizes string allocations where possible
- Async: Non-blocking I/O for handling multiple feeds concurrently
The parser uses Rust's standard error handling patterns:
- Constructor methods return Result<RssParser<T, R>, std::io::Error>
- next() returns Option<T>
- None indicates end of feed or parse error
- Malformed XML is handled gracefully, skipping problematic sections when possible
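Since None does not distinguish a clean end of feed from a parse error, and malformed sections may be skipped, a caller may want to check that each returned item is complete. The sketch below is one possible approach under that assumption. `Article` is a local copy of the example struct from earlier, and `missing_fields` is a hypothetical helper that is not provided by the crate.

```rust
// Local copy of the example item type, so this sketch stands alone.
#[derive(Debug, Default)]
struct Article {
    title: Option<String>,
    description: Option<String>,
    link: Option<String>,
    pub_date: Option<String>,
}

/// Hypothetical helper: returns the names of the fields that were never
/// populated, in declaration order.
fn missing_fields(article: &Article) -> Vec<&'static str> {
    let checks = [
        ("title", article.title.is_none()),
        ("description", article.description.is_none()),
        ("link", article.link.is_none()),
        ("pub_date", article.pub_date.is_none()),
    ];
    checks
        .iter()
        .filter(|(_, missing)| *missing)
        .map(|(name, _)| *name)
        .collect()
}
```

A caller could log or discard any article for which `missing_fields` returns a non-empty list.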
- Rust 1.75+
- tokio runtime
- quick-xml for XML parsing
- tokio-stream for Stream implementation
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
cargo test
cargo run --example basic_usage
cargo run --example http_parsing
cargo run --example stream_processing
This project is licensed under the MIT License - see the LICENSE file for details.
- Initial release
- Generic AsyncRead support
- Stream trait implementation
- File and TCP convenience constructors
- CDATA support
- Case-insensitive parsing
- quick-xml - Fast XML parser used internally
- tokio - Async runtime
- feed-rs - Alternative feed parser with more format support
Note: This parser is specifically designed for RSS feeds. For Atom feeds or other syndication formats, consider using a more comprehensive feed parsing library.