diff --git a/Cargo.toml b/Cargo.toml
index 48f9930..9728f3c 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -1,6 +1,6 @@
 [package]
 name = "pregel-rs"
-version = "0.0.11"
+version = "0.0.12"
 authors = [ "Ángel Iglesias Préstamo " ]
 description = "A Graph library written in Rust for implementing your own algorithms in a Pregel fashion"
 documentation = "https://docs.rs/crate/pregel-rs/latest"
diff --git a/README.md b/README.md
index 2b45c03..8a11925 100644
--- a/README.md
+++ b/README.md
@@ -6,73 +6,73 @@
 [![documentation](https://img.shields.io/docsrs/pregel-rs/latest)](https://docs.rs/pregel-rs/latest/pregel_rs/)

 `pregel-rs` is a Graph processing library written in Rust that features
-a Pregel-based Framework for implementing your own algorithms in a
-message-passing fashion. It is designed to be efficient and scalable,
+a Pregel-based Framework for implementing your own algorithms in a
+message-passing fashion. It is designed to be efficient and scalable,
 making it suitable for processing large-scale graphs.

 ## Features

 - _Pregel-based framework_: `pregel-rs` is a powerful graph processing model
-that allows users to implement graph algorithms in a message-passing fashion,
-where computation is performed on vertices and messages are passed along edges.
-`pregel-rs` provides a framework that makes it easy to implement graph
-algorithms using this model.
+  that allows users to implement graph algorithms in a message-passing fashion,
+  where computation is performed on vertices and messages are passed along edges.
+  `pregel-rs` provides a framework that makes it easy to implement graph
+  algorithms using this model.

-- _Rust-based implementation_: `pregel-rs` is implemented in Rust, a systems
-programming language known for its safety, concurrency, and performance.
-Rust's strong type system and memory safety features help ensure that `pregel-rs`
-is robust and reliable.
+- _Rust-based implementation_: `pregel-rs` is implemented in Rust, a systems
+  programming language known for its safety, concurrency, and performance.
+  Rust's strong type system and memory safety features help ensure that `pregel-rs`
+  is robust and reliable.

 - _Efficient and scalable_: `pregel-rs` designed to be efficient and scalable,
-making it suitable for processing large-scale graphs. It uses parallelism and
-optimization techniques to minimize computation and communication overhead,
-allowing it to handle graphs with millions or even billions of vertices and edges.
-For us to achieve this, we have built it on top of [polars](https://github.com/pola-rs/polars)
-a blazingly fast DataFrames library implemented in Rust using Apache Arrow
-Columnar Format as the memory model.
-
-- _Graph abstraction_: `pregel-rs` provides a graph abstraction that makes
-it easy to represent and manipulate graphs in Rust. It supports both directed and
-undirected graphs, and provides methods for adding, removing, and querying vertices
-and edges.
+  making it suitable for processing large-scale graphs. It uses parallelism and
+  optimization techniques to minimize computation and communication overhead,
+  allowing it to handle graphs with millions or even billions of vertices and edges.
+  For us to achieve this, we have built it on top of [polars](https://github.com/pola-rs/polars)
+  a blazingly fast DataFrames library implemented in Rust using Apache Arrow
+  Columnar Format as the memory model.
+
+- _Graph abstraction_: `pregel-rs` provides a graph abstraction that makes
+  it easy to represent and manipulate graphs in Rust. It supports both directed and
+  undirected graphs, and provides methods for adding, removing, and querying vertices
+  and edges.

 - _Customizable computation_: `pregel-rs` allows users to implement their own
-computation logic by defining vertex computation functions. This gives users the
-flexibility to implement their own graph algorithms and customize the behavior
-of `pregel-rs` to suit their specific needs.
+  computation logic by defining vertex computation functions. This gives users the
+  flexibility to implement their own graph algorithms and customize the behavior
+  of `pregel-rs` to suit their specific needs.

 ## Getting started

 To get started with `pregel-rs`, you can follow these steps:

 1. _Install Rust_: `pregel-rs` requires Rust to be installed on your system.
-You can install Rust by following the instructions on the official Rust website:
-https://www.rust-lang.org/tools/install
+   You can install Rust by following the instructions on the official Rust website:
+   https://www.rust-lang.org/tools/install

 2. _Create a new Rust project_: Once Rust is installed, you can create a new Rust
-project using the Cargo package manager, which is included with Rust. You can
-create a new project by running the following command in your terminal:
+   project using the Cargo package manager, which is included with Rust. You can
+   create a new project by running the following command in your terminal:

 ```sh
 cargo new my_pregel_project
 ```

-3. _Add `pregel-rs` as a dependency_: Next, you need to add `pregel-rs` as a
-dependency in your `Cargo.toml` file, which is located in the root directory
-of your project. You can add the following line to your `Cargo.toml` file:
+3. _Add `pregel-rs` as a dependency_: Next, you need to add `pregel-rs` as a
+   dependency in your `Cargo.toml` file, which is located in the root directory
+   of your project. You can add the following line to your `Cargo.toml` file:

 ```toml
 [dependencies]
-pregel-rs = "0.0.11"
+pregel-rs = "0.0.12"
 ```

 4. _Implement your graph algorithm_: Now you can start implementing your graph
-algorithm using the `pregel-rs` framework. You can define your vertex computation
-functions and use the graph abstraction provided by `pregel-rs` to manipulate the graph.
+   algorithm using the `pregel-rs` framework. You can define your vertex computation
+   functions and use the graph abstraction provided by `pregel-rs` to manipulate the graph.

 5. _Build and run your project_: Once you have implemented your graph algorithm, you
-can build and run your project using the Cargo package manager. You can build your
-project by running the following command in your terminal:
+   can build and run your project using the Cargo package manager. You can build your
+   project by running the following command in your terminal:

 ```sh
 cargo build
@@ -87,14 +87,14 @@ cargo run
 ## Acknowledgments

 Read [Pregel: A System for Large-Scale Graph Processing](https://15799.courses.cs.cmu.edu/fall2013/static/papers/p135-malewicz.pdf)
-for a reference on how to implement your own Graph processing algorithms in a Pregel fashion. If you want to take some
+for a reference on how to implement your own Graph processing algorithms in a Pregel fashion. If you want to take some
 inspiration from some curated-sources, just explore the [/examples](https://github.com/angelip2303/graph-rs/tree/main/examples)
 folder of this repository.

 ## Related projects

-1. [GraphX](https://github.com/apache/spark/tree/master/graphx) is a library enabling Graph processing in the context of
-Apache Spark.
+1. [GraphX](https://github.com/apache/spark/tree/master/graphx) is a library enabling Graph processing in the context of
+   Apache Spark.
 2. [GraphFrames](https://github.com/graphframes/graphframes) is the DataFrame-based equivalent to GraphX.

 ## License
@@ -108,11 +108,11 @@ the Free Software Foundation, either version 3 of the License, or

 This program is distributed in the hope that it will be useful,
 but WITHOUT ANY WARRANTY; without even the implied warranty of
-MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 GNU General Public License for more details.

 You should have received a copy of the GNU General Public License
-along with this program. If not, see .
+along with this program. If not, see .

 **By contributing to this project, you agree to release your contributions under the same license.**
diff --git a/examples/pagerank.rs b/examples/pagerank.rs
index d00c249..b5a8b8a 100644
--- a/examples/pagerank.rs
+++ b/examples/pagerank.rs
@@ -26,6 +26,10 @@ fn main() -> Result<(), Box> {
         .max_iterations(4)
         .with_vertex_column(Custom("rank"))
         .initial_message(lit(1.0 / num_vertices))
+        .send_messages(
+            MessageReceiver::Subject,
+            Column::subject(Column::Custom("rank")) / Column::subject(Column::Custom("out_degree")),
+        )
         .send_messages(
             MessageReceiver::Object,
             Column::subject(Custom("rank")) / Column::subject(Custom("out_degree")),
diff --git a/src/pregel.rs b/src/pregel.rs
index 7a025f6..29e4d77 100644
--- a/src/pregel.rs
+++ b/src/pregel.rs
@@ -233,17 +233,6 @@ impl<'a> SendMessage<'a> {
 /// each iteration of the algorithm. The vertex program can take as input the current
 /// state of the vertex, the messages received from its neighbors or and any other
 /// relevant information.
-///
-/// * `replace_nulls`: `replace_nulls` is an expression that defines how null values
-/// in the vertex DataFrame should be replaced. This is useful when the vertex
-/// DataFrame contains null values that need to be replaced during the execution of
-/// the Pregel algorithm. As an example, when not all vertices are connected to an
-/// edge, the edge DataFrame will contain null values in the `dst` column. These
-/// null values need to be replaced.
-///
-/// * `parquet_path` is a property of the `PregelBuilder` struct that represents
-/// the path to the Parquet file where the results of the Pregel computation
-/// will be stored.
 pub struct Pregel<'a> {
     /// The `graph` property is a `GraphFrame` struct that represents the
     /// graph data structure used in the Pregel algorithm. It contains information about
@@ -277,13 +266,6 @@ pub struct Pregel<'a> {
     /// current state of the vertex, the messages received from its neighbors or
     /// and any other relevant information.
     v_prog: FnBox<'a>,
-    /// `replace_nulls` is an expression that defines how null values in the vertex
-    /// DataFrame should be replaced. This is useful when the vertex DataFrame
-    /// contains null values that need to be replaced during the execution of the
-    /// Pregel algorithm. As an example, when not all vertices are connected to an
-    /// edge, the edge DataFrame will contain null values in the `dst` column. These
-    /// null values need to be replaced.
-    replace_nulls: Expr,
 }

 /// The `PregelBuilder` struct represents a builder for configuring the Pregel
@@ -325,13 +307,6 @@ pub struct Pregel<'a> {
 /// each iteration of the algorithm. The vertex program can take as input the current
 /// state of the vertex, the messages received from its neighbors or and any other
 /// relevant information.
-///
-/// /// * `replace_nulls`: `replace_nulls` is an expression that defines how null values
-/// in the vertex DataFrame should be replaced. This is useful when the vertex
-/// DataFrame contains null values that need to be replaced during the execution of
-/// the Pregel algorithm. As an example, when not all vertices are connected to an
-/// edge, the edge DataFrame will contain null values in the `dst` column. These
-/// null values need to be replaced.
 pub struct PregelBuilder<'a> {
     /// The `graph` property is a `GraphFrame` struct that represents the
     /// graph data structure used in the Pregel algorithm. It contains information about
@@ -365,13 +340,6 @@ pub struct PregelBuilder<'a> {
     /// current state of the vertex, the messages received from its neighbors or
     /// and any other relevant information.
     v_prog: FnBox<'a>,
-    /// `replace_nulls` is an expression that defines how null values in the vertex
-    /// DataFrame should be replaced. This is useful when the vertex DataFrame
-    /// contains null values that need to be replaced during the execution of the
-    /// Pregel algorithm. As an example, when not all vertices are connected to an
-    /// edge, the edge DataFrame will contain null values in the `dst` column. These
-    /// null values need to be replaced.
-    replace_nulls: Expr,
 }

 /// This code is defining an enumeration type `MessageReceiver` in Rust with
@@ -419,7 +387,6 @@ impl<'a> PregelBuilder<'a> {
             send_messages: Default::default(),
             aggregate_messages: Box::new(Default::default),
             v_prog: Box::new(Default::default),
-            replace_nulls: Default::default(),
         }
     }

@@ -665,27 +632,6 @@ impl<'a> PregelBuilder<'a> {
         self
     }

-    /// This function sets the value of a field called "replace_nulls" in a struct to a
-    /// given expression and returns the modified struct.
-    ///
-    /// Arguments:
-    ///
-    /// * `replace_nulls`: `replace_nulls` is a parameter of type `Expr` that is used in
-    /// a method of a struct. The method takes ownership of the struct (`self`) and the
-    /// `replace_nulls` parameter, and sets the `replace_nulls` field of the struct to the
-    /// value of the `replace_nulls` parameter.
-    ///
-    /// Returns:
-    ///
-    /// The `replace_nulls` method returns `Self`, which refers to the same struct
-    /// instance that the method was called on. This allows for method chaining, where
-    /// multiple methods can be called on the same struct instance in a single
-    /// expression.
-    pub fn replace_nulls(mut self, replace_nulls: Expr) -> Self {
-        self.replace_nulls = replace_nulls;
-        self
-    }
-
     /// The function returns a Pregel struct with the specified properties. This is,
     /// Pregel structs are to be created using the `Builder Pattern`, a creational
     /// design pattern that provides a way to construct complex structs in a
@@ -743,7 +689,6 @@ impl<'a> PregelBuilder<'a> {
             send_messages: self.send_messages,
             aggregate_messages: self.aggregate_messages,
             v_prog: self.v_prog,
-            replace_nulls: self.replace_nulls,
         }
     }
 }
@@ -898,7 +843,6 @@ impl<'a> Pregel<'a> {
                     col(Column::VertexId.as_ref()), // id column of the current_vertices DataFrame
                     Column::msg(Some(Column::VertexId)), // msg.id column of the message_df DataFrame
                 )
-                .with_column(Column::msg(None).fill_null(self.replace_nulls.to_owned()))
                 .select(&[
                     col(Column::VertexId.as_ref()),
                     v_prog().alias(self.vertex_column.as_ref()),
@@ -971,7 +915,11 @@ mod tests {
             .max_iterations(iterations)
             .with_vertex_column(Column::Custom("rank"))
             .initial_message(lit(1.0 / num_vertices))
-            .replace_nulls(lit(0.0))
+            .send_messages(
+                MessageReceiver::Subject,
+                Column::subject(Column::Custom("rank"))
+                    / Column::subject(Column::Custom("out_degree")),
+            )
             .send_messages(
                 MessageReceiver::Object,
                 Column::subject(Column::Custom("rank"))
@@ -1077,7 +1025,6 @@ mod tests {
             v_prog: Box::new(|| {
                 max_exprs([col(Column::Custom("max_value").as_ref()), Column::msg(None)])
             }),
-            replace_nulls: lit(0),
         })
     }
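
Reviewer note: the removed `replace_nulls` doc comments describe filling in null message values for vertices that receive no message after the join. The sketch below illustrates that idea in plain Rust, without polars. The names `join_messages` and `fill_missing` are hypothetical and do not exist in `pregel-rs`; `Option<f64>` stands in for a nullable DataFrame column. It is an illustration of the concept under those assumptions, not the crate's implementation.

```rust
use std::collections::HashMap;

/// Sum the messages addressed to each vertex, then left-join the sums onto the
/// full vertex list. A vertex that received nothing gets `None`, which plays
/// the role of a null in the joined DataFrame.
fn join_messages(vertices: &[u32], messages: &[(u32, f64)]) -> Vec<(u32, Option<f64>)> {
    let mut sums: HashMap<u32, f64> = HashMap::new();
    for &(dst, value) in messages {
        *sums.entry(dst).or_insert(0.0) += value;
    }
    vertices.iter().map(|&v| (v, sums.get(&v).copied())).collect()
}

/// Replace missing messages with a default value, analogous to a `fill_null` step.
fn fill_missing(joined: &[(u32, Option<f64>)], default: f64) -> Vec<(u32, f64)> {
    joined.iter().map(|&(v, m)| (v, m.unwrap_or(default))).collect()
}

fn main() {
    // Edges 1 -> 2 and 1 -> 3, so vertex 1 has no incoming message.
    let vertices = [1, 2, 3];
    let messages = [(2, 0.5), (3, 0.5)];

    let joined = join_messages(&vertices, &messages);
    let filled = fill_missing(&joined, 0.0);

    println!("{:?}", joined);
    println!("{:?}", filled);
}
```

With the option removed, a caller who still needs a default for message-less vertices would presumably have to handle it inside their own vertex program; that is an inference from this diff, not something it states.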