mapreduce

A simple MapReduce implementation heavily influenced by MIT 6.824 and the original MapReduce paper. The project contains two implementations: a simple sequential one and a more complex distributed one that uses gRPC.

Usage

Clone the repository and run either the sequential or the distributed implementation as shown below:

Sequential implementation

$ cargo r --bin mapreduce-sequential file.txt anotherfile.txt ...
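To illustrate what a sequential MapReduce run does conceptually, here is a minimal sketch of a single-process driver: map every input, group the intermediate pairs by key, then reduce each group. It is not the repository's code; the function name run_sequential and its closure-based signature are hypothetical, chosen only to show the three phases described in the original paper.

```rust
use std::collections::BTreeMap;

/// Hypothetical sequential MapReduce driver (not the repository's code).
/// `map_fn` turns (filename, contents) into intermediate key/value pairs;
/// `reduce_fn` folds all values collected for one key into a single output value.
pub fn run_sequential<M, R>(
    inputs: &[(&str, &str)],
    map_fn: M,
    reduce_fn: R,
) -> Vec<(String, String)>
where
    M: Fn(&str, &str) -> Vec<(String, String)>,
    R: Fn(&str, &[String]) -> String,
{
    // Map + shuffle: apply map_fn to every input and group values by key.
    // BTreeMap keeps keys sorted, mirroring the sort step of the paper.
    let mut grouped: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for &(name, contents) in inputs {
        for (k, v) in map_fn(name, contents) {
            grouped.entry(k).or_default().push(v);
        }
    }
    // Reduce: one reduce_fn call per distinct key, in sorted key order.
    grouped
        .into_iter()
        .map(|(k, vs)| {
            let out = reduce_fn(k.as_str(), vs.as_slice());
            (k, out)
        })
        .collect()
}
```

In the distributed version the same phases are split into separate map and reduce tasks that run on different workers; the sequential driver simply runs them back to back in one process.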

Distributed implementation

$ cargo r --bin mapreduce-server file.txt anotherfile.txt ...
$ cargo r --bin mapreduce-client
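The server presumably acts as the coordinator (it is configured through make_coordinator() in src/server.rs) and each client as a worker that asks it for tasks over gRPC. The following is a hedged sketch of the bookkeeping such a coordinator typically does in the 6.824 design, with the RPC layer left out. All names (Coordinator, TaskState, Assignment, request_task, complete) are hypothetical and not taken from the repository; the sketch also omits re-issuing tasks from crashed or slow workers, which a real coordinator would need.

```rust
/// Hypothetical coordinator task bookkeeping (not the repository's code).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum TaskState {
    Idle,
    InProgress,
    Done,
}

/// What the coordinator tells a worker that asks for work.
#[derive(Debug, PartialEq)]
pub enum Assignment {
    Map(usize),
    Reduce(usize),
    Wait,
    Exit,
}

pub struct Coordinator {
    map_tasks: Vec<TaskState>,    // one entry per input file
    reduce_tasks: Vec<TaskState>, // one entry per reduce partition
}

impl Coordinator {
    pub fn new(n_files: usize, n_reduce: usize) -> Self {
        Coordinator {
            map_tasks: vec![TaskState::Idle; n_files],
            reduce_tasks: vec![TaskState::Idle; n_reduce],
        }
    }

    /// Called when a worker asks for work (over an RPC in a real deployment).
    pub fn request_task(&mut self) -> Assignment {
        // Hand out idle map tasks first.
        if let Some(i) = self.map_tasks.iter().position(|s| *s == TaskState::Idle) {
            self.map_tasks[i] = TaskState::InProgress;
            return Assignment::Map(i);
        }
        // Reduce can only start once every map task is done, because each
        // reduce partition needs intermediate output from all map tasks.
        if self.map_tasks.iter().any(|s| *s != TaskState::Done) {
            return Assignment::Wait;
        }
        if let Some(i) = self.reduce_tasks.iter().position(|s| *s == TaskState::Idle) {
            self.reduce_tasks[i] = TaskState::InProgress;
            return Assignment::Reduce(i);
        }
        // Nothing left to hand out: tell workers to exit once everything is done.
        if self.reduce_tasks.iter().all(|s| *s == TaskState::Done) {
            Assignment::Exit
        } else {
            Assignment::Wait
        }
    }

    /// Called when a worker reports that it finished a task.
    pub fn complete(&mut self, a: &Assignment) {
        match a {
            Assignment::Map(i) => {
                self.map_tasks[*i] = TaskState::Done;
            }
            Assignment::Reduce(i) => {
                self.reduce_tasks[*i] = TaskState::Done;
            }
            _ => {}
        }
    }
}
```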

Note on the input files

Currently, the implementation only supports a simple word count, so the input files only need to contain plain text.
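As an illustration of the word-count job, here is a sketch of a map and a reduce function modeled on the 6.824 lab's word-count plugin, which splits text on non-letter characters, emits ("word", "1") per occurrence, and counts the values per word. The function names and signatures are assumptions; the repository may tokenize or represent counts differently.

```rust
/// Hypothetical word-count map function: emits ("word", "1") for every word.
/// Like the 6.824 wc plugin, it treats any non-alphabetic character as a separator.
pub fn map(_filename: &str, contents: &str) -> Vec<(String, String)> {
    contents
        .split(|c: char| !c.is_alphabetic())
        .filter(|w| !w.is_empty()) // drop empty pieces between adjacent separators
        .map(|w| (w.to_string(), "1".to_string()))
        .collect()
}

/// Hypothetical word-count reduce function: the count is the number of values.
pub fn reduce(_key: &str, values: &[String]) -> String {
    values.len().to_string()
}
```

Because the map side emits one pair per occurrence, the reducer does not need to parse the values; their count is the word's frequency.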

After the run finishes, you should find at least one new file matching /tmp/mr-out-*. Each such file lists words together with their number of occurrences in the input files. Note that increasing the number of worker nodes via the make_coordinator() function inside main() of src/server.rs will produce multiple output files.
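In the 6.824 design, the number of output files equals the number of reduce partitions, and each word is assigned to a partition by hashing it, so a given word appears in exactly one /tmp/mr-out-* file. Assuming this repository follows the same scheme, a partition function could look like the hypothetical sketch below (the name partition and the parameter n_reduce are illustrative, not taken from the source).

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical partition function: chooses which reduce task, and therefore
/// which output file, a key belongs to. Every occurrence of the same word maps
/// to the same bucket. DefaultHasher is deterministic within one build, but its
/// algorithm may change between Rust releases, so all processes should share a build.
pub fn partition(key: &str, n_reduce: usize) -> usize {
    let mut h = DefaultHasher::new();
    key.hash(&mut h);
    (h.finish() % n_reduce as u64) as usize
}
```

With this scheme, collecting the complete word count means reading all mr-out-* files together, since each file holds a disjoint subset of the words.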

Contributing

1. Fork it
2. Create your feature branch (git checkout -b my-new-feature)
3. Commit your changes (git commit -am "feat: Add something")
4. Push to the branch (git push origin my-new-feature)
5. Create a Pull Request

License

SPDX-License-Identifier: AGPL-3.0