Documentation for older versions of the package is at: V1_Doc
Introduction
parallel_sync is a Python package for uploading and downloading files using multiprocessing and MD5 checks. It supports rsync, scp, and wget operations and can be used on Windows, Linux, and macOS. Note that on Windows you need to have OpenSSH enabled, and the package will automatically use scp instead of rsync.
How to install:
pip install parallel_sync
Requirements:
- Python >= 3
- An SSH service must be installed and running.
- If rsync is installed on the local machine, it is used; otherwise the package falls back to scp.
- To use the wget method, wget must be installed on the target machine.
- To untar/unzip files, the tar/zip packages must be installed on the target machine.
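The rsync-to-scp fallback in the requirements can be pictured with a small standard-library sketch. This is not the package's own code: `pick_transfer_tool` is a hypothetical helper name, and the sketch assumes a tool counts as installed when it is found on `PATH`.

```python
import shutil


def pick_transfer_tool(which=shutil.which):
    """Return 'rsync' if it is found on PATH, otherwise fall back to 'scp'.

    Hypothetical helper for illustration only. `which` is injectable so the
    lookup can be replaced in tests.
    """
    return "rsync" if which("rsync") else "scp"
```

Calling `pick_transfer_tool()` with no arguments checks the local machine's `PATH`.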
Benefits:
- Very fast file transfer (parallelized)
- Files that already exist at the destination and have not changed are not copied again
- You can specify retries in case you have a bad connection
- It can handle large files
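The idea of skipping unchanged files with an MD5 check can be sketched as follows. This is a local-only illustration under stated assumptions, not the package's implementation: `md5_of` and `needs_copy` are hypothetical names, and the real package compares files across machines.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so a large file never has to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_copy(src, dst):
    """Return True when dst is missing or its content differs from src."""
    if not Path(dst).exists():
        return True
    return md5_of(src) != md5_of(dst)
```

A transfer loop would call `needs_copy` first and copy only the files for which it returns `True`.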
In most of the examples below, you can specify parallelism and tries, which allow you to parallelize tasks and retry upon failure. By default, parallelism is set to 10 workers and tries is 1.
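To make the meaning of the two settings concrete, here is a minimal standard-library sketch of a worker pool with retries. It is an assumption-laden illustration, not the package's code: `run_with_tries` and `run_parallel` are hypothetical names, and the sketch uses threads for brevity where the package uses multiprocessing. The defaults mirror the documented ones (10 workers, 1 try).

```python
from concurrent.futures import ThreadPoolExecutor


def run_with_tries(task, arg, tries=1):
    """Call task(arg) up to `tries` times, re-raising the last failure."""
    for attempt in range(1, tries + 1):
        try:
            return task(arg)
        except Exception:
            if attempt == tries:
                raise


def run_parallel(task, args, parallelism=10, tries=1):
    """Run task over args with `parallelism` workers; results keep input order."""
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        return list(pool.map(lambda a: run_with_tries(task, a, tries), args))
```

With `tries=1` a single failure propagates immediately; a higher value gives each item additional attempts before the error is raised.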
from parallel_sync import rsync, Credential
creds = Credential(username='user',
                   hostname='192.168.168.9',
                   port=3022,
                   key_filename='~/.ssh/id_rsa')
rsync.upload('/tmp/x', '/tmp/y', creds=creds, exclude=['*.pyc', '*.sh'])
from parallel_sync import rsync
creds = {'user': 'myusername', 'key':'~/.ssh/id_rsa', 'host':'192.168.16.31'}
rsync.download('/tmp/y', '/tmp/z', creds=creds)
from parallel_sync import rsync, Credential
creds = Credential(username='user',
                   hostname='192.168.168.9',
                   port=3022,
                   key_filename='~/.ssh/id_rsa')
rsync.download('/tmp/y', '/tmp/z', creds=creds)
For this, you need to have wget installed on the remote machine.
from parallel_sync import wget, Credential
creds = Credential(username='user',
                   hostname='192.168.168.9',
                   port=3022,
                   key_filename='~/.ssh/id_rsa')
urls = ['http://something.png', 'http://somthing.tar.gz', 'http://somthing.zip']
wget.download('/tmp', urls=urls, creds=creds)
Downloading files locally with the requests package is simple, but what if you want to parallelize the downloads? Here is the solution for that:
from parallel_sync import downloader
urls = ['http://something1', 'http://somthing2', 'http://somthing3']
# Download every URL into the target directory using 10 parallel workers
downloader.download('c:/temp/x', urls=urls,
                    extension='.png', parallelism=10)
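For readers curious what a parallel local download involves, the following is a rough standard-library sketch. It is not how parallel_sync is implemented: `fetch` and `fetch_all` are hypothetical names, `urllib` stands in for requests, threads stand in for processes, and treating `extension` as a suffix appended to each file name is an assumption.

```python
import os
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch(url, target_dir, extension=""):
    """Download one URL into target_dir, named after the URL's last segment."""
    name = os.path.basename(url.rstrip("/")) + extension
    path = os.path.join(target_dir, name)
    with urllib.request.urlopen(url) as response, open(path, "wb") as out:
        out.write(response.read())
    return path


def fetch_all(urls, target_dir, extension="", parallelism=10):
    """Download all URLs with `parallelism` worker threads; returns saved paths."""
    os.makedirs(target_dir, exist_ok=True)
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        return list(pool.map(lambda u: fetch(u, target_dir, extension), urls))


# Example call (performs network requests, so it is left commented out):
# fetch_all(['http://something1', 'http://somthing2'], 'c:/temp/x', extension='.png')
```

The sketch reads each response fully into memory and does not handle URLs with query strings, so it is only suitable for small files with simple paths.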
from fabric import task
from parallel_sync import rsync, wget, get_fabric_credentials

@task
def deploy(conn):
    # Reuse the SSH credentials of the active Fabric connection
    creds = get_fabric_credentials(conn)
    urls = ['http://something1', 'http://somthing2', 'http://somthing3']
    # Keyword arguments follow the wget and rsync examples above
    wget.download('/tmp/images', urls=urls, creds=creds)
    rsync.upload('/src', '/dst', creds=creds, tries=3)
Here you have a task called deploy. You can run it using the following command:
fab [user]@[hostname]:[port] -i [path to your key file] deploy
If you come across any bugs, please report them on GitHub.