/PAXOS_OP_QUEUE is too small for this app #11

Open
vschiavoni opened this issue Nov 2, 2016 · 4 comments
@vschiavoni

While executing a benchmark that hammers the Crane proxy with many requests, the proxy stops working. In the primary's logs, we find:

/PAXOS_OP_QUEUE-valerio is too small for this app. Please enlarge it in paxos-op-queue.h

We've tried several different values of ELEM_CAPACITY here: https://github.com/columbia/crane/blob/master/xtern/lib/runtime/paxos-op-queue.cpp#L34

Is that the right variable to change? Changing it doesn't seem to have any effect on our benchmark.
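
For reference, the kind of edit we tried looks like this (a sketch only; the exact declaration form and original value in paxos-op-queue.cpp may differ):

// xtern/lib/runtime/paxos-op-queue.cpp -- sketch of the edit, not the
// actual source; the declaration form and values are illustrative.
// #define ELEM_CAPACITY 1000   // original value (illustrative)
#define ELEM_CAPACITY 100000    // example of an enlarged value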

@hemingcui
Contributor

Dear Valerio,
We ran hundreds of thousands of requests in our benchmarks. The "PAXOS_OP_QUEUE" should not have a size restriction, as it is just an inter-process shared file there. Could you double-check your system settings and let us know? Thanks.
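
For context, a queue like this is typically implemented as a POSIX shared-memory object (it shows up under /dev/shm as PAXOS_OP_QUEUE-<user>). Below is a minimal sketch of how such a segment is created; the names, sizes, and layout are illustrative assumptions, not Crane's exact code:

// Minimal sketch of a fixed-size POSIX shared-memory segment, the kind
// of "inter-process shared file" backing the op queue. Names, sizes,
// and structure are illustrative, not Crane's actual code.
// Build: g++ shm_sketch.cpp -o shm_sketch -lrt
#include <fcntl.h>     // O_CREAT, O_RDWR
#include <sys/mman.h>  // shm_open, mmap, munmap
#include <unistd.h>    // ftruncate, close
#include <cstdio>      // perror

int main() {
  const size_t kElemCapacity = 1000;  // illustrative; the real value comes from paxos-op-queue.h
  const size_t kElemSize = 256;       // illustrative bytes per queued operation
  const size_t kBytes = kElemCapacity * kElemSize;

  // Shows up as /dev/shm/PAXOS_OP_QUEUE-demo (name is illustrative).
  int fd = shm_open("/PAXOS_OP_QUEUE-demo", O_CREAT | O_RDWR, 0600);
  if (fd < 0) { perror("shm_open"); return 1; }

  // The capacity is fixed here, at creation time. POSIX shm objects live
  // in the /dev/shm tmpfs and are not governed by kernel.shmmax, which
  // applies to System V segments only.
  if (ftruncate(fd, kBytes) < 0) { perror("ftruncate"); return 1; }

  void* base = mmap(nullptr, kBytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  if (base == MAP_FAILED) { perror("mmap"); return 1; }

  // ... a ring buffer of kElemCapacity slots would be laid out in base ...

  munmap(base, kBytes);
  close(fd);
  return 0;
}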


@vschiavoni
Author

I can double-check all the settings, but which ones in particular?
What could be the source of the issue here?
Have you ever tried with the wrk2 tool?
This is the command we use:
LD_PRELOAD=/home/valerio/crane/libevent_paxos/client-ld-preload/libclilib.so.1.0 /opt/wrk2/bin/wrk -d 30s -c 8 -t 8 -R 1000 http://10.3.1.1:9000/test.php
The wrk2 tool is this: https://github.com/giltene/wrk2.
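
As an aside, the LD_PRELOAD in this command works by interposing libc calls before they reach the real library; here is a generic sketch of that mechanism, not libclilib's actual code:

// Generic LD_PRELOAD interposition sketch; not libclilib's actual code.
// Build: g++ -shared -fPIC interpose.cpp -o libinterpose.so -ldl
#include <dlfcn.h>       // dlsym, RTLD_NEXT
#include <sys/socket.h>  // sockaddr, socklen_t

extern "C" int connect(int fd, const struct sockaddr* addr, socklen_t len) {
  using connect_fn = int (*)(int, const struct sockaddr*, socklen_t);
  // Resolve the real libc connect() once, on first call.
  static connect_fn real_connect =
      reinterpret_cast<connect_fn>(dlsym(RTLD_NEXT, "connect"));
  // A client library could inspect or rewrite addr here (e.g., to route
  // requests through a consensus proxy) before calling through.
  return real_connect(fd, addr, len);
}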

@hemingcui
Contributor

Dear Valerio,
From your output/commands, I am not able to tell what the issue may be. In general, a distributed system is quite complicated, and even minor configuration mismatches can cause big problems. We have provided an integrated script framework to run our benchmark programs, which contains correct configurations to our knowledge. I encourage you to start from our framework and port your wrk2 tool into it (e.g., write a new cfg for your tool).


@vschiavoni
Author

vschiavoni commented Nov 5, 2016

Dear Heming,
I'm quite aware of the difficulties of debugging a distributed system. The tool I mentioned is a client-side HTTP stress tool: it simply sends HTTP requests to the proxy or the server and reports numbers back, similar to the apache-bench tool you used in the Crane paper.
From that perspective, the wrk2 tool itself is well configured.

On the server side, we use the apache.sh configuration provided by your framework (https://github.com/columbia/crane/blob/master/eval-container/configs/apache.sh); the only difference on our side is that we run the client on a separate machine, not on the primary itself.
We use the joint_sched mode.

Do you know or remember whether you had to apply any special system configuration on the primary and replica nodes with respect to the shared-memory segment size? These are the settings on our cluster:

# sysctl -a | grep -E "shm"
kernel.shm_next_id = -1
kernel.shm_rmid_forced = 0
kernel.shmall = 18446744073692774399
kernel.shmmax = 18446744073692774399
kernel.shmmni = 4096
vm.hugetlb_shm_group = 0

# ipcs -lm

------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 18014398509465599
max total shared memory (kbytes) = 18446744073642442748
min seg size (bytes) = 1

We run the system on Linux 4.4.0-34-generic.
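
(If the queue is a POSIX shared-memory object created with shm_open rather than a System V segment from shmget, the shmmax/shmall and ipcs limits above would not constrain it anyway; the object's actual size should be visible directly under /dev/shm, e.g.:

# ls -lh /dev/shm | grep PAXOS_OP_QUEUE
)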
