/PAXOS_OP_QUEUE is too small for this app #11

Open
vschiavoni opened this issue Nov 2, 2016 · 4 comments
@vschiavoni

While executing a benchmark that hammers the Crane proxy with many requests, the proxy stops working. In the primary's logs, we find:

/PAXOS_OP_QUEUE-valerio is too small for this app. Please enlarge it in paxos-op-queue.h

We've tried several different values of ELEM_CAPACITY here: https://github.com/columbia/crane/blob/master/xtern/lib/runtime/paxos-op-queue.cpp#L34

Is that the right variable to change? Changing it doesn't seem to have any effect on our benchmark.
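
For reference, the kind of edit we tried looks like this (a sketch only; the exact declaration form and original value in paxos-op-queue.cpp may differ):

// xtern/lib/runtime/paxos-op-queue.cpp -- sketch of the edit, not the
// actual source; the declaration form and values are illustrative.
// #define ELEM_CAPACITY 1000   // original value (illustrative)
#define ELEM_CAPACITY 100000    // example of an enlarged value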

@hemingcui
Contributor

Dear Valerio,
We ran hundreds of thousands of requests in our benchmarks. The "PAXOS_OP_QUEUE" should not have a size restriction, as it is just an inter-process shared file there. Could you double-check your system settings and let us know? Thanks.
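
For context, a queue like this is typically implemented as a POSIX shared-memory object (it shows up under /dev/shm as PAXOS_OP_QUEUE-<user>). Below is a minimal sketch of how such a segment is created; the names, sizes, and layout are illustrative assumptions, not Crane's exact code:

// Minimal sketch of a fixed-size POSIX shared-memory segment, the kind
// of "inter-process shared file" backing the op queue. Names, sizes,
// and structure are illustrative, not Crane's actual code.
// Build: g++ shm_sketch.cpp -o shm_sketch -lrt
#include <fcntl.h>     // O_CREAT, O_RDWR
#include <sys/mman.h>  // shm_open, mmap, munmap
#include <unistd.h>    // ftruncate, close
#include <cstdio>      // perror

int main() {
  const size_t kElemCapacity = 1000;  // illustrative; the real value comes from paxos-op-queue.h
  const size_t kElemSize = 256;       // illustrative bytes per queued operation
  const size_t kBytes = kElemCapacity * kElemSize;

  // Shows up as /dev/shm/PAXOS_OP_QUEUE-demo (name is illustrative).
  int fd = shm_open("/PAXOS_OP_QUEUE-demo", O_CREAT | O_RDWR, 0600);
  if (fd < 0) { perror("shm_open"); return 1; }

  // The capacity is fixed here, at creation time. POSIX shm objects live
  // in the /dev/shm tmpfs and are not governed by kernel.shmmax, which
  // applies to System V segments only.
  if (ftruncate(fd, kBytes) < 0) { perror("ftruncate"); return 1; }

  void* base = mmap(nullptr, kBytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  if (base == MAP_FAILED) { perror("mmap"); return 1; }

  // ... a ring buffer of kElemCapacity slots would be laid out in base ...

  munmap(base, kBytes);
  close(fd);
  return 0;
}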


@vschiavoni
Author

I can double-check all the settings, but which ones in particular?
What could be the source of the issue here?
Have you ever tried with the wrk2 tool?
This is the command we use:
LD_PRELOAD=/home/valerio/crane/libevent_paxos/client-ld-preload/libclilib.so.1.0 /opt/wrk2/bin/wrk -d 30s -c 8 -t 8 -R 1000 http://10.3.1.1:9000/test.php
The wrk2 tool is this: https://github.com/giltene/wrk2.
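
As an aside, the LD_PRELOAD in this command works by interposing libc calls before they reach the real library; here is a generic sketch of that mechanism, not libclilib's actual code:

// Generic LD_PRELOAD interposition sketch; not libclilib's actual code.
// Build: g++ -shared -fPIC interpose.cpp -o libinterpose.so -ldl
#include <dlfcn.h>       // dlsym, RTLD_NEXT
#include <sys/socket.h>  // sockaddr, socklen_t

extern "C" int connect(int fd, const struct sockaddr* addr, socklen_t len) {
  using connect_fn = int (*)(int, const struct sockaddr*, socklen_t);
  // Resolve the real libc connect() once, on first call.
  static connect_fn real_connect =
      reinterpret_cast<connect_fn>(dlsym(RTLD_NEXT, "connect"));
  // A client library could inspect or rewrite addr here (e.g., to route
  // requests through a consensus proxy) before calling through.
  return real_connect(fd, addr, len);
}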

@hemingcui
Contributor

Dear Valerio,
From your output/commands, I am not able to tell what the issue may be. In general, a distributed system is quite complicated, and even minor configuration mismatches can cause big problems. We have provided an integrated script framework to run our benchmark programs, which contains correct configurations to our knowledge. I encourage you to start from our framework and port your wrk2 tool into it (e.g., write a new cfg for your tool).


@vschiavoni
Author

vschiavoni commented Nov 5, 2016

Dear Heming,
I'm quite aware of the difficulties of debugging a distributed system. The tool I mentioned is a client-side HTTP stress tool: it simply sends HTTP requests to the proxy or the server and reports numbers back, similar to the apache-bench tool you used in the Crane paper.
From that perspective, the wrk2 tool itself is well configured.

On the server side, we use the apache.sh configuration provided by your framework (https://github.com/columbia/crane/blob/master/eval-container/configs/apache.sh); the only difference on our side is that we run the client on a separate machine, not on the primary itself.
We use the joint_sched mode.

Do you know or remember whether you had to apply any special system configuration on the primary and replica nodes with respect to the shared-memory segment size? These are the settings on our cluster:

# sysctl -a | grep -E "shm"
kernel.shm_next_id = -1
kernel.shm_rmid_forced = 0
kernel.shmall = 18446744073692774399
kernel.shmmax = 18446744073692774399
kernel.shmmni = 4096
vm.hugetlb_shm_group = 0

# ipcs -lm

------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 18014398509465599
max total shared memory (kbytes) = 18446744073642442748
min seg size (bytes) = 1

We run the system on Linux 4.4.0-34-generic.
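
(If the queue is a POSIX shared-memory object created with shm_open rather than a System V segment from shmget, the shmmax/shmall and ipcs limits above would not constrain it anyway; the object's actual size should be visible directly under /dev/shm, e.g.:

# ls -lh /dev/shm | grep PAXOS_OP_QUEUE
)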
