Issues faced while getting mongodb test suite running locally #17
Comments
@aphyr And what do you know, just as I went to run the tests again hoping to send you a stack trace, they worked ! 🍻 🙂 |
Some of the code here for example the |
Huh, okay... I can say that the test is designed for a specific version of debian--it's been a while since I poked my head into the docker and mongo tests, but this miiiight be due to a mismatch between those versions? The libcurl transition has been a real bear: some systems need 3, some 4, etc. etc. |
If this change (the change in setup! to install MongoDB) works well, can I open a PR to submit it? What other kinds of tests do you require before taking contributions? Are there any guides with further instructions for code contributions? |
I think it'd be good to figure out what version of Debian worked before, and what version it works with now, and to document that in the README, for starters! I do apologize, this was a rush job in my free time, and I wasn't as diligent about future-proofing things as I should have been! |
So running it 5 times caused the test suite to crash once.
I think the exceptions I was seeing earlier were of a similar nature |
Ah, well that looks like there's a problem in the MongoDB setup process--it's not accepting connections. Likely a race condition between the code and MongoDB itself, if it's sporadic. Maybe there needs to be some additional health checks during db/setup!... |
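As a rough illustration of the kind of extra health check suggested above, here is a minimal Python sketch that polls a TCP port until it accepts connections or a deadline passes. It is not the test's actual Clojure code: the function name `wait_for_port` and its parameters are hypothetical, and a successful TCP connection only shows that something is listening, not that MongoDB has finished joining a replica set.

```python
import socket
import time


def wait_for_port(host, port, timeout_s=30.0, interval_s=0.5):
    """Poll host:port until a TCP connection succeeds or timeout_s elapses.

    Hypothetical helper for illustration only.
    Returns True once the port accepts a connection, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # create_connection raises OSError while nothing is listening.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

For a node named n4 this could be called as `wait_for_port("n4", 27017)`, 27017 being MongoDB's default port, before the setup code tries to talk to the database.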
From the dockerfile, I can see that the docker image is based on this Debian docker image : https://github.com/jgoerzen/docker-debian-base-standard I am not sure I understand when you say If I can help with updating the README, do let me know I can do that. |
Do you mean something like
To check if the mongo service is up and running ? |
More that I'm not sure whether this ever worked with the Docker setup, and if you're having problems, it might be because this version of Jepsen and the version of Mongo it installs were intended to run on, say, Jessie, when the Docker env is giving you, say, Bullseye. I honestly forget, so much has happened this year. I'd love to go dig into this for you but I am scrambling to keep up with waaaay too much client stuff right now! |
Maybe. I think the current code probably does its own health checks already... lemme check. Ah, yes, here it is: mongodb/src/jepsen/mongodb/db.clj Lines 183 to 185 in 83548bb
We've got blocking on individual node startup, blocking on cluster join, blocking on elections, blocking on the cluster, blocking on the primary. That is, apparently, not enough blocking! This isn't just you: Mongo's... historically been difficult to set up reliably. |
Aah, that makes sense. FWIW, the debian version that the current jepsen's main branch sets up is buster. |
Ooof, yeah, Again, I'm sorry. This is a holdover from an older time in Jepsen when Debian versions lasted (compared to the lifetime of a test) forever and were often cross-compatible: we never really established a convention around OS versioning. Now that people are trying to dredge up tests written n years ago (or even 7 months ago!), those assumptions don't always hold. This is a good reminder to me to write more of that documentation, and start splitting out future It looks like this test uses jepsen 0.1.19, which... I think should be using Jessie. Jepsen 0.2.1 transitioned to Buster. |
From this commit It seems the control node used ubuntu and the db nodes used stretch around the time these mongo tests were written. |
Oh, yeah, but that doesn't (and I am so sorry, I know this is confusing) mean this test was supposed to work with Docker. The So, I think you've got two options here. One is if you get a Jessie environment going (are the mirrors still around?) you should be able to run the test as-is. The other is using Buster and figuring out how to port the test forward to Buster, which miiight be as simple as bumping the version of |
Okay, I understand now. Thanks. As regards the 2 options, I would say bringing the tests up to date is more fruitful in the longer run. I can give that a crack to see what else needs changing. Right off the bat, I think there are some code changes that might be needed. Strangely, in the source code, I see this namespace mentioned in the docs but only see it used in the |
Ah, now THIS I actually have good docs for! https://github.com/jepsen-io/jepsen/releases/tag/0.2.0 |
(also be advised there's bug in 0.2.0 that might affect generators--best to jump straight to 0.2.1 I think) |
Okay, so this morning I seem to be able to get the original SSH related exceptions rather frequently :
This is for node n4 but similar exceptions happen for all nodes. |
That's a long-standing bug in the SSH library--some kind of race condition I think. We can generally recover transparently.
On Dec 3, 2020 22:35, Rhishikesh <[email protected]> wrote:
Okay, so this morning I seem to be able to get the original SSH related exceptions rather frequently :
WARN [2020-12-04 03:17:54,150] jepsen node n4 - jepsen.control Encountered error with conn [:control "n4"]; reopening
java.lang.InterruptedException: sleep interrupted
at java.base/java.lang.Thread.sleep(Native Method)
at clj_ssh.ssh$ssh_exec.invokeStatic(ssh.clj:690)
at clj_ssh.ssh$ssh_exec.invoke(ssh.clj:670)
at clj_ssh.ssh$ssh.invokeStatic(ssh.clj:723)
at clj_ssh.ssh$ssh.invoke(ssh.clj:699)
at jepsen.control.SSHRemote.execute_BANG_(control.clj:331)
at jepsen.control$ssh_STAR_$fn__3063.invoke(control.clj:172)
at jepsen.control$ssh_STAR_.invokeStatic(control.clj:172)
at jepsen.control$ssh_STAR_.invoke(control.clj:168)
at jepsen.control$exec_STAR_.invokeStatic(control.clj:194)
at jepsen.control$exec_STAR_.doInvoke(control.clj:191)
at clojure.lang.RestFn.applyTo(RestFn.java:137)
at clojure.core$apply.invokeStatic(core.clj:665)
at clojure.core$apply.invoke(core.clj:660)
at jepsen.control$exec.invokeStatic(control.clj:210)
at jepsen.control$exec.doInvoke(control.clj:204)
at clojure.lang.RestFn.invoke(RestFn.java:436)
at jepsen.db$tcpdump$reify__3446.teardown_BANG_(db.clj:112)
at jepsen.mongodb.db.ShardedDB.teardown_BANG_(db.clj:406)
at jepsen.db$fn__3273$G__3269__3277.invoke(db.clj:11)
at jepsen.db$fn__3273$G__3268__3282.invoke(db.clj:11)
at clojure.core$partial$fn__5824.invoke(core.clj:2625)
at jepsen.control$on_nodes$fn__3161.invoke(control.clj:430)
This is for node n4 but similar exceptions happen for all nodes.
A simple ssh n4 from the control node seems to work so there isn't an obvious problem with the docker cluster.
Any pointers for me to explore here ?
|
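The "reopening" line in the log above, together with the remark about recovering transparently, suggests a retry-with-reconnect pattern. The following Python sketch shows that general pattern only. It is an assumption about the shape of the idea, not Jepsen's implementation, and `with_reconnect`, `open_conn`, and `action` are hypothetical names.

```python
import time


def with_reconnect(open_conn, action, retries=3, backoff_s=0.0):
    """Run action(conn), reopening the connection after each failure.

    Hypothetical helper for illustration only.
    open_conn: zero-argument callable returning a fresh connection.
    action: callable that takes the connection and returns a result.
    Re-raises the last error once `retries` attempts have failed.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    conn = open_conn()
    for attempt in range(1, retries + 1):
        try:
            return action(conn)
        except Exception:
            if attempt == retries:
                raise
            time.sleep(backoff_s)
            # Discard the old connection and open a new one before retrying.
            conn = open_conn()
```

With a wrapper like this, a sporadic failure shows up as a warning and a retry rather than a failed run, which would match seeing the WARN line while the test carries on.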
Oh bro! It is exciting that you have dealt with the problem of running the MongoDB Jepsen test in docker-compose, even though the test may crash in some situations.
I am interested in your work, and it would help if you could share your config and fixes. Thanks. |
I had a chance to go through the mongo code today and get everything fixed up for the latest Jepsen and Debian Buster. |
@aphyr nice ! 😊 Would love to see that happen. Also I have opened a pull request making some of the changes for jepsen 0.2.1 |
Here are some issues I faced while getting this MongoDB Jepsen suite running locally with Docker.
Information about the code that I am using:
Jepsen :
commit a2bcad59f0df5bd39cea1e61d9b64376c479df9c (HEAD -> main)
MongoDB :
commit 83548bb8e054170ecc4b8fda70390e40fcca5e30 (origin/master, origin/HEAD)
Initially I had an issue of not having enough nodes (by default Jepsen starts 5 nodes in Docker), as is evident from the function jepsen.mongodb.db/shard-node-plan. I fixed that by adding 2 more nodes.
Then I hit another roadblock: while installing MongoDB on each node, it errored out saying that a required dependency, specifically libcurl3, can't be found. Apparently libcurl4 and libcurl3 don't work well together, and in spite of my efforts I wasn't able to get libcurl3 and Mongo running. So I changed the way Jepsen was installing MongoDB and followed the official documentation, which installs Mongo 4.2. That worked.
But now I am still unable to run the tests, as every time there seems to be some SSH-related exception saying the control node can't reach the DB nodes.
I changed the installation instructions for MongoDB since the default instructions in setup! were error'ing out due to a libcurl3 dependency. Instructions that I have coded into setup! instead