Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test race conditions in bulkblocks API #85

Merged
merged 8 commits into from
Oct 18, 2022

Conversation

d-ylee
Copy link
Contributor

@d-ylee d-ylee commented Oct 5, 2022

This is an initial implementation to test the racing conditions in the bulkblocks API as described in these issues: #30, #31

@d-ylee d-ylee marked this pull request as ready for review October 5, 2022 14:46
@d-ylee d-ylee marked this pull request as draft October 5, 2022 14:47
@d-ylee
Copy link
Contributor Author

d-ylee commented Oct 5, 2022

@vkuznet I believe I resolved the issue with non-unique block names. At the same time, there was an issue of non-unique LFNs, but I also have resolved this.

Now, when I run the test, I create 10,000 blocks with 50 files each. I iterate over each block with no delay. As a result, I do not get any errors. When I check the database (/tmp/dbs-one.db), I do see that 10,000 blocks were inserted, and 500,000 files were inserted. Do I need to increase the number of blocks or files to instigate the racing condition?

@vkuznet
Copy link
Contributor

vkuznet commented Oct 5, 2022

I don't think you need to increase number of blocks/files. If I read your statement correctly you send 10K requests with different blocks, right? The question is how do you send them? For instance, how many HTTP connections your test creates. It is possible that all your request can come through single or small pool of connection(s). Please clarify these details.

@d-ylee
Copy link
Contributor Author

d-ylee commented Oct 5, 2022

I'm going to try to make it so that each HTTP request uses the connection only once based off the discussions here: https://stackoverflow.com/questions/57683132/turning-off-connection-pool-for-go-http-client

Will get back to you on what I see.

@d-ylee
Copy link
Contributor Author

d-ylee commented Oct 5, 2022

@vkuznet I might be approaching this incorrectly, but I still seem to not be able to induce the racing conditions while disabling the keepalive in the HTTP transport. Would there be a better way for me to generate blocks and files with the same base metadata?

@vkuznet
Copy link
Contributor

vkuznet commented Oct 5, 2022

you need to show me how you generate blocks, but if you mean "same base metadata" that JSON will have common configuration section, yes this is what we need.

@d-ylee
Copy link
Contributor Author

d-ylee commented Oct 6, 2022

web/server.go Outdated
@@ -263,6 +263,7 @@ func dbInit(dbtype, dburi string) (*sql.DB, error) {
}
db.SetMaxOpenConns(Config.MaxDBConnections)
db.SetMaxIdleConns(Config.MaxIdleConnections)
db.Exec("PRAGMA journal_mode=WAL;")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need to ensure that this ONLY applies to SQLite DB and not for ORACLE

"log_file": "/tmp/dbs2go_writer_1_2.log",
"verbose": 2,
"limiter_rate": "10000-s",
"max_db_connections": 1,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't use use it in this test since it diminish the purpose of the race testing. If we have only one db connection then all HTTP concurrent calls will be sequential.

@vkuznet
Copy link
Contributor

vkuznet commented Oct 7, 2022

Dennis, I think the way you generate block data is correct since you reuse algo part which refers to dataset config. But in order to imitate racing conditions we need multiple connections to DB. So far you explicitly set only one connection, and this what causing sequential access even if you run concurrent clients. We need multiple db connection, i.e. two concurrent clients will use different db connections.

Removed max_db_connections in writer config
PRAGMA only with sqlite3
Changed time.Now to static value in racing condition test
@d-ylee
Copy link
Contributor Author

d-ylee commented Oct 10, 2022

@vkuznet I removed the max_db_connections in the writer2 config. When I run the test while only generating 2 blocks, the database is locked error occurs again. This error is thrown during InsertOutputConfigs. One block was inserted, while the other block caused the error to occur.

@vkuznet
Copy link
Contributor

vkuznet commented Oct 10, 2022

I think this is the issue we are facing. While using SQLIte we have database lock, while I should expect the transaction lock, and if using ORACLE we have transaction failure. And, your error InsertOutputConfigs is correct one since this is where racing condition I expect to happen. So, can you write test with catching this error, and then retry it after some time where it should disappear and second block should be inserted?

@d-ylee
Copy link
Contributor Author

d-ylee commented Oct 11, 2022

@vkuznet I made it so that a request would retry up to 3 times before considering the bulkblocks insert a failure. What are your thoughts?

@d-ylee d-ylee marked this pull request as ready for review October 12, 2022 14:14
Copy link
Contributor

@vkuznet vkuznet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dennis, I think code wise everything is fine, but I suggest you add proper documentation in different parts of the code to explain what is this test and why we are doing it.

@@ -263,6 +263,9 @@ func dbInit(dbtype, dburi string) (*sql.DB, error) {
}
db.SetMaxOpenConns(Config.MaxDBConnections)
db.SetMaxIdleConns(Config.MaxIdleConnections)
if dbtype == "sqlite3" {
db.Exec("PRAGMA journal_mode=WAL;")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add proper comment why do we need this option, and provide pointer to docs we found on google about it. It is good practice to make reference to features we borrow from someone else discussion

//return resp.StatusCode
}

// TestRaceConditions tests for the racing conditions in the DBSWriter bulkblocks API
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you please add more description about what racing conditions we are talking here. Please add a link to our github ticket.

defer resp.Body.Close()
t.Logf("Try #%d, Block: %s, response: %d\n", retries, block.Block.BlockName, resp.StatusCode)
if resp.StatusCode != http.StatusOK {
if retries < 3 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you please add proper comment about why do we need 3 retries to see the error.


// Race Condition Tests
// This file contains code necessary to test for racing conditions in DBSWriter APIs

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please provide reference here to github issue about racing condition and explain what this test suppose to do, e.g. how many servers we need, how many blocks we must submit to trigger racing condition, etc.

@d-ylee d-ylee requested a review from vkuznet October 17, 2022 14:43
@vkuznet vkuznet merged commit f9dde71 into dmwm:master Oct 18, 2022
@d-ylee d-ylee deleted the testRaceConditions branch October 24, 2022 13:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants