Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unlock mtxs before calling osql_bplog_free to avoid crash #2013

Merged
merged 9 commits into from
Jan 15, 2020

Conversation

adizaimi
Copy link

@adizaimi adizaimi commented Jan 10, 2020

Unlock mtx before calling osql_bplog_free() to avoid crash. Before this PR, we destroy the session with the mutexes held via callstack: osql_bplog_free() -> osql_close_session() -> _destroy_session().
fixes PR #2011

Copy link

@roborivers roborivers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review suggestions:

@@ db/osqlcomm.c @@
+ Please remove debugging statements, and replace fprintf/printf with logmsg.

Copy link

@roborivers roborivers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coding style check: Error. Use clang-format or view diff ⚠.
Smoke testing: Error ⚠ .
Failed to compile.

Detail | Log | Resubmit

Copy link

@roborivers roborivers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review suggestions:

@@ db/osqlcomm.c @@
+ Please remove debugging statements, and replace fprintf/printf with logmsg.

Copy link

@roborivers roborivers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coding style check: Error. Use clang-format or view diff ⚠.
Smoke testing: Error ⚠ .
Cbuild submission: Success ✓.
Regression testing: 3/326 tests failed ⚠.

The first 10 failing tests are:
sc_timepart_logicalsc_generated
restart_socksql
enospc

Detail | Log | Resubmit

Copy link

@roborivers roborivers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sysbench Transactions Per Second (commit: 918534c):

@@ sysbench/bulk_insert @@
-8.16%
@@ sysbench/select_random_ranges @@
-5.77%
@@ sysbench/oltp_read_write @@
-4.74%

Copy link

@roborivers roborivers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review suggestions:

@@ db/osqlcomm.c @@
+ Please remove debugging statements, and replace fprintf/printf with logmsg.

Copy link

@roborivers roborivers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coding style check: Error. Use clang-format or view diff ⚠.
Smoke testing: Success ✓.
Cbuild submission: Success ✓.
Regression testing: Success ✓.

Detail | Log | Resubmit

Copy link

@roborivers roborivers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sysbench Transactions Per Second (commit: 918534c):

@@ sysbench/select_random_points @@
-4.44%

Copy link

@roborivers roborivers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review suggestions:

@@ db/osqlcomm.c @@
+ Please remove debugging statements, and replace fprintf/printf with logmsg.

Copy link

@roborivers roborivers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coding style check: Error. Use clang-format or view diff ⚠.
Smoke testing: Success ✓.
Cbuild submission: Success ✓.
Regression testing: 1/326 tests failed ⚠.

The first 10 failing tests are:
enospc

Detail | Log | Resubmit

Copy link

@roborivers roborivers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sysbench Transactions Per Second (commit: 918534c):

@@ sysbench/select_random_points @@
-4.45%
@@ sysbench/bulk_insert @@
-7.76%
@@ sysbench/oltp_read_only @@
-5.61%
@@ sysbench/oltp_update_non_index @@
-5.16%
@@ sysbench/oltp_read_write @@
-5.38%


msglen = OSQLCOMM_DONE_RPL_LEN;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a bug probably, should have been OSQLCOMM_DONE_UUID_RPL_LEN.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It most certainly was. I tried to fix it as well (#1868).

sorese->host);
}

} else {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was moved up

__LINE__); /* already have exclusive lock */
Pthread_mutex_unlock(&sess->completed_lock);
Pthread_mutex_unlock(&sess->mtx);

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the real bug, these two should be unlocked before calling osql_repository_rem.

@@ -5111,7 +5105,7 @@ int osql_comm_signal_sqlthr_rc(osql_sess_t *sorese, struct errstat *xerr,
max = OSQLCOMM_DONE_XERR_RPL_LEN;
if (OSQLCOMM_DONE_RPL_LEN > max)
max = OSQLCOMM_DONE_RPL_LEN;
buf = alloca(max);
Copy link
Contributor

@akshatsikarwar akshatsikarwar Jan 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know not introduced in this PR and noted in prior code-comment about being kludgy - runtime decision for buf size. Perhaps not even a performance issue after optimization. However, while we are here can use a union to clean this up:

union {
   char a[OSQLCOMM_DONE_XERR_UUID_RPL_LEN];
   char b[OSQLCOMM_DONE_UUID_RPL_LEN];
   char c[OSQLCOMM_DONE_XERR_RPL_LEN];
   char d[OSQLCOMM_DONE_RPL_LEN];
} u;
uint8_t *buf = (uint8_t *) &u;

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this idea.

struct ireq *iq = NULL;
uint8_t *malcd = malloc(OSQL_BP_MAXLEN);
Copy link
Contributor

@akshatsikarwar akshatsikarwar Jan 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

32k - we can probably just allocate on stack uint8_t malcd[OSQL_BP_MAXLEN] instead of malloc, free. This (malcd) needs to survive if successful?

Copy link
Contributor

@akshatsikarwar akshatsikarwar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

Let THAT thread free it...
free(p_buf);
*/
Let THAT thread free p_buf (malcd)... */
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to answer the above comment, malcd needs to survive...see this line.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks - we can try passing it in so don't have to get from heap

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also simplifies cleanup on error

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea, will try in separate PR.

Copy link

@roborivers roborivers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review suggestions:

@@ db/osqlcomm.c @@
+ Please remove debugging statements, and replace fprintf/printf with logmsg.

Copy link

@roborivers roborivers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coding style check: Error. Use clang-format or view diff ⚠.
Smoke testing: Success ✓.
Cbuild submission: Success ✓.
Regression testing: Success ✓.

Detail | Log | Resubmit

@adizaimi
Copy link
Author

Thank you Akshat.

@adizaimi adizaimi merged commit 5a1fc01 into bloomberg:master Jan 15, 2020
@adizaimi adizaimi deleted the az_fix_osqlcomm_sess branch January 15, 2020 20:31
@adizaimi adizaimi changed the title Unlock mtxs before calling osql_repository_rem to avoid crash Unlock mtxs before calling osql_bplog_free to avoid crash Jan 15, 2020
adizaimi added a commit that referenced this pull request Jan 17, 2020
R7: unlock mutexes before calling destroy on session obj.
Port of PR #2013.
mistachkin pushed a commit to mistachkin/comdb2 that referenced this pull request Mar 5, 2020
R7: unlock mutexes before calling destroy on session obj.
Port of PR bloomberg#2013.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants