Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flow timeout timing/v14 #12084

Closed
wants to merge 16 commits into from

Conversation

victorjulien
Copy link
Member

More precise flow timeout enforcement.

Cleans up unix socket initialization.

SV_BRANCH=OISF/suricata-verify#2116

When a thread fails to spawn, include the thread name in the error
message.
No longer init then deinit part of the engine at startup of the unix
socket mode.
Timeout checks would access certain fields w/o locking, which could lead
to thread safety issues.
Use a more precise calculation for timing out flows, using both the
seconds and the micro seconds.
The flow worker needs to get the opportunity to run the flow update
before globally making it's current timestamp available. This is to
avoid another thread using the time to evict the flow that is about to
get a legitimate update.
If seconds match a flow can still be timing out.
When timing out flows, use the timestamp from the "owning" thread. This
avoids problems with threads being out of sync with each other.
As this may mean that a threads ts is a bit ahead of the minimum time
the flow manager normally uses, it can evict flows a bit faster.
The idea of sealing the thread store is that its members can be accessed
w/o holding a lock to the whole store at runtime.
Some checks can be done w/o holding a lock:
- seeing if the flow matches the packet
- if the hash row needs a timeout check

This patch skips taking a lock in these conditions.
Copy link

codecov bot commented Nov 4, 2024

Codecov Report

Attention: Patch coverage is 90.27778% with 14 lines in your changes missing coverage. Please review.

Project coverage is 83.36%. Comparing base (3a7eef8) to head (b1ea592).
Report is 113 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #12084      +/-   ##
==========================================
- Coverage   83.42%   83.36%   -0.07%     
==========================================
  Files         910      910              
  Lines      257642   257611      -31     
==========================================
- Hits       214949   214747     -202     
- Misses      42693    42864     +171     
Flag Coverage Δ
fuzzcorpus 61.54% <43.16%> (-0.11%) ⬇️
livemode 19.46% <46.04%> (+0.05%) ⬆️
pcap 44.42% <84.89%> (-0.06%) ⬇️
suricata-verify 62.75% <89.20%> (-0.01%) ⬇️
unittests 59.34% <19.71%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

@suricata-qa
Copy link

Information:

ERROR: QA failed on SURI_TLPW2_autofp_suri_time.

ERROR: QA failed on SURI_TLPR1_alerts_cmp.

field baseline test %
SURI_TLPW1_stats_chk
.app_layer.flow.dhcp 563 600 106.57%
SURI_TLPR1_stats_chk
.app_layer.flow.ftp 32307 33405 103.4%
.app_layer.flow.dcerpc_tcp 40 42 105.0%

Pipeline 23246

@victorjulien
Copy link
Member Author

Passed my QA. Ran this PR with SV PR OISF/suricata-verify#2116. Local pipeline 5127, run 623.

@inashivb
Copy link
Member

inashivb commented Nov 5, 2024

Information:

ERROR: QA failed on SURI_TLPW2_autofp_suri_time.

ERROR: QA failed on SURI_TLPR1_alerts_cmp.
field baseline test %
SURI_TLPW1_stats_chk
.app_layer.flow.dhcp 563 600 106.57%
SURI_TLPR1_stats_chk
.app_layer.flow.ftp 32307 33405 103.4%
.app_layer.flow.dcerpc_tcp 40 42 105.0%

Pipeline 23246

Interesting deviations.
Q1: Are you able to explain all the deviations?
OR
Q2: Does this need a rerun?

I think I can explain the change in flow count and alert count theoretically. But, refraining from verifying until I get a response on the questions above.

@suricata-qa
Copy link

Information:

ERROR: QA failed on SURI_TLPW2_autofp_suri_time.

ERROR: QA failed on SURI_TLPR1_alerts_cmp.

field baseline test %
SURI_TLPW1_stats_chk
.app_layer.flow.dhcp 563 600 106.57%
SURI_TLPR1_stats_chk
.app_layer.flow.dcerpc_tcp 40 42 105.0%

Pipeline 23250

@@ -157,6 +157,7 @@ static OutputInitResult EveStreamLogInitCtxSub(ConfNode *conf, OutputCtx *parent
ctx->trigger_flags |= SetFlag(conf, "state-update", STREAM_PKT_FLAG_STATE_UPDATE);
ctx->trigger_flags |=
SetFlag(conf, "spurious-retransmission", STREAM_PKT_FLAG_SPURIOUS_RETRANSMISSION);
ctx->trigger_flags |= SetFlag(conf, "tcp-port-reuse", STREAM_PKT_FLAG_TCP_PORT_REUSE);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: Why is it called "port" reuse?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what wireshark calls it, but I will change it to ssn reuse or something as that is more consistent with our logging.

@@ -182,16 +185,20 @@ void FlowDisableFlowManagerThread(void)
*/
static bool FlowManagerFlowTimeout(Flow *f, SCTime_t ts, uint32_t *next_ts, const bool emerg)
{
uint32_t flow_times_out_at = f->timeout_at;
SCTime_t timesout_at; // = f->lastts;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About the comment: why?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed up

Comment on lines +583 to +587
/* update time */
if (!(PKT_IS_PSEUDOPKT(p))) {
TimeSetByThread(tv->id, p->ts);
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we merge it w the else if condition above?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, it should run for each packet.

@@ -2266,7 +2266,7 @@ void TmThreadsGetMinimalTimestamp(struct timeval *ts)
if (t->type != TVT_PPT)
continue;
if (SCTIME_CMP_NEQ(t->pktts, nullts)) {
SCTime_t sys_sec_stamp = SCTIME_ADD_SECS(t->sys_sec_stamp, 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. why?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a bit more explanation to the commit

Comment on lines +845 to +854
if (TimeModeIsLive()) {
if (SCTIME_CMP_LT(pktts, timesout_at)) {
return false;
}
} else {
if (ftid == f->thread_id[0] || f->thread_id[0] == 0) {
/* do the timeout check */
if (SCTIME_CMP_LT(pktts, timesout_at)) {
return false;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We seem to do the same thing in both the blocks? Can we merge them? Then it becomes similar to flow manager..

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it will still be slightly different, but I'll compact it a bit

@catenacyber
Copy link
Contributor

Is there a redmine ticket to do for this ?

@victorjulien
Copy link
Member Author

Is there a redmine ticket to do for this ?

added https://redmine.openinfosecfoundation.org/issues/7455

@victorjulien
Copy link
Member Author

Replaced by #12278.

@victorjulien victorjulien deleted the flow-timeout-timing/v14 branch January 13, 2025 11:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants