stream: RST no longer acks all data #12186

Draft: wants to merge 1 commit into base: master

victorjulien
Member

Since forever (1578ef1) a valid RST would update the internal last_ack representation to include all unACK'd data. This was originally done to make sure the unACK'd data was inspected/processed at flow timeout.

It was observed, however, that if GAPs existed in this unACK'd data, a GAP could be reported in the stats and a GAP event would be raised. This doesn't make sense, as missing segments in the unACK'd part of the stream are completely normal. Segments simply do not all arrive in order.

It turns out that the original behavior of updating last_ack to include all unACK'd data is no longer needed. Both raw stream inspection and app-layer updates will include the unACK'd data on stream timeout.

Since the GAP detection uses last_ack to determine GAPs, not moving last_ack addresses the GAP false positives.
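
To illustrate why leaving last_ack alone removes the false positives, here is a minimal toy model in C. It is not Suricata's stream code: `ToySegment` and `ToyCountGaps` are hypothetical names made up for this sketch, and the only assumption carried over from the description above is that holes are counted only below last_ack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Wrap-safe TCP sequence comparisons, defined locally for this sketch. */
#define SEQ_LT(a, b) ((int32_t)((a) - (b)) < 0)
#define SEQ_GT(a, b) ((int32_t)((a) - (b)) > 0)

/* Hypothetical, simplified segment covering [seq, seq + len). */
typedef struct {
    uint32_t seq;
    uint32_t len;
} ToySegment;

/* Count holes in the range [base_seq, last_ack). Segments must be sorted by
 * seq. Anything at or beyond last_ack is still in flight and is never
 * counted as missing. */
static unsigned ToyCountGaps(const ToySegment *segs, size_t n,
        uint32_t base_seq, uint32_t last_ack)
{
    unsigned gaps = 0;
    uint32_t next = base_seq; /* next byte we expect to see */

    for (size_t i = 0; i < n && SEQ_LT(next, last_ack); i++) {
        /* precondition of this sketch: input is sorted by seq */
        assert(i == 0 || !SEQ_LT(segs[i].seq, segs[i - 1].seq));

        uint32_t end = segs[i].seq + segs[i].len;
        if (SEQ_GT(segs[i].seq, next)) {
            /* hole in front of this segment, starting below last_ack */
            gaps++;
        }
        if (SEQ_GT(end, next))
            next = end;
    }
    /* ACK'd bytes after the last segment that never arrived */
    if (SEQ_LT(next, last_ack))
        gaps++;
    return gaps;
}
```

With segments covering bytes 0-99 and 200-299, a last_ack of 100 yields no GAP, because the hole at 100-199 sits in the unACK'd part. If an RST moves last_ack to 300, the same hole is counted as one GAP.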

Ticket: #7422.

SV_BRANCH=OISF/suricata-verify#2154


codecov bot commented Nov 30, 2024

Codecov Report

Attention: Patch coverage is 85.71429% with 4 lines in your changes missing coverage. Please review.

Project coverage is 83.16%. Comparing base (e9173f3) to head (1d96aa2).

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #12186      +/-   ##
==========================================
- Coverage   83.17%   83.16%   -0.02%     
==========================================
  Files         912      912              
  Lines      257111   257090      -21     
==========================================
- Hits       213856   213798      -58     
- Misses      43255    43292      +37     
Flag Coverage Δ
fuzzcorpus 60.97% <85.71%> (-0.05%) ⬇️
livemode 19.42% <0.00%> (+0.01%) ⬆️
pcap 44.41% <71.42%> (+0.02%) ⬆️
suricata-verify 62.79% <71.42%> (+0.02%) ⬆️
unittests 59.18% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

@suricata-qa

Information:

ERROR: QA failed on SURI_TLPW1_files_sha256.

SURI_TLPR1_stats_chk

| field | baseline | test | % |
|---|---|---|---|
| .flow.end.tcp_liberal | 13588 | 12288 | 90.43% |
| .app_layer.error.http.parser | 700 | 561 | 80.14% |
| .app_layer.error.tls.gap | 1718 | 1649 | 95.98% |

Pipeline 23616

@victorjulien
Member Author

@ct0br0 I have the pcaps, no action needed at this point.

@victorjulien
Member Author

After investigating the missing hashes, my conclusion is that the behavior in this PR is correct. They all seem to have the following pattern:

  1. unACK'd data in flight, with some GAPs (some segments missing)
  2. RST comes in, last_ack doesn't cover all in flight segments

In master, the RST will auto-ACK all data, which means we're now sending this incomplete data to the app-layer parser. The parsers get confused and GAPs are counted. The HTTP parser specifically uses some of the post-GAP data from the in-flight section to create a new, complete nonsense transaction, with no URI, etc. It is this nonsense tx that is treated like a file, and for which the hashes are reported missing for this PR.
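
The pattern above can be sketched from the parser's point of view. This is a toy model only, assuming the parser is handed data strictly below last_ack and that holes are signalled as GAPs. `FeedSegment`, `FeedStats` and `ModelFeedParser` are hypothetical names for illustration and are not Suricata APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Wrap-safe TCP sequence comparisons, defined locally for this sketch. */
#define SEQ_LT(a, b) ((int32_t)((a) - (b)) < 0)
#define SEQ_GT(a, b) ((int32_t)((a) - (b)) > 0)

/* Hypothetical segment covering [seq, seq + len). */
typedef struct {
    uint32_t seq;
    uint32_t len;
} FeedSegment;

/* What the modelled parser ends up seeing. */
typedef struct {
    uint32_t data_bytes;     /* bytes handed to the parser */
    uint32_t gap_bytes;      /* bytes signalled as missing */
    uint32_t post_gap_bytes; /* data handed over after a GAP was signalled */
} FeedStats;

/* Walk sorted segments and "feed" everything below last_ack. */
static FeedStats ModelFeedParser(const FeedSegment *segs, size_t n,
        uint32_t base_seq, uint32_t last_ack)
{
    FeedStats st = { 0, 0, 0 };
    uint32_t next = base_seq; /* next byte the parser expects */
    assert(segs != NULL || n == 0);

    for (size_t i = 0; i < n && SEQ_LT(next, last_ack); i++) {
        uint32_t start = segs[i].seq;
        uint32_t end = start + segs[i].len;
        if (!SEQ_GT(end, next))
            continue; /* segment already fully consumed */

        if (SEQ_GT(start, next)) {
            /* hole: signal a GAP up to the segment start, capped at last_ack */
            uint32_t gap_end = SEQ_LT(start, last_ack) ? start : last_ack;
            st.gap_bytes += gap_end - next;
            next = gap_end;
            if (!SEQ_LT(next, last_ack))
                break;
        }
        /* deliver [next, min(end, last_ack)) */
        uint32_t stop = SEQ_LT(end, last_ack) ? end : last_ack;
        uint32_t len = stop - next;
        st.data_bytes += len;
        if (st.gap_bytes > 0)
            st.post_gap_bytes += len; /* data the parser may misread as a new tx */
        next = stop;
    }
    /* ACK'd bytes past the last segment that never arrived */
    if (SEQ_LT(next, last_ack))
        st.gap_bytes += last_ack - next;
    return st;
}
```

For segments at 0-99 and 200-299, a last_ack of 100 feeds 100 bytes and no GAP. With last_ack forced to 300 by the RST, the model feeds 200 bytes, signals a 100 byte GAP, and 100 of the fed bytes are post-GAP data, which is the part the HTTP parser turned into a nonsense transaction.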

Since the issues are with TLPW1, I'm crafting some additional SV tests to show the issue.

Since forever (1578ef1) a valid RST
would update the internal `last_ack` representation to include all
unack'd data. This was originally done to make sure the unACK'd data was
inspected/processed at flow timeout.

It was observed however, that if GAPs existed in this unACK'd data, a
GAP could be reported in the stats and a GAP event would be raised. This
doesn't make sense, as missing segments in the unACK'd part of the
stream are completely normal. Segments simply do not all arrive in
order.

It turns out that the original behavior of updating `last_ack` to
include all unACK'd data is no longer needed.

For raw stream inspection, the detection engine will already include the
unACK'd data on flow end.

For app-layer updates the unACK'd data is often harmful, as the data
often has GAPs. Parsers like the HTTP parser would report these GAPs and
could also get confused about the post-GAP data being a new transaction
including a file. This led to many reported errors and phantom txs and
files.

Since the GAP detection uses `last_ack` to determine GAPs, not moving
`last_ack` addresses the GAP false positives.

Ticket: OISF#7422.
@victorjulien
Member Author

OISF/suricata-verify@b7cf987 has a couple of tests to show the issue with TLPW1.

@@ -3242,10 +3216,9 @@ static int StreamTcpPacketStateEstablished(
            ssn->client.window = window << ssn->client.wscale;

            if ((tcph->th_flags & TH_ACK) && StreamTcpValidateAck(ssn, &ssn->server, p) == 0)
                StreamTcpUpdateLastAck(
                        ssn, &ssn->server, StreamTcpResetGetMaxAck(&ssn->server, window));
Member Author

also, notice this fix here ;)

@suricata-qa

Information:

ERROR: QA failed on SURI_TLPW1_files_sha256.

SURI_TLPR1_stats_chk

| field | baseline | test | % |
|---|---|---|---|
| .flow.end.tcp_liberal | 13588 | 12288 | 90.43% |
| .app_layer.error.http.parser | 700 | 561 | 80.14% |
| .app_layer.error.tls.gap | 1718 | 1649 | 95.98% |

Pipeline 23719

@victorjulien victorjulien mentioned this pull request Dec 11, 2024