can_read() in read_command.c causes partial output in rare cases. #68

Open
xmlijhu opened this issue Jul 23, 2024 · 4 comments
Labels: module/logcollector, reporter/community (Issue reported by the community), type/bug (Bug issue)

Comments


xmlijhu commented Jul 23, 2024

| Wazuh version | Component | Install type | Install method | Platform |
|---|---|---|---|---|
| 4.4.3-40409 | Logcollector | Agent | RPM | Amazon Linux 2 (AL2) |

In rare cases, when we run Logcollector with a command as the log source, we get only part of the command output.

Sample configuration in ossec.conf:

  <localfile>
    <log_format>command</log_format>
    <command>sleep 13; nice -n 10 bash /var/ossec/etc/shared/sample.shell</command>
    <alias>sample_shell</alias>
    <out_format>$(timestamp) $(hostname) sample_shell: $(log)</out_format>
    <frequency>180</frequency>
  </localfile>

Most of the time, the command produces 80 lines of messages; in rare cases, however, fewer than 80 lines are collected.

After drilling down and building a customized version of wazuh-logcollector with more debug information, we found that the culprit is the can_read() function, which returns false while the fgets() loop is still iterating.

https://github.com/wazuh/wazuh/blob/master/src/logcollector/read_command.c#L43

  1. What is the purpose of can_read() here, when reading command output? I think it is mainly meant for file monitoring, for cases where the file is rotated or truncated.
  2. Further investigation showed that can_read() is not used at all in read_fullcommand.c.
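
The truncation can be reproduced in isolation. The sketch below is not the Wazuh source: `gate_flips_midway`, `gate_always_open`, `read_lines_gated`, and `make_stream` are hypothetical names, and a temporary file stands in for the command's output pipe. It only illustrates that when a gate function sits in the `while` condition next to `fgets()`, the loop stops as soon as the gate turns false, even though unread lines remain.

```c
#include <assert.h>
#include <stdio.h>

#define LINE_BUF 1024

/* Hypothetical stand-in for can_read(): stays true for `gate_budget` checks,
 * then turns false, simulating another thread clearing the flag mid-read. */
static int gate_budget = 0;

static int gate_flips_midway(void) {
    return gate_budget-- > 0;
}

/* Hypothetical gate that never closes. */
static int gate_always_open(void) {
    return 1;
}

/* Reads lines from `stream` while the gate stays open; returns lines read.
 * The gate is checked before every fgets() call, as part of the condition. */
static int read_lines_gated(FILE *stream, int (*gate)(void)) {
    char buf[LINE_BUF];
    int lines = 0;

    assert(stream != NULL && gate != NULL);
    while (gate() && fgets(buf, sizeof(buf), stream) != NULL) {
        lines++;
    }
    return lines;
}

/* Builds a temporary stream holding `count` short lines, rewound to the start. */
static FILE *make_stream(int count) {
    FILE *fp = tmpfile();
    if (fp == NULL) {
        return NULL;
    }
    for (int i = 0; i < count; i++) {
        fprintf(fp, "line %d\n", i);
    }
    rewind(fp);
    return fp;
}
```

With an 80-line stream, `read_lines_gated(stream, gate_always_open)` returns 80, while setting `gate_budget = 1` and passing `gate_flips_midway` returns 1: the remaining 79 lines are never read.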

In ossec.log, after turning on debug=2:

Most of the time, the command yields 80 lines, as shown below:
2024/07/19 20:08:36 wazuh-logcollector[12738] read_command.c:73 at read_command(): DEBUG: Read 80 lines from command 'sleep 13; nice -n 10 bash /var/ossec/etc/shared/sample.shell'

In rare cases, however, only 1 line of output is read:
2024/07/19 23:06:23 wazuh-logcollector[12738] read_command.c:73 at read_command(): DEBUG: Read 1 lines from command 'sleep 13; nice -n 10 bash /var/ossec/etc/shared/sample.shell'

juliancnn self-assigned this Jul 31, 2024
juliancnn added the type/bug, module/logcollector, and reporter/community labels Jul 31, 2024

juliancnn (Member) commented

Hi @xmlijhu,

The set_read and can_read functions are part of an older synchronization mechanism between the reader threads and the thread that manages runtime configuration in Logcollector. Because can_read() is evaluated as part of the while loop condition, this mechanism can occasionally cause command output to be only partially captured.
I am still not sure whether the check should happen before or after the command is executed, because of the implications either choice may have. As a condition of the while loop, though, it can in rare cases prevent all the logs from being sent.
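
For readers unfamiliar with this kind of mechanism, here is a minimal sketch of a shared "may read" flag. It is an assumption for illustration only: the names `reading_allowed`, `set_reading_allowed`, and `is_reading_allowed` are hypothetical, and a C11 atomic is used here in place of whatever locking the real set_read/can_read pair relies on.

```c
#include <stdatomic.h>

/* Hypothetical shared flag: 1 means reader threads may keep reading. */
static atomic_int reading_allowed = 1;

/* Would be called by the thread that manages runtime configuration,
 * e.g. with 0 before a reload and with 1 once the reload is done. */
static void set_reading_allowed(int value) {
    atomic_store(&reading_allowed, value);
}

/* Would be called by reader threads. If this is checked inside a read
 * loop's condition, a reload started mid-loop ends that loop early. */
static int is_reading_allowed(void) {
    return atomic_load(&reading_allowed);
}
```

The point of the sketch is that the flag is global state flipped by a different thread, so a reader cannot predict when a check in its loop condition will start returning false.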

Suggested Workaround:
Consider using the Command wodle, which provides a more stable mechanism for executing and capturing command outputs. More details can be found here:

I will escalate this as a potential bug for further review and enhancement. Thank you for bringing this to our attention!

Regards

juliancnn removed their assignment Jul 31, 2024

jeffery-jen commented

@juliancnn Thanks for looking at this.

From the commit history, this implementation has been there for quite a while. By comparison, read_fullcommand.c captures the ENTIRE command output and delivers it through w_msg_hash_queues_push without checking the lock in can_read().

With this in mind, read_command.c checks other input threads for no apparent reason.

juliancnn (Member) commented

Hi @jeffery-jen, yes, it is old code. I think the main reason for it is so that all the reader threads finish their tasks as soon as possible, letting the main thread refresh its configuration. Following this logic, I think the can_read() check should come before the command execution, although this does not address the timeout problems.
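
One way to picture the ordering suggested above is the sketch below. It is a hedged illustration, not a patch against read_command.c: `run_and_drain`, `gate_open`, and `gate_closed` are hypothetical names, the gate callback stands in for can_read(), and any per-cycle line limit the real reader may apply is left out. The gate is consulted once before the command is spawned; once the command is running, its output is drained until end of stream.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>

/* Hypothetical gates standing in for can_read(). */
static int gate_open(void)   { return 1; }
static int gate_closed(void) { return 0; }

/* Runs `command` only if the gate is open at the start, then reads all of
 * its output. Returns the number of fgets() chunks read (one per line for
 * lines shorter than the buffer), or -1 if skipped or popen() failed. */
static int run_and_drain(const char *command, int (*gate)(void)) {
    char buf[1024];
    int lines = 0;
    FILE *out;

    assert(command != NULL && gate != NULL);

    /* Check the gate once, before spawning, so the cycle is either
     * skipped entirely or read to completion. */
    if (!gate()) {
        return -1;
    }

    out = popen(command, "r");
    if (out == NULL) {
        return -1;
    }

    /* No gate check here: read until the command closes its output. */
    while (fgets(buf, sizeof(buf), out) != NULL) {
        lines++;
    }
    pclose(out);
    return lines;
}
```

As noted above, this does not address timeouts: the loop blocks until the command exits, so a long-running command would still delay the reader thread.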

juliancnn transferred this issue from wazuh/wazuh Aug 8, 2024

jeffery-jen commented

A related issue has also been observed since 2021:

wazuh/wazuh#9130

Would any action be taken on this?
