Add support for agent self-restarting: Development Phase #386

Draft
wants to merge 1 commit into
base: master
27 changes: 3 additions & 24 deletions packages/debs/SPECS/wazuh-agent/debian/prerm
@@ -15,18 +15,9 @@ case "$1" in
elif ${BINARY_DIR}wazuh-agent --status 2>/dev/null | grep "is running" > /dev/null 2>&1; then
pid=$(ps -ef | grep "${BINARY_DIR}wazuh-agent" | grep -v grep | awk '{print $2}')
if [ -n "$pid" ]; then
kill -SIGTERM "$pid" 2>/dev/null
kill -15 "$pid" 2>/dev/null
Member

Why do we replace -SIGTERM with -15? I think the former is more readable.

Member Author

As I mentioned here, this change is intended to avoid the following error:

/var/lib/dpkg/info/wazuh-agent.prerm: 43: kill: Illegal option -S
dpkg: error processing package wazuh-agent (--remove):

fi
fi

# # Process: wazuh-agent
# if pgrep -f "wazuh-agent" > /dev/null 2>&1; then
# kill -15 $(pgrep -f "wazuh-agent") > /dev/null 2>&1
# fi

# if pgrep -f "wazuh-agent" > /dev/null 2>&1; then
# kill -9 $(pgrep -f "wazuh-agent") > /dev/null 2>&1
# fi
;;

remove)
@@ -38,10 +29,9 @@ case "$1" in
elif ${BINARY_DIR}wazuh-agent --status 2>/dev/null | grep "is running" > /dev/null 2>&1; then
pid=$(ps -ef | grep "${BINARY_DIR}wazuh-agent" | grep -v grep | awk '{print $2}')
if [ -n "$pid" ]; then
kill -SIGTERM "$pid" 2>/dev/null
kill -15 "$pid" 2>/dev/null
fi
fi

;;

failed-upgrade)
@@ -52,20 +42,9 @@ case "$1" in
elif ${BINARY_DIR}wazuh-agent --status 2>/dev/null | grep "is running" > /dev/null 2>&1; then
pid=$(ps -ef | grep "${BINARY_DIR}wazuh-agent" | grep -v grep | awk '{print $2}')
if [ -n "$pid" ]; then
kill -SIGTERM "$pid" 2>/dev/null
kill -15 "$pid" 2>/dev/null
fi
fi

# if [ -f ${INSTALLATION_WAZUH_DIR}/bin/wazuh-agent ]; then
# # pkill wazuh-agent
# if pgrep -f "wazuh-agent" > /dev/null 2>&1; then
# kill -15 $(pgrep -f "wazuh-agent") > /dev/null 2>&1
# fi

# if pgrep -f "wazuh-agent" > /dev/null 2>&1; then
# kill -9 $(pgrep -f "wazuh-agent") > /dev/null 2>&1
# fi
# fi
;;

*)
1 change: 0 additions & 1 deletion src/agent/CMakeLists.txt
@@ -65,7 +65,6 @@ target_link_libraries(Agent
MultiTypeQueue
ModuleManager
ModuleCommand
CentralizedConfiguration
Member Author

It was duplicated, so I removed it.

Boost::asio
sysinfo
PRIVATE
12 changes: 10 additions & 2 deletions src/agent/command_handler/include/command_handler.hpp
@@ -123,8 +123,16 @@ namespace command_handler
{
for (auto& cmd : *cmds)
{
cmd.ExecutionResult.ErrorCode = module_command::Status::FAILURE;
cmd.ExecutionResult.Message = "Agent stopped during execution";
if (cmd.Command == "restart")
{
cmd.ExecutionResult.ErrorCode = module_command::Status::IN_PROGRESS;
cmd.ExecutionResult.Message = "Restarting Agent...";
}
else
{
cmd.ExecutionResult.ErrorCode = module_command::Status::FAILURE;
cmd.ExecutionResult.Message = "Agent stopped during execution";
}
ReportCommandResult(cmd);
m_commandStore.UpdateCommand(cmd);
}
4 changes: 3 additions & 1 deletion src/agent/command_handler/src/command_handler.cpp
@@ -3,7 +3,9 @@
namespace command_handler
{
const std::unordered_map<std::string, std::string> VALID_COMMANDS_MAP = {
{"set-group", "CentralizedConfiguration"}, {"update-group", "CentralizedConfiguration"}};
{"set-group", "CentralizedConfiguration"},
{"update-group", "CentralizedConfiguration"},
{"restart", "restart"}};

void CommandHandler::Stop()
{
3 changes: 3 additions & 0 deletions src/agent/include/isignal_handler.hpp
@@ -12,4 +12,7 @@ class ISignalHandler

/// @brief Waits for a signal to be received
virtual void WaitForSignal() = 0;

/// @brief Send restart signal
virtual void Restart() = 0;
};
3 changes: 3 additions & 0 deletions src/agent/include/signal_handler.hpp
@@ -23,6 +23,9 @@ class SignalHandler : public ISignalHandler
/// @param signalToHandle The signal to be handled
static void HandleSignal(int signalToHandle);

/// @brief Handles the self-restart signal
void Restart() override;

/// @brief Keeps track of whether the agent should continue running
static std::atomic<bool> KeepRunning;

5 changes: 3 additions & 2 deletions src/agent/service/wazuh-agent.service
@@ -5,12 +5,13 @@ After=network.target network-online.target

[Service]
Type=simple

PIDFile=/var/run/wazuh-agent.lock
Member

systemd cannot use the lockfile as a PIDFile, because it expects that file to contain the process's PID.
As the unit has Type=simple, the PIDFile is not necessary.

Member Author

Yes, you are right. In a previous test, I used Type=notify and wrote the PID in the lockfile. Thanks for letting me know.

ExecStart=/usr/bin/env WAZUH_HOME/wazuh-agent
TimeoutStopSec=30s # Wait for 30 seconds before killing the service

KillSignal=SIGTERM

KillMode=process
KillMode=mixed

SendSIGKILL=no

11 changes: 11 additions & 0 deletions src/agent/src/agent.cpp
@@ -164,6 +164,17 @@ void Agent::Run()
},
m_messageQueue);
}
else if (cmd.Module == "restart")
{
LogInfo("Restart: Initiating self-restart");
m_signalHandler->Restart();
auto RestartExecuteCommand = []() -> boost::asio::awaitable<module_command::CommandExecutionResult>
{
co_return module_command::CommandExecutionResult {module_command::Status::IN_PROGRESS,
"Pending restart execution"};
};
return RestartExecuteCommand();
}
return DispatchCommand(cmd, m_moduleManager.GetModule(cmd.Module), m_messageQueue);
}),
"CommandsProcessing");
185 changes: 173 additions & 12 deletions src/agent/src/process_options_unix.cpp
@@ -1,40 +1,201 @@
#include <process_options.hpp>
#include <process_options_unix.hpp>

#include <agent.hpp>
#include <ctime>
#include <fmt/format.h>
#include <fmt/ranges.h>
#include <fstream>
#include <logger.hpp>
#include <unix_daemon.hpp>

#include <csignal>
#include <iostream>
#include <thread>
#include <vector>

void StartAgent(const std::string& configFilePath)
#include <sys/wait.h>

// Flag to signal that SIGUSR1 was received
volatile sig_atomic_t SIGNAL_RECEIVED = 0;
Member

As this symbol is not a constant, please write it in lowercase.
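
To illustrate the naming suggestion, here is a minimal sketch of a lowercase, mutable signal flag. The names `signal_received`, `record_signal`, and `install_flag_handler` are hypothetical and not taken from the PR. The handler only stores the signal number, since handlers are limited to async-signal-safe operations; any logging would then happen in normal code after the flag is read.

```cpp
#include <cassert>
#include <csignal>
#include <signal.h> // POSIX sigaction

// Lowercase name: this is mutable state written by a signal handler, not a constant.
volatile std::sig_atomic_t signal_received = 0;

// Hypothetical handler: it only records which signal arrived.
extern "C" void record_signal(int signo)
{
    signal_received = signo;
}

// Hypothetical helper: installs record_signal for the given signal.
// Returns true when sigaction succeeds.
bool install_flag_handler(int signo)
{
    struct sigaction action = {};
    action.sa_handler = record_signal;
    sigemptyset(&action.sa_mask);
    action.sa_flags = 0;
    return sigaction(signo, &action, nullptr) == 0;
}
```

The caller would check `signal_received` after `pause()` returns and log from there.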


// Signal handler for SIGUSR1
void sigusr1_handler(int signal)
{
unix_daemon::LockFileHandler lockFileHandler = unix_daemon::GenerateLockFile(configFilePath);
if (signal == SIGUSR1)
{
LogDebug("Received SIGUSR1: Restarting child agent process...");
SIGNAL_RECEIVED = SIGUSR1; // Set flag to indicate restart needed
}
}

if (!lockFileHandler.isLockFileCreated())
// Signal handler for SIGCHLD
void sigchld_handler(int signal)
{
if (signal == SIGCHLD)
{
std::cout << "wazuh-agent already running\n";
return;
LogDebug("Received SIGCHLD: Child agent process terminated.");
}
}

// Signal handler for SIGTERM
void sigterm_handler(int signal)
{
if (signal == SIGTERM)
{
LogDebug("Received SIGTERM: Stop the agent process...");
SIGNAL_RECEIVED = SIGTERM; // Set flag to indicate restart needed
}
}

LogInfo("Starting wazuh-agent");
void StartAgent(const std::string& configFilePath)
{
// Set up signal handlers
struct sigaction sa_usr1 = {}, sa_chld = {}, sa_term = {};

sa_usr1.sa_handler = sigusr1_handler;
sa_usr1.sa_flags = 0;
sigaction(SIGUSR1, &sa_usr1, nullptr);

sa_chld.sa_handler = sigchld_handler;
sa_chld.sa_flags = SA_NOCLDSTOP; // Avoid receiving SIGCHLD for stopped children
sigaction(SIGCHLD, &sa_chld, nullptr);

try
// Set up SIGTERM handler
sa_term.sa_handler = sigterm_handler;
sa_term.sa_flags = 0;
sigaction(SIGTERM, &sa_term, nullptr);

pid_t pid = fork();

if (pid < 0)
{
Agent agent(configFilePath);
agent.Run();
LogError("Restart: Fork failed");
exit(1);
}
catch (const std::exception& e)
else if (pid == 0)
{
LogError("Exception thrown in wazuh-agent: {}", e.what());
unix_daemon::LockFileHandler lockFileHandler = unix_daemon::GenerateLockFile(configFilePath);

// Child process: Run the agent
if (!lockFileHandler.isLockFileCreated())
{
LogInfo("wazuh-agent already running");
return;
}

LogInfo("Starting wazuh-agent");
try
{
Agent agent(configFilePath);
agent.Run();
}
catch (const std::exception& e)
{
LogError("Exception thrown in wazuh-agent: {}", e.what());
}

exit(0);
}
else
{
// Parent process - Monitoring the agent, taking care of self-restart
pause(); // Suspend parent until a signal is received

// Stop Agent
if (SIGNAL_RECEIVED == SIGTERM)
{
LogDebug("Received SIGTERM, terminating child process...");
kill(pid, SIGTERM);
waitpid(pid, nullptr, 0); // Wait for the child to terminate.
}

// Self-restart agent
if (SIGNAL_RECEIVED == SIGUSR1)
{
if (using_systemctl())
{
LogDebug("Restart: systemctl restarting wazuh agent service.");
std::system("systemctl restart wazuh-agent");
}
else
{
StopAgent(pid, configFilePath);

std::vector<const char*> args = get_command_line_args();
LogDebug("Restart: starting wazuh agent in a new process.");
if (execve(args[0], const_cast<char* const*>(args.data()), nullptr) == -1)
{
LogError("Failed to spawn new Wazuh agent process.");
}
}
}
exit(0); // Exit the parent process
}
}

void StatusAgent(const std::string& configFilePath)
{
std::cout << fmt::format("wazuh-agent status: {}\n", unix_daemon::GetDaemonStatus(configFilePath));
}

std::vector<const char*> get_command_line_args()
{
std::vector<const char*> args;
std::ifstream cmdline_file("/proc/self/cmdline");

if (!cmdline_file)
{
LogError("Failed to open /proc/self/cmdline");
return args;
}

std::string arg;
while (getline(cmdline_file, arg, '\0'))
{
args.push_back(strdup(arg.c_str()));
}

args.push_back(nullptr);

return args;
}

bool using_systemctl()
{
return (0 == std::system("which systemctl > /dev/null 2>&1") && nullptr != std::getenv("INVOCATION_ID"));
}

void StopAgent(pid_t pid, const std::string& configFilePath)
{
int status {};
pid_t result {};

const int timeout = 30; // Timeout duration (in seconds) for killing the agent child process
time_t start_time = time(nullptr); // Record the start time to track the timeout duration

// Initiate the process termination by sending SIGTERM
kill(pid, SIGTERM);

while (true)
{
result = waitpid(pid, &status, WNOHANG); // Non-blocking check for agent process status

if (result == pid)
{
LogDebug("Agent process terminated.");
break;
}

if (difftime(time(nullptr), start_time) > timeout)
{
LogError("Timeout reached! Forcing agent process termination.");
unix_daemon::LockFileHandler lockFileHandler = unix_daemon::GenerateLockFile(configFilePath);
kill(pid, SIGKILL);
// Remove lock file
lockFileHandler.~LockFileHandler();
Member

There is a removeLockFile() method, if I'm not mistaken. Also, this object is created on the stack, so its destructor will run when it goes out of scope; calling it manually as well could result in UB.

Member Author

@lchico, Jan 14, 2025

The removeLockFile() method is now private. I need a hard remove because the SIGKILL signal will shut down the service abruptly, preventing a graceful removal of the lock file. But yes, I need to find a better solution for this.

Member

Would probably be better to just make the method public.
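
A rough sketch of the direction suggested in this thread: a public, idempotent removal method that the destructor also calls, so no explicit destructor call is needed. `ScopedLockFile`, `Remove`, `IsOwned`, and `RemoveThenDestroy` are hypothetical names for illustration; this is not the project's `LockFileHandler`, whose internals are not shown in this diff.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <system_error>
#include <utility>

// Hypothetical RAII wrapper around a lock file (an assumption, not the PR's class).
class ScopedLockFile
{
public:
    explicit ScopedLockFile(std::filesystem::path path)
        : m_path(std::move(path))
    {
        std::ofstream file(m_path); // create (or truncate) the lock file
        m_owned = file.is_open();
    }

    ScopedLockFile(const ScopedLockFile&) = delete;
    ScopedLockFile& operator=(const ScopedLockFile&) = delete;

    // Public and idempotent: safe to call explicitly, and again from the destructor.
    void Remove() noexcept
    {
        if (m_owned)
        {
            std::error_code ec;
            std::filesystem::remove(m_path, ec); // errors are ignored in this sketch
            m_owned = false;
        }
    }

    ~ScopedLockFile() { Remove(); }

    bool IsOwned() const noexcept { return m_owned; }

private:
    std::filesystem::path m_path;
    bool m_owned {false};
};

// Creates the lock, removes it explicitly, then lets the destructor run as a no-op.
// Returns true when the file is gone afterwards.
bool RemoveThenDestroy(const std::filesystem::path& path)
{
    {
        ScopedLockFile lock(path);
        if (!lock.IsOwned())
        {
            return false;
        }
        lock.Remove();
    } // destructor runs here; Remove() has nothing left to do
    return !std::filesystem::exists(path);
}
```

With this shape, the timeout branch could call the removal method right after sending SIGKILL and still let the object go out of scope normally.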

}

// Sleep for a short time before checking again
std::this_thread::sleep_for(std::chrono::seconds(1));
}
}