Skip to content

Conversation

kwvg
Copy link
Collaborator

@kwvg kwvg commented Sep 5, 2025

Motivation

Continuing the work in dash#6267, this pull request aims to allow IPv6 connections, add TCP connection support and message buffering if a TCP connection drops. This should allow hosts that want stronger connection guarantees to their Statsd server to opt for TCP expecting lower stats loss as stats generated within ~5 minutes are preserved in the buffer.

To allow opting for TCP, support for URLs have been introduced for statshost and can be formatted as tcp://127.0.0.1:8125. As statshost can also include the port, statsport has been deprecated and will be removed in a future release of Dash Core.

Additional Information

  • Downtime buffering is only available for TCP, behavior of UDP connections remain unaffected and messages that fail to send will remain dropped.

  • The success and failure stats in debug logging has been removed as reconnection logic makes them no longer a reliable means of measuring how many messages overall were generated at runtime.

  • To avoid noise, all RawSender logs and StatsdClient send failure logs are now gated behind -debug=net.

    • This does not affect startup validation as errors are no longer ignored and trigger an InitError(). Connection failure will not halt startup but lookup failure will, this is because LookupHost() is used to identify if the connection is IPv4/IPv6.

      After startup, subsequent lookup failures will simply result in retrying at an interval up to every 5 minutes (not applicable for UDP).

Breaking Changes

See release notes.

Checklist

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have added or updated relevant unit/integration/functional/e2e tests
  • I have made corresponding changes to the documentation
  • I have assigned this pull request to a milestone (for repository code-owners and collaborators only)

@kwvg kwvg added this to the 23 milestone Sep 5, 2025
@kwvg
Copy link
Collaborator Author

kwvg commented Sep 5, 2025

@coderabbitai review

Copy link

github-actions bot commented Sep 5, 2025

✅ No Merge Conflicts Detected

This PR currently has no conflicts with other open PRs.

Copy link

coderabbitai bot commented Sep 5, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link

coderabbitai bot commented Sep 5, 2025

Walkthrough

The patch refactors StatsD client initialization to a URL-aware factory StatsdClient::make, removes InitStatsClient, and hides -statsport. It adds TCP transport with reconnection, batching, buffering, and IPv6/URL parsing support, converts APIs to std::string_view overloads, and surfaces explicit error reporting. RawSender gains reconnect logic, queueing, and bilingual errors. Compatibility macros for Windows socket error codes were added on non-Windows builds. Tests and benchmarks were updated: a new functional UDP stats test and test runner inclusion, bench adjustments to use a stub client, and init sequencing changes in startup code.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Pre-merge checks (2 passed, 1 warning)

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 2.22% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title Check ✅ Passed The current title clearly summarizes the primary enhancements (IPv6 and TCP support) and notes deprecation and new tests, aligning closely with the main changes in the PR. It is concise, specific, and free of vague language or unnecessary noise.
Description Check ✅ Passed The description clearly explains the rationale, high-level behavior changes, and outlines how IPv6, TCP buffering, and deprecation are implemented, directly corresponding to the set of changes in the pull request.
✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
src/stats/client.h (1)

1-66: Apply clang-format to fix formatting issues

The pipeline check indicates formatting differences. Please run clang-format -i on this file to ensure consistent formatting.

#!/bin/bash
# Verify the exact formatting issues
clang-format --version 2>/dev/null || echo "clang-format not available"
src/stats/client.cpp (2)

1-223: Apply clang-format to fix formatting issues

The pipeline check indicates formatting differences. Please run clang-format -i on this file.


210-213: Potential precision loss in format string

Using %f for formatting the value might cause precision issues. Consider using a more precise format specifier or std::to_string for exact representation.

-    RawMessage msg{strprintf("%s%s%s:%f|%s", m_prefix, key, m_suffix, value, type)};
+    // Use higher precision for doubles to avoid data loss
+    RawMessage msg;
+    if constexpr (std::is_floating_point_v<T1>) {
+        msg = strprintf("%s%s%s:%.17g|%s", m_prefix, key, m_suffix, value, type);
+    } else {
+        msg = strprintf("%s%s%s:%s|%s", m_prefix, key, m_suffix, std::to_string(value), type);
+    }
src/stats/rawsender.cpp (1)

1-277: Apply clang-format to fix formatting issues

The pipeline check indicates formatting differences. Please run clang-format -i on this file.

🧹 Nitpick comments (16)
src/bench/checkblock.cpp (1)

41-43: Stub StatsdClient init — LGTM; add missing header for make_unique

Avoid relying on transitive includes.

Apply:

 #include <bench/data.h>
+#include <memory>
doc/release-notes-6837.md (1)

4-7: Add IPv6 URL example to prevent ambiguity

Showing bracketed literal avoids misconfiguration.

Append e.g.:

 - `-statshost` now accepts URLs to allow specifying the protocol, host and port in one argument.
+  For IPv6 literals, use brackets (e.g., `udp://[::1]:8125`).
@@
-  `tcp://127.0.0.1:8125`). UDP (`udp://`) remains the default and will be assumed if a protocol is not specified.
+  `tcp://127.0.0.1:8125`). UDP (`udp://`) remains the default and will be assumed if a protocol is not specified (e.g., `127.0.0.1:8125`).

Also applies to: 13-15

src/init.cpp (2)

781-781: Deprecate -statsport while keeping backward compatibility — looks good

Registering -statsport as hidden preserves compatibility without surfacing it in help. Consider logging a one-time deprecation warning when -statsport is present to aid users migrating to URL-based configuration.


1643-1654: Early, fail-fast Statsd client init is appropriate; minor nit on return style

The early initialization avoids null dereferences in code paths that may emit stats during startup. Error propagation via util::ErrorString is clear. Nit: you can directly return the InitError value to be consistent with nearby code.

Apply this diff for the minor style tweak:

-        if (!stats_client) {
-            InitError(_("Cannot init Statsd client") + Untranslated(" (") + util::ErrorString(stats_client) + Untranslated(")"));
-            return false;
-        }
+        if (!stats_client) {
+            return InitError(_("Cannot init Statsd client") + Untranslated(" (") + util::ErrorString(stats_client) + Untranslated(")"));
+        }
src/stats/rawsender.h (3)

107-108: Vector as a queue can degrade performance; prefer deque or batch-swap

Switching m_queue to std::vector is fine if you always flush in bulk via swap-and-clear. If you ever erase from the front, complexity becomes O(n). Please confirm the .cpp only does bulk flushes or switch to std::deque.

If bulk flush is intended, consider documenting it in the member comment.

-    /* Queue of (batches of) messages to be sent */
-    std::vector<RawMessage> m_queue GUARDED_BY(cs);
+    /* Queue of (batches of) messages to be sent. NOTE: treated as append-and-bulk-flush only (no front erases). */
+    std::vector<RawMessage> m_queue GUARDED_BY(cs);

Also applies to: 88-90


58-60: Constructor error-out parameter is awkward

Using std::optional<bilingual_str>& as an out-param is unusual and easy to misuse (caller must pass a lvalue). Consider returning util::Result or std::optional<bilingual_str> directly from a static factory to keep the ctor noexcept.

If changing the signature is out of scope, at least document ownership and expected state of ‘error’ on success/failure.


111-112: Thread member naming nit

m_reconn is a std::thread; consider renaming to m_reconn_thread for clarity, matching the comment, or update the comment to match the name.

test/functional/feature_stats.py (4)

49-53: Use explicit guards instead of assert in test infra startup

assert may be stripped with -O. Prefer an explicit guard and AssertionError for reliability in all runs.

-        assert not self.running
+        if self.running:
+            raise AssertionError("StatsServer already running")

61-71: Tighten wait loop and reduce flakiness

Consider shorter get timeouts with a small sleep to reduce blocking and overall wait, and optionally log sampled messages to aid debugging on failure. This helps CI stability.


115-131: Exact debug-log matching can be brittle across platforms/build variants

If possible, match stable substrings or use regex to avoid failures due to minor wording changes. Example:

-with self.nodes[0].assert_debug_log(expected_msgs=['Transmitting stats are disabled, will not init Statsd client']):
+with self.nodes[0].assert_debug_log(expected_msgs=['stats are disabled', 'will not init Statsd client']):

Or add a regex-based helper in the framework.


25-33: AF selection: consider binding to the resolved sockaddr directly

addrinfo may return multiple entries; picking the first is usually fine, but for hosts that resolve to multiple families/records, iterating until bind succeeds is more robust.

Optional improvement:

  • Loop over addr_info entries, try socket/bind per entry.
  • Close the socket on bind failure before next attempt.
src/stats/client.h (2)

44-56: Consider making sample_rate parameter default = 1.0f instead of 1.f

While 1.f and 1.0f are functionally equivalent, using 1.0f would be more consistent with common C++ style guides and improve readability.

-    virtual bool dec(std::string_view key, float sample_rate = 1.f) { return false; }
-    virtual bool inc(std::string_view key, float sample_rate = 1.f) { return false; }
-    virtual bool count(std::string_view key, int64_t delta, float sample_rate = 1.f) { return false; }
-    virtual bool gauge(std::string_view key, int64_t value, float sample_rate = 1.f) { return false; }
-    virtual bool gaugeDouble(std::string_view key, double value, float sample_rate = 1.f) { return false; }
-    virtual bool timing(std::string_view key, uint64_t ms, float sample_rate = 1.f) { return false; }
+    virtual bool dec(std::string_view key, float sample_rate = 1.0f) { return false; }
+    virtual bool inc(std::string_view key, float sample_rate = 1.0f) { return false; }
+    virtual bool count(std::string_view key, int64_t delta, float sample_rate = 1.0f) { return false; }
+    virtual bool gauge(std::string_view key, int64_t value, float sample_rate = 1.0f) { return false; }
+    virtual bool gaugeDouble(std::string_view key, double value, float sample_rate = 1.0f) { return false; }
+    virtual bool timing(std::string_view key, uint64_t ms, float sample_rate = 1.0f) { return false; }

52-56: Same float literal consistency suggestion for send methods

-    virtual bool send(std::string_view key, double value, std::string_view type, float sample_rate = 1.f) { return false; }
-    virtual bool send(std::string_view key, int32_t value, std::string_view type, float sample_rate = 1.f) { return false; }
-    virtual bool send(std::string_view key, int64_t value, std::string_view type, float sample_rate = 1.f) { return false; }
-    virtual bool send(std::string_view key, uint32_t value, std::string_view type, float sample_rate = 1.f) { return false; }
-    virtual bool send(std::string_view key, uint64_t value, std::string_view type, float sample_rate = 1.f) { return false; }
+    virtual bool send(std::string_view key, double value, std::string_view type, float sample_rate = 1.0f) { return false; }
+    virtual bool send(std::string_view key, int32_t value, std::string_view type, float sample_rate = 1.0f) { return false; }
+    virtual bool send(std::string_view key, int64_t value, std::string_view type, float sample_rate = 1.0f) { return false; }
+    virtual bool send(std::string_view key, uint32_t value, std::string_view type, float sample_rate = 1.0f) { return false; }
+    virtual bool send(std::string_view key, uint64_t value, std::string_view type, float sample_rate = 1.0f) { return false; }
src/stats/client.cpp (1)

135-142: Complex lambda could be simplified

The inline lambda for counting forward slashes after the colon could be extracted to a named function or simplified for better readability.

-            // Remove all forward slashes found after the port delimiter (colon)
-            host = std::string(
-                host.begin(), host.end() - [&colon_idx, &host]() {
-                    const size_t slash_idx{host.find('/', /*pos=*/colon_idx + 1)};
-                    return slash_idx != std::string::npos ? host.length() - slash_idx : 0;
-                }());
+            // Remove all forward slashes found after the port delimiter (colon)
+            const size_t slash_idx{host.find('/', /*pos=*/colon_idx + 1)};
+            if (slash_idx != std::string::npos) {
+                host = host.substr(0, slash_idx);
+            }
src/stats/rawsender.cpp (2)

182-192: Consider extracting error codes to named constants

The long list of error codes for detecting broken connections could be more maintainable as a helper function or a set of constants.

+    static bool IsConnectionError(int err_code) {
+        return err_code == WSAECONNABORTED ||
+               err_code == WSAECONNREFUSED ||
+               err_code == WSAECONNRESET ||
+               err_code == WSAEHOSTUNREACH ||
+               err_code == WSAENETDOWN ||
+               err_code == WSAENETRESET ||
+               err_code == WSAENETUNREACH ||
+               err_code == WSAENOTCONN ||
+               err_code == WSAEPIPE ||
+               err_code == WSAETIMEDOUT;
+    }
+
     if (m_sock->Send(reinterpret_cast<const char*>(msg.data()), msg.size(), send_flags) == SOCKET_ERROR) {
         const auto err_code = WSAGetLastError();
-        if (err_code == WSAECONNABORTED ||
-            err_code == WSAECONNREFUSED ||
-            err_code == WSAECONNRESET ||
-            err_code == WSAEHOSTUNREACH ||
-            err_code == WSAENETDOWN ||
-            err_code == WSAENETRESET ||
-            err_code == WSAENETUNREACH ||
-            err_code == WSAENOTCONN ||
-            err_code == WSAEPIPE ||
-            err_code == WSAETIMEDOUT)
-        {
+        if (IsConnectionError(err_code)) {

250-260: Consider checking SendDirectly return value

QueueFlush calls SendDirectly but ignores the return value. While this might be intentional to avoid blocking on errors, consider at least logging failures for debugging purposes.

 void RawSender::QueueFlush(std::vector<RawMessage>& queue)
 {
     AssertLockNotHeld(cs);
     for (auto& msg : queue) {
         // Add delimiter to prevent unexpected concat if sends are consolidated
         if (m_use_tcp && !msg.empty() && msg.back() != m_batching_opts.second) {
             msg += m_batching_opts.second;
         }
-        SendDirectly(msg);
+        if (auto error = SendDirectly(msg)) {
+            LogPrint(BCLog::NET, "%s: Failed to send message: %s\n", __func__, error->original);
+        }
     }
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 4cfb520 and 100b2e6.

📒 Files selected for processing (11)
  • doc/release-notes-6837.md (1 hunks)
  • src/bench/checkblock.cpp (1 hunks)
  • src/compat/compat.h (1 hunks)
  • src/init.cpp (2 hunks)
  • src/stats/client.cpp (4 hunks)
  • src/stats/client.h (2 hunks)
  • src/stats/rawsender.cpp (2 hunks)
  • src/stats/rawsender.h (4 hunks)
  • src/test/util/setup_common.cpp (1 hunks)
  • test/functional/feature_stats.py (1 hunks)
  • test/functional/test_runner.py (1 hunks)
🧰 Additional context used
🪛 LanguageTool
doc/release-notes-6837.md

[grammar] ~17-~17: There might be a mistake here.
Context: ...tionand-statsbatchsize` and flushed when connection is restored. This buffer is ...

(QB_NEW_EN)


[grammar] ~22-~22: There might be a mistake here.
Context: ...ew URL syntax supported by -statshost. -statsport will be removed in a futu...

(QB_NEW_EN)


[grammar] ~26-~26: There might be a mistake here.
Context: ... -statsport value will be ignored.

(QB_NEW_EN)

🪛 markdownlint-cli2 (0.17.2)
doc/release-notes-6837.md

11-11: Heading style
Expected: setext; Actual: atx

(MD003, heading-style)


20-20: Heading style
Expected: setext; Actual: atx

(MD003, heading-style)

🪛 Ruff (0.12.2)
test/functional/feature_stats.py

44-44: Do not catch blind exception: Exception

(BLE001)


46-46: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


49-49: Use of assert detected

(S101)


70-70: Avoid specifying long messages outside the exception class

(TRY003)

🪛 GitHub Actions: Clang Diff Format Check
src/stats/client.h

[error] 1-1: Clang format differences detected by clang-format-diff.py. The formatting check failed (exit code 1). Command: git diff -U0 origin/develop -- $(git ls-files -- $(cat test/util/data/non-backported.txt)) | ./contrib/devtools/clang-format-diff.py -p1 > diff_output.txt. Please run clang-format -i on this file to apply the correct formatting.

src/stats/rawsender.cpp

[error] 1-1: Clang format differences detected by clang-format-diff.py. The formatting check failed (exit code 1). Command: git diff -U0 origin/develop -- $(git ls-files -- $(cat test/util/data/non-backported.txt)) | ./contrib/devtools/clang-format-diff.py -p1 > diff_output.txt. Please run clang-format -i on this file to apply the correct formatting.

src/stats/client.cpp

[error] 1-1: Clang format differences detected by clang-format-diff.py. The formatting check failed (exit code 1). Command: git diff -U0 origin/develop -- $(git ls-files -- $(cat test/util/data/non-backported.txt)) | ./contrib/devtools/clang-format-diff.py -p1 > diff_output.txt. Please run clang-format -i on this file to apply the correct formatting.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: x86_64-apple-darwin / Build depends
  • GitHub Check: x86_64-w64-mingw32 / Build depends
  • GitHub Check: x86_64-pc-linux-gnu_multiprocess / Build depends
  • GitHub Check: arm-linux-gnueabihf / Build depends
  • GitHub Check: x86_64-pc-linux-gnu_nowallet / Build depends
  • GitHub Check: x86_64-pc-linux-gnu / Build depends
🔇 Additional comments (13)
test/functional/test_runner.py (1)

310-310: Approve inclusion of feature_stats.py in BASE_SCRIPTS — test file present at test/functional/feature_stats.py defines StatsTest(BitcoinTestFramework).

src/test/util/setup_common.cpp (1)

201-207: Factory-based StatsdClient initialization with explicit failure — LGTM

Clear error propagation and scoped temporary keep things tidy. No changes requested.

src/stats/rawsender.h (2)

66-72: Thread-safety annotations look consistent; LGTM

The EXCLUSIVE_LOCKS_REQUIRED annotations for Send/SendDirectly/QueueFlush/Connect/Reconect are consistent with cs usage and aid static analysis.

Also applies to: 76-84, 85-90


113-121: Compile break: uses chrono literal ‘1s’ in header without literals namespace

Headers shouldn’t rely on std::chrono_literals; this will fail to compile for users not importing the literal. Initialize with std::chrono::seconds{1} instead.

-        std::chrono::seconds m_timeout{1s};
+        std::chrono::seconds m_timeout{std::chrono::seconds{1}};
⛔ Skipped due to learnings
Learnt from: kwvg
PR: dashpay/dash#6815
File: src/chainlock/chainlock.h:68-69
Timestamp: 2025-08-13T16:10:32.972Z
Learning: The Dash codebase extensively uses std::chrono_literals throughout, including in header files. Chrono literals like 0s, 1min, etc. are acceptable and preferred over explicit constructors like std::chrono::seconds::zero().
Learnt from: kwvg
PR: dashpay/dash#6815
File: src/chainlock/chainlock.h:68-69
Timestamp: 2025-08-13T16:10:32.972Z
Learning: The Dash codebase extensively uses std::chrono_literals throughout, including in header files. The pattern using namespace std::chrono_literals; is used globally in src/util/time.h, and chrono literals like 0s are commonly used for atomic member initialization in headers (e.g., src/net.h). Chrono literals like 0s, 1min, etc. are preferred over explicit constructors.
src/stats/client.h (1)

40-41: Good API design with factory pattern

The migration to a factory pattern with util::Result for error handling is a solid improvement. This provides better error propagation and a more modern C++ interface.

src/stats/client.cpp (3)

104-108: Good input validation for port range

The validation correctly ensures the port is within the valid range (1-65535) when explicitly set via -statsport.


120-159: Excellent URL parsing implementation

The URL parsing logic is well-structured:

  • Properly validates URL format and scheme
  • Correctly handles both tcp:// and udp:// schemes
  • Appropriately warns when URL port overrides -statsport
  • Good error messages for malformed URLs

200-207: Thread-safe random sampling implementation

Good use of mutex protection for the random number generator in the sampling logic.

src/stats/rawsender.cpp (5)

70-100: Well-structured connection logic

The Connect() method properly handles both UDP and TCP socket creation with appropriate error handling.


102-120: Good TCP reconnection strategy with exponential backoff

The reconnection logic correctly implements exponential backoff with a maximum timeout, which is a best practice for network reconnection.


54-68: Proper thread lifecycle management

The destructor correctly handles thread interruption and joining for both the main queue thread and reconnection thread, followed by queue flushing to prevent data loss.


225-248: Efficient batching implementation

The queue batching logic is well-implemented with proper size checks and delimiter handling.


122-154: Verify TCP reconnection behavior in production

The reconnection thread implementation looks correct, but TCP reconnection behavior under various network conditions should be thoroughly tested.

Could you verify that the TCP reconnection properly handles these scenarios:

  1. Server restart while client is running
  2. Network partition/recovery
  3. Long-duration disconnections
  4. Rapid connect/disconnect cycles

- Specifying invalid values will no longer result in silent disablement of the StatsD client and will now cause errors
at startup.

### New Features
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix headings and minor grammar/spacing in release notes

Conform to repo’s setext heading style and tighten wording.

Apply:

-### New Features
+New Features
+------------
@@
-  - Unlike with UDP, if a TCP connection is interrupted, stats will _not_ be dropped and instead buffered according to
-    `-statsduration` and `-statsbatchsize` and flushed when connection is restored. This buffer is discarded every 5
+  - Unlike with UDP, if a TCP connection is interrupted, stats will not be dropped and instead buffered according to
+    `-statsduration` and `-statsbatchsize` and flushed when the connection is restored. This buffer is discarded every 5
     minutes if connection is not restored by then.
@@
-### Deprecations
+Deprecations
+------------
@@
-  - If both `-statsport` and `-statshost` with a URL specifying a port is supplied, the `-statsport` value will be
-    ignored.
+  - If both `-statsport` and `-statshost` with a URL specifying a port are supplied, the `-statsport` value will be ignored.

Also applies to: 20-20, 16-18, 22-26

🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

11-11: Heading style
Expected: setext; Actual: atx

(MD003, heading-style)

🤖 Prompt for AI Agents
In doc/release-notes-6837.md around line 11, the "### New Features" ATX-style
heading must be converted to the repo's setext heading style and minor
grammar/spacing tightened; replace the ATX heading with a setext-level underline
(e.g., "New Features" followed by a line of hyphens), normalize surrounding
blank lines, fix any punctuation/spacing inconsistencies, and tighten wording as
needed; apply the same conversion and edits to the other affected regions (lines
16-18, 20-20, and 22-26).

Comment on lines +44 to +46
except Exception as e:
if self.running:
raise AssertionError("Unexpected exception raised: " + type(e).__name__)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Avoid catching broad Exception; preserve traceback context

Catch concrete exceptions (e.g., OSError, UnicodeDecodeError) and chain the assertion for debuggability.

-            except Exception as e:
-                if self.running:
-                    raise AssertionError("Unexpected exception raised: " + type(e).__name__)
+            except (OSError, UnicodeDecodeError) as e:
+                if self.running:
+                    raise AssertionError(f"Unexpected exception raised: {type(e).__name__}") from e
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
except Exception as e:
if self.running:
raise AssertionError("Unexpected exception raised: " + type(e).__name__)
except (OSError, UnicodeDecodeError) as e:
if self.running:
raise AssertionError(f"Unexpected exception raised: {type(e).__name__}") from e
🧰 Tools
🪛 Ruff (0.12.2)

44-44: Do not catch blind exception: Exception

(BLE001)


46-46: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

🤖 Prompt for AI Agents
In test/functional/feature_stats.py around lines 44 to 46, the code catches a
broad Exception and raises an AssertionError without preserving traceback;
change the except to catch the specific exceptions you expect (for example
OSError, UnicodeDecodeError) instead of Exception, and when raising the
AssertionError chain the original exception (raise AssertionError("Unexpected
exception raised: " + type(e).__name__) from e) so the original traceback is
preserved for debugging.

@kwvg kwvg force-pushed the statsd3 branch 4 times, most recently from 1630cc3 to 9c7ec07 Compare September 9, 2025 12:39
@kwvg kwvg marked this pull request as ready for review September 9, 2025 13:02
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/stats/rawsender.h (1)

95-101: Data race on m_sock/m_server: guard socket state with a dedicated mutex.

m_sock and m_server are read/written from multiple threads (queue, reconnect, direct send) without synchronization, leading to UB. Add a mutex (e.g., cs_sock) and guard all accesses.

@@
-    /* Socket used to communicate with host */
-    std::unique_ptr<Sock> m_sock{nullptr};
-    /* Socket address containing host information */
-    std::pair<struct sockaddr_storage, socklen_t> m_server{{}, sizeof(struct sockaddr_storage)};
+    /* Protect socket state (must be acquired before cs when both are needed) */
+    mutable Mutex cs_sock;
+    /* Socket used to communicate with host */
+    std::unique_ptr<Sock> m_sock GUARDED_BY(cs_sock){nullptr};
+    /* Socket address containing host information */
+    std::pair<struct sockaddr_storage, socklen_t> m_server GUARDED_BY(cs_sock){{}, sizeof(struct sockaddr_storage)};

Follow-up: wrap all reads/writes to m_sock/m_server in src/stats/rawsender.cpp with LOCK(cs_sock)/WITH_LOCK(cs_sock, ...).

Also applies to: 125-135

src/stats/client.cpp (1)

209-218: Bug: integer metrics are formatted with “%f”.

Using “%f” for all numeric types emits counters/timers as floats (e.g., 1.000000), which is non-standard and may break parsers. Format integers as integers; floats as floats.

-    RawMessage msg{strprintf("%s%s%s:%f|%s", m_prefix, key, m_suffix, value, type)};
+    auto fmt_value = [&](){
+        if constexpr (std::is_floating_point_v<T1>) {
+            return strprintf("%f", static_cast<double>(value));
+        } else {
+            return ToString(static_cast<int64_t>(value));
+        }
+    }();
+    RawMessage msg{strprintf("%s%s%s:%s|%s", m_prefix, key, m_suffix, fmt_value, type)};

Alternatively, add a small helper near the class to avoid the lambda.

♻️ Duplicate comments (2)
doc/release-notes-6837.md (1)

16-18: Minor grammar and clarity tweaks.

Tighten phrasing and fix agreement.

-  - Unlike with UDP, if a TCP connection is interrupted, stats will _not_ be dropped and instead buffered according to
-    `-statsduration` and `-statsbatchsize` and flushed when connection is restored. This buffer is discarded every 5
+  - Unlike with UDP, if a TCP connection is interrupted, stats are not dropped; they are buffered according to
+    `-statsduration` and `-statsbatchsize` and flushed when the connection is restored. This buffer is discarded every 5
     minutes if connection is not restored by then.
@@
-  - If both `-statsport` and `-statshost` with a URL specifying a port is supplied, the `-statsport` value will be
-    ignored.
+  - If both `-statsport` and `-statshost` with a URL specifying a port are supplied, the `-statsport` value is ignored.

Also applies to: 25-26

test/functional/feature_stats.py (1)

44-47: Avoid catching broad Exception; preserve traceback.

Catch concrete exceptions and chain for debuggability.

-            except Exception as e:
-                if self.running:
-                    raise AssertionError("Unexpected exception raised: " + type(e).__name__)
+            except (OSError, UnicodeDecodeError) as e:
+                if self.running:
+                    raise AssertionError(f"Unexpected exception raised: {type(e).__name__}") from e
🧹 Nitpick comments (4)
doc/release-notes-6837.md (1)

11-12: Use setext headings (repo style) for sections.

Convert “### New Features” and “### Deprecations” to setext style to satisfy markdownlint MD003.

-### New Features
+New Features
+------------
@@
-### Deprecations
+Deprecations
+------------

Also applies to: 20-20

src/stats/rawsender.h (1)

125-135: Minor: IPv6 log formatting.

To avoid ambiguous “host:port” in logs for IPv6, bracket IPv6 hosts in ToStringHostPort().

-    std::string ToStringHostPort() const;
+    std::string ToStringHostPort() const; // Consider [host]:port for IPv6

I’ve proposed the concrete change in rawsender.cpp.

test/functional/feature_stats.py (1)

115-132: Add TCP path coverage and reconnection buffering test.

Current tests only cover UDP. Add a simple TCP StatsServer (or a minimal sink), simulate a brief disconnect, assert buffered resend after reconnection, and add IPv6 happy-path when test_ipv6_local() is re-enabled.

I can draft a TCP test harness and a reconnection test case mirroring test_conn().

src/stats/rawsender.cpp (1)

230-231: Bracket IPv6 address in ToStringHostPort().

Avoid ambiguous logs for IPv6.

-std::string RawSender::ToStringHostPort() const { return strprintf("%s:%d", m_host, m_port); }
+std::string RawSender::ToStringHostPort() const {
+    // Bracket IPv6 for readability in logs
+    return (m_host.find(':') != std::string::npos && m_host.find(']') == std::string::npos)
+        ? strprintf("[%s]:%d", m_host, m_port)
+        : strprintf("%s:%d", m_host, m_port);
+}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 100b2e6 and 9c7ec07.

📒 Files selected for processing (11)
  • doc/release-notes-6837.md (1 hunks)
  • src/bench/checkblock.cpp (2 hunks)
  • src/compat/compat.h (1 hunks)
  • src/init.cpp (2 hunks)
  • src/stats/client.cpp (4 hunks)
  • src/stats/client.h (2 hunks)
  • src/stats/rawsender.cpp (2 hunks)
  • src/stats/rawsender.h (4 hunks)
  • src/test/util/setup_common.cpp (1 hunks)
  • test/functional/feature_stats.py (1 hunks)
  • test/functional/test_runner.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • test/functional/test_runner.py
  • src/compat/compat.h
  • src/bench/checkblock.cpp
  • src/init.cpp
  • src/test/util/setup_common.cpp
🧰 Additional context used
📓 Path-based instructions (3)
src/**/*.{cpp,h,cc,cxx,hpp}

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.{cpp,h,cc,cxx,hpp}: Dash Core C++ codebase must be written in C++20 and require at least Clang 16 or GCC 11.1
Dash uses unordered_lru_cache for efficient caching with LRU eviction

Files:

  • src/stats/client.cpp
  • src/stats/rawsender.cpp
  • src/stats/rawsender.h
  • src/stats/client.h
doc/**

📄 CodeRabbit inference engine (CLAUDE.md)

Unless specifically prompted, avoid making changes to the doc directory (documentation)

Files:

  • doc/release-notes-6837.md
test/functional/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Functional tests should be written in Python and placed in test/functional/

Files:

  • test/functional/feature_stats.py
🧬 Code graph analysis (4)
src/stats/rawsender.cpp (1)
src/stats/rawsender.h (1)
  • RawSender (56-135)
src/stats/rawsender.h (1)
src/stats/rawsender.cpp (18)
  • RawSender (21-52)
  • RawSender (54-68)
  • Send (161-172)
  • Send (161-161)
  • SendDirectly (174-228)
  • SendDirectly (174-174)
  • ToStringHostPort (230-230)
  • ToStringHostPort (230-230)
  • Connect (70-100)
  • Connect (70-70)
  • Reconnect (102-120)
  • Reconnect (102-102)
  • ReconnectThread (122-159)
  • ReconnectThread (122-122)
  • QueueAdd (232-255)
  • QueueAdd (232-232)
  • QueueFlush (257-269)
  • QueueFlush (257-257)
src/stats/client.h (1)
src/stats/client.cpp (18)
  • make (92-172)
  • make (92-92)
  • key (53-53)
  • key (53-53)
  • key (54-54)
  • key (54-54)
  • key (55-55)
  • key (55-55)
  • key (56-56)
  • key (56-56)
  • key (57-57)
  • key (57-57)
  • key (58-58)
  • key (58-58)
  • key (61-61)
  • key (61-61)
  • key (62-62)
  • key (62-62)
test/functional/feature_stats.py (2)
test/functional/test_framework/netutil.py (1)
  • test_ipv6_local (148-162)
test/functional/test_framework/test_node.py (2)
  • assert_start_raises_init_error (606-649)
  • assert_debug_log (444-472)
🪛 GitHub Actions: Clang Diff Format Check
src/stats/client.cpp

[error] 52-68: Clang-format differences found in src/stats/client.cpp. Run 'clang-format-diff.py -p1' to apply formatting changes.

src/stats/rawsender.cpp

[error] 185-203: Clang-format differences found in src/stats/rawsender.cpp. Run 'clang-format-diff.py -p1' to apply formatting changes.

src/stats/client.h

[error] 49-59: Clang-format differences found in src/stats/client.h. Run 'clang-format-diff.py -p1' to apply formatting changes.

🪛 markdownlint-cli2 (0.17.2)
doc/release-notes-6837.md

11-11: Heading style
Expected: setext; Actual: atx

(MD003, heading-style)


20-20: Heading style
Expected: setext; Actual: atx

(MD003, heading-style)

🪛 Ruff (0.12.2)
test/functional/feature_stats.py

44-44: Do not catch blind exception: Exception

(BLE001)


46-46: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


70-70: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: win64-build / Build source
  • GitHub Check: linux64_multiprocess-build / Build source
  • GitHub Check: linux64_tsan-build / Build source
  • GitHub Check: arm-linux-build / Build source
  • GitHub Check: mac-build / Build source
  • GitHub Check: linux64_nowallet-build / Build source
  • GitHub Check: linux64_ubsan-build / Build source
  • GitHub Check: linux64_sqlite-build / Build source
  • GitHub Check: linux64_fuzz-build / Build source
  • GitHub Check: linux64-build / Build source
🔇 Additional comments (5)
src/stats/client.h (2)

40-60: Public API surface looks sane.

Factory-returned inactive client on empty -statshost avoids nullptr checks; string_view overloads are good for perf.


49-59: Install clang-format and reformat src/stats/client.h
CI reported formatting issues but clang-format isn’t available; install clang-format locally and run

clang-format --style=file -i src/stats/client.h

then commit the updated file.

src/stats/rawsender.cpp (1)

185-203: Verify clang-format formatting: clang-format isn’t available in this environment; please run

git clang-format --extensions=cpp HEAD~1 -- src/stats/rawsender.cpp

locally to confirm the hunk meets project style.

src/stats/client.cpp (2)

92-171: URL/host parsing looks good; port precedence log is helpful.

Factory path and error messages align with release notes; bilingual errors propagate correctly.


52-68: Manual clang-format verification required
Unable to run clang-format in this environment—please manually run clang-format --style=file on src/stats/client.cpp and address any formatting diffs flagged by CI.

@kwvg kwvg marked this pull request as draft September 9, 2025 14:27
@kwvg kwvg marked this pull request as ready for review September 9, 2025 23:20
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/stats/client.cpp (2)

201-209: Sampling uses std::uniform_real_distribution with FastRandomContext (likely not a URBG) and will fail to compile; replace with FastRandomContext::randrange().

FastRandomContext doesn’t model UniformRandomBitGenerator; passing it to standard distributions is invalid. Use randrange() and compare against a scaled threshold. Also, keep the WITH_LOCK pattern but return a bool value, not a return-from-function.

-    if (never_send || (!always_send &&
-                       WITH_LOCK(cs, return sample_rate < std::uniform_real_distribution<float>(0.f, 1.f)(insecure_rand)))) {
+    if (never_send || (!always_send &&
+                       WITH_LOCK(cs, return insecure_rand.randrange(1'000'000ULL) >= static_cast<uint64_t>(sample_rate * 1'000'000ULL)))) {
         return true;
     }

210-215: Format string type-mismatch: using “%f” for all T1 breaks integer sends.

strprintf with “%f” requires double; passing integers is UB. Format value type-agnostically, e.g., via ToString(value).

-    RawMessage msg{strprintf("%s%s%s:%f|%s", m_prefix, key, m_suffix, value, type)};
+    RawMessage msg{strprintf("%s%s%s:%s|%s", m_prefix, key, m_suffix, ToString(value), type)};
♻️ Duplicate comments (2)
doc/release-notes-6837.md (1)

11-18: Unify heading style (setext) and tighten phrasing

Convert ATX headings to the repo’s setext style and fix minor grammar/spacing. Matches prior feedback.

-### New Features
+New Features
+------------

-  - Unlike with UDP, if a TCP connection is interrupted, stats will _not_ be dropped and instead buffered according to
-    `-statsduration` and `-statsbatchsize` and flushed when connection is restored. This buffer is discarded every 5
+  - Unlike with UDP, if a TCP connection is interrupted, stats will not be dropped and instead buffered according to
+    `-statsduration` and `-statsbatchsize` and flushed when the connection is restored. This buffer is discarded every 5
     minutes if connection is not restored by then.
@@
-### Deprecations
+Deprecations
+------------
@@
-  - If both `-statsport` and `-statshost` with a URL specifying a port is supplied, the `-statsport` value will be
-    ignored.
+  - If both `-statsport` and `-statshost` with a URL specifying a port are supplied, the `-statsport` value will be ignored.

Also applies to: 20-26

test/functional/feature_stats.py (1)

44-46: Don’t catch broad Exception; preserve traceback

Limit to expected exceptions and chain for debuggability. Mirrors earlier guidance.

-            except Exception as e:
-                if self.running:
-                    raise AssertionError("Unexpected exception raised: " + type(e).__name__)
+            except (OSError, UnicodeDecodeError) as e:
+                if self.running:
+                    raise AssertionError(f"Unexpected exception raised: {type(e).__name__}") from e
🧹 Nitpick comments (8)
test/functional/feature_stats.py (2)

61-70: Make queue polling respect the remaining deadline

Avoid blocking longer than the overall timeout when no messages arrive.

-    def assert_msg_received(self, expected_msg: str, timeout: int = 30):
+    def assert_msg_received(self, expected_msg: str, timeout: int = 30):
         deadline = time.time() + timeout
         while time.time() < deadline:
             try:
-                msg = self.queue.get(timeout=5)
+                remaining = max(0.0, deadline - time.time())
+                # Cap per-get wait to remaining time (and avoid 0 which makes get non-blocking).
+                msg = self.queue.get(timeout=max(0.1, min(5.0, remaining)))
                 if expected_msg in msg:
                     return
             except queue.Empty:
                 continue
         raise AssertionError(f"Did not receive message containing '{expected_msg}' within {timeout} seconds")

15-15: Simplify imports and avoid runtime-parametrized Queue typing

Use a single import and a non-parameterized annotation to avoid potential runtime typing issues across Python versions.

-from queue import Queue
+# Use the module form to reference Queue and Empty consistently
-        self.queue: Queue[str] = Queue()
+        self.queue: queue.Queue = queue.Queue()

Also applies to: 23-23

src/stats/rawsender.cpp (3)

16-20: Don’t rely on chrono literals in .cpp either

5min requires using namespace std::chrono_literals;. Make it self-contained to avoid hidden dependencies.

-constexpr std::chrono::seconds RECONNECT_TIMEOUT_MAX{5min};
+constexpr std::chrono::seconds RECONNECT_TIMEOUT_MAX{std::chrono::minutes{5}};

125-163: Reconnection “stale buffer” timestamp semantics can drop fresh messages

m_reconn_stats.m_timestamp is set only on discard, not when items first enter the reconnection buffer. On first loop (timestamp==0), you treat the buffer as stale even if items were just added. Consider setting the timestamp when the reconnection queue first becomes non-empty (e.g., in SendDirectly() when pushing the first item) and/or when you enqueue during send error handling.

Minimal change sketch (update when first enqueue happens):

 // inside the two places that enqueue to m_reconn_queue in SendDirectly()
 WITH_LOCK(cs, QueueAdd(m_reconn_queue, msg));
+WITH_LOCK(cs_net, if (m_reconn_stats.m_timestamp == 0) m_reconn_stats.m_timestamp = GetTime());

240-263: Batch size type mixing

queue.back().size() + msg.size() >= batch_size mixes size_t with uint64_t. Cast batch_size to size_t to avoid signed/unsigned or width warnings on some platforms, and reserve with the same type.

-    const auto& [batch_size, batch_delim] = m_batching_opts;
+    const auto& [batch_size_u64, batch_delim] = m_batching_opts;
+    const auto batch_size = static_cast<size_t>(batch_size_u64);

And adjust the two uses of batch_size accordingly.

src/stats/client.cpp (3)

137-139: Gate this notice behind -debug=net per PR intent.

Switch LogPrintf to LogPrint(BCLog::NET, ...) to keep normal startup quiet.

-                    LogPrintf("%s: Supplied URL with port, ignoring -statsport\n", __func__);
+                    LogPrint(BCLog::NET, "%s: Supplied URL with port, ignoring -statsport\n", __func__);

174-189: Initialization log should also be gated behind -debug=net.

Keep verbose network destination details under BCLog::NET per PR description.

-    LogPrintf("%s: Initialized to transmit stats to %s:%d\n", __func__, host, port);
+    LogPrint(BCLog::NET, "%s: Initialized to transmit stats to %s:%d\n", __func__, host, port);

92-99: StatsdClient is non-abstract and default-constructible
All virtual methods in StatsdClient have in-class definitions and there are no other constructors declared, so the compiler provides an implicit public default ctor. Thus, std::make_unique<StatsdClient>() compiles and yields a valid no-op client. Optionally, you may prefer returning nullptr to more explicitly signal “inactive.”

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9c7ec07 and 1e750cc.

📒 Files selected for processing (11)
  • doc/release-notes-6837.md (1 hunks)
  • src/bench/checkblock.cpp (2 hunks)
  • src/compat/compat.h (1 hunks)
  • src/init.cpp (3 hunks)
  • src/stats/client.cpp (4 hunks)
  • src/stats/client.h (2 hunks)
  • src/stats/rawsender.cpp (1 hunks)
  • src/stats/rawsender.h (3 hunks)
  • src/test/util/setup_common.cpp (1 hunks)
  • test/functional/feature_stats.py (1 hunks)
  • test/functional/test_runner.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • src/test/util/setup_common.cpp
  • src/bench/checkblock.cpp
  • src/compat/compat.h
  • test/functional/test_runner.py
🧰 Additional context used
📓 Path-based instructions (3)
src/**/*.{cpp,h,cc,cxx,hpp}

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.{cpp,h,cc,cxx,hpp}: Dash Core C++ codebase must be written in C++20 and require at least Clang 16 or GCC 11.1
Dash uses unordered_lru_cache for efficient caching with LRU eviction

Files:

  • src/stats/rawsender.cpp
  • src/init.cpp
  • src/stats/rawsender.h
  • src/stats/client.h
  • src/stats/client.cpp
test/functional/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Functional tests should be written in Python and placed in test/functional/

Files:

  • test/functional/feature_stats.py
doc/**

📄 CodeRabbit inference engine (CLAUDE.md)

Unless specifically prompted, avoid making changes to the doc directory (documentation)

Files:

  • doc/release-notes-6837.md
🧠 Learnings (8)
📚 Learning: 2025-09-09T21:36:31.495Z
Learnt from: kwvg
PR: dashpay/dash#6837
File: src/stats/rawsender.cpp:0-0
Timestamp: 2025-09-09T21:36:31.495Z
Learning: In RawSender class (src/stats/rawsender.cpp), m_sock access is protected by the cs_net mutex. This mutex coordinates between the TCP send path in SendDirectly() and the reconnection operations in Connect()/Reconnect() methods, ensuring proper synchronization of socket state.

Applied to files:

  • src/stats/rawsender.cpp
  • src/stats/rawsender.h
📚 Learning: 2025-09-09T21:36:11.805Z
Learnt from: kwvg
PR: dashpay/dash#6837
File: src/stats/rawsender.cpp:0-0
Timestamp: 2025-09-09T21:36:11.805Z
Learning: In RawSender class (src/stats/rawsender.cpp), cs_net is the appropriate mutex for protecting socket access (m_sock) and network operations, not additional custom locks. The implementation correctly uses cs_net with GUARDED_BY annotations and EXCLUSIVE_LOCKS_REQUIRED to synchronize socket access between SendDirectly() and ReconnectThread().

Applied to files:

  • src/stats/rawsender.cpp
  • src/stats/rawsender.h
📚 Learning: 2025-09-09T21:36:31.495Z
Learnt from: kwvg
PR: dashpay/dash#6837
File: src/stats/rawsender.cpp:0-0
Timestamp: 2025-09-09T21:36:31.495Z
Learning: In RawSender class (src/stats/rawsender.cpp), m_sock access requires synchronization using the cs_net mutex. The race condition exists between SendDirectly() (callable from any thread) and Connect() (called by ReconnectThread), where both read and write m_sock concurrently. The cs_net mutex properly coordinates these network operations.

Applied to files:

  • src/stats/rawsender.cpp
  • src/stats/rawsender.h
📚 Learning: 2025-09-09T21:36:58.937Z
Learnt from: kwvg
PR: dashpay/dash#6837
File: src/stats/rawsender.cpp:223-232
Timestamp: 2025-09-09T21:36:58.937Z
Learning: In RawSender class (src/stats/rawsender.cpp), socket operations are properly synchronized using the cs_net mutex. The m_sock and m_server members are GUARDED_BY(cs_net), and methods like Connect(), SendDirectly(), and ReconnectThread() use appropriate locking with cs_net to prevent race conditions on socket access.

Applied to files:

  • src/stats/rawsender.cpp
  • src/stats/rawsender.h
📚 Learning: 2025-09-09T13:24:23.293Z
Learnt from: kwvg
PR: dashpay/dash#6837
File: src/stats/rawsender.cpp:70-97
Timestamp: 2025-09-09T13:24:23.293Z
Learning: In RawSender class, m_sock and m_server members are accessed concurrently between SendDirectly() (which can be called from any thread) and Connect() (called by ReconnectThread), requiring proper synchronization even though Connect() has limited call sites.

Applied to files:

  • src/stats/rawsender.cpp
  • src/stats/rawsender.h
📚 Learning: 2025-09-09T21:36:58.937Z
Learnt from: kwvg
PR: dashpay/dash#6837
File: src/stats/rawsender.cpp:223-232
Timestamp: 2025-09-09T21:36:58.937Z
Learning: In RawSender class (src/stats/rawsender.cpp), socket access operations (m_sock and m_server) should be protected by the cs_net mutex, not avoiding synchronization entirely. While lock overhead concerns are valid in general, socket operations require proper synchronization via cs_net.

Applied to files:

  • src/stats/rawsender.cpp
  • src/stats/rawsender.h
📚 Learning: 2025-09-09T21:36:11.805Z
Learnt from: kwvg
PR: dashpay/dash#6837
File: src/stats/rawsender.cpp:0-0
Timestamp: 2025-09-09T21:36:11.805Z
Learning: In RawSender class (src/stats/rawsender.cpp), cs_net is the appropriate mutex for protecting socket access (m_sock) and network operations, not additional custom locks.

Applied to files:

  • src/stats/rawsender.cpp
  • src/stats/rawsender.h
📚 Learning: 2025-07-29T14:32:48.369Z
Learnt from: kwvg
PR: dashpay/dash#6761
File: src/chainlock/signing.cpp:247-250
Timestamp: 2025-07-29T14:32:48.369Z
Learning: In PR #6761, kwvg acknowledged a null pointer check issue in ChainLockSigner::Cleanup() method but deferred it to follow-up, consistent with the pattern of avoiding scope creep in refactoring PRs.

Applied to files:

  • src/stats/rawsender.cpp
🧬 Code graph analysis (6)
src/stats/rawsender.cpp (3)
src/stats/rawsender.h (1)
  • RawSender (57-136)
src/stats/client.cpp (3)
  • Schedule (191-194)
  • Schedule (191-191)
  • scheduler (53-53)
src/util/time.h (1)
  • count_seconds (56-56)
src/init.cpp (1)
src/stats/client.cpp (2)
  • make (92-172)
  • make (92-92)
src/stats/rawsender.h (3)
src/stats/rawsender.cpp (22)
  • RawSender (22-50)
  • RawSender (52-62)
  • Schedule (64-68)
  • Schedule (64-64)
  • Send (165-177)
  • Send (165-165)
  • SendDirectly (179-236)
  • SendDirectly (179-179)
  • ToStringHostPort (238-238)
  • ToStringHostPort (238-238)
  • Connect (70-102)
  • Connect (70-70)
  • Reconnect (104-123)
  • Reconnect (104-104)
  • ReconnectThread (125-163)
  • ReconnectThread (125-125)
  • QueueAdd (240-263)
  • QueueAdd (240-240)
  • QueueFlush (265-279)
  • QueueFlush (265-265)
  • QueueThreadMain (281-290)
  • QueueThreadMain (281-281)
src/stats/client.cpp (2)
  • Schedule (191-194)
  • Schedule (191-191)
src/util/sock.cpp (8)
  • Send (68-71)
  • Send (68-68)
  • Connect (78-81)
  • Connect (78-78)
  • Sock (46-46)
  • Sock (48-48)
  • Sock (50-54)
  • Sock (56-56)
test/functional/feature_stats.py (2)
test/functional/test_framework/netutil.py (1)
  • test_ipv6_local (148-162)
test/functional/test_framework/test_node.py (2)
  • assert_start_raises_init_error (606-649)
  • assert_debug_log (444-472)
src/stats/client.h (2)
src/stats/client.cpp (21)
  • make (92-172)
  • make (92-92)
  • Schedule (191-194)
  • Schedule (191-191)
  • scheduler (53-53)
  • key (55-55)
  • key (55-55)
  • key (56-56)
  • key (56-56)
  • key (57-57)
  • key (57-57)
  • key (58-58)
  • key (58-58)
  • key (59-59)
  • key (59-59)
  • key (60-60)
  • key (60-60)
  • key (62-62)
  • key (62-62)
  • key (63-63)
  • key (63-63)
src/stats/rawsender.cpp (2)
  • Schedule (64-68)
  • Schedule (64-64)
src/stats/client.cpp (3)
src/random.h (1)
  • max (256-256)
src/stats/rawsender.cpp (2)
  • Schedule (64-68)
  • Schedule (64-64)
src/node/interfaces.cpp (9)
  • Assert (92-92)
  • Assert (93-93)
  • Assert (117-117)
  • Assert (259-259)
  • Assert (282-282)
  • Assert (404-404)
  • Assert (605-608)
  • Assert (817-817)
  • Assert (1123-1126)
🪛 GitHub Actions: Clang Diff Format Check
src/stats/rawsender.cpp

[error] 1-1: Clang-format differences detected in src/stats/rawsender.cpp. Run clang-format (or clang-format-diff.py) to fix formatting.

src/stats/client.h

[error] 1-1: Clang-format differences detected in src/stats/client.h. Run clang-format (or clang-format-diff.py) to fix formatting.

src/stats/client.cpp

[error] 1-1: Clang-format differences detected in src/stats/client.cpp. Run clang-format (or clang-format-diff.py) to fix formatting.

🪛 Ruff (0.12.2)
test/functional/feature_stats.py

44-44: Do not catch blind exception: Exception

(BLE001)


46-46: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


70-70: Avoid specifying long messages outside the exception class

(TRY003)

🪛 markdownlint-cli2 (0.17.2)
doc/release-notes-6837.md

11-11: Heading style
Expected: setext; Actual: atx

(MD003, heading-style)


20-20: Heading style
Expected: setext; Actual: atx

(MD003, heading-style)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: linux64_multiprocess-test / Test source
🔇 Additional comments (14)
src/init.cpp (3)

786-786: Deprecate -statsport via hidden arg — LGTM

Hiding the flag while keeping it functional aligns with the deprecation plan and tests that verify precedence.


1648-1657: Early StatsD init: confirm no dependency on later -dns gating

Initializing g_stats_client here avoids null access, but it precedes the point where fNameLookup/-dns is processed (Line 1833). Please confirm StatsdClient::make doesn’t rely on that toggle or other state set later; otherwise, consider deferring hostname DNS decisions to the sender’s Schedule path.

Would you like a quick grep script to locate any name-resolution in StatsdClient::make/RawSender for confirmation?


2305-2305: Always calling Schedule() — LGTM, assuming it’s a no-op when inactive

If Schedule() is a no-op for inactive clients, this is fine and keeps reconnection/queueing consistent.

src/stats/rawsender.h (2)

68-76: Thread-safety contract on Send/SendDirectly is clear and correct

The annotations and split between queuing and direct send look good w.r.t. cs and cs_net.


106-114: Avoid header-only chrono literal; use plain seconds to prevent ADL issues

Using 1s in a header relies on std::chrono_literals being visible to every TU. Initialize with an integer to avoid that dependency.

-        /* Time between reconnection attempts */
-        std::chrono::seconds m_timeout{1s};
+        /* Time between reconnection attempts */
+        std::chrono::seconds m_timeout{1};
⛔ Skipped due to learnings
Learnt from: kwvg
PR: dashpay/dash#6815
File: src/chainlock/chainlock.h:68-69
Timestamp: 2025-08-13T16:10:32.972Z
Learning: The Dash codebase extensively uses std::chrono_literals throughout, including in header files. Chrono literals like 0s, 1min, etc. are acceptable and preferred over explicit constructors like std::chrono::seconds::zero().
Learnt from: kwvg
PR: dashpay/dash#6815
File: src/chainlock/chainlock.h:68-69
Timestamp: 2025-08-13T16:10:32.972Z
Learning: The Dash codebase extensively uses std::chrono_literals throughout, including in header files. The pattern using namespace std::chrono_literals; is used globally in src/util/time.h, and chrono literals like 0s are commonly used for atomic member initialization in headers (e.g., src/net.h). Chrono literals like 0s, 1min, etc. are preferred over explicit constructors.
Learnt from: kwvg
PR: dashpay/dash#6837
File: src/stats/rawsender.cpp:0-0
Timestamp: 2025-09-09T21:36:31.495Z
Learning: In RawSender class (src/stats/rawsender.cpp), m_sock access is protected by the cs_net mutex. This mutex coordinates between the TCP send path in SendDirectly() and the reconnection operations in Connect()/Reconnect() methods, ensuring proper synchronization of socket state.
Learnt from: kwvg
PR: dashpay/dash#6837
File: src/stats/rawsender.cpp:0-0
Timestamp: 2025-09-09T21:36:11.805Z
Learning: In RawSender class (src/stats/rawsender.cpp), cs_net is the appropriate mutex for protecting socket access (m_sock) and network operations, not additional custom locks. The implementation correctly uses cs_net with GUARDED_BY annotations and EXCLUSIVE_LOCKS_REQUIRED to synchronize socket access between SendDirectly() and ReconnectThread().
Learnt from: kwvg
PR: dashpay/dash#6837
File: src/stats/rawsender.cpp:0-0
Timestamp: 2025-09-09T21:36:31.495Z
Learning: In RawSender class (src/stats/rawsender.cpp), m_sock access requires synchronization using the cs_net mutex. The race condition exists between SendDirectly() (callable from any thread) and Connect() (called by ReconnectThread), where both read and write m_sock concurrently. The cs_net mutex properly coordinates these network operations.
Learnt from: kwvg
PR: dashpay/dash#6837
File: src/stats/rawsender.cpp:223-232
Timestamp: 2025-09-09T21:36:58.937Z
Learning: In RawSender class (src/stats/rawsender.cpp), socket operations are properly synchronized using the cs_net mutex. The m_sock and m_server members are GUARDED_BY(cs_net), and methods like Connect(), SendDirectly(), and ReconnectThread() use appropriate locking with cs_net to prevent race conditions on socket access.
Learnt from: kwvg
PR: dashpay/dash#6837
File: src/stats/rawsender.cpp:0-0
Timestamp: 2025-09-09T21:36:11.805Z
Learning: In RawSender class (src/stats/rawsender.cpp), cs_net is the appropriate mutex for protecting socket access (m_sock) and network operations, not additional custom locks.
src/stats/rawsender.cpp (1)

1-1: Fix clang-format diff check

CI reports clang-format differences for this file. Please run clang-format (or clang-format-diff.py -p1) on the touched hunks.

src/stats/client.h (2)

41-64: API surface looks good; factory + string_view overloads

Factory-based init, Schedule, and string_view overloads are a solid improvement. No issues spotted.


1-1: Fix clang-format diff check

CI flags formatting for this header as well. Please run clang-format on the changed regions.

src/stats/client.cpp (6)

44-87: Impl structure looks solid.

Nice encapsulation via a private implementation and clean surface on StatsdClient. Mutex + FastRandomContext state is scoped correctly.


100-109: Port range validation is good.

Covers 1..65535 and returns a bilingual error. No action needed.


151-159: Prefix/suffix sanitizer looks good.

Trims leading/trailing dots as expected. No action.


191-195: Scheduler pass-through is clean.

Assert + delegate to RawSender::Schedule() matches RawSender’s contract.


217-219: Good: send-failure logs are gated to NET.

Matches the “gate logs behind -debug=net” objective.


1-224: Ensure clang-format is installed and reformat src/stats/client.cpp
CI formatting checks are failing because clang-format isn’t available in this environment; install the CI-matching clang-format, run:

clang-format -i src/stats/client.cpp

and verify there are no formatting diffs to unblock the pipeline.

Comment on lines 9 to 21
#include <logging.h>
#include <random.h>
#include <stats/rawsender.h>
#include <sync.h>
#include <util/check.h>
#include <util/strencodings.h>
#include <util/system.h>
#include <util/translation.h>

#include <cmath>
#include <limits>
#include <random>

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Missing include for std::clamp (compile error on some toolchains).

std::clamp is used later but isn’t included. Add it explicitly to avoid build breaks with stricter compilers.

 #include <logging.h>
 #include <random.h>
 #include <stats/rawsender.h>
 #include <sync.h>
 #include <util/check.h>
 #include <util/strencodings.h>
 #include <util/system.h>
 #include <util/translation.h>

 #include <cmath>
+#include <algorithm>
 #include <limits>
 #include <random>
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
#include <logging.h>
#include <random.h>
#include <stats/rawsender.h>
#include <sync.h>
#include <util/check.h>
#include <util/strencodings.h>
#include <util/system.h>
#include <util/translation.h>
#include <cmath>
#include <limits>
#include <random>
#include <logging.h>
#include <random.h>
#include <stats/rawsender.h>
#include <sync.h>
#include <util/check.h>
#include <util/strencodings.h>
#include <util/system.h>
#include <util/translation.h>
#include <cmath>
#include <algorithm>
#include <limits>
#include <random>
🤖 Prompt for AI Agents
In src/stats/client.cpp around lines 9 to 21, the file uses std::clamp later but
does not include <algorithm>, which causes compile failures on stricter
toolchains; add an explicit include for <algorithm> among the other headers at
the top of the file to ensure std::clamp is available and rebuild.

Comment on lines +110 to +149
// Could be a URL, try to parse it.
const size_t scheme_idx{host.find(URL_SCHEME_DELIMITER)};
if (scheme_idx != std::string::npos) {
// Parse the scheme and trim it out of the URL if we succeed
if (scheme_idx == 0) {
return util::Error{_("No text before the scheme delimiter, malformed URL")};
}
std::string scheme{ToLower(host.substr(/*pos=*/0, scheme_idx))};
if (scheme != "tcp" && scheme != "udp") {
return util::Error{_("Unsupported URL scheme, must begin with tcp:// or udp://")};
}
use_tcp = scheme == "tcp";
host = host.substr(scheme_idx + URL_SCHEME_DELIMITER.length());

// Strip trailing slashes and parse the port
const size_t colon_idx{host.rfind(':')};
if (colon_idx != std::string::npos) {
// Remove all forward slashes found after the port delimiter (colon)
host = std::string(
host.begin(), host.end() - [&colon_idx, &host]() {
const size_t slash_idx{host.find('/', /*pos=*/colon_idx + 1)};
return slash_idx != std::string::npos ? host.length() - slash_idx : 0;
}());
uint16_t port_url{0};
SplitHostPort(host, port_url, host);
if (port_url != 0) {
if (args.IsArgSet("-statsport")) {
LogPrintf("%s: Supplied URL with port, ignoring -statsport\n", __func__);
}
port = port_url;
}
} else {
// There was no port specified, remove everything after the first forward slash
host = host.substr(/*pos=*/0, host.find("/"));
}

if (host.empty()) {
return util::Error{_("No host specified, malformed URL")};
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

URL/host parsing: avoid aliasing in SplitHostPort and simplify path trimming; improve IPv6 edge handling.

  • Passing the same variable as both input and output to SplitHostPort risks aliasing surprises depending on its implementation.
  • The erase logic around slicing after the colon is hard to read; simpler substring ops are clearer.
  • Consider handling bracketless IPv6 explicitly (require brackets when port is present), returning a helpful error.
-        const size_t colon_idx{host.rfind(':')};
-        if (colon_idx != std::string::npos) {
-            // Remove all forward slashes found after the port delimiter (colon)
-            host = std::string(
-                host.begin(), host.end() - [&colon_idx, &host]() {
-                    const size_t slash_idx{host.find('/', /*pos=*/colon_idx + 1)};
-                    return slash_idx != std::string::npos ? host.length() - slash_idx : 0;
-                }());
-            uint16_t port_url{0};
-            SplitHostPort(host, port_url, host);
+        const size_t colon_idx{host.rfind(':')};
+        if (colon_idx != std::string::npos) {
+            // Trim any path/query after the port
+            const size_t slash_idx{host.find('/', colon_idx + 1)};
+            if (slash_idx != std::string::npos) host.erase(slash_idx);
+            uint16_t port_url{0};
+            std::string host_in = host; // avoid aliasing input/output
+            SplitHostPort(host_in, port_url, host);
             if (port_url != 0) {
-                if (args.IsArgSet("-statsport")) {
-                    LogPrintf("%s: Supplied URL with port, ignoring -statsport\n", __func__);
-                }
+                if (args.IsArgSet("-statsport")) {
+                    LogPrint(BCLog::NET, "%s: Supplied URL with port, ignoring -statsport\n", __func__);
+                }
                 port = port_url;
             }
         } else {
             // There was no port specified, remove everything after the first forward slash
             host = host.substr(/*pos=*/0, host.find("/"));
         }

Optionally, reject bracketless IPv6 with a port:

if (host.find(':') != std::string::npos && host.find(']') == std::string::npos && host.find('[') == std::string::npos && host.find(':', host.rfind(':')+1) == std::string::npos) {
    return util::Error{_("IPv6 address with port must be enclosed in []")};
}

Comment on lines +161 to +167
std::optional<bilingual_str> error_opt;
auto statsd_ptr = std::make_unique<StatsdClientImpl>(
host, port, use_tcp, args.GetIntArg("-statsbatchsize", DEFAULT_STATSD_BATCH_SIZE),
args.GetIntArg("-statsduration", DEFAULT_STATSD_DURATION),
sanitize_string(args.GetArg("-statsprefix", DEFAULT_STATSD_PREFIX)),
sanitize_string(args.GetArg("-statssuffix", DEFAULT_STATSD_SUFFIX)), error_opt);
if (error_opt.has_value()) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Validate -statsbatchsize and -statsduration ranges (prevent negative/overflow misuse).

GetIntArg returns signed; negative values will wrap when cast to uint64_t. Either clamp to sane bounds or error out.

-    auto statsd_ptr = std::make_unique<StatsdClientImpl>(
-        host, port, use_tcp, args.GetIntArg("-statsbatchsize", DEFAULT_STATSD_BATCH_SIZE),
-        args.GetIntArg("-statsduration", DEFAULT_STATSD_DURATION),
+    const int64_t batch_arg = args.GetIntArg("-statsbatchsize", DEFAULT_STATSD_BATCH_SIZE);
+    const int64_t dur_arg   = args.GetIntArg("-statsduration", DEFAULT_STATSD_DURATION);
+    if (batch_arg < 1) return util::Error{_("statsbatchsize must be >= 1")};
+    if (dur_arg < 0)   return util::Error{_("statsduration must be >= 0")};
+    auto statsd_ptr = std::make_unique<StatsdClientImpl>(
+        host, port, use_tcp, static_cast<uint64_t>(batch_arg),
+        static_cast<uint64_t>(dur_arg),
         sanitize_string(args.GetArg("-statsprefix", DEFAULT_STATSD_PREFIX)),
         sanitize_string(args.GetArg("-statssuffix", DEFAULT_STATSD_SUFFIX)), error_opt);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
std::optional<bilingual_str> error_opt;
auto statsd_ptr = std::make_unique<StatsdClientImpl>(
host, port, use_tcp, args.GetIntArg("-statsbatchsize", DEFAULT_STATSD_BATCH_SIZE),
args.GetIntArg("-statsduration", DEFAULT_STATSD_DURATION),
sanitize_string(args.GetArg("-statsprefix", DEFAULT_STATSD_PREFIX)),
sanitize_string(args.GetArg("-statssuffix", DEFAULT_STATSD_SUFFIX)), error_opt);
if (error_opt.has_value()) {
std::optional<bilingual_str> error_opt;
const int64_t batch_arg = args.GetIntArg("-statsbatchsize", DEFAULT_STATSD_BATCH_SIZE);
const int64_t dur_arg = args.GetIntArg("-statsduration", DEFAULT_STATSD_DURATION);
if (batch_arg < 1) return util::Error{_("statsbatchsize must be >= 1")};
if (dur_arg < 0) return util::Error{_("statsduration must be >= 0")};
auto statsd_ptr = std::make_unique<StatsdClientImpl>(
host, port, use_tcp,
static_cast<uint64_t>(batch_arg),
static_cast<uint64_t>(dur_arg),
sanitize_string(args.GetArg("-statsprefix", DEFAULT_STATSD_PREFIX)),
sanitize_string(args.GetArg("-statssuffix", DEFAULT_STATSD_SUFFIX)),
error_opt);
if (error_opt.has_value()) {
🤖 Prompt for AI Agents
In src/stats/client.cpp around lines 161-167 the args.GetIntArg results for
-statsbatchsize and -statsduration are used where unsigned/64-bit values are
expected, which allows negative signed ints to wrap on cast; read each into a
signed type, validate they are within sane non-negative bounds (or clamp) and
check an upper limit to avoid overflow, and if invalid set error_opt with a
clear bilingual_str error and return/fail rather than passing a wrapped value;
only after validation perform a safe static_cast<uint64_t> (or use std::clamp)
before constructing StatsdClientImpl.

Comment on lines +54 to +62
// If there are threads, interrupt and stop it
if (m_reconn.joinable()) {
m_reconn_interrupt();
m_reconn.join();
}
// Flush queue of uncommitted messages
QueueFlush();
QueueFlush(m_reconn_queue);
QueueFlush(m_queue);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Bug: QueueFlush mutates the same vector it iterates (self-reenqueue under TCP)

QueueFlush(m_reconn_queue) can re-enqueue into m_reconn_queue via SendDirectly() on TCP-not-connected paths, invalidating the iteration and causing UB. Swap to locals before flushing.

-    // Flush queue of uncommitted messages
-    QueueFlush(m_reconn_queue);
-    QueueFlush(m_queue);
+    // Flush queues safely: swap into locals to avoid mutating while iterating
+    std::vector<RawMessage> q_reconn, q_main;
+    {
+        LOCK(cs);
+        m_reconn_queue.swap(q_reconn);
+        m_queue.swap(q_main);
+    }
+    QueueFlush(q_reconn);
+    QueueFlush(q_main);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// If there are threads, interrupt and stop it
if (m_reconn.joinable()) {
m_reconn_interrupt();
m_reconn.join();
}
// Flush queue of uncommitted messages
QueueFlush();
QueueFlush(m_reconn_queue);
QueueFlush(m_queue);
}
// If there are threads, interrupt and stop it
if (m_reconn.joinable()) {
m_reconn_interrupt();
m_reconn.join();
}
// Flush queues safely: swap into locals to avoid mutating while iterating
std::vector<RawMessage> q_reconn, q_main;
{
LOCK(cs);
m_reconn_queue.swap(q_reconn);
m_queue.swap(q_main);
}
QueueFlush(q_reconn);
QueueFlush(q_main);
}
🤖 Prompt for AI Agents
In src/stats/rawsender.cpp around lines 54 to 62, QueueFlush is called directly
on m_reconn_queue and m_queue but QueueFlush can call SendDirectly() which may
re-enqueue into the same vector and invalidate the iteration; to fix, swap each
member queue with a local empty container before calling QueueFlush (e.g.,
create a local vector, std::swap(local, m_reconn_queue), then call
QueueFlush(local)), so any re-enqueued items go into the original member queue
and do not affect the iteration; ensure this is done for both m_reconn_queue and
m_queue and preserve any necessary locking/ownership semantics around the swap.

Copy link

@UdjinM6 UdjinM6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK 1e750cc

{
auto stats_client = StatsdClient::make(*m_node.args);
if (!stats_client) {
throw std::runtime_error{strprintf("Cannot init Statsd client (%s)", util::ErrorString(stats_client).original)};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The app should start whatever stats_client is initialized or not IMO.
It's not a critical failure.

nit: avoid code duplication between init.cpp & setup_common.cpp

argsman.AddArg("-statsduration=<ms>", strprintf("Specify the number of milliseconds between stats messages (default: %d)", DEFAULT_STATSD_DURATION), ArgsManager::ALLOW_ANY, OptionsCategory::STATSD);
argsman.AddArg("-statshost=<ip>", strprintf("Specify statsd host (default: %s)", DEFAULT_STATSD_HOST), ArgsManager::ALLOW_ANY, OptionsCategory::STATSD);
argsman.AddArg("-statsport=<port>", strprintf("Specify statsd port (default: %u)", DEFAULT_STATSD_PORT), ArgsManager::ALLOW_ANY, OptionsCategory::STATSD);
hidden_args.emplace_back("-statsport");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it should be different PRs; deprecation of statsport & refactorings.

std::string scheme{ToLower(host.substr(/*pos=*/0, scheme_idx))};
if (scheme != "udp") {
return util::Error{_("Unsupported URL scheme, must begin with udp://")};
if (scheme != "tcp" && scheme != "udp") {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why possible do we need tcp support actually?

stats daemon is just a utility to collect metrics; a single lost message is not an issue at all; there's no requirements for guaranteed delivery; no exact numbers are needed.


while (!m_reconn_interrupt) {
std::deque<RawMessage> queue{};
std::vector<RawMessage> queue{};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::deque is a better choice for metrics; because metrics should be:

  • minimal overhead (in average)
  • minimum latency (for each request)
  • approximitally correct

std::vector call re-allocation of all internal data to the copy every time when size is over some certain threshold. For overall during a day (or even minute) it's completely fine; but in some particular moment it could be 50ms delay because need to re-allocation 100k objects to a new copy of data; and it breaks "low latency" requirement for metrics.

std::deque is a better fit because overall overhead is bigger than std::vector, but it doesn't have delays for ms in random moment of time

- The StatsD client now supports TCP hosts, TCP can be opted for with the new URL syntax supported by `-statshost` (e.g.
`tcp://127.0.0.1:8125`). UDP (`udp://`) remains the default and will be assumed if a protocol is not specified.

- Unlike with UDP, if a TCP connection is interrupted, stats will _not_ be dropped and instead buffered according to
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what will happen if queue of new messages is bigger than bandwidth to send statsd?

Also, if it is limited by firewall, not sure how stats service will behave. I vote to drop TCP support if there's no strong demand from users. @PastaPastaPasta @UdjinM6

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants