Split long messages to avoid being kicked for excess flood #191

lenormf · 2021-04-15T06:36:04Z

Hi,

My bot sends messages to a channel whose length can sometimes be too long for the server to accept.

Would it be possible for irc3 (maybe via an opt-in mechanism) to split long message queries into several ones, to avoid the bot being kicked for excess flood? The message itself could be split on word boundaries.

This could theoretically be handled on the bot's side, but I don't think it would be very robust, as the bot doesn't know exactly what the resulting IRC query will look like (I assume something like PRIVMSG recipient :message), and thus not be able to know how many characters should be subtracted to the theoretical maximum message size (512 I believe).

HTH.

The text was updated successfully, but these errors were encountered:

gawel · 2021-04-15T06:46:12Z

Hi,

You can set max_length in the config file. Default value is 512: https://github.com/gawel/irc3/blob/master/irc3/base.py#L75

lenormf · 2021-04-15T06:50:41Z

Do servers communicate what maximum message length they accept? In that case, I would like irc3 to detect that, and split messages according to it.

gawel · 2021-04-15T07:12:33Z

It's possible. But the rfc say 512 is fine:

irc3/irc3/rfc2812.txt

Line 300 in 2d7d65c

IRC messages are always lines of characters terminated with a CR-LF

You may be able to track this for your server by checking bot.config['server_config']

See

irc3/irc3/plugins/core.py

Line 109 in 2d7d65c

def set_config(self, data=None, **kwargs):

lenormf · 2021-04-15T07:37:31Z

So, the server I'm connecting to is GameSurge.

Here is the configuration the server sends to the bot:

:NuclearFallout.WA.US.GameSurge.net 005 _nick WHOX WALLCHOPS WALLVOICES USERIP CPRIVMSG CNOTICE SILENCE=25 MODES=6 MAXCHANNELS=75 MAXBANS=100 NICKLEN=30 :are supported by this server
:NuclearFallout.WA.US.GameSurge.net 005 _nick MAXNICKLEN=30 TOPICLEN=300 AWAYLEN=200 KICKLEN=300 CHANNELLEN=200 MAXCHANNELLEN=200 CHANTYPES=#& PREFIX=(ov)@+ STATUSMSG=@+ CHANMODES=b,k,l,imnpstrDdRcC CASEMAPPING=rfc1459 NETWORK=GameSurge :are supported by this server

The bot subsequently prints these values as:

Server config: {'TOPICLEN': '300', 'MAXCHANNELS': '75', 'PREFIX': '(ov)@+', 'MAXCHANNELLEN': '200', 'CHANNELLEN': '200', 'MAXBANS': '100', 'CASEMAPPING': 'rfc1459', 'NETWORK': 'GameSurge', 'USERIP': True, 'WHOX': True, 'CHANMODES': 'b,k,l,imnpstrDdRcC', 'AWAYLEN': '200', 'MODES': '6', 'SILENCE': '25', 'STATUSMSG': '@+', 'MAXNICKLEN': '30', 'KICKLEN': '300', 'WALLVOICES': True, 'NICKLEN': '30', 'CPRIVMSG': True, 'WALLCHOPS': True, 'CHANTYPES': '#&', 'CNOTICE': True}

The 'max_length' option in bot.config is set to 512.

When the bot sends a message that is >512 characters long (I can see that message in the debug log), the server replies:

:NuclearFallout.WA.US.GameSurge.net 417 _nick :Input line was too long

That could mean that either:

irc3 is not splitting messages to make sure packets remain below 512 characters — if it's not supposed to at the moment, I'd like it to do so
GameSurge uses a maximum length lesser than 512 and doesn't communicate it to irc3 to split messages — I haven't tried to verify that, it's possible that GameSurge uses 512 as limit

HTH.

gawel · 2021-04-15T08:03:10Z

As I remember you can track what the bot receive/send by using irc3 -vdr config.ini

lenormf · 2021-04-16T07:54:49Z

After debugging the issue, I found two problems, and one suspicious thing:

the IrcBot.privmsg() function assumes that messages (i.e. words sent by users) must be self.config.max_length in length, when that number applies to the whole IRC message (i.e. PRIVMSG target :message\r\n) — other parts of the code that deal with protocol might have this issue as well, I didn't check

irc3/irc3/rfc2812.txt

Lines 272 to 274 in 2d7d65c

    
           Each IRC message may consist of up to three main parts: the prefix 
        
           (OPTIONAL), the command, and the command parameters (maximum of 
        
           fifteen (15)).  The prefix, command, and all parameters are separated

irc3/irc3/rfc2812.txt

Lines 300 to 304 in 2d7d65c

    
           IRC messages are always lines of characters terminated with a CR-LF 
        
           (Carriage Return - Line Feed) pair, and these messages SHALL NOT 
        
           exceed 512 characters in length, counting all characters including 
        
           the trailing CR-LF. Thus, there are 510 characters maximum allowed 
        
           for the command and its parameters.  There is no provision for

the length of the IRC message is supposed to be counted in bytes (ASCII or encoded UTF8), but utils.split_messages() works on unicode strings, which can throw off the count — I couldn't really find anything in the RFC about that after a cursory look, maybe some servers (other than GameSurge) are lenient and count codepoints?
the bot sends messages one after the other without any form of rate limiting, which results in it being kicked for excess flood, once the above problems are solved — I can see a flood_delay local configuration option, but from what I can tell the server doesn't seem to indicate what minimum delay should separate messages that it could be compared to

Here is a debugging patch that I used to diagnose the above issues:

diff --git a/irc3/__init__.py b/irc3/__init__.py
index 2cd5fe7..12ee2c9 100644
--- a/irc3/__init__.py
+++ b/irc3/__init__.py
@@ -179,6 +179,7 @@ class IrcBot(base.IrcObject):
 
     def send_line(self, data, nowait=False):
         """send a line to the server. replace CR by spaces"""
+        self.log.debug("DBG: (%d) %s", len(data), data)
         data = data.replace('\n', ' ').replace('\r', ' ')
         f = asyncio.Future(loop=self.loop)
         if self.queue is not None and nowait is False:
@@ -225,15 +226,23 @@ class IrcBot(base.IrcObject):
     def privmsg(self, target, message, nowait=False):
         """send a privmsg to target"""
         if message:
-            messages = utils.split_message(message, self.config.max_length)
             if isinstance(target, DCCChat):
+                messages = utils.split_message(message, self.config.max_length)
                 for message in messages:
                     target.send_line(message)
             elif target:
+                command_prefix = 'PRIVMSG %s :' % target
+                self.log.debug("MAX_LENGTH: %d" % self.config.max_length)
+                assert self.config.max_length > len(command_prefix) + 2
+                self.log.debug("SPLIT: %s" % message)
+                messages = utils.split_message(message, self.config.max_length - len(command_prefix) - 2, self.log)
+                self.log.debug("MESSAGES: %s", messages)
                 f = None
                 for message in messages:
-                    f = self.send_line('PRIVMSG %s :%s' % (target, message),
+                    self.log.debug("DBG: MSG (%d) %s", len(message), message)
+                    f = self.send_line('%s%s' % (command_prefix, message),
                                        nowait=nowait)
+                    break
                 return f
 
     def action(self, target, message, nowait=False):
diff --git a/irc3/utils.py b/irc3/utils.py
index 4f158f8..fb91620 100644
--- a/irc3/utils.py
+++ b/irc3/utils.py
@@ -164,8 +164,38 @@ class IrcString(BaseString):
 STRIPPED_CHARS = '\t '
 
 
-def split_message(message, max_length):
+def split_message(message, max_length, log=None):
     """Split long messages"""
+    def utf8_lead_byte(b):
+        '''A UTF-8 intermediate byte starts with the bits 10xxxxxx.'''
+        return (b & 0xC0) != 0x80
+
+    def utf8_byte_truncate(text, max_bytes):
+        '''If text[max_bytes] is not a lead byte, back up until a lead byte is
+        found and truncate before that character.'''
+        log.debug("IN")
+        utf8 = text.encode('utf8')
+        log.debug("utf8: %s", utf8)
+        if len(utf8) <= max_bytes:
+            log.debug("SPLIT: max bytes")
+            return utf8
+        log.debug("max_bytes: %d", max_bytes)
+        i = max_bytes
+        while i > 0 and not utf8_lead_byte(utf8[i]):
+            log.debug("I: %d", i)
+            i -= 1
+        log.debug("RET")
+        return utf8[:i]
+
+    if log is not None:
+        log.debug("SPLIT MSG: %d - %s", max_length, message)
+        s = utf8_byte_truncate(message, max_length)
+        log.debug("S: %s", s)
+        s = s.decode('utf8')
+        log.debug("DECODED")
+        yield s
+        return
+
     if len(message) > max_length:
         for message in textwrap.wrap(message, max_length):
             yield message

I used this implementation to split strings according to byte length: https://stackoverflow.com/a/43848928

HTH.

gawel · 2021-04-21T12:30:20Z

Thanks for digging into that. I don't like the idea of encoding/decoding the whole messages to... re-encode them before sending to the server. encoding/decoding are very CPU expensive AFAIK.

lenormf · 2021-05-14T04:48:46Z

Any chance we'll get a fix soon? My bot trips on half of the URLs (apparently the internet has decided that having 1k+ characters long sentences in the metadata is a good idea).

I understand the concern about the encoding round trips, however there doesn't seem to be any alternative that I can see.

gawel · 2021-05-14T08:15:58Z

I'll be ok with that if there is no need to re-decode the data. I'm pretty sure a few if isinstance(data, bytes) will be faster than decoding/encoding the whole messages multiple times. Don't know how hard it is to add those if and how many of them are required.

I'll try to give it a try soon...

gawel · 2021-05-14T10:00:23Z

For now I've added a prefix argument to split_message (inspired by your snippet). This should reduce your amount of errors. Maybe it's enough to "fix" the problem if you set max_length to an appropriate value to allow some extra unicode characters.. Maybe 500 instead of 512.

Looks like supporting both unicode and bytes in the whole process will not be easy at all..

gawel closed this as completed Apr 15, 2021

gawel reopened this Apr 15, 2021

gawel added a commit that referenced this issue May 14, 2021

take care of command lentgh to split messages. see #191

53b110f

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Split long messages to avoid being kicked for excess flood #191

Split long messages to avoid being kicked for excess flood #191

lenormf commented Apr 15, 2021

gawel commented Apr 15, 2021

lenormf commented Apr 15, 2021

gawel commented Apr 15, 2021

lenormf commented Apr 15, 2021

gawel commented Apr 15, 2021

lenormf commented Apr 16, 2021

gawel commented Apr 21, 2021

lenormf commented May 14, 2021

gawel commented May 14, 2021

gawel commented May 14, 2021

Split long messages to avoid being kicked for excess flood #191

Split long messages to avoid being kicked for excess flood #191

Comments

lenormf commented Apr 15, 2021

gawel commented Apr 15, 2021

lenormf commented Apr 15, 2021

gawel commented Apr 15, 2021

lenormf commented Apr 15, 2021

gawel commented Apr 15, 2021

lenormf commented Apr 16, 2021

gawel commented Apr 21, 2021

lenormf commented May 14, 2021

gawel commented May 14, 2021

gawel commented May 14, 2021