Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split long messages to avoid being kicked for excess flood #191

Open
lenormf opened this issue Apr 15, 2021 · 10 comments
Open

Split long messages to avoid being kicked for excess flood #191

lenormf opened this issue Apr 15, 2021 · 10 comments

Comments

@lenormf
Copy link

lenormf commented Apr 15, 2021

Hi,

My bot sends messages to a channel whose length can sometimes be too long for the server to accept.

Would it be possible for irc3 (maybe via an opt-in mechanism) to split long message queries into several ones, to avoid the bot being kicked for excess flood? The message itself could be split on word boundaries.

This could theoretically be handled on the bot's side, but I don't think it would be very robust, as the bot doesn't know exactly what the resulting IRC query will look like (I assume something like PRIVMSG recipient :message), and thus not be able to know how many characters should be subtracted to the theoretical maximum message size (512 I believe).

HTH.

@gawel
Copy link
Owner

gawel commented Apr 15, 2021

Hi,

You can set max_length in the config file. Default value is 512: https://github.com/gawel/irc3/blob/master/irc3/base.py#L75

@gawel gawel closed this as completed Apr 15, 2021
@lenormf
Copy link
Author

lenormf commented Apr 15, 2021

Do servers communicate what maximum message length they accept? In that case, I would like irc3 to detect that, and split messages according to it.

@gawel
Copy link
Owner

gawel commented Apr 15, 2021

It's possible. But the rfc say 512 is fine:

IRC messages are always lines of characters terminated with a CR-LF

You may be able to track this for your server by checking bot.config['server_config']

See

def set_config(self, data=None, **kwargs):

@lenormf
Copy link
Author

lenormf commented Apr 15, 2021

So, the server I'm connecting to is GameSurge.

Here is the configuration the server sends to the bot:

:NuclearFallout.WA.US.GameSurge.net 005 _nick WHOX WALLCHOPS WALLVOICES USERIP CPRIVMSG CNOTICE SILENCE=25 MODES=6 MAXCHANNELS=75 MAXBANS=100 NICKLEN=30 :are supported by this server
:NuclearFallout.WA.US.GameSurge.net 005 _nick MAXNICKLEN=30 TOPICLEN=300 AWAYLEN=200 KICKLEN=300 CHANNELLEN=200 MAXCHANNELLEN=200 CHANTYPES=#& PREFIX=(ov)@+ STATUSMSG=@+ CHANMODES=b,k,l,imnpstrDdRcC CASEMAPPING=rfc1459 NETWORK=GameSurge :are supported by this server

The bot subsequently prints these values as:

Server config: {'TOPICLEN': '300', 'MAXCHANNELS': '75', 'PREFIX': '(ov)@+', 'MAXCHANNELLEN': '200', 'CHANNELLEN': '200', 'MAXBANS': '100', 'CASEMAPPING': 'rfc1459', 'NETWORK': 'GameSurge', 'USERIP': True, 'WHOX': True, 'CHANMODES': 'b,k,l,imnpstrDdRcC', 'AWAYLEN': '200', 'MODES': '6', 'SILENCE': '25', 'STATUSMSG': '@+', 'MAXNICKLEN': '30', 'KICKLEN': '300', 'WALLVOICES': True, 'NICKLEN': '30', 'CPRIVMSG': True, 'WALLCHOPS': True, 'CHANTYPES': '#&', 'CNOTICE': True}

The 'max_length' option in bot.config is set to 512.

When the bot sends a message that is >512 characters long (I can see that message in the debug log), the server replies:

:NuclearFallout.WA.US.GameSurge.net 417 _nick :Input line was too long

That could mean that either:

  • irc3 is not splitting messages to make sure packets remain below 512 characters — if it's not supposed to at the moment, I'd like it to do so
  • GameSurge uses a maximum length lesser than 512 and doesn't communicate it to irc3 to split messages — I haven't tried to verify that, it's possible that GameSurge uses 512 as limit

HTH.

@gawel
Copy link
Owner

gawel commented Apr 15, 2021

As I remember you can track what the bot receive/send by using irc3 -vdr config.ini

@gawel gawel reopened this Apr 15, 2021
@lenormf
Copy link
Author

lenormf commented Apr 16, 2021

After debugging the issue, I found two problems, and one suspicious thing:

  • the IrcBot.privmsg() function assumes that messages (i.e. words sent by users) must be self.config.max_length in length, when that number applies to the whole IRC message (i.e. PRIVMSG target :message\r\n) — other parts of the code that deal with protocol might have this issue as well, I didn't check

irc3/irc3/rfc2812.txt

Lines 272 to 274 in 2d7d65c

Each IRC message may consist of up to three main parts: the prefix
(OPTIONAL), the command, and the command parameters (maximum of
fifteen (15)). The prefix, command, and all parameters are separated

irc3/irc3/rfc2812.txt

Lines 300 to 304 in 2d7d65c

IRC messages are always lines of characters terminated with a CR-LF
(Carriage Return - Line Feed) pair, and these messages SHALL NOT
exceed 512 characters in length, counting all characters including
the trailing CR-LF. Thus, there are 510 characters maximum allowed
for the command and its parameters. There is no provision for

  • the length of the IRC message is supposed to be counted in bytes (ASCII or encoded UTF8), but utils.split_messages() works on unicode strings, which can throw off the count — I couldn't really find anything in the RFC about that after a cursory look, maybe some servers (other than GameSurge) are lenient and count codepoints?

  • the bot sends messages one after the other without any form of rate limiting, which results in it being kicked for excess flood, once the above problems are solved — I can see a flood_delay local configuration option, but from what I can tell the server doesn't seem to indicate what minimum delay should separate messages that it could be compared to

Here is a debugging patch that I used to diagnose the above issues:

diff --git a/irc3/__init__.py b/irc3/__init__.py
index 2cd5fe7..12ee2c9 100644
--- a/irc3/__init__.py
+++ b/irc3/__init__.py
@@ -179,6 +179,7 @@ class IrcBot(base.IrcObject):
 
     def send_line(self, data, nowait=False):
         """send a line to the server. replace CR by spaces"""
+        self.log.debug("DBG: (%d) %s", len(data), data)
         data = data.replace('\n', ' ').replace('\r', ' ')
         f = asyncio.Future(loop=self.loop)
         if self.queue is not None and nowait is False:
@@ -225,15 +226,23 @@ class IrcBot(base.IrcObject):
     def privmsg(self, target, message, nowait=False):
         """send a privmsg to target"""
         if message:
-            messages = utils.split_message(message, self.config.max_length)
             if isinstance(target, DCCChat):
+                messages = utils.split_message(message, self.config.max_length)
                 for message in messages:
                     target.send_line(message)
             elif target:
+                command_prefix = 'PRIVMSG %s :' % target
+                self.log.debug("MAX_LENGTH: %d" % self.config.max_length)
+                assert self.config.max_length > len(command_prefix) + 2
+                self.log.debug("SPLIT: %s" % message)
+                messages = utils.split_message(message, self.config.max_length - len(command_prefix) - 2, self.log)
+                self.log.debug("MESSAGES: %s", messages)
                 f = None
                 for message in messages:
-                    f = self.send_line('PRIVMSG %s :%s' % (target, message),
+                    self.log.debug("DBG: MSG (%d) %s", len(message), message)
+                    f = self.send_line('%s%s' % (command_prefix, message),
                                        nowait=nowait)
+                    break
                 return f
 
     def action(self, target, message, nowait=False):
diff --git a/irc3/utils.py b/irc3/utils.py
index 4f158f8..fb91620 100644
--- a/irc3/utils.py
+++ b/irc3/utils.py
@@ -164,8 +164,38 @@ class IrcString(BaseString):
 STRIPPED_CHARS = '\t '
 
 
-def split_message(message, max_length):
+def split_message(message, max_length, log=None):
     """Split long messages"""
+    def utf8_lead_byte(b):
+        '''A UTF-8 intermediate byte starts with the bits 10xxxxxx.'''
+        return (b & 0xC0) != 0x80
+
+    def utf8_byte_truncate(text, max_bytes):
+        '''If text[max_bytes] is not a lead byte, back up until a lead byte is
+        found and truncate before that character.'''
+        log.debug("IN")
+        utf8 = text.encode('utf8')
+        log.debug("utf8: %s", utf8)
+        if len(utf8) <= max_bytes:
+            log.debug("SPLIT: max bytes")
+            return utf8
+        log.debug("max_bytes: %d", max_bytes)
+        i = max_bytes
+        while i > 0 and not utf8_lead_byte(utf8[i]):
+            log.debug("I: %d", i)
+            i -= 1
+        log.debug("RET")
+        return utf8[:i]
+
+    if log is not None:
+        log.debug("SPLIT MSG: %d - %s", max_length, message)
+        s = utf8_byte_truncate(message, max_length)
+        log.debug("S: %s", s)
+        s = s.decode('utf8')
+        log.debug("DECODED")
+        yield s
+        return
+
     if len(message) > max_length:
         for message in textwrap.wrap(message, max_length):
             yield message

I used this implementation to split strings according to byte length: https://stackoverflow.com/a/43848928

HTH.

@gawel
Copy link
Owner

gawel commented Apr 21, 2021

Thanks for digging into that. I don't like the idea of encoding/decoding the whole messages to... re-encode them before sending to the server. encoding/decoding are very CPU expensive AFAIK.

@lenormf
Copy link
Author

lenormf commented May 14, 2021

Any chance we'll get a fix soon? My bot trips on half of the URLs (apparently the internet has decided that having 1k+ characters long sentences in the metadata is a good idea).

I understand the concern about the encoding round trips, however there doesn't seem to be any alternative that I can see.

@gawel
Copy link
Owner

gawel commented May 14, 2021

I'll be ok with that if there is no need to re-decode the data. I'm pretty sure a few if isinstance(data, bytes) will be faster than decoding/encoding the whole messages multiple times. Don't know how hard it is to add those if and how many of them are required.

I'll try to give it a try soon...

@gawel
Copy link
Owner

gawel commented May 14, 2021

For now I've added a prefix argument to split_message (inspired by your snippet). This should reduce your amount of errors. Maybe it's enough to "fix" the problem if you set max_length to an appropriate value to allow some extra unicode characters.. Maybe 500 instead of 512.

Looks like supporting both unicode and bytes in the whole process will not be easy at all..

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants