UTF-8 characters are dropped in the database #16

Open
dajoha opened this issue Aug 17, 2020 · 1 comment

dajoha commented Aug 17, 2020

I have the following folder on my filesystem:

/home/io/projets/fanfare-ligugé

But in the ASH SQLite database, the accented character é in the folder name has been dropped everywhere.

That is, the stored cwd value is:

/home/io/projets/fanfare-ligug

The following columns of the commands table are affected:

  • cwd
  • command

I have checked many entries in the commands table, and all accented characters have been dropped.


dajoha commented Feb 5, 2021

I managed to fix the issue by tweaking the method DBObject::quote() in database.cpp, but I'm not good at C++, and I guess this is not the right way to handle UTF-8 characters...

diff --git a/src/database.cpp b/src/database.cpp
index 08c47e8..82059e6 100644
--- a/src/database.cpp
+++ b/src/database.cpp
@@ -457,28 +457,28 @@ const string DBObject::quote(const char * value) {
 /**
  * Returns a quoted string suitable for insertion into the DB.
  * Converts an empty string to null.  Removes unprintable characters.
  * Replaces all single-quotes with double single quotes in the output string.
  */
 const string DBObject::quote(const string & in) {
   if (in.empty()) return "null";
   string out = "'";
-  char c;
+  unsigned char c;
   for (string::const_iterator i = in.begin(), e = in.end(); i != e; ++i) {
     c = *i;
     switch (c) {
       case '\n':  // fallthrough
       case '\t':
         out.push_back(c);
         break;
       case '\'':
         out.push_back(c); // fallthrough
       default:
-        if (isprint(c)) out.push_back(c);
+        if (c >= 0x20 && c != 0x7f) out.push_back(c);
     }
   }
   out.push_back('\'');
   return out;
 }

Would this patch be eligible for a PR?
Or what else could be done in order to fix this issue?
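
To illustrate why the patch changes the behavior, here is a minimal standalone sketch that compares the two filters outside of ASH. The function names quote_with_isprint and quote_with_byte_check are hypothetical and do not exist in the project; they only mirror the logic of DBObject::quote() before and after the diff. The sketch assumes the program runs in the default "C" locale, where isprint() typically rejects every byte at or above 0x80, which is how the two bytes that encode é in UTF-8 (0xC3 0xA9) get lost.

```cpp
#include <cctype>
#include <string>

// Hypothetical stand-in for the pre-patch logic: keep newline, tab, and
// whatever isprint() accepts; double single quotes; empty becomes null.
// Iterating as unsigned char avoids passing a negative value to isprint(),
// which would be undefined behavior.
std::string quote_with_isprint(const std::string& in) {
    if (in.empty()) return "null";
    std::string out = "'";
    for (unsigned char c : in) {
        if (c == '\n' || c == '\t') {
            out.push_back(static_cast<char>(c));
            continue;
        }
        if (c == '\'') out.push_back('\'');  // first half of the doubled quote
        // Assumption: default "C" locale, where bytes >= 0x80 are not printable.
        if (std::isprint(c)) out.push_back(static_cast<char>(c));
    }
    out.push_back('\'');
    return out;
}

// Hypothetical stand-in for the patched logic: drop only ASCII control
// bytes (below 0x20, and 0x7f), so every byte >= 0x80 passes through.
std::string quote_with_byte_check(const std::string& in) {
    if (in.empty()) return "null";
    std::string out = "'";
    for (unsigned char c : in) {
        if (c == '\n' || c == '\t') {
            out.push_back(static_cast<char>(c));
            continue;
        }
        if (c == '\'') out.push_back('\'');  // first half of the doubled quote
        if (c >= 0x20 && c != 0x7f) out.push_back(static_cast<char>(c));
    }
    out.push_back('\'');
    return out;
}
```

With the input "fanfare-ligug\xC3\xA9", the first function should return 'fanfare-ligug' while the second keeps both bytes of é. Note that the byte check does not validate UTF-8: any byte at or above 0x80 passes, including malformed sequences, so this is only a sketch of the patch and not a full answer to how the encoding should be handled.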

dajoha changed the title from "Accentuated characters are dropped in the database" to "UTF-8 characters are dropped in the database" on Feb 5, 2021