Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lack of IO error checks generate incorrect file checksums #215

Open
spbooth opened this issue Jul 27, 2023 · 4 comments
Open

Lack of IO error checks generate incorrect file checksums #215

spbooth opened this issue Jul 27, 2023 · 4 comments

Comments

@spbooth
Copy link
Contributor

spbooth commented Jul 27, 2023

The routine globus_l_gass_copy_cksm_file() in globus_gass_copy_glob.c
has a file read loop
while((n = read(fd,buf,count)) > 0)
If an IO error occurs in the read call it will return -1 and the loop will terminate early generating an incorrect digest for the file but no error report until the subsequent file transfer fails its checksum test.

On a related note if you check for errors properly you discover that a recursive globus-url-copy attempts to calculate checksums on directory-urls using this routine (which always generates a error in the read call)

My POC fix is

static
globus_result_t
globus_l_gass_copy_cksm_file(
    globus_gass_copy_handle_t *         handle,
    char *                              url,
    char *                              cksm,
    globus_off_t                        offset,
    globus_off_t                        length,
    const char *                        algorithm,
    globus_gass_copy_callback_t         callback,
    void *                              callback_arg)
{
    char *      myname = "globus_l_gass_copy_cksm_file";

    globus_url_t                        parsed_url;
    globus_result_t                     result;
    int                                 rc;

    EVP_MD_CTX *                        mdctx;
    const EVP_MD *                      md;
    char *                              cksmptr;
    unsigned char                       md_value[EVP_MAX_MD_SIZE];
    char                                cksm_buff[CKSM_SIZE];
    unsigned int                        md_len;

    char                                buf[GASS_COPY_CKSM_BUFSIZE];

    int                                 i;
    int                                 fd;
    int                                 n;
    ssize_t                             readlen;
    globus_off_t                        count;
    globus_off_t                        read_left;
    struct stat                         statbuf;
    int                                 is_regular;

    rc = globus_url_parse_loose(url, &parsed_url);
    if(rc != 0)
    {
        result = globus_error_put(
            globus_error_construct_string(
                GLOBUS_GASS_COPY_MODULE,
                GLOBUS_NULL,
                "[%s]: error parsing url: "
                "globus_url_parse returned %d",
                myname,
                url));
        goto error_url;
    }

    if(parsed_url.url_path == GLOBUS_NULL)
    {
        result = globus_error_put(
            globus_error_construct_string(
                GLOBUS_GASS_COPY_MODULE,
                GLOBUS_NULL,
                "[%s]: error parsing url: "
                "url has no path",
                myname));
        goto error_fd;
    }    
       
    read_left = length;
    if(length >= 0)
    {
        count = (read_left > GASS_COPY_CKSM_BUFSIZE) ? 
            GASS_COPY_CKSM_BUFSIZE : read_left;
    }
    else
    {
        count = GASS_COPY_CKSM_BUFSIZE;
    }
    
    fd = open(parsed_url.url_path, O_RDONLY);        
    if(fd < 0)
    {
        result = globus_error_put(
            globus_error_construct_string(
                GLOBUS_GASS_COPY_MODULE,
                GLOBUS_NULL,
                "[%s]: error opening checksum file %s",
                myname,
                parsed_url.url_path));
        goto error_fd;
    }
    fstat(fd,&statbuf);
    is_regular = S_ISREG(statbuf.st_mode);    

    if (is_regular && lseek(fd, offset, SEEK_SET) == -1)
    {
        result = globus_error_put(
            globus_error_construct_string(
                GLOBUS_GASS_COPY_MODULE,
                GLOBUS_NULL,
                "[%s]: error (%d) seeking checksum file %s",
                myname,
                errno,
                parsed_url.url_path));
        goto error_seek;
    }

    OpenSSL_add_all_algorithms();
    md = EVP_get_digestbyname(algorithm);
    if (!md)
    {
        /*
         * try again with uppercase algorithm name.
         */
        char *                          p;
        char *                          alg = strdup(algorithm);
        for(p = alg; *p != '\0'; p++)
        {
            *p = toupper(*p);
        }
        md = EVP_get_digestbyname(alg);
        free(alg);
    }
    if (!md)
    {
        result = globus_error_put(globus_error_construct_string(
                GLOBUS_GASS_COPY_MODULE,
                GLOBUS_NULL, 
                "Unable to use checksum algorithm %s", 
                algorithm));
        goto error_seek;
    }

    mdctx = EVP_MD_CTX_create();
    EVP_DigestInit_ex(mdctx, md, NULL);

    if( is_regular )
    {
        while((n = (readlen = read(fd, buf, count))) > 0)
        {
            if(length >= 0)
            {
                read_left -= n;
                count = (read_left > GASS_COPY_CKSM_BUFSIZE) ? GASS_COPY_CKSM_BUFSIZE : read_left;
            }

            EVP_DigestUpdate(mdctx, buf, n);
        }
        if( readlen < 0 && count > 0 ){
            result = globus_error_put(globus_error_construct_string(
                    GLOBUS_GASS_COPY_MODULE,
                    GLOBUS_NULL,
                    "Read error in checksum calculation url=%s read_left =%d %d %s",
                    url,read_left,readlen, strerror(errno)));
            goto error_seek;
        }
    }

    EVP_DigestFinal_ex(mdctx, md_value, &md_len);
    EVP_MD_CTX_destroy(mdctx);
    
    close(fd);
        
    cksmptr = cksm_buff;
    for (i = 0; i < md_len; i++)
    {
       sprintf(cksmptr, "%02x", md_value[i]);
       cksmptr++;
       cksmptr++;
    }
    *cksmptr = '\0';
    
    strncpy(cksm, cksm_buff, sizeof(cksm_buff));
    //globus_libc_fprintf(stderr,"SPB file-checksum %s=%s\n",url,cksm); 
    globus_url_destroy(&parsed_url);
    
    if(callback)
    {
        callback(callback_arg, handle, NULL);
    }
    return GLOBUS_SUCCESS;

error_seek:
    close(fd);
error_fd:
    globus_url_destroy(&parsed_url);

error_url:

    return result;
}

But it probably needs a bit of tidying up.

@fscheiner
Copy link
Member

fscheiner commented Jul 27, 2023

@spbooth
Thanks for bringing this issue to our attention. I took the liberty and reformated your post a little. As patch it would boil down to:

--- <unnamed>
+++ <unnamed>
@@ -28,8 +28,11 @@
     int                                 i;
     int                                 fd;
     int                                 n;
+    ssize_t                             readlen;
     globus_off_t                        count;
     globus_off_t                        read_left;
+    struct stat                         statbuf;
+    int                                 is_regular;
 
     rc = globus_url_parse_loose(url, &parsed_url);
     if(rc != 0)
@@ -80,8 +83,10 @@
                 parsed_url.url_path));
         goto error_fd;
     }
+    fstat(fd,&statbuf);
+    is_regular = S_ISREG(statbuf.st_mode);    
 
-    if (lseek(fd, offset, SEEK_SET) == -1)
+    if (is_regular && lseek(fd, offset, SEEK_SET) == -1)
     {
         result = globus_error_put(
             globus_error_construct_string(
@@ -123,15 +128,26 @@
     mdctx = EVP_MD_CTX_create();
     EVP_DigestInit_ex(mdctx, md, NULL);
 
-    while((n = read(fd, buf, count)) > 0)
+    if( is_regular )
     {
-        if(length >= 0)
+        while((n = (readlen = read(fd, buf, count))) > 0)
         {
-            read_left -= n;
-            count = (read_left > GASS_COPY_CKSM_BUFSIZE) ? GASS_COPY_CKSM_BUFSIZE : read_left;
+            if(length >= 0)
+            {
+                read_left -= n;
+                count = (read_left > GASS_COPY_CKSM_BUFSIZE) ? GASS_COPY_CKSM_BUFSIZE : read_left;
+            }
+
+            EVP_DigestUpdate(mdctx, buf, n);
         }
-
-        EVP_DigestUpdate(mdctx, buf, n);
+        if( readlen < 0 && count > 0 ){
+            result = globus_error_put(globus_error_construct_string(
+                    GLOBUS_GASS_COPY_MODULE,
+                    GLOBUS_NULL,
+                    "Read error in checksum calculation url=%s read_left =%d %d %s",
+                    url,read_left,readlen, strerror(errno)));
+            goto error_seek;
+        }
     }
 
     EVP_DigestFinal_ex(mdctx, md_value, &md_len);
@@ -149,7 +165,7 @@
     *cksmptr = '\0';
     
     strncpy(cksm, cksm_buff, sizeof(cksm_buff));
-    
+    //globus_libc_fprintf(stderr,"SPB file-checksum %s=%s\n",url,cksm); 
     globus_url_destroy(&parsed_url);
     
     if(callback)

Would you maybe like to also provide a pull request for this change so we can run it through our CI and do a complete review?


And BTW, is this part of the "sync" functionality of globus-url-copy or can this be accessed independently?

@spbooth
Copy link
Contributor Author

spbooth commented Jul 28, 2023

Its definately part of -verify-checksum. It might also be part of -sync

I'll make a pull request soon. Might be worth re-writing the checksum to use buffered io as well it might speed up the checksum phase and would be more consistent with the rest of the file.

@fscheiner
Copy link
Member

Its definately part of -verify-checksum. It might also be part of -sync

Looks like our documentation over at https://gridcf.org/gct-docs/ is incomplete in this regard, at least the following two globus-url-copy options are missing:

[...]
-checksum-alg <checksum algorithm>
       Set the algorithm type to use for all checksum operations during the
       transfer.  The default algorithm is MD5.
-verify-checksum
       Specify to perform a checksum on the source and destination after each
       file transfer and compare the two.  If they do not match, fail the
       transfer.

I'll make a pull request soon.

Looking forward to that.

Might be worth re-writing the checksum to use buffered io as well it might speed up the checksum phase and would be more consistent with the rest of the file.

Might be a good idea, though at least for non-parallel MD5 I recently only got up to 483 MiB/s for a file on a Lustre FS. This is already limiting for 10 Gbps. What they developed on https://dzone.com/articles/parallelizing-md5-checksum-computation-to-speed-up looks more promising.

@spbooth
Copy link
Contributor Author

spbooth commented Jul 31, 2023

Ive submitted a pull request. In the end I stayes with unbuffered io as the checksum buffer is pretty big anyway but I switched to the gass stat function to keep the code portable

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants