
The report-sproxyd-keys:basic does not handle timeouts in requests properly #7

Open
ghost opened this issue Dec 27, 2021 · 1 comment


ghost commented Dec 27, 2021

In report_sproxyd.py, the HTTP request issued by listbucket() has a 30-second timeout. When that timeout fires because bucketd does not respond, listbucket() does not handle it: there is no 500 or 404 error response, so the exception escapes the retry loop entirely. The script then appears to move on to the next bucket, with no handling to ensure that each bucket gets listed completely.

Error:

File "/home/s3/report_sproxyd.py", line 176 in run
session, key, versionid)
File "/home/s3/report_sproxyd.py", line 94, in listbucket
r = session.get(url, timeout=30, verify=False)

<SNIP>
urllib3.exceptions.ReadTimeoutError: HTTPConnectionPool )hots='127.0.0.1', port=9000) : Read timed out. (read timeout=30)

The block for line 176 is:

    def run(self):
        """main function to be passed to the Threading class"""
        total_size = 0                          
        files = 0               
        payload = True
        key = ""      
        versionid = ""
        while payload:                         
            while 1:
                session = requests.Session()
                session.headers.update({"x-scal-request-uids":"utapi-reindex-buckets"})
                error, payload = self.listbucket(
                    session, key, versionid)
                if error == 500:                 
                    time.sleep(15)                                    
                else:
                    break
            if error == 404:
                break
            key, skeys, versionid = self.retkeys(payload)
            self.skeyg += skeys   
        return(self.userid, self.bucket, self.skeyg)
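Note that the retry loop above only retries on error == 500; a read timeout raises before error is ever assigned. As a hedged sketch (not the script's actual code), the loop could treat a timeout sentinel as retryable. Here listbucket is assumed to return (None, "") on timeout, which is a hypothetical convention, and the sleep function is injectable for illustration:

```python
import time


def run_with_retry(listbucket, key="", versionid="",
                   max_retries=5, sleep=time.sleep):
    """Sketch: retry listbucket() until it returns a definitive answer.

    `listbucket` is assumed to return (error, payload), where error is
    None on timeout (hypothetical) or an HTTP status code otherwise.
    """
    retries = 0
    while True:
        error, payload = listbucket(key, versionid)
        if error == 500 or error is None:
            # Server error or timeout: back off and retry, bounded so a
            # permanently unresponsive bucketd fails loudly instead of
            # silently skipping the bucket.
            retries += 1
            if retries > max_retries:
                raise RuntimeError("bucketd did not respond after retries")
            sleep(15)
        else:
            return error, payload
```

Bounding the retries matters: an unbounded loop would hang the reporting thread, while the current code's silent fall-through is what produces an incomplete keys.txt.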

The block for line 94 is:

    def listbucket(self, session=None, marker="", versionmarker=""):
        """function to list the contains of the buckets"""
        m = marker.encode('utf8')              
        mark = urllib.parse.quote(m)
        params = "%s?listingType=Basic&maxKeys=1000&gt=%s" % (
            self.bucket, mark)                                                         
        url = "%s/default/bucket/%s" % (self._bucketd, params)
        r = session.get(url, timeout=30, verify=False)
        if r.status_code == 200:                 
            r.encoding = 'utf-8'                                      
            payload = json.loads(r.text)
            return (r.status_code, payload)
        else:               
            return (r.status_code, "")
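Because the session.get() call above has no exception handling, the ReadTimeoutError propagates straight up. A minimal sketch of a timeout-aware variant (names and the (None, "") return convention are illustrative, not the script's API) would wrap the request so the caller can distinguish "retry" from a real HTTP error:

```python
import json
import urllib.parse

import requests


def listbucket_safe(session, bucketd, bucket, marker="", timeout=30):
    """Sketch of a timeout-aware listbucket (names are illustrative).

    Returns (status, payload); status is None when the request failed at
    the transport level, so the caller can retry instead of crashing.
    """
    mark = urllib.parse.quote(marker.encode("utf8"))
    params = "%s?listingType=Basic&maxKeys=1000&gt=%s" % (bucket, mark)
    url = "%s/default/bucket/%s" % (bucketd, params)
    try:
        r = session.get(url, timeout=timeout, verify=False)
    except requests.exceptions.RequestException:
        # Covers requests.exceptions.Timeout (which wraps urllib3's
        # ReadTimeoutError) as well as connection resets: signal a
        # retryable failure rather than letting the thread die.
        return (None, "")
    if r.status_code == 200:
        r.encoding = "utf-8"
        return (r.status_code, json.loads(r.text))
    return (r.status_code, "")
```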

The issue can be observed by comparing the content of the keys.txt output file and the total number of buckets in s3api:

# awk -F';' '{print $1}' clean_np_keys.txt| sort -u | wc -l
56
# listbuckets=$(aws --endpoint-url=https://obs-are.allstate.com s3api list-buckets --query "Buckets[].Name" | sort -u | wc -l) ; echo $(( ${listbuckets} - 2 ))
83

s3api reports 83 buckets, while the report-sproxyd-keys:basic output contains only 56.
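Beyond counting, the gap can be pinpointed per bucket. A minimal Python sketch, assuming the file layout implied by the awk command above (semicolon-separated lines with the bucket name in the first field):

```python
def missing_buckets(keys_file_lines, s3api_bucket_names):
    """Return buckets known to s3api but absent from the keys.txt output.

    keys_file_lines: iterable of 'bucket;key;...' lines, as in
    clean_np_keys.txt (assumed layout).
    s3api_bucket_names: bucket names from `aws s3api list-buckets`.
    """
    listed = {line.split(";", 1)[0]
              for line in keys_file_lines if line.strip()}
    return sorted(set(s3api_bucket_names) - listed)
```

Any bucket this returns is one whose listing was skipped or cut short, whether by the empty-bucket case or by a bucketd timeout.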

  • What appears to trigger the issue is empty S3 buckets: listing the keys of an empty bucket appears to fail.
  • Any bucket that is not empty but hits the 30-second bucketd timeout would lead to:
    • The listkeys.csv data containing all keys from all nodes, while
    • the report-sproxyd-keys:basic keys.txt data is missing the keys of every bucket that times out during listbucket().
    • Resulting in:
      • The P0/P1 scripts performing a left anti join that finds objects present in listkeys.csv but with no associated key in bucketd. The P4 script would then consider every key from a bucket where the bucketd timeout occurred an S3 orphan and attempt to delete it.
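The danger described above can be illustrated with plain sets (this is a sketch of the left anti join idea, not the actual P0/P1/P4 code):

```python
def false_orphans(listkeys, bucketd_keys_by_bucket, timed_out_buckets):
    """Illustrate why a bucketd timeout creates false S3 'orphans'.

    listkeys: set of (bucket, key) pairs from listkeys.csv (all nodes).
    bucketd_keys_by_bucket: dict bucket -> set of keys bucketd listed.
    timed_out_buckets: buckets whose listbucket() timed out.
    """
    # Left anti join: keep listkeys entries with no match in bucketd.
    orphans = {(b, k) for (b, k) in listkeys
               if k not in bucketd_keys_by_bucket.get(b, set())}
    # Every key of a timed-out bucket lands in the orphan set and would
    # be scheduled for deletion by a P4-style cleanup.
    at_risk = {(b, k) for (b, k) in orphans if b in timed_out_buckets}
    return orphans, at_risk
```

With one healthy bucket and one timed-out bucket, every key of the timed-out bucket shows up as an "orphan" even though it still exists in S3.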

ghost commented Dec 27, 2021

@apcurrier Please provide any details you think are pertinent or corrections if I incorrectly stated anything.

@patrickdos Do you confirm? Or am I missing how report_sproxyd.py would work around this "if" a timeout happens for a bucket that is not empty?
