hung on unmount operation #88
Forcibly killing the ... and this appears to confuse the system, which now has two devices mounted to the same point in the filesystem:
The second volume was created with these specs (it should have been a new volume, since it was not preexisting):
rexray config:
@jdef I believe the message at the bottom of the log shows that something is still using the device that is being detached. Is there any chance the device and/or mount path is being accessed by another process on the system? Can you capture the results of lsof?
@clintonskitson lsof results:
@jdef Can you perform something like ...
Comes back empty.
@clintonskitson I'm going to try with an explicit mount under ...
If it is empty, can you try an unmount of the path?
Appears to succeed:
core@ip-10-0-0-142 ~ $ sudo umount /tmp/test-rexray-volume
Can you replicate the problem where it hung before? It would have hung because there were open files. Technically, since the container that had the volume mounted and was using the files is done, there should not be any files open. I am curious to see if you could do a ...
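For reference, generic commands like these show whether anything on the host still holds files open under the mount path (the path is the one used elsewhere in this thread; these are common options, not necessarily the exact command being requested):

# List open files under the mount point (requires lsof to be installed on the host)
sudo lsof +D /tmp/test-rexray-volume

# Show processes using the filesystem that contains the path (-m), verbosely (-v)
sudo fuser -vm /tmp/test-rexray-volume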
Well, I just tried to launch a new task with a new volume instance, but rexray is hanging on trying to mount it. This is the same slave where I just manually ran the ...
The error I produced today was on a completely new cluster vs. the error I first reported, so it can be reproduced. The containers I'm spinning up aren't touching any files in the volume, so I'm not sure what would be holding a handle to any files on it.
@jdef Waiting for volume attachment to complete looks to be an EC2-based problem and might align with a bug they have in their Ubuntu OS image with the EBS driver. Can you reboot the instances and try again?
FWIW this is CoreOS, not Ubuntu. I've rebooted the slave. Once rebooted, the task launched just fine (...).
I killed the task, and ...
To be clear, once ...
What is the CoreOS rev or AMI? I will try to reproduce.
Looks like this AMI: CoreOS-stable-766.5.0-hvm
Can you review the devices that show up under /dev/disk/by-id? Curious if something is claiming block devices as they appear.
core@ip-10-0-0-142 ~ $ ls -laF /dev/disk/by-uuid/
OK, looks normal, thanks.
So, it actually doesn't block indefinitely... the job I tried to start about 6 hours ago finally started about 20 minutes ago. The slave logs have this to say:
Immediately after this log output, my (formerly) blocked non-volume-using task was finally launched (actually it experienced a TASK_FAILED because it had started so long ago and Marathon had since killed it, but it finally caught up to the world; Marathon was then able to launch it again and that worked).
rexray had something to say around the same time:
@jdef With a fresh system, can you try and leverage the ...? Please replace ...
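Assuming the truncated suggestion above refers to exercising the volume lifecycle by hand with dvdcli (the CLI this isolator drives), a manual pass would look roughly like the following; the volume name is reused from the Marathon spec below purely for illustration, and the flags should be checked against your dvdcli build:

# Mount the external volume directly through the Docker Volume Driver CLI,
# bypassing Mesos/Marathon entirely
sudo dvdcli mount --volumedriver=rexray --volumename=jdef-test-vol13115

# ... touch a file under the returned mount path, then try the reverse
sudo dvdcli unmount --volumedriver=rexray --volumename=jdef-test-vol13115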
@jdef Can you also paste in the Marathon application definition that you are using?
Apparently we're running a forked version of rexray that tries to use IAM credentials instead of credentials coded into the configuration file. Upon further review, it seems likely that the library code responsible for EC2 auth is failing to refresh credentials upon token expiration. That said, it's not clear to me that the credentials issue is directly related to hanging unmount commands, especially when I terminate a task so close to when it was created -- given how easy it is to reproduce the issue, I'm betting that the credentials have not yet expired. The Marathon spec I'm using depends on a development branch of Marathon that's adding higher-level support for external volumes (so this is subject to change prior to being merged into master):
{
"id": "hello",
"instances": 1,
"cpus": 0.1,
"mem": 32,
"cmd": "/usr/bin/tail -f /dev/null",
"container": {
"type": "MESOS",
"volumes": [
{
"containerPath": "/tmp/test-rexray-volume",
"persistent": {
"size": 100,
"name": "jdef-test-vol13115",
"provider": "external",
"options": { "external/driver": "rexray" }
},
"mode": "RW"
}
]
},
"upgradeStrategy": {
"minimumHealthCapacity": 0,
"maximumOverCapacity": 0
}
}
To test the mount/unmount commands manually, I'll need to spin up a fresh cluster and execute the commands before the initially detected IAM creds expire.
Hm... the IAM problems have been resolved and the hung unmount problem persists:
Using the filesystem/linux isolator in conjunction with this isolator fixes the problem. Without the filesystem/linux isolator, the host mount namespace is contaminated with the mounts added to the container. The core team says the host's /tmp was being polluted by the container without the filesystem/linux isolator.
I've left this issue open because this dependency on the filesystem/linux isolator should be documented.
@jdef Doing some more testing on our side. To me, it doesn't make sense that this would be required.
@jdef So the troubling thing here is that I believe I simulated the relevant options by supplying ... So the question I am moving towards is whether there is some type of race condition introduced in a newer rev of Mesos. The detach failure comes from things not being cleaned up appropriately. If logic changed on the cleanup side in 0.28+, it may be interfering with our ability to properly remove mounts so that the detach process can complete.
One of the symptoms is that when a container path mount is created under /tmp (in our test system) within the container mount namespace, it propagates back to the host's mount namespace. The container mountns is destroyed upon container teardown, but the mount still exists in the host mountns. rexray then tries to detach the volume but can't. Our core team tells me this is because, without the filesystem/linux isolator, /tmp exists in a shared mountns. They also tell me that there is no other valid workaround for this problem -- a container mountns must be properly isolated to avoid mounts propagating back to the host mountns. @clintonskitson are you saying that the DVD isolator already prevents such leakage? /cc @jieyu
@cantbewong @jdef Having a mount path leak from a container to the host's mount namespace doesn't sound right. Is there any online documentation you can point us to that discusses this? Can you try this on an Ubuntu 14.04-based host?
Ubuntu 14.04 does not have this issue because, by default, all mounts are private. CentOS 7/Fedora 23/CoreOS will have this issue because, by default, all mounts are shared.
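For anyone checking their own hosts, the propagation mode of a mount can be inspected and forced from a shell; this is a generic illustration rather than a workaround endorsed in this thread:

# Show the propagation flags of the root mount and of the mount containing /tmp
findmnt -o TARGET,PROPAGATION --target /
findmnt -o TARGET,PROPAGATION --target /tmp

# Recursively mark a subtree private so bind mounts beneath it stop propagating
# between namespaces (the path must itself be a mount point; requires root)
sudo mount --make-rprivate /tmp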
@cantbewong Is it possible that when we leverage the filesystem isolator code from Mesos we aren't invoking it as a private mount? I could see this being the problem if ...
Assuming the mount being discussed is the bind mount used to isolate a container: the mount command is queued as follows ... No effort is made to specify private vs. shared, so if Red Hat and Ubuntu differ as to the default, the private vs. shared state of the bind mount will differ. I found documentation that indicates the Red Hat default is private. The Fedora documentation refers to the kernel documentation, which also indicates a default of private. If this is the root cause of this issue, an explicit call to declare the mount private would be useful. Making this call should be harmless, other than adding the time it takes to execute, so I will put it in the next version of the isolator.
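A rough shell-level sketch of what that explicit call could look like (the backing-directory path is a placeholder, not the isolator's actual mount source):

# Bind-mount the volume's backing directory onto the container path
sudo mount --bind /path/to/volume/backing /tmp/test-rexray-volume

# Explicitly mark the bind mount private so it neither receives nor propagates
# mount events on hosts where mounts default to shared
sudo mount --make-private /tmp/test-rexray-volume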
I think @jieyu expressed concerns that not all versions of ...
@jdef Can you provide a suggestion for a supported OS list?
@jdef @cantbewong The problem regarding the make-rslave|make-rprivate issue is described in this Mesos ticket. It does not work properly on Ubuntu 14.04.
The Docker solution here is within a mount package they have that calls the ...
Same issue:
Apr 15 00:11:57 10.1.21.50 rexray[1747]: time="2017-04-15T00:11:57Z" level=info msg="waiting for volume detachment to complete" driverName=ec2 force=false moduleName=default-docker runAsync=false volumeID=vol-xxxxx
docker ps etc. won't respond.
I created a task in Marathon, watched it come up, let it run for a few min, then suspended the app (sets instances to zero). I ended up with a hung unmount op on the slave the task was running on. The task appears as KILLED in mesos. I have a single slave node cluster. Now I can't launch a new task trying to use a (different) rexray vol. Attempting to do so results in a task stuck in STAGING. So this hung unmount op is blocking somewhere.
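A couple of generic ways to inspect the host's mount table for the stale entry (the mount path is the one used in this thread):

# Look for leftover or duplicate entries for the task's volume mount point
grep test-rexray-volume /proc/mounts

# Same information, plus propagation flags, via findmnt
findmnt -o TARGET,SOURCE,PROPAGATION --target /tmp/test-rexray-volume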
logs:
/proc/mounts: