[Merge] popcorn_migrate crashes when duplicate ips in the nodes file #94

Open
AHatnarf opened this issue Dec 17, 2019 · 0 comments

Steps to reproduce: /etc/popcorn/nodes contains duplicate IP addresses, e.g.:

192.168.122.2
127.0.0.1
192.168.122.2
127.0.0.1

Inserting the module then produces:

popcorn@x86:~$ sudo insmod msg_socket.ko
[  T723] Loading Popcorn messaging layer over TCP/IP...
[  T723] popcorn: Loading node configuration...
[  T723] popcorn:   0: 192.168.122.2
[  T723] popcorn: * 1: 127.0.0.1
[  T723] popcorn:   2: 192.168.122.2
[  T723] popcorn: * 3: 127.0.0.1
[  T723] Replace hot transport at your own risk.
[  T723] Ready to accept incoming connections
[  T723] Connecting to 0 at 192.168.122.2
[  T725] SEND handler for 0 is ready
[  T723] Connecting to 1 at 127.0.0.1
[  T726] RECV handler for 0 is ready
[  T727] SEND handler for 1 is ready
[  T723] Connecting to 2 at 192.168.122.2
[  T728] RECV handler for 1 is ready
[  T729] SEND handler for 2 is ready
[  T730] RECV handler for 2 is ready
[  T723] popcorn: Ready on TCP/IP
popcorn@x86:~$ sudo insmod msg_socket.ko 

Migration fails when running the basic test, with the following trace:

popcorn@x86:~/src$ ./basic
[  T745] ####### MIGRATE [745] to 1
[  T745] ------------[ cut here ]------------
[  T745] kernel BUG at ./include/popcorn/regset.h:94!
[  T745] invalid opcode: 0000 [#2] SMP NOPTI
[  T745] CPU: 0 PID: 745 Comm: basic Tainted: G      D W  O      5.2.0-rc4-popcorn+ #44
[  T745] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[  T745] RIP: 0010:process_server_do_migration+0x501/0x7f0
[  T745] Code: 00 e9 9b fd ff ff 0f 0b e9 94 fd ff ff 48 89 ef e8 d4 11 00 00 e9 87 fd ff ff 85 c0 41 89 c5 0f 84 2e fd ff ff e9 f8 fc ff ff <0f> 0b 0f 0b 8b b3 c8 03 00 00 8b 4d 0a 89 c2 48 c7 c7 c0 2e b7 81
[  T745] RSP: 0018:ffffc90000257e50 EFLAGS: 00000282
[  T745] RAX: 00000000ffffffff RBX: ffff888058471000 RCX: ffff88805847127e
[  T745] RDX: 0000000000000002 RSI: 0000000000000001 RDI: 0000000000000001
[  T745] RBP: ffff88805a1432c0 R08: 0000000000000000 R09: 0000000000000000
[  T745] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88805a558300
Travel for 118 ms
[  T745] R13: 0000000000000001 R14: 00007fffffffe900 R15: 0000000000000000
[  T745] FS:  00000000004ab880(0000) GS:ffff88805ba00000(0000) knlGS:0000000000000000
[  T745] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  T745] CR2: 00000000004ada38 CR3: 000000005861e000 CR4: 00000000000006f0
[  T745] Call Trace:
[  T745]  __se_sys_popcorn_migrate+0x75/0x130
[  T745]  __x64_sys_popcorn_migrate+0x16/0x20
[  T745]  do_syscall_64+0x69/0x440
[  T745]  ? trace_hardirqs_off_thunk+0x1a/0x1c
[  T745]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  T745] RIP: 0033:0x401e98
[  T745] Code: 44 0f 7f bd 40 ff ff ff 48 c7 85 80 fd ff ff 98 1e 40 00 48 8d 95 80 fd ff ff 8b bd 6c fd ff ff 48 89 d6 b8 b2 01 00 00 0f 05 <0f> ae f0 48 8b ad b8 fd ff ff 48 8b 9d a0 fd ff ff 4c 8b a5 e8 fd
[  T745] RSP: 002b:00007fffffffe8d0 EFLAGS: 00000246 ORIG_RAX: 00000000000001b2
[  T745] RAX: ffffffffffffffda RBX: 0000000000000076 RCX: 0000000000401e98
[  T745] RDX: 00007fffffffe900 RSI: 00007fffffffe900 RDI: 0000000000000001
[  T745] RBP: 00007fffffffeb80 R08: 0000000000000000 R09: 0000000000000012
[  T745] R10: 00007fffffffea65 R11: 0000000000000246 R12: 0000000000402f60
[  T745] R13: 0000000000000000 R14: 00000000004a7018 R15: 0000000000000000
[  T745] Modules linked in: msg_socket(O) [last unloaded: msg_socket]
[  T745] ---[ end trace f952d52dca275b48 ]---
[  T745] RIP: 0010:sock_recvmsg+0xf/0x30
[  T745] Code: c8 ff e9 24 ff ff ff e8 af 75 c3 ff 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 48 8b 47 28 4c 8b 46 20 89 d1 <48> 8b 80 80 00 00 00 4c 89 c2 48 3d 70 d4 4d 81 75 05 e9 fa 33 0b
[  T745] RSP: 0018:ffffc90000147e20 EFLAGS: 00000282
[  T745] RAX: 0000000000000000 RBX: ffff88805a39c4c0 RCX: 0000000000000100
[  T745] RDX: 0000000000000100 RSI: ffffc90000147e68 RDI: ffff8880579038c0
[  T745] RBP: ffffc90000147e68 R08: 0000000000000072 R09: 0000000000000100
[  T745] R10: ffff8880579038c0 R11: 0000000000000000 R12: ffffffffffffffff
[  T745] R13: ffff8880579038c0 R14: 0000000000000100 R15: ffffffffa00002c0
[  T745] FS:  00000000004ab880(0000) GS:ffff88805ba00000(0000) knlGS:0000000000000000
[  T745] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  T745] CR2: 00000000004ada38 CR3: 000000005861e000 CR4: 00000000000006f0
[  T745] EXITED [745] local / 0xb

We likely need to make sure that duplicate entries from the nodes file are removed in the msg_layer before any connections are set up.
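
As a rough illustration of the deduplication idea, here is a minimal user-space sketch in C. It is not taken from the Popcorn sources: the function name `dedup_nodes`, the `ADDR_LEN` constant, and the fixed-size string array are all assumptions made for the example, and the real msg_layer may store node addresses differently (for instance as parsed integers rather than strings).

```c
#include <stddef.h>
#include <string.h>

#define ADDR_LEN 16 /* assumed: room for a dotted-quad IPv4 string plus NUL */

/*
 * Hypothetical helper: remove duplicate addresses from nodes[0..count)
 * in place, keeping the first occurrence of each address and preserving
 * the original order. Returns the number of unique entries, which end up
 * at the front of the array.
 */
static size_t dedup_nodes(char nodes[][ADDR_LEN], size_t count)
{
    size_t unique = 0;

    for (size_t i = 0; i < count; i++) {
        int seen = 0;

        /* Compare against the entries that were already accepted. */
        for (size_t j = 0; j < unique; j++) {
            if (strcmp(nodes[j], nodes[i]) == 0) {
                seen = 1;
                break;
            }
        }

        if (!seen) {
            /* Compact the array; skip the copy when already in place. */
            if (unique != i)
                memcpy(nodes[unique], nodes[i], ADDR_LEN);
            unique++;
        }
    }

    return unique;
}
```

With the four entries from the nodes file above, this would leave two entries (192.168.122.2 and 127.0.0.1). The comparison is purely textual, so two different spellings of the same address would not be caught; comparing parsed addresses would be more robust. Rejecting the configuration with an error instead of silently dropping entries is another option.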

@jnarf added the bug label Dec 18, 2019