[pve-devel] Kernel 4.2.8-1 problems

Paul Penev ppquant at gmail.com
Fri Mar 18 09:05:39 CET 2016


For a day I thought it was hardware related, but tonight the problem
reappeared. This time, however, it was not inside NFS but somewhere
around FUSE. The only FUSE filesystems I have mounted are
lxcfs-related.
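
For reference, the FUSE mounts can be listed straight from /proc; on this
host that shows only lxcfs entries:

    grep fuse /proc/mounts    # lists all FUSE-backed mounts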

I'm attaching the logs below. I tried attaching gdb to the relevant
processes, but gdb itself hung at the "attaching" stage, so I could not
get any more insight that way.
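
That is expected here: gdb hangs because a task in uninterruptible sleep
(state D) cannot be stopped for ptrace. The kernel-side stack can usually
still be read from procfs as root; the PID below is taken from the
hung-task messages further down:

    grep State /proc/10920/status    # shows "D (disk sleep)" for a hung task
    cat /proc/10920/stack            # kernel stack without attaching a debugger
    echo w > /proc/sysrq-trigger     # dump all blocked (D state) tasks to dmesg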

Look at the end, where a cgroup was OOM-killed because of a low-memory
situation. This was most likely inside an LXC container's memory cgroup,
because the host itself still had 5GB of free memory left (checked in
/proc/meminfo, not the Proxmox GUI).
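
To confirm that the kill was scoped to a container's memory cgroup rather
than the host, the cgroup counters can be checked directly; the CT ID is
a placeholder and the exact cgroup path may differ:

    cat /sys/fs/cgroup/memory/lxc/<CTID>/memory.limit_in_bytes
    cat /sys/fs/cgroup/memory/lxc/<CTID>/memory.usage_in_bytes
    cat /sys/fs/cgroup/memory/lxc/<CTID>/memory.failcnt    # times the limit was hit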

Mar 18 07:18:26 v14 kernel: [280550.207754]  [<ffffffff810bdd30>] ?
wait_woken+0x90/0x90
Mar 18 07:18:26 v14 kernel: [280550.207762]  [<ffffffff81306374>]
__fuse_direct_read+0x44/0x60
Mar 18 07:18:26 v14 kernel: [280550.207769]  [<ffffffff811fd91a>]
vfs_read+0x8a/0x130
Mar 18 07:18:26 v14 kernel: [280550.207801]       Tainted: P
O    4.2.8-1-pve #1
Mar 18 07:18:26 v14 kernel: [280550.207874]  0000000000000246
ffff8803a69a0000 ffff8803a699fc38 ffff88059f4ceb00
Mar 18 07:18:26 v14 kernel: [280550.207882]  [<ffffffff812fb863>]
request_wait_answer+0x163/0x280
Mar 18 07:18:26 v14 kernel: [280550.207889]  [<ffffffff81306128>]
fuse_direct_io+0x3a8/0x5b0
Mar 18 07:18:26 v14 kernel: [280550.207896]  [<ffffffff811fd2c6>]
__vfs_read+0x26/0x40
Mar 18 07:18:26 v14 kernel: [280550.207904] INFO: task vmstat:10920
blocked for more than 120 seconds.
Mar 18 07:18:26 v14 kernel: [280550.208003]  ffff880340fffbe8
0000000000000082 ffff8805a5520000 ffff88049ad71900
Mar 18 07:18:26 v14 kernel: [280550.208008] Call Trace:
Mar 18 07:18:26 v14 kernel: [280550.208014]  [<ffffffff810bdd30>] ?
wait_woken+0x90/0x90
Mar 18 07:18:26 v14 kernel: [280550.208019]  [<ffffffff81306128>]
fuse_direct_io+0x3a8/0x5b0
Mar 18 07:18:26 v14 kernel: [280550.208026]  [<ffffffff811fd2c6>]
__vfs_read+0x26/0x40
Mar 18 07:20:26 v14 kernel: [280670.306033] INFO: task vmstat:10164
blocked for more than 120 seconds.
Mar 18 07:20:26 v14 kernel: [280670.306066]       Tainted: P
O    4.2.8-1-pve #1
Mar 18 07:20:26 v14 kernel: [280670.306092] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 18 07:20:26 v14 kernel: [280670.306140]  ffff8804457ebbe8
0000000000000086 ffff8805a5b6a580 ffff88037b53cb00
Mar 18 07:20:26 v14 kernel: [280670.306145]  ffff8805a3d73040
fffffffffffffe00 ffff8804457ebc08 ffffffff81806967
Mar 18 07:20:26 v14 kernel: [280670.306154]  [<ffffffff81806967>]
schedule+0x37/0x80
Mar 18 07:20:26 v14 kernel: [280670.306163]  [<ffffffff810bdd30>] ?
wait_woken+0x90/0x90
Mar 18 07:20:26 v14 kernel: [280670.306167]  [<ffffffff812fba47>]
fuse_request_send+0x27/0x30
Mar 18 07:20:26 v14 kernel: [280670.306172]  [<ffffffff81306374>]
__fuse_direct_read+0x44/0x60
Mar 18 07:20:26 v14 kernel: [280670.306177]  [<ffffffff811fd264>]
new_sync_read+0x94/0xd0
Mar 18 07:20:26 v14 kernel: [280670.306180]  [<ffffffff811fd91a>]
vfs_read+0x8a/0x130
Mar 18 07:20:26 v14 kernel: [280670.306186]  [<ffffffff8180aaf2>]
entry_SYSCALL_64_fastpath+0x16/0x75
Mar 18 07:20:26 v14 kernel: [280670.306216]       Tainted: P
O    4.2.8-1-pve #1
Mar 18 07:20:26 v14 kernel: [280670.306284] pyzor           D
ffff8805a90d6a00     0 10198   1378 0x00000104
Mar 18 07:20:26 v14 kernel: [280670.306289]  0000000000000246
ffff8801e331c000 ffff8801e331bc38 ffff88056f79b910
Mar 18 07:20:26 v14 kernel: [280670.306292] Call Trace:
Mar 18 07:20:26 v14 kernel: [280670.306298]  [<ffffffff810bdd30>] ?
wait_woken+0x90/0x90
Mar 18 07:20:26 v14 kernel: [280670.306303]  [<ffffffff81306128>]
fuse_direct_io+0x3a8/0x5b0
Mar 18 07:20:26 v14 kernel: [280670.306309]  [<ffffffff811fd264>]
new_sync_read+0x94/0xd0
Mar 18 07:20:26 v14 kernel: [280670.306313]  [<ffffffff811fe7e5>]
SyS_read+0x55/0xc0
Mar 18 07:20:26 v14 kernel: [280670.306344]       Tainted: P
O    4.2.8-1-pve #1
Mar 18 07:20:26 v14 kernel: [280670.306416]  ffff8803a699fbe8
0000000000000082 ffff8805a5b69900 ffff8804bb7b0000
Mar 18 07:20:26 v14 kernel: [280670.306421] Call Trace:
Mar 18 07:20:26 v14 kernel: [280670.306426]  [<ffffffff810bdd30>] ?
wait_woken+0x90/0x90
Mar 18 07:20:26 v14 kernel: [280670.306432]  [<ffffffff81306128>]
fuse_direct_io+0x3a8/0x5b0
Mar 18 07:20:26 v14 kernel: [280670.306437]  [<ffffffff811fd264>]
new_sync_read+0x94/0xd0
Mar 18 07:20:26 v14 kernel: [280670.306442]  [<ffffffff811fe7e5>]
SyS_read+0x55/0xc0
Mar 18 07:20:26 v14 kernel: [280670.306475]       Tainted: P
O    4.2.8-1-pve #1
Mar 18 07:20:26 v14 kernel: [280670.306545]  ffff880340fffbe8
0000000000000082 ffff8805a5520000 ffff88049ad71900
Mar 18 07:20:26 v14 kernel: [280670.306551] Call Trace:
Mar 18 07:20:26 v14 kernel: [280670.306556]  [<ffffffff810bdd30>] ?
wait_woken+0x90/0x90
Mar 18 07:20:26 v14 kernel: [280670.306562]  [<ffffffff81306128>]
fuse_direct_io+0x3a8/0x5b0
Mar 18 07:20:26 v14 kernel: [280670.306567]  [<ffffffff811fd264>]
new_sync_read+0x94/0xd0
Mar 18 07:20:26 v14 kernel: [280670.306572]  [<ffffffff811fe7e5>]
SyS_read+0x55/0xc0
Mar 18 07:20:26 v14 kernel: [280670.306606]       Tainted: P
O    4.2.8-1-pve #1
Mar 18 07:20:26 v14 kernel: [280670.306677]  ffff8801e55a3be8
0000000000000086 ffff8805a5b6a580 ffff8803f3637080
Mar 18 07:20:26 v14 kernel: [280670.306682] Call Trace:
Mar 18 07:20:26 v14 kernel: [280670.306688]  [<ffffffff810bdd30>] ?
wait_woken+0x90/0x90
Mar 18 07:20:26 v14 kernel: [280670.306693]  [<ffffffff81306128>]
fuse_direct_io+0x3a8/0x5b0
Mar 18 07:20:26 v14 kernel: [280670.306699]  [<ffffffff811fd264>]
new_sync_read+0x94/0xd0
Mar 18 07:20:26 v14 kernel: [280670.306704]  [<ffffffff811fe7e5>]
SyS_read+0x55/0xc0
Mar 18 07:20:26 v14 kernel: [280670.306736]       Tainted: P
O    4.2.8-1-pve #1
Mar 18 07:20:26 v14 kernel: [280670.306806]  ffff8801ebfa3be8
0000000000000086 ffff8805a5b69900 ffff880335640c80
Mar 18 07:20:26 v14 kernel: [280670.306812] Call Trace:
Mar 18 07:20:26 v14 kernel: [280670.306817]  [<ffffffff810bdd30>] ?
wait_woken+0x90/0x90
Mar 18 07:20:26 v14 kernel: [280670.306823]  [<ffffffff81306128>]
fuse_direct_io+0x3a8/0x5b0
Mar 18 07:20:26 v14 kernel: [280670.306828]  [<ffffffff811fd264>]
new_sync_read+0x94/0xd0
Mar 18 07:20:26 v14 kernel: [280670.306833]  [<ffffffff811fe7e5>]
SyS_read+0x55/0xc0
Mar 18 08:26:29 v14 kernel: [284636.120070] Memory cgroup out of
memory: Kill process 3842 (systemd-journal) score 181 or sacrifice
child
Mar 18 08:26:29 v14 kernel: [284636.120098] Killed process 3842
(systemd-journal) total-vm:139232kB, anon-rss:136kB, file-rss:97764kB

Is this something related to the latest 4.2.8-1 kernel?

2016-03-15 18:38 GMT+01:00 Paul Penev <ppquant at gmail.com>:
> It seems like a bad idea. Mounting NFS inside a CT is disabled for a
> reason. I would need to add an unconfined AppArmor profile to the CT,
> which defeats the purpose of having a CT (unless I'm overlooking
> something).
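>
> (For context, allowing that would mean a raw LXC override along these
> lines in /etc/pve/lxc/<CTID>.conf, dropping AppArmor confinement for
> the whole container; the CT ID is a placeholder:
>
>     lxc.aa_profile: unconfined
>
> which is exactly what I would rather avoid.)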
>
> I have this issue on only one server, while another server (different
> CPU/motherboard) has never had it.
>
> I moved the CT to a different host of the same type to see if there is
> any change. It has identical hardware specs, but much less load and far
> fewer processes.
>
> Let's see if there's a change.
>
> 2016-03-15 13:41 GMT+01:00 Alexandre DERUMIER <aderumier at odiso.com>:
>> Maybe AppArmor related.
>>
>> Are you trying to mount NFS inside the CT?
>>
>> ----- Original message -----
>> From: "Paul Penev" <ppquant at gmail.com>
>> To: "pve-devel" <pve-devel at pve.proxmox.com>
>> Sent: Tuesday, 15 March 2016 10:17:28
>> Subject: [pve-devel] Kernel 4.2.8-1 problems
>>
>> Hello,
>>
>> I'm not sure whether this bug report belongs here; perhaps you can tell me.
>>
>> I upgraded a few nodes from 3.4 to 4.1 and started getting kernel
>> errors like the one below.
>>
>> I'm running LXC containers with bind mounts. The mounts are exported
>> from an NFS server and mounted on the host, then bind-mounted into the
>> LXC containers (mail spools).
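>>
>> For reference, the container side of that setup is just a mount point
>> entry in /etc/pve/lxc/<CTID>.conf; the CT ID and both paths here are
>> placeholders:
>>
>>     mp0: /mnt/nfs/mailspool,mp=/var/spool/mail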
>>
>> The effect is that the load sky-rockets to over 100. When I kill the
>> LXC container, the load comes back to normal levels.
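>>
>> The load is presumably driven by tasks stuck in uninterruptible sleep
>> (state D), which count toward the Linux load average even though they
>> burn no CPU. They can be listed with something like:
>>
>>     ps -eo state,pid,wchan:32,cmd | awk '$1=="D"'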
>>
>> [39162.359660] ------------[ cut here ]------------
>> [39162.359675] kernel BUG at kernel/cred.c:426!
>> [39162.359686] invalid opcode: 0000 [#1] SMP
>> [39162.359699] Modules linked in: xt_multiport xt_nat xt_tcpudp
>> iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat
>> nf_conntrack veth ip_set ip6table_filter ip6_tables iptable_filter
>> ip_tables softdog x_tables nfsv3 nfsd auth_rpcgss nfs_acl nfs lockd
>> grace fscache sunrpc ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core
>> ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi 8021q
>> garp mrp bonding openvswitch libcrc32c nfnetlink_log nfnetlink zfs(PO)
>> zunicode(PO) zcommon(PO) znvpair(PO) spl(O) zavl(PO) coretemp
>> kvm_intel joydev input_leds snd_pcm kvm psmouse gpio_ich snd_timer snd
>> soundcore ioatdma shpchp serio_raw pcspkr 8250_fintek i7core_edac
>> i2c_i801 dca lpc_ich i5500_temp edac_core mac_hid vhost_net vhost
>> macvtap macvlan autofs4 btrfs xor raid6_pq raid1 hid_generic usbmouse
>> usbkbd
>> [39162.360001] usbhid hid uas usb_storage ahci e1000e(O) ptp libahci pps_core
>> [39162.360047] CPU: 0 PID: 24302 Comm: pop3 Tainted: P O
>> 4.2.8-1-pve #1
>> [39162.360090] Hardware name: Supermicro X8DTT-H/X8DTT-H, BIOS 2.1b
>> 10/28/2011
>> [39162.360134] task: ffff8802f07b0000 ti: ffff880231380000 task.ti:
>> ffff880231380000
>> [39162.360176] RIP: 0010:[<ffffffff8109d171>] [<ffffffff8109d171>]
>> commit_creds+0x201/0x240
>> [39162.360227] RSP: 0018:ffff880231383688 EFLAGS: 00010202
>> [39162.360252] RAX: 0000000000000000 RBX: ffff880066f7f030 RCX: ffff8806e4505c00
>> [39162.360281] RDX: 0000000000000361 RSI: 0000000000000001 RDI: ffff88029b421680
>> [39162.360309] RBP: ffff8802313836b8 R08: 0000000000019e20 R09: ffff88032f90eba0
>> [39162.360338] R10: ffff8802313837f8 R11: ffff880297eba000 R12: ffff88062dc90c00
>> [39162.360366] R13: ffff8802f07b0000 R14: 0000000000000002 R15: ffff8806c0d5aa80
>> [39162.360395] FS: 00007f5807126700(0000) GS:ffff880333a00000(0000)
>> knlGS:0000000000000000
>> [39162.360439] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [39162.360465] CR2: 00007fd0131c7802 CR3: 0000000230917000 CR4: 00000000000026f0
>> [39162.360494] Stack:
>> [39162.360513] 000000000000001f 000000007325b09f ffff8802313836b8
>> ffff880066f7f030
>> [39162.360563] ffff88032f90eba0 ffff88029b421680 ffff8802313836e8
>> ffffffff81361ff5
>> [39162.360612] ffff8802f07b0068 ffff880066f7f030 ffff8806e1535000
>> 000000000000001f
>> [39162.360661] Call Trace:
>> [39162.360685] [<ffffffff81361ff5>] aa_replace_current_label+0xf5/0x150
>> [39162.360715] [<ffffffff81375a4a>] aa_sk_perm.isra.4+0xaa/0x140
>> [39162.360741] [<ffffffff813761ce>] aa_sock_msg_perm+0x5e/0x150
>> [39162.360769] [<ffffffff8136ad51>] apparmor_socket_sendmsg+0x21/0x30
>> [39162.360798] [<ffffffff8132baf3>] security_socket_sendmsg+0x43/0x60
>> [39162.360827] [<ffffffff816cdfba>] sock_sendmsg+0x1a/0x50
>> [39162.360854] [<ffffffff816ce10b>] kernel_sendmsg+0x2b/0x30
>> [39162.360891] [<ffffffffc0529368>] xs_send_kvec+0xa8/0xb0 [sunrpc]
>> [39162.360923] [<ffffffffc05293ea>] xs_sendpages+0x7a/0x1d0 [sunrpc]
>> [39162.360959] [<ffffffffc053c240>] ? xdr_reserve_space+0x20/0x160 [sunrpc]
>> [39162.360992] [<ffffffffc052b1ac>] xs_tcp_send_request+0x8c/0x1f0 [sunrpc]
>> [39162.361026] [<ffffffffc0527756>] xprt_transmit+0x66/0x330 [sunrpc]
>> [39162.361057] [<ffffffffc0523e19>] call_transmit+0x1b9/0x2a0 [sunrpc]
>> [39162.361088] [<ffffffffc0523c60>] ? call_decode+0x850/0x850 [sunrpc]
>> [39162.361119] [<ffffffffc0523c60>] ? call_decode+0x850/0x850 [sunrpc]
>> [39162.361152] [<ffffffffc052e4f1>] __rpc_execute+0x91/0x440 [sunrpc]
>> [39162.361185] [<ffffffffc053167e>] rpc_execute+0x5e/0xa0 [sunrpc]
>> [39162.361216] [<ffffffffc0525280>] rpc_run_task+0x70/0x90 [sunrpc]
>> [39162.361248] [<ffffffffc05252f0>] rpc_call_sync+0x50/0xd0 [sunrpc]
>> [39162.361277] [<ffffffffc064b5cf>]
>> nfs3_rpc_wrapper.constprop.12+0x5f/0xd0 [nfsv3]
>> [39162.361321] [<ffffffffc064b9c4>] nfs3_proc_access+0xc4/0x190 [nfsv3]
>> [39162.361355] [<ffffffffc05b0852>] nfs_do_access+0x242/0x3c0 [nfs]
>> [39162.361388] [<ffffffffc0533855>] ? generic_lookup_cred+0x15/0x20 [sunrpc]
>> [39162.361423] [<ffffffffc053206b>] ? rpcauth_lookupcred+0x8b/0xd0 [sunrpc]
>> [39162.361455] [<ffffffffc05b0bc0>] nfs_permission+0x1c0/0x1e0 [nfs]
>> [39162.361483] [<ffffffff8120a4e0>] ? walk_component+0xe0/0x480
>> [39162.361510] [<ffffffff812081d7>] __inode_permission+0x77/0xc0
>> [39162.361537] [<ffffffff81208238>] inode_permission+0x18/0x50
>> [39162.361563] [<ffffffff8120aae5>] link_path_walk+0x265/0x550
>> [39162.361590] [<ffffffff81209c20>] ? path_init+0x1f0/0x3c0
>> [39162.361616] [<ffffffff8120aecc>] path_lookupat+0x7c/0x110
>> [39162.361642] [<ffffffff8120d839>] filename_lookup+0xa9/0x180
>> [39162.361670] [<ffffffff811de962>] ? kmem_cache_alloc_trace+0x1d2/0x200
>> [39162.361698] [<ffffffff81361cc7>] ? aa_alloc_task_context+0x27/0x40
>> [39162.361725] [<ffffffff811de71f>] ? kmem_cache_alloc+0x18f/0x200
>> [39162.361752] [<ffffffff8120d426>] ? getname_flags+0x56/0x1f0
>> [39162.361779] [<ffffffff8120d9ea>] user_path_at_empty+0x3a/0x50
>> [39162.361807] [<ffffffff811fbc94>] SyS_access+0xb4/0x220
>> [39162.361834] [<ffffffff8180aaf2>] entry_SYSCALL_64_fastpath+0x16/0x75
>> [39162.361861] Code: ca 48 81 fa 80 50 e4 81 0f 84 78 fe ff ff 48 8b
>> 8a c8 00 00 00 48 39 ce 75 e4 3b 82 d4 00 00 00 0f 84 82 fe ff ff 48
>> 89 f2 eb d6 <0f> 0b 0f 0b e8 66 e8 fd ff 49 8b 44 24 30 48 89 c2 f7 d0
>> 48 c1
>> [39162.362062] RIP [<ffffffff8109d171>] commit_creds+0x201/0x240
>> [39162.362091] RSP <ffff880231383688>
>> [39162.362406] ---[ end trace 0848da7e7fa7bb80 ]---
>>
>> Server characteristics:
>>
>> Supermicro X8DTT-H
>> Dual Xeon E5530 CPUs with 24GB total RAM (16 logical cores seen).
>> Disks are 4x1TB SATA (WD1002 Black).
>> I had no problems with the same load on Proxmox 3.4 with OpenVZ containers.
>>
>> If you need more information, just ask.
>>
>> Paul
>> _______________________________________________
>> pve-devel mailing list
>> pve-devel at pve.proxmox.com
>> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


