MAAS failing to commission HP ProLiant DL385 G7 (AMD Opteron 6172)

Hello all,
We have some HP ProLiant DL385 G7 (AMD Opteron 6172) servers which MAAS is failing to commission. We have also tried other servers on the same MAAS deployment, and they work without any issue.

When we try to commission a “New” machine, it boots from PXE and we get the following messages:

[  90.598545] x86/PAT: ipmi-locate:2905 map pfn expected mapping type uncached-minus for [mem 0xbde1c000-0xbde1cfff], got write-back
[  90.599073] x86/PAT: ipmi-locate:2905 map pfn expected mapping type uncached-minus for [mem 0xbde1d000-0xbde1dfff], got write-back
[  90.599278] x86/PAT: ipmi-locate:2905 map pfn expected mapping type uncached-minus for [mem 0xbde1c000-0xbde1cfff], got write-back
[  90.599814] x86/PAT: ipmi-locate:2905 map pfn expected mapping type uncached-minus for [mem 0xbde1d000-0xbde1dfff], got write-back
       Starting Stop ureadahead data collection...
Stopping Read required files in advance...
[ OK ] Started Stop ureadahead data collection.
[  150.880073] INFO: rcu_sched detected stalls on CPUs/tasks:
[  150.880197]  0-...!: (0 ticks this GP) idle=ee4/0/0 softirq=2489/2489 fqs=0
[  150.880303]  12-...!: (0 ticks this GP) idle=0d8/0/0 softirq=2429/2429 fqs=0
[  150.880408]  21-...!: (51 GPs behind) idle=f10/0/0 softirq=1692/1694 fqs=0
[  150.880510]  23-...!: (19 GPs behind) idle=28c/0/0 softirq=1761/1761 fqs=0
[  150.880610]  (detected by 2, t=15002 jiffies, g=2532, c=2531, q=2143)
[  150.884754]  rcu_sched kthread starved for 15003 jiffies! g2532 c2531 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=0
[ OK ] Stopped Read required files in advance.
[ OK ] Stopped Flush Journal to Persistent Storage.
       Stopping Flush Journal to Persistent Storage...
[ OK ] Stopped Journal Service.
       Starting Journal Service....
[ OK ] Started Journal Service.
       Starting Flush Journal to Persistent Storage...
[ OK ] Started Flush Journal to Persistent Storage.
[  363.217837]  INFO: task kworker/7:1:185 blocked for more than 120 seconds.
[  363.217950]        Not tainted 4.15.0-46-generic #49-Ubuntu
[  363.218046]  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  363.218375]  INFO: task ipmi-config:2937 blocked for more than 120 seconds.
[  363.218476]        Not tainted 4.15.0-46-generic #49-Ubuntu
[  363.218571]  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  484.217837]  INFO: task kworker/7:1:185 blocked for more than 120 seconds.
[  484.217950]        Not tainted 4.15.0-46-generic #49-Ubuntu
[  484.218046]  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  484.218375]  INFO: task ipmi-config:2937 blocked for more than 120 seconds.
[  484.218476]        Not tainted 4.15.0-46-generic #49-Ubuntu
[  484.218571]  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

These two INFO messages (for kworker/7:1 and ipmi-config) keep repeating until commissioning fails at the 30-minute timeout.
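In case it helps anyone compare console captures from different runs, here is a rough Python sketch that counts the hung-task reports per task. It assumes the console output has been saved as plain text; the function name `summarise_hung_tasks` and the regular expression are my own and are not part of MAAS or the kernel.

```python
import re
from collections import Counter

# Matches kernel hung-task lines such as:
# [  363.217837]  INFO: task kworker/7:1:185 blocked for more than 120 seconds.
# The task name may itself contain colons, so the PID is taken to be the
# last colon-separated number before " blocked".
HUNG_TASK = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: task (?P<name>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def summarise_hung_tasks(log_text):
    """Count hung-task reports per (task name, pid) in a console log."""
    counts = Counter()
    for line in log_text.splitlines():
        match = HUNG_TASK.search(line)
        if match:
            counts[(match.group("name"), int(match.group("pid")))] += 1
    return counts


# Lines copied from the console output above.
sample = """\
[  363.217837]  INFO: task kworker/7:1:185 blocked for more than 120 seconds.
[  363.218375]  INFO: task ipmi-config:2937 blocked for more than 120 seconds.
[  484.217837]  INFO: task kworker/7:1:185 blocked for more than 120 seconds.
[  484.218375]  INFO: task ipmi-config:2937 blocked for more than 120 seconds.
"""

for (name, pid), count in sorted(summarise_hung_tasks(sample).items()):
    print(f"{name} (pid {pid}): reported {count} time(s)")
```

To run it against a full capture, read the saved file and pass its contents to `summarise_hung_tasks` in place of `sample`.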

In approximately 1 out of 10 tries the server is commissioned successfully, but then the same behaviour occurs during deployment.

One thing I tried was adding the “nosmp” boot parameter under Settings -> General -> Global Kernel Parameters. This makes commissioning succeed every time, but the machine then has only 1 CPU.
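To confirm that a global kernel parameter such as “nosmp” actually reached the booted machine, something along these lines could be run on it. This is only a sketch: the helper names and the example command line are made up for illustration, and the parser splits on whitespace only, so it does not handle quoted values.

```python
import os


def kernel_params(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None.

    Naive whitespace split: quoted values containing spaces are not handled.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def read_cmdline(path="/proc/cmdline"):
    """Return the command line the running Linux kernel was booted with."""
    with open(path) as handle:
        return handle.read()


# Hypothetical command line, for illustration only (not taken from the servers).
example = "BOOT_IMAGE=/vmlinuz ro nosmp console=tty0"
print("nosmp" in kernel_params(example))

if __name__ == "__main__" and os.path.exists("/proc/cmdline"):
    params = kernel_params(read_cmdline())
    print("nosmp set:", "nosmp" in params)
    print("CPUs visible to the OS:", os.cpu_count())
```

With “nosmp” in effect, the reported CPU count should be 1, which matches what MAAS shows for the machine after commissioning.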

Any ideas?
Are these servers not supported by MAAS?

Thanks in advance,
Yanos

Hi Yanos,

Could you please file a bug report at https://bugs.launchpad.net/maas and attach the full log? This may be a kernel issue, in which case we will need to involve the kernel team to fix it.

Thanks,

Lee