Ubuntu Server team update - 30 September 2019

Hi everyone, below you will find the updates from the Ubuntu Server team members for the last week. If you are interested in discussing a topic, please start a thread in the Server area of this Discourse site.

The last few weeks have been busy because of the Ubuntu Eoan freeze dates (https://wiki.ubuntu.com/EoanErmine/ReleaseSchedule). I’m listing here some of what I have accomplished since my last post.

QEMU memory barrier not enough for sync primitives for ARM64

LP: #1805256 - qemu-img hangs on high core count ARM system (josh asked)

SUMMARY:
We’ve discovered that there was a race condition in the QEMU AIO loop during qemu-img executions. I confirmed it was an atomicity issue in the synchronization primitives by demonstrating that the problem did not happen when mutexes protected the affected variables. Upstream QEMU, Marvell and Huawei engineers are working on it (likely a memory alignment/cache line issue).

Systemd restarts and HA software stack behavior

systemd-networkd:
https://github.com/systemd/systemd/issues/12050
https://github.com/systemd/systemd/pull/12511
https://github.com/ssahani/systemd/commit/b0fa0b4fd5ba

The following 3 bugs:



There are mainly 2 “fixes” for this issue:

  1. keepalived (> 2.0.x) is able to recognize systemd-networkd changes and change cluster status in order to reconfigure managed NICs.

  2. systemd-networkd implements a new setting (KeepConfiguration=) for its .network files, which fixes not only this behavior but also all HA-related software that manages secondary IPs and/or aliases on NICs managed by systemd-networkd.

    • Discussed best way to approach with Christian
    • Fixing systemd-networkd seems more appropriate (1st attempt)
    • Changing keepalived might be more appropriate for SRUs (2nd attempt)
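
To make option (2) a bit more concrete, here is a minimal Python sketch that renders what a .network file using the new setting could look like. The function name and the "eth0" interface are hypothetical, and I am assuming the setting lives in the [Network] section as described in the systemd.network man page; the values used are the ones named in the commits listed further down (static, dhcp, dhcp-on-stop).

```python
import configparser
import io


def render_network_unit(interface, keep_configuration="static"):
    """Render a minimal systemd-networkd .network file as text.

    Illustrative only: the interface name and the chosen value are
    assumptions, not taken from any real deployment.
    """
    parser = configparser.ConfigParser()
    # Keep the CamelCase keys; configparser lowercases them by default.
    parser.optionxform = str
    parser["Match"] = {"Name": interface}
    parser["Network"] = {"KeepConfiguration": keep_configuration}

    buffer = io.StringIO()
    # Write "Key=value" without spaces around the delimiter.
    parser.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


print(render_network_unit("eth0", "static"))
```

The generated text would go into a file such as /etc/systemd/network/<name>.network; which value suits a given HA setup depends on whether the addresses to keep are static or DHCP-provided.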

BACKPORTED:

The commits below implement support for "keep configuration":

commit 1e498853a39b46155cb89b5c9e74ecb27aaba3ed
Author: Yu Watanabe <watanabe.yu+github@gmail.com>
Date:   Mon Jun 3 01:21:13 2019

    test-network: add tests for KeepConfiguration=

commit c98d78d32abba6aadbe89eece7acf0742f59047c
Author: Yu Watanabe <watanabe.yu+github@gmail.com>
Date:   Mon Jun 3 03:37:25 2019

    man: add documentation about KeepConfiguration

commit db51778f85cb076e9ed1fe7f7e29cc740365c245
Author: Yu Watanabe <watanabe.yu+github@gmail.com>
Date:   Mon Jun 3 00:33:13 2019

    network: make KeepConfiguration=static drop DHCP addresses and routes
    
    Also, KeepConfiguration=dhcp drops static foreign addresses and routes.

commit 95355a281c06c5970b7355c38b066910c3be4958
Author: Yu Watanabe <watanabe.yu+github@gmail.com>
Date:   Mon Jun 3 14:05:26 2019

    network: add KeepConfiguration=dhcp-on-stop
    
    The option prevents to drop lease address on stop.
    By setting this, we can safely restart networkd.

commit 7da377ef16a2112a673247b39041a180b07e973a
Author: Susant Sahani <ssahani@vmware.com>
Date:   Mon Jun 3 00:31:13 2019

    networkd: add support to keep configuration

Provided a PPA and a MR for the Eoan SRU:
https://launchpad.net/~rafaeldtinoco/+archive/ubuntu/lp1815101
https://code.launchpad.net/~rafaeldtinoco/ubuntu/+source/systemd/+git/systemd/+merge/374027
Asked @xnox about his opinion for the same SRU made to Disco and Bionic.

grub-install issues with Eoan installer

LP: #1838525 - installer - LVM setup fails to install grub on virtio storage

SUMMARY of the problem (or a HUGE TL;DR):

  • The installer depends on the “grub-mkdevicemap --no-floppy -m -” command to get the ordering of bootable devices.
  • grub-mkdevicemap was dropped upstream and is included in grub2 by a quilt patch.
  • grub-mkdevicemap orders everything that is in /dev/disk/by-id/* excluding, in this order, everything containing “-part”, “dm-” and “md-”.
  • LVM partitions are added to /dev/disk/by-id, but not the entire disk (as the PV is the partition itself).
  • udev creates the /dev/disk/by-id symlinks according to 60-persistent-storage.rules:

# virtio-blk

KERNEL=="vd*[!0-9]", ATTRS{serial}=="?*", ENV{ID_SERIAL}="$attr{serial}", SYMLINK+="disk/by-id/virtio-$env{ID_SERIAL}"
KERNEL=="vd*[0-9]", ATTRS{serial}=="?*", ENV{ID_SERIAL}="$attr{serial}", SYMLINK+="disk/by-id/virtio-$env{ID_SERIAL}-part%n"

So the LVM partitions carry an ID_SERIAL, they get added to /dev/disk/by-id, and the installer gets lost when trying to order the devices: the LVM partition ends up in the first position instead of the full disk (for the hd0, hd1, … grub setup).
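
As a rough illustration of how the two virtio-blk rules above choose a symlink name, here is a minimal Python sketch. It is not udev code: the function name is hypothetical, and it only mimics the KERNEL glob match, the non-empty serial check (ATTRS{serial}=="?*") and the %n kernel-number substitution.

```python
import fnmatch
import re


def virtio_by_id_symlink(kernel_name, serial):
    """Return the by-id symlink the two rules would add, or None.

    Simplified model of the rules shown above, not real udev logic.
    """
    # ATTRS{serial}=="?*" only matches a non-empty serial.
    if not serial:
        return None
    # KERNEL=="vd*[!0-9]": whole disk, name does not end in a digit.
    if fnmatch.fnmatchcase(kernel_name, "vd*[!0-9]"):
        return f"disk/by-id/virtio-{serial}"
    # KERNEL=="vd*[0-9]": partition, %n is the trailing kernel number.
    if fnmatch.fnmatchcase(kernel_name, "vd*[0-9]"):
        number = re.search(r"(\d+)$", kernel_name).group(1)
        return f"disk/by-id/virtio-{serial}-part{number}"
    return None


print(virtio_by_id_symlink("vda", "SERIAL0"))   # disk/by-id/virtio-SERIAL0
print(virtio_by_id_symlink("vda1", "SERIAL0"))  # disk/by-id/virtio-SERIAL0-part1
```

The "SERIAL0" value is made up for the example; on a real system the serial comes from the device attributes.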

There are 3 alternatives to fix this and I have chosen the one I believe has the smallest potential for any type of regression. Comment #30 describes what caused the regression and these 3 alternatives:

  1. Revert this change for the current release, since this rule was added to “make navigation a bit easier using PV UUIDs”, as the commit says. We would worry about installer changes in the next release.
  2. Change the logic inside “grub-mkdevicemap.c: make_device_map()->grub_util_iterate_devices()” to ignore all symlinks from /dev/disk/by-id/ containing lvm-pv-uuid-*. We would not have to worry about this in the next release if using debian-installer.
  3. Change the grub-installer package/logic. Unfortunately, a few days before the full freeze, I don’t think messing with the installer is a good way to avoid regressions (the regression potential would grow significantly).

I’m choosing (2) because Ubuntu Foundations already faced a similar situation when the grub-mkdevicemap.c file was removed from the grub2 code: they re-added it with a quilt patch, assuming that was the easiest approach and the simplest to maintain. I’m doing something similar, patching the patch that creates the grub-mkdevicemap.c file so that it also ignores /dev/disk/by-id/lvm-pv-uuid-* files (like it already does for other symlinks, actually).
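
The effect of option (2) can be sketched in a few lines of Python. This is only an illustration of the filtering idea, not the actual C code in grub-mkdevicemap.c: the function, the marker tuples, the alphabetical sort and the sample names are all assumptions made for the example.

```python
# Markers already skipped according to the summary above.
EXCLUDED_MARKERS = ("-part", "dm-", "md-")
# Extra marker that option (2) would add.
LVM_PV_MARKER = "lvm-pv-uuid-"


def candidate_disks(by_id_names, skip_lvm_pv=True):
    """Return the by-id names kept as bootable disk candidates, sorted."""
    markers = EXCLUDED_MARKERS
    if skip_lvm_pv:
        markers = markers + (LVM_PV_MARKER,)
    return sorted(
        name for name in by_id_names
        if not any(marker in name for marker in markers)
    )


# Hypothetical directory listing for a single virtio disk with an LVM PV.
names = [
    "virtio-disk0",
    "virtio-disk0-part1",
    "lvm-pv-uuid-EXAMPLE",
    "dm-name-vg0-root",
    "md-uuid-EXAMPLE",
]

print(candidate_disks(names, skip_lvm_pv=False))  # ['lvm-pv-uuid-EXAMPLE', 'virtio-disk0']
print(candidate_disks(names))                     # ['virtio-disk0']
```

Without the extra marker the LVM PV symlink sorts ahead of the full disk, which mirrors the misordering described above; with it, only the full disk remains.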

With that, I have created the following merge request: https://code.launchpad.net/~rafaeldtinoco/ubuntu/+source/grub2/+git/grub2/+merge/373792

And the following PPA:
https://launchpad.net/~rafaeldtinoco/+archive/ubuntu/lp1838525

MAAS deploy on servers with attached Pendrive

LP: #1833618 - sg3-utils - failing to deploy Ubuntu Disco

Disco - https://code.launchpad.net/~rafaeldtinoco/ubuntu/+source/sg3-utils/+git/sg3-utils/+merge/373439
Bionic - https://code.launchpad.net/~rafaeldtinoco/ubuntu/+source/sg3-utils/+git/sg3-utils/+merge/373440
PPA - https://launchpad.net/~rafaeldtinoco/+archive/ubuntu/lp1833618

The sg3-utils-udev package installed a udev rules file that changed the ID_SERIAL attribute of connected USB block devices. This happened because USB memory sticks are usually SPC-only SCSI devices that do not support VPD pages 0x80 and 0x83. That caused MAAS to misbehave when getting the ID_SERIAL of the connected pendrive.

Simplestreams SRU to Xenial

I analyzed the following bugs:

• LP: #1611987 - simplestreams - [SRU] glance-simplestreams-sync charm doesn't support keystone v3 
• LP: #1686437 - simplestreams - can't sync images for keystone v3
• LP: #1719879 - simplestreams - [SRU] swift client needs to use v1 auth prior to ocata
• LP: #1728982 - simplestreams - [SRU] openstack mirror with keystone v3 always imports new images 

For the keystone v3 fixes, revno 454 is the minimum we need SRU’d back to Xenial. Bionic 0.1.0~bzr460-0ubuntu1 already has these changes. These two merges are the pertinent changes:

1. Keystone v3 Support - https://is.gd/wq7r6g
2. Fix KSv3 Bugs - https://is.gd/OOEo3G

Version 0.1.0~bzr426-0ubuntu1.3 was uploaded with a fix for LP: #1686437 (can’t sync images for keystone v3). That change introduced regressions (LP: #1728982 - openstack mirror with keystone v3 always imports new images - and LP: #1719879 - swift client needs to use v1 auth prior to ocata) and was marked as verification needed.

Work was done to fix those regressions. A merge proposal was made for Xenial at https://is.gd/7ixQbO. We have a PPA at https://is.gd/Boda8J that contains a fix for the regressions caused by 0.1.0~bzr426-0ubuntu1.3, among other fixes.

Feedback on that PPA was requested and was provided by Ed, Chris, Felipe and Billy. Billy found an issue with squashfs, which was fixed in 0.1.0~bzr426-0ubuntu1.4~ppa0, also uploaded to the PPA at https://is.gd/Boda8J.

An SRU template is needed in all the bugs referenced by these patches:

• 428-do-not-require-that-hypervisor_config-be-present.patch (LP: #1578622)
• 433-glance-ignore-inactive-images.patch (LP: #1583276)
• 436-glance-fix-race-conditions.patch (LP: #1584938)
• 450-453-454-keystone-v3-support.patch (LP: #1686437, #1728982, #1719879)
• 455-nova-lxd-support-squashfs-images.patch (LP: #1686086)

Version 0.1.0~bzr426-0ubuntu1.4 is good for an SRU and has already been tested by multiple people.