Improving bug visibility

Dear all!

I have Ubuntu 16.04 LTS installed.
No special software here. No PPAs. Only MATE desktop with LTSP-server components.

Sometimes my system has no network connection after boot:

$ ifconfig 
lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:79 errors:0 dropped:0 overruns:0 frame:0
          TX packets:79 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1 
          RX bytes:5335 (5.3 KB)  TX bytes:5335 (5.3 KB)

I have already found the corresponding bug report, 1487679, about systemd breaking an ordering cycle.
My symptoms match the ones described there:

$  dmesg | grep break
    [    4.634456] systemd[1]: sockets.target: Job sockets.target/start deleted to break ordering cycle starting with basic.target/start
    [    4.634893] systemd[1]: acpid.path: Job acpid.path/start deleted to break ordering cycle starting with paths.target/start
    [    4.635273] systemd[1]: NetworkManager.service: Job NetworkManager.service/start deleted to break ordering cycle starting with NetworkManager-wait-online.service/start
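For anyone comparing their own boot log against these symptoms, here is a minimal Python sketch that extracts which jobs systemd dropped and which unit each cycle started from. It is not part of the original report: the function name `find_broken_cycles` is illustrative, and the pattern is inferred only from the three messages above, so other systemd versions may word the message differently.

```python
import re

# Pattern inferred from the dmesg lines above (an assumption, not a documented format).
CYCLE_RE = re.compile(
    r"systemd\[\d+\]: (?P<unit>\S+): Job (?P<job>\S+) deleted to break "
    r"ordering cycle starting with (?P<start>\S+)"
)

def find_broken_cycles(log_text):
    """Return (deleted_job, cycle_start) pairs found in the given log text."""
    return [(m.group("job"), m.group("start")) for m in CYCLE_RE.finditer(log_text)]

# The three lines quoted in this post, used as sample input.
SAMPLE = """\
[    4.634456] systemd[1]: sockets.target: Job sockets.target/start deleted to break ordering cycle starting with basic.target/start
[    4.634893] systemd[1]: acpid.path: Job acpid.path/start deleted to break ordering cycle starting with paths.target/start
[    4.635273] systemd[1]: NetworkManager.service: Job NetworkManager.service/start deleted to break ordering cycle starting with NetworkManager-wait-online.service/start
"""

if __name__ == "__main__":
    for job, start in find_broken_cycles(SAMPLE):
        print(f"{job} was dropped; the cycle starts at {start}")
```

Feeding it the full `dmesg` output instead of `SAMPLE` gives a short list that can be pasted into a bug report, which is easier to compare across reporters than raw logs.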

What should I do to force developers to fix this bug? Or to increase its importance?
Asking on AskUbuntu will not help, I think.
27 users are affected (plus 7 and 15 on the duplicates, bug 1465196 and bug 1582986).

As far as I understand, Ubuntu is an enterprise-grade operating system, isn’t it?
It should provide 99.999999% uptime.

With best regards,
Norbert.

A couple of things here:

  1. Comment #4 of LP:1487679 on 2016-04-07 is important - that’s the Triage statement. It specifies exactly why that particular bug is an nbd-specific bug. That particular bug was fixed (nbd shows Fix-released).

    Later on, somebody came along and hijacked the bug report and polluted it with a lot of non-nbd stuff. They changed the summary, adding the word CRITICAL without explaining exactly why it meets the guidelines for it. Those are big “stay away” flags for developers.

  2. Comment #1 of LP:1465196 is the Triage statement. It’s an open-iscsi bug (not a dupe of #1487679 after all). Then the bug gets hijacked with a lot of useless “me too” and “what’s the status” comments, and is unhelpfully and incorrectly marked as a dupe.

  3. LP:1582986 is a dumping ground of maybe-related, maybe-not systemd journal outputs.

    Learning from the two previous bug reports, we already know that systemd outputs are not the full story, and we also know culprits to look for: Packages not in the default install that use sysvinit scripts with aggressive early network requirements. Each one of those discovered is a separate bug.

    So this one is a big mess, seems not worth salvaging, and should simply be closed with a comment for each reporter on what to look for before re-filing a useful bug report.
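As a rough illustration of the detective work described above, the following Python sketch reads the LSB header of a sysvinit script and flags scripts that both start in runlevel S and require the `$network` facility. That heuristic is an assumption about what “aggressive early network requirements” means, not a rule taken from the bug reports; the function names and the `example-netdisk` sample script are hypothetical.

```python
import re
from pathlib import Path

# LSB headers sit between these two marker lines near the top of an init script.
LSB_BLOCK = re.compile(r"### BEGIN INIT INFO\n(.*?)### END INIT INFO", re.S)

def parse_lsb_header(script_text):
    """Return the LSB header of an init script as {keyword: [values]}."""
    match = LSB_BLOCK.search(script_text)
    fields = {}
    if not match:
        return fields
    for line in match.group(1).splitlines():
        # Header lines look like "# Required-Start:    $network $local_fs"
        m = re.match(r"#\s*([A-Za-z-]+):\s*(.*)", line)
        if m:
            fields[m.group(1)] = m.group(2).split()
    return fields

def wants_network_early(script_text):
    """Heuristic (an assumption): starts in runlevel S and requires $network."""
    fields = parse_lsb_header(script_text)
    return "S" in fields.get("Default-Start", []) and "$network" in fields.get("Required-Start", [])

# Hypothetical init script header, for illustration only.
SAMPLE = """\
#!/bin/sh
### BEGIN INIT INFO
# Provides:          example-netdisk
# Required-Start:    $network $local_fs
# Default-Start:     S
# Default-Stop:      0 6
### END INIT INFO
"""

if __name__ == "__main__":
    # List candidate scripts on the local system; missing or unreadable entries are skipped.
    for path in sorted(Path("/etc/init.d").glob("*")):
        try:
            text = path.read_text(errors="replace")
        except OSError:
            continue
        if wants_network_early(text):
            print(path.name)
```

Any script it prints is only a candidate. Each one still needs to be checked against the actual ordering cycle in the journal and, if confirmed, filed as its own bug against the package that ships it.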

  1. You CANNOT ‘force’ a developer to do anything at all. They are free. Most are volunteers.
    You CAN increase the likelihood of getting a bug fixed by doing the following:

    • Learn the life cycle of a bug report.
    • Learn how to Triage a bug report. Bug triage is usually a community task, not a developer task.
    • Help police the bug reports to keep them on-topic toward Triage, assigned to the correct package, weeding the support requests into Answers, etc.
    • Remind LTS users to test whether their bug still exists in the development release of Ubuntu. Many bugs are fixed but not backported (there’s a policy for this).
    • Patiently teach users about their systems, about the bug process, and how to do the detective work to reach Triaged.

    In other words, Bug Squad.

  2. Remember that developers who fix bugs can work most efficiently with well-reported, well-triaged bug reports without a lot of irrelevant comments, complaints, SHOUTING, threats to leave Ubuntu, etc. Developers have plenty of bug reports to choose from, and tend to gravitate toward the well-written, properly-triaged reports. If you want a bug fixed, make it one of those.

Thank you for your detailed answer and links.

I have been a Launchpad member since 2010-03-29, with around 4k karma.
I have reported or commented on about 557 bugs: 61 of my reported bugs were fixed, 10 are triaged, 73 are confirmed, 124 are new.

I’ll try to make the bugs I find and/or confirm easier to triage,
including the one from this post.

It’s really frustrating when this happens to bugs. It sounds like this case is typical: a general symptom gets reported as a bug, and everyone with the same general symptom “me toos”. But developers need one bug per root cause. So when a general symptom represents an entire class of bugs, the single pile-on bug becomes useless for developers.

Worse, users think that because many people are affected, “the bug” should surely have a high priority and be fixed. They get increasingly frustrated when the bug is seemingly ignored by developers because they don’t understand that the bug has become unactionable. This is made worse because any message by developers trying to explain this gets hidden in the noise. Sometimes a developer will choose the first reporter’s root cause, fix that root cause and mark the bug fixed. Then other users with different root causes get riled up because they think the developer is wrong and is ignoring their issue.

This general challenge is something that I have ideas on how to address, but unfortunately my ideas involve quite a bit of work, it’s too far down my list and I don’t think I’ll ever get to it.

I think Stack Exchange’s “question on hold” system handles this kind of quality matter well. We should recognise this kind of “piled on, multiple root causes” status in a bug. Launchpad could then put a big unmissable flag at the top of the bug description and close the bug to comments except to triagers and developers. The bug could be marked “Won’t Fix” to make it clear to users that no progress can be made. Users should still be able to mark themselves as affected by the bug, but the emphasis should be on getting them to file separate bugs for separate root causes. If they’re not sure about the root cause, they should be encouraged to file their own full bug reports (as long as they really are full bug reports and not mere “something went wrong” reports) since it’s easier for developers to mark them as duplicates later than to try and separate out dozens of root causes in a single bug.

OK, I understand that creating meta-bugs makes a lot of noise and does not help to triage them.

I found and reported a bug myself about network-manager, systemd and an offline system: bug 1727687.
I forgot to add it to the original question here.

This problem occurs randomly; it may be a race condition or something similar. I do not know when it will happen next. I have installed all upgrades on my affected 16.04.3 LTS system.
Is it related to bug 1487679? Or is it unique (not a duplicate)?

What should I do to help developers and testers triage and fix my bug?
I have already attached logs and run apport-collect.
Which simple steps should I take next?
I really want to get it fixed.

Today my system booted offline again,
with the same symptoms. Where can I find help?
Only sudo service network-manager restart helped.

I don’t fully understand what I should do to eliminate this bug.

Even with all your kind support, I can’t triage this bug.

Do you have a thread open in http://ubuntuforums.org? It has the best format for one-on-one help of this sort.

How can a forum with self-taught experts help me triage the bug?
I don’t understand. I’m not sure that Ubuntu developers and bug supervisors are registered on the forum.
If you are sure that this will help, I will try.

IRC will be the last resort, I think.

Unfortunately, some bugs just cannot be triaged, because nobody can figure out what is going on well enough to provide steps to reproduce them. You could be doing everything right as a bug reporter and still be unable to get a bug to a triaged state.

Long-time user here. I’m very active in the Juju community, and I was recently pulled into the desktop community by @didrocks as part of the Communitheme project.

It’s obvious that people are very willing. There are tons of passionate people around here, but the system is broken.

Example: a disaster waiting to happen

The day Dell releases a new firmware version for one of their Ubuntu-supported laptops, fwupdate in Xenial will break tens of thousands of Dell Ubuntu laptops. It hasn’t yet because Dell hasn’t pushed a firmware update since the bug was introduced, so the only people affected are people with new laptops or who just upgraded. This is again an LTS. This is a clear and on-topic bug report. We know what the issue is, we have a patch ready, it has been released for Artful and Bionic, but still nothing about Xenial.

The day I figured out I wasn’t the only one affected, I contacted the desktop team over IRC, explained the bug to them and asked them to temporarily stop fwupdate until this issue was fixed. Nothing happened. I know it’s possible; the issue with the Lenovo laptops shows that it’s possible…

The system does not work!

If it’s that hard to get a bug fixed that will brick tens of thousands of laptops in one of the next months, then the system is broken. What should I do? Start emailing other people directly? If this is your answer then it just confirms that the system does not work. You can only get stuff fixed if you know people personally. And what about the thousands of other bugs that go unanswered, that stay in the new state…? The hundreds of PRs without any comment from a maintainer?

This is great for the people who are in the bubble, but most people are not, and most people have exactly the same experience as me and @Norbert.

I’m very passionate about open source, about Canonical and Ubuntu, but truthfully, I am incredibly happy that Microsoft is entering this open-source game because they are doing this whole community thing a lot better.

With all due respect to the people. The people are awesome, but the system is broken.

So what exactly will fix it…?

Reminder: Let’s focus on identifying systemic (not systematic) problems and upon constructive solutions.

Ground rules: In this kind of discussion, it’s important for everyone to be mature. This thread must be a safe place to share information and to discuss possible solutions.

  • Don’t focus on the details or language of a post. Focus on the intended meaning.
  • Don’t throw stones, don’t use provocative language, and have a thick skin (don’t react) when others do.
  • Remember that the goal is improvement, not being right.

Lasting improvements come from consensus.

The latter is not true. The patch was added to the Xenial queue on December 6. The next step is for an SRU team member to upload it to xenial-proposed and submit a call for verification. Even if that sometimes takes longer than you would wish, Xenial will also be updated in the end, provided someone properly verifies that the upload fixes the bug.

The situation repeated yesterday and today. I purged network-manager and set up eth0 as DHCP in /etc/network/interfaces.

As others have since pointed out, it was in the queue for SRU review, and that review has since happened.

I think that perhaps the real problem here is that the process to land fixes (in this case: SRUs) is unclear to those unfamiliar with it. The wider community cannot tell the difference between progress and no progress from a bug alone.

Separately, it looks like it took over two weeks for the upload to get reviewed by the SRU team. Your statement “…will brick tens of thousands of laptops in one of the next months” sounds like it needed a faster response than that. Ubuntu developers generally know how to flag something in the SRU queue for urgent attention. Unfortunately, by its nature, “urgent attention” is a scarce resource and needs to be used sparingly, so a “get this bug urgent attention” button available to the general public doesn’t work. It’ll just get spammed by non-urgent cases, grinding progress to a halt. Instead we generally and informally stake our individual reputations: if someone who has earned respect in our community asks for urgent attention with a suitable justification, then we usually provide that, while ignoring the thousands of requests for urgent attention from everyone else. The problem is in raising genuine cases from previously unknown people to the top while filtering out all the noise. You say “you can only get stuff fixed if you know people personally”, and I realise that this explanation probably doesn’t help. But I’m not sure of any other way to solve this problem. Do you have any suggestions?

Did an Ubuntu developer actually respond, and had you communicated the urgency of the matter? It would help if you linked to the IRC conversation. Otherwise nobody else can understand what happened, and we have no way of figuring out what we need to fix. You can find public logs at http://irclogs.ubuntu.com/.

As an aside, I’m not sure we are in a position to stop fwupdate. It’s rather different technically than temporarily pulling an installer ISO image download. Perhaps we should make sure we can, but that would be a different conversation.

I have a fresh question about bug visibility.

Like many other users, I’m waiting for the next LTS release, 18.04 Bionic Beaver.
I have spent some time finding and reporting 35 new problems on Launchpad, but many of them are still marked as New.
I understand that there are at least 410 other new bugs in Bionic.

How should active users help developers confirm and sort these bugs?
How fast and how thorough should our work be to get the best results?

Start here: https://wiki.ubuntu.com/BugSquad/

For example, https://bugs.launchpad.net/ubuntu/+source/libreoffice/+bug/1698572 needs verification on Bionic and, if still relevant there, testing against upstream and filing upstream as appropriate.

https://bugs.launchpad.net/ubuntu/+source/guayadeque/+bug/1592966 likely won’t be fixed in Ubuntu because the package needs a maintainer in Debian. Triaged/Wishlist or even Won’t Fix/Wishlist is the right status, but it also needs to be explained in the bug why this is the case.

https://bugs.launchpad.net/ubuntu/+source/caja-extensions/+bug/1694002 again needs testing against Bionic, possibly against upstream depending on the result of that, and possibly an upstream bug filing.

And so on. I’m afraid most of these bugs don’t appear to be actionable in Ubuntu directly, but rather need upstream work.

Anybody can do this kind of triaging work - it doesn’t need any special privilege.

I have linked these bugs to upstream. I always try to do it this way.
Thank you!

Great! So now that you understand what needed doing with those, do you have any suggestions for systemic improvement of our bug handling process? For the general case, rather than your specific ones?

It seems to me that when fixing bugs, developers should not only prevent data loss and ensure security (in the case of critical errors), but should also take user comfort into account.

It is clear that some errors must be corrected upstream, which is often not a problem for Ubuntu.

My usual thinking is this: if a problem, error or possible improvement is detected, it should be reported on Launchpad. However, the response rate there is very low.

Now there are a lot of disparate channels for interaction:

  • AskUbuntu is obviously aimed at users, and discussing bugs there is usually not accepted;
  • Ubuntu mailing lists are ineffective (but LKML is the best, of course);
  • this site is still young, and we have few Canonical and/or Ubuntu developers here (such as Brian Murray);
  • to use IRC effectively, one needs to be constantly present there to be heard (my impression is that few people read the history).

Therefore, I think the main task is to improve the interaction between package maintainers, representatives of Canonical and/or Debian, and all interested people through Launchpad.