Defaulting to verifying the image integrity before installing on desktop?

While triaging ubiquity bug reports on Launchpad, I’ve found that one of the most common reasons for failed installations is that the image/media used for the installation is invalid or corrupted. It shows up in the log with an error such as:
‘SQUASHFS error: zlib decompression failed, data probably corrupt’
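
For triage purposes, spotting this in a log can be sketched roughly as below. This is only an illustration: the function name and the sample log lines are made up, and the pattern covers just the message quoted above, not every way corruption can show up.

```python
import re

# Pattern based on the kernel message quoted above; other corruption
# messages may exist and are not covered by this sketch.
SQUASHFS_ERROR = re.compile(r"SQUASHFS error: .*decompression failed", re.IGNORECASE)


def find_media_errors(log_text: str) -> list[str]:
    """Return the log lines that look like squashfs decompression failures."""
    return [line for line in log_text.splitlines() if SQUASHFS_ERROR.search(line)]


# Hypothetical log excerpt, for illustration only.
sample = (
    "[  120.1] EXT4-fs (sda1): mounted filesystem\n"
    "[  130.5] SQUASHFS error: zlib decompression failed, data probably corrupt\n"
)

hits = find_media_errors(sample)
if hits:
    print("Installation media looks corrupted; re-download or re-write the image.")
```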

There is usually no user friendly explanation of what the problem is in those cases, which means users just download the iso/write it/boot the media and follow the steps and at some point get a random ubiquity error.
One recent example of such a report:
https://bugs.launchpad.net/ubuntu/+source/ubiquity/+bug/1853769
Since we don’t explain the issue or recommend a solution, it’s often not obvious to users what is going on and what they should be doing.

In that context, what would people think of making ‘check & install’ the default choice on the desktop liveCD, and making the start of the installer conditional on the media having no errors?

I’ve tried to get some data for the discussion and tested the ‘check disk’ option on a few configurations, with both an old/slow USB stick and a recent, cheap USB3 one:

  • a 10-year-old Latitude with an i5 CPU (BIOS)
  • an old/slow Inspiron 11 (UEFI)
  • a recent XPS13 (UEFI)

The check takes between 1 minute and a bit less than 3 minutes, depending on the configuration/media. (I didn’t measure the installation time, but it’s significantly longer on any of the machines.)

And as an extra data point, I recently booted a Fedora 31 ISO to test a bug, and the liveCD menu there defaults to checking the media first.

I think that the cost is reasonable and that it would avoid an awkward experience that some of the new Ubuntu users are getting.

What do others think? Should we default to checking the media before booting the ISO? And if so, do we need to ensure the menu still provides a way to skip the test? (We should, at least for automatic installation.)

Cheers,

6 Likes

As somebody who spends time in AskUbuntu fielding installer-failure queries, I think this is certainly worth a try.

The 20.04 Desktop .iso is going to be installed by new users for many years, so trying this sooner rather than later seems wise. For new users, 1-3 extra minutes seems a very acceptable cost.

Some folks will definitely want an option to skip the check. I use the same image to install on multiple machines over the course of a release, and those extra minutes would add up unnecessarily. I don’t mind if the option is somewhat hidden, as long as it’s discoverable.

6 Likes

I for sure think it’s worth it.

I’d like to think I know what I’m doing, but still have the occasional write failure to my thumb drives (iso.qa testing) that is detected by my automatic “check disc for defects” check before running a QA test.

I close (invalid) a few bug reports due to squashfs (they’re easy to spot), and likewise suggest it heaps on askubu, less so forums & other support sites.

I suspect many users will not like it, so I’d suggest making it clear on screen it’s checking media for defects (so they know what their box is doing) and we’ll hopefully get fewer “slower to install” reviews.

I’d also suggest (if possible) the option to abort/skip it be available for those in a ‘hurry’ who don’t mind risk, plus people like me who may check once before I QA-test, but then use that checked drive on 3-8 boxes (so I don’t have the same media being re-checked an extra 2-7 times).

Additional add: This will be a pain if skip isn’t available in the final days close to release time… in the last ~week prior to 19.10’s release, the number of QA-installs due to re-spins was not fun… And I sure don’t want it to take extra 60-180 secs for secondary-installs on subsequent QA testcases

4 Likes

Having an option to skip on the boot screen, combined with an option to cancel the check during the verification (in case someone changes their mind, perhaps with a confirmation prompt) seems like it would suffice.

As @seb128 noted, I also recently tried installing Fedora 31 (server) in the course of diagnosing an Ubuntu-specific bug with Podman, and it featured all these things - automatic check and install by default, option to skip at boot menu, and option to cancel during verification.

3 Likes

I agree that this is a good idea :slight_smile:

  • default: check the integrity
  • option to skip the check
1 Like

So now that this is out, I have three concerns:

  1. I’m sure it’s been chosen for its speed but collisions are most certainly a thing with md5. Could we use a different algorithm?
  2. The “Check disc for defects” (which itself hasn’t made sense for a long time considering most people use USBs and not discs) is still an option in the bootloader menu. I think part of the reason this seemed confusing is because of this fact.
  3. Though there have been two discussions on this subject, there’s been no announcement of the change which seems to be a rather huge omission. On that subject, it seems like the one request that was made multiple times over— the need to skip— is totally undocumented. It does seem like “s” is the key, though, looking at the code.
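
On the first point, a rough sketch of why the algorithm itself is mostly a parameter. This is not the installer’s actual code; `file_digest` and its defaults are names invented here, and only the standard `hashlib` module is assumed.

```python
import hashlib


def file_digest(path: str, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a large image never has to fit in memory.

    `algorithm` is any name accepted by hashlib.new(), e.g. "md5" or "sha256".
    """
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

In a sketch like this, moving from md5 to something else is a one-word change at the call site; the real cost would be regenerating whatever checksum list ships on the image.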
3 Likes

Looks like it certainly is ‘S’.

3 Likes

It might be good to change that message to emphasize the importance of at least running it once. Maybe something like:

“Checking installation media integrity. Please don’t skip unless you know what you’re doing, but press S if you do.”

3 Likes

I think this is a great idea. I’ve used Linux for years and never realized how important the check-media (grub entry) is. I’ve seen @wxl emphasizing it, and began doing it myself. I’ve discovered some corruptions. (Now I wonder how many problems I’ve had in the past due to this, and never knew the importance of checking that.).

I would recommend writing something to the install media (if the media is writable). This way the check would only have to be done once, the first time new media is detected. (After that a user could use the grub menu option.). I wonder if it might be annoying to have it run every time. But, I guess it’s just a matter of getting used to hitting “skip.”
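
To make the idea concrete, here is a minimal sketch of “check only the first time”. Everything in it is hypothetical: the marker file name is invented, no existing installer does this as far as this sketch knows, and a real version would have to keep the marker outside the set of checksummed files.

```python
from pathlib import Path

MARKER_NAME = ".media-verified"  # hypothetical name, not used by any real installer


def needs_check(media_root: str) -> bool:
    """True if no earlier successful check has been recorded on the media."""
    return not (Path(media_root) / MARKER_NAME).exists()


def mark_verified(media_root: str) -> bool:
    """Try to record a successful check; return False if the media is read-only."""
    try:
        (Path(media_root) / MARKER_NAME).write_text("ok\n")
        return True
    except OSError:
        # Read-only media (e.g. an optical disc): the check would simply run every boot.
        return False
```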

As @wxl said above, I think an opportunity exists to be more educational about what’s happening. For example “It’s important for this validation process to finish at least once, to ensure the bootable media was created properly.” I think if the verbiage is too terse, geared toward people who already know, new users might think “I don’t see this on other distros. They seem better, and boot faster.” (Initial impressions.).

If it runs every time (if nothing’s written to the media to allow skipping it automatically on subsequent boots), it seems like it should be removed from the grub menu.

However these topics go, I like having it run automatically!

Using elements of both of the above my suggestion is:

Press ‘S’ to skip but please run at least once to ensure the bootable media has been created correctly.

with possibly ‘at least’ being omitted.

2 Likes

I quite like it as-is.

I think it’s phrased well for the non-experts whom it is intended to primarily benefit.

The eye is (properly) drawn toward the friendly “Checking…” and away from the rest of the text, exactly as we want.

It doesn’t say too little. It doesn’t say too much. Just right.

4 Likes

That sounds good. I agree, “at least” is unnecessary.

1 Like

I was thinking about that terminology, and wondered what it looks like if a corruption is discovered. I have today’s Lubuntu daily image on USB. I mangled one file, and booted. It says:

“Check finished: errors found in 1 files! You might encounter errors.”

That msg times out after 5 seconds, and proceeds to the desktop.

IMO, that doesn’t seem appropriate if checking for errors is so important that it’s less optional. If a failure is detected, I think it should not proceed automatically (someone might look away and not even know).

Personally, if it detects an error, I wouldn’t let it proceed at all. But, I imagine that’s too much of a change.

I think it would be useful to say:

“Check finished: errors found in 1 file! Try recreating the installation media. If you choose to proceed, results are unpredictable. Continue? (y/N)”

Again, I wouldn’t even allow proceeding. If errors were found, “errors have been encountered” already. Why allow it to go further? In what case would someone want to install using known corrupt media? (But, I know that might be a big change).

I agree that it’s important to perform the check rather than leave it to choice. (I’m an example: I ignored that option for years and probably had problems that were inexplicable.) But if it is important, I think the result msg should treat it more importantly than it presently does. I’ll let you guys decide how much more importantly.

4 Likes

I agree with you – knowing what I know now. You’d think it should be clear enough. But, from my own experience (disregarding the grub option, and hearing the occasional “did you run media check?”), it didn’t seem important to me. I write files all the time. How often do the writes not work? I viewed it that way. I thought someone was being excessively cautious, and there would never be a reason to run that grub option. I’ve never seen a file written that wasn’t correct. The odds seem astronomical.

And yet, in the past year (or 6 months), wxl’s frequently stressing the importance of checking the media finally caused me to start doing it. I’ve found errors. I would have never guessed that that was happening. It never happens with apt-get install. It never happens saving spreadsheets. It never happens restoring my home directory after distro hopping.

So, with that mindset, someone like me would easily hit the “s” key and think “c’mon! that never happens. Someone’s showing off with fancy over-complicated integrity checking. I’m wasting my time waiting for this.”

I’m serious. That’s exactly what I would have done the past 5-10 years (however long that grub option has existed and I ignored it.). I think it’s useful to stress that it really should be allowed to finish once. It’s not fluffy stuff. You know it’s not. But, I think most newbie’ish people would think it’s overkill and not seriously consider that something could be wrong. (Unetbootin completed successfully? Why should I sit here waiting for proof of something I already know?).

I would not underestimate people like me.

2 Likes

I don’t have a problem with what I’ve seen via Lubuntu 20.04 daily. My habit is always to run ‘Check disc for defects’ on first boot after media-write anyway and did so again today purely out of habit.

The S to skip message looks appropriate to me, I didn’t test skipping as I wanted to check media and I liked the progress screen. It takes awhile and my attention goes back to another screen, and next thing I know the system is running the ‘live’.

No report, it completed & then ran?? If it’s a positive result, fair enough & nice, but I’d like to be told.

This is likely exactly what happened and I missed it !

I repeated the process on the next box & same thing, I didn’t see any result. Third box & I saw it, but it felt shorter than 5 secs!

Maybe a longer period is required, or especially given I’m selecting the “Check disc for defects” option I’d suggest a keystroke be required before booting the ‘live’. I’d like not to have to dmesg |grep squashfs because I missed the message I asked for. If however it requires a key on fail I won’t be worried.

The not requiring me to reboot after a ‘Check disc for defects’ I did appreciate though !

2 Likes

I don’t mind that a success proceeds on its own. I do agree with @az2008 that a fail should prohibit continuing by default. A yes/no continue is reasonable at that point, but the default should be no.
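
As a sketch of that behaviour (success continues on its own, failure asks, and anything other than an explicit yes means no). The function names are made up for illustration and the prompt text is adapted from the suggestion earlier in the thread:

```python
def should_continue(answer: str) -> bool:
    """Interpret a (y/N) reply: only an explicit yes continues."""
    return answer.strip().lower() in ("y", "yes")


def after_check(failed_files: int, ask=input) -> bool:
    """Return True if booting should proceed after the media check."""
    if failed_files == 0:
        return True  # a successful check proceeds without any prompt
    reply = ask(
        f"Check finished: errors found in {failed_files} file(s)! "
        "Try recreating the installation media. Continue? (y/N) "
    )
    # Pressing Enter, or typing anything unexpected, falls through to "no".
    return should_continue(reply)
```

Because there is no timeout in this sketch, someone who looked away during the check would come back to the prompt rather than to a desktop running from bad media.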

2 Likes

I was thinking about this more. I agree with @ian-weisser that the current text is elegant, not distracting.

What if pressing “s” resulted in the hint of importance (which I feel needs to be seen–by people like me, who would have hit “s” all the time)? The msg could be simple:

It is important to see successful verification of this media at least once. Resume or skip? (_r_esume/_S_kip)

Or, more informative about why it’s important:

Warning: Media corruption is a frequent cause of unexpected problems. Return to verify? Or, continue with skip? (r/S)?

Those msg’s could even be combined to be more informative.

The nice things about that:

  1. The people who would let it run, and don’t need a verbose hint about its purpose, would continue to see today’s reasonable message.
  2. Only the target audience (people like me in the past) would see the “think about what you’re doing” message.
  3. The power users who know what they’re doing and hit “s” all the time wouldn’t be too burdened, because all they have to do is hit “s” a second time. (If key presses could be buffered, they would habituate to “ss” as fast as they can. That would be as convenient as the single “s” they’re accustomed to pressing.)
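
The two-step skip described in the list above can be sketched as a tiny state machine. This is purely illustrative: `handle_keys` is a made-up name, and real key handling in the boot environment would look nothing like a Python string loop.

```python
def handle_keys(keys: str) -> str:
    """Return "skip" or "verify" for a sequence of key presses.

    The first "s" only shows the warning; a second "s" confirms the skip,
    and "r" dismisses the warning and resumes verification.
    """
    warned = False
    for key in keys.lower():
        if not warned:
            if key == "s":
                warned = True  # show the "are you sure?" hint
        elif key == "s":
            return "skip"  # second "s" confirms
        elif key == "r":
            warned = False  # back to verifying
    return "verify"  # no confirmed skip: the check runs to completion
```

With buffered input, `handle_keys("ss")` gives `"skip"`, while a single `"s"` (or `"sr"`) still ends in `"verify"`.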

I could be overthinking that. But, I do believe performing verification by default is the right thing to do. The communication about it doesn’t matter as much. Anyone who “skips” will be more culpable than I was when I merely ignored the grub item.

But, I do think the error handling is problematic.

Sometimes I start something and look/walk away. I expect whatever it is to let me know if there’s something I need to know. (The existing error msg makes sense for when it was a grub item. Someone would deliberately verify their media. It was reasonable to think they’d have an interest in the outcome. Now it’s more of a convenience. It should be more convenient about error discovery. It shouldn’t require eyes on it.).

2 Likes

Another thought: It would be a nice touch if the corrupted filenames could be listed (instead of “errors found in 1 file!”).

Earlier I said I wouldn’t let the boot-up continue if errors were found. I know that would be a tough sell. But, if the problematic filenames were shown, that would help someone know whether to continue. If it’s just an efi file, maybe continue. But, if it’s a driver, a person might want to know that (if their wifi won’t connect).
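
A rough sketch of what reporting names instead of a count could look like. It assumes the checksums come as text in the usual md5sum output format (hash, whitespace, relative path); the function name and that assumption are mine, not taken from the actual checker.

```python
import hashlib
from pathlib import Path


def corrupted_files(media_root: str, manifest_text: str) -> list[str]:
    """Return the paths from the manifest whose contents do not match their hash."""
    bad = []
    for line in manifest_text.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        expected, rel_path = parts[0].lower(), parts[1].strip()
        h = hashlib.md5()
        try:
            with open(Path(media_root) / rel_path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            ok = h.hexdigest() == expected
        except OSError:
            ok = False  # a missing or unreadable file counts as corrupted
        if not ok:
            bad.append(rel_path)
    return bad
```

The result message could then print the returned list, so a user can judge whether the damaged file matters for their machine.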

Maybe that could be an idea for the future.

1 Like

Yesterday I was testing a Lubuntu daily image (20200304) with an old Thinkpad which required numerous boots (trying to figure out a wifi problem). I hit “s” (skip) a few times since I knew the LiveUSB was verified already. I don’t think that key does anything.

I’m booting on an old Dell Latitude now, and hit “s” again. Same result. It seems to ignore the “s” key until the very end, and then displays “canceled” instead of the results it went ahead and gathered as it ignored the key.

If others can recreate that, maybe the wording doesn’t matter? Maybe just get rid of the “skip” option and let it verify each time? (The only thing the “s” option appears to do is show you the results or not.).

1 Like

I’ve had issues with skipping on Lubuntu too. I haven’t yet been able to skip: over 4-5 boots of the same thumb drive on multiple days, I tried to skip 2-3 times on each of those days, so multiple ISOs have failed to skip. I assumed I was too slow, with my mind wandering to other things…

Same result pressing “S” or “s” on Lubuntu, though I’ve had Xubuntu daily skip.

1 Like