Hello all!
I figure I’m doing something wrong (or incomplete) but I’m not certain what’s going on. Figured I might as well ask.
I have the LXD snap set up, and when I ran the init I pointed it at an existing zpool (mirrored). So the default pool itself is ZFS on partitions across a couple of drives.
The error is occurring with my BOSH CPI. It was working, but I was having an issue where the image was copied on every VM creation (it was never cached). I finally realized (I think) that this was because I was specifying the size of the root disk. Once I stopped doing that, the image copy was instantaneous (and I never saw the QCOW2 file being copied). That isn't directly pertinent to this question, but it may have been hiding this issue by slowing VM creation down.
Posting the larger logs from the BOSH create below. At this particular stage, all the packages are being compiled and 5 VMs are being spun up. 4 of them are fine and their packages get compiled. The 5th one (uncertain which, since the logs don't say) never gets created: after the VM is created (in the stopped state), a 10 GiB ephemeral disk is created for it. 4 of those disks get created, but the 5th one, boshdev_vol-e-62116b9e-92a6-41ad-4059-65d083399ed7, fails with the "Failed to activate volume: Failed to locate zvol" error:
Task 9 | 17:51:19 | Preparing deployment: Preparing deployment (00:00:02)
Task 9 | 17:51:21 | Preparing deployment: Rendering templates (00:00:01)
Task 9 | 17:51:22 | Preparing package compilation: Finding packages to compile (00:00:00)
Task 9 | 17:51:22 | Compiling packages: golang-1-linux/2721afc7ec762ad6364f4910407eb0dda5541f1288c9fccae9c75d33d7e07aff
Task 9 | 17:51:22 | Compiling packages: postgres-yq-4/dc26c48bc847408082bf497930f6ba6775935da6c143b2016bdfe3b8ba5b554c
Task 9 | 17:51:22 | Compiling packages: postgres-13/4534645edb722f4f365c67e374063bdd59c210af9eba5f961b9ad55ab1db84d1
Task 9 | 17:51:22 | Compiling packages: postgres-15/3fe5c61e6b6779ce8a2c127047e2572c200f765afce0df76510aebce15aff61b
Task 9 | 17:51:22 | Compiling packages: postgres-16/eefdc0f0bb8341f74113e36ba4604b46ca67cf54a82e57cb901470be38ba46e1
Task 9 | 17:51:55 | Compiling packages: golang-1-linux/2721afc7ec762ad6364f4910407eb0dda5541f1288c9fccae9c75d33d7e07aff (00:00:33)
L Error: CPI error 'Bosh::Clouds::CloudError' with message 'Create ephemeral disk: Failed to activate volume: Failed to locate zvol for "lxd-storage/custom/boshdev_vol-e-62116b9e-92a6-41ad-4059-65d083399ed7": context deadline exceeded' in 'create_vm' CPI method (CPI request ID: 'cpi-901775')
Task 9 | 17:52:16 | Compiling packages: postgres-yq-4/dc26c48bc847408082bf497930f6ba6775935da6c143b2016bdfe3b8ba5b554c (00:00:54)
Task 9 | 17:56:29 | Compiling packages: postgres-13/4534645edb722f4f365c67e374063bdd59c210af9eba5f961b9ad55ab1db84d1 (00:05:07)
Task 9 | 17:56:42 | Compiling packages: postgres-15/3fe5c61e6b6779ce8a2c127047e2572c200f765afce0df76510aebce15aff61b (00:05:20)
Task 9 | 17:56:53 | Compiling packages: postgres-16/eefdc0f0bb8341f74113e36ba4604b46ca67cf54a82e57cb901470be38ba46e1 (00:05:31)
Task 9 | 17:56:56 | Error: CPI error 'Bosh::Clouds::CloudError' with message 'Create ephemeral disk: Failed to activate volume: Failed to locate zvol for "lxd-storage/custom/boshdev_vol-e-62116b9e-92a6-41ad-4059-65d083399ed7": context deadline exceeded' in 'create_vm' CPI method (CPI request ID: 'cpi-901775')
What’s confusing here is that it works for 4 of 5 disks. Fortunately, the Go code is pretty much just LXD API calls:
func (c CPI) createDisk(size int, name string) error {
	storageVolumeRequest := api.StorageVolumesPost{
		Name:        name,
		Type:        "custom",
		ContentType: "block",
		StorageVolumePut: api.StorageVolumePut{
			Config: map[string]string{
				"size": fmt.Sprintf("%dMiB", size),
			},
		},
	}
	return c.client.CreateStoragePoolVolume(c.config.Server.StoragePool, storageVolumeRequest)
}
Where, in the problem case, size=10240 and name=vol-e-62116b9e-92a6-41ad-4059-65d083399ed7 ("e" for ephemeral, followed by a UUID)… just like the other four.
I can’t find any errors in the Zpool:
$ zpool status
  pool: lxd-storage
 state: ONLINE
config:

	NAME           STATE     READ WRITE CKSUM
	lxd-storage    ONLINE       0     0     0
	  mirror-0     ONLINE       0     0     0
	    sdb1       ONLINE       0     0     0
	    sdc1       ONLINE       0     0     0

errors: No known data errors
When I was watching the system load, it was not very high (around 2, I think).
I found nothing in dmesg. I'm also uncertain where the ZFS logs are (if there is anything specific).
Any thoughts on what I should be looking at? I'm assuming this isn't really an LXD issue.
Thanks!
-Rob