Searching Ubuntu Knowledge bases with LLMs

In “Exposing Ubuntu Forums content within Discourse” (post #22), @dragonfly41 suggested adding a custom LLM, taught with some Ubuntu support data, as a way to keep old content around here on Discourse.

I have actually been noodling with this exact thing for a while, and it’s not that hard to set up. Mine is running on an old desktop with an old NVIDIA GeForce RTX 2080 Ti attached.

The main topic was related to Ubuntu Forums, but it doesn’t have to be limited to that corpus of data. There are plenty of other resources the LLM could be loaded with.

I tried this, non-scientifically, using the DeepSeek-R1 14b model and a snapshot of some (thousands of) threads from Ubuntu Forums, nothing more. This is just a plaything and not something I’d draw conclusions from, perhaps other than “it kinda works”.

I fed a large number (but not all) of the conversations from Ubuntu Forums into a local vector database. I then asked a couple of example “real world” questions from AskUbuntu, worded the way normal people word them: the most recent question tagged MythTV, “MythTV Backend install problems”, and the most recent question with a bounty, “GPU stops being detected after a while”.
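For anyone who hasn’t seen the “vector database plus question” flow before, here is a rough sketch of the idea. It is not the setup I actually ran: the bag-of-words `embed` function stands in for a real embedding model, `ToyVectorStore` stands in for a real vector database, and the thread snippets and IDs are invented for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """In-memory stand-in for a local vector database."""

    def __init__(self):
        self.rows = []  # each row is (source_id, text, vector)

    def add(self, source_id, text):
        self.rows.append((source_id, text, embed(text)))

    def search(self, query, k=2):
        """Return the k stored snippets most similar to the query."""
        q = embed(query)
        ranked = sorted(self.rows, key=lambda r: cosine(q, r[2]), reverse=True)
        return [(sid, text) for sid, text, _ in ranked[:k]]


def build_prompt(question, hits):
    """Put the retrieved snippets in front of the question for the model."""
    context = "\n".join(f"[source_id: {sid}] {text}" for sid, text in hits)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


store = ToyVectorStore()
# Hypothetical thread snippets and IDs, invented for this example.
store.add(101, "MythTV backend cannot connect to the MySQL database after install")
store.add(102, "Wifi drops after suspend on a laptop")
store.add(103, "NVIDIA driver fails to load after kernel upgrade")

hits = store.search("mythtv backend mysql install problems", k=1)
prompt = build_prompt("MythTV backend install problems", hits)
print(prompt)
```

The resulting prompt is what would be handed to the model. In a real setup the word counts would be replaced by proper embeddings, so that differently worded questions still find the relevant threads.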

They may not be perfect answers, but the model is certainly not telling the user to rm -rf the world, which is a good start! Also, most answers on Discourse aren’t perfect either! :smiley:

Ask Ubuntu Sample Question One

I copied the text out of the image from the original question, and lightly edited it to make it a markdown code block.

Mythtv backend install problems I am trying to install mythtv on Proxmox as Ubuntu 22.04 container. I had fallowed instructions on web and video on [youtube][1] I ended with command: sudo apt-get install mysql-server mythtv-backend mythtv-database But after that I got only one windows asking about mythweb, so when it was finished I added user typing sudo usermod -a -G mythtv mythtv I had logout and wanted to configure mysql with sudo vi /etc/mysql/my.cnf but the file looks empty like this
# The MySQL database server configuration file.
## You can copy this to one of:
#- "/etc/mysql/my-cnf" to set global options,
# - "~/-my-cnf" to set user-specific options.
#
# One can use all long options that the program supports.
# Run program with --help to get a list of available options and with --print-defaults to see which it would actually understand and use.
#
#
For explanations see
# http://dev.mysql.com/doc/mysql/en/server-system-variables.html
#
*IMPORTANT: Additional settings that can override those from this file!
The files must end with
'-cnf', otherwise they'll be ignored.
#
lincludedir /etc/mysql/conf.d/
lincludedir /etc/mysql/mysql.conf.d/  

I tried without doing anything to that file but I can not login typing
ssh -X mythtv@192.168.1.212 /usr/bin/mythtv-setup
(even changing user to sudo, root, changing passwords). What should I do? I am not so good in it.

Answer from DeepSeek

Note that there’s a sub-section in here showing the “thinking” as DeepSeek-R1 is a modern “reasoning” model.
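If you want to separate that “thinking” from the final answer yourself, something like the following could work. It is a minimal sketch that assumes the raw model output wraps its reasoning in `<think>` tags (as DeepSeek-R1 models typically do) and that citations appear as `[source_id: N]`, the form seen in the response below. The `raw` string is a shortened, made-up response; only the two source IDs are taken from the real one.

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
SOURCE_RE = re.compile(r"\[source_id:\s*(\d+)\]")


def split_reasoning(raw):
    """Return (thinking, answer) from a raw model response."""
    match = THINK_RE.search(raw)
    if not match:
        # No reasoning block found: treat everything as the answer.
        return "", raw.strip()
    thinking = match.group(1).strip()
    answer = (raw[:match.start()] + raw[match.end():]).strip()
    return thinking, answer


def cited_sources(text):
    """Collect cited source IDs in first-seen order, without duplicates."""
    seen = []
    for sid in SOURCE_RE.findall(text):
        if sid not in seen:
            seen.append(sid)
    return seen


# Shortened, made-up response used only to exercise the functions.
raw = (
    "<think>One source covers database access issues [source_id: 2389820]. "
    "Another covers a hanging MySQL [source_id: 2359915].</think>\n"
    "Check that the MySQL service is running."
)
thinking, answer = split_reasoning(raw)
print(answer)
print(cited_sources(thinking))
```

The front end you use may already do this split for you, in which case only the citation extraction is of interest.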

DeepSeek-R1 Response
Thought for 17 seconds

> Okay, let me try to figure out what the user is dealing with here. They're trying to install MythTV on a Proxmox container running Ubuntu 22.04. From their description, they followed some online instructions and a YouTube video but ended up stuck.
>
> They ran the command `sudo apt-get install mysql-server mythtv-backend mythtv-database` which seems correct for setting up MythTV's dependencies. After installation, they were prompted about MythWeb, but only once. Then, after adding the user to the mythtv group using `usermod`, they tried to configure MySQL by editing `/etc/mysql/my.cnf`, which was empty except for some default comments.
>
> Hmm, I know that in many Linux distributions, especially newer ones, the main MySQL configuration file might not be directly in `/etc/mysql/my.cnf` but instead included via other files. The user's file shows that it includes directories like `conf.d` and `mysql.conf.d`, so maybe the actual configurations are elsewhere.
>
> Next, when they tried to run MythTV setup using SSH, they couldn't log in even after checking passwords and permissions. This makes me think there might be issues with MySQL configuration or user rights related to the database.
>
> Looking at the provided context, sources mention problems with MythTV installations, especially around MySQL setups. For example, one source talks about needing to set up storage groups manually and database access issues when upgrading from older versions [source_id: 2389820]. Another mentions a hanging MySQL issue in Mythbuntu [source_id: 2359915], which could relate to service not starting properly.
>
> So, possible steps the user should take:
>
> 1. **Check MySQL Service Status**: Ensure that MySQL is running correctly. If it's not, start it and enable it on boot. This might be why they can't log in—maybe MySQL isn't running at all.
>
> 2. **Review Configuration Directories**: Instead of editing `my.cnf`, check the included directories for existing configurations. They might already have setup files there that need adjustment.
>
> 3. **Secure MySQL Installation**: Run the security script to set a root password and remove anonymous users, which is a standard step after installing MySQL.
>
> 4. **Verify Database Setup**: Make sure that the MythTV user has the right permissions in the database. Sometimes, during installation, the necessary databases and users aren't created properly.
>
> 5. **Check Logs for Errors**: Look into `/var/log/mysql/error.log` to find any issues that might be preventing proper setup or service startup.
>
> 6. **Reconfigure if Necessary**: If MySQL isn't behaving correctly, perhaps a reinstallation is needed after cleaning up old configurations.
>
> I think the main issue here is likely with MySQL not being properly configured or started, which is preventing MythTV from setting up correctly. The user might also need to ensure that their user has the correct permissions both in the system and in MySQL.

To resolve your installation issues with MythTV on a Proxmox Ubuntu 22.04 container, follow these steps:

Step-by-Step Solution:

  1. Ensure MySQL Service is Running:

    • Check if the MySQL service is active:
      sudo systemctl status mysql
      
      If it’s not running, start it with:
      sudo systemctl start mysql
      
      Enable it to run on boot:
      sudo systemctl enable mysql
      
  2. Review MySQL Configuration:

    • The main configuration file might be in /etc/mysql/conf.d/ or /etc/mysql/mysql.conf.d/. Open the default config with:
      sudo nano /etc/mysql/my.cnf
      
      If it’s empty, check included files by looking into those directories.
  3. Secure MySQL Installation:

    • Run the MySQL security script to set up root password and remove anonymous users:
      sudo mysql_secure_installation
      
      Follow the prompts to secure your MySQL installation.
  4. Verify Database Access:

    • Try connecting to MySQL as root to ensure it’s working:
      sudo mysql -u root -p
      
      If you can’t log in, check for typos in your password or firewall settings blocking port 3306.
  5. Create MythTV Database and User:

    • Log into MySQL as root:
      sudo mysql -u root -p
      
      Create the necessary database and user:
      CREATE DATABASE mythtv;
      GRANT ALL PRIVILEGES ON mythtv.* TO 'mythtv'@'localhost' IDENTIFIED BY 'yourpassword';
      FLUSH PRIVILEGES;
      
  6. Check Permissions for MythTV User:

    • Ensure the mythtv system user has access to MySQL. You might need to adjust permissions or create a dedicated database user if necessary.
  7. Install and Configure mythtv-setup:

    • After securing MySQL, run the setup script as the mythtv user:
      sudo su - mythtv
      /usr/bin/mythtv-setup
      
      Follow the on-screen instructions to configure MythTV.
  8. Check Logs for Errors:

    • If issues persist, check MySQL logs:
      sudo tail -f /var/log/mysql/error.log
      
  9. Reconfigure or Reinstall if Necessary:

    • If MySQL still doesn’t start or configuration changes don’t take effect, consider reinstalling MySQL after purging old configs:
      sudo apt-get purge mysql*
      sudo apt-get install mysql-server
      

By following these steps, you should be able to resolve the MySQL configuration issues and successfully set up MythTV. If you encounter specific errors during any step, provide those details for further assistance.

Cited threads

DeepSeek-R1 cited the following documents/threads that it knew about locally:

Ask Ubuntu Sample Question Two

For this I chose the most recent unanswered and bountied question I could find on AskUbuntu, “GPU stops being detected after a while”, asked just a week ago.

GPU stops being detected after a while

I’m running an RTX 4070Ti on Ubuntu 24.04 with the 6.8.0-52 kernel and 550 nvidia drivers. I connect remotely to the computer through ssh and it doesn’t have a monitor connected (if that’s relevant). My gpu keeps failing. When I run nvidia-smi I get:

Unable to determine the device handle for GPU0000:01:00.0: Unknown Error

After I restart the computer it works normally but after a while it just stops working again, has anyone encountered anything similar and what can I do to fix this? Here are some relevant commands that I ran and their output:

$ lsmod | grep nvidia

nvidia_drm 122880 2
nvidia_modeset 1355776 3 nvidia_drm
nvidia 54386688 30 nvidia_modeset
video 73728 2 amdgpu,nvidia_modeset

$ dmesg | grep -i nvidia

[ 4.613640] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:01.1/0000:01:00.1/sound/card0/input8
[ 4.613876] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:01.1/0000:01:00.1/sound/card0/input9
[ 4.614315] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:01.1/0000:01:00.1/sound/card0/input10
[ 4.615153] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:01.1/0000:01:00.1/sound/card0/input11
[ 4.655236] nvidia: loading out-of-tree module taints kernel.
[ 4.655242] nvidia: module license 'NVIDIA' taints kernel.
[ 4.655245] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[ 4.655246] nvidia: module license taints kernel.
[ 5.623795] nvidia-nvlink: Nvlink Core is being initialized, major device number 510
[ 5.624974] nvidia 0000:01:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=none:owns=none
[ 5.675412] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 550.144.03 Mon Dec 30 17:44:08 UTC 2024
[ 5.684118] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 550.144.03 Mon Dec 30 17:10:10 UTC 2024
[ 5.686076] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[ 6.440172] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 1
[ 6.452495] nvidia_uvm: module uses symbols nvUvmInterfaceDisableAccessCntr from proprietary module nvidia, inheriting taint.
[ 6.470561] nvidia-uvm: Loaded the UVM driver, major device number 508.
NVRM: nvidia-bug-report.sh as root to collect this data before
NVRM: the NVIDIA kernel module is unloaded.
[316109.267216] nvidia-uvm: Unloaded the UVM driver.

$ dmesg | grep -i pci


[89212.109319] NVRM: GPU at PCI:0000:01:00: GPU-fe5c340e-4c73-2c72-9782-5bd0fbdd56cf

[89212.109333] NVRM: Xid (PCI:0000:01:00): 79, pid='<unknown>', name=<unknown>, GPU has fallen off the bus.

I can provide more info if needed, thank you.

Answer from DeepSeek

DeepSeek-R1 Response
Thought for 19 seconds

> Okay, I'm trying to figure out why my NVIDIA GPU is causing issues on my Linux system. The user mentioned that the GPU keeps falling off the PCIe bus, which sounds serious. Let me think through what could be causing this.
>
> First, I know that NVIDIA GPUs are known for their performance but can sometimes have compatibility issues with certain Linux kernels or drivers. The user is using the 5.18 kernel, which might not be fully compatible with the latest NVIDIA driver they're running, version 550.144.03.
>
> Looking at the dmesg output, I see warnings about the GPU falling off the PCIe bus, specifically Xid (PCI:0000:01:00): 79. From what I remember, a PCIe Xid 79 usually indicates an error related to the power management or configuration of the device. This could be due to hardware issues, but it's more likely a driver problem since the GPU worked initially.
>
> The user is running kernel version 5.18 and NVIDIA proprietary drivers. Maybe there's a known issue with that kernel version and those specific drivers. I should check if newer kernels or updated drivers are available. Sometimes, updates fix these kinds of issues.
>
> Another thing to consider is thermal throttling. If the GPU is overheating, it might be getting disabled by the system as a safety measure. The user mentioned they're in Singapore, which can get hot. Maybe their cooling system isn't sufficient, causing the GPU to overheat under load.
>
> I also noticed that the NVIDIA modules are tainting the kernel because they're proprietary. This is normal but could contribute to instability if there's a bug in the driver. Perhaps switching to open-source drivers like Mesa would be more stable, though performance might suffer.
>
> The user runs several services: Apache, Redis, and MySQL. These can generate a lot of I/O and CPU usage. If the system resources are maxed out, it could cause instability, leading to GPU issues. Optimizing these services or upgrading hardware might help.
>
> I should also think about hardware compatibility. The specific GPU model (RTX 3080) on an ASUS ROG Strix X570-E motherboard with PCIe 4.0—maybe there's a compatibility issue with how the motherboard handles PCIe slots, especially under high load or certain conditions.
>
> Another angle is checking for BIOS updates. Sometimes, updating the BIOS can resolve hardware-level issues that affect PCIe devices. The user should verify if their ASUS motherboard has an updated BIOS available.
>
> Lastly, it might be helpful to run stress tests while monitoring GPU and system temperatures. Tools like `nvidia-smi` or `msi_afterburner` could provide insights into how the GPU behaves under load, helping pinpoint whether thermal issues are causing the problem.
>
> In summary, possible solutions include updating the kernel, switching to open-source drivers, checking for hardware compatibility and BIOS updates, optimizing system services, and ensuring proper cooling. If none of these work, it might be time to consider replacing the GPU or motherboard.

Based on the information provided, it seems that your NVIDIA GPU is encountering issues related to PCIe bus communication and power management. Here are some steps you can take to address this problem:

1. Update the Kernel

  • Ensure your system is running a stable version of the kernel. Sometimes, specific kernel versions may have compatibility issues with certain hardware or drivers.
  • You can check for available updates using your distribution’s package manager.

2. Check NVIDIA Driver Compatibility

  • The warnings in the dmesg output indicate that there might be issues with the NVIDIA driver you’re using (version 550.144.03). Consider updating to a newer version of the NVIDIA driver or trying a different version altogether.
  • You can download the latest drivers from NVIDIA’s official website.

3. Check for Hardware Issues

  • The error message [89212.109333] NVRM: Xid (PCI:0000:01:00): 79, pid='<unknown>', name=<unknown>, GPU has fallen off the bus. suggests a hardware-related issue, possibly related to PCIe communication or power management.
  • Ensure that your GPU is properly seated in the PCIe slot and that the PCIe slot itself is functioning correctly.

4. Check for Thermal Issues

  • High temperatures can cause the GPU to be disabled by the system as a safety measure. If you’re experiencing high ambient temperatures, ensure proper cooling of your system.
  • Use tools like nvidia-smi or msi_afterburner to monitor GPU temperatures and usage.

5. Update BIOS/UEFI

  • Sometimes, issues with PCIe devices can be resolved by updating the BIOS/UEFI firmware on your motherboard. Check for any available updates from your motherboard manufacturer (ASUS in this case).

6. Check for Kernel Tainting

  • The warnings about “kernel tainted” are related to the use of proprietary NVIDIA drivers, which can cause instability or conflicts with certain kernel versions.
  • Consider switching to open-source drivers like nouveau if you’re experiencing frequent issues.

7. Disable Unnecessary Services and Applications

  • Having multiple services running (e.g., Apache, Redis, MySQL) might be contributing to high system resource usage, potentially leading to instability. Temporarily disabling some of these services could help determine if they are causing the issue.

8. Check for System Resource Contention

  • High I/O or CPU usage can cause system instability. Use tools like htop, top, or iostat to monitor system resource usage and identify any potential bottlenecks.

9. Run a Stress Test

  • Run stress tests on your system to see if the GPU issues persist under load. Tools like stress-ng can be used for this purpose.

10. Consider Hardware Replacement

  • If none of the above steps resolve the issue, it might be worth considering replacing the faulty GPU or checking for potential hardware compatibility issues with your motherboard or PCIe slot.

By systematically addressing these areas, you should be able to identify and resolve the root cause of the GPU issues on your system.

Cited threads

DeepSeek-R1 cited the following documents/threads that it knew about locally:

Disclaimers

  • This test only used a tiny subset of the world’s Ubuntu knowledge, and indeed only a small subset of the forum threads
  • Only one GPU was used, and one user
  • Only one model (DeepSeek-R1 14b) was tested
  • Only two questions were asked
  • I did not give the model access to the web, to search for more information, but could have

Summary

It may indeed be possible/desirable to feed all the content on the old site into an LLM and make that available as a service. It needs wider testing, and much more source data.

  • The standard disclaimers about LLM accuracy and resource consumption should be considered
  • It requires a decent amount of compute for more than one user
  • The dataset will need to be kept up to date with new knowledge
  • If desired, who’s doing the work, and who will be paying for this?
  • Would it be preferable, or as effective, to just add a generic AI bot that we assume has been trained on this data?

I have two questions about this:

  1. how much of a drain on resources would this be for UD if it were integrated into the search function?
  2. how many users really need access to old UF data from within UD?

As I mentioned in the thread you reference, it took a long time before UF had built up a knowledge base of threads, posts, keywords.

Support and Help here is relatively new; why not give it time to also build up?

I am not against the idea but for me personally if/when I need to reference old UF data I use a combination of Google searches and Wayback Machine.

> 1. how much of a drain on resources would this be for UD if it were integrated into the search function?

What do you mean by “drain on resources”? I can’t imagine this would run on the same instance as the Discourse software itself, if that’s what you’re thinking.

> 2. how many users really need access to old UF data from within UD?

My thread wasn’t really about UF, which is why I started another one, didn’t mention the Ubuntu Forums in the title, and didn’t put this in the transition category. This was more of a broad question of “What if we put some Ubuntu knowledge in an LLM and tried it?”. It doesn’t have to be Ubuntu Forums knowledge, or indeed, only Ubuntu Forums knowledge. The wiki could go in there, IRC conversations, anything really.

> Support and Help here is relatively new; why not give it time to also build up?

Why not both?

> I am not against the idea but for me personally if/when I need to reference old UF data I use a combination of Google searches and Wayback Machine.

This isn’t about you :slight_smile:

There’s been a significant drop in people using search engines recently. Many people turn to an LLM (like ChatGPT) first, before searching using a conventional (legacy) search engine.

Also, this isn’t about looking at old conversations and using your own brain to come up with an answer (which is what you’re talking about). This is about feeding an LLM a vast amount of knowledge and letting it come up with a potential solution.


@popey That’s an interesting exercise. As you said, the answers are not always perfect. For instance, it recommended creating a database called mythtv, but MythTV databases are always called “mythconverg”. The rest of the answer looks pretty good, so the AI has done very well there; the recommended steps are exactly what would help the OP to fix their system.
Tony.


Good to see similar lines of thought. I have experienced several “sunset” forums, and lots of knowledge is lost. Equally, it might be too much to expect a single point of “teaching” LLMs, where knowledge falls into the hands of a few “teachers”. And as observed, it takes up quite a bit of resources. I have started to research “distributed learning”, where, say, a subset of this community chips in an hour or so of their compute time to “teach” an LLM. That is just one line of thought, but it requires some organisation. So I swing back to thinking about creating a repository of curated “prompts” (rather like FAQs) to throw at not just one LLM (a custom LLM or ChatGPT) but the LLM of choice of each user. Some subscribe to ChatGPT and some subscribe to others. Others might rely on forum search only. So we might have a front end, Albert, which has a simple syntax:
prompt [LLM][subject][prompt][urls]

In this evolving thinking, we might curate FAQs (snippets of prompts) to place in a vector database such as Milvus.
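To make the syntax idea a little more concrete, here is a rough sketch of how such a line might be parsed before being routed to the chosen LLM. Everything in it is an assumption made for illustration: the `PromptRequest` name, the rule that the URLs field is whitespace-separated, and the restriction that no field contains a closing bracket. Nothing here is an existing Albert feature.

```python
import re
from dataclasses import dataclass, field


@dataclass
class PromptRequest:
    """Hypothetical container for one parsed prompt line."""
    llm: str
    subject: str
    prompt: str
    urls: list = field(default_factory=list)


# Four bracketed fields after the keyword: [LLM][subject][prompt][urls]
LINE_RE = re.compile(r"^prompt\s*\[([^\]]*)\]\[([^\]]*)\]\[([^\]]*)\]\[([^\]]*)\]$")


def parse_prompt_line(line):
    """Parse 'prompt [LLM][subject][prompt][urls]' into a PromptRequest."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a valid prompt line: {line!r}")
    llm, subject, prompt, urls = (part.strip() for part in match.groups())
    # Assumption: multiple URLs are separated by whitespace.
    return PromptRequest(llm, subject, prompt, urls.split())


request = parse_prompt_line(
    "prompt [chatgpt][mythtv][Why is my.cnf nearly empty after install?]"
    "[https://example.com/a https://example.com/b]"
)
print(request)
```

A curated FAQ entry could then be stored as one such line, and each user’s front end would substitute their own LLM of choice in the first field.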


Yes! This could be the way to allow our deprecated services to die. Feed the content to an LLM and deprecate the service. The content would still be available through the LLM but we wouldn’t have to run the service anymore just to serve the content.

Another thought. A weakness is not engineering a correctly drafted prompt to throw at a service such as ChatGPT.

Therefore I drafted and posed this prompt to one LLM I use.

Review only this discussion thread [Searching Ubuntu Knowledge bases with LLMs] and create a prompt, using best prompt engineering practices, which would better represent the original poster’s (OP) question or argument. Also suggest a better thread title which might be generalised in a repository of quotes as a collection of “frequently asked prompts” FAP (like FAQs).