I currently work on running ML algorithms in resource-constrained environments, and I've been working on an architectural proposal to bring AI-powered natural language search natively to the Linux desktop. I'd love to get some feedback from the Ubuntu community on the technical design. I have already posted this project as a proposal for Google Summer of Code 2026 under the GNOME Foundation.
The Problem
Currently, desktop search on Ubuntu (via the GNOME overview) relies heavily on exact keyword matching and metadata. If you search for “Python script where I annotated the dataset”, standard indexers will fail to retrieve a file named ai_lecture.md.
Proprietary OS vendors are solving this with intrusive features (like Windows Recall) that rely on continuous screen recording or cloud-tethered LLMs, which are massive privacy nightmares.
The Proposed Solution
I am proposing a 100% offline, privacy-first Semantic Extractor pipeline that integrates directly into Ubuntu’s existing GNOME stack. It locally converts text, markdown, and source code into vector embeddings, allowing users to query their personal files using natural language natively from the Ubuntu search bar.
To keep battery drain negligible and avoid UI stutter, this relies heavily on Edge AI optimization techniques rather than standard heavyweight desktop AI.
Proposed Architecture
Since Ubuntu ships with GNOME’s localsearch (formerly Tracker Miners) by default, the goal is to build an extension rather than a standalone app, avoiding duplicate file-watching overhead:
The File Watcher (localsearch): We completely rely on the existing localsearch daemon. This ensures we respect Ubuntu’s existing power management (pausing indexing on battery power) and sandbox rules.
The Semantic Extractor: We introduce a new extractor plugin. For plain text, it uses recursive character splitting. For source code, it uses tree-sitter bindings to chunk files by semantic AST boundaries (functions/classes); there is a rough sketch of this just after the component list.
The ML Inference Engine (ort): The extracted chunks are passed to an INT8 quantized all-MiniLM-L6-v2 model (~22MB) running purely on the CPU via ONNX Runtime. By treating the desktop CPU like an edge device, inference takes milliseconds without waking up the GPU.
Vector Storage (sqlite-vec & tinysparql): Since GNOME’s tinysparql is backed by SQLite, we can utilize the highly optimized, C-based sqlite-vec extension. Embeddings are stored locally and relationally linked to the existing file URNs.
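To make the extractor step concrete, here is a minimal sketch of the AST-based chunking for Python sources. It assumes the py-tree-sitter bindings plus the tree-sitter-python grammar package (the exact binding API shifts slightly between releases), so treat it purely as an illustration of the idea rather than the shipped, compiled extractor:

```python
# Hedged sketch: chunk a Python file along top-level AST boundaries using
# the py-tree-sitter bindings and the tree-sitter-python grammar package.
import tree_sitter_python as tspython
from tree_sitter import Language, Parser

PY_LANGUAGE = Language(tspython.language())
parser = Parser(PY_LANGUAGE)  # older releases use parser.set_language(...)

def chunk_python_source(source: str) -> list[str]:
    """Split a source file into one chunk per top-level function/class."""
    data = source.encode("utf-8")
    tree = parser.parse(data)
    chunks = []
    for node in tree.root_node.children:
        # Keep whole functions and classes together so each embedding
        # carries one self-contained semantic unit.
        if node.type in ("function_definition", "class_definition",
                         "decorated_definition"):
            chunks.append(data[node.start_byte:node.end_byte].decode("utf-8"))
    return chunks
```

Plain text and Markdown would go through the recursive character splitter instead; only source code takes the AST path.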
I want to ensure this architecture aligns with Ubuntu’s performance and packaging standards. I would love the community’s thoughts on:
Storage Layer: Should sqlite-vec be injected directly into the tinysparql SQLite connection, or should it run as a parallel database that simply maps to localsearch URNs?
Packaging: For a system-level search extension like this, what are the implications for packaging it as a Snap vs. a native .deb component?
I’d appreciate any technical feedback or edge cases I should consider regarding battery life, memory management, or integration with the Ubuntu desktop experience.
The proposal is mighty complicated for me. I have one of the most cluttered desktops imaginable, and what you are suggesting is: show me files with these context clues, though I don't know when or where. I also deal with the chaos of ML algorithms as they dump files into ~/Downloads, for example "in which earlier Tasks did I discuss this?" I studied context vectors pre-2000, but today a simple solution is to leverage Recoll, provided that you have some memory of what the file contained. It is a powerful, sweet-running engine. So if you can remember that it was a Markdown file, narrow it down with [ext:md], then follow with some context clues.

I even convinced an AI Agent to run Recoll in its Ubuntu sandbox so that the search through Tasks can be done on the AI server side. I engineered a protocol for this "peer to peer" exchange. This becomes necessary when AI agents lose memory and repeat past work.

So, back to the desktop: harness Recoll. Then there is another tip. Install Albert and leverage the Files extension: [Ctrl+Shift+Space], then Files(space), and you are there. Your Files universe. The Ubuntu tools are sitting there to use in different ways. Use existing tools rather than building new ones. Apply Dijkstra's principles. Finally, to see the power of Recoll, hover your cursor over the Recoll query form and a cheatsheet pops up. See if your search pattern fits.
But won't the idea of a 100% offline system be compromised? I wouldn't want my data handed over to a third-party AI Agent on a server running for hundreds of other users.
Anyway, your idea is great and I would start off using the existing tools.
Thank you for your response.
My immediate focus for the MVP is actually on parsing and embedding source code files first (using tree-sitter). Since programming languages and their AST semantics rely heavily on standard English keywords and syntax, the English-centric model works well for this initial development phase.
Once the code ingestion pipeline is stable, the ONNX Runtime architecture makes it trivial to swap in a multilingual drop-in replacement (like paraphrase-multilingual-MiniLM-L12-v2 or multilingual-e5-small) for general text and markdown, at the cost of only a modest bump in memory (~40 MB). I will update the roadmap so that a multilingual model becomes the default for the production release, to serve the global community.
I agree that Agents grab ideas; I have seen that in my own usage. So it is a balance of risk and reward: how to retain sovereignty of knowledge. That is why I pursue the aims of AI safety, security, and sovereignty, research I have dubbed AIVectors (R). I suggest that you research the Proton Foundation and the use of Proton Lumo offline for an Edge-User-curated knowledge base. Lumo lives in its Edge User secure cocoon, including offline, which is useful in an education context given the current debate about social media influence. Use locally acquired custom knowledge.
I completely agree with your take on AI agents and the balance of risk versus reward. The idea of "Sovereignty of Knowledge" is exactly what this project is about. I'm definitely going to read up on your AIVectors research; it sounds exactly like the kind of privacy-first approach we need right now.
Your point about the Proton Foundation and Lumo is also great. It really proves that users want the benefits of AI to help sift through their data, but they need it to live in a secure, offline cocoon.
You also made a really strong point about using existing tools rather than building entirely from scratch. Honestly, taking an incredibly powerful engine like Recoll and finding a way to integrate it natively into the GNOME/Ubuntu ecosystem, perhaps enriching it with a local embedding layer for semantic queries, is a direction I am actively considering. The goal isn't to reinvent the wheel, but simply to give the native desktop that capability on the Edge.
Thanks again for pointing me toward Lumo and your research.
Consider instructing your AI Agent to follow "your" rules of engagement, not theirs. Retain sovereignty of Tasks. For example, try this in your next prompt: "[Follow these instructions (attached) but I require you as AIAgent to comply fully with Shannon's Law of Information Theory. You will not render at the AI server but instead send instructions to render at the Edge User environment. This requires you to teach me how to drive my local environment. Furthermore I require you to download this entire discourse in a CherryTree container, extension *.ctd.]"
Truly, avoid the need for "integrating" and think of "orchestrating". Ubuntu has a rich symphony orchestra (tools as "players"), so think instead of writing the music scores. I class myself now as an "AI Playwright". Orchestrate in terms of Shakespearean Acts and Plays. Different tools will be in stage prompts. Today I am looking at Jupyter Notebooks, tomorrow something else. So write orchestration scripts, and that includes local UI orchestration; try managing the Blender UI, for example. So "Gnome" is a red herring. Think above Gnome.
Thanks for sharing your workflow! To keep it brief, there is a fundamental difference in our two approaches:
1. Orchestration (Your approach): You are using AI agents as a "playwright" to actively run scripts, manipulate UIs (like Blender), and manage tools. This is fantastic for power-user personal automation.

2. Integration (My project): I am building a native, invisible OS feature. It must be a compiled, low-level binary (Rust/C++) that runs silently in the background via D-Bus with a 40 MB memory limit, without the user ever knowing it's there.
Both are valid ways to use local AI, but I am specifically building a lightweight system daemon, not an active workflow agent.
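To make the "invisible system daemon" distinction concrete, here is a minimal sketch of the kind of D-Bus surface such a service would expose. It is written in Python with pydbus purely for illustration (the shipped component would be the compiled Rust/C++ daemon), and the org.example.SemanticSearch1 interface, the Search method, and the returned URI are hypothetical placeholders rather than any existing GNOME API:

```python
# Illustrative-only D-Bus service (pydbus for brevity; the real daemon
# would be compiled Rust/C++). Interface/method names are hypothetical.
from gi.repository import GLib
from pydbus import SessionBus

class SemanticSearchService:
    """
    <node>
      <interface name='org.example.SemanticSearch1'>
        <method name='Search'>
          <arg type='s' name='query' direction='in'/>
          <arg type='as' name='uris' direction='out'/>
        </method>
      </interface>
    </node>
    """
    def Search(self, query):
        # Embed the query, ask the local vector index for the nearest
        # chunks, and return their file URIs; stubbed out here.
        return ["file:///home/user/notes/ai_lecture.md"]

bus = SessionBus()
bus.publish("org.example.SemanticSearch1", SemanticSearchService())
GLib.MainLoop().run()
```

In the real integration, results would more likely flow through GNOME Shell's existing search-provider D-Bus interface rather than a custom one, which is exactly why it can stay invisible to the user.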
Swinging back to your diagram, then, I would ditch SQLite and look at Milvus/Zilliz to analyse context.

Rust Crate: The official Rust SDK for Zilliz Cloud and Milvus operations is available via the zilliz crate on crates.io.
Rust Architecture: Zilliz Cloud provides low-latency vector retrieval with a purpose-built Rust engine architecture.
Capabilities: It provides full lifecycle support, from embedding pipelines to high-performance vector retrieval with Rust-based engines.
Thanks for the suggestion! You are totally right that Milvus and Zilliz are absolute powerhouses in the vector DB space. If I were building a centralized AI server or an enterprise RAG pipeline, that would be my first choice.
However, for a native desktop OS daemon, introducing Milvus/Zilliz completely breaks the architectural constraints of the project:
1. The Memory Footprint Constraint
A desktop background indexer must be invisible to the user. My hard constraint is keeping the daemon’s total memory footprint under 40–80 MB. Milvus (even local standalone instances) typically requires hundreds of megabytes to gigabytes of RAM just to idle, plus heavy runtime overhead. We cannot steal that much RAM from the user’s web browser or games just for background search.
2. Embedded vs. Client-Server

sqlite-vec runs strictly in-process as an embedded database. It requires zero user configuration, zero background server processes, and zero network sockets. Milvus requires a separate server architecture, which is too heavy to ship as a default desktop feature.
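As a tiny illustration of that in-process point, sketched with the sqlite-vec Python bindings (the database file name is a placeholder, and the production daemon would load the same C extension from Rust/C++ instead):

```python
# Load sqlite-vec into an ordinary SQLite connection: no server process,
# no socket, no configuration. The index file name is a placeholder.
import sqlite3
import sqlite_vec

db = sqlite3.connect("semantic-index.db")
db.enable_load_extension(True)
sqlite_vec.load(db)              # registers the vec0 virtual-table module
db.enable_load_extension(False)

# One virtual table holds the 384-dimensional chunk embeddings.
db.execute(
    "CREATE VIRTUAL TABLE IF NOT EXISTS chunks USING vec0(embedding float[384])"
)
```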
3. The 100% Offline Guarantee
You mentioned Zilliz Cloud—this inherently breaks the strict “privacy-first, zero-network” rule of the project. The OS should never send personal file vectors to a cloud service. Everything must run entirely on the edge CPU.
4. The OS Standard
Ubuntu and GNOME’s existing indexer (localsearch/tinysparql) is already built entirely on SQLite. By using sqlite-vec, I can inject vector capabilities directly into the OS’s existing database architecture rather than forcing users to install a massive parallel database.
Milvus is definitely the king of heavy-duty vector retrieval, but for an invisible, battery-friendly edge daemon, an embedded SQLite solution is the only way to fit the tight OS constraints!
For Semantic Vector Search, ditching the database is simply not practical, because of how Machine Learning models actually read text. Here is why the database is the most critical part of the architecture:
1. You cannot compute ML vectors on the fly
Semantic search works by passing text through a neural network (like MiniLM) which converts that text into a 384-dimensional mathematical vector (an array of 384 floating-point numbers). Running a machine learning model over gigabytes of files takes time and heavy CPU cycles. If we ditched the database and just scanned files on the fly (like a concordance tool does), the user’s laptop would freeze and the CPU would peg to 100% every time they typed a single letter into the search bar.
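To ground that in code, here is roughly what the per-chunk embedding step looks like, sketched against onnxruntime and the tokenizers library. The file paths, the input names, and the assumption that the first output holds the token embeddings all reflect a typical transformers ONNX export of all-MiniLM-L6-v2 and are assumptions on my part, not existing localsearch code:

```python
# Sketch: turn one chunk of text into a 384-dimensional unit vector with
# an INT8 ONNX export of all-MiniLM-L6-v2, CPU only. Paths are placeholders.
import numpy as np
import onnxruntime
from tokenizers import Tokenizer

tokenizer = Tokenizer.from_file("minilm/tokenizer.json")
session = onnxruntime.InferenceSession(
    "minilm/model_int8.onnx", providers=["CPUExecutionProvider"]
)

def embed(text: str) -> np.ndarray:
    enc = tokenizer.encode(text)
    ids = np.array([enc.ids], dtype=np.int64)
    mask = np.array([enc.attention_mask], dtype=np.int64)
    outputs = session.run(None, {
        "input_ids": ids,
        "attention_mask": mask,
        "token_type_ids": np.zeros_like(ids),
    })
    hidden = outputs[0]  # assumed shape: (1, seq_len, 384) token embeddings
    # Mask-aware mean pooling, then L2 normalisation, mirroring how
    # sentence-transformers pools this model.
    weights = mask[..., None].astype(np.float32)
    pooled = (hidden * weights).sum(axis=1) / weights.sum(axis=1)
    return (pooled / np.linalg.norm(pooled, axis=1, keepdims=True))[0]
```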
2. The Database is the “Pre-Computed Memory”
To make search instant (under 40 milliseconds), we must pre-compute all those 384-dimensional vectors in the background while the computer is idle, and save them. The database (sqlite-vec) is simply the storage locker for those pre-computed vectors.
3. Approximate Nearest Neighbor (ANN) Math
Concordance tools scan text linearly (O(N) time). Vector databases use specialized spatial indexes (like HNSW, Hierarchical Navigable Small World graphs) to quickly find vectors that are "close" to each other in 384-dimensional space (roughly O(log N) time). You need a dedicated vector store to perform that specific geometry at interactive speed.
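Reusing the embed() helper and the vec0 chunks table from the snippets earlier in the thread, the "pre-computed memory" and the per-keystroke lookup boil down to a handful of lines. The MATCH plus ORDER BY distance form is sqlite-vec's documented KNN query shape; everything else here is illustrative glue:

```python
# Background indexing writes pre-computed vectors; the search-bar path only
# embeds the short query and runs one KNN lookup. Assumes `db` and `embed`
# from the earlier sketches.
import numpy as np

def serialize(vector) -> bytes:
    # sqlite-vec accepts raw float32 blobs (native byte order,
    # little-endian on x86/ARM desktops).
    return np.asarray(vector, dtype=np.float32).tobytes()

def index_chunk(rowid: int, text: str) -> None:
    db.execute("INSERT INTO chunks(rowid, embedding) VALUES (?, ?)",
               (rowid, serialize(embed(text))))
    db.commit()

def search(query: str):
    # KNN over the stored 384-d vectors: closest chunks come back first.
    return db.execute(
        "SELECT rowid, distance FROM chunks "
        "WHERE embedding MATCH ? ORDER BY distance LIMIT 5",
        (serialize(embed(query)),),
    ).fetchall()
```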
AntConc is amazing for deep, active textual analysis! But for millisecond-latency, AI-driven concept retrieval in a desktop UI, a pre-computed vector database is the only way to stay within the CPU's processing budget.
I was not writing lightly when I mentioned Dijkstra and Albert (lightning fast). In fact, as I wrote in one topic (and it was regarded as OT), it reminds me of "Klaatu barada nikto".
You could prototype by writing an Albert python extension.
P.S. I'm assuming that at some stage you come up for air and re-connect to "recharge" Lumo.
You are completely right; when a tool like Albert is configured well, it really is lightning fast.
I want to genuinely thank you for the suggestion to prototype this as an Albert Python extension. That is an incredibly smart way to build the Minimum Viable Product (MVP).
Instead of wrestling with GNOME’s D-Bus APIs or KDE’s KRunner bindings right out of the gate, I can write a lightweight Python Albert extension to test the actual hard parts of this project: the tree-sitter file chunking, the ONNX INT8 model latency, and the sqlite-vec retrieval speed. If I can get the semantic search feeling “lightning fast” inside Albert’s UI, I will know the mathematical pipeline is solid.
Once the pipeline is proven in Albert, translating that core logic into a compiled Rust/C++ background daemon for the native desktop environments (for users who stick to default setups) will be much safer and easier.
Thanks for pushing me in this direction, this kind of practical stepping-stone advice is exactly why I brought this to the community!
I want to thank fractalx, too, for reminding me to keep thinking about scenarios when the lights go out, sans Internet connection. I have swung back to look at this again, and I draw attention to basic knowledge-collection tools such as CherryTree, which can be a container of vectors: rich text and code boxes (including Rust).

Indeed, I devised a prototype shuttle protocol to post a CherryTree to two AI Agents and requested that they return the payload as a CherryTree XML container, instead of scattered downloads to deploy at the desktop. That's it: one upload, one download. Multiple CherryTrees (what is the plural, an orchard?) can be managed at the desktop via eXistDB, as XML containers, and searched by the Recoll command line, offline. So knowledge can be held at the desktop (Ubuntu: dive, dive, dive) until resurfacing to be recharged.

At the desktop, Albert allows Docker to be launched. Also now looking at Qdrant. Vectors.
It is great to see we are so closely aligned on the “offline-first” philosophy! Designing for scenarios “when the lights go out” is exactly why I’m so passionate about keeping AI entirely on the edge. If the internet goes down, our local knowledge bases should become more valuable, not inaccessible.
The CherryTree “shuttle protocol” you engineered is a genuinely brilliant way to maintain a clean, containerized knowledge base. Managing an “orchard” of XML containers via eXistDB and keeping it all searchable offline is a seriously robust power-user setup. It completely prevents the scattered ~/Downloads chaos we were talking about earlier!
I also see you are now looking into Qdrant for vectors! Qdrant is an incredibly powerful, high-performance vector search engine, and it makes total sense for your workflow since you are already using Albert to spin up Docker containers.
For my specific GNOME/LocalSearch project, I have to stick with sqlite-vec simply because we can’t require the average Ubuntu desktop user to run Docker daemons in the background—it has to be a totally invisible, embedded C library. But the underlying mathematical concepts (embedding chunks of text and doing nearest-neighbor searches) are exactly the same as what you will be exploring in Qdrant!