
AI update: Why is a separate embedding model necessary?

Created by Danny Sternol | News

For us at Flying Circus, digital sovereignty is more than just a buzzword. Specifically, it means that we must be able to switch providers without losing our data or having to rebuild our systems.

The challenge: consistent embeddings

As part of our migration from Ollama to pure llama.cpp (for more granular control in operation), we had to ensure that the generated embeddings remained bit-compatible. The risk: if the vector output changes even minimally due to the engine change, new search queries no longer match the stored data. The result would be a complete re-indexing of the vector database at the customer's site. A no-go for seamless operations.
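To make that risk concrete, here is a minimal sketch of the kind of check involved: re-embed a text on the new engine and compare it element-wise against the vector already stored in the index. The helpers and placeholder vectors below are illustrative assumptions, not our actual test harness.

```python
# Sketch of an embedding stability check between two inference engines.
import numpy as np

def max_abs_diff(a: list[float], b: list[float]) -> float:
    """Worst-case element-wise deviation between two embedding vectors."""
    va, vb = np.asarray(a, dtype=np.float32), np.asarray(b, dtype=np.float32)
    assert va.shape == vb.shape, "a dimension mismatch already breaks the index"
    return float(np.max(np.abs(va - vb)))

def cosine_similarity(a: list[float], b: list[float]) -> float:
    va, vb = np.asarray(a, dtype=np.float32), np.asarray(b, dtype=np.float32)
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))

old_vec = [0.12, -0.98, 0.05]  # placeholder: vector stored by the Ollama deployment
new_vec = [0.12, -0.98, 0.05]  # placeholder: same text re-embedded via llama.cpp

# Anything beyond float32 rounding noise means queries stop matching stored data.
print(max_abs_diff(old_vec, new_vec), cosine_similarity(old_vec, new_vec))
```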

Our findings: The devil is in the "dense modules"

We investigated which factors change the vectors. Our analysis revealed something surprising:

  • Inference engine & hardware: Negligible.
  • Quantisation: Plays a minor role, but is manageable.
  • Model architecture: The decisive factor.

Standard conversions of embedding models to GGUF format often ignore specific layers (such as the dense modules after the transformer stack) that are essential for the accurate calculation of the vector. If these are missing, the output is mathematically "broken" compared to the original.
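As a rough numeric illustration of what "broken" means here: in the sentence-transformers packaging of EmbeddingGemma, the pooled transformer output is pushed through two additional Dense projections before L2 normalisation. The sketch below uses random weights and assumes the 768 → 3072 → 768 shapes from that config; it only demonstrates that skipping the projections yields a vector pointing in an essentially unrelated direction.

```python
# Sketch of why dropped dense modules break the vector: the final embedding is
# not the pooled transformer output itself, but that output pushed through two
# extra linear projections and then L2-normalised. Shapes are assumptions taken
# from the EmbeddingGemma-300m sentence-transformers config.
import numpy as np

rng = np.random.default_rng(0)
pooled = rng.standard_normal(768).astype(np.float32)       # mean-pooled hidden state
W1 = rng.standard_normal((3072, 768)).astype(np.float32)   # first Dense module
W2 = rng.standard_normal((768, 3072)).astype(np.float32)   # second Dense module

def l2_normalise(v: np.ndarray) -> np.ndarray:
    return v / np.linalg.norm(v)

correct = l2_normalise(W2 @ (W1 @ pooled))  # full pipeline, as the original model computes it
broken  = l2_normalise(pooled)              # export that silently dropped the Dense layers

# Same 768-dim space, but the two vectors are nearly orthogonal:
print("cosine(correct, broken) =", float(correct @ broken))
```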

The solution: our own derivative

To ensure absolute compatibility, we do not use the standard conversion; instead, we created a specially adapted derivative of Google's Gemma 3 that explicitly retains these modules. This proves that true digital sovereignty also works in the age of AI – if you look closely and understand the tools.
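One way to check whether any given GGUF export kept these layers is to list its tensor names with the gguf Python package that ships with llama.cpp. The sketch below assumes the dense tensors are identifiable by a "dense" substring in their names; the actual naming in a given export may differ.

```python
# Sketch: list the tensors inside a GGUF file to check whether the export kept
# the post-transformer dense layers. Requires `pip install gguf`. The "dense"
# substring is an assumption about the tensor naming convention.
from gguf import GGUFReader

reader = GGUFReader("embeddinggemma-300m.gguf")  # hypothetical local path
dense_tensors = [t for t in reader.tensors if "dense" in t.name.lower()]

for t in dense_tensors:
    print(t.name, list(t.shape))

if not dense_tensors:
    print("No dense modules found - this export would produce 'broken' vectors.")
```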

The model (EmbeddingGemma-300m with Dense Modules) is now available to the community on Hugging Face: https://huggingface.co/flyingcircusio/embeddinggemma-300m-GGUF-with-dense-modules
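For completeness, a minimal usage sketch with llama-cpp-python; the quantisation suffix in the filename is a placeholder for whichever GGUF file you download from the repository above.

```python
# Minimal embedding call via llama-cpp-python (`pip install llama-cpp-python`).
from llama_cpp import Llama

llm = Llama(
    model_path="embeddinggemma-300m-Q8_0.gguf",  # hypothetical filename
    embedding=True,   # run the model in embedding mode
    verbose=False,
)

vec = llm.embed("digital sovereignty is more than a buzzword")
print(len(vec))  # embedding dimensionality
```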

For those who want to delve deeper into the details: we have documented the complete analysis report, including all steps for reproduction, on GitHub: https://github.com/flyingcircusio/skvaider/blob/move-to-llama-cpp-inference/doc/stability-comparison/README.md

[Image: Visualisation of the switch from Ollama to llama.cpp and how the devil is in the details. Generated with Banana Pro AI.]