Architecture Update: Migration of Our AI Infrastructure to NVIDIA
Our initial approach for the AI platform was clearly defined: build a high-performance GPU cluster on AMD hardware. The goal was to create technological redundancy and establish a true alternative to the status quo. However, after six months in production, we realized that our performance and stability requirements conflicted with the chosen hardware architecture.
Technical Pain Points: Why We Are Changing Our Architecture
During intensive stress tests with our reference model gpt-oss-120, the AMD W7900 configuration hit hard limits. Ultimately, a combination of four critical factors led to our decision to change the architecture.
- **Model compatibility:** Many models simply could not run on the AMD architecture without disproportionate upfront effort.
- **Performance limitations:** At just 64 concurrent requests, the performance metrics plateaued; neither the required concurrency nor the peak total throughput needed for high workloads could be achieved.
- **Stack stability:** The stability of the combined software and hardware stack on our platform did not meet the quality standards we aim to guarantee our users.
- **Vendor support and roadmap:** The lack of vendor support during critical phases, combined with an unclear technological roadmap for PCIe-based inference, made reliable, future-proof planning impossible for us.
The Solution: Migration to NVIDIA RTX PRO 6000
As a logical consequence of this development, we decided on a complete technology shift. The benchmarks under load clearly demonstrate the decisive performance advantage for our customers:
| Metric | AMD W7900 | NVIDIA RTX PRO 6000 | Delta | Factor |
|---|---|---|---|---|
| Max. Concurrency | 64 requests | 1,024 requests | +1,500% | x16 |
| Total Throughput (Peak) | 422 tokens/s | 8,317 tokens/s | +1,870% | x19 |
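For readers who want to reproduce this kind of measurement, the following is a minimal load-test sketch, not our actual benchmark harness: it fires a fixed number of concurrent requests at an OpenAI-compatible completions endpoint and reports aggregate token throughput. The base URL, prompt, and token budget are placeholder assumptions.

```python
# Minimal concurrency/throughput sketch against an OpenAI-compatible
# /v1/completions endpoint. Values below are placeholders, not production settings.
import asyncio
import time

import httpx

BASE_URL = "http://localhost:8000/v1"  # assumption: local OpenAI-compatible server
MODEL = "gpt-oss-120"                  # reference model from the benchmark above
CONCURRENCY = 64                       # raise step by step to probe the ceiling
PROMPT = "Summarize the benefits of GPU inference in three sentences."


async def one_request(client: httpx.AsyncClient) -> int:
    """Send a single completion request and return the number of generated tokens."""
    resp = await client.post(
        f"{BASE_URL}/completions",
        json={"model": MODEL, "prompt": PROMPT, "max_tokens": 128},
        timeout=120.0,
    )
    resp.raise_for_status()
    return resp.json()["usage"]["completion_tokens"]


async def main() -> None:
    async with httpx.AsyncClient() as client:
        start = time.perf_counter()
        tokens = await asyncio.gather(*(one_request(client) for _ in range(CONCURRENCY)))
        elapsed = time.perf_counter() - start
    total = sum(tokens)
    print(f"{CONCURRENCY} concurrent requests, {total} tokens in {elapsed:.1f}s "
          f"-> {total / elapsed:.0f} tokens/s aggregate")


if __name__ == "__main__":
    asyncio.run(main())
```

Increasing `CONCURRENCY` in steps is what exposes the kind of plateau we observed at 64 concurrent requests on the W7900 configuration.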
Focus on OCR: Specialized Vision with dots/mocr
At the same time, we are using the newly gained compute capacity and stability to realign our model portfolio. User demand for efficiently processing unstructured data such as receipts, scans, forms, and handwritten notes has increased massively.
This prompted us to replace mistral-small3.2 with dots/mocr, a highly specialized vision model. Given the enormous demand for precise OCR (Optical Character Recognition), we deliberately chose to prioritize a true specialist over an all-rounder. Thanks to the new NVIDIA infrastructure, we can now integrate the extraction of structured data from image sources into the platform with high reliability and significantly lower latency.
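To illustrate what such an extraction could look like, here is a minimal sketch of a structured-extraction call against an OpenAI-compatible chat endpoint. The base URL, API key, file name, and prompt are placeholder assumptions, not our production configuration.

```python
# Illustrative sketch: extract structured fields from a receipt image via an
# OpenAI-compatible chat endpoint. All connection details are placeholders.
import base64

from openai import OpenAI

client = OpenAI(base_url="https://api.example.com/v1", api_key="YOUR_API_KEY")

# Encode the image as a base64 data URL so the example stays self-contained.
with open("receipt.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="dots/mocr",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Extract vendor, date, and total amount as JSON."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)

print(response.choices[0].message.content)  # e.g. {"vendor": ..., "date": ..., "total": ...}
```

Passing the image inline keeps the sketch self-contained; in a production pipeline, files would typically be uploaded or streamed instead.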
Conclusion
In the end, the lack of day-to-day reliability, the associated losses in productivity, and the missing technological roadmap compelled us to decisively leave the current ecosystem. By migrating to NVIDIA, we ensure the scalability and stability that are essential for professional AI workflows. However, our original motivation remains unchanged, which is why we will continue to monitor the market.