Architecture Update: Migration of Our AI Infrastructure to NVIDIA
Our initial approach for the AI platform was clearly defined: build a high-performance GPU cluster on AMD hardware. The goal was to create technological redundancy and establish a true alternative to the status quo. However, after six months in production, we realized that our performance and stability requirements conflicted with the chosen hardware architecture.
Technical Pain Points: Why We Are Changing Our Architecture
During intensive stress tests with our reference model gpt-oss-120, the AMD W7900 configuration hit hard limits. Ultimately, a combination of four critical factors led to our decision to change the architecture.
- **Model compatibility:** Many models simply could not run on the AMD architecture without disproportionate upfront effort.
- **Performance limitations:** At just 64 concurrent requests, the performance metrics plateaued; neither the required concurrency nor the peak total throughput needed for high workloads could be achieved.
- **Stack stability:** The stability of the combined software and hardware stack on our platform did not meet the quality standards we aim to guarantee our users.
- **Vendor support and roadmap:** The lack of vendor support during critical phases, combined with an unclear technological roadmap for PCIe-based inference, made reliable, future-proof planning impossible for us.
The Solution: Migration to NVIDIA RTX PRO 6000
As a logical consequence of this development, we decided on a complete technology shift. The benchmarks under load clearly demonstrate the decisive performance advantage for our customers:
| Metric | AMD W7900 | NVIDIA RTX PRO 6000 | Delta | Factor |
|---|---|---|---|---|
| Max. Concurrency | 64 requests | 1,024 requests | +1,500% | x16 |
| Total Throughput (Peak) | 422 tokens/s | 8,317 tokens/s | +1,870% | x19 |
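For readers who want to reproduce this kind of measurement, the following is a minimal load-test sketch, not our actual benchmark harness: it fires a fixed number of concurrent requests at an OpenAI-compatible completions endpoint and reports aggregate token throughput. The base URL, prompt, and token budget are placeholder assumptions.

```python
# Minimal concurrency/throughput sketch against an OpenAI-compatible
# /v1/completions endpoint. Values below are placeholders, not production settings.
import asyncio
import time

import httpx

BASE_URL = "http://localhost:8000/v1"  # assumption: local OpenAI-compatible server
MODEL = "gpt-oss-120"                  # reference model from the benchmark above
CONCURRENCY = 64                       # raise step by step to probe the ceiling
PROMPT = "Summarize the benefits of GPU inference in three sentences."


async def one_request(client: httpx.AsyncClient) -> int:
    """Send a single completion request and return the number of generated tokens."""
    resp = await client.post(
        f"{BASE_URL}/completions",
        json={"model": MODEL, "prompt": PROMPT, "max_tokens": 128},
        timeout=120.0,
    )
    resp.raise_for_status()
    return resp.json()["usage"]["completion_tokens"]


async def main() -> None:
    async with httpx.AsyncClient() as client:
        start = time.perf_counter()
        tokens = await asyncio.gather(*(one_request(client) for _ in range(CONCURRENCY)))
        elapsed = time.perf_counter() - start
    total = sum(tokens)
    print(f"{CONCURRENCY} concurrent requests, {total} tokens in {elapsed:.1f}s "
          f"-> {total / elapsed:.0f} tokens/s aggregate")


if __name__ == "__main__":
    asyncio.run(main())
```

Increasing `CONCURRENCY` in steps is what exposes the kind of plateau we observed at 64 concurrent requests on the W7900 configuration.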
Focus on OCR: Specialized Vision with dots/mocr
At the same time, we are using the newly gained compute capacity and stability to realign our model portfolio. User demand for efficiently processing unstructured data such as receipts, scans, forms, and handwritten notes has increased massively.
This prompted us to replace mistral-small3.2 with dots/mocr, a highly specialized vision model. Given the enormous demand for precise OCR (Optical Character Recognition), we deliberately chose to prioritize a true specialist over an all-rounder. Thanks to the new NVIDIA infrastructure, we can now integrate the extraction of structured data from image sources into the platform with high reliability and significantly lower latency.
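To illustrate what such an extraction could look like, here is a minimal sketch of a structured-extraction call against an OpenAI-compatible chat endpoint. The base URL, API key, file name, and prompt are placeholder assumptions, not our production configuration.

```python
# Illustrative sketch: extract structured fields from a receipt image via an
# OpenAI-compatible chat endpoint. All connection details are placeholders.
import base64

from openai import OpenAI

client = OpenAI(base_url="https://api.example.com/v1", api_key="YOUR_API_KEY")

# Encode the image as a base64 data URL so the example stays self-contained.
with open("receipt.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="dots/mocr",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Extract vendor, date, and total amount as JSON."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)

print(response.choices[0].message.content)  # e.g. {"vendor": ..., "date": ..., "total": ...}
```

Passing the image inline keeps the sketch self-contained; in a production pipeline, files would typically be uploaded or streamed instead.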
Conclusion
In the end, the lack of day-to-day reliability, the associated losses in productivity, and the missing technological roadmap compelled us to decisively leave the current ecosystem. By migrating to NVIDIA, we ensure the scalability and stability that are essential for professional AI workflows. However, our original motivation remains unchanged, which is why we will continue to monitor the market.