Success Stories

GPU power on demand: CampusCloud optimized with Proxmox VE

When students and researchers need GPU computing power simultaneously, traditional approaches quickly reach their limits: GPU computing power is expensive, demand fluctuates, and fixed assignments often lock up unused capacity. The University of Applied Sciences St. Pölten (USTP) has developed a solution that makes local AI workloads predictable and significantly accelerates deployment.

At USTP's Department of Computer Science and Security, AI is not only covered in the course materials but is also taught in a practical, hands-on manner. In both teaching and research, staff and students use GPU-supported systems for Large Language Models (LLMs), AI assistants, hate speech detection, classic machine learning algorithms, and reinforcement learning, such as for computer vision applications with AWS DeepRacer vehicles. A large portion of the GPU hardware is used within the AI RealLabor project, where AI-driven methods are tested and further developed in real-world applications.

GPU-accelerated workloads have been a cornerstone of USTP’s Data Science and AI programs for nearly eight years, building on an even longer history of use in IT security. However, as demand grew, the challenge shifted from providing raw power to ensuring equitable access. Because GPU capacity is significantly more expensive and scarce than CPU resources, efficient orchestration across user groups became the decisive factor for success.

GPU Isolation: Maximizing Utilization

Initially, multiple user groups worked directly on bare-metal servers – a model that quickly reached its limits in practice. The lack of resource isolation and mutual interference threatened not only data integrity, but also the stability of the entire system. Moving to virtualization was the logical step to enhance security.

While Proxmox VE and PCIe passthrough achieved clean user isolation, a new problem emerged: GPUs were rigidly bound to individual VMs. In a dynamic university environment, this led to costly idle time – valuable resources remained blocked even when not in use.

The breakthrough came in late 2024 with NVIDIA Time-Sliced vGPUs. Thanks to official Proxmox support, VRAM can now be split flexibly via profiles. The result: hardware utilization has significantly improved, as computing power is distributed based on actual demand rather than being locked in idle VMs. This created a production-ready environment that scales seamlessly with fluctuating workloads.
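The core idea of time-sliced vGPU profiles can be illustrated with a short sketch: VRAM is statically partitioned per profile, while compute time is shared between the resulting instances. The profile sizes below are illustrative assumptions, not NVIDIA's actual profile catalogue.

```python
# Sketch: how time-sliced vGPU profiles partition a physical GPU's VRAM.
# Profile sizes here are illustrative, not NVIDIA's official profiles.

def max_instances(total_vram_gb: int, profile_vram_gb: int) -> int:
    """Number of vGPU instances one card can host for a given profile.

    With time-sliced vGPUs, the framebuffer is split statically per
    profile while compute is scheduled round-robin between instances.
    """
    if profile_vram_gb <= 0 or profile_vram_gb > total_vram_gb:
        raise ValueError("profile must fit on the card")
    return total_vram_gb // profile_vram_gb

# Example: an 80 GB card split into hypothetical 10 GB profiles
print(max_instances(80, 10))  # -> 8
```

This is why utilization improves: instead of one VM monopolizing a whole card, several right-sized instances can serve demand as it arrives.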

From Manual to Self-Service Provisioning

The second milestone was central orchestration via the in-house platform CampusCloud. Using the Proxmox API, the system manages the entire cluster: staff and students can independently provision web hosting environments (PHP/Node.js) or powerful VMs with GPU acceleration.
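The article does not show CampusCloud's internals, but the self-service flow rests on the Proxmox VE REST API. As a rough sketch, cloning a VM template looks like the following; the host, node name, VM IDs, and API token are placeholders.

```python
# Sketch: cloning a VM template via the Proxmox VE REST API.
# Host, node, VM IDs, and token are placeholders; CampusCloud's actual
# orchestration logic is not described in detail in the article.
import json
from urllib import request

PVE_HOST = "https://pve.example.edu:8006"  # placeholder
TOKEN = "PVEAPIToken=automation@pve!campuscloud=SECRET"  # placeholder

def clone_vm(node: str, template_id: int, new_id: int, name: str) -> request.Request:
    """Build a clone request for a VM template on the given node."""
    url = f"{PVE_HOST}/api2/json/nodes/{node}/qemu/{template_id}/clone"
    payload = {"newid": new_id, "name": name, "full": 1}
    return request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Authorization": TOKEN, "Content-Type": "application/json"},
        method="POST",
    )

# The caller would submit the request with request.urlopen(...);
# follow-up steps (IP assignment, firewall rules) use further API calls.
req = clone_vm("node1", 9000, 1234, "student-vm")
```

Driving everything through the API is what makes the sub-minute provisioning times below possible: each manual step becomes one more call in the same pipeline.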

The speed improvements are remarkable:

  • Web hosting: Ready in under one minute, including reverse proxy and TLS certificate.
  • VMs: Accessible in less than two minutes after provisioning.

Previously, time-consuming processes like manual cloning, IP assignment, and firewall configuration slowed down the workflow. Today, intelligent, time-limited GPU allocation ensures maximum stability and fair utilization, even during peak loads of 200 simultaneous users within a 1,200-person user base.
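The time-limited allocation mentioned above can be sketched as expiry-based leases: a GPU belongs to a user only for the booked window, then returns to the free pool. The data structures and durations below are illustrative assumptions.

```python
# Sketch: time-limited GPU leases, as described for CampusCloud's fair
# allocation. Field names and durations are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Lease:
    user: str
    gpu: str
    start: float       # epoch seconds
    duration_s: float

    def expired(self, now: float) -> bool:
        return now >= self.start + self.duration_s

def release_expired(leases: list[Lease], now: float) -> list[Lease]:
    """Keep only active leases; expired GPUs return to the free pool."""
    return [lease for lease in leases if not lease.expired(now)]

leases = [Lease("alice", "h100-0", 0.0, 3600), Lease("bob", "h100-1", 0.0, 7200)]
print([lease.user for lease in release_expired(leases, 5400.0)])  # -> ['bob']
```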

High-End GPU Performance with a Solid Compute Base

CampusCloud is built on a hardware platform designed to run both advanced GPU workloads and general-purpose compute tasks:

GPU Cluster (High-End AI Workloads)

  • Dell PowerEdge R760XA: 2× Intel Xeon Platinum 8452Y, 512 GB DDR5 RAM, 4× NVIDIA H100
  • Dell PowerEdge XE9680: 2× Intel Xeon Gold 6542Y, 1.5 TB DDR5 RAM, 8× NVIDIA H200
  • 2× Dell PowerEdge XE7740: each with 2× Intel Xeon 6747P, 1 TB DDR5 RAM, 8× NVIDIA RTX PRO 6000 Blackwell Server Edition

GPU via PCIe Passthrough

  • Supermicro SuperServer 4029GP-TRT2: 8× NVIDIA GTX 1080 Ti
  • 4× HPE ProLiant ML110 Gen10: each with 2× NVIDIA RTX 4060 Ti

CPU Compute (Traditional Workloads)

  • 3× Cisco UCS C240 M5S: each with 2× Intel Xeon Gold 6248, 256 GB DDR4 RAM

This setup provides the optimal mix of GPU power and CPU capacity for each use case, maximizing resource utilization and minimizing idle hardware.

Open Source Instead of Vendor Lock-in

Switching to a pure Proxmox environment was also a strategic decision. After the parallel VMware ESXi clusters became economically unattractive following the Broadcom acquisition and its new licensing models, the department's IT team now relies consistently on open source.

For the team, this commitment is lived out in everyday operations: the infrastructure is built on open standards with Linux, TrueNAS Scale, and ZFS.

  • No vendor lock-in: Using standard components like QEMU, KVM, and LXC preserves technological freedom.
  • Budget efficiency: Saved licensing costs are reinvested directly into more powerful hardware – a key factor for tight budgets.

The Result: More Performance, Less Operational Overhead

The synergy between vGPU technology and CampusCloud has transformed daily operations: GPUs are no longer rigidly reserved but dynamically shared. This significantly increases utilization while virtualization ensures the necessary stability and security between user groups.

"By combining enterprise-grade hardware with professional open-source virtualization, we create optimal conditions. This allows us to provide the best possible support for students and researchers in the field of AI and get the most out of our infrastructure."
Raphael Schrittwieser, IT Lead, Department of Computer Science and Security

Automation and Next-Gen GPU Sharing

The next stage of expansion is just around the corner: a fully automated booking system. Users will select time slots in a calendar; the system then uses Ansible to move the VM to the appropriate node, configure the vGPU, and install the necessary drivers. Once the time slot expires, the resource is immediately released for the next scheduled task. In addition, tests for NVIDIA Multi-Instance GPU (MIG) on the new XE7740 systems are underway. The goal is to enforce even stricter GPU resource isolation, eliminating any interference between users entirely. Step by step, this creates an infrastructure that delivers computing power exactly where it is needed: quickly, precisely, and with maximum efficiency.
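The planned booking flow can be sketched as a small scheduler: accept a calendar slot only if it is free, then (in the real system) trigger the Ansible steps described above. The function below is a hypothetical placeholder for the slot check only, not the actual Ansible roles.

```python
# Sketch: the planned calendar-based GPU booking. Only the slot-overlap
# check is shown; VM migration, vGPU configuration, driver installation,
# and release would be driven by Ansible in the real system.

def book_slot(bookings: list[tuple[float, float]], start: float, end: float) -> bool:
    """Accept a slot only if it does not overlap an existing booking."""
    if end <= start:
        return False
    for s, e in bookings:
        if start < e and s < end:  # standard interval-overlap test
            return False
    bookings.append((start, end))
    return True

cal = [(9.0, 11.0)]
print(book_slot(cal, 10.0, 12.0))  # -> False (overlaps the 9-11 slot)
print(book_slot(cal, 11.0, 13.0))  # -> True
```

Once a slot expires, the booking is dropped and the vGPU profile is freed for the next scheduled task, mirroring the lease model already used for time-limited allocation.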

Raphael Schrittwieser

IT Lead for the laboratories of the Department of Computer Science and Security


About USTP

The University of Applied Sciences St. Pölten is an Austrian university with a strongly practice-oriented approach that closely integrates education, research, and industry. It offers degree programs in a variety of fields, ranging from informatics and AI to security, digital technologies, media, communication, and management, as well as health, social sciences, and rail technology. The focus is not only on imparting theoretical knowledge but also on applying it in projects, laboratories, and collaborations so that students and research teams can work on real-world problems.

Contact

City: St. Pölten
Country: Austria
Website: