Windows 11 will soon harness your GPU for generative AI

Following the introduction of Copilot, its new AI assistant for Windows 11, Microsoft is once again pushing generative AI deeper into Windows. At the ongoing Ignite 2023 developer conference in Seattle, the company announced a partnership with Nvidia around TensorRT-LLM that promises to improve generative AI experiences on Windows desktops and laptops with RTX GPUs.

The new release is set to introduce support for additional large language models, making demanding AI workloads more accessible. Particularly noteworthy is its compatibility with OpenAI's Chat API, which allows these workloads to run locally (rather than in the cloud) on PCs and workstations with RTX GPUs that have at least 8GB of VRAM.

Nvidia’s TensorRT-LLM library was released just last month and is said to help improve the performance of large language models (LLMs) using the Tensor Cores on RTX graphics cards. It provides developers with a Python API to define LLMs and build TensorRT engines faster without deep knowledge of C++ or CUDA.
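
To give a sense of what that looks like in practice, here is a minimal sketch using TensorRT-LLM's high-level Python API. Note that this reflects the API shape of newer releases rather than the v0.6.0 build discussed here, and the model name is only an illustrative assumption; exact class names and arguments vary between versions.

```python
# Minimal sketch of TensorRT-LLM's high-level Python API (illustrative only;
# names and arguments vary by version, and the model ID below is an assumption).
from tensorrt_llm import LLM, SamplingParams

# Loading the model triggers the TensorRT engine build under the hood,
# so no hand-written C++ or CUDA is required.
llm = LLM(model="meta-llama/Llama-2-7b-chat-hf")

params = SamplingParams(max_tokens=64, temperature=0.8)
outputs = llm.generate(["What does TensorRT-LLM do?"], params)

for output in outputs:
    print(output.outputs[0].text)
```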


Alongside the release of TensorRT-LLM v0.6.0, Nvidia is introducing AI Workbench to take some of the complexity out of custom generative AI projects. AI Workbench is a unified toolkit for quickly creating, testing, and customizing pretrained generative AI models and LLMs, and it is also expected to help developers streamline collaboration and deployment for efficient, scalable model development.

(Chart: TensorRT-LLM inference performance on Windows 11. Image: Nvidia)

Recognizing the importance of supporting AI developers, Nvidia and Microsoft are also releasing DirectML enhancements. These optimizations accelerate foundation models such as Llama 2 and Stable Diffusion, giving developers more options for cross-vendor deployment and setting a new bar for performance.
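
In practice, developers typically reach DirectML acceleration through ONNX Runtime's DirectML execution provider. The sketch below assumes the onnxruntime-directml package and a placeholder model.onnx file; it simply runs an exported model on the GPU via DirectML.

```python
# Minimal sketch of GPU inference through DirectML via ONNX Runtime
# (pip install onnxruntime-directml). "model.onnx" is a placeholder for any
# exported model, such as a Llama 2 or Stable Diffusion component.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "model.onnx",
    providers=["DmlExecutionProvider", "CPUExecutionProvider"],  # DirectML first, CPU fallback
)

# Feed dummy data shaped to the model's first input, just to exercise the session.
input_meta = session.get_inputs()[0]
shape = [dim if isinstance(dim, int) else 1 for dim in input_meta.shape]
dummy = np.random.rand(*shape).astype(np.float32)

result = session.run(None, {input_meta.name: dummy})
print(result[0].shape)
```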

The TensorRT-LLM v0.6.0 update also promises a substantial improvement in inference performance, with speeds up to five times faster than before. It adds support for more popular LLMs, including Mistral 7B and Nemotron-3 8B, and extends fast, accurate local LLM inference to a broader range of portable Windows devices.

Integrating TensorRT-LLM for Windows with OpenAI's Chat API through a new wrapper will allow hundreds of AI-powered projects and applications to run locally on RTX-equipped PCs. That could eliminate the need to rely on cloud services and help keep private and proprietary data on Windows 11 PCs.
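
Because the wrapper exposes the same Chat API surface, an application written against OpenAI's Python client could, in principle, be redirected to a local endpoint just by changing its base URL. The snippet below is a hedged sketch: the localhost URL, API key, and model name are assumptions that depend entirely on how the local server is configured.

```python
# Sketch of pointing an existing OpenAI Chat API client at a local,
# TensorRT-LLM-backed server instead of the cloud. The base_url, api_key,
# and model name are assumptions, not documented values.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # local wrapper instead of api.openai.com
    api_key="not-needed-locally",
)

response = client.chat.completions.create(
    model="local-llama-2-13b",
    messages=[{"role": "user", "content": "Summarize what TensorRT-LLM does."}],
)
print(response.choices[0].message.content)
```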

The future of AI on Windows 11 PCs still has a long way to go. With AI models becoming increasingly available and developers continuing to innovate, harnessing the power of Nvidia’s RTX GPUs could be a game-changer. However, it is too early to say whether this will be the final piece of the puzzle that Microsoft desperately needs to fully unlock the capabilities of AI on Windows PCs.

Kunal Khullar
A PC hardware enthusiast and casual gamer, Kunal has been in the tech industry for almost a decade contributing to names like…