Anthropic aims to fix one of the biggest problems in AI right now

Hot on the heels of the announcement that its Claude 3.5 Sonnet large language model beat out other leading models, including GPT-4o and Llama-400B, AI startup Anthropic announced Monday that it plans to launch a new program to fund the development of independent, third-party benchmark tests against which to evaluate its upcoming models.

Per a blog post, the company is willing to pay third-party developers to create benchmarks that can “effectively measure advanced capabilities in AI models.”

“Our investment in these evaluations is intended to elevate the entire field of AI safety, providing valuable tools that benefit the whole ecosystem,” Anthropic wrote in the post. “Developing high-quality, safety-relevant evaluations remains challenging, and the demand is outpacing the supply.”

The company wants submitted benchmarks to help measure the relative “safety level” of an AI based on a number of factors, including how well it resists attempts to coerce responses that touch on cybersecurity; chemical, biological, radiological, and nuclear (CBRN) threats; and misalignment, social manipulation, and other national security risks. Anthropic is also looking for benchmarks to help evaluate models’ advanced capabilities and is willing to fund the “development of tens of thousands of new evaluation questions and end-to-end tasks that would challenge even graduate students.” Such tasks would test a model’s ability to synthesize knowledge from a variety of sources, refuse cleverly worded malicious user requests, and respond in multiple languages.

Anthropic is looking for “sufficiently difficult,” high-volume tasks that can involve as many as “thousands” of testers across a diverse set of test formats and that help inform the company’s “realistic and safety-relevant” threat modeling efforts. Interested developers can submit their proposals to the company, which plans to evaluate them on a rolling basis.

Andrew Tarantola