Anthropic aims to fix one of the biggest problems in AI right now

Hot on the heels of the announcement that its Claude 3.5 Sonnet large language model beat out other leading models, including GPT-4o and Llama-400B, AI startup Anthropic announced Monday that it plans to launch a new program to fund the development of independent, third-party benchmark tests against which to evaluate its upcoming models.

Per a blog post, the company is willing to pay third-party developers to create benchmarks that can “effectively measure advanced capabilities in AI models.”

“Our investment in these evaluations is intended to elevate the entire field of AI safety, providing valuable tools that benefit the whole ecosystem,” Anthropic wrote in the post. “Developing high-quality, safety-relevant evaluations remains challenging, and the demand is outpacing the supply.”

The company wants submitted benchmarks to help measure the relative “safety level” of an AI based on a number of factors, including how well it resists attempts to coerce responses that pose risks in areas such as cybersecurity; chemical, biological, radiological, and nuclear (CBRN) threats; and misalignment, social manipulation, and other national security concerns. Anthropic is also looking for benchmarks to help evaluate models’ advanced capabilities and is willing to fund the “development of tens of thousands of new evaluation questions and end-to-end tasks that would challenge even graduate students.” Such tasks would essentially test a model’s ability to synthesize knowledge from a variety of sources, to refuse cleverly worded malicious user requests, and to respond in multiple languages.

Anthropic is looking for “sufficiently difficult,” high-volume tasks that can involve as many as “thousands” of testers across a diverse set of test formats and that help inform the company’s “realistic and safety-relevant” threat modeling efforts. Interested developers can submit their proposals to the company, which plans to evaluate them on a rolling basis.

Andrew Tarantola
Former Digital Trends Contributor
Andrew Tarantola is a journalist with more than a decade reporting on emerging technologies ranging from robotics and machine…