MiniMax-M3 scores 55 on the Artificial Analysis Intelligence Index. Once the weights are released, it will be the leading open weights model. M3 is @MiniMax_AI's first multimodal M-series model, adding image and video input and a 1M token context window over the text-only… https://twitter.com/ArtificialAnlys/status/2064066303863005254/photo/1
Artificial Analysis and IBM Research are launching ITBench-AA, the first in a new series of benchmarks evaluating models on agentic enterprise IT tasks, starting with Site Reliability Engineering tasks where frontier models score below 50%. ITBench-AA’s SRE tasks benchmark model… https://twitter.com/ArtificialAnlys/status/2059698327235805258/photo/1
Alibaba’s new Qwen3.7 Max model scores 56.6 on the Artificial Analysis Intelligence Index, 4.8 points higher than Qwen3.6 Max Preview (51.8). While Alibaba still trails models from OpenAI, Anthropic and Google, Qwen3.7 Max is the closest they have been to the frontier. Qwen3.7… https://twitter.com/ArtificialAnlys/status/2057374452883788196/photo/1
Announcing the Artificial Analysis Coding Agent Index! Our new coding agent benchmarks measure how combinations of agent harnesses and models perform on 3 leading benchmarks, token usage, cost and more. When developers use AI to code they’re choosing a model, but also pairing it… https://twitter.com/ArtificialAnlys/status/2053865095076438427/photo/1
Exciting launch by OpenRouter that uses Artificial Analysis benchmarks
OpenAI has released GPT-Realtime-2, achieving 96.6% in our Speech Reasoning benchmark, Big Bench Audio, and #1 in our Conversational Dynamics benchmark. Released today, GPT-Realtime-2 is OpenAI's new flagship native Speech to Speech model, introducing adjustable reasoning effort… https://twitter.com/ArtificialAnlys/status/2052486470469140777/photo/1
MiniMax-M2.7 is now available across six inference providers on Artificial Analysis, with significant differentiation in speed and price. @SambaNovaAI leads on speed at 435 output tokens/s, >3x faster than any other provider. @FireworksAI_HQ, @novita_labs, @togethercompute, and… https://twitter.com/ArtificialAnlys/status/2051735255044997215/photo/1
GPT-5.5 (xhigh) uses ~40% fewer output tokens to run our Index than its predecessor https://twitter.com/ArtificialAnlys/status/2047378423933489364/photo/1
Meta is back! Muse Spark scores 52 on the Artificial Analysis Intelligence Index, behind only Gemini 3.1 Pro, GPT-5.4, and Claude Opus 4.6. Muse Spark is the first new release since Llama 4 in April 2025 and also Meta's first release that is not open weights. Muse Spark is a new… https://twitter.com/ArtificialAnlys/status/2041913043379220801/photo/1
We’ve added a new pseudonymous video model to our Text to Video and Image to Video Arenas. ‘HappyHorse-1.0’ is currently landing in the #1 spot for Text and Image to Video (No Audio) and the #2 spot for Text and Image to Video (With Audio). Further details coming soon. Example… https://twitter.com/ArtificialAnlys/status/2041591989083500933/photo/1