Introducing 𝐆𝐞𝐦𝐦𝐚 𝟒 𝟑𝟏𝐁 𝐓𝐮𝐫𝐛𝐨 ⚡️
It runs on a 𝘴𝘪𝘯𝘨𝘭𝘦 RTX 5090 at 51 tok/s (single) and 1244 tok/s (batched), with prefill up to 15359 tok/s.
It's 𝟔𝟖% 𝐬𝐦𝐚𝐥𝐥𝐞𝐫 in GPU memory and ~𝟐.𝟓𝐱 𝐟𝐚𝐬𝐭𝐞𝐫 than the base model, and retains nearly… https://twitter.com/LilaRest/status/2042320271005069618/photo/1