RT @Prince_Canuma: 🗓️ Release Week Recap Big week. mlx-audio and mlx-vlm are now among some of the fastest-growing OSS projects. Here’s wh…
Directed, shot and edited by @Edypurp! This is perhaps the best video of me ever 🤣
❤️
🗓️ Release Week Recap Big week. mlx-audio and mlx-vlm are now among some of the fastest-growing OSS projects. Here’s what we shipped last week. Gemma 4 on Apple Silicon: two awesome releases by our partners @GoogleDeepMind & @googlegemma: > Gemma 4 12B: their new dense,… https://twitter.com/Prince_Canuma/status/2064002269394211301/video/1
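RT @berryxia: 🚀 mlx-audio v0.4.4 is out: the most feature-rich release we've shipped so far. It adds 15 TTS (text-to-speech), ASR (automatic speech recognition), and VAD (voice activity detection) models, speeds up long-form transcription, and improves the OpenAI-compatible audio…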
🚀 mlx-audio v0.4.4 is out: our biggest model drop yet. 15+ new TTS, ASR & VAD models, faster long-form transcription, and an expanded OpenAI-compatible audio server. All running locally on Apple Silicon. 🎤 New TTS • VoxCPM2: 2B, 48kHz, 30 languages • MOSS-TTS / TTSD / 1.5… https://twitter.com/Prince_Canuma/status/2063284707186319860/photo/1
🚀
🚀 mlx-vlm v0.6.2 is here, and we're a launch-day partner for the @googlegemma Gemma 4 QAT release! Today Google DeepMind released Gemma 4 quantization-aware training (QAT) checkpoints, optimized to run locally on consumer GPUs and edge devices. These checkpoints allow us to… https://twitter.com/Prince_Canuma/status/2063025820881293317/photo/1
Well done @_ARahim_, that's really fast!
RT @LocallyAIApp: Locally is now @lmstudio's mobile app, and we are bringing LM Link to your iPhone. Use your largest models from your pho…
Gemma 4 12B + MTP speculative decoding on mlx-vlm 🚀 We benchmarked MTP on Gemma 4 12B across all 4 modalities in mlx-vlm — and it speeds up everything: text, image, audio, and combined audio+image, up to 1.72× and 80 tok/s on a single M3 Ultra. Get started today: > uv pip… https://twitter.com/Prince_Canuma/status/2062627452745384046/photo/1
Coming to MLX 🚀
RT @jorgeham: Google's new model from its @googlegemma #Gemma4 family is out. It's a 12B model that falls right in the middle between the h…
🚀 Gemma 4 12B is here! We partnered with @GoogleDeepMind to bring their new dense, unified multimodal model to Apple Silicon and optimize it. ◈ 12B dense · 256K context ◈ Thinking mode (built-in reasoning) ◈ Vision: dynamic res, OCR, UI + charts ◈ Native audio: ASR +… https://twitter.com/Prince_Canuma/status/2062224761841672509/photo/1
Already on MLX 🚀
RT @ivanfioravanti: Simple Self-Distillation with MLX is proceeding! Now running generation phase distributed on multiple nodes, it's manua…
⚡ MLX-VLM v0.6.0: speculative decoding that's ~2× faster and byte-for-byte exact. Qwen3.6-27B by @Alibaba_Qwen + MTP, generating 2K tokens on AIME 2026 #13 with thinking mode on. 📊 4-bit: 34.05 → 64.73 tok/s 🎯 bf16: 12.33 → 27.90 tok/s Same output, half the wall time… https://twitter.com/Prince_Canuma/status/2061559360728281559/photo/1
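Why speculative decoding can be "byte-for-byte exact": with greedy verification, every token that reaches the output is the one the target model itself picks at that position, and the draft only decides how many of those picks get confirmed per target pass. Below is a minimal pure-Python sketch under stated assumptions: the "models" are hypothetical toy functions mapping a token prefix to the next greedy token, and the target is called once per position for clarity, whereas a real engine scores all drafted positions in a single batched forward pass. It is not mlx-vlm's implementation, and sampled (non-greedy) decoding needs a rejection-sampling acceptance rule instead of the exact-match check used here.

```python
from typing import Callable, List

# Hypothetical toy "model": maps a token prefix to its greedy (argmax) next token.
NextToken = Callable[[List[int]], int]


def greedy_decode(target: NextToken, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: plain autoregressive greedy decoding with the target model."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(target(out))
    return out[len(prompt):]


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], n_new: int, k: int = 4) -> List[int]:
    """Greedy draft-and-verify: the draft proposes k tokens, the target keeps the
    longest matching prefix, then supplies the next token itself."""
    out = list(prompt)
    produced = 0
    while produced < n_new:
        # 1. Draft k tokens cheaply, autoregressively.
        proposal, ctx = [], list(out)
        for _ in range(k):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2. Verify. A real engine scores all k positions in ONE target pass;
        #    here the target is called per position for clarity.
        for tok in proposal:
            if produced == n_new:
                break
            expected = target(out)
            out.append(expected)  # always the target's own choice
            produced += 1
            if expected != tok:
                break  # first mismatch: discard the remaining drafted tokens
        else:
            # Every draft accepted: the same target pass yields one bonus token.
            if produced < n_new:
                out.append(target(out))
                produced += 1
    return out[len(prompt):]


# Hypothetical toy models for the demo: the draft agrees with the target
# except at every third position.
def toy_target(prefix: List[int]) -> int:
    return (sum(prefix) * 31 + 7) % 50


def toy_draft(prefix: List[int]) -> int:
    t = toy_target(prefix)
    return t if len(prefix) % 3 else (t + 1) % 50


print(speculative_decode(toy_target, toy_draft, [1, 2, 3], 12)
      == greedy_decode(toy_target, [1, 2, 3], 12))
```

The speedup comes entirely from step 2 covering up to k + 1 tokens with one target pass; a poor draft or an expensive verification pass erodes it, but never changes the output.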
Shoutout to @badlogicgames and @mitsuhiko for bringing the Pi agent harness to the world! It's pretty easy to set up with local models and run without even looking at the docs.
Today we're shipping our biggest MLX-VLM release yet: v0.6.0 ...and we are raising 💸 This one's about turning your Apple devices into real local agent machines. From your desk to your pocket. What's new: ⚡ Speculative decoding everywhere — Gemma 4 EAGLE3 + DFlash, Qwen… https://twitter.com/Prince_Canuma/status/2061541992790683726/video/1
RT @mitsuhiko: Heads up: we're hiring members of the technical staff at Earendil again. If you're interested mail [email protected]
See y’all soon 🚀
Wooohoo Can’t wait to get access to the OSS weights 🔥🚀
Video director 🎬
Tomorrow is a big day! 🚀 https://twitter.com/Prince_Canuma/status/2061116294322180359/photo/1
Tomorrow is a big day! 🚀 https://twitter.com/Prince_Canuma/status/2061116242979627150/photo/1
You can run the whole pipeline in real-time on your Mac :)
This is why I love Poland 🇵🇱
RT @ivanfioravanti: @Prince_Canuma Thanks! 🙏 Awesome mlx-vlm! It's becoming my go-to engine for hacking mlx models!
RT @aagosh: @Prince_Canuma this is the craziest model I have tested so far. 185 t/s on M3 Ultra 256 GB. Thanks for the work you are doing!
Awesome work Ivan!
MLX-VLM docs coming soon and it looks great! Thank you Charmaine ❤️
Already on MLX 😎🚀
LFG 🚀
About to land on MLX-VLM 😎 https://twitter.com/Prince_Canuma/status/2059756524872794318/photo/1
It would’ve been extraordinary if it were the reverse evolution, because I still think they messed up big time. I don’t like electric cars, but I could see myself driving a Taycan, a Cayenne, even a Tesla, but not that monstrous creation.
RT @mervenoyann: RF-DETR just landed in @huggingface transformers 🥵🔥 sota real-time detection & segmentation models by @roboflow 💜 > pla…
Coming to MLX 🚀
Stay tuned, great news coming soon!
True Ferrari design and DNA ❤️ These pictures were taken ~4 years ago, the first time I saw a Ferrari in person, at the Ferrari dealership in Katowice, Poland 🇵🇱 https://twitter.com/Prince_Canuma/status/2059040836982063604/photo/1
Tell me this is not true 😪😭 What’s happening to car manufacturers’ designs? https://x.com/mike_matas/status/2059005227236495755/video/1
Thanks Marlene! Great catching up, exploring London, and geeking out over industry stuff. Can’t wait to do this again!
Who is in the UK?
We are cooking 👨🏾‍🍳!
On my way to London for the week 🇬🇧 Let’s meet up @marlene_zw @reach_vb @thatfiredev
RT @MaziyarPanahi: OpenMed Agent + Claude Opus 4.7 just ran a 14-step special-pathogen ED workup on a synthetic VHF case. Live CDC + WHO +…
RT @bilgeycl: 🥳 My talk from @aiDotEngineer Europe is live on YT!! this was my first ever talk on sovereignty so I'm quite excited to shar…
M3 Max is back from service and fully operational 🚀 Thanks to the guys at iMAD Poland 🇵🇱 All devices are back to normal; now it's time to get back to shipping! https://twitter.com/Prince_Canuma/status/2057549166461563106/photo/1
Well done @gonizahavy 🔥 MLX on CUDA and now Vulkan
Coming to MLX 🚀
Congrats, you guys should follow him! @N8Programs is one of the smartest, most intellectually curious people I know. Someone with real agency who gets things done. 🔥
RT @N8Programs: Excited to announce an open-sourcing webui to experiment w/ steering vectors! Works OOTB w/ Gemma 26B A4B and Gemma 4B E4B…
This is really cool! I used to live in India 🇮🇳 and hope to visit again someday to share some cool ideas and experiences.
Titan temperatures are back to normal range! Hallelujah ❤️🙌🏽🙏🏽 Looks like the water cooling system just needed some time and more air to cool down the CPU. https://twitter.com/Prince_Canuma/status/2056460338988458111/photo/1
Quick update on the water situation 💦 M3 Ultra and Titan (RTX6000 Pro) seem to have recovered with little to no visible damage. The main issues are my MacBook, which is in service, and Titan's CPU temperatures running above average at idle (58°C, up from 35°C prior to water… https://twitter.com/Prince_Canuma/status/2056453683265540434/video/1
RT @NielsRogge: Introducing a revival of PapersWithCode! As @ilyasut said, we're back to the "age of research". Hence, it's important to…
RT @pcuenq: Using Hermes Agent for long-running non-work tasks, on the premise it'll become more useful as it knows more about me. Below is…
Codex remote control is by far the best I’ve tried. I got to try it last week, and I love that I can connect to multiple devices and start new sessions, and it’s so stable even after days. Well done @OpenAIDevs @reach_vb and the team 🔥🙌🏽 https://twitter.com/Prince_Canuma/status/2056291556336738470/photo/1
See you in a week hopefully 🤞🏾 Who’s gonna carry the Day-0 support and the evals? https://twitter.com/Prince_Canuma/status/2056290620654678086/photo/1
RT @marlene_zw: Wow! My talk from @aiDotEngineer Europe is finally up on YouTube🥳 Many people I admire gave great talks at this conference…
On Monday we’ll find out whether it’s damaged and to what extent 🙏🏽 https://twitter.com/Prince_Canuma/status/2055715552975340004/photo/1
Meanwhile, touching iron 🏋🏾‍♂️ for the first time in a while https://twitter.com/Prince_Canuma/status/2055629263940862158/photo/1
Hardest lesson, but happy everyone is safe 👌🏽 https://twitter.com/Prince_Canuma/status/2055588120339353933/photo/1
New MLX-VLM and MLX-Audio releases delayed 🥲 A major storm hit Krakow yesterday, and my office and all my devices got flooded. I did my best to dry them, but the MacBook Pro is randomly shutting down and dimming its display, and I haven’t tested Titan (the RTX6000 Pro box) yet. So far, only… https://twitter.com/Prince_Canuma/status/2055567637090873568/photo/1
This is brilliant! The main issue with Eagle, MTP, and DFlash is that verification eats 75–90% of the speculative overhead, which really shows on dense models. While working on speculative techniques for mlx-vlm, I noticed we weren't getting the claimed speedups on Apple… https://twitter.com/Prince_Canuma/status/2055044319166222522/photo/1
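A back-of-the-envelope model shows why verification cost decides whether these techniques pay off. Following Leviathan et al. (2023), if each drafted token is accepted independently with probability α and the draft proposes γ tokens, one verification pass emits (1 − α^(γ+1)) / (1 − α) tokens on average. The sketch below divides that by the cost of one draft-and-verify round. The `verify_cost` parameter (cost of the γ+1-token verification pass relative to one ordinary decode step) is an assumption added for illustration, and the inputs in the example are hypothetical, not measurements from mlx-vlm.

```python
def expected_tokens_per_round(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per verification pass when each of `gamma`
    drafted tokens is accepted independently with probability `alpha`
    (the +1 accounts for the token the target always contributes)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if alpha == 1.0:
        return gamma + 1.0
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


def estimated_speedup(alpha: float, gamma: int, draft_cost: float,
                      verify_cost: float = 1.0) -> float:
    """Throughput relative to plain decoding.

    draft_cost:  cost of one draft step / cost of one target decode step.
    verify_cost: cost of the (gamma + 1)-token verification pass / cost of one
                 target decode step. 1.0 is the ideal "verification is free"
                 case, roughly what you get when decoding is memory-bound.
    """
    round_cost = gamma * draft_cost + verify_cost
    return expected_tokens_per_round(alpha, gamma) / round_cost


# Hypothetical inputs: 80% acceptance, 4 drafted tokens, draft at 5% of target cost.
for verify_cost in (1.0, 1.5, 2.0, 3.0):
    s = estimated_speedup(alpha=0.8, gamma=4, draft_cost=0.05, verify_cost=verify_cost)
    print(f"verify_cost={verify_cost:.1f}  speedup ~{s:.2f}x")
```

With these made-up inputs the estimate falls from about 2.80× to about 1.05× as verification grows from one to three decode steps' worth of work: once checking a block of drafted tokens stops being nearly as cheap as decoding a single token, most of the gain disappears.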
RT @thinkwithmark: We raised $1M to reinvent how people read. Introducing Mark II: a $159 AI bookmark. Thread below https://t.co/e…
Finally! 🚀
Ohh yeah, I only found out about this thanks to my dear friend @andimarafioti!
Ohh yeah, I only found out about this thanks to my dear friend @andimarafioti https://x.com/ClementDelangue/status/2054932025853772182/video/1
RT @zcbenz: We have achieved a milestone in MLX: all tests are now passing on the CUDA backend. https://twitter.com/zcbenz/status/2054699392071016743/photo/1
RT @sakurayukiai: @Prince_Canuma @googlegemma @RedHat_AI Even as an RTX 5070 Ti loyalist I have to admit MLX is cooking here. Eagle bypassi…
RT @pmarca: Co-sign.
Gemma 4 + 🦅 = brrr The next MLX-VLM release will be packed with improvements! Here is an initial preview of Eagle3 speculative decoding. @googlegemma @RedHat_AI https://twitter.com/Prince_Canuma/status/2054676893899645208/photo/1
RT @eisokant: Applied Research Hackathon. We’re sponsoring compute. @PrimeIntellect’s excellent stack will be there to support RL and evals…
RT @poolsideai: Poolside is hosting a 2-day model research hackathon in London. Join us to push an open-weight agent model as far as you…
RT @adrgrondin: The latest update of @LocallyAIApp brings many improvements to the app. My favorite is the improved support for iPad Windo…
Coming to MLX 🚀
🚀
RT @vincentweisser: We are open-sourcing renderers. For RL, the inference server should be simple: tokens in, tokens out. renderers is the t…
RT @Prince_Canuma: My @aiDotEngineer talk is live: "On-device Intelligence using MLX" 🎥 Huge thanks to @swyx and the team for having me —…
Swatch is about to make a bag 💰💰💰
Update 🥲 I need to wait a few weeks for them to fix the fiber optic cable in my area.
RT @nopmobiel: @tdkardum @ivanfioravanti @Prince_Canuma @huggingface https://twitter.com/nopmobiel/status/2054150871445905495/video/1
RT @nopmobiel: @ivanfioravanti @Prince_Canuma @huggingface this is a fully local stack to get your Reachy Mini running (with vision/tts/stt…
RT @DittmannAxel: ZAYA1-VL-8B (Zyphra) on MLX-VLM. M3 Max: 4.6s load, 52 tok/s decode, ~20 GB peak. Pointed it at my dogs on a tree stump.…
This is so good! Especially in the context of open source, where there's a lot of pull in many different directions https://x.com/Pragmatic_Eng/status/2053845082042679382/video/1
RT @OpenBMB: Thanks so much, Prince! Really appreciate the Day 0 support for MiniCPM-V 4.6 on MLX-VLM 🚀 125 tok/s full precision on M3 Ma…
I love this! Guess I'm going back to full-on terminal mode
RT @awnihannun: Claude Code agent view is where I start and manage a lot of my work. This is an exceedingly useful new feature - 10/10
If you didn’t see the Giraffe 🦒 in my talk you missed an important part
Tomorrow I’m getting a fiber optic connection installed in my house ❤️🔥😭 After nearly a year on 5G WiFi, an ISP finally added it to my area. Now ports are gonna go brrrr https://twitter.com/Prince_Canuma/status/2053947549921210800/photo/1
RT @MaziyarPanahi: @Prince_Canuma @aiDotEngineer @swyx I love it! Thanks bro for the shoutout and showing my demo! 🤗 https://twitter.com/MaziyarPanahi/status/2053930781668479350/photo/1
My @aiDotEngineer talk is live: "On-device Intelligence using MLX" 🎥 Huge thanks to @swyx and the team for having me — hands down the best tech event I've been to. And a shoutout to the community shipping with our packages and pushing the ecosystem forward: @MaziyarPanahi,…
Congratulations to @OpenBMB on the launch of MiniCPM-V 4.6! We have Day-0 support for it on MLX-VLM 🚀 h/t Magic Yang. It runs at 125 tok/s in full precision on M3 Max. https://github.com/Blaizzy/mlx-vlm/pull/1058 https://twitter.com/Prince_Canuma/status/2053924952605090216/video/1
Awesome release by @gonizahavy 🚀
Feelings aren’t facts! This is nonsense 😂
Congrats to @ZyphraAI on the launch! 🚀 Zaya1-VL comes with day-0 support on mlx-vlm. If you want to try it out, install from source for now: https://github.com/Blaizzy/mlx-vlm/ https://twitter.com/Prince_Canuma/status/2053239339820224533/video/1
RT @adrgrondin: Early WIP port of Gemma 4 multi-token prediction (MTP) on MLX Swift With MTP, Gemma 31B is 30-40% faster on M5 Max and wit…
GPU going brrrrr 🚀
Hype is such an interesting concept
I used a curse word every other word I uttered until I met my wife. Since then, my vocabulary has changed dramatically 😁 https://twitter.com/Prince_Canuma/status/2053144498453053687/photo/1
Bro is him 🙌🏽🔥 I was on the edge of my seat the whole season https://twitter.com/Prince_Canuma/status/2053120576907096369/photo/1
RT @MaziyarPanahi: The stack: → SAM 3.1 (@metaai) segmentation + multi-target tracking → Falcon Perception (@TIIuae) open-vocab detection…
One of the coolest local vision grounding examples
🤣
Have a great weekend! 🤗 https://twitter.com/Prince_Canuma/status/2053040880848601303/photo/1
Coming to MLX 🚀
mlx-audio v0.4.3 is here 🚀 A massive release across models, server, and DX 🙌🏽 → 6 new TTS models: Higgs Audio v2 (voice cloning), OmniVoice (646+ languages), LongCat-AudioDiT 1B, MOSS-TTS-Nano, Irodori-TTS v2, MeloTTS-English → Mel-Band-RoFormer for vocal source separation… https://twitter.com/Prince_Canuma/status/2053016238050136242/photo/1
😂 I’m gonna run meetings with this on
RT @eliebakouch: i get a lab, you get a lab, everybody get a lab!!! https://twitter.com/eliebakouch/status/2052235995362328857/photo/1
Congratulations guys, we all saw this come to light, well done!👏🏾
What a release!
Big moves 🚀
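RT @marovole: Prince Canuma (creator of mlx-vlm) has released v0.5.0, the largest model-inference library update in the Apple MLX ecosystem. [1] Server-grade capabilities fully in place: continuous batching greatly increases throughput for concurrent requests; combined with KV…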
RT @ivanfioravanti: Time for a deep testing session! Well done!
RT @jelveh: Dang! Epic release!!
Lots still to come! I had to release early just to stop the onslaught of issues and PRs 😅 Also, so that the next release card would be readable https://twitter.com/Prince_Canuma/status/2052154161894986059/photo/1
True! And we're just getting started 🚢
If you use agent harnesses like Pi by my buddy @badlogicgames, then this mlx-vlm release (v0.5.0) is for you 🚀 You can offload your cached context to disk and also cap it (e.g. max 50GB or 100GB). When you exceed the cap, we automatically trim it so your device doesn’t run out of…
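The tweet does not say which entries get trimmed first when the cap is hit. A common policy for a size-capped cache is least-recently-used (LRU) eviction, and the sketch below shows that policy under these assumptions: entries are keyed by a hash of the token prefix, values are opaque bytes standing in for a serialized KV cache, and the class and method names are hypothetical, not mlx-vlm's API.

```python
from __future__ import annotations

import hashlib
import tempfile
from collections import OrderedDict
from pathlib import Path


class DiskPromptCache:
    """Hypothetical size-capped on-disk prompt cache with LRU eviction."""

    def __init__(self, directory: Path, max_bytes: int) -> None:
        self.dir = Path(directory)
        self.dir.mkdir(parents=True, exist_ok=True)
        self.max_bytes = max_bytes
        self._sizes: OrderedDict[str, int] = OrderedDict()  # key -> size, oldest first
        self.total_bytes = 0

    @staticmethod
    def key_for(tokens: list[int]) -> str:
        # Stable file name derived from the token prefix.
        return hashlib.sha256(repr(tokens).encode()).hexdigest()

    def put(self, tokens: list[int], blob: bytes) -> None:
        key = self.key_for(tokens)
        if key in self._sizes:  # overwriting: drop the old size first
            self.total_bytes -= self._sizes.pop(key)
        (self.dir / key).write_bytes(blob)
        self._sizes[key] = len(blob)  # newly written = most recently used
        self.total_bytes += len(blob)
        self._trim()

    def get(self, tokens: list[int]) -> bytes | None:
        key = self.key_for(tokens)
        if key not in self._sizes:
            return None
        self._sizes.move_to_end(key)  # mark as most recently used
        return (self.dir / key).read_bytes()

    def _trim(self) -> None:
        # Evict least-recently-used entries until we are back under the cap.
        while self.total_bytes > self.max_bytes and self._sizes:
            key, size = self._sizes.popitem(last=False)
            (self.dir / key).unlink(missing_ok=True)
            self.total_bytes -= size


with tempfile.TemporaryDirectory() as tmp:
    cache = DiskPromptCache(Path(tmp), max_bytes=100)
    for prefix in ([1], [1, 2], [1, 2, 3]):
        cache.put(prefix, b"x" * 40)  # the third put exceeds the cap, so [1] is evicted
    print(cache.total_bytes, cache.get([1]) is None)  # 80 True
```

With this policy a single entry larger than `max_bytes` is evicted right after it is written; a real implementation would probably decide explicitly whether to reject or keep such entries.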
Prompt caching with SSD offloading is now active and works with all your favourite harnesses: Pi, Hermes, OpenCode, Claude Code and more. Here is the performance of Qwen3-VL-4B-Instruct when using our new prompt caching. https://twitter.com/Prince_Canuma/status/2052144699553533983/photo/1
The community is truly amazing! This release had 21 contributors, 18 of whom are new 🔥🙌🏽
mlx-vlm v0.5.0 is here 🚀 This is the largest release ever 🙌🏽 → Continuous batching server + KV cache quantization → MTP and DFlash speculative decoding (single, batch, server) → Distributed inference: Qwen3.5, Kimi K2.5 & K2.6 → Prompt caching w/ warm-disk persistence… https://twitter.com/Prince_Canuma/status/2052138203302510984/photo/1
RT @TalSchuster: And beautiful benchmarks from @Prince_Canuma with MLX on Apple silicon https://x.com/i/status/2051716011892605017
RT @TalSchuster: We've just released open source MTP style drafters for Gemma 4 models ⚡ Now Gemma 4 models are even faster on your choice…
RT @TalSchuster: @Prince_Canuma @GoogleDeepMind Thanks for the great partnership! This was super helpful
RT @utkuevci: @Prince_Canuma @GoogleDeepMind Thank you Prince for making this happen and demonstrating how batch processing is key for enab…
Btw, these were taken on M3 Max, not Ultra 😅 Ultra will be much faster 🚀 I've been cooking on two machines and mixed up the provenance.
Congratulations to @GoogleDeepMind on the launch of Gemma 4 Multi-Token-Prediction Drafters 🎉🚀 Happy to have partnered with them for Day-0 support on MLX. The new drafters accelerate both single and batch requests by up to 3×. Here is a graph showing how different block… https://twitter.com/Prince_Canuma/status/2051716011892605017/photo/1
I’m cooking 👨🏾‍🍳
Congrats to the @kyutai_labs team on the release of Multilingual Pocket TTS. Day-0 support on MLX-Audio in both our Python and Swift SDKs.
RT @ben_burtenshaw: made this video on how to win humanity's last hackathon. remember, this hackathon is on context engineering, not code.…
RT @neural_avb: Yooo this PR on native structured output generation support got merged into mlx-vlm. Thanks @Prince_Canuma for all the fee…
RT @osanseviero: Gemma 4 was released just a few weeks ago. Since then, it has been downloaded over 50 million times and there are almost…
RT @vincentweisser: RL just works across almost any verifiable domain Epic work from our RL residents across continual learning, automatin…
Ran Nemotron 3 Nano Omni (Q4) on M3 Ultra using MLX-VLM at 138 tok/s 🚀 https://twitter.com/Prince_Canuma/status/2049535667625939447/photo/1
This hackathon is going to be wild. Build the fastest Metal kernels for Apple Silicon 🚀 Ben pitched me on this idea in person a couple of weeks ago; happy to finally see it happen!
Congratulations to @NVIDIAAI on launching Nemotron 3 Nano Omni. It comes with day-0 support on MLX 🚀 Highlights: • Video+speech comprehension • Graphical User Interface (GUI) • Optical Character Recognition (OCR) and speech transcription. Model collection 👇🏽…
Coming to MLX 🚀
RT @ivanfioravanti: Canuma did it again! 🚀
Who’s going to carry the boats?! Stay hard! https://twitter.com/Prince_Canuma/status/2049172088263463205/photo/1
Made a little Space Invaders game on my Mac 🚀 Model: @poolsideai Laguna XS.2 on MLX Harness: Pi by @badlogicgames Took a few rounds of back-and-forth, not one-shot, but a fun first run with their debut OSS model. https://twitter.com/Prince_Canuma/status/2049171902694916403/video/1
RT @eisokant: Today we’re shipping Laguna M.1 and Laguna XS.2 – our first public models. We’re also shipping our agent harness and a previe…
RT @eisokant: @Prince_Canuma @poolsideai Thank you so much Prince and the whole team! You guys have been amazing day 0 partners to us. Feel…
Day-zero support for Laguna XS.2 in MLX🔥🚀 @poolsideai’s first open-weight model is now supported in MLX. 33B total params, 3B activated, built for agentic coding, and running natively on Apple Silicon. Huge thanks to team at Poolside for the early collaboration 🙌🏽 Heads up:…
RT @poolsideai: We’d love feedback from the developer and research community as we keep improving Laguna XS.2 and the stack around it. Lag…
RT @MaziyarPanahi: Update: it's out. 🔥 OpenAI's privacy-filter retrained on @nvidia's Nemotron-PII data. 8 → 50+ labels. Healthcare + ente…
RT @kernelpool: MiMo-V2.5-Pro support in mlx-lm: https://github.com/ml-explore/mlx-lm/pull/1219
Shoutout to @0xClandestine for the quick fix! I was literally driving 300 km back from an appointment to fix this. @0xClandestine: “Lemme know if you need anything for your PR” 💥 This is why I love this community.
RT @art_zucker: Reading @deepseek_ai 's v4 paper.... absolute hats off. Every problem has a mathematical solution, nothing is left to cha…
😂🙌🏽🔥
Benchmark it and let me know how it scores :)
I’m truly humbled to have the pleasure of working with the companies I most admire in the world ❤️
RT @Prince_Canuma: Model is available here @simonw @ivanfioravanti https://huggingface.co/mlx-community/DeepSeek-V4-Flash-2bit-DQ
Upload done after almost 7 hours 😅
Model is available here @simonw @ivanfioravanti https://huggingface.co/mlx-community/DeepSeek-V4-Flash-2bit-DQ
Well done my friend! 🙌🏽 I watched the training saga on Strava
RT @LyalinDotCom: Prince keeps doing amazing work, follow him if you're into offline models!
Sorry guys, I don’t have fiber optic in my area, so the speed plummets during the day, but at night (in a few hours) it will go up to 100+ MB/s upload speed 😅 Bear with me! Cc: @simonw https://twitter.com/Prince_Canuma/status/2048414721708040590/photo/1
Which Starlink setup should I get? Small, medium or big antenna? Is it any good at all? My area doesn’t have fiber optic, so most model downloads and quant uploads take 6+ hours.
RT @sharaff: Bro’s boiling the cauldron like Getafix for Obelix 4 parallel DeepSeek-V4-Flash agents cooking games on M3 Ultra at 30-34 tok…
DeepSeek-V4-Flash can run on 128GB Macs 🚀 DeepSeek-V4-Flash-2bit-DQ is coming to the mlx-community HF! It's a mixed dynamic Q2 quant recipe (Q2 for the experts, Q4 for the rest; 90GB on disk), thanks to @antirez's tip. https://huggingface.co/mlx-community/DeepSeek-V4-Flash-2bit-DQ https://twitter.com/Prince_Canuma/status/2048388876251631782/photo/1
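Why quantizing only the experts to 2-bit matters: in mixture-of-experts models the routed experts usually hold most of the weights, so their bit width dominates the file size. The sketch below estimates on-disk size assuming MLX-style affine group quantization (groups of 64 weights, each with a 16-bit scale and a 16-bit bias, i.e. about 2.5 bits per weight at Q2 and 4.5 at Q4). It ignores tensors kept unquantized and file metadata, and the parameter split in the example is hypothetical, not DeepSeek-V4-Flash's real breakdown.

```python
def quantized_size_gb(n_params: float, bits: int, group_size: int = 64,
                      scale_bits: int = 16, bias_bits: int = 16) -> float:
    """Approximate storage (GB = 1e9 bytes) for affine group quantization:
    `bits` per weight plus one scale and one bias per `group_size` weights."""
    bits_per_weight = bits + (scale_bits + bias_bits) / group_size
    return n_params * bits_per_weight / 8 / 1e9


def mixed_recipe_size_gb(expert_params: float, other_params: float,
                         expert_bits: int = 2, other_bits: int = 4) -> float:
    """Hypothetical 'Q2 experts, Q4 everything else' recipe."""
    return (quantized_size_gb(expert_params, expert_bits)
            + quantized_size_gb(other_params, other_bits))


# Hypothetical MoE: 100B parameters in routed experts, 10B everywhere else.
print(f"all Q4:   {quantized_size_gb(110e9, 4):.1f} GB")
print(f"mixed Q2: {mixed_recipe_size_gb(100e9, 10e9):.1f} GB")
```

Keeping the non-expert weights (attention, shared layers, router) at 4-bit costs little here because they are a small share of the total, while the experts carry most of the savings.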
All great things started small 🍃
RT @0xClandestine: Was amazing to collab with a legend in the space! @Prince_Canuma
The community can now download pre-quantized weights from the MLX community repo on HF, thanks to @LambdaAPI. Model collection: https://huggingface.co/collections/mlx-community/deepseek-v4
DeepSeek-V4-Flash powering 4 parallel agents on Pi (by @badlogicgames) 🚀 Running on M3 Ultra at ~30-34 tok/s and 160-187GB peak URAM using MLX-LM. Special shoutout to @0xClandestine, @pcuenq, @kernelpool, @ivanfioravanti and others for helping optimize and shape this PR. PR:… https://twitter.com/Prince_Canuma/status/2048347742750064926/video/1
Best comeback 🤣🙌🏽
How about 30 tok/s ? 😎 https://twitter.com/Prince_Canuma/status/2048129857918292081/photo/1
Together with @0xClandestine, we got it from 26 to ~30-32 tok/s 🚀 Will clean up and merge this afternoon https://twitter.com/Prince_Canuma/status/2048001850415210740/photo/1
DeepSeek V4 MLX quants are now on the MLX community HF repo, made possible by @LambdaAPI and @TheZachMueller ❤️ Without a GPU cluster, it would take me a week to upload the quants… Model collection 👇🏽
RT @jelveh: Wow - great improvement!! 🙏🔥
中文: RT @jelveh:哇——非常改进!!🙏🔥
8bit as well :) https://twitter.com/Prince_Canuma/status/2047776492898165156/photo/1
中文: 8位以及 :)
Woke up saying "OMG, DeepSeek-v4 is out! But it's been 5h, so maybe I shouldn't move a finger..." Then the MLX King in me said: it's go time, the M3 Ultra is ready! https://twitter.com/Prince_Canuma/status/2047771027325759740/photo/1
中文: 醒来后说:“天哪,DeepSeek-v4 发布了!但已经过去 5 小时了,所以也许我不该动手......” 然后我心中的 MLX 之王说:开干吧,M3 Ultra 已经准备好了!
DeepSeek-v4 now runs at ~23-26 tok/s on MLX! I wrote some custom kernels for the Sinkhorn step, and they took generation speed from 17 to 26 tok/s. The weights are also significantly smaller thanks to @pcuenq's tip about keeping the experts in MXFP4! Now you can use it to power your local… https://twitter.com/Prince_Canuma/status/2047768990798184630/video/1
中文: DeepSeek-v4 现在在 MLX 上以约 23-26 tok/s 的速度运行! 我为 Sinkhorn 步骤编写了一些定制内核,生成速度从 17 提升到 26 tok/s。 权重也明显缩小了,这要归功于 @pcuenq 关于将专家层保留为 MXFP4 的建议! 现在你可以用它来驱动你的本地......
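Some context on the Sinkhorn step. The Sinkhorn-Knopp iteration alternately normalizes the rows and columns of a positive matrix until it is approximately doubly stochastic, meaning every row and column sums to 1. The pure-Python sketch below shows only that textbook iteration, under these limits:
- It is not the custom kernel from the post.
- The post does not say which part of DeepSeek-V4 uses it.
- The default iteration count is an arbitrary choice.

```python
import math


def _normalize(vec):
    """Scale a list of positive numbers so it sums to 1."""
    total = sum(vec)
    return [x / total for x in vec]


def _transpose(mat):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*mat)]


def sinkhorn(logits, n_iters=50):
    """Push exp(logits) toward a doubly stochastic matrix (Sinkhorn-Knopp).

    Exponentiating makes every entry positive; each iteration then makes the
    rows sum to 1 and afterwards the columns sum to 1.
    """
    mat = [[math.exp(x) for x in row] for row in logits]
    for _ in range(n_iters):
        mat = [_normalize(row) for row in mat]  # rows sum to 1
        mat = _transpose([_normalize(col) for col in _transpose(mat)])  # columns sum to 1
    return mat


if __name__ == "__main__":
    for row in sinkhorn([[0.0, 1.0, 2.0], [1.0, 0.0, 0.5], [2.0, 0.5, 0.0]]):
        print([round(x, 4) for x in row])
```

Each iteration here is two full passes over the matrix. A fused GPU kernel can keep the matrix in fast memory across iterations instead, which is one plausible reason a custom kernel speeds this step up.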
Told you not to mind the speed 😎 Let me keep cooking! https://twitter.com/Prince_Canuma/status/2047741497034829878/photo/1
中文: 告诉你不要介意速度😎 让我继续做饭!
They don't know me, son! https://twitter.com/Prince_Canuma/status/2047707468440953032/photo/1
中文: 他们不认识我,儿子!
RT @ulusoyapps: Stage is on 🔥 @Firebase @MiguelRamosPM @thatfiredev https://twitter.com/ulusoyapps/status/2047699857490735142/photo/1
中文: RT @ulusoyapps:舞台已开启🔥 @Firebase @MiguelRamosPM @thatfiredev
It’s even faster now https://twitter.com/Prince_Canuma/status/2047697692554301517/photo/1
中文: 现在速度更快了
Quants on the way, courtesy of @TheZachMueller and @LambdaAPI 🚀 https://twitter.com/Prince_Canuma/status/2047693737950670940/photo/1
中文: 量化模型即将上线,由 @TheZachMueller 和 @LambdaAPI 🚀 提供
DeepSeek-V4 running on M3 Ultra 🚀 Don't mind the speed, that's gonna improve soon. https://twitter.com/Prince_Canuma/status/2047689914028888125/video/1
中文: 运行在M3 Ultra上的DeepSeek-V4 🚀 别介意速度,很快就会好转。
You can now run DeepSeek-V4-Flash on a 256GB Mac. Next up: speed 🚀 PR: https://github.com/ml-explore/mlx-lm/pull/1192 https://twitter.com/Prince_Canuma/status/2047685898163147125/photo/1
中文: 现在你可以在 256GB 的 Mac 上运行 DeepSeek-V4-Flash。 下一步:速度 🚀 PR:
Ported DeepSeek-V4 to MLX 🔥 There's still lots to optimize, but it works well https://twitter.com/Prince_Canuma/status/2047675693358526763/photo/1
中文: 将 DeepSeek-V4 移植到 MLX 🔥 还有很多需要优化的,但工作得很好
History repeats itself! Frontier labs refused to give access to and/or open-source their models because they were "too dangerous to release." Now we have OSS models with similar performance and can run them at home 😎
中文: 历史在重演! 前沿实验室拒绝访问和/或开放其模型,因为“发布起来太危险” 现在我们已经拥有并可以运行具有类似性能的OSS模型 😎
Attention: this is not a drill 🤯 Already coming to MLX
中文: 注意 这不是演习🤯 已进入MLX
An absolute must-read PR by @lllucas! Beautiful and elegant code 🙌🏽 We froze features to trim down dependencies and tidy up across the board. The next release is gonna be fire 🚀 https://github.com/Blaizzy/mlx-audio/pull/672
中文: 绝对必读的 PR,由 @lllucas 提交! 精美优雅的代码 🙌🏽 我们冻结了新功能,以精简依赖并全面整理代码。 下一个版本将会非常精彩🚀
🤣 I hate ugly designs https://twitter.com/Prince_Canuma/status/2047434260324233695/photo/1
中文: 🤣 我讨厌丑陋的设计
Fixed it! I traced the bottleneck down to the rollback mechanism, because even at an acceptance rate of ~3 my drafter should be faster than baseline. After the fix, it's looking much better and clearly shows the work we did with multimodal training. Shall I release my Qwen3.5-4B… https://twitter.com/Prince_Canuma/status/2047424180950532186/photo/1
中文: 修复了! 我把瓶颈追溯到回滚机制,因为即使接受率只有 ~3,我的草稿模型也应该比基线更快。 修复完成后,效果好多了,并清楚地展示了我们在多模态训练上所做的工作。 我应该发布我的 Qwen3.5-4B 吗......
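The rollback step is where a drafter can lose its advantage. Below is a minimal sketch of the standard greedy verification step in speculative decoding. It is not the author's drafter or rollback code, and the names `verify_draft`, `draft_tokens` and `target_tokens` are hypothetical. The function does three things:
1. Accepts the longest prefix where the draft and target tokens agree.
2. Takes one token from the target model.
3. Reports how many drafted positions must be discarded from the cache.

```python
def verify_draft(draft_tokens, target_tokens):
    """Greedy speculative-decoding verification (illustrative sketch).

    draft_tokens:  k tokens proposed by the drafter.
    target_tokens: k + 1 greedy predictions from the target model; entry i is
                   the target's choice at draft position i, and the last entry
                   is the bonus token used when every draft token matches.
    Returns (accepted, n_rollback): tokens to append to the output, and how
    many drafted positions must be rolled back out of the KV cache.
    """
    if len(target_tokens) != len(draft_tokens) + 1:
        raise ValueError("target_tokens must have len(draft_tokens) + 1 entries")
    n_match = 0
    for drafted, predicted in zip(draft_tokens, target_tokens):
        if drafted != predicted:
            break
        n_match += 1
    # Keep the matching prefix plus the target's own token at the first
    # mismatch (or the bonus token if the whole draft was accepted).
    accepted = list(draft_tokens[:n_match]) + [target_tokens[n_match]]
    n_rollback = len(draft_tokens) - n_match
    return accepted, n_rollback


if __name__ == "__main__":
    print(verify_draft([11, 12, 13, 14], [11, 12, 99, 7, 8]))  # → ([11, 12, 99], 2)
```

Every rejected position has to be removed from the cache before the next round. If that rollback is slow, it can cancel the benefit of accepting several tokens per target forward pass.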
RT @trycua: We're open-sourcing Cua Driver - our new macOS driver that lets any agent (Claude Code, Codex, your own loop) drive any app in…
中文: RT @trycua:我们正在开源 Cua Driver——我们的新版 macOS 驱动程序,允许任何代理(Claude Code、Codex,您自己的循环)在...中驱动任何应用程序
RT @swyx: looks like new Pareto frontiers across everything: - Context: 400K context in Codex and a 1M in API - API Pricing: $5/m input a…
中文: RT @swyx:看起来就像在一切事物上的新帕雷托前沿: - 上下文:Codex 中的 400K 上下文和 API 中的 1M 环境 - API 价格:500 万美元输入 a...
RT @b_ostrov: @djdlc_1895 @Prince_Canuma @OpenAI @Prince_Canuma The first in the pinned browser bookmarks in GitHub and huggingface
中文: RT @b_ostrov:@djdlc_1895 @Prince_Canuma @OpenAI @Prince_Canuma 在浏览器固定书签中排第一,GitHub 和 Hugging Face 上都是
@ivanfioravanti The results look better on vLLM and CUDA. On Apple Silicon, things are getting better, but there is still some work to do to get the acceptance higher. I trained these on my RTX6000 as a test, with less than ~10% of the data Z-lab used. I’m scaling it up into a larger run and will… https://twitter.com/Prince_Canuma/status/2047351794356126173/photo/1
中文: @ivanfioravanti 在 vLLM 和 CUDA 上的效果更好。 在 Apple Silicon 上,情况正在好转,但要提高接受率还需要一些工作。 作为测试,我在 RTX6000 上用不到 Z-lab 所用数据的 10% 训练了这些模型。 我正在将其扩展为更大规模的训练,并且会......
RT @jtdavies: Yesterday I had the privilege of giving the keynote at MLCon in Amsterdam, a talk entitled “The AI Smörgåsbord”. Very much p…
中文: RT @jtdavies:昨天,我有幸在阿姆斯特丹的MLCon上发表了题为“人工智能的斯莫戈博德”的演讲。 非常多......
RT @djdlc_1895: @Prince_Canuma @OpenAI Btw, people SHOULD bookmark this as their browsers start page 😂 https://huggingface.co/prince-canuma/activity/all
中文: RT @djdlc_1895:@Prince_Canuma @OpenAI Btw,人们应该在浏览器开始页面时添加此书😂
RT @martinamps: I've started handling all of my e-mails with claude code via `gws` and it's amazing... I wasn't so good at keeping up with…
中文: RT @martinamps:我已经开始通过 `gws` 用 Claude Code 处理我所有的电子邮件,这真是太棒了......我以前不太擅长跟上......
Congratulations @OpenAI on the release of privacy filter! It comes with day-0 support on MLX, so developers can now run PII filtering and more completely on-device. The PR will be merged in a couple of minutes 🚀 https://twitter.com/Prince_Canuma/status/2047063252161446163/photo/1
中文: 祝贺 @OpenAI 发布隐私过滤器! 它在 MLX 上提供首日支持,开发者现在可以完全在设备端运行 PII 过滤等功能。 PR 将在几分钟内合并 🚀
Coming to MLX 🚀
中文: 来到MLX 🚀
Happy birthday @swyx 🎉🎊 https://twitter.com/Prince_Canuma/status/2046980846679199762/photo/1
中文: 生日快乐 @swyx 🎉🎊
RT @MaziyarPanahi: HIPAA and GDPR without cloud. On iPhone. OpenMedKit runs 200+ PII models. Now adding GLiNER: 90+ state-of-the-art zero-…
中文: RT @MaziyarPanahi:无需云端的 HIPAA 和 GDPR。在 iPhone 上。 OpenMedKit 运行 200 多种 PII 模型。现在新增 GLiNER:90 多种最先进的零...
Coming to MLX 🚀
中文: 来到MLX 🚀
RT @angeloskath: I am quite excited about making JACCL a standalone lib. The power of open-source is that we can all benefit from shared-e…
中文: RT @angeloskath:我对让 JACCL 成为独立库感到非常兴奋。 开源的力量在于,我们都可以从共享的......中受益
Shoutout to @lllucas and @beshkenadze for flagging this and moving quickly on it! Parakeet is now significantly faster on mlx-audio, and soon on mlx-audio-swift https://twitter.com/Prince_Canuma/status/2046745147581342182/photo/1
中文: 感谢 @lllucas 和 @beshkenadze 发现问题并迅速推进! 现在,Parakeet 在 mlx-audio 上明显更快,mlx-audio-swift 也即将跟进。
This was one of the best conversations I had the pleasure of attending 🔥🙌🏽
中文: 这是我有幸参加🔥🙌🏽的最佳对话之一
Tokenmaxing be like 🤣 https://x.com/mytechceoo/status/2046664705176187180/video/1
中文: 代币最大化就像🤣
RT @maximelabonne: Big announcement: I'm releasing a new book! 📙📙📙 Over the years, I've talked to a lot of professionals and students abou…
中文: RT @maximelabonne:重大公告:我将发布一本新书!📙📙📙 多年来,我与许多专业人士和学生进行了交谈......
RT @yoobinray: mfers will do anything for stars on github https://twitter.com/yoobinray/status/2046343233114931530/photo/1
中文: RT @yoobinray:mfers 将在 github 上为明星做任何事情
Yap
RT @aiDotEngineer: 🆕Gemma, DeepMind's Family of Open Models https://www.youtube.com/watch?v=_gVFUEdhCyI In the first ever public talk after the Gemma 4 launc…
中文: RT @aiDotEngineer:🆕 Gemma,DeepMind的开放模型家族 在《Gemma 4》系列之后首次公开演讲中......
Tim Cook is an absolute legend! Not only did he step up, he also grew Apple into a phenomenal position as a global leader. Thank you for the incredible work, @tim_cook 🙏
中文: 蒂姆·库克是一位绝对传奇! 他不仅挺身而出,还将苹果公司发展为全球领导者的非凡地位。 感谢您所做的精彩工作,@tim_cook 🙏
The cat is out of the bag! And a wise choice 🔥
中文: 秘密已经公开了! 也是明智的选择 🔥
RT @ivanfioravanti: Kimi K2.6 running locally on 2 x M3 Ultra 512GB! 😱 1T parameters model! 😱😱
中文: RT @ivanfioravanti:Kimi K2.6 本地运行,配备 2 个 M3 Ultra 512GB 版本!😱 1T 参数模型!😱😱
RT @ivanfioravanti: @Prince_Canuma @Kimi_Moonshot @pcuenq mlx-vlm is top! 🚀 testing the server on my side 😎 stay tuned!
中文: RT @ivanfioravanti:@Prince_Canuma @Kimi_Moonshot @pcuenq mlx-vlm 是顶级的!🚀 测试我这边的服务器 😎 敬请关注!
Congratulations to @Kimi_Moonshot on releasing Kimi K2.6! It comes with day-0 support on mlx-vlm, and thanks to the awesome @pcuenq, you can run the 1T-param VLM on two M3 Ultras 🔥🚀
中文: 祝贺 @Kimi_Moonshot 发布 Kimi K2.6! 它在 mlx-vlm 上支持 day-0,使用两个 M3 Ultra 来运行 1T 参数 VLM,这得益于出色的 @pcuenq 🔥🚀
RT @ivanfioravanti: oMLX is working really well as single machine inference engine for coding agents! Caching is managed perfectly (it can…
中文: RT @ivanfioravanti:oMLX 作为单个机器推理引擎,对于编码代理来说确实非常有效! 缓存管理得非常完美(可以......)
RT @MaziyarPanahi: Stack: → Falcon Perception (green, near-water, trucks) → SAM 3.1 open-vocab (aircraft) Every detection is a text prompt…
中文: RT @MaziyarPanahi:技术栈: → Falcon Perception(绿色、近水、卡车) → SAM 3.1 开放词汇(飞机) 每一次检测都是文本提示......
Man, this guy is cooking up the craziest grounded reasoning examples using MLX-VLM 🚀 I think we are ready to power games locally
中文: 这家伙正在用 MLX-VLM 制作最疯狂的接地推理(grounded reasoning)示例🚀 我认为我们已准备好在本地为游戏提供算力
If you don't understand the code, AI won't save you. I watched the newest Opus 4.7 struggle with an obvious issue in a codebase full of similar patterns I had already established years ago. After I pointed it at a matching architecture, it was done in seconds. Expertise is still the moat. https://twitter.com/Prince_Canuma/status/2045819896055947296/photo/1
中文: 如果你不懂代码,人工智能就无法拯救你。 观看了最新的Opus 4.7,在一个充满多年前我早已建立的类似模式的代码库中,遇到了一个显而易见的问题。 之后,我将其指向一个匹配的架构,几秒钟内完成。 专业知识仍然是护城河。
My home compute for MLX and research: • M3 Ultra — 512GB (sponsored by community + @wai_protocol) • RTX PRO 6000 — 96GB (sponsored by @jelveh / https://t.co/AuEh8djTOa) • M3 Max — 96GB Every model I port, every kernel I tune, every release I ship gets stress-tested here… https://twitter.com/Prince_Canuma/status/2045802261419352108/photo/1
中文: 我用于 MLX 和研究的家用算力: • M3 Ultra — 512GB(由社区+@wai_protocol 赞助) • RTX PRO 6000 — 96GB(由 @jelveh 赞助 / • M3 Max — 96GB 我移植的每个模型、调优的每个内核、发布的每个版本,都会在这里接受压力测试......
We’re living in interesting times. I traveled ~300km from home and left a Claude Code session running on my M3 Ultra to test continuous batching across all models (2TB of weights) and check for regressions. Overnight, the M3 Ultra auto-updated, restarted, and killed both my session… https://twitter.com/Prince_Canuma/status/2045781748571681231/photo/1
中文: 我们生活在有趣的时代。 我离家约 300 公里,在 M3 Ultra 上留了一个 Claude Code 会话,测试所有模型(2TB 权重)的连续批处理并检查回归。 夜里,M3 Ultra 自动更新、重新启动,结束了我的会话和......
RT @ActuallyIsaak: Introducing the MLX-Benchmark Suite!! https://github.com/Goekdeniz-Guelmez/MLX-Benchmark The first comprehensive benchmark for evaluating LLMs on…
中文: RT @ActuallyIsaak:推出MLX-Benchmark套件!! 首个用于在......上评估LLM的综合基准
Unlocking even more perf 😤🚢 https://twitter.com/Prince_Canuma/status/2045098671008575691/photo/1
中文: 解锁更多性能 😤🚢
RT @N8Programs: Qwen3.6 4bit DWQ now up on MLX, uses custom quantization scheme (4bit MLP 8bit everything else) + DWQ for additional gains.…
中文: RT @N8Programs:在MLX上显示的Qwen3.6 4位DWQ,采用自定义量化方案(4位MLP 8bit其他所有)+ DWQ以获取额外收益。......
My wife: “You should journal more, our kids will benefit from your life lessons while they are fresh.” Me: “Yeah, I will do it. I don’t have time…” (then tweets, validating my wife’s request)
中文: 我妻子:“你应该多写日记,趁这些人生经验还新鲜,我们的孩子会从中受益。” 我:“是的,我会去做的。我没有时间......”(然后发推文,印证了妻子的要求)
RT @jelveh: Fantastic work by @Prince_Canuma - you rock!!
中文: RT @jelveh:@Prince_Canuma 的出色作品——太棒了!
RT @neural_avb: MLX bros and sises - DON’T miss this guy’s next post! Youll be able to do parallel and async requests to mlx vlm server af…
中文: RT @neural_avb:MLX 兄弟姐妹们,别错过这家伙的下一篇帖子! 您将能够对 mlx vlm 服务器进行并行和异步请求......
The cat is out of the bag: DFlash + continuous batching is coming as well. The current draft models work best with text-only inputs. https://twitter.com/Prince_Canuma/status/2044912770718511486/photo/1
中文: 秘密已经公开了:DFlash + 连续批处理也即将到来。 当前的草稿模型在纯文本输入下效果最好。
When y’all realise how much I cooked here this will blow up
中文: 当你们都意识到我在这里做饭的多少时,就会大吃一惊
RT @dreamworks2050: @Prince_Canuma This is too much bro. 😭
中文: RT @dreamworks2050:@Prince_Canuma 这太过分了。😭
RT @dreamworks2050: MLX-VLM FEELS ILLEGAL to use 🔥 💀 🔥
中文: RT @dreamworks2050:MLX-VLM 使用 🔥 💀 🔥
Y'all aint ready for local multimodal coding agents on your Mac! Coming to a Mac near you tomorrow :) https://twitter.com/Prince_Canuma/status/2044883144982028292/photo/1
中文: 准备好在 Mac 上使用本地多模式编码代理! 明天来到你附近的一家Mac上 :)
The next mlx-vlm release will ship with continuous batching support on the server 🚀 What's coming: → Continuous batching: new requests join the active batch immediately, no waiting. Mixed image + text batches supported → OpenAI-compatible API: field-for-field match with… https://twitter.com/Prince_Canuma/status/2044882569020518746/video/1
中文: 下一个 mlx-vlm 版本将在服务器端提供连续批处理支持🚀 即将推出: → 连续批处理:新请求立即加入活动批次,无需等待。支持图像 + 文本混合批次 → 兼容 OpenAI 的 API:逐字段匹配...
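The scheduling idea in the first bullet can be illustrated with a toy continuous-batching loop. This is a simplified sketch, not mlx-vlm's server code:
- It ignores prefill, images and KV-cache memory.
- It treats every decode step as equally costly.

The function name and the `(request_id, arrival_step, tokens_to_generate)` tuple format are hypothetical.

```python
from collections import deque


def simulate_continuous_batching(requests, max_batch_size):
    """Toy continuous-batching scheduler.

    requests: list of (request_id, arrival_step, tokens_to_generate).
    Each step, arrived requests join the active batch immediately (up to
    max_batch_size), every active request decodes one token, and finished
    requests leave so their slots free up for the next step.
    Returns {request_id: step at which its last token was produced}.
    """
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    if any(n_tokens < 1 for _, _, n_tokens in requests):
        raise ValueError("each request must generate at least one token")
    pending = deque(sorted(requests, key=lambda r: r[1]))  # FIFO by arrival
    active = {}  # request_id -> tokens still to generate
    finished = {}
    step = 0
    while pending or active:
        # Admit newly arrived requests without waiting for the batch to drain.
        while pending and pending[0][1] <= step and len(active) < max_batch_size:
            req_id, _, n_tokens = pending.popleft()
            active[req_id] = n_tokens
        # One decode step for every active request.
        for req_id in list(active):
            active[req_id] -= 1
            if active[req_id] == 0:
                del active[req_id]
                finished[req_id] = step
        step += 1
    return finished


if __name__ == "__main__":
    reqs = [("a", 0, 3), ("b", 0, 1), ("c", 1, 2)]
    print(simulate_continuous_batching(reqs, max_batch_size=2))
    # → {'b': 0, 'a': 2, 'c': 2}
```

In the usage example, request `c` takes the slot freed by `b` and finishes at step 2. With static batching it would have waited for `a` to finish first.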
Great job guys, I expected as much!
中文: 工作很棒,我期望的一样!
RT @Ridheshdabhi: Cracked devs be like…
中文: RT @Ridheshdabhi:破解的开发者就像......
Awesome release, congrats to the @PrismML team! It comes with day-0 support on MLX thanks to some of the work we did on bitnet-1.58 kernels a year ago. https://huggingface.co/collections/prism-ml/ternary-bonsai
中文: 精彩的发布,恭喜。@PrismML 团队! 得益于我们一年前使用 bitnet-1.58 内核所做的一些工作,它在 MLX 上支持了 Day-0。
👀
❤️
Haha, it's not that easy! I have my skills and they save me time but they are far from replacing me.
中文: 哈哈,这可不是那么容易! 我掌握自己的技能,他们为我节省时间,但远未取代我。
Congrats to @pcuenq, it was awesome to collaborate and trade notes on this! It's one of the most exciting ideas to come out of the level of quality coding agents have now reached. I have been running a similar workflow since earlier this year, and it saves me hours to sometimes…
中文: 恭喜 @pcuenq,能就此合作并交流心得真是太棒了! 这是编码代理达到如今的质量水平后出现的最令人兴奋的想法之一。 从今年早些时候开始,我一直在运行类似的工作流程,这为我节省了数小时,有时甚至......
When I get too comfortable with a model, then boom 💥 provider drops a new iteration 🙌🏽
中文: 当我对一个模型用得过于顺手时,砰 💥 供应商就发布了新版本 🙌🏽
This is the most detailed benchmark I have seen of my recent implementations of TriAttention + TurboQuant
中文: 这是我见过的关于我最近实现的 TriAttention + TurboQuant 最详细的基准测试
Coming to MLX 🚀
中文: 来到MLX 🚀
RT @elliotarledge: @Prince_Canuma wow! i'll be coming back to this from time to time to refresh myself. i think the education thing is pret…
中文: RT @elliotarledge:@Prince_Canuma 哇!我会时不时地回到那里来重新振发自我。我觉得教育问题很精彩......
RT @_karthik: > filling forms on the web sucks! you're usually giving the same info over and over again > so i made clacky, inspired by cl…
中文: RT @_karthik: 网上填写表格很糟糕!你通常会一遍又一遍地提供相同的信息 等等,我制作了clacky,灵感来自cl...
TriAttention MLX benchmark run on the full MATH500 is done after ~30h. We ran Gemma4-26B (5-bit) on M3 Ultra with KV cache budgets of 512, 1024, and 2048: → TA-2048: 76.6% vs 77.4% baseline — 4 problems lost out of 500 (-0.8%) → TA-1024: 75.6% — 9 problems lost (-1.8%) →… https://twitter.com/Prince_Canuma/status/2044539341762933040/photo/1
中文: 完整 MATH500 上的 TriAttention MLX 基准测试在约 30 小时后完成。 我们在 M3 Ultra 上运行了 Gemma4-26B(5 位),KV 缓存预算为 512、1024 和 2048: → TA-2048:76.6% 对 77.4% 基线,500 道题中少答对 4 道(-0.8%) → TA-1024:75.6%,少答对 9 道(-1.8%) → . . .
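The lost-problem counts follow directly from the accuracy gaps on a 500-problem set. The tiny helpers below (hypothetical names) make the arithmetic explicit. They give a net difference, so the number of problems that flipped from correct to wrong could be larger if the pruned run also solved some problems the baseline missed.

```python
def net_problems_lost(baseline_pct: float, method_pct: float, n_problems: int = 500) -> int:
    """Convert an accuracy gap in percentage points into a net count of problems."""
    return round((baseline_pct - method_pct) * n_problems / 100)


def relative_accuracy(baseline_pct: float, method_pct: float) -> float:
    """Share of baseline accuracy retained by the method, in percent."""
    return 100 * method_pct / baseline_pct


if __name__ == "__main__":
    print(net_problems_lost(77.4, 76.6))  # TA-2048 vs baseline → 4
    print(net_problems_lost(77.4, 75.6))  # TA-1024 vs baseline → 9
    print(f"{relative_accuracy(77.4, 76.6):.1f}%")  # → 99.0%
```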
RT @kyutai_labs: We're releasing OVIE, a novel view generation model trained entirely on single images. No multi-view datasets needed. Giv…
中文: RT @kyutai_labs:我们将推出OVIE,这是一种完全基于单一图像训练的新型视图生成模型。无需多视图数据集。 吉夫......
Makes me happy to hear this! ❤️ Reminds me of how great fastMLX could have been. We are cooking something new…
中文: 听到这个让我很开心!❤️ 让我想起 fastMLX 本可以有多棒。 我们正在酝酿一些新东西......
I will replace my dash cam analysis with Gemma 4 + Falcon Perception 🔥🙌🏽
中文: 我将用 Gemma 4 + Falcon Perception 🔥🙌🏽 替换我的行车记录仪分析
Well done to the z-lab team 🔥🚀
中文: 对z-lab团队做得很好🔥🚀
❤️
This is running using MLX-VLM 😎 https://github.com/Blaizzy/mlx-vlm/pull/926
中文: 这是使用MLX-VLM 😎运行的
I have friends in high places 😎
中文: 我在高处有朋友😎
In case you missed 😎
中文: 如果错过了😎
Well said!
中文: 说得好!
👀
RT @WolframRvnwlf: My @aiDotEngineer Europe 2026 Highlight Reel - personal impressions from 3 days at the world's best AI conference: https…
中文: RT @WolframRvnwlf:我的@aiDotEngineer欧洲2026年高亮卷轴——全球最佳人工智能大会3天的个人印象:https...
RT @ivanfioravanti: @Prince_Canuma This is pure power!!!
中文: RT @ivanfioravanti:@Prince_Canuma 这是纯粹的力量!!!
RT @ClementDelangue: Introducing Kernels on the Hugging Face Hub ✨ What if shipping a GPU kernel was as easy as pushing a model? - Pre-co…
中文: RT @ClementDelangue:在 Hugging Face Hub 上推出 Kernels ✨ 如果发布一个 GPU 内核像推送一个模型一样简单呢? - 预...
Woohoo! Congratulations to my brother for this awesome release 🚀
中文: 哇呜!祝贺我哥哥发布这个精彩版本🚀
📊 TriAttention perplexity results on Gemma4-31B (bf16, wikitext-2) using MLX-VLM TA-2048 is lossless at 1K–2K context when it activates, then degrades gracefully: • +0.46 PPL at 4K • +1.25 at 8K • +1.95 at 50K — and stabilizing, not blowing up Important nuance:… https://twitter.com/Prince_Canuma/status/2044043971391893765/photo/1
中文: 📊 使用 MLX-VLM 在 Gemma4-31B(bf16,wikitext-2)上的 TriAttention 困惑度结果 TA-2048 在 1K–2K 上下文激活时无损,然后平缓退化: • 4K 时 +0.46 PPL • 8K 时 +1.25 • 50K 时 +1.95,并趋于稳定,而不是失控 重要细节:
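For reference, the PPL deltas above are differences in perplexity, the exponential of the mean negative log-likelihood per token. This minimal sketch assumes you already have per-token natural-log probabilities from each run. The post does not say how the wikitext-2 text was windowed, so that part is left out, and the log-probabilities in the usage block are made up.

```python
import math


def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood) over the scored tokens."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)


if __name__ == "__main__":
    # Made-up per-token probabilities, for illustration only.
    baseline = [math.log(p) for p in (0.5, 0.25, 0.5, 0.25)]
    pruned = [math.log(p) for p in (0.5, 0.25, 0.25, 0.25)]
    # A positive delta means the pruned cache made the model more "surprised".
    print(perplexity(pruned) - perplexity(baseline))
```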
🧮 MATH 500 results for TriAttention on Gemma4-26B-A4B-it (5-bit quantized, M3 Ultra 512GB) using MLX-VLM TA-2048 preserves 96% of baseline accuracy (22/30 vs 23/30) with KV cache capped at 2048 tokens, regardless of reasoning length. Throughput stays rock-solid at ~77 tok/s… https://twitter.com/Prince_Canuma/status/2044040708571410763/photo/1
中文: 🧮 使用 MLX-VLM 在 Gemma4-26B-A4B-it(5 位量化,M3 Ultra 512GB)上的 MATH 500 结果 TA-2048 保留了 96% 的基线准确率(22/30 对 23/30),KV 缓存上限为 2048 个 token,与推理长度无关。 吞吐量稳定在约 77 tok/s......
RT @NarayanSanath: Our Falcon Perception with Gemma4 prompting for open-vocabulary segmentation, running locally on MLX
中文: RT @NarayanSanath:我们使用Gemma4的猎鹰感知,提示在MLX本地运行的开放式词汇细分
RT @neural_avb: Got to try out this VoxCPM2 model locally. Was trying out some voice cloning with the Pytorch as well as the 4-bit MLX ve…
中文: RT @neural_avb:在本地试用了这款 VoxCPM2 模型。 正在用 PyTorch 以及 4 位 MLX 版本尝试一些语音克隆......
RT @altryne: Our world is changing. I spent the last week listening to, chatting, dining, dancing with and interviewing the top AI Enginee…
中文: RT @altryne:我们的世界正在发生变化。 我花了上一周时间听、聊天、用餐、跳舞,并采访了顶尖的AI Engine......
One of the videos I featured at my @aiDotEngineer talk 🔥
中文: 我在 @aiDotEngineer 演讲中展示的其中一段视频 🔥
RT @MaziyarPanahi: Gemma 4 sees a kid and three dogs. Decides what matters. Calls SAM 3.1 Mask and bounding box. Spotlight on subjects. Ba…
中文: RT @MaziyarPanahi:Gemma 4 看到一个孩子和三条狗。决定什么重要。呼叫 SAM 3.1 面具和带边框。关注主题。巴......
Absolute killer use case for MLX-VLM!🔥🙌🏽
中文: MLX-VLM 的绝对杀手级用例!🔥🙌🏽
Local grounded reasoning using MLX will power a whole new generation of use cases that were previously only available in the cloud! From satellite imagery analysis and security systems all the way to robotics. I’m really excited about the latter. I spoke at length about these… https://twitter.com/Prince_Canuma/status/2042761667017105517/video/1
中文: 使用MLX进行本地接地推理,将为此前仅在云端可用的新一代使用案例提供动力! 从卫星图像分析,安防系统一直到机器人。 我对后者感到非常兴奋。 我详细谈到了这些......
RT @osanseviero: Our first successful Gemma 4 Runtime in London with @swyx @patloeber @nick_kango @cormacb and others! 💎Great to go out for…
中文: RT @osanseviero:我们首次在伦敦成功完成Gemma 4 Runtime,与 @swyx @patloeber @nick_kango @cormacb 等同!💎 出去很棒......
RT @adrgrondin: I’m excited to announce that I’ve joined @lmstudio 👾 The team behind the app is amazing and I couldn’t be more proud. I’l…
中文: RT @adrgrondin:我很高兴地宣布,我已加入 @lmstudio 👾 这款应用背后的团队非常出色,我再为此感到自豪。 我会......
❤️
🚀🔥
Woohoo, congratulations @adrgrondin! I couldn’t imagine a better match 🚀
中文: 哇,恭喜 @adrgrondin! 我想象不出更合适的搭配了🚀
Just implemented TriAttention in MLX and the results are wild! You can get up to 81% KV compression at 60K tokens for Gemma-4-31B-IT in BF16 🔥 Unlike TurboQuant, which quantizes KV cache values, TriAttention prunes low-importance tokens entirely by scoring keys using… https://twitter.com/Prince_Canuma/status/2042021304270819394/photo/1
中文: 刚刚在 MLX 中实现了 TriAttention,结果很疯狂! 在 BF16 下,Gemma-4-31B-IT 在 60K token 时最多可获得 81% 的 KV 压缩 🔥 与量化 KV 缓存值的 TurboQuant 不同,TriAttention 通过对键进行评分,直接剪除低重要性的 token......
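Conceptually, pruning under a KV budget means scoring every cached token and keeping only the best ones. The sketch below shows just that selection step. The scores are taken as given, because the post's description of TriAttention's key-scoring rule is cut off. The option to always keep the most recent tokens is an assumption, not something the post states. The function name is hypothetical.

```python
def select_tokens_to_keep(scores, budget, n_recent=0):
    """Pick which cached token positions to keep under a KV budget.

    scores:   one importance score per cached token (how they are computed is
              left open here; the post does not spell out TriAttention's rule).
    budget:   maximum number of tokens to keep.
    n_recent: always keep this many most-recent tokens (assumed heuristic).
    Returns the kept positions in their original order.
    """
    n = len(scores)
    if n <= budget:
        return list(range(n))
    n_recent = min(n_recent, budget)
    recent = set(range(n - n_recent, n))
    older = [i for i in range(n) if i not in recent]
    older.sort(key=lambda i: scores[i], reverse=True)  # highest scores first
    kept = recent | set(older[: budget - n_recent])
    return sorted(kept)


if __name__ == "__main__":
    scores = [0.1, 0.9, 0.3, 0.05]
    print(select_tokens_to_keep(scores, budget=2))              # → [1, 2]
    print(select_tokens_to_keep(scores, budget=2, n_recent=1))  # → [1, 3]
```

The compression ratio is then 1 - budget / n_tokens. For example, keeping 2,048 of 10,240 cached tokens is an 80% reduction.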
I’m behind them chatting with @altryne and @marlene_zw 😂🙌🏽
中文: 我跟在他们后面聊天,@altryne 和 @marlene_zw 😂🙌🏽
🚀
RT @ClementDelangue: Anthropic had the most powerful cyber-security model in the history of this world and their internal code based still…
中文: RT @ClementDelangue:Anthropic 拥有当今世界历史上最强大的网络安全模式,其内部代码代码依然存在......
RT @julien_c: We are giving away Safetensors to the @pytorch foundation (shepherded by the Linux Foundation) Our shared goal is to make th…
中文: RT @julien_c:我们将向 @pytorch 基金会(由 Linux 基金会提供)赠送 Safetensors 我们共同的目标是实现......
Ask Mythos to leak its own weights 😂 https://twitter.com/Prince_Canuma/status/2041839027217641750/photo/1
中文: 请让神话泄露自身权重 😂
RT @angeloskath: A long time coming but new mlx-lm is here with better batching support in the server and Gemma 4. pip install -U mlx-lm…
中文: RT @angeloskath:等待了很久,新的 mlx-lm 终于来了,服务器的批处理支持更好,并支持 Gemma 4。 pip install -U mlx-lm...
I love the internet! 😂 For me, the most important parts were the OSS attempt (a humbling experience) and seeing my childhood fav actress show up in an unexpected place. It’s obvious our beloved Milla knows nothing about the space, and honestly, I didn’t expect her to. Two things that… https://twitter.com/Prince_Canuma/status/2041612468208988354/photo/1
中文: 我热爱互联网!😂 对我来说,最重要的部分是这次开源尝试(一次令人谦卑的经历),以及看到我童年最喜欢的女演员出现在一个意想不到的地方。 很明显,我们心爱的 Milla 对这个领域一无所知,说实话我也没指望她懂。 两件事......
Ain’t no way your name is…
中文: 你的名字绝不是......
RT @OlivierBachem: Our goal in the Gemma team is to ship models that are useful by generalizing to unseen tasks. Hence, we are extremely s…
中文: RT @OlivierBachem:我们在Gemma团队的目标是通过通用化未知任务来提供有用的模型。因此,我们极其......
I have a 6-month UK visa. If you would like me to speak at your event, DMs are open 🚀
中文: 我持有英国签证六个月 如果你想让我在活动上发言,DMs 是开放的 🚀
Literally got this at 3pm today and have to fly tomorrow. Thank God!
中文: 今天下午3点终于得到了这个,明天必须飞起来。 感谢上帝!
I got my UK visa 😭❤️🙌🏽 UK and @aiDotEngineer, here comes the King! https://twitter.com/Prince_Canuma/status/2041574009284767856/photo/1
中文: 我拿到英国签证了 😭❤️🙌🏽 英国和 @aiDotEngineer,国王来了!
👀 Will you donate the Mac Mini to the cause? https://twitter.com/Prince_Canuma/status/2041524355083989460/photo/1
中文: 👀 会捐赠 Mac Mini 以获取该事业的
Medium was once a great place... I wrote my best articles there back in 2018
中文: Medium 曾经是个很棒的地方...... 我在2018年在那里写过我最好的文章
Well done guys ❤️🚀🔥
中文: 干得好,伙计们❤🚀🔥
My favourite action actress from Resident Evil and many awesome movies is doing open source ❤️ First, never saw that coming! Second, what a time to be alive and doing open source! Open source for the win 🚀
中文: 我最喜欢的《生化危机》及许多精彩电影中的动作女演员在做开源 ❤️ 首先,真没想到! 其次,能活在这个时代并做开源真是太好了! 开源必胜 🚀
If this works well, we are looking at a new era! Well done Anemll 🔥🙌🏽
中文: 如果效果好,我们将迎来一个新时代! 干得好,Anemll🔥🙌🏽
This example has so much alpha! You can now literally generate vision agent traces and train a smaller VLM on them completely on-device 🔥 cc: @TheZachMueller @MaziyarPanahi @ivanfioravanti @ActuallyIsaak https://twitter.com/Prince_Canuma/status/2041286374431633886/photo/1
中文: 这个例子太有价值了! 现在,您可以完全在设备上生成视觉代理轨迹,并用它们训练更小的 VLM 🔥 cc: @TheZachMueller @MaziyarPanahi @ivanfioravanti @ActuallyIsaak
The best ideas are the simplest, thank you @dahou_yasser! "the idea: Gemma4 looks at the image, decides what to segment, Falcon Perception returns pixel-accurate masks + metadata (centroid, area_fraction, bbox), Gemma4 reasons on the numbers and calls the next tool or answers."…
中文: 最好的想法是最简单的,谢谢 @dahou_yasser! “想法:Gemma4 查看图像,决定分割什么,Falcon Perception 返回像素级精确的掩码 + 元数据(centroid、area_fraction、bbox),Gemma4 对这些数字进行推理,然后调用下一个工具或给出回答。”......
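The numbers in that loop are simple to derive from a binary mask. As a hedged illustration, and not Falcon Perception's actual output code, the sketch below computes the three fields named in the quote: `centroid`, `area_fraction` and `bbox`. The coordinate conventions are assumptions: x is the column, y is the row, and the bbox is inclusive.

```python
def mask_metadata(mask):
    """Summarize a binary mask (list of rows of 0/1) as centroid, area, bbox.

    centroid is (x, y) in pixel coordinates, area_fraction is the share of
    pixels inside the mask, and bbox is (x_min, y_min, x_max, y_max), inclusive.
    Returns None for an empty mask.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    xs, ys = [], []
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return {
        "centroid": (sum(xs) / len(xs), sum(ys) / len(ys)),
        "area_fraction": len(xs) / (height * width),
        "bbox": (min(xs), min(ys), max(xs), max(ys)),
    }


if __name__ == "__main__":
    square = [
        [0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0],
    ]
    print(mask_metadata(square))
    # → {'centroid': (1.5, 1.5), 'area_fraction': 0.25, 'bbox': (1, 1, 2, 2)}
```

Compact numbers like these let the language model reason about size and position without having to look at raw pixels.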
RT @nibzard: added @cohere transcribe to my small transcribing cli running natively on Apple Silicon via MLX-audio from @Prince_Canuma http…
中文: RT @nibzard:把 @cohere transcribe 加入了我的小型转录 CLI,它通过 @Prince_Canuma 的 MLX-audio 在 Apple Silicon 上原生运行
🫡❤️
Awesome work by @no_stp_on_snek 🔥
中文: @no_stp_on_snek 的出色作品 🔥
RT @roboflow: here's what you can build for $0.00 with 3 open source models token cost breakdown below and the company getting rich off i…
中文: RT @roboflow:以下是使用3种开源模型,售价0.00美元的版本 以下是代币成本的细分,公司从中致富......
RT @TheZachMueller: Well, that's kinda cool https://twitter.com/TheZachMueller/status/2041139872849690789/photo/1
中文: RT @TheZachMueller:嗯,这有点酷
Woohoo 🎉
中文: 伍胡 🎉
RT @MaziyarPanahi: https://x.com/i/article/2041078649185591296
Gemma 4 26B A4B IT (4bit) + M5 Max + MLX-VLM 🚀
中文: Gemma 4 26B A4B IT(4位)+ M5 Max + MLX-VLM 🚀
We exist in a corner of X 🫡
中文: 我们存在于X 🫡的一个角落
I have a new label for certain types of PRs 😤 https://twitter.com/Prince_Canuma/status/2040902161555464504/photo/1
中文: 我为某些类型的 PR 设了一个新标签 😤
Awesome work! 🔥
中文: 很棒的工作!🔥
Hopefully this sheds some light 💡 for anyone trying to benchmark TBQ on MLX who doesn’t know how
中文: 希望这能让任何试图在MLX上对TBQ进行基准测试但不知道如何进行的人大放异彩
@zigelbaum @GoogleDeepMind “Via codex” It’s hard to test something if you don’t know how to test it yourself. I put up a benchmark script in the thread that you can use to test and have codex interpret the results for you. But before you run it, ask codex to install the changes in this branch.…
中文: @zigelbaum @GoogleDeepMind “通过 Codex” 如果你不知道如何自己测试,就很难测试它。 我在帖子中放了一个基准脚本,你可以用它来测试,并让 Codex 为你解释结果。 但在运行之前,请让 Codex 安装此分支中的更改。......
This benchmark is multimodal (images + text), with 1 to 26 images per prompt. Use this PR; it has a patch that enables Gemma 4 to support multiple images: https://github.com/Blaizzy/mlx-vlm/pull/938
中文: 这个基准是多模态的(图像 + 文本) 每个提示包含 1 到 26 张图像 使用此 PR,它包含一个补丁,使 Gemma 4 能够支持多张图像:
Why TBQ only quantizes the full-attention layers in Gemma 4 31B, not the sliding-window ones: TL;DR, it’s a bad idea because the sliding layers are already memory-efficient by design. 😂 → 50 sliding layers hold a fixed ~400MB regardless of context length → 10 full-attention…
中文: 为何 TBQ 只量化 Gemma 4 31B 中的全注意力层,而不量化滑动窗口层: 简而言之,这不是个好主意,因为滑动层在设计上就已经很节省内存了。😂 → 50 个滑动层,无论上下文长度如何,都固定占用约 400MB → 10 个全注意力......
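The memory argument can be made concrete with the usual KV-cache formula: each cached token stores one key vector and one value vector per KV head. The 50 sliding and 10 full-attention layer counts come from the post. The KV head count, head dimension, window size and 2-byte element size are hypothetical placeholders, not Gemma 4 31B's published configuration.

```python
def kv_cache_bytes(context_len, n_kv_heads, head_dim, bytes_per_elem=2, window=None):
    """KV-cache size in bytes for one attention layer.

    A full-attention layer (window=None) caches every token in the context;
    a sliding-window layer caches at most `window` tokens.
    """
    cached_tokens = context_len if window is None else min(context_len, window)
    # Factor of 2: one key vector and one value vector per KV head per token.
    return 2 * n_kv_heads * head_dim * bytes_per_elem * cached_tokens


def split_kv_bytes(context_len, n_sliding=50, n_full=10,
                   n_kv_heads=8, head_dim=256, window=1024, bytes_per_elem=2):
    """Return (sliding_total, full_total) bytes under a hypothetical config."""
    sliding = n_sliding * kv_cache_bytes(context_len, n_kv_heads, head_dim,
                                         bytes_per_elem, window)
    full = n_full * kv_cache_bytes(context_len, n_kv_heads, head_dim,
                                   bytes_per_elem)
    return sliding, full


if __name__ == "__main__":
    for ctx in (4_096, 65_536, 262_144):
        sliding, full = split_kv_bytes(ctx)
        print(f"{ctx:>7} tokens: sliding {sliding / 2**20:.0f} MiB, "
              f"full {full / 2**20:.0f} MiB")
```

Whatever the real numbers are, the shape stays the same:
- The sliding-window total stops growing once the context exceeds the window.
- The full-attention total grows linearly with context.

That linear growth in the full-attention layers is where KV quantization pays off.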
Alongside MM-NIAH, I’m also running LongBench-V2 to truly showcase where TurboQuant shines: large contexts (above 60K). The run will take around 24h to complete. Meanwhile, here is a sneak peek at 6 samples across different context sizes. See you in a day or two 🫡 https://twitter.com/Prince_Canuma/status/2040881635449598238/photo/1
中文: 除了 MM-NIAH 之外,我还在运行 LongBench-V2,以真正展示 TurboQuant 的亮点所在:大上下文(60K 以上) 运行大约需要 24 小时才能完成。与此同时,先抢先展示不同上下文长度下的 6 个样本。 一两天后见 🫡
TurboQuant: Open Evals on MLX 🔥 Yesterday I launched mlx-vlm v0.4.4 with major TurboQuant performance improvements. Today, the open benchmark results on MM-NIAH (val, 520 samples) using Gemma 4 26B IT by @GoogleDeepMind on M3 Ultra: → 0 quality loss — 78% accuracy for both… https://twitter.com/Prince_Canuma/status/2040877782922649865/photo/1
中文: TurboQuant:MLX 上的开放评测 🔥 昨天我发布了 mlx-vlm v0.4.4,带来了 TurboQuant 的显著性能改进。 今天公布在 M3 Ultra 上使用 @GoogleDeepMind 的 Gemma 4 26B IT 在 MM-NIAH(val,520 个样本)上的公开基准结果: → 0 质量损失:两者准确率均为 78%......
RT @jtdavies: TurboQuant from mlx-vlm seems to help with larger context (64k and above). I ran the full 4, 8-bit and bf16 of the Gemma 4 26…
中文: RT @jtdavies:来自 mlx-vlm 的 TurboQuant 似乎有助于更大的上下文(64k 及以上)。我运行了Gemma 4 26的完整4、8位和bf16。
RT @osanseviero: See you there! Excited to share about Gemma 4 and what the team has been cooking for the last few months
中文: RT @osanseviero:见!很高兴能分享关于Gemma 4以及团队过去几个月所烹饪的内容
Falcon Perception by @TIIuae on MLX-VLM 🚀
中文: @TIIuae 的 Falcon Perception 登陆 MLX-VLM 🚀
RT @osanseviero: Gemma 4 is now in Android Studio! You can use Android Studio Agent mode to develop features, vibe code Android apps, refa…
中文: RT @osanseviero:Gemma 4 现已在 Android Studio 中! 您可以使用 Android Studio Agent 模式开发功能、 vibe 代码 Android 应用、refa...
Another really awesome visual grounding example powered by a couple of vision language models (Gemma 4 + Falcon Perception) on an M1 Pro with 32GB using mlx-vlm 🚀 Well done @korale77
中文: 又一个出色的视觉定位示例,在 32GB 的 M1 Pro 上使用 mlx-vlm,由两个视觉语言模型(Gemma 4 + Falcon Perception)驱动 🚀 干得好 @korale77
Goals 🤣🙌🏽
中文: 进球🤣🙌🏽
RT @ivanfioravanti: I spent 3 hours this morning working with coding agents on MacBook 16" M5 Max in LOW POWER mode! 😱 I noticed 0 differe…
中文: RT @ivanfioravanti:今天早上我花了3个小时在低功率模式下与 MacBook 16 英寸 M5 Max 的编码代理合作!😱 我注意到0个不同......
❤️
😂 oh my
中文: 😂 天啊
❤️
❤️ https://x.com/Prince_Canuma/creator-subscriptions/subscribe
RT @awnihannun: Because of AI people are starting to value experience in a domain more than they used to. It feels short sighted. - Many (…
中文: RT @awnihannun:由于人工智能,人们开始比过去更重视在某个领域的经验。这感觉很短视。 - 很多(......
First time-lapse of Gym Geeks 🤣, where wifey and I train hard and maybe discuss the latest updates in the AI space. @MaziyarPanahi https://twitter.com/Prince_Canuma/status/2040564084576281022/video/1
中文: Gym Geeks 的第一段延时视频🤣 我和妻子在这里努力训练,或许还会讨论人工智能领域的最新动态。 @MaziyarPanahi
Awesome work by Yasser from @TIIuae 🚀
中文: 来自 @TIIuae 的 Yasser 的出色作品 🚀
This demo is such a powerful example of what’s possible on your Mac using MLX-VLM! It combines two of my favourite recent releases, from @GoogleDeepMind and @TIIuae 🚀
中文: 这个演示有力地展示了在 Mac 上使用 MLX-VLM 能实现什么! 它结合了我最喜爱的两个最新发布,分别来自 @GoogleDeepMind 和 @TIIuae 🚀
If you quantize the model, you get even more memory savings and speedups 🚀 Thanks to @jtdavies for testing it out!
中文: 如果你量化模型,你的内存节省更多,并加快了速度 🚀 感谢@jtdavies 进行测试!
Woohoo, Gemma 4 in your pocket thanks to @adrgrondin's MLX-Swift port 🚀 Download it and try it out on his @LocallyAIApp
中文: Woohoo Gemma 4 在您的口袋里,感谢 @adrgrondin MLX-Swift 端口 🚀 下载并试用他的@LocallyAIApp
RT @Prince_Canuma: @ptremblay I know you weren’t, we have interacted before 😊 Just been a long day for me since none of the other guys wan…
中文: RT @Prince_Canuma:@ptremblay 我知道你不是,我们之前有过互动😊 只是对我来说这一天很漫长,因为其他人都不想......
RT @phonezawphyo: @Erik0XAi @Prince_Canuma This is what I’m running python3 -m mlx_vlm.server --model gemma-4-26b-a4b-it-4bit --port 8086…
中文: RT @phonezawphyo:@Erik0XAi @Prince_Canuma 这就是我正在运行的:python3 -m mlx_vlm.server --model gemma-4-26b-a4b-it-4bit --port 8086...
@ptremblay I know you weren’t; we have interacted before 😊 It’s just been a long day for me, since none of the other guys wanted to reason about this. In short and simple terms, I think current models have significantly higher usable context, most around 128K to 256K. But we are now seeing…
中文: @ptremblay 我知道你不是,我们之前有过互动 😊 只是对我来说这一天很漫长,因为其他人都不想就此讲道理。 简而言之,我认为现有模型的可用上下文显著更高,大多数约为 128K 到 256K。但我们现在看到了......
RT @ollama: @Prince_Canuma @spark_arena @WesEklund @Prince_Canuma Thank you for all the work you do! Here to just give you ❤️❤️❤️❤️
中文: RT @ollama:@Prince_Canuma @spark_arena @WesEklund @Prince_Canuma 感谢你们所做的一切工作!在这里,请给你❤️❤️❤️❤️
This is incredible work by my great friend Zach 🔥 Generating synthetic data using OSS models and an agent harness. The data is all open too. Check it out!
中文: 这是我的好朋友 Zach 🔥 的出色作品 使用开源模型和代理框架生成合成数据。 数据也全部开放。 来看看吧!
I don’t understand people who try to gaslight others when they start losing an argument; it rarely works… One of my mentors, who was also my former boss, always used to ask me: “Is that opinion backed by data or intuition? It’s ok if you don’t have data, just make sure you don’t make…
中文: 我不理解那些在争论中开始落下风时就试图对别人进行煤气灯操纵的人,这种做法很少奏效...... 我的一位导师兼前任老板总是问我: 这种观点是由数据还是直觉支持的? 没有数据也没关系,只需确保你不要......
🤣
Lol I’ve been doing ML research for a decade and helped the field progress fam… https://twitter.com/Prince_Canuma/status/2040489309040423100/photo/1
中文: 哈哈 我从事机器学习研究已有十年,并帮助该领域取得进步......
Thanks Rojan! More public tests on the improved TurboQuant 🚀 You should see improvements across the board. Here you can see a slight improvement in speed and peak memory between v0.4.3 and v0.4.4, even at lower context settings. The gains should be much larger as context grows
中文: 谢谢 Rojan! 改进后的 TurboQuant 🚀 的更多公开测试 你应该会看到全面的改进 在这里,即使在较低的上下文设置下,也能看到 v0.4.3 与 v0.4.4 之间速度和峰值内存的小幅提升 随着上下文增长,提升应该会大得多
Can’t wait 🚀
中文: 迫不及待了 🚀
RT @ivanfioravanti: BOOM! Let's test this magic!
中文: RT @ivanfioravanti: BOOM!让我们来测试一下这个魔法!
Do your homework before speaking or forming opinions…
中文: 在说话或形成意见之前先做好功课......
@spark_arena @WesEklund I have been a contributor on MLX-LM since 2024 I know everything there is about that project. It has real benchmarks that work and it’s the inspiration for MLX-VLM and all my projects. You confuse model tests to guard against regressions with benchmarks. Those tests are…
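To make the distinction in this exchange concrete, here is a minimal Python sketch. It is not code from MLX-LM or MLX-VLM; `toy_generate`, `REFERENCE`, `regression_test` and `benchmark` are hypothetical names. The regression test pins outputs to known-good references, while the benchmark only measures speed.

```python
import time

def toy_generate(prompt: str) -> str:
    """Hypothetical stand-in for a model's generate(); deterministic for the demo."""
    return prompt.upper()

# Regression test: pins the *output* to a known-good reference, so a change that
# alters behaviour is caught. It says nothing about speed.
REFERENCE = {"hello": "HELLO", "mlx": "MLX"}

def regression_test() -> bool:
    return all(toy_generate(p) == expected for p, expected in REFERENCE.items())

# Benchmark: measures *throughput* (calls per second here) over repeated runs.
# It says nothing about whether the output is correct.
def benchmark(prompt: str, iters: int = 1000) -> float:
    start = time.perf_counter()
    for _ in range(iters):
        toy_generate(prompt)
    elapsed = time.perf_counter() - start
    return iters / elapsed if elapsed > 0 else float("inf")

if __name__ == "__main__":
    print("regression test passed:", regression_test())
    print(f"throughput: {benchmark('hello'):.0f} calls/s")
```

A regression suite can pass while throughput drops, and a benchmark can look great while outputs are wrong, which is why the two are usually tracked separately.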
Well done 🔥🙌🏽
My brother is pushing his Mac to the max using MLX and torch 🚀 I don’t know why he is using torch 😭 when the entire pipeline exists in MLX-VLM: SAM 3 ✅ RF-DETR ✅
I hope I can make it in time for AI Engineer next week ❤️ Still waiting for the visa.
Haha 😎 I will share a detailed post later about all the improvements
Awesome testimony 🫡 It makes me happy to hear stories like this
Now let’s pull some heavy weights Back day 🏋🏽‍♂️ https://twitter.com/Prince_Canuma/status/2040467019879854104/photo/1
You can’t fake what you care about! I truly care about helping people through technology. That’s my life’s mission and motto. We will win 🏆
More public tests coming through 🚀
The hardest part was benchmarking with one machine 🥲 Each iteration takes 30 minutes to 1 hour to validate, so I lost sleep trying to land this ahead of the Gemma 4 launch, but failed. I’ll need one more maxed-out Mac Studio to help me ship faster and test distributed setups. That’s why I could… https://twitter.com/Prince_Canuma/status/2040463753536327897/photo/1
For the MatFormer variants (E2B and E4B), I don’t see memory savings, but I do see faster generation.
Will test compressing RotatingKVCache later today and see if it yields better overall performance 🚀 If it works, we might see massive improvements, potentially unlocking 1M context in the 50-100B parameter range.
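A rotating KV cache can be pictured as a ring buffer that keeps only the most recent N tokens, so its memory stays fixed no matter how long the context grows. The toy below illustrates that idea in plain Python; `ToyRotatingCache` is a hypothetical class and not the interface of MLX's `RotatingKVCache`.

```python
from collections import deque

class ToyRotatingCache:
    """Hypothetical toy, not MLX's RotatingKVCache: keeps only the most recent
    `window` (key, value) pairs, so memory stays bounded as the sequence grows."""

    def __init__(self, window: int):
        self.window = window
        self.entries = deque(maxlen=window)  # the oldest entry is evicted automatically
        self.offset = 0                      # total number of tokens seen so far

    def update(self, key, value):
        self.entries.append((key, value))
        self.offset += 1

    def __len__(self):
        return len(self.entries)

cache = ToyRotatingCache(window=4)
for t in range(10):
    cache.update(f"k{t}", f"v{t}")
print(len(cache), cache.offset)       # 4 10
print([k for k, _ in cache.entries])  # ['k6', 'k7', 'k8', 'k9']
```

Because every windowed layer holds a buffer of constant size, compressing it would shrink a fixed cost that is repeated across many layers; that is presumably what makes it worth testing.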
Correction: Device: M3 Ultra 512GB
Gemma 4 31B-IT gets 1.4GB of memory savings with TurboQuant on MLX-VLM v0.4.4 💾 This one’s a 59GB dense model: all 60 layers are attention layers, but 50 of them use RotatingKVCache with a fixed 1024-token sliding window. TBQ only compresses the 10 full-attention layers (every 6th),… https://twitter.com/Prince_Canuma/status/2040456230737453301/photo/1
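The layer layout described above explains why the savings depend on context length. The sketch below counts cached token slots per layer type using the numbers from the post (60 layers, every 6th with full attention, a 1024-token window elsewhere). It assumes that the full-attention layers sit at positions 6, 12, ..., 60 and that every layer stores the same number of bytes per cached token, which may not hold for the real model.

```python
NUM_LAYERS = 60   # from the post
FULL_EVERY = 6    # every 6th layer uses full attention (from the post)
WINDOW = 1024     # window of the RotatingKVCache layers (from the post)

def is_full_attention(layer_idx: int) -> bool:
    # Assumption: 0-indexed layers 5, 11, ..., 59 are the full-attention ones.
    return (layer_idx + 1) % FULL_EVERY == 0

def cached_tokens(context_len: int) -> tuple[int, int]:
    """(token slots in full-attention layers, token slots in windowed layers), summed over layers."""
    full = windowed = 0
    for i in range(NUM_LAYERS):
        if is_full_attention(i):
            full += context_len                   # grows with the whole context
        else:
            windowed += min(context_len, WINDOW)  # capped at the window size
    return full, windowed

for ctx in (4_096, 32_768, 131_072):
    full, windowed = cached_tokens(ctx)
    print(f"{ctx:>7} tokens: {full / (full + windowed):.0%} of cached KV slots are in full-attention layers")
```

Under these assumptions the windowed layers hold more than half of the cached slots at 4K tokens but under 5% at 128K, so compressing only the full-attention layers pays off more as the context grows.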
Shout-out to @no_stp_on_snek for his awesome llama.cpp TurboQuant implementation and for the tip to skip QJL. One of the many improvements in the latest release was skipping QJL, and it worked well with no noticeable loss in coherence 🚀
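As we understand the TurboQuant recipe, vectors are randomly rotated and then scalar-quantized, with an optional 1-bit QJL step on the residual to make inner-product estimates unbiased; skipping QJL leaves only the rotate-and-quantize stage. The NumPy sketch below shows that reduced pipeline on attention logits. It is a simplification, not the MLX-VLM or llama.cpp kernel: the per-vector uniform quantizer stands in for TurboQuant's precomputed codebooks, and the sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_rotation(dim: int) -> np.ndarray:
    # Random orthogonal matrix from the QR decomposition of a Gaussian matrix;
    # multiplying columns by sign(diag(r)) makes it uniformly (Haar) distributed.
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q * np.sign(np.diag(r))

def quantize(x: np.ndarray, bits: int):
    # Simplified per-row uniform scalar quantizer (a stand-in for TurboQuant's codebooks).
    levels = 2 ** bits - 1
    lo = x.min(axis=-1, keepdims=True)
    hi = x.max(axis=-1, keepdims=True)
    scale = np.maximum((hi - lo) / levels, 1e-12)  # guard against constant rows
    codes = np.round((x - lo) / scale).astype(np.uint8)
    return codes, lo, scale

def dequantize(codes: np.ndarray, lo: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return codes * scale + lo

d, n = 128, 256
keys = rng.standard_normal((n, d))
query = rng.standard_normal(d)
R = random_rotation(d)

# Rotate keys and query with the same orthogonal matrix; quantize only the keys (no QJL step).
codes, lo, scale = quantize(keys @ R, bits=4)
approx = dequantize(codes, lo, scale) @ (query @ R)
exact = keys @ query
corr = np.corrcoef(exact, approx)[0, 1]
print(f"correlation between exact and 4-bit logits: {corr:.3f}")
```

Because the rotation is orthogonal, rotating both keys and query leaves the exact dot products unchanged, so any error in the logits comes from quantization alone.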
Gemma 4 26B-A4B is now ~2x faster at 375K context with TurboQuant on MLX-VLM v0.4.4 🚀 The model's official max context is 262K but I pushed it to 375K anyway. That's roughly 5–6 full novels (the entire LOTR trilogy + The Hobbit). Up to ~20K tokens they're neck and neck, but… https://twitter.com/Prince_Canuma/status/2040454774357676344/photo/1
mlx-vlm v0.4.4 is out 🚀🔥 New models: 🦅 Falcon-Perception 300M by @TIIuae Highlights: ⚡️ Optimized TurboQuant Metal kernels: up to 1.90x decode speedup over baseline on longer contexts, with 89% KV cache savings. 👀 VisionFeatureCache: multi-turn image caching so you don’t… https://twitter.com/Prince_Canuma/status/2040451789363851350/photo/1
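The post is truncated, so the internals of `VisionFeatureCache` are not stated here. As a hypothetical illustration of the general idea of multi-turn image caching, the sketch below memoizes a vision encoder's output by content hash, so later turns that reuse the same image skip the encoder. `ToyVisionFeatureCache` and `fake_encoder` are invented names, not mlx-vlm APIs.

```python
import hashlib

class ToyVisionFeatureCache:
    """Hypothetical illustration, not mlx-vlm's VisionFeatureCache: memoize the
    vision encoder's output per image so repeated turns skip re-encoding."""

    def __init__(self, encoder):
        self.encoder = encoder
        self.features = {}
        self.hits = 0
        self.misses = 0

    def get(self, image_bytes: bytes):
        key = hashlib.sha256(image_bytes).hexdigest()  # content hash as the cache key
        if key in self.features:
            self.hits += 1
        else:
            self.misses += 1
            self.features[key] = self.encoder(image_bytes)
        return self.features[key]

calls = []

def fake_encoder(img: bytes):
    calls.append(img)  # record how often the expensive encoder actually runs
    return len(img)    # stand-in for an image-embedding tensor

feature_cache = ToyVisionFeatureCache(fake_encoder)
image = b"\x89PNG pretend image bytes"
for turn in range(3):  # three chat turns about the same image
    feature_cache.get(image)
print(feature_cache.misses, feature_cache.hits, len(calls))  # 1 2 1
```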
Well, if this trend continues, most open-source projects will become invite-only for contributions. I’m seeing the same issues as my friend @ngxson 😄 https://twitter.com/Prince_Canuma/status/2040366474036936865/photo/1
RT @MLStreetTalk: I couldn't find any benchmarks of folks running the Gemma models on an M4 Max (with Ollama 0.20 and mlx-vlm), so I just g…
RT @Karmedge: Its happening. 6:30 presidio https://twitter.com/Karmedge/status/2040201718986944528/photo/1
Pretty cool, well done 👏🏽
🚀
Guess they called it "Turbo" for a reason 👀 Model: Gemma-4-26B-A4B-it Precision: BF16 Device: M3 Max 96GB https://twitter.com/Prince_Canuma/status/2040260062963286051/photo/1
Success is about the reps and dedication to the craft.
My wife says I should post time-lapses of me working. What do you think?
Gemma 4 31B running with TurboQuant KV cache on MLX 🔥 128K context: → KV Memory: 13.3 GB → 4.9 GB (63% reduction) → Peak Memory: 75.2 GB → 65.8 GB (-9.4 GB) → Quality preserved TurboQuant compression scales with sequence length, so the longer the context, the bigger the…
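The figures in the post can be checked with a little arithmetic, and the scaling claim can be illustrated with a rough projection. The projection is an assumption, not a measurement: it takes the 128K result (read here as 131,072 tokens) and assumes KV memory grows linearly with context while the 63% reduction ratio stays constant, ignoring the fixed-size sliding-window layers.

```python
MEASURED_CONTEXT = 128 * 1024  # "128K" in the post, assumed to mean 131,072 tokens

def reduction(before_gb: float, after_gb: float) -> tuple[float, float]:
    """Absolute saving in GB and relative saving as a fraction."""
    saved = before_gb - after_gb
    return saved, saved / before_gb

kv_saved, kv_frac = reduction(13.3, 4.9)   # KV memory at 128K (from the post)
peak_saved, _ = reduction(75.2, 65.8)      # peak memory at 128K (from the post)
print(f"KV: -{kv_saved:.1f} GB ({kv_frac:.0%}), peak: -{peak_saved:.1f} GB")

def projected_kv_saving_gb(context_tokens: int) -> float:
    # Assumption: KV memory is linear in context length and the reduction ratio is
    # constant, so the absolute saving scales linearly from the 128K measurement.
    return kv_saved * context_tokens / MEASURED_CONTEXT

for ctx in (32 * 1024, 128 * 1024, 256 * 1024):
    print(f"{ctx // 1024}K context: ~{projected_kv_saving_gb(ctx):.1f} GB of KV saved (projected)")
```

The first print line reproduces the post's 63% KV reduction and 9.4 GB peak-memory drop; the loop only shows why a constant ratio means larger absolute savings at longer contexts.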
You can now run Ollama using MLX as a backend 🚀