Absolute killer use case for MLX-VLM!🔥🙌🏽
Local grounded reasoning using MLX will power a whole new generation of use cases that were previously only available on the cloud! From satellite imagery analysis, security systems all the way to robotics. I’m really excited for the latter. I spoke at length about these… https://twitter.com/Prince_Canuma/status/2042761667017105517/video/1
RT @osanseviero: Our first successful Gemma 4 Runtime in London with @swyx @patloeber @nick_kango @cormacb and others! 💎Great to go out for…
RT @adrgrondin: I’m excited to announce that I’ve joined @lmstudio 👾 The team behind the app is amazing and I couldn’t be more proud. I’l…
❤️
🚀🔥
Woohoo, congratulations @adrgrondin! I couldn’t imagine a better match 🚀
Just implemented TriAttention in MLX and the results are wild! You can get up to 81% KV compression at 60K tokens for Gemma-4-31B-IT in BF16 🔥 Unlike TurboQuant, which quantizes KV cache values, TriAttention prunes low-importance tokens entirely by scoring keys using… https://twitter.com/Prince_Canuma/status/2042021304270819394/photo/1
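For anyone curious, here's roughly what pruning-by-key-scoring looks like. This is an illustrative sketch only, not the actual TriAttention implementation: the scoring rule (mean attention mass per key) and the keep_ratio knob are my own assumptions.

```python
# Toy sketch of KV pruning by key importance, in the spirit of TriAttention.
# NOT the real implementation: scoring rule and keep_ratio are assumptions.
import mlx.core as mx

def prune_kv(queries, keys, values, keep_ratio=0.19):
    # queries/keys/values: (seq_len, head_dim) for a single head
    scale = keys.shape[-1] ** -0.5
    attn = mx.softmax((queries @ keys.T) * scale, axis=-1)  # (L, L)
    scores = attn.mean(axis=0)          # importance = avg attention a key receives
    k = max(1, int(keys.shape[0] * keep_ratio))
    keep = mx.sort(mx.argsort(scores)[-k:])  # top-k keys, original order preserved
    return mx.take(keys, keep, axis=0), mx.take(values, keep, axis=0)

L, D = 1024, 128
q = mx.random.normal((L, D)); k = mx.random.normal((L, D)); v = mx.random.normal((L, D))
k_small, v_small = prune_kv(q, k, v)
print(k_small.shape)  # 81% of keys pruned at keep_ratio=0.19
```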
I’m behind them chatting with @altryne and @marlene_zw 😂🙌🏽
🚀
RT @ClementDelangue: Anthropic had the most powerful cyber-security model in the history of this world and their internal code based still…
RT @julien_c: We are giving away Safetensors to the @pytorch foundation (shepherded by the Linux Foundation) Our shared goal is to make th…
Ask Mythos to leak its own weights 😂 https://twitter.com/Prince_Canuma/status/2041839027217641750/photo/1
RT @angeloskath: A long time coming but new mlx-lm is here with better batching support in the server and Gemma 4. pip install -U mlx-lm…
I love the internet! 😂 For me the most important part was the OSS attempt (humbling experience) and seeing my childhood fav actress show up in an unexpected place. It’s obvious our beloved Milla knows nothing about the space, and honestly didn’t expect her to. Two things that… https://twitter.com/Prince_Canuma/status/2041612468208988354/photo/1
Ain’t no way your name is…
RT @OlivierBachem: Our goal in the Gemma team is to ship models that are useful by generalizing to unseen tasks. Hence, we are extremely s…
I have my visa for the UK for 6 months. If you would like me to speak at your event, DMs are open 🚀
Literally got this at 3pm today and have to fly tomorrow. Thank God!
I got my UK visa 😭❤️🙌🏽 UK and @aiDotEngineer here comes the King! https://twitter.com/Prince_Canuma/status/2041574009284767856/photo/1
👀 will you donate the Mac Mini for the cause https://twitter.com/Prince_Canuma/status/2041524355083989460/photo/1
Medium was once a great place... I wrote my best articles there back in 2018
Well done guys ❤️🚀🔥
My favourite action actress from Resident Evil and many awesome movies is doing open source ❤️ First, I never saw that coming! Second, what a time to be alive and doing open source! Open source for the win 🚀
If this works well, we are looking into a new era! Well done Anemll 🔥🙌🏽
This example has so much alpha! You can now literally generate vision agent traces and train smaller VLMs on it completely on-device 🔥 cc: @TheZachMueller @MaziyarPanahi @ivanfioravanti @ActuallyIsaak https://twitter.com/Prince_Canuma/status/2041286374431633886/photo/1
The best ideas are the simplest, thank you @dahou_yasser! "the idea: Gemma4 looks at the image, decides what to segment, Falcon Perception returns pixel-accurate masks + metadata (centroid, area_fraction, bbox), Gemma4 reasons on the numbers and calls the next tool or answers."…
RT @nibzard: added @cohere transcribe to my small transcribing cli running natively on Apple Silicon via MLX-audio from @Prince_Canuma http…
🫡❤️
Awesome work by @no_stp_on_snek 🔥
RT @roboflow: here's what you can build for $0.00 with 3 open source models token cost breakdown below and the company getting rich off i…
RT @TheZachMueller: Well, that's kinda cool https://twitter.com/TheZachMueller/status/2041139872849690789/photo/1
Woohoo 🎉
RT @MaziyarPanahi: https://x.com/i/article/2041078649185591296
Gemma 4 26B A4B IT (4bit) + M5 Max + MLX-VLM 🚀
We exist in a corner of X 🫡
Have a new label for a certain type of PR 😤 https://twitter.com/Prince_Canuma/status/2040902161555464504/photo/1
Awesome work! 🔥
Hopefully this shines a light 💡 for anyone trying to benchmark TBQ on MLX who doesn’t know how
@zigelbaum @GoogleDeepMind “Via codex” It’s hard to test something if you don’t know how to test it yourself. I put up a benchmark script in the thread that you can use to test and have codex interpret the results for you. But before you run it, ask codex to install the changes in this branch.…
This benchmark is multimodal (images + text). It has from 1 up to 26 images per prompt. Use this PR; it has a patch that enables Gemma 4 to support multiple images: https://github.com/Blaizzy/mlx-vlm/pull/938
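If you want to reproduce the multi-image setup, here's a minimal sketch. Exact call signatures vary across mlx-vlm versions, and the repo id follows the usual 4-bit conversion naming pattern, so treat both as assumptions.

```python
# Minimal multi-image generation sketch with mlx-vlm; assumes the PR's
# multi-image patch for Gemma 4 is installed. API details may differ
# between mlx-vlm versions.
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template

model, processor = load("mlx-community/gemma-4-26b-a4b-it-4bit")  # assumed repo id
images = ["page1.png", "page2.png", "page3.png"]
prompt = apply_chat_template(
    processor, model.config,
    "Which image contains the needle?",
    num_images=len(images),  # one image token slot per input image
)
output = generate(model, processor, prompt, image=images, max_tokens=256)
print(output)
```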
Why TBQ only quantizes full-attention layers in Gemma 4 31B, not the sliding-window ones: TL;DR, it’s a bad idea because the sliding layers are already memory-efficient by design. 😂 → 50 sliding layers hold a fixed ~400MB regardless of context length → 10 full-attention…
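Back-of-the-envelope math behind those numbers. The KV head count and head dim here are illustrative assumptions picked to match the ~400MB figure, not the actual Gemma 4 31B config:

```python
# Rough KV-cache memory math for why sliding-window layers aren't worth
# quantizing. Head counts/dims are assumptions, not the real config.
BYTES_BF16 = 2
KV_HEADS, HEAD_DIM, WINDOW = 16, 128, 1024

def kv_bytes_per_layer(tokens):
    # keys + values: 2 tensors of shape (tokens, KV_HEADS, HEAD_DIM)
    return 2 * tokens * KV_HEADS * HEAD_DIM * BYTES_BF16

ctx = 128_000
sliding_gb = 50 * kv_bytes_per_layer(min(ctx, WINDOW)) / 1e9  # window caps tokens
full_gb = 10 * kv_bytes_per_layer(ctx) / 1e9                  # grows with context
print(f"50 sliding layers: {sliding_gb:.2f} GB (fixed)")   # ~0.42 GB at any ctx
print(f"10 full layers:    {full_gb:.2f} GB (linear)")     # ~10.5 GB at 128K
```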
Alongside MM-NIAH I’m also running LongBench-V2 to truly showcase where TurboQuant shines, which is large context (above 60K). Running will take around 24h to complete. Meanwhile, here is a sneak peek of 6 samples across different context sizes. See you in a day or two 🫡 https://twitter.com/Prince_Canuma/status/2040881635449598238/photo/1
TurboQuant: Open Evals on MLX 🔥 Yesterday I launched mlx-vlm v0.4.4 with major TurboQuant performance improvements. Today, the open benchmark results on MM-NIAH (val, 520 samples) using Gemma 4 26B IT by @GoogleDeepMind on M3 Ultra: → 0 quality loss — 78% accuracy for both… https://twitter.com/Prince_Canuma/status/2040877782922649865/photo/1
RT @jtdavies: TurboQuant from mlx-vlm seems to help with larger context (64k and above). I ran the full 4, 8-bit and bf16 of the Gemma 4 26…
RT @osanseviero: See you there! Excited to share about Gemma 4 and what the team has been cooking for the last few months
Falcon Perception by @TIIuae on MLX-VLM 🚀
RT @osanseviero: Gemma 4 is now in Android Studio! You can use Android Studio Agent mode to develop features, vibe code Android apps, refa…
Another really awesome visual grounding example powered by a couple of vision language models (Gemma 4 + Falcon Perception) on an M1 Pro with 32GB using mlx-vlm 🚀 Well done @korale77
Goals 🤣🙌🏽
RT @ivanfioravanti: I spent 3 hours this morning working with coding agents on MacBook 16" M5 Max in LOW POWER mode! 😱 I noticed 0 differe…
❤️
😂 oh my
❤️
❤️ https://x.com/Prince_Canuma/creator-subscriptions/subscribe
RT @awnihannun: Because of AI people are starting to value experience in a domain more than they used to. It feels short sighted. - Many (…
First time-lapse of Gym Geeks 🤣 Where wifey and I train hard, and maybe discuss the latest updates in the AI space. @MaziyarPanahi https://twitter.com/Prince_Canuma/status/2040564084576281022/video/1
Awesome work by Yasser from @TIIuae 🚀
This demo is such a powerful example of what’s possible on your Mac using MLX-VLM! It joins two of my favourite latest releases from @GoogleDeepMind and @TIIuae 🚀
If you quantize the model you get even more memory savings and speedups 🚀 Thanks to @jtdavies for testing it out!
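For intuition on the weight side, quick napkin math assuming MLX's default 4-bit affine quantization (group size 64, with a scale and bias stored per group):

```python
# Back-of-the-envelope weight memory for BF16 vs 4-bit quantization.
# Assumes MLX defaults: 4 bits/weight, group size 64, bf16 scale+bias per group.
params = 26e9                     # total parameters (A4B = 4B active)
bf16_gb = params * 2 / 1e9        # 2 bytes per param
group = 64
q4_gb = params * (4 / 8 + 2 * 2 / group) / 1e9  # 0.5 B/weight + group overhead
print(f"BF16: {bf16_gb:.0f} GB   4-bit: {q4_gb:.0f} GB")  # ~52 GB vs ~15 GB
```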
Woohoo, Gemma 4 in your pocket thanks to @adrgrondin’s MLX-Swift port 🚀 Download and try it out on his @LocallyAIApp
RT @Prince_Canuma: @ptremblay I know you weren’t, we have interacted before 😊 Just been a long day for me since none of the other guys wan…
RT @phonezawphyo: @Erik0XAi @Prince_Canuma This is what I’m running python3 -m mlx_vlm.server --model gemma-4-26b-a4b-it-4bit --port 8086…
@ptremblay I know you weren’t, we have interacted before 😊 Just been a long day for me since none of the other guys wanted to reason about this. In short and simple terms, I think current models have significantly higher usable context, most around 128K to 256K. But we are now seeing…
RT @ollama: @Prince_Canuma @spark_arena @WesEklund @Prince_Canuma Thank you for all the work you do! Here to just give you ❤️❤️❤️❤️
This is incredible work by my great friend Zach 🔥 Generating synthetic data using OSS models and an agent harness. The data is all open too. Check it out!
Don’t understand people that try to gaslight others when they start losing an argument, it rarely works… One of my mentors and former boss always used to ask me: “Is that opinion backed by data or intuition? It’s ok if you don’t have data, just make sure you don’t make…
🤣
Lol I’ve been doing ML research for a decade and helped the field progress fam… https://twitter.com/Prince_Canuma/status/2040489309040423100/photo/1
Thanks Rojan! More public tests on the improved TurboQuant 🚀 You should see improvements across the board. Here you see a slight improvement in speed and peak memory even at lower context settings between v0.4.3 and v0.4.4. It should be much larger as context grows
Can’t wait 🚀
RT @ivanfioravanti: BOOM! Let's test this magic!
Do your homework before speaking or forming opinions…
@spark_arena @WesEklund I have been a contributor on MLX-LM since 2024. I know everything there is to know about that project. It has real benchmarks that work and it’s the inspiration for MLX-VLM and all my projects. You’re confusing model tests that guard against regressions with benchmarks. Those tests are…
Well done 🔥🙌🏽
My brother is pushing his Mac to the max using MLX and torch🚀 Don’t know why he is using torch 😭 when the entire pipeline exists in MLX-VLM Sam 3 ✅ RF-DETR ✅
I hope I can make it in time for AI Engineer next week ❤️ Still waiting for the visa
Haha 😎 I will share a detailed post later about all the improvements
Awesome testimony 🫡 It makes me happy to hear stories like this
Now let’s pull some heavy weights. Back day 🏋🏽‍♂️ https://twitter.com/Prince_Canuma/status/2040467019879854104/photo/1
You can’t fake what you care about! I truly care about helping people through technology. That’s my life’s mission and motto. We will win 🏆
More public tests coming through 🚀
The hardest part was benchmarking with one machine 🥲 Each iteration takes 30m–1h to validate, so I lost sleep trying to land this ahead of the Gemma 4 launch, but failed. I’ll need one more maxed-out Mac Studio to help me ship faster and test distributed. That’s why I could… https://twitter.com/Prince_Canuma/status/2040463753536327897/photo/1
For the MatFormer variants (E2B and E4B) I don’t see memory savings but do see faster generation
Will test compressing RotatingKVCache later today and see if it yields better performance overall 🚀 If it works, we might see massive improvements, potentially unlocking 1M context at the 50-100B param range
Correction: Device: M3 Ultra 512GB
Gemma 4 31B-IT gets 1.4GB memory savings with TurboQuant on MLX-VLM v0.4.4 💾 This one’s a 59GB dense model — all 60 layers use full attention, but 50 of them use RotatingKVCache with a fixed 1024-token window. TBQ only compresses the 10 full-attention layers (every 6th),… https://twitter.com/Prince_Canuma/status/2040456230737453301/photo/1
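In other words, the cache assignment reduces to a simple 1-in-6 pattern. A sketch of the selection logic (names are illustrative, not the actual mlx-vlm internals):

```python
# Which layers get a TurboQuant-compressed cache in the 60-layer model:
# every 6th layer is full attention; the rest use a rotating window cache.
# The pattern and names are illustrative, not the real mlx-vlm code.
N_LAYERS, PATTERN, WINDOW = 60, 6, 1024
full = [i for i in range(N_LAYERS) if (i + 1) % PATTERN == 0]
print(f"{len(full)} full-attention layers -> TurboQuant-compressed cache")        # 10
print(f"{N_LAYERS - len(full)} sliding layers -> RotatingKVCache(max_size={WINDOW})")  # 50
```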
Shout out to @no_stp_on_snek for his awesome llama.cpp turboquant implementation and tip to skip QJL. One of the many improvements of the latest release was to skip QJL and it worked well with no noticeable loss in coherence 🚀
Gemma 4 26B-A4B is now ~2x faster at 375K context with TurboQuant on MLX-VLM v0.4.4 🚀 The model's official max context is 262K but I pushed it to 375K anyway. That's roughly 5–6 full novels (the entire LOTR trilogy + The Hobbit). Up to ~20K tokens they're neck and neck, but… https://twitter.com/Prince_Canuma/status/2040454774357676344/photo/1
mlx-vlm v0.4.4 is out 🚀🔥 New models: 🦅 Falcon-Perception 300M by @TIIuae Highlights: ⚡️ TurboQuant Metal kernels optimized — up to 1.90x decode speedup over baseline on longer context with 89% KV cache savings. 👀 VisionFeatureCache — multi-turn image caching so you don’t… https://twitter.com/Prince_Canuma/status/2040451789363851350/photo/1
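Conceptually, VisionFeatureCache amounts to memoizing the vision tower’s output per image so multi-turn chats don’t re-encode the same pictures. A hypothetical sketch of the idea (not the actual mlx-vlm API):

```python
# Hypothetical sketch of multi-turn vision-feature caching; the class and
# method names here are illustrative, NOT the real VisionFeatureCache API.
import hashlib

class VisionFeatureCacheSketch:
    def __init__(self):
        self._store = {}

    def get(self, image_bytes, encode_fn):
        key = hashlib.sha256(image_bytes).hexdigest()
        if key not in self._store:                 # first turn: run vision tower once
            self._store[key] = encode_fn(image_bytes)
        return self._store[key]                    # later turns: cache hit, no re-encode

cache = VisionFeatureCacheSketch()
feats1 = cache.get(b"...image bytes...", lambda b: f"features({len(b)})")
feats2 = cache.get(b"...image bytes...", lambda b: f"features({len(b)})")  # cached
```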
Well, if this trend continues most open-source projects will become invite-only contributions. I’m seeing the same issues, my friend @ngxson 😄 https://twitter.com/Prince_Canuma/status/2040366474036936865/photo/1
RT @MLStreetTalk: I couldn't find any benchmarks of folks running the Gemma models on an M4 Max (with Ollama 0.20 and mlx-vlm), so I just g…
RT @Karmedge: Its happening. 6:30 presidio https://twitter.com/Karmedge/status/2040201718986944528/photo/1
Pretty cool, well done 👏🏽
🚀
Guess they called it "Turbo" for a reason 👀 Model: Gemma-4-26B-A4B-it Precision: BF16 Device: M3 Max 96GB https://twitter.com/Prince_Canuma/status/2040260062963286051/photo/1
Success is about the reps and dedication to the craft.
My wife says I should post time-lapses of me working. What do you think?
Gemma 4 31B running with TurboQuant KV cache on MLX 🔥 128K context: → KV Memory: 13.3 GB → 4.9 GB (63% reduction) → Peak Memory: 75.2 GB → 65.8 GB (-9.4 GB) → Quality preserved TurboQuant compression scales with sequence length, so the longer the context, the bigger the…
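The per-token arithmetic implied by those 128K numbers, as a quick sketch:

```python
# Why the savings scale with sequence length, using the tweet's own numbers.
ctx = 128_000
bf16_gb, tbq_gb = 13.3, 4.9
print(f"BF16 KV: {bf16_gb * 1e9 / ctx / 1024:.0f} KiB/token")  # ~101 KiB
print(f"TBQ  KV: {tbq_gb * 1e9 / ctx / 1024:.0f} KiB/token")   # ~37 KiB
# Absolute savings = (~101 - ~37) KiB * context length, so doubling the
# context roughly doubles the GB saved.
```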
You can now run Ollama using MLX as a backend 🚀