Lilian Weng / @lilianweng: Besides the fun fact that Connectionism is connected with the early days of the AI field and highlights similarities between neural networks and human brains, the flagship product of the (first) Thinking Machines was named the Connection Machine. 🧑‍🎓 Enjoy reading, and more is coming!
Ben Fielding / @fenbielding: Strongly agree with the need for determinism in model execution outlined in @thinkymachines' first blog post. We take this further at @gensynai and build for reproducibility (determinism across devices). More to announce soon, but in the meantime, a demo: https://github.com/...
Bert Maher / @tensorbert: Read this to the end — the last section is mind-blowing
@timlantin: life imitates art (westworld) [image]
Sai Yashwanth / @yashwanthsai29: Learnt a lot reading this study. The quality is exceptional! I aspire to contribute at the same level in the future.
Suha / @suhackerr: Awesome research! Interesting implications for correctness and assurance here. Excited that folks are doing deep dives into this problem, and I'm excited to see what they release next.
Bilal / @bilaltwovec: I've been using FlexAttention for a few months now and I've never felt better. I have more energy. My skin is clearer. Tpot has stopped complaining about my models being quantized 🔥 [image]
Jose Lopez / @dl_insider: Thinking Machines has found the reason for nondeterminism in LLMs. Big deal for: scientific reproducibility, high-stakes and safety-critical fields, advanced AI research, and testing and validation. But it has a cost, so I expect a toggle: 1. Fast non-deterministic,
An Vo / @an_vo12: This blog makes me wonder about the OPPOSITE problem: 👉 Can we make LLMs give uniformly random answers when asked (e.g., "randomly pick 0-9")? So far, our work (https://b-score.github.io/) has shown that we can hack it with multi-turn, but I'd love to see this achieved in single-turn. [image]
@alth0u: the field...worked on this problem for years...and...they just...they just tweeted it out.
Frank / @okfrankco: The power of a good question: "Why aren't LLMs deterministic?" In short: computers run floating-point operations (math 😝) in parallel, and the order in which the results are combined can shift. This makes results slightly different each time. This small variance cascades
Jeremy Bernstein / @jxbz: Excited about our new research blog!
Urmish Thakker / @urmishthakker: Loved this post from @cHHillee! The post masks how hard these debugging journeys are. Posts make the storytelling linear; reality is messy. Lots of wrong paths triaged with no return, sometimes revisiting old paths. Patiently grinding until you get to the answer. Great read, learnt a lot!
Barret Zoph / @barret_zoph: Excited to share our first blog post — one of many to follow!
Matthieu Meeus / @matthieu_meeus: I was recently asked during an interview why LLMs are not deterministic even when temperature is 0. This very carefully answers it! My interviewer's answer was different, though: that there is likely nondeterminism in MoE routing for load balancing.
David Yin / @davidyin0609: I still remember earlier this year trying very hard to figure out why vLLM can give very different outputs than Hugging Face models even with greedy sampling (not sure if it is fixed now). Tweaking the batch size or number of GPUs also makes the output different.
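To make Frank's point above concrete, here is a minimal Python sketch (an editor's illustration, not from the post) of floating-point non-associativity: the same numbers summed in a different grouping or order can give a different answer, which is exactly what varying parallel reduction orders expose.

```python
import random

# Floating-point addition is not associative: grouping changes the result.
a, b, c = 0.1, 1e20, -1e20

left = (a + b) + c    # 0.1 is absorbed into 1e20, then cancelled -> 0.0
right = a + (b + c)   # 1e20 cancels first, leaving 0.1
print(left, right, left == right)  # 0.0 0.1 False

# The same effect appears when a reduction visits elements in a
# different order, as parallel kernels may do:
xs = [random.uniform(-1, 1) for _ in range(100_000)]
print(sum(xs) == sum(reversed(xs)))  # frequently False
```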
Marc Marone / @ruyimarone: This post is awesome! Explains why nondeterminism comes from a lack of batch-invariant kernels and why that's hard. "Surprisingly, we generate 80 unique completions, with the most common of these occurring 78 times", and then it shows a truly deterministic vLLM run.
Zihao Ye / @ye_combinator: Awesome work from @thinkymachines and @cHHillee! The importance of determinism might be underestimated. Like with LLM-based compression (https://bellard.org/...) - you really need things to work the same way whether you're doing prefill/decode or different batching setups. Here's
Sam Schoenholz / @sschoenholz: Looking forward to seeing research and writeups from Thinking Machines get shared with the community. @cHHillee et al.'s determinism work has been fantastic and was fascinating to watch evolve. Our infrastructure keeps getting better!
Ehsan Shareghi / @ehsanshareghi: One interesting thing we found was that safety in LLMs is sometimes down to pure luck! If the LLM/sampling method chooses to generate "I am sorry ..." as the initiating tokens, it will reject an unsafe request. Otherwise, it may not. This type of nondeterminism has many implications.
Bardia Khosravi / @brdkhsrv: Amazing post on why LLMs are non-deterministic even with a temperature of zero. TL;DR: it essentially boils down to different inference batch sizes, as some kernels are not batch-size invariant, meaning that their output is influenced by the number of requests 🤯
Moll / @mollehilll: Thinking Machines published an excellent piece explaining the phenomenon of "nondeterminism in inference". The authors show that the reason isn't "probability magic" but how the server engine itself is built. Requests don't go one by one; they are batched, and the batch size
Pedro Domingos / @pmddomingos: Wow, Thinking Machines' research is so varied. It ranges all the way from kernel numerics to prompt engineering. I can't imagine needing anything more to solve AI.
@alibaba_qwen: Awesome work by the @thinkymachines team! Thrilled that Qwen models (Qwen3-235B-A22B, Qwen3-8B & Qwen2.5-VL) served as a foundation for this experiment. This is exactly why we build—to empower researchers tackling hard problems & unlocking new scientific insights. Can't wait to
Shumo Chu / @shumochu: Such a great read! The report also shows the @thinkymachines folks are not just pure "researchers"; they are also on-the-ground engineers who deeply understand the systems issues of LLMs. We are at the stage where the systems side and the research side are deeply intertwined.
Varsh Sridharan / @varshinesri: One of the best research blogs I've read in recent times - learned a ton! Such diverse insights, ranging all the way from kernel numerics to the reasons for nondeterminism! Looking forward to diving into the root causes of our nondeterminism and even solving them!
Krish Maniar / @krinetix1234: Very well-written blog on how the LLM forward pass is actually deterministic (rarely needs atomic_add), but the core source of nondeterminism comes down to variance in batch size. Clearly there is still a performance gap (26 vs. 42s on vLLM), but once there are enough eyes on
Noémi Éltető / @eltetonoemi: I enjoyed this blog post a lot! I also decided that I will dress as floating-point non-associativity this Halloween and give everyone a good scare. [image]
@statusfailed: super nice to finally see other people give a shit about the "original sin" (non-associativity). Didn't realise the implications for RL either, this is great stuff.
Liyuan Liu / @liyuanlucas: Appreciate @thinkymachines taking an open research approach! Excited to see the first blog mention our work! Truly on-policy RL is like an RTX 3090 for gamers in 2020 - you really want it, but the blockers make your head itch... kernel mismatches, parallelism mismatches, etc. etc.
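To illustrate the batch-invariance issue several of the replies above describe, here is a hedged PyTorch sketch (an editor's illustration, not from the post): the same input row can come back bitwise-different depending on how many other rows share the batch, because a kernel may pick a different reduction strategy for each shape.

```python
import torch

torch.manual_seed(0)
x = torch.randn(1, 4096)     # a single "request"
w = torch.randn(4096, 4096)

out_alone = x @ w                       # processed at batch size 1
out_batched = (x.repeat(8, 1) @ w)[:1]  # same row, processed at batch size 8

# Bitwise equality is not guaranteed: the kernel may split the reduction
# differently for the two shapes, and float addition is order-sensitive.
print(torch.equal(out_alone, out_batched))
print((out_alone - out_batched).abs().max())
```

The per-element difference, where it appears, is tiny, but as the replies note, greedy sampling can amplify a one-ULP logit difference into an entirely different completion.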
Bidhan / @bidhanxyz: thinky machine has become an intellectual peer to bagel labs by starting a blog. blog dot bagel dot com
Horace He / @chhillee: Apologies that I haven't written anything since joining Thinking Machines, but I hope this blog post on a topic very near and dear to my heart (reproducible floating-point numerics in LLM inference) will make up for it!
Ashutosh Kumar / @ashu_1069: Learnt a lot while working through this; it took a lot of time to really understand the details and not just stay fond of the idea, but get to know it properly. Here's the blog link: https://ashu1069.substack.com/ ... [image]
Mira Murati / @miramurati: A big part of our mission at Thinking Machines is to improve people's scientific understanding of AI and work with the broader research community. Introducing Connectionism today to share some of our scientific insights.
Ahmad Beirami / @abeirami: This is a great example of what good research looks like. You start with a real problem. You peel it layer by layer to find the root cause. You form a new hypothesis and keep digging. At the end, you have something insightful to share!
Manuel Faysse / @manuelfaysse: A big issue we had when serving ColQwen is non-deterministic output embeddings. More specifically, the embeddings produced for the same images would differ when batch sizes changed at inference, leading to non-zero performance variations. This was surprising to us... I
DJ Pardis / @djpardis: I'm excited to read this highly relevant @thinkymachines post. Also, can we talk about how it's designed and typeset like a Knuth book? [image]
Jiayi Yuan / @jiayiyuan99: Thanks for the shout-out to our work—it's great to see more focus on this important problem. Horace's approach is a truly elegant solution. Fantastic work! More reading: https://arxiv.org/... [image]
Ruiqi Zhong / @zhongruiqi: I learned so much from this as an ML (not-systems-ish) researcher. Highly recommend a read!!
@thinkymachines: Today Thinking Machines Lab is launching our research blog, Connectionism. Our first blog post is "Defeating Nondeterminism in LLM Inference". We believe that science is better when shared. Connectionism will cover topics as varied as our research is: from kernel numerics to [image]
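For anyone who wants to reproduce the kind of determinism check Marc Marone quotes above (many identical temperature-0 requests, counting distinct completions), here is a hedged sketch. It assumes a vLLM server exposing its OpenAI-compatible API at localhost:8000; the model name and prompt are placeholders.

```python
from collections import Counter
from openai import OpenAI

# Point the client at a locally served model (assumed endpoint).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completions = Counter()
for _ in range(1000):
    resp = client.chat.completions.create(
        model="Qwen/Qwen3-8B",  # placeholder model name
        messages=[{"role": "user", "content": "Tell me about LLM inference."}],
        temperature=0,
        max_tokens=100,
    )
    completions[resp.choices[0].message.content] += 1

# Number of distinct completions, and the count of the most common one.
# With batch-invariant kernels, the distinct count should collapse to 1.
print(len(completions), completions.most_common(1)[0][1])
```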