the noise, the doubt, and the next phase
a few weeks ago, magnus completed its phase 1 pretraining run.
i did not expect what happened next.
the media thing
within a day or two, several outlets picked up the story. most of them got it wrong in the same direction. the framing was roughly: small nepali lab builds model that beats top open source. which sounds exciting and is also not what happened.
magnus completed a pretraining run. pretraining is the beginning of a long process, not the end of one. and magnus is a sparse mixture of experts architecture, not a dense model — which matters quite a bit for how you interpret any benchmark numbers that come out of it. 12 billion active parameters per token, routed across a much larger total parameter count. the architectural choice was deliberate, made early, specifically because training a large dense model from scratch was not viable for a three-person lab without hyperscaler backing. we built something we could actually build.
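the routing idea behind sparse mixture of experts is simple enough to sketch. here is a minimal top-k MoE layer for a single token, with toy dimensions and a made-up top-2 scheme — this is not magnus's actual expert count, routing method, or configuration, just the general shape of the technique:

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, n_experts, top_k = 16, 8, 2  # toy sizes, not magnus's real config

# each expert is a feed-forward block; here just one weight matrix apiece
experts = [rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts)) * 0.1  # learned routing weights

def moe_forward(x):
    """route one token vector to its top-k experts and mix their outputs."""
    logits = x @ router                # score every expert for this token
    top = np.argsort(logits)[-top_k:]  # keep only the k highest-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()           # softmax over the selected experts only
    # only top_k experts actually run, so the compute per token is a
    # fraction of the total parameter count — the "active parameters" idea
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

y = moe_forward(rng.standard_normal(d_model))
print(y.shape)
```

the point of the sketch is the last comment: total parameters set the memory footprint, but only the routed experts set the per-token compute, which is why the benchmark numbers of a sparse model are not directly comparable to a dense model of the same total size.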
the human eval numbers were real. outperforming qwen on certain tasks at the pretraining stage is a meaningful signal. but a pretraining run is not a finished model. it still needs alignment, RL, and extensive evaluation. the distance between where magnus is now and where it needs to be before deployment is significant, and we are clear-eyed about that.
we rode the wave for a little while. i am not going to pretend we did not. when coverage happens you do not immediately reach for the correction. but we clarified eventually, in full, because the exaggerated version is not the story we want attached to this work. we put out a proper technical breakdown at sagea.space/news/magnus-phase-1-complete. the architecture, the dataset, the compute sourcing, all of it.
the honest version of the story is already interesting enough. we do not need the inflated one.
on the skepticism
alongside the coverage came skepticism. most of it took one of a few forms: this is fake, a lab this size cannot do this, where did you get the compute.
i find this kind of skepticism useful. not because i enjoy being doubted, but because the doubt is a precise articulation of what we actually have to prove. every specific version of "you cannot do this" is a hypothesis we can test and answer with evidence.
the compute question is the most reasonable one. the honest answer is infrastructure partnerships, MoE routing optimization to minimize waste, and a lot of careful architecture choices made early precisely to stay within tractable compute budgets. we are not backed by a hyperscaler. we are not pretending to be. we built within the constraints we had, and the architecture reflects those constraints.
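the back-of-envelope version of why active parameters dominate the training budget uses the standard ≈6·N·D FLOPs rule of thumb. the 12 billion active parameters is from the post; the token count and total parameter count below are placeholders i made up for illustration, not magnus's real figures:

```python
# rough training-compute estimate: FLOPs ≈ 6 * params_touched_per_token * tokens
# (standard transformer rule of thumb; exact constants vary by architecture)

active_params = 12e9   # 12B active parameters per token, from the post
tokens = 1e12          # placeholder token count, not magnus's real figure

moe_flops = 6 * active_params * tokens

# a dense model with the same *total* parameter count pays for all of them
total_params = 100e9   # hypothetical total; the real number is not stated
dense_flops = 6 * total_params * tokens

print(f"MoE:   {moe_flops:.2e} FLOPs")
print(f"dense: {dense_flops:.2e} FLOPs")
print(f"ratio: {dense_flops / moe_flops:.1f}x")
```

under these assumed numbers the dense equivalent costs roughly 8x more training compute, which is the whole argument for the architecture when you do not have hyperscaler backing.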
the "it is fake" version is less interesting to engage with. the work exists. it will be visible over time. that is the only answer to that kind of doubt.
phase 2 is already in motion. skepticism does not pause the timeline.
helios and the deployment work
while magnus has been getting the external attention, helios has been quietly getting more real.
deployments have picked up pace. the conversations we are having with clients have changed character. a few months ago the question was usually what can this do. now it is more often what happens when something goes wrong, what is the SLA, how do we handle edge cases at volume. those are better questions. they mean the people asking them are thinking about actually depending on the system.
the research component of helios has stayed active alongside the deployment work. there is a tendency to treat research and deployment as sequential, research first then ship. in practice they run together. deployment surfaces problems that research has not thought about yet. research produces fixes for problems deployment does not know it has. the feedback loop is fast and the team has gotten good at operating in it.
i think helios is going to be one of the more important things sagea has built over the next few years, in ways that are not yet obvious from the outside. the identity infrastructure problem in nepal is genuinely unsolved at scale, and the combination of on-premise deployment requirements and local regulatory understanding is not something a global provider is going to build for this market.
the workshop
we ran the first workshop under sagea for learning.
i was not at my best. i had been up most of the night before, the kind of situation where you spend more energy staying upright than you do thinking clearly, and at a certain point the two activities become indistinguishable.
it went fine anyway. the students were engaged. the content held. i said a few things that made sense and a few things i would phrase differently with more sleep. the format worked and we will do it again.
the hamrocsit partnership is the broader structure for this. one of nepal's most established computer science platforms, a community that has been doing real educational work with students for years. the mou gives us a foundation to run more workshops, to build curriculum together, to put frontier AI in front of students who would not otherwise have hands-on access to it. that matters independently of everything else sagea is doing.
there is a version of frontier AI development that stays inside the lab. produces papers. releases models. operates at the level of the research ecosystem and does not directly touch the people it is supposed to benefit.
we are trying to not be that. the for learning work is part of that attempt. it is early and imperfect, and i ran the first session on no sleep, but the direction is right.
what is actually happening
some weeks at sagea feel like maintenance. keeping the pace, shipping incrementals, executing on what is already defined.
the last few weeks have not felt like that.
magnus phase 1 is done and phase 2 is running. helios is deploying at a pace we have not seen before. the for learning program has its first workshop behind it and the hamrocsit partnership ahead of it. the media got the story wrong and we corrected it. the skeptics are asking reasonable questions and we are answering them with code and commits.
i have been thinking about what it means to build something that is genuinely hard and genuinely uncertain and keep building anyway.
it means you have to be comfortable with the gap between what is visible externally and what you know internally. the external version of magnus right now is somewhere between hype and doubt depending on who is talking. the internal version is a phase 1 pretraining run, a clean architecture, a set of benchmark signals that mean something to people who understand what the numbers represent, and a phase 2 that has already started.
the internal version is the only one we can actually act on.
so that is what we are doing.