the week things started moving faster
some weeks feel like maintenance. you are executing on what already exists. fixing things. keeping pace. it is necessary work, but it does not change the shape of what you are building.
then there are weeks where several things shift at the same time. not dramatically. not all at once. but you can feel the trajectory changing. this has been one of those weeks.
what we shipped
we have been working on something called ACUMEN for a while now. i am not going to over-explain it yet because it is still early and i have learned that talking too much about things before they are fully formed usually hurts more than it helps.
but what i can say is that ACUMEN started as a question about evaluation. how do you actually measure whether a reasoning system is getting better. not on a benchmark. not on a leaderboard. in a way that reflects something real. most evaluation frameworks are designed around convenience. they measure what is easy to measure. and what is easy to measure is almost never what actually matters.
we have been trying to build something that closes that gap. and this week it moved from being a research exercise into something more defined.
that transition is always strange to experience. the work looks the same from the outside. the same conversations. the same notebooks. but something internally crystallizes and the system starts to feel like it has a shape.
things that are close
alongside ACUMEN, a few other systems we have been building are getting close to the point where they are ready to exist outside of our internal infrastructure.
i wrote a few weeks ago about identity systems and how most of them were designed for a distribution that no longer matches reality. about how the moment volume increases, the assumptions underneath start breaking and the manual interventions multiply.
Helios has been our long answer to that problem. and it is almost ready. the thing we kept coming back to during the build was a simple constraint: it has to run on your infrastructure, not ours. no data leaving the premises. no dependency on external uptime. just a system that works, reliably, under your own roof.
that constraint made the engineering harder. but it made the product more honest.
on open source
we are also getting close to safely releasing, as an open model, a much larger model we built a while back. something in a weight class we have not shipped before. the reasoning behind open source has always been the same for us. if the model is genuinely good, releasing it builds more credibility than keeping it closed. and credibility is actually the scarce resource for a small lab. not the weights.
there is also something philosophically consistent about it. we spend a lot of time thinking about how AI infrastructure should be less concentrated. releasing good open models is one way to act on that belief instead of just stating it.
the thing i keep thinking about
lately i have been thinking about the gap between research and deployment. most labs treat them as separate phases. you research, you develop, you deploy. sequential. clean.
in practice, the interesting stuff happens when those phases overlap. when a research finding immediately changes what you are building. when a deployment problem sends you back to a question you thought you had answered. that compression is uncomfortable. it creates a kind of productive chaos that is hard to manage but usually produces better systems than the clean sequential version.
we are living in that compressed zone right now. several things running in parallel. some of them close to ready. some of them still early.
it is one of those periods where you are not entirely sure what the next few weeks will look like, but you know the direction is right.
that is enough for now.