Military historians will forever debate the relative importance of grand strategy (which provides the resources required to fight), strategy (which determines where you fight) and doctrine (which determines how you fight). I think you want to get all three right.
I love Stephen Biddle’s book Military Power, which explains how armies have used the modern system of force employment (effectively tactical doctrine) to win battles. He explains that technological innovations since 1914 have required armies to learn how to use:
Dispersion
Cover and concealment
Suppression
Small-unit maneuver
Combined arms
Defense in depth
Decentralized initiative
Biddle points out that
Technology provides nothing; the effective translation of technological change into coordinated tactics (via the modern system of force employment) provides everything, or at least the ability to win battles and possibly wars
Adopting the modern system of force employment requires a very different culture than more traditional forms of war-fighting, especially the ability to trust non-commissioned officers with a lot of responsibility
This tells us a lot about agentic software development
Giving your engineers access to new tools buys you nothing; building a new method of software engineering using some of those tools buys you 2x, or perhaps 10x, the productivity
Adopting new methods of software engineering requires rethinking skills and mindsets
The results could have strategic impact. For most companies, software development is the rate limiting factor for every new product, every new channel, every new market and most operational improvements. Enhancing software development productivity accelerates the metabolism of the enterprise.
I was excited to speak, with my colleague Matt Linderman, with Tessl CEO Guy Podjarny because of his ideas about how to create the scaffolding for agentic software engineering -- how to do it at scale. As always, Prosaic Times never endorses any product or offering, but we find great insight in hearing directly from builders bringing new capabilities to market.
1. The founder is an addict entrepreneur
In this section:
Guy is ex-Akamai CTO and Snyk founder, now founder/CEO of Tessl.
He frames himself as an “official addict entrepreneur.”
Matt is a McKinsey partner in New York leading the firm’s software practice — three to four years building an AI-first, now agentic, product and engineering practice. Grew up near an upstate-NY Air Force base that did AI research.
James Kaplan: Currently hanging out in southern Rhode Island with another Prosaic Times video podcast. Today we have my colleague, Matt Linderman from McKinsey, and Guy Podjarny from Tessl. Did I pronounce that correctly, Guy?
Guy Podjarny: Good.
James Kaplan: Guy, why don’t you tell us a little bit about yourself, and we’ll bring you to Tessl.
Guy Podjarny: Sure. I’m at this point an official addict entrepreneur. After being a developer and turning product and such through a few acquisitions, I founded my first company in the web performance space. I sold that to Akamai, where I became CTO, and did that for about three and a half years. After about three and a half years, I got the itch to found another company, and I went on to found Snyk, which I think had a good dent of impact on the application security world.
James Kaplan: Maybe just a little.
Guy Podjarny: It had a couple of years of wandering the desert and near-death experiences that nobody remembers, because everybody remembers what happened post the sort of two years.
But then it grew nicely, formed the DevSecOps movement and the developer security space. So that grew nicely. And about two and a bit years ago, I accepted I’m an addict and fell in love with AI, and left Snyk, where I’m still chairman of the board, to found Tessl, focused on reinventing software development.
What is the new software development paradigm? That is what Tessl is focused on.
James Kaplan: You’re in London now?
Guy Podjarny: I am based in London. Born and raised in Israel, spent a decade in Canada, and I’ve been in London for the last 13 years.
James Kaplan: Matt, you wanna introduce yourself and tell us a little bit about your journey?
Matt Linderman: I’m a partner in McKinsey’s New York office, and in particular work in our software practice on engineering and product management topics. For the last three or four years, we’ve been building a practice around AI-first and now agentic product management and engineering practices, and lead that more broadly for McKinsey across software companies and across a number of companies outside of software — banks, telcos, et cetera — who are all going through this journey together.
And so Guy, very excited to chat with you about what we’re seeing, what you’re seeing, and mash this together as we go through the conversation here.
James Kaplan: And you grew up in upstate New York?
Matt Linderman: Grew up in upstate New York near an Air Force base that did AI research, actually. So I grew up with this very close to heart, and now live in the New York City area.
James Kaplan: Great. And we appreciate you classing up the joint here on the podcast by wearing a shirt with a collar, Matt. You’ve raised the bar for Guy.
Matt Linderman: I was forced to do it due to some bank conversations, but my T-shirt is available for later.
2. Code is disposable; context is the new code
In this section:
Guy reframes “vibe coding” as the first leg of a journey toward agentic engineering — guiding the agent, not writing the code.
The Tessl thesis, two years old now: code becomes disposable; context becomes the unit of work.
Guy sketches the emerging stack — models → tools → context (skills, rules) → harnesses → factory lines → factories.
Harnesses are deterministic guardrails on a probabilistic model: OpenAI’s no-commit-without-coverage rule; Intercom gating PRs on loaded guidelines.
James Kaplan: You’re allowed when it’s warm in London. Guy, give us a little bit of the big picture on the evolution of software development and the adoption of agentic (or spec-driven) development.
I admit, I don’t love the term vibe coding. But tell us a little bit about where we are and where we’ve been and where you think the world is going.
Guy Podjarny: Vibe coding is interesting. I think about vibe coding to agentic engineering as a journey: there is vibe coding, where you just vibe away. But really what we want is agentic engineering. I’d say, thinking a little bit about Tessl’s journey, two and a half years ago — or just less than that, when we founded Tessl — we already had the conviction, which is more clear today, that software development will, at a high level, transform from revolving around code and implementation to revolving around intent and instructions.
It is more about guiding the LLM — at the time, and today the agent — to do what you believe to be right, versus writing the code yourself. At the time, we talked about how code will become disposable. It was a bit more heresy at the time. Today it’s well accepted that implementation will become something that is regenerated, and it’s less important.
We’re not fully there yet. We believe this will drive a new software development paradigm, and we didn’t really know what that is. What are the units? Today we’re starting to see the outline of what it is that you develop in this world. We’re still developing software, still creating something that will evolve, that we debug if it doesn’t work, that we observe in production, that we collaborate on.
So there’s still a thing we’re developing — what humans are developing — what I think of as the context development life cycle. I can describe a little bit of the stack that is shaping up, and I’m sure this will continue to morph and modify.
At the bottom of it, there’s the models. This is the new primitive we’re building on. Think of those like architecture. They’re like operating systems — whatever it is that you build, is it compatible with that layer?
A layer on top of that is a layer of tools that perform various actions. They might be your grep and FFmpeg. They might be custom built.
Above that is the layer of context. I think of context as the new code. We can dig into that a little bit more, but context is where most developers spend most of their time.
It’s conveying what they want the agent to do. We have skills, we have rules.
Further up the line, you start seeing harnesses. Harnesses are really about constraining the model — they’re harnessing the model. This is deterministic software that decides when to make decisions that are not delegated to the probabilistic model. Hooks, for instance — OpenAI saying, “You cannot commit, you cannot run Git commit until the test coverage is above a certain bar.”
Or Intercom saying, “You cannot open a GitHub pull request unless you loaded the relevant context around our PR guidelines.” These are examples where the harness — the configured harness — is telling the model, “You do not get to choose; this has to happen,” because it harnesses the model. And then in turn, harnesses compose up into factory lines.
These are more like pipelines. Harnesses are kind of like frameworks, if you will. Most organizations will choose and highly customize a framework, or they will build their own framework. They probably won’t have a gazillion of them, so it’s not the same as a dev. Up to the factory lines, they’re more like pipelines.
You want some consistent input. With a certain type of input, you have a successful output. You want to go all the way to factories, which are more like your development process. So this is a new software stack, and these analogies are helpful when we think about, okay, as we scale, what do we need?
What tools do we need? What practices do we need to build that out?
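The harness idea Guy describes above, deterministic software that takes certain decisions away from the probabilistic model, can be sketched in a few lines. This is only an illustration of the pattern, not how OpenAI or Intercom implement it: the `AgentState` class, the action names, the 80% coverage bar, and the `pr-guidelines` context name are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AgentState:
    """What the harness knows about the current agent session (hypothetical)."""
    test_coverage: float = 0.0                       # fraction between 0.0 and 1.0
    loaded_context: set = field(default_factory=set)  # names of context already loaded


# Hypothetical policy values, chosen only for illustration.
MIN_COVERAGE = 0.80
REQUIRED_PR_CONTEXT = "pr-guidelines"


def check_action(action: str, state: AgentState) -> tuple:
    """Deterministic gate: the model proposes an action, the harness decides."""
    if action == "git_commit" and state.test_coverage < MIN_COVERAGE:
        return False, f"coverage {state.test_coverage:.0%} is below {MIN_COVERAGE:.0%}"
    if action == "open_pull_request" and REQUIRED_PR_CONTEXT not in state.loaded_context:
        return False, f"load '{REQUIRED_PR_CONTEXT}' before opening a pull request"
    return True, "allowed"


if __name__ == "__main__":
    state = AgentState(test_coverage=0.62)
    print(check_action("git_commit", state))         # blocked: coverage too low
    state.test_coverage = 0.91
    state.loaded_context.add("pr-guidelines")
    print(check_action("git_commit", state))         # allowed
    print(check_action("open_pull_request", state))  # allowed
```

The point of the sketch is that the gate is ordinary code with no model call in it: the agent can ask, but it cannot argue its way past the check.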
3. Stop spec-ing the product; start spec-ing the programmer
In this section:
James offers three historical analogies for the shift — assembler to C, procedural to declarative, and entropy reduction down the traditional design chain.
Guy takes the last two but insists this leap is larger than any prior transition, and is happening faster.
The right comparison is to instructing humans: probabilistic, resilient, interpolating within boundaries — except agents can’t lose their jobs.
Spec-driven engineering is a practice, not a product; specs are only one slice of the broader context-engineering problem.
“Speccing the programmer” — encoding constraints, API choices, framework preferences, and billing limits alongside the spec itself.
James Kaplan: Which metaphor resonates most with you as we think about the movement to agentic development, or spec-driven development? The first historical metaphor I sometimes think about is the transition from assembler to third-generation languages like C, which we went through in, say, the late ’80s, early ’90s.
And I’m old enough to remember a lot of rent garments and concern that, “Oh my God,” you know, third-generation languages would be slow and unwieldy compared to programs written in assembler. The second metaphor I sometimes think of is the transition, or the distinction, between procedural languages and declarative languages. And sometimes I think about when the world went from building database management systems by hand, or building databases by hand, to using SQL, which I think you can argue is a declarative language. And when you say spec-driven development, that to me sounds like it’s declarative. And the third is entropy reduction, in the sense that in a traditional software engineering process, you went from, say, a senior business executive’s declaration of intent through to a business case, to a conceptual design, to a set of business requirements, to a set of technical requirements, to code.
You had humans at every step in the process, in effect moving entropy until you had deterministic code. Which of those metaphors resonates with you, or do they resonate in different ways?
Guy Podjarny: Yeah, I’d pick a spot if that was a continuum.
James Kaplan: Please.
Guy Podjarny: The last two. The analogy for software evolution — this is a bigger leap than any one of the changes that we’ve done. So even if it is directionally analogous, and that’s useful, ’cause we’re humans and we like analogies, and they help us reason about the world — but I think it is a bigger jump, so we have to acknowledge that, and it’s happening faster than the previous ones.
I’d say it’s somewhere between changing — taking that sort of entropy, or how do we instruct humans. As we cascade, in which we expect probabilistic behavior, we expect when we guide our reports — people that work for us — not to get perfect, but to get resilient. You want them to understand the spirit of what you’re saying and to have some judgment around when to bend the rules or expand the rules, to build it.
James Kaplan: Interpolate, often interpolate, right?
Guy Podjarny: Interpolate. And you want it within boundaries — which is always a tricky thing — and accountability. We should get back a little bit to accountability, ’cause the key distinction between that and the others is: a person who repeatedly makes bad decisions might lose their job, and an agent doesn’t really have that. But on the other side, where they’re meeting the other pull from software, is just the increased desire for abstraction. If you think about a bunch of these transitions you talked about — but even going into Java, and then in infrastructure going into the cloud — anything that became software-defined, suddenly you’re saying, “Hey, get me ten more servers.” You’re not dealing with anything. It’s just spin up ten more servers. Before, you said, “Give me ten more megabytes of memory.” You didn’t even say that. You said, “Give me an array of a thousand objects long.” And Java —
In each one of these things you’ve delegated decision-making to something downstream that you can configure, you can control, but is not entirely yours. The entropy piece is interesting because in all of those cases you still expect a lot more determinism, and a lot less resilience — a lot less adaptability — than you do with humans.
We meet somewhere between human instructions and a higher-level abstraction, and those are probably comparison tools as we think about what works and doesn’t. I’ll just refer one thing to the spec thing: at Tessl originally we talked about spec-centric software development. Spec-centric evolved into spec-driven development, which I believe to be a real practice.
I don’t think spec-driven development is a product. It is a practice.
James Kaplan: Of course. Yes.
Guy Podjarny: We should go from speccing the product, or speccing the program, to speccing the programmer. It’s important within the team — when you think about instructions, specs are just a part of the puzzle. What you expect from people in the team is not just to always update the information about the product and how it operates when they modify it.
That’s one of the things you want. But you generally expect them to make good decisions, and it includes that, but it also includes understanding your constraints and your preferences — how you chose what your API design is, how we chose to use this framework, what your billing constraints are — all of these other pieces. So I use spec-driven development, but I think it is a subset of agentic engineering and/or context engineering. Specs are just a piece of it.
4. Tech debt is becoming deflationary
In this section:
Matt opens with the historical default — entropy increases in codebases over time, and tech debt is what we call it.
Guy inverts the frame: tech debt is deflationary. If the modification will be cheaper in six months, accumulating debt can be the rational move.
The carve-outs are one-way doors — architecture debt and data debt still warrant the cycles to keep fresh.
James’s college-era friend said a program could be rewritten from scratch three or four times and made better; agents make that cheap enough to do routinely.
At Tessl the team is pushing to eliminate interactive coding sessions entirely — one-shot prompts off a well-formed Linear issue, with tests as the enduring artifact.
James Kaplan: Matt, I’m sure you have questions. Let me not hog the mic here.
Matt Linderman: Well, just to add on to the entropy point, Guy — to your point, it’s a really interesting question, because in the past, at least, what we’ve often seen is that entropy increases in code bases over time. And you end up with tech debt, or different words to describe that. Now there’s a real interesting opportunity to think about how do we keep code bases evergreen and actually avoid the entropy degradation over time as you go forward.
Matt Pocock gave an interesting talk on this the other day about basically always doing your architectural reviews on a more frequent basis, keeping the code base more up-to-date and clean. There’s an interesting evolution there that I think historically was a one-way ship, and now we may actually be able to steer that in a slightly different direction, to maintain code bases far better than we have in the past.
So I think the entropy one is a really fascinating thing to look into, and we’ll see how it all evolves.
Guy Podjarny: I agree that you can maintain, because labor has become cheap. So you can do all these things that before were nonsensical financially. Now suddenly maybe they are. At the same time, there’s another view that is almost counter to that a little bit, and that is that tech debt is becoming deflationary. Whatever modification you’re gonna do in your code right now, in six months’ time it’ll be easier to make that modification.
The agents will be more able to help you resolve that. To an extent, this is an amazing time to accumulate debt because it’s deflationary. It’s gonna be cheaper. So if there’s a good ROI in terms of you not bothering with this, then you can do that. Heck, you’d be able to rewrite the whole thing in a pass. There are types of debt that you still need to be careful of: maybe architecture debt, maybe data debt.
Maybe things that are one-way doors that are very hard to change — for those you might wanna invest the extra cycles, which are now cheaper and reasonable to do, to constantly keep it fresh. And then there’s the type of debt where, if you go even the step beyond, you might actually care less. It might be fine, because you would just be able to undo that. So it’s not worth the delay. It’s always about what’s on the other side of the equation — it’s not worth the delay or the opportunity cost to accumulate it.
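Guy’s “deflationary debt” argument is at bottom a small piece of arithmetic: deferring a fix pays off when the falling cost of the fix outweighs what the debt costs you while you carry it. A toy model makes the trade-off explicit. The function names, the constant monthly decline, and every number below are assumptions for illustration, not figures from the conversation.

```python
def cost_if_deferred(fix_cost_now, monthly_decline, monthly_carrying_cost, months):
    """Total cost of living with the debt for `months`, then fixing it.

    Assumes the fix gets cheaper by a constant fraction each month and that
    carrying the debt costs a flat amount per month (both are simplifications).
    """
    future_fix_cost = fix_cost_now * (1 - monthly_decline) ** months
    return future_fix_cost + monthly_carrying_cost * months


def should_defer(fix_cost_now, monthly_decline, monthly_carrying_cost, months):
    """Deferring is rational only if it is cheaper than fixing today."""
    deferred = cost_if_deferred(fix_cost_now, monthly_decline, monthly_carrying_cost, months)
    return deferred < fix_cost_now


if __name__ == "__main__":
    # Ordinary debt: the fix gets 10% cheaper per month, carrying cost is low.
    print(should_defer(100, 0.10, 5, 6))   # True: about 83 deferred vs 100 now
    # Same debt, but it slows the team down more every month.
    print(should_defer(100, 0.10, 10, 6))  # False: about 113 deferred vs 100 now
    # A "one-way door": the fix never gets cheaper, so waiting only adds cost.
    print(should_defer(100, 0.0, 5, 6))    # False: 130 deferred vs 100 now
```

The one-way-door case corresponds to the architecture and data debt Guy carves out: if waiting does not make the change cheaper, deflation gives you no reason to wait.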
Matt Linderman: Yeah, 100%.
James Kaplan: In college, I had a friend who’s a very good computer scientist, computer programmer, who liked to say that a software program could be rewritten from scratch three or four times and made better. I think what you’re articulating, Guy, is that now it’s a hell of a lot easier to do that.
We can imagine everything gets rewritten multiple times with what we’ve learned in the process.
Guy Podjarny: Yeah, absolutely. You have to work and adapt to get to that point. One barrier to that, for instance, is interactive coding sessions. At Tessl, when we develop software, we aim — we’re not fully there yet, but we aim to eliminate as much as we can interactive coding sessions.
Instead, you can say, “Fine, play around with Claude or Codex or whatever it is, build the thing that you want, help yourself shape the product to what you want.” ’Cause oftentimes as you build, you figure it out. Now translate all of those into the Linear issue that provides the right information, throw whatever it is — the code you’ve just created in the prototype that doesn’t go anywhere. That gets thrown away.
But the information, the learning out of that, gets done. And then it gets one shot. And if it fails one shot, you modify the information, you provide the relevant commentary, and you create that again. What that does is it puts you in a place in which the agent is, almost by definition, sufficiently informed to be able to build that.
Of course you want to then curate that context over time so that it doesn’t rot, so that it remains relevant. But yes, you’d be able to build it again and make it adaptable. Increasingly, code generation should become like compilation: “It’s okay, I don’t care if there’s a new version. I can compile this for this new version of Linux or whatever it is that I just have here.” You do come back to some sort of regular principles, which is you need to capture tests. You need to capture some definition of what good looks like, what correct behavior looks like. And you’re never gonna test everything — definitely not with agents — but you need enough test coverage, otherwise you cannot scale.
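The workflow Guy describes here (prototype, write down what you learned, generate in one shot, and on failure fix the information rather than the code) can be sketched as a loop. This is a minimal sketch under stated assumptions: `one_shot_build` and the three stub functions are hypothetical stand-ins, and a real setup would call a coding agent and a real test suite instead.

```python
from typing import Callable


def one_shot_build(
    issue: str,
    generate: Callable[[str], str],
    tests_pass: Callable[[str], bool],
    revise_issue: Callable[[str, str], str],
    max_attempts: int = 3,
) -> tuple:
    """Regenerate from the issue until the tests pass; never hand-edit the code."""
    for _ in range(max_attempts):
        code = generate(issue)             # fresh, non-interactive generation
        if tests_pass(code):               # tests are the enduring artifact
            return issue, code
        issue = revise_issue(issue, code)  # improve the information, discard the code
    raise RuntimeError("issue is still underspecified after max_attempts")


# Stubs standing in for an agent, a test suite, and a human editing the issue.
def fake_generate(issue: str) -> str:
    if "round" in issue:
        return "def total(xs): return round(sum(xs), 2)"
    return "def total(xs): return sum(xs)"


def fake_tests_pass(code: str) -> bool:
    namespace = {}
    exec(code, namespace)                  # acceptable here only because the input is our own stub
    return namespace["total"]([0.1, 0.2]) == 0.3


def fake_revise(issue: str, failed_code: str) -> str:
    return issue + " Always round totals to 2 decimals."


if __name__ == "__main__":
    final_issue, final_code = one_shot_build(
        "Add a total() helper for invoice lines.",
        fake_generate, fake_tests_pass, fake_revise,
    )
    print(final_issue)
    print(final_code)
```

Note what survives the loop: the improved issue text and the tests. The generated code is an output that can be thrown away and rebuilt, which is the compilation analogy in the passage above.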
5. Taste is a preference you forgot to write down
In this section:
In real enterprises the factory layer is mostly aspirational; for now the live action is in context.
“Negligent skills” — those without safety instructions — force enterprises into governance: central registries, supply-chain controls, dedup, versioning.
Skills rot like software; the carrot is auto-extraction from agent logs and PRs, the stick is the maintenance burden.
“Taste is just a preference you didn’t bother writing down” — skills are how that preference becomes a rule, used both at development time and at code review.
Three tiers of evals — regression, skill, project (unit / functional / end-to-end) — and LLM-as-Judge is good enough to make them tractable.
Matt Linderman: So Guy, you started to introduce the stack — models, tools, context, skills, harness, factory — which kind of grows in abstraction as you go down. I’d love to hear from you. There’s been a lot of talk on the upper half of that stack, all the way down to harness. The factories piece is really emerging.
I’d love to hear just what you’re seeing in practice in terms of the workflows people are building around the harness, and what impact you’re seeing day-to-day with the folks you’re working with.
Guy Podjarny: For sure. The reality is there’s the sort of AI-native tiny companies, full kind of greenfield world, and then you go all the way to the enterprises.
In enterprises, the reality is that there’s a massive chasm — both between the companies, and within the company as they grow. When you think about factories and factory lines, those are, in almost all companies of medium size and up, not the norm. They are a specific prototype, specific project, specific sections — they’re the forerunners as opposed to the majority. The majority of interest, or activity, that we’re seeing right now is around context.
Eventually the agent executes this stuff. It might be malicious. It might be vulnerable — vulnerable being things like it guides the user to put API keys in plain text, or things like that. Or it might be what I’ve come to call negligent skills. Negligent skills are skills that lack safety instructions — “add this to the database, update is needed, do not drop the table, delete the database,” or some sort of basic safety instructions. Once you get into risky skills, you naturally need the governance: who’s installing, what do I even have in my inventory, do people use it. To control, so they create a central registry. They control those, and all that. So that’s one pier. It’s the least sexy, but it is important for supply chain security, and it’s a blocker to roll things out.
The second thing that we see is challenges around standardization, reuse.
I heard a story that articulated this very well — a unicorn with about 1,000 developers — describing how everybody was creating skills. That’s wasteful because everybody’s creating the skill.
They’re wasting tokens. They’re wasting time. They’re creating a lot of the same thing.
So they put together a repo to be able to upload and share those skills, so everybody’s sharing those skills. Very quickly it becomes a mess — there’s a whole pile of duplicates: which one do I choose? They had compatibility issues — like compilation — one developer is using one agent, they publish the skill, and it doesn’t work well on another agent. To be able to collaborate, you need some basic software-like tools.
You need some quality barometer, and a means of knowing that it’s quality, some deduplication to be able to identify those, some versioning of the stuff that you roll out.
All basic stuff that we have for software.
Most organizations are not yet past that point.
There’s a carrot and stick over here. There’s the fact that skills rot just like software. They will get out of date. They live in a dynamic environment.
The software around them changes, the practices change, the learnings change. You have to maintain them, per the debt conversation we just had. And that’s the stick — you better maintain them, otherwise they’ll break.
And then the carrot is: can I look at agent logs? Can I look at the PRs? Can I auto-extract things that will improve that? That comes along. So that’s the exciting bit. What we’re seeing is that in organizations they do that on a nascent project. They do it on things where the blast radius, if the agent misbehaves, is relatively controlled.
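Two of the registry chores Guy lists, flagging “negligent” skills and spotting duplicates, are easy to picture as a small audit script. The sketch below is illustrative only: the skill record layout, the `SAFETY_MARKERS` phrase list, and the function names are hypothetical, and a keyword match is a crude stand-in for a real quality check.

```python
import hashlib

# Hypothetical heuristic: a skill that contains none of these phrases is
# treated as lacking safety instructions.
SAFETY_MARKERS = ("do not", "never", "confirm before")


def is_negligent(body: str) -> bool:
    """True if the skill text contains no recognizable safety instruction."""
    text = body.lower()
    return not any(marker in text for marker in SAFETY_MARKERS)


def fingerprint(body: str) -> str:
    """Hash of the skill text with case and whitespace differences removed."""
    normalized = " ".join(body.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def audit(skills: list) -> dict:
    """Report negligent skills and exact duplicates (after normalization)."""
    first_seen = {}
    report = {"negligent": [], "duplicates": []}
    for skill in skills:
        if is_negligent(skill["body"]):
            report["negligent"].append(skill["name"])
        digest = fingerprint(skill["body"])
        if digest in first_seen:
            report["duplicates"].append((skill["name"], first_seen[digest]))
        else:
            first_seen[digest] = skill["name"]
    return report


if __name__ == "__main__":
    registry = [
        {"name": "db-update", "version": "1.0.0",
         "body": "Add rows to the database. Do not drop tables."},
        {"name": "db-update-copy", "version": "1.0.1",
         "body": "add rows to the   database. do not drop tables."},
        {"name": "deploy", "version": "0.3.0",
         "body": "Push the build to production."},
    ]
    print(audit(registry))
```

Exact-match hashing only catches copy-paste duplicates. Near-duplicates written independently by different developers, which is the situation in the story above, would need a fuzzier comparison.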
James Kaplan: Fascinated by the ability to use context to enforce or encourage engineering standards. And I’ll give a very simplistic example. In the history of software development, nobody has been worse at naming variables consistently than myself. I am horrible about it.
I am the worst person in the world at it. And it was really interesting — as I started playing with Cursor and Claude Code, it’s, oh, I can set up some rules that determine how variables should be named. That’s an incredibly simple example, but it translates to a million things in terms of architectural standards and non-functional requirements. And we can be a lot more precise about how we engineer code and structure code, compared to a set of guidelines we would give to a new engineer or a relatively early-tenure engineer. And that to me is pretty exciting.
Guy Podjarny: It requires you to do something that many people don’t like, which is take the time to sit down and write down what good looks like.
I think with software, oftentimes we just don’t do the hard thing.
The word taste is thrown around a lot in the world of AI. And while it’s important, taste is just a preference that you didn’t bother writing down.
Skills are a very good way to enforce that. There are technical constraints right now — skills don’t always activate. So what we see in practical terms is there are three parts that you need to do.
One: you need to create the skill and write it down.
Two: you need to make sure that it is distributed. You need to make sure that it’s installed in various cases. On the Tessl side we help with that — both the tracking and the mandating of skills. But you need to know the skill was present when it was needed.
Three: you need to invest in verifying that it’s been acted on.
The beauty of skills is that you can use the same skill in two agent contexts. You can use the skill as part of the development process to say, “Hey, this is available to the agent to load,” and you’re trying to entice it to use it.
But then you can use literally the same skill in the code review process to say, “Hey agent, check if these practices have been applied,” because you’ve written it down once.
And that is actually a beautiful thing, because you can do it in both cases. And you can even further go on and say, “Look, historically, I learned that now this data pattern is not good.”
Not only from here on do you change that, but — Matt, to your comment on tech debt — go back in history and find all the cases in which that’s there, and set up a mini migration of something that you wouldn’t have bothered doing.
Matt Linderman: I’d love to maybe stick on this topic of context, ’cause obviously you’re a real expert in this space. I think there’s a two-part question. One is, when you look at the evolution, obviously the tools themselves have gotten a lot better at pulling in context, but what are you finding in addition to the standard, like grep and codebase awareness, that’s really critical to pull in?
I’d love to hear your thoughts on that. And then the second piece, which maybe we can go to after, is how do you then experiment and measure what context is more or less effective, and what should you be pulling in? We have a number of clients asking those questions and trying to wrestle through what is the information that they should connect via MCP, how should it be structured, et cetera.
And that comes down to some version of experimentation. Would love to hear your thoughts on how to think through that.
Guy Podjarny: I love the question.
The context window is the scarce resource that we optimize for. Everything is context. Your code is context. It’s important to separate between reusable context and real-time context.
In the real-time context you have things like prompts, tool outputs, and things like that. Those are more interactive context. It’s important to do prompt engineering and things like that — those are still competencies.
The models clearly need to make it easier, and they have interactive modes about knowing when to ask you questions.
There’s passive and active context. Passive context being the code itself. So you can do things like keep the code clean, add proper documentation, add some passive documentation inline — like MD files at the right spots.
There are advantages to continuing with a thing that the agent has regenerated. There’s a real advantage to having the agent rewrite the file, because when it rewrites the file it tends to follow a certain pattern that matches the training data that exists.
Its future decisions reasoning about that file are more likely to be successful. Clearly, it’s not practical to rewrite every file every time, but it’s still a useful guideline.
The third bit, though, that relates to all of those, is this reusable context — and how do you evaluate it, or how do you know that it’s good? This is really where the world is now evolving. Reusable context is mostly done with skills.
Rules are a more forceful set (your CLAUDE.md, your AGENTS.md): information that is always shoved in.
But you have to be very careful about how much you put in there. If you put a lot in there, you basically make the agent dumber for everything else that it does.
Skills are more on demand — whether the user has invoked them or the agent has the hints to do them.
Define what good looks like for a skill.
Right now the most common barometer is the Anthropic best practices. Does it have progressive disclosure? Is it sufficiently concise?
But we see customers modifying it to their own barometer. So at least start by defining what a good skill is in your organization.
Does it have safety instructions?
Does it refer to data privacy?
The second quality measure is tests.
In skills, the equivalent is evals. In evals you define a scenario: here’s an environment, here’s the files that are involved.
Pull from this commit, modify this context file like this, install these different tools, and then you have a task.
In practice, running tests is just a lot harder to maintain, and LLM-as-Judge is pretty good. It’s like an expert reviewer.
So you define the criteria, you run the agent through it.
Test coverage is notorious: you can build amazing test coverage and have really poor quality controls with it, just because you’ve created useless tests.
State-of-the-art, which not many people are doing, is you have three tiers of evals.
You have things that are more regression tests — just a few samples to say this works.
Mostly they now serve as running something even in the CI, so when you’re modifying the skill it doesn’t break, and you can have some sanity checks on new models and things like that.
Skill evals — you’re evaluating the skill, like a unit test or a library test. You’re evaluating that unit of context on its own.
The next one up is more like project evals.
In this case you’re doing something that evaluates the entire project context. You now have 20 skills installed, and a bunch of rules, and a bunch of files, and maybe you’re giving a bigger task.
Those are heavier to run. You’re not gonna run them on every PR, probably, but you might run them on a weekly basis, on a monthly basis, to see that your context remains fresh.
And then the comprehensive test — the end-to-end test equivalent.
I think of the first one as unit tests. I think of the second one as functional tests. And I think of the third one as end-to-end tests or integration tests — which are comprehensive, so you can make strategic decisions based on them.
So, can I switch models over here? Can I run this on a different environment? Typically cost- or efficiency-related, but for things that matter.
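One way to picture the three tiers is as a small table mapping each tier to its scope and to when it runs. This is a sketch under assumptions: the trigger names are invented, and the "on_demand" cadence for the comprehensive tier is a guess, since the conversation only says it supports strategic decisions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalTier:
    name: str
    analogue: str            # the test type the tier is compared to above
    scope: str
    triggers: frozenset[str]


TIERS = [
    EvalTier("skill", "unit test", "one skill on its own",
             frozenset({"pull_request"})),
    EvalTier("project", "functional test", "all skills, rules, and files together",
             frozenset({"weekly", "monthly"})),
    # Cadence below is an assumption; the source does not state one.
    EvalTier("comprehensive", "end-to-end test", "strategic decisions such as a model switch",
             frozenset({"on_demand"})),
]


def tiers_for(trigger: str) -> list[str]:
    """Return the names of the tiers that should run for a given trigger."""
    return [tier.name for tier in TIERS if trigger in tier.triggers]


for trigger in ("pull_request", "weekly", "on_demand"):
    print(trigger, "->", tiers_for(trigger))
```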
6. The bottleneck is no longer coding; it is learning.
In this section:
Guy’s diagnosis of what enterprises miss: how much they can actually steer agents — the binary “accept the risk or don’t” framing is the wrong question.
At Tessl, 20% of engineering is dedicated to the factory itself; Guy doesn’t think that’s an overinvestment.
Matt: the change-management piece — getting “I engineer the system” past the early-adopter teams to the rest of the org — is harder than the proof of concept.
10X productivity in five years is plausible, but the right metric isn’t coding speed — it’s iteration speed and time-to-market. The new bottlenecks are marketing throughput and user attention.
Closing coda on the CS-degree question: don’t push the kid into the degree, push the kid to build something. Software matters more than ever; the university is the doubtful vehicle.
James Kaplan: So based on your experience, when you talk with enterprise CIOs and CTOs — banks, pharma companies, what have you, manufacturing companies — what do they not get about the future of agentic engineering? What do you think most people working in enterprise IT organizations need to understand about how agentic engineering will evolve over the next few years?
Guy Podjarny: A good question. People are confused between these two perspectives that feel binary.
I think people underestimate how much they can steer and guide the agents — both to success and to control — and that requires investment.
One is: “I’ve seen a mistake, and now I cannot deal with this creature.”
The other comes from people who have seen the wonder around vibe coding, so it’s: “No, you just need to let the agent be and let it roam free.”
They don’t appreciate just how much they can control it.
I think the companies that are at the cutting edge — they spend a lot of time on how they get the agents to build right, on enabling the agents.
And people underappreciate the importance, so they delay embracing the agent because they think of it as this absolute — all they need to decide is whether to accept the risk or not accept the risk.
James Kaplan: They can actually control a fair bit of that risk if they manage those agents.
Guy Podjarny: At Tessl, 20% of my engineering team is primarily dedicated to improving the factory, to agent enablement. And I don’t think that’s an overinvestment.
James Kaplan: Matt, this accords quite a bit with what I’ve heard you say about operational change in software engineering — that it’s not just a set of tools, but a set of tools that fit into a broader system.
Matt Linderman: Yeah, to build on what you’re saying a little, Guy, we do a number of coaching initiatives with different folks, helping folks understand how do you move toward a more modern engineering stack. And one of the biggest things is actually more of a mindset shift: I’m not using a bunch of agents to just generate code, but I’m actually responsible for improving the factory, to use your word.
If I’m not getting exactly what I want, how do I go back into the skill, or the series of skills stitched together into some workflow, and re-engineer it in a way that gets me closer to that? And then that brings people, once you have that mindset, into all sorts of directions. How do we do the architecture, the context, better?
How do we feed InfoSec better into the different agents, so that they bring it in from the first time around? That simple mindset shift — from “I’m using the tools that I’m given” to “I am actually engineering the system that is then generating, and my job is to make that system better over time” — has been a massive shift.
You get a few teams usually that figure it out first. They actually are the ones building the agents, the workflows, et cetera. But what we found is if you can then get that mindset amongst the rest of the engineers, even if they’re already using what other folks have built as the baseline, they can then adapt it, they can shape it to work in their specific parts of the code base, and the types of work they’re doing.
But what’s quite interesting, at least from our point of view, is that a large share of that challenge is the change management piece that comes after defining and showing that it can work. Then it’s: how do you actually get everyone else working in that same way? And a lot of that’s the mindset shift that I was talking through.
Guy Podjarny: Yeah, no, I fully agree with that.
Look, it’s hard because — it is a change in the craft of what you’re operating. It’s a change in the types of mistakes that can happen.
With self-driving cars, when they make mistakes that are nonsensical, that a human would never make, people actually get mad. When there’s a leaf in the middle of the road and the car thinks it’s a person and will not continue, people are actually upset about it.
And I think we’re seeing things like that. There’s a new type of error that happens, and it’s hard for people to acknowledge.
You basically have this combo of: you see a problem, and you’re told, correctly, “Don’t fix the symptom. Go upstream and get the agent to fix the problem.” That costs people a bunch of their craft, and it is a new type of work that they might not want to do.
So yeah, a lot of this is culture change. I’m sufficiently a gray beard to have gone through the DevOps transformation. And DevSecOps, and the movement of security responsibilities.
These things are unsettling, and it doesn’t help that agents are happening maybe ten times faster.
It is a big change. So a lot of it is cultural; it always comes down to people.
James Kaplan: So for the companies that implement the model successfully — the operating model — is this a doubling of engineering throughput, a tripling, 50% improvement? Think five years down the road — what do you think the companies that enthusiastically embrace agentic engineering will be able to achieve?
Guy Podjarny: I think it’s an order of magnitude. 10X.
And potentially more. Again, maybe a bit of a common mistake: it’s not about the speed of coding.
James Kaplan: Of course.
Guy Podjarny: It’s the speed of iteration.
When you launch a product, you still should work iteratively. You still should build a minimal product so you get it out there and get people to validate it.
The amount of time it takes you to get a user, to get them to try the product, to give you the feedback and internalize it — those are all still relatively fixed.
You can only lightly optimize those.
But the type of product you can provide to them now can be a lot more comprehensive. You can bring them a product that is actually a lot more thought through.
Your ability to analyze and apply learnings from whatever it is that they did with the product is a lot faster.
And the number of people you need involved in each one of these iterations is a lot smaller, so you can learn more in parallel.
There are enormous opportunities to improve, and they compound. That sort of 2X improvement pace implies that if you’re at it and you’re progressing, you’re gonna be way ahead, ’cause of your pace of learning.
James Kaplan: You’re getting down the learning curve.
Guy Podjarny: It’s interesting to identify the new bottlenecks.
At Tessl we generally work in pairs most of the time.
That’s more because of organizational resilience. If someone’s on vacation or somehow cannot do it, then the other person can continue the work. They collaborate.
They work mostly independently, and then they review each other’s work, so someone can step in a little bit more easily.
We’ve had a problem that we’re still working on fixing, which is product marketing is struggling to keep up with the pace of new capabilities that we have.
The answer is agents all the way down. You need to build more agentic analysis of what got built, move a few of the decision documents that happened earlier on, to be able to produce marketing material in parallel.
Once you have that, there’s still a scarce resource that we need to understand, which is the attention span of our users. You can’t send them ten times as many emails. You still have to send them a confined amount.
So it’s interesting to understand what the limiting factors are, what the new constraints are. And alignment is one of those.
But within each of those departments, more empowerment, more autonomy, and fewer dependencies is a critical movement, because the cost of alignment relative to the cost of building is so much higher.
James Kaplan: Five years, 10X improvement in productivity via agentic engineering at the best enterprises. Do you agree with that? More or less? What’s your view?
Matt Linderman: I think we need two things. One: it depends on the metric you look at, but I would say 10X order of magnitude makes a lot of sense. It could even be higher. If you look at your throughput metrics, the historical way of engineering — there are organizations getting five, 10X already, and they’re trying to now figure out how do you get that across the organization.
But I think that’s a bit misleading. To your point, Guy, really the metric to be solving for is more of a time-to-market view. If you look at the bottlenecks — it moves to your code review, it moves to your product management being able to build up the right requirements, it moves to product marketing, et cetera.
So really, I think there’s a measure of: one, if it’s a new product, how fast can you get to market? And then for existing products, how fast can you cycle through to get customer input, figure out what to build next, then go build it, go get more input, et cetera. I think that for sure can accelerate 10-plus X more.
But it requires real process change. You have to think, “Okay, how do I actually go get that customer input?” It used to be releasing it to actual customers, and then we went to alphas. Now can you even have customers on your team who can just test it and give you feedback every single day at 3:00 PM?
James Kaplan: Before I release something, before I write something, I have a panel of 500 virtual CIOs and CTOs who give me feedback. I often go through five or six rounds, and they help contain some of my literary excesses. Guy, final short question for both you and for Matt — would you advise a young person today to major in computer science if he or she were so interested?
Guy Podjarny: I would not advise someone to go do a computer science degree, though not because of the profession itself: I think software development as a profession, with some modifications, will continue to live, and we will build software. And I think software will matter more than ever. I have a lot less faith that the universities would be the route, that the universities will be able to adapt.
James Kaplan: Your trepidation is about the degree, not about the computer science part.
Guy Podjarny: Yeah, exactly. Go build something. Go create a product. And I also feel like this is a world in which a breadth of perspective will go a long way.
Learning how to be a bit of a one-person army goes a long way — around touching product, touching marketing, touching your subject domain.
James Kaplan: Matt, computer science — 22-year-old or 18-year-old. Assuming that person will go to university, computer science or study something else?
Matt Linderman: Maybe building on what you mentioned, Guy — there’s never been a better time to build your own company. Going and doing that at some point will teach you far more of a breadth of experiences. Now, if you do go, I do think there’s really a skill around problem-solving, conceptual problem-solving, that is going to be applicable in any job that you have, in communications, et cetera.
That may come from engineering degrees, you could say math degrees, et cetera. But I would really be looking for something that pushes you in terms of how do you think, how do you structure problems, et cetera — that you can then bring into whatever type of work you’re doing moving forward.
And then go into the workforce and learn, as you said — entrepreneurship or not.
James Kaplan: Thank you so much. It was a great discussion.
Guy Podjarny: Thank you.
Matt Linderman: Thank you everyone.
7. What would Biddle say about agentic software engineering?
No single element of the modern system of force employment wins a battle—not combined arms, not suppression, not decentralized initiative. The doctrine wins.
The same is true of Podjarny’s stack: models, tools, context, harnesses, factories. None of them buy you anything in isolation. The factory is the doctrine.
And Biddle’s harder lesson—that adopting the modern system requires trusting non-commissioned officers with judgment they were not historically given—is the agentic problem in another voice. Just as you have to trust NCOs with fire teams, you have to trust engineers with agents.
The companies that learn it will not be 10X more productive in any narrow sense. They will be 10X faster at learning — which is perhaps the discriminant between victory and defeat in competitive markets.









