A few learnings from my trip to Australia last week.
< 1 > Despite what everyone says, I sort of like Canberra, even though I picked up some sort of bug there. Being under the weather seventeen time zones from home is not wise.
< 2 > Many corporate executives see two mechanisms for getting value from large language models: bottom-up innovation, with front-line employees using chat interfaces to enhance their work, and building agentic workflows (or workflows built with agentic software engineering).
< 3 > I worry about the prospects for LLM chat as bottom-up innovation.
We know that context is everything, so you don’t want to have to depend on the user remembering to provide all the right context in every prompt.
Naturally, copy-paste is a poor mechanism for agentic learning.
So chat-based LLM usage quickly devolves to a combination of fancy web search, summarize this document (which probably didn’t need to exist in the first place), and write me an email (which possibly didn’t need to exist either). Thin gruel.
< 4 > All the exciting things people are doing with Cursor, Claude Code, Obsidian and other tools? That’s the real innovation -- and it reminds me of the ferment we saw in the PC revolution in the 1980s and early 1990s.
< 5 > Of course, agentic software engineering will be important. Very few people, though, could answer this question when I posed it: if you could develop software functionality at half the cost you can today, what would you build? This will become an existential question for many companies.
I’m so excited about this interview with Neo4j CTO Philip Rathle, because context is everything in the age of AI, making the architectural synergies between AI and knowledge graphs very important.
At first, I thought: wow, you could use GenAI to populate a knowledge graph.
Then my colleague Zubin Ghafari showed me how you could use GenAI to accelerate ontology definition, dramatically reducing time-to-value for knowledge graphs.
And finally: a graph is the ultimate context repository because of its ability to depict a dense network of relationships.
Everything is a graph — a discussion with Neo4j CTO Philip Rathle
What you need to know: Knowledge graphs treat relationships as insight, not overhead. They complement vector search: vectors get you to relevant chunks; graphs give you deterministic, explainable answers and deep context.
What you need to do: Identify where your data is inherently a graph—multi-level hierarchies, supply chains, B2B CRM, digital twins, CMDB-style infrastructure — and start small: let the model evolve as your needs evolve.
What you need to decide: Which problem is the right first use case for a minimum viable graph; and how much to formalize your ontology upfront versus letting it emerge with human-in-the-loop curation.
James Kaplan: Hi there. This is James Kaplan from McKinsey, and this is another episode of the Prosaic Times podcast. Greetings from New York City, where it’s cold and miserable and the frozen snow mixed with truck exhaust is still on the sidewalks. Fortunately I’ll be in Australia next week. We’re expecting warmer weather. I’m thrilled that Philip Rathle, the CTO of Neo4j, the knowledge-graph database company, is with us. Philip, thank you for joining us.
Philip Rathle: Pleasure to be here, James.
A background in databases
James Kaplan: Alright, so why don’t we start out with a little bit of background. Do you mind just telling us a bit about your journey? I recall from our email exchanges there may have been a Commodore 64 somewhere deep in the background. We’d love to hear about that as well.
Philip Rathle: I certainly started dabbling with computers probably the same time you did—mid-eighties or so.
James Kaplan: We’ve got the New Wave soundtrack playing in the background.
Philip Rathle: Yeah, something like that. And then if I fast forward past college—after lots of computer dabbling—I ended up studying chemical engineering, ’cause I was really into physics, math, chemistry, and just hard challenges and quantitative stuff.
James Kaplan: That’s closer to what we’re doing today than being a history major as I was.
Philip Rathle: Okay. Exactly. So from there I fell straight into software—working at Accenture, which was then Andersen Consulting—and really pretty quickly got into databases. I kind of fell in love with the notion that I can represent a business with data and then manipulate business outcomes with data. Of course we do that in a much more sophisticated way today than we did in the mid-nineties when I actually started doing this.
James Kaplan: Yeah. When you got started on SQL databases in the mid-1990s, we should remember how relatively novel that was. I think the first commercially available SQL database was, what, late seventies—about 1980 or so. So you’re talking about a world that was 15 years old.
Philip Rathle: So organizing data was really interesting to me. It’s a set of rules, much like a geometric proof.
I got into data modeling pretty early on, then got into a bunch of programming around data—data-warehouse kind of work. And then I had this eye-opening experience where I had built an application that functioned perfectly on paper, and when you started running tens of millions of call detail records through it for a telephony application—
James Kaplan: Call detail records. Kids today don’t know CDRs. I suppose wireless telecom companies still have CDRs.
Philip Rathle: I realized that it was possible to build programs that looked beautiful and functional and actually worked and passed unit and system test—and then you get into the physics of the next level of data: how do I treat it and manipulate it? Anyhow, I’ll just fast forward past 20 years of that. Lots of work with high-performance relational database systems. Then I encountered the graph way of doing things.
Emil Eifrem, Neo4j’s founder and CEO, and I met in late 2011. He had this really interesting idea that was very counter to the trends in databases at the time—at the time it was “let’s simplify the data model and remove things like pesky relationships, which get in the way when you’re trying to store and retrieve at scale to support e-commerce.” He was saying no, actually those things are pretty valuable; I’m building an entire database company around it. So there was an interesting combination of something I really believed in—the model is something I always saw as really valuable. Relationships, data context, causality are things that emerge once you understand connections—and the opportunity to help pioneer and grow a new category in databases.
Knowledge graphs treat relationships as insights, not overhead
James Kaplan: I was wondering if you could just give us a brief definition of a knowledge graph.
Philip Rathle: It’s a graph-based structure—nodes, relationships—that represents and reflects some part of the world that I care about, be it real or digital. So think of it as a digital twin; that term is often used to describe these kinds of things, and you may have heard it more in a manufacturing context—physical objects could be a digital twin of part of your enterprise. It’s taking a real-world system or a digital-world system that shows up naturally as a network—computers, biology, ecology, payments, et cetera—or as some kind of hierarchy: reporting structures, HR ownership structures, financial services, et cetera. And just representing those in their natural way.
So I look at a knowledge graph as a logical description, and then I have a choice: Do I put my knowledge graph into a database that is a graph database, naturally designed to take that data and represent it, query it, evolve it as a graph? But I can also take a knowledge graph and put it anywhere—in a spreadsheet, in tables. The important question is, where do you put your graphs? Arguably some of them already live in relational databases, but you probably have a lot of limitations in how you can use them, because the distance between the representation as a graph and the representation in tabular form is quite great. If I’m trying to understand context several levels out, or causality multiple levels out, the ability of that system in tabular form ends up being limited.
James Kaplan: When people ask me why I think knowledge graphs are interesting, the reasons I tend to give are: one, it almost mirrors what somebody would draw on a whiteboard—even a business user. It’s intuitive for many types of data. Second, the relationship is a first-class citizen of the database, which makes it much easier to model domains where there are many types of relationships in many directions. And finally, the fact that the relationship is a first-class citizen means you can model causality over multiple steps in the chain. It’s much easier to model, for example, counterparty risk via a graph than in a traditional database.
Philip Rathle: I love all of those, and I a hundred percent agree. We even used the term “whiteboard friendly” early on to describe the fact that the way a business person would model their domain is using circles and lines—which is a graph.
I have a colleague, Ajay Singh, who recently came in after years of working at Databricks. He made an observation about the category: many people assume they know something about graphs—that they’re niche, they’re hard to learn, and they don’t scale. What he’s discovered, first of all, is they’re everywhere. We’re all graph-database end users. Number two, they’re much easier to learn than a relational database. If you haven’t yet learned to think in terms of a relational database, graphs are how we think. If you’ve learned to think as a relational database, it’s pretty daunting at first—it was for me too. You kind of have to unlearn and come back to your original way of thinking. He said that for a business person, you show them a graph visualization and they’ll leap up and go grab their colleagues and say, “Wow, for the first time I can really see my domain.” And then they can take the next steps and ask more complex questions, like you said—multi-level causality, deeper context and influence. So it’s easier to learn. Of course there’s a little activation energy to unlearn whenever you learn a new skill. But my experience is that once a technologist starts working with graphs, it just becomes so natural, and your projects tend to go faster because you don’t have to deal with all the ORM and the impedance mismatch.
James Kaplan: I wonder if you could comment a little bit on the history of graph databases. I may be one of the few people in the world who’s interested in both databases and history, but you had this sort of boomlet in the two thousands, with a lot of interest in OWL, and then it didn’t quite work out.
I was wondering if you had a view on how interest in graph databases has evolved?
Philip Rathle: So relational obviously became an ISO standard pretty early—SQL 86. Prior to that you actually had hierarchical databases, IMS, and—
James Kaplan: Which was a good thing to do if your processors basically ran on hamsters or something. A hierarchical database is a little less processor-intensive than a relational database.
Philip Rathle: But something interesting about them is they represent hierarchies in a more natural way. Of course the way it was done is very different from what’s done today with graph databases.
James Kaplan: As someone who’s tried to represent knowledge hierarchies in relational databases, yes, I hear what you’re saying.
Two origin stories for graph databases
Philip Rathle: That’s right. Graph databases came out, and there are kind of two origin stories. One started with the worldwide web and Tim Berners-Lee’s idea: let’s annotate the worldwide web and have more than just HTML; let’s add—I’m gonna use the word “context” to mean many different things in this conversation—
James Kaplan: We need context on context. We need meta context.
Philip Rathle: That’s right.
James Kaplan: Oh, come on. That was funny.
Philip Rathle: That actually emerged as W3C.
James Kaplan: And that’s the whole semantic web—W3C.
Philip Rathle: That was the semantic web.
James Kaplan: So a lot of that collapsed under its own weight. I just don’t think the world was ready for it at that point. Is that an unfair statement?
Philip Rathle: There’s still some degree of tagging—schema.org is heavily used. But you’re right, it’s not used in context that much. It’s become a fairly specialized technology. I’ll come back to where it’s shown up. There’s been a thread on the database-management side.
The other origin story came from Neo4j founder Emil Eifrem. He had a literal napkin moment on a flight to Mumbai from Sweden, working through a content management system and dealing with the confluence of many different hierarchies intertwining.
So one, it was a hierarchy of content. This was an application that had digitized and brought online a number of photo CDs that this company licensed. So you have the hierarchy of photos and how they’re structured into CDs and collections. You have another hierarchy on top of that: the ontology, the hierarchy of meaning.
And then you had users and groups and permissions, and relationships across all these things. He’d been using Postgres in the background and realized 90% of his time was actually spent doing, effectively, object-to-relational mapping, or graph-to-relational mapping.
And he said the famous words: how hard can it be?
Let me just invent my own database that treats the data in a way that looks more like my domain, so I don’t have this giant impedance mismatch.
And there the property graph was born. We added labels a few years later, in 2013, for reasons we may or may not get into. But the idea is you have nodes—nodes can have any number of key-value pairs, which are properties. You have relationships that have a type and a direction, but relationships can also have properties.
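The property-graph model described here is easy to sketch in a few lines of plain Python. This is a toy illustration of the data model only; the class names and example data are mine, not Neo4j’s API:

```python
from dataclasses import dataclass, field

# A node can carry any number of key-value properties
# (and, since 2013, a set of labels).
@dataclass
class Node:
    labels: set = field(default_factory=set)
    props: dict = field(default_factory=dict)

# A relationship has a type, a direction (start -> end),
# and can carry its own properties.
@dataclass
class Rel:
    start: Node
    end: Node
    type: str
    props: dict = field(default_factory=dict)

alice = Node({"Person"}, {"name": "Alice"})
bob = Node({"Person"}, {"name": "Bob"})

# Properties on the relationship itself: not just *that* they
# know each other, but how well and since when.
knows = Rel(alice, bob, "KNOWS", {"strength": 0.8, "since": 2011})
```

The point is that the relationship is a first-class record in its own right, not a join table inferred at query time.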
James Kaplan: Which I absolutely love. Not only that we know each other, but how well do we know each other, right?
Philip Rathle: And how did the relationship start? You typically have fewer properties on relationships than on nodes, but these can be incredibly powerful. In lots of domains like Customer 360, there’s a whole set of data where you suspect things but don’t know them a hundred percent. But there are certain kinds of operations where you want to use that information. Having that waiting in the relationships, and having different types to denote it, can be very powerful. So that ended up being the core design—let’s solve enterprise database kinds of problems. What happened in parallel in the RDF sphere—sorry, I don’t think I’d used the term RDF yet—the semantic web—
James Kaplan: Well, before we come back to RDF versus LPG, which I very much want to get into—where does the famous Google blog post “Things, not strings” fit into all this?
Philip Rathle: So Google’s “Things, not strings” was 2012.
Trillion-dollar graph companies already exist
Philip Rathle: There’s been some talk and blogging, and you had a guest recently talking about context graphs as the next trillion-dollar opportunity.
There is already a trillion-dollar graph company, and that’s Google.
James Kaplan: There are multiple trillion-dollar graph companies.
Philip Rathle: I cannot disagree with that. Facebook is certainly one. LinkedIn is certainly
James Kaplan: And then—people sometimes ask, are graphs real? Can they scale? When people have asked me to explain a knowledge graph I say: you are a knowledge graph user. If you use LinkedIn, Facebook, Google, Wikipedia, you’re a knowledge graph user.
Philip Rathle: A hundred percent. So Google’s first foray into web search was in, I think, 2000 or 2001, somewhere around there.
They had the world’s largest index of the worldwide web, but that wasn’t particularly differentiated. You had 30 other companies doing this.
James Kaplan: AltaVista. Remember AltaVista?
Philip Rathle: I sure do. And of course Yahoo and WebCrawler and Lycos and so on.
James Kaplan: Oh my God. It’s Old Home Week.
Philip Rathle: We’re just dating ourselves.
James Kaplan: Exactly. I was on a panel last night, and someone on the panel described having an iPad as a child. I was 40 when the iPad came out, I think, or almost 40.
Philip Rathle: What initially made Google so successful was one graph algorithm—PageRank—a centrality algorithm. They solved the problem that you can easily gamify websites: if your ranking is based on the number of times a word appears on the page, you just repeat that word a gazillion times, white on white background, one-point font. That’s what people were doing. Google cut through that noise by saying, let’s rank webpages based on the number of inbound links, and let’s do recursive algorithms so you can’t gamify it—weight the inbound links based on the number of inbound links from the referring page. That took them about a decade.
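That recursive weighting can be sketched with a few lines of power iteration. This is a toy PageRank over a made-up four-page web, purely to illustrate why keyword stuffing on your own page can’t move your rank:

```python
def pagerank(links, damping=0.85, iters=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = set(links) | {p for out in links.values() for p in out}
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iters):
        # Every page keeps a small baseline; the rest of its rank
        # comes from pages that link to it, weighted by *their* rank.
        new = {p: (1 - damping) / len(pages) for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new[target] += share
        rank = new
    return rank

# "b" is linked to by three pages, so it accumulates the most rank;
# nothing "b" writes on its own page changes that.
web = {"a": ["b"], "b": ["c"], "c": ["b"], "d": ["b"]}
ranks = pagerank(web)
```

A page’s score depends on its inbound links and, recursively, on the scores of the linking pages, which is exactly the anti-gaming property described above.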
After about a decade they acquired a company called Metaweb in 2010—which had built a knowledge graph. What does “knowledge graph” mean in the Google context? Take the worldwide web and the things that show up; put them into a graph so you have entities, and for each entity something that describes its meaning. So you search for “Rio.” If you don’t know the intent, you could bring back Rio the city, Rio the hotel casino—
James Kaplan: Duran Duran in the background—we had Commodore 64s.
Philip Rathle: There you go. So today when you use Google you’ll oftentimes get this pane on the right. They introduced that in 2012 with “Things, not strings”—a brilliant blog post that says it’s one thing to return text, another to return meaning. That knowledge-graph pane is actually a graph visualization of that node, where each blue link is a link to an adjacent node through a relationship—like “capital of” Albany. That catapulted them for another decade; they disrupted themselves, no one could keep up. They did it again in 2024.
They seem to do this in 12-year cycles—they described using LLMs but pulling data back from the knowledge graph. There’s a term for that: graph RAG—retrieval-augmented generation by pulling in information from the graph and using it in one of multiple ways. There are various flavors of graph RAG. So Google is a trillion-dollar graph company. “Things, not strings” I think is particularly relevant now, where people are trying vector search.
With vectors there’s a relative meaning—if I have word embeddings I can test the distance between these two and these two and see which is closer. But nowhere in that is any information about what the thing actually means in any human-understandable way. So it’s powerful, but you very quickly hit limitations.
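The “relative meaning only” point is concrete in code. With made-up three-dimensional embeddings, cosine similarity can tell you which pair of words is closer, but nothing in the numbers says what any word actually means:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy embeddings (invented numbers): the geometry says "king" is
# closer to "queen" than to "banana", but the vectors carry no
# human-readable statement of what any of these words mean.
emb = {
    "king":   [0.90, 0.80, 0.10],
    "queen":  [0.85, 0.82, 0.15],
    "banana": [0.10, 0.20, 0.95],
}
```

That gap, distance without meaning, is precisely what a knowledge graph’s explicit entities and relationships fill in.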
Vector search and knowledge graphs as complements
James Kaplan: I was wondering if you could speak about the trade-offs and the synergies between vector search and graph databases. Vector doesn’t necessarily have meaning at very high scales; it can sort of collapse on itself. I especially like the scenario where you use vector search to populate a graph and then do combined deterministic and stochastic search on the graph. Could you comment on when to use vector search versus when to search a graph?
Philip Rathle: The first thing I’ll say is I don’t view these as mutually exclusive.
James Kaplan: It’s always a continuum. There’s always complementarity.
Philip Rathle: Vector started out as a specialized kind of database—really one index operation on one data type, one particular transformation. Now it’s in all databases; you don’t need a special database for it, including graph databases and the one I work on. Where vectors become powerful is when I have unstructured text and I’m trying to use it to inform an AI operation—specifically to inform an LLM. I ask a question and want to find a similar question that’s been asked in the past. You can find a similar chunk, map to that answer, bring back the text around it to the model and get a better answer. What you don’t get: it’s not normally used for structured data—it’s really an unstructured technique. You don’t really have fine-grained access controls, which are important for discernment and preventing certain data from being misused. Maybe you have it at a super coarse-grained level—
James Kaplan: I love the idea of having access control per node.
Philip Rathle: And per relationship.
James Kaplan: You may or may not want people to know that you and I know each other.
Philip Rathle: Or you may want people to know there’s a connection but not the nature of it. There’s actually a special permission called Traverse for that. So there’s real fine granularity once you have your knowledge graph in a graph database supporting these permissions. Other things: you can’t look at a vector and make sense of it—it’s just numbers. You could say it’s understandable by a machine; I’d argue it’s not, because what the machine “understands” is distance between vectors. There’s no meaning there; it’s just a discrete thing. So if I’m trying to understand supply chain, it’ll bring back the one thing that refers to supply chain, but I can’t store my supply chain in that format. I can’t store my employee hierarchy. These things I just brought up are where graphs come in. What we’ve found: when we have both vectors and graph—same database or different databases referencing each other—vector indexing is a handy trick for landing into a chunk. Then you can do entity extraction from the chunk to see what parts of the domain it refers to, and traverse from one world to the other. “Here’s a chunk of text—now I’m gonna go into the domain, pull back the supply chain, use that for ranking the vectors.” You might run PageRank, use it for filtering what’s brought back, use it for bringing back several levels of context around this particular thing. Another way to use the knowledge graph: have the AI system delegate certain questions to the knowledge graph—run a deterministic operation. A supply-chain question probably has an exact answer. Ultimate beneficial owner in finance—exact answer. And that answer might be 20 levels deep. You’re not gonna get there with an LLM or with vectors. But it’s easy to have the LLM translate the question into a graph query, delegate to the graph. It’s almost like the right hemisphere of the brain delegating to the left: go figure this one out. Pass it back, and you get an exact, explainable answer. 
So you get determinism in your AI system, which is pretty cool.
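The ultimate-beneficial-owner case is a good example of a deterministic operation you can delegate to the graph. In Cypher that would be a variable-length ownership-path query; here is a plain-Python equivalent over a hypothetical ownership chain (entity names invented):

```python
def ultimate_beneficial_owner(owns, entity):
    """Follow OWNS edges upward until reaching an owner no one owns.
    owns: dict mapping each entity to its direct majority owner (or None)."""
    seen = set()
    while owns.get(entity) is not None:
        if entity in seen:  # guard against circular ownership structures
            raise ValueError("ownership cycle at " + entity)
        seen.add(entity)
        entity = owns[entity]
    return entity

# A chain like this might be 20 levels deep in real financial data;
# the traversal is still exact and explainable, where an LLM or
# vector search would have to guess.
owns = {"OpCo": "HoldCo1", "HoldCo1": "HoldCo2",
        "HoldCo2": "Trust", "Trust": None}
```

The answer comes back exact and auditable, which is the determinism being described.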
James Kaplan: You bring up a great example: beneficial owners and legal entity structures in financial services, which are all sorts of fun to deal with. If we were to zoom out a bit, what do you think are the four or five most exciting business use cases for graph databases over the next few years?
What are the areas where you see people getting a lot of value?
Philip Rathle: I see a lot. This is really a horizontal phenomenon. One—let me start with an AI use case that hasn’t been so much in the headlines but I see a lot—is democratizing certain kinds of complex data operations that were accessible only to data scientists who knew how to query and pull the data together. I’ve seen this at companies like Uber, Comcast Business, Walmart. They’re taking complex questions that normally a business expert who understands a database query language has to go and ask—in that case Cypher slash GQL, the industry standard for graph database querying. You end up with one or two or maybe five top people in that department who understand that query language and have business domain expertise and can iteratively ask complex questions. When you say, let’s democratize it through an LLM, with a SQL database on the backend, the questions very quickly get intricate—next thing you know you’ve got 200-line SQL queries with 15 joins and 10 levels of recursion. I see far worse every day. Those queries might take half a day on million-dollar hardware. You put that into a graph database on modest hardware and they’ll run in subsecond.
Knowledge graphs for dynamic modeling
James Kaplan: What’s the intersection between graph databases and dynamic models? If you think about many supply chains and many commercial markets, they’re graphs, right? You have customers, you have vendors, you have assets, you have products, and there are relationships between these things.
Are you seeing people doing dynamic modeling or doing complex modeling using graphs at all?
Philip Rathle: Yes, in two ways. It has to do with separating the storage layer and the reasoning layer. What are you modeling? The real world or the digital world. What makes it graph-worthy—where a graph database will add value? If it’s a simple model, you’re probably fine with Excel or basic SQL. Where it becomes valuable to put it into a graph is when you have data that shows up in the real world as a multi-level hierarchy or more. And of course some of these aren’t just hierarchies. HR: you have your direct reporting structure, but also communities of practice, mentors, dotted lines to projects—all of that changes over time. Each person has skills you can tag into the graph. People are in teams; you can include team composition relative to project. So anytime I have something like that—and likewise a digital twin of a car, of an aircraft, parts and parts of parts—
James Kaplan: and it would seem to me, doing a digital twin in a relational database seems like a painful thing to do, although I’m sure people do it.
Philip Rathle: We had an aircraft manufacturer, going back over a decade, that had an application they would give to their clients to essentially spec out their airplane. Each time you choose one particular thing, like “I want a screen in the seat back,” that choice would trigger all of these dependencies, and the dependency calculation would take like a day to run.
Each time you added just one thing. Then you put that in a graph database and all of a sudden it’s running like that. So there’s lots of that. Essentially any data that shows up in the real world as a network or a hierarchy where you want to do a path or a journey—
Customer journey, patient journey. That kind of model ends up being perfect for storing in the graph. So now I can store my data structure in the graph, and I have the choice of where to reason about it.
James Kaplan: And what does reasoning about the data structure look like? Give me an example.
Philip Rathle: A simple one: I change a light bulb. I want to use this new kind of light bulb in a plane because I’ve deprecated the old kind. Maybe that changes the threading, which changes the housing, which changes the weights and balances. You have all these cascading dependencies and the stakes are incredibly high—you can’t get any of this wrong. Another example with automobiles, a real one from Volvo: they have the door system and the key fob. They want to do research and say, what other systems are impacted by the key fob? What a new person may not realize is if you hard-press the key fob, the windows roll up—so the window system is tied to the key fob, but you wouldn’t know that without a digital twin. They use a graph for the dependencies and relationships between this subsystem and that subsystem and this functionality—and to the person working on it—so I can go talk to them and make sure the work I’m doing doesn’t negatively impact their system. And inside computer and telephony networks it’s things like, what are all the attack vectors? Red-teaming,
James Kaplan: cybersecurity, at least in the enterprise. Cybersecurity was an early adopter of graphs,
Philip Rathle: And there are a lot of cybersecurity companies that embed graph databases inside of their offerings. There are also a lot of cybersecurity activities that aren’t yet covered by off-the-shelf software. Banks in particular go off and build their own adjunct cybersecurity solutions to close those gaps using graphs.
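The cascading-dependency analysis described above (bulb to threading to housing to weights and balances) is a transitive-closure walk over dependency edges, and attack-path analysis in cybersecurity is the same traversal. A minimal sketch with invented part names:

```python
from collections import deque

def impacted(depends_on_me, changed):
    """Breadth-first walk: everything that transitively depends on
    a changed part. depends_on_me: dict mapping a part to the list
    of parts that directly depend on it."""
    seen, queue = set(), deque([changed])
    while queue:
        part = queue.popleft()
        for dependent in depends_on_me.get(part, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen

# Change the bulb: the threading, the housing, and the
# weights-and-balances calculation are all transitively impacted.
deps = {
    "bulb": ["threading"],
    "threading": ["housing"],
    "housing": ["weights_and_balances"],
}
```

In a graph database this is a single path query over however many levels the dependency chain runs; the slow part of the relational version is recomputing these edges from joins.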
Tech environments, corporations, file systems -- all graphs
James Kaplan: One of the reasons a CMDB—a configuration management database—is often so painful is that trying to model a modern technology environment in a relational database borders on insanity. Any modern environment where you have applications that support business processes, applications that run on infrastructure, applications that consume data, infrastructure that consumes other infrastructure—it’s just naturally a graph.
Philip Rathle: This is a great example. We have a public case study with British Telecom. BT has a CMDB, an off-the-shelf one they’ve been using for years. A bunch of stuff runs in it; it does certain things well. It doesn’t do dependency analysis or network planning well—and those are pretty key. So they have a mirror image where the graph feeds on that; it’s kept up to date and enables a lot more capability. Back to your modeling and reasoning question. There are a few kinds of reasoning. One is simply connect the dots—sounds basic, but connect the dots across 20 levels. You can do things that seem like magic. Almost every pharma company we’re involved with does drug discovery by connecting the dots between a hundred thousand pharmaceutical compound candidates at one end and, at the other end, everything that’s known about some kind of ailment and how it interacts with the body—just connect the dots across the body in between. So you could look at this as connect-the-dots graph reasoning; no one’s really come up with the term, so I’m doing it myself. What you could also do: if I want an LLM to be a good sparring partner or an agent to come up with its own hypotheses, dip into the graph, pull back something four, five, ten levels out, hand that back to the model as sentences. Node–relationship–node can easily be translated into English; feed that as a giant piece of context or a giant prompt. Now I’ve got an agent that can riff because it understands more about that particular domain at that point in time—stripped of what’s appropriate for that use. Say I have a firewall between investment banking and corporate banking.
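The point that node–relationship–node translates easily into English can be sketched in a few lines; the triples and relation names below are invented for illustration:

```python
def triples_to_context(triples):
    """Render (subject, RELATION, object) triples pulled from a graph
    as plain sentences suitable for use as LLM prompt context."""
    return "\n".join(
        f"{s} {r.lower().replace('_', ' ')} {o}." for s, r, o in triples
    )

# A few hops pulled from a hypothetical supply-chain graph.
triples = [
    ("Acme Corp", "SUPPLIES", "Widget Inc"),
    ("Widget Inc", "DEPENDS_ON", "Port of Oakland"),
]
print(triples_to_context(triples))
```

Feeding several levels of neighborhood back to the model this way is one of the simpler flavors of graph RAG: the graph supplies grounded, access-controlled facts, and the model riffs on them.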
James Kaplan: Another couple of examples: it’s a natural fit for modeling an organization, particularly when we don’t have entirely rigid hierarchies. Once you move to Joe reports to Bob who reports to Sally who reports to Sue, and you have project teams, people working across organizations, matrix reports—that begins to feel much more like a graph than the kind of simple hierarchy you can easily model in a relational database.
Philip Rathle: This is very true. You have companies like Daimler that when they spin up a new team actually refer to a graph to include all the things you described—where people are physically located, time zones, team members who are long-term experts in a particular domain, some people from outside the company to bring new perspectives, some who have worked together and some who haven’t. You do that with a graph. If listeners want to tune in, Walmart has a graph-based application available to all employees for career-journey analysis for exactly this reason. So graphs are showing up—on any given day you’re using graph technology in the background in more ways than meet the eye.
James Kaplan: And then B2B CRM. B2C CRM is a bit simpler—more of a hub-and-spoke model. I know a cable provider that has a relationship with a large number of atomistic households. But once you get to B2B sales—capital markets, group health insurance—you have multiple entities in the customer, multiple people in the selling organization, relationships that evolve over time, multiple products. That feels like a natural graph.
Philip Rathle: So funny enough, I don’t think I’ve talked to you about this, James. I co-founded a startup in the early 2000s after being involved in building United Airlines’ first Customer 360 system—focused on passengers. We went out and talked to other airlines and found the top problem they wanted to solve was what you said: not passenger, corporate—the mapping of their sales rep to some level in some giant corporation, some big hierarchy. We had a standard out-of-the-box relational model; we did our best and ended up punting because every company has a different number of hierarchies and we couldn’t generalize. It was super hard. Fast forward to when I started at Neo4j—we already had four or five customers. The very first was Cisco, and their use case was exactly that. They had a system, still running, called the Hierarchy Management Platform, initially to deal with the fact that Cisco acquires a new company every month. Each company has salespeople that attach into a sales hierarchy, products into a product hierarchy, and territory mappings. You need this super complex, multi-level calculation across multiple hierarchies to figure out the commission trail, ownership, and exception assignment—this person has this territory but this one other account in some other territory because they have a personal relationship. You can’t even model some of that in a relational database. So that’s a pitch-perfect use case. And then you overlay AI on top—agents with access to these data structures. They interact with them pretty well, because they’ve been trained on a decade plus of graph modeling and graph language. You end up being able to solve these complex problems much more naturally than with a relational database.
James Kaplan: A file system is in many respects a graph. We all struggle with finding the right document at the right time—maybe I’m describing knowledge management. I’ve talked about the Obsidian-Cursor stack I built to manage my own professional life. You could argue the way we interact with documents is in many respects a graph.
Philip Rathle: And the way we interact with ideas. Mind maps have been around for a while—that’s a form of graph. There’s a note-taking company called Roam Research that’s been out for a while and recognizes that we think in networks; it’s a network-based note-taking tool that takes mind maps to the next level. I keep waiting for Drive or some equivalent to incorporate graph structures. When you have data in a graph structure you can run centrality—what’s the most relevant document—filtered by recency, by people in my function. It lends itself to much more powerful search. In this world of AI we have a lot of powerful search capability; a lot of off-the-shelf products are primarily vector-based. Some use the term “knowledge graph” to describe going maybe one level, generously, with the graph, but they’re not fundamentally structured to allow the rich exploration you’re describing. So companies are building their own. No doubt we’ll see off-the-shelf graph-native options to solve the search problem we’ve had for years.
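The "centrality filtered by recency" idea can be made concrete with a toy sketch. The documents, links, and dates below are invented; degree centrality stands in for the richer centrality measures a graph database would offer.

```python
# Illustrative sketch: rank documents by degree centrality over a link
# graph, keeping only documents modified since a cutoff date.
from datetime import date

LINKS = [  # undirected "references" edges between documents
    ("roadmap", "q3_plan"), ("roadmap", "budget"),
    ("q3_plan", "budget"), ("q3_plan", "hiring"),
]
MODIFIED = {
    "roadmap": date(2024, 6, 1), "q3_plan": date(2024, 6, 20),
    "budget": date(2023, 1, 5), "hiring": date(2024, 5, 2),
}

def central_docs(since: date) -> list[str]:
    """Most-linked documents first, filtered to recent ones."""
    degree: dict[str, int] = {}
    for a, b in LINKS:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    recent = [d for d in degree if MODIFIED[d] >= since]
    return sorted(recent, key=lambda d: -degree[d])

print(central_docs(date(2024, 1, 1)))
# -> ['q3_plan', 'roadmap', 'hiring']  (budget is stale and drops out)
```

A vector-only search can find documents similar to a query, but it has no notion of which document is structurally central to a team's work; that is what the graph adds.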
James Kaplan: Let's pivot a little. What does ontology mean to you, and how important is it? And if there's a continuum between people who think you should have the ontology nailed down up front and people who think an ontology should be emergent, where do you land?
Philip Rathle: It’s funny. For the longest time it was a piece of jargon I tried to get our teams to avoid using, because we like to use friendly terms like node, relationship, and edge instead.
Nobody needs more terms and more complexity. Having said that, I credit Alex Karp with popularizing the term. It’s super important for companies to have a handle on inside their data systems.
You’ve got three kinds of graphs.
One kind is metadata: how all these different systems hook together—provenance, lineage, definitions. There’s no actual data in that. Then you have your actual data, maybe two levels down. Your ontology is a definitional structure of the things available in your domain—what’s the structure. It’s super abstract. Let me give you an example. Say I’m IKEA. IKEA has a product hierarchy and a product catalog, and there’s a hierarchy of meaning associated with that: furniture—underneath that outdoor and indoor, home versus business—tables and chairs that fit in one category or the other, and a hierarchy of brands, colors, collections, and so on. So your ontology describes the unique set, grouped as a hierarchy, of what the things in your domain are. Oftentimes it’s multiple overlapping hierarchies. Usually the first thing you do with graph data is bring in an ontology from somewhere. These things are usually already lying around—it sounds like a mysterious new term, but you probably already have a database, probably relational, reflected in many places: a product master, your product catalog. Bring that in so you can use it when making a product recommendation—“I’ve got three things in my shopping cart, what are they trying to do?” To understand that you need to understand what kinds of things they are and how those things go together. That’s a collaboration between your ontology and your graph of what’s been bought and what’s in the cart. There are some big ontologies out there—eBay has had a billion products in an ontology in a graph with us for many years. In the rare cases where you don’t have ontologies in structured form, you can often infer them from unstructured documents using unstructured-to-graph construction—for example, when you’re trying to understand a competitor’s ontology.
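The IKEA-style example above can be sketched as a parent-pointer hierarchy. The categories and products below are invented for illustration; finding what a shopping cart is "trying to do" becomes a lowest-common-ancestor walk over the ontology.

```python
# Invented mini-ontology: category/product -> parent category (None at root).
ONTOLOGY = {
    "furniture": None,
    "outdoor": "furniture", "indoor": "furniture",
    "patio_table": "outdoor", "patio_chair": "outdoor",
    "desk": "indoor",
}

def ancestors(node: str) -> list[str]:
    """Path from a node up to the root of the hierarchy."""
    path = []
    while node is not None:
        path.append(node)
        node = ONTOLOGY[node]
    return path

def common_category(cart: list[str]) -> str:
    """Deepest category shared by every item in the cart."""
    shared = set(ancestors(cart[0]))
    for item in cart[1:]:
        shared &= set(ancestors(item))
    # The deepest shared ancestor has the longest path to the root.
    return max(shared, key=lambda n: len(ancestors(n)))

print(common_category(["patio_table", "patio_chair"]))  # -> outdoor
print(common_category(["patio_table", "desk"]))         # -> furniture
```

Real ontologies are usually multiple overlapping hierarchies (brand, color, collection), which is exactly where a single parent pointer breaks down and a graph model earns its keep.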
James Kaplan: The ontology-creation process can be very strategic. It’s actually quite a good thing for a management team to have an explicit discussion about what they mean by a product, what they mean by a customer, what they mean by a service offering, and how those things relate to one another.
Philip Rathle: That’s right. I think there’s an observed reluctance to hire people or have people spend their cycles managing ontologies. I’m of two minds. One: there’s a lot that can be automated and a lot of work already done—so there’s probably a lot you can do with what you already have. The other piece of good news is you can do a lot with agents riffing on each other to improve it. The third is there’s a lot of cool gamification. I had lunch with my brother-in-law, who’s a doctor; he said he often gets garbage recommendations from the AI system but then has an option to annotate—what was wrong with this? That’s feedback, a way to crowdsource ontology creation, kind of like Wikipedia but more in-stream. Having said all of that, these things are so high-value—as you point out—that it’s worth having some small number of people spend time making sure they’re correct, especially when your AI comes back and says “I’m 80% sure but not totally sure.” In the same way that foundation-model companies, whose entire business depends on high-quality data, use the likes of Scale AI and lots of people to do RLHF—reinforcement learning with human feedback—and other annotation. There is some value in doing some of that in-house, as much as I know that’s not a popular thing to hear, and in addressing the long tail of the ontology. But you don’t necessarily need to start there.
Is context graph a form of knowledge graph?
James Kaplan: So what is a context graph? How is it similar to or different from a knowledge graph?
Philip Rathle: That word’s being used in a lot of different ways. You had the Foundation Capital blog post—I think that’s one legit and powerful definition: your decision traces and the graph that comes out of that. What’s a decision trace? Information about the actor and what’s being acted upon, and all the people in the approval process for that decision—so that if you pull the thread you have information about why a decision was made. I have my actor and the thing being acted upon. Then, if we’re talking ontology, I need to know what that thing is. So now I have my ontology. “Semantic context” is often used to describe the metadata graph—these terms get recycled a lot.
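A decision trace can be sketched as a tiny triple store. All names below are invented: the trace records the actor, the thing acted upon, the approvers, and the justification, so "pulling the thread" on a decision recovers its context in one hop.

```python
# Invented decision-trace edges: (subject, relation, object) triples.
EDGES = [
    ("ana", "PERFORMED", "refund_123"),
    ("refund_123", "ACTS_ON", "order_987"),
    ("bob", "APPROVED", "refund_123"),
    ("carol", "APPROVED", "refund_123"),
    ("refund_123", "JUSTIFIED_BY", "policy_returns_30d"),
]

def pull_thread(decision: str) -> dict[str, list[str]]:
    """Everything one hop from a decision node, grouped by relation."""
    context: dict[str, list[str]] = {}
    for s, rel, o in EDGES:
        if s == decision:
            context.setdefault(rel, []).append(o)
        elif o == decision:
            context.setdefault(rel, []).append(s)
    return context

print(pull_thread("refund_123"))
# -> {'PERFORMED': ['ana'], 'ACTS_ON': ['order_987'],
#     'APPROVED': ['bob', 'carol'], 'JUSTIFIED_BY': ['policy_returns_30d']}
```

Following `order_987` or `policy_returns_30d` another hop is how "pulling the thread" expands into the purchase history and metadata layers Rathle describes next.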
I might pull data back at a metadata layer, then some purchase history around the person who carried out that action—now I’m pulling back a graph of what’s been bought, how it relates, what’s been returned, complained about. Something challenging about that definition is you end up pulling in everything. What’s reflective of the world I’ve been living in, having worked with graphs for close to 14 years, is that this is the nature of the graph: you pull the thread and suddenly you have the entire world. We say “digital twin of a—” What’s good about it, or what I recommend: don’t get caught up or scared off by the idea that you need to boil the ocean and have a graph of everything before you get any value.
That’s very much not the case. You can start small, get value in a very short time by deciding what particular problem you’re trying to solve and what’s the minimum viable graph around that. Oftentimes it’s fairly small; you can start at a departmental level. What’s great with a graph database—and we should disambiguate: there’s context graph, knowledge graph, graph database (the technology you put your knowledge graphs and context graphs in, purpose-built for that)—you have schema-flexible implementation.
You can add more data without a schema migration; your model can evolve; you can add to your graph and evolve your use case and have an asset you build on and reuse rather than a one-off you have to recreate from scratch.
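The schema-flexible point can be shown with a minimal sketch: a graph held as an edge list plus per-node property dicts, so new node kinds and properties arrive without a migration step. Entities are invented for illustration.

```python
# A graph as plain Python structures: no fixed schema to migrate.
nodes: dict[str, dict] = {}
edges: list[tuple[str, str, str]] = []

def add_node(name: str, **props) -> None:
    nodes.setdefault(name, {}).update(props)

def add_edge(a: str, rel: str, b: str) -> None:
    edges.append((a, rel, b))

# Start with a minimum viable graph: products only.
add_node("desk", kind="product")
add_node("chair", kind="product")

# Later, evolve the use case: add customers and purchases.
# No schema migration -- new kinds and properties just appear.
add_node("ana", kind="customer", region="west")
add_edge("ana", "BOUGHT", "desk")

print(len(nodes), len(edges))  # -> 3 1
```

This is the toy version of the reuse argument: the product subgraph built for the first use case is the same asset the purchase-history use case extends.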
James Kaplan: Maybe this is overly simplistic, but I see a context graph as another instantiation of a knowledge graph—a form of knowledge graph that focuses on decisions.
Philip Rathle: That’s right. It’s either that or synonymous with knowledge graph—I could argue both. At the end of the day, what we call them is maybe less important than understanding how this stuff can be useful.
James Kaplan: Thank you so much. This has been wonderful.
Philip Rathle: It’s been a pleasure, James.