AI agents are failing in production — and the reason isn't bad models. It's missing context.
Not missing data. Missing decision context — the reasoning, exceptions, approvals, and tribal knowledge that explain why work happens the way it does. The stuff that lives in people's heads, in Slack threads, and in the way a senior team member clicks through three systems before making a judgment call nobody documented.
This is the problem that context engineering and context graphs are built to solve. And it's a problem we've been solving at Gralio — not in theory, but in production. Our process capture method built the context that powers a live AI agent automating 82% of a team's biggest workflow. Here's how we did it at GT Golf.
Context Capture → Working AI Agent: The GT Golf Story
GT Golf is a PE-owned distributor of golf consumables serving courses, pro shops, and retail stores across the US. Like many post-acquisition businesses, they'd inherited a patchwork of systems, processes, and tribal knowledge — with no unified picture of how work actually gets done.
We deployed the Gralio diagnostic across their 15-person customer support and order-taking team. Two weeks of screen recordings — the equivalent of embedding 15 consultants to shadow every team member without the disruption or cost.
The diagnostic captured something no system of record had: the full operational context of how the team actually worked. Every process, every tool, every decision point, every workaround — as-is, not as-reported.
What came out of that context capture?
A live AI agent that now automates 82% of email order entry.
The team had been manually reading incoming order emails — PDFs, Excel files, plain text, even photos of handwritten notes — and transcribing every detail into NetSuite sales orders. A single order could take 10+ minutes of careful data entry.
Our AI agent monitors the inbox, reads orders in any format, extracts the key data, fuzzy-matches customers and SKUs against NetSuite records, and creates a draft sales order — all within 30 seconds. But here's the part that matters for the context graph story: the agent uses captured decision context to make the right calls.
It selects the correct warehouse location based on past order history. It fills the remaining data fields based on behavior patterns observed in past orders. It handles the exception logic that the team had been carrying in their heads — the tribal knowledge of how orders get processed, not just that they get processed.
None of this was in a database. None of it was in a policy document. It lived in the way people worked — and we captured it through screen recordings before it could be lost.
The diagnostic also surfaced context that no interview would have uncovered:
- One location had SaaS tooling that automated a chunk of work. Another location was doing the same work by hand — unaware the tool existed and was already licensed. Tribal knowledge loss, visible only through observation.
- Employees were clicking through a phantom popup hundreds of times per week. It served no purpose. Nobody questioned it because the context of why it existed had been lost.
- A built-in ERP feature was going unused because the knowledge of its existence hadn't survived team turnover.
This is what context capture looks like in practice. And it's exactly what the enterprise AI world is now calling a context graph. Read the full GT Golf case study →
What Is a Context Graph — and Why Is Everyone Talking About It?
A context graph is a structured, queryable record of how decisions were made inside an organization. Not just what happened (the CRM already stores that), but why it was allowed to happen — who approved what, which policy applied, what exception was granted, and which precedent from last quarter justified it.
The term gained traction in late 2025 when Foundation Capital investors Jaya Gupta and Ashu Garg published a thesis arguing that the next trillion-dollar enterprise platforms won't win by storing more data — they'll win by capturing the decision traces that explain why data became what it is.
The argument resonated because it described a problem AI teams were already hitting in the field. Agents deployed into real workflows — contract review, order processing, support escalation — kept running into ambiguity that better data access alone couldn't resolve.
Since then, the idea has moved fast:
- Gartner declared 2026 "The Year of Context" at their March D&A Summit in Orlando, positioning context engineering as critical infrastructure for enterprise AI. Their data shows organizations with the highest AI satisfaction invest nearly twice as much in foundations — data quality, governance, context — as in AI tools themselves.
- Anthropic, Google, and Manus have all published detailed context engineering frameworks in 2026, establishing it as a core discipline for building production AI agents.
- The EU AI Act's Article 12 mandates decision logging for high-risk AI systems by August 2026 — directly validating the decision trace architecture.
- AI governance platform spending is projected to hit $492 million this year and exceed $1 billion by 2030.
Context engineering AI is no longer a thesis. It's a procurement line item.
The Problem: Enterprises Run on Knowledge Nobody Captures
Enterprises have systems of record. Salesforce for customers. NetSuite for orders. Workday for employees. These systems store current state well. The deal is closed-won. The order is fulfilled. The employee is onboarded.
What they don't store is the decision logic that produced those outcomes. The Foundation Capital thesis identifies four categories of missing context — and they map exactly to what we observe in every Gralio engagement:
- Exception logic that lives in people's heads. "We always give this customer type an extra discount because their procurement cycles are brutal." That's not in the CRM. It's tribal knowledge passed down through onboarding and side conversations.
- Precedent from past decisions. "We structured a similar deal last quarter — we should be consistent." No system links those two deals or records why the structure was chosen.
- Cross-system synthesis. A support lead checks customer ARR in Salesforce, sees open escalations in Zendesk, reads a Slack thread flagging churn risk, and decides to escalate. That synthesis happens in their head. The ticket just says "escalated to Tier 3."
- Approvals that happen outside systems. A VP approves a discount on a Zoom call. The opportunity record shows the final price. It doesn't show who approved the deviation or why.
This isn't dirty data or siloed data. It's reasoning that was never treated as data in the first place.
And this is exactly what AI agents need to function autonomously. Without it, they either escalate everything to a human (defeating the purpose) or guess (creating risk). Gartner and Forrester both warn that over 40% of agentic AI projects will be abandoned by 2027 due to unclear value and governance failures. The pattern is consistent: agents that aren't grounded in organizational context don't survive contact with reality.
Why This Can't Be Solved from the Top Down
The instinct for many enterprises is to solve this architecturally — build a context layer, wire up the data warehouse, add semantic models. Gartner's framework even prescribes it: assemble multidisciplinary teams, select graph databases, build ontologies.
That's fine in theory. But it skips a fundamental step: you can't capture decision context if you don't know where decisions are being made.
Existing systems of record are structurally blind to this. Salesforce knows the deal is closed but not the twelve steps, workarounds, and judgment calls that produced the outcome. Data warehouses like Snowflake receive data via ETL after decisions are made — by the time data lands, the decision context is gone. Even the new orchestration-layer startups can only capture traces for workflows they're already running.
Before you can build a context graph, you need to answer a more basic question: what does work actually look like in this organization? Which processes involve judgment? Where are the exceptions? What tribal knowledge are people carrying?
That's the gap. And it's where screen-level process capture — watching how work actually happens — becomes the critical first step.
How Gralio Builds the Foundation for Context Graphs
Gralio captures how knowledge workers actually do their jobs — through screen recordings — and transforms that raw activity data into structured process maps, time analytics, inefficiency reports, decision traces, and auto-generated SOPs.
What does that mean in practice? Team members install a lightweight recorder that runs quietly while they work normally. No interviews. No workshops. No consultants shadowing people with clipboards. The AI analyzes the recordings and produces a complete picture of how the organization operates — as-is, not as-reported.
This is the foundation layer that context graphs require but that nobody else provides. And as the GT Golf case proves, it's the foundation layer that makes AI agents actually work.
We Capture Where Decisions Happen
Our AI processing pipeline doesn't just log clicks. It identifies distinct business processes, maps them as flowcharts with decision points, and tracks time allocation across every process and every team member.
More importantly, it captures six categories of inefficiency that map directly to where decision context lives:
- Context gathering — copy-pasting across apps to build context before making a decision
- Cross-system synthesis — excessive tab and app switching for basic lookups
- Manual data transfer — copying values between tools manually
- Re-finding — searching for the same entity repeatedly across systems
- Manual reformatting — reshaping data before it can be used
- Duplicate entry — entering the same data in multiple systems
Each of these is a signal. When someone copies data from three different systems before making a call, that's a decision surface — exactly the kind of high-judgment, exception-heavy moment that context graph proponents identify as most valuable to capture.
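As an illustration of what one such signal looks like computationally, here is a hedged sketch of detecting the "re-finding" pattern: the same entity searched for repeatedly across different apps. The event shape and threshold are assumptions for the example, not the actual pipeline's schema.

```python
from collections import defaultdict

# Illustrative sketch of one inefficiency signal ("re-finding").
# The event dict shape is a simplification of what a screen-recording
# analysis pipeline might emit; it is not Gralio's real format.

def refinding_signals(events: list[dict], threshold: int = 3) -> dict[str, int]:
    """Return entities looked up `threshold`+ times across the event stream."""
    lookups = defaultdict(set)  # entity -> set of (app, timestamp) lookups
    for e in events:
        if e["action"] == "search":
            lookups[e["entity"]].add((e["app"], e["ts"]))
    return {ent: len(hits) for ent, hits in lookups.items() if len(hits) >= threshold}

events = [
    {"action": "search", "entity": "ACME Corp", "app": "NetSuite", "ts": 1},
    {"action": "search", "entity": "ACME Corp", "app": "Gmail",    "ts": 2},
    {"action": "search", "entity": "ACME Corp", "app": "Slack",    "ts": 3},
    {"action": "search", "entity": "SKU-9",     "app": "NetSuite", "ts": 4},
]
print(refinding_signals(events))  # {'ACME Corp': 3}
```

An entity that crosses the threshold marks a decision surface: someone is assembling context from multiple systems before acting.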
At GT Golf, this is how we identified email order entry as the #1 automation target. The diagnostic showed the team spending the majority of their time on a process that was pure cross-system synthesis: reading an email, interpreting the order format, looking up customers and SKUs in NetSuite, applying location-selection logic from memory, and transcribing everything manually. The context we captured — the matching logic, the exception patterns, the behavioral precedents — became the specification for the AI agent.
We Surface Tribal Knowledge
Our decision trace agent identifies six categories of undocumented organizational knowledge from screen recordings:
- Exception logic — tribal knowledge that drives non-standard behavior
- Precedent — past decisions informing current ones
- Cross-system synthesis — decisions made by pulling from multiple tools simultaneously
- Informal approvals — approvals via Slack, Zoom, or verbal channels not recorded in official systems
- Judgment calls — business judgment not documented in policy
- Policy overrides — deviations from standard procedure with reasoning
These six categories map almost exactly to the Foundation Capital thesis's description of what's missing from enterprise data models. The difference is we're not theorizing about it — we're extracting it from real observations of real work, and using it to build agents that actually function in production.
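A decision trace in these six categories can be pictured as a small structured record. The schema below is a hypothetical sketch for illustration — the field names and category labels are assumptions, not Gralio's actual data model.

```python
from dataclasses import dataclass, field

# Hypothetical trace schema; categories mirror the six listed above.
CATEGORIES = {
    "exception_logic", "precedent", "cross_system_synthesis",
    "informal_approval", "judgment_call", "policy_override",
}

@dataclass
class DecisionTrace:
    category: str                # one of CATEGORIES
    description: str             # what was decided and why
    source_recording: str        # artifact ID of the originating recording
    systems: list[str] = field(default_factory=list)

    def __post_init__(self):
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")

trace = DecisionTrace(
    category="exception_logic",
    description="Rush orders from pro shops ship from the nearest warehouse",
    source_recording="rec_0042",
    systems=["Gmail", "NetSuite"],
)
print(trace.category)  # exception_logic
```

The point of the `source_recording` field is auditability: every extracted piece of tribal knowledge stays tied to the observation that produced it.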
We Build the Knowledge Structure
Every observation Gralio makes feeds into an interconnected knowledge graph: process diagrams linked to time logs, linked to decision traces, linked to app usage, linked to inefficiency patterns, linked back to the original recordings.
Every insight traces back to its source via artifact IDs. You can go from a process map to the specific recording where a team member demonstrated the exception handling logic. That's not a dashboard — it's an auditable record of how work actually gets done.
And critically, it's the record that makes AI agents buildable. The GT Golf agent doesn't just follow a static script. It applies context — past order behavior, customer matching logic, location selection patterns — because that context was captured, structured, and made available.
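The artifact-ID traceability described above can be sketched as a simple provenance walk: each node in the knowledge graph records which artifact it was derived from, so any insight can be followed back to a raw recording. The node names and link structure here are invented for illustration.

```python
# Hypothetical provenance graph: every derived artifact points at its
# source via an artifact ID. Names and IDs are illustrative only.
graph = {
    "proc_order_entry": {"derived_from": "trace_17"},   # process map node
    "trace_17":         {"derived_from": "rec_0042"},   # decision trace
    "rec_0042":         {"derived_from": None},         # original recording
}

def provenance(node: str) -> list[str]:
    """Follow derived_from links from an insight back to its raw source."""
    chain = [node]
    while (parent := graph[chain[-1]]["derived_from"]) is not None:
        chain.append(parent)
    return chain

print(provenance("proc_order_entry"))
# ['proc_order_entry', 'trace_17', 'rec_0042']
```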
From Observation to Automation: The Full Sequence
The context graph conversation is happening at an architectural level — orchestration layers, semantic models, graph databases. That's important work. But it assumes you already know which processes involve judgment, where the exceptions concentrate, and what tribal knowledge is at risk.
Most organizations don't know this. They think they know — but what they have is what people report in interviews and workshops, not what actually happens on screen.
The sequence that actually works:
- Step 1: Observe. Capture how work actually happens at the screen level. Identify processes, decision points, exceptions, and tribal knowledge. This is the precondition everything else depends on.
- Step 2: Document. Transform observations into structured process maps, SOPs, and decision traces. Build the knowledge graph.
- Step 3: Automate. Use the captured context to build AI agents that handle the routine while preserving the decision logic. At GT Golf, this step took weeks — not months — because the context was already captured.
- Step 4: Compound. The system gets smarter over time. New recordings refine existing process maps. Decision traces become searchable precedent. Every automated decision adds another trace to the graph.
At Sylvan, an a16z-backed MEP construction contractor, this sequence identified 70–85% automation potential across four finance workflows in two weeks. The first build — automated financial reporting — replaced 20–25 hours of monthly manual work. The key insight: the reporting logic was tribal knowledge held by one person. We extracted it from screen recordings and codified it into automation.
For a global payroll operation, we generated production-ready SOPs complete with annotated screenshots, decision points, and exception handling — all from screen recordings of specialists performing their normal work. Country-specific tax logic, reconciliation procedures, termination workflows — the kind of knowledge that typically takes months to transfer during onboarding, captured and documented in days.
Why This Matters Now
Gartner predicts 40% of enterprise applications will include AI agents by the end of 2026. But the agents that work — the ones that survive contact with real workflows — are the ones grounded in organizational context.
The EU AI Act makes decision logging legally binding for high-risk systems by August 2026. But even without regulation, the business case is straightforward: if you're deploying AI agents into real workflows, those agents need context. And the fastest way to get that context is to observe how work actually happens.
We've seen this firsthand. The GT Golf agent doesn't work because the model is smart. It works because the context is right — because we captured the decision logic, the matching patterns, the exception handling, and the behavioral precedents that make the difference between an agent that guesses and an agent that gets it right 82% of the time with zero human input.
That's what Gralio does. We don't just map processes — we capture the decision intelligence that makes AI transformation possible. See how it worked at GT Golf →