How we keep Notion, Google Docs, Google Sheets, and Linear data fresh inside Rocket — so the AI never reasons over stale context.
The Problem
Here's something that doesn't come up enough in AI product discussions: when a user connects their Notion workspace or a Google Sheet to your product, they assume the AI just... knows what's in it. Right now.
In Rocket, users attach external sources — Notion pages, Google Docs, Sheets, Linear boards — as context for Solve and Build. The AI reads these sources, reasons over them, and writes back: creating Linear issues from a build plan, populating a Sheet with structured output, pushing decisions into Notion.
That means the data has to be current. And keeping it current across four providers, each with its own APIs, auth model, and rate limits — that's the engineering challenge worth talking about.
Splitting the Data Model: Why One Pipeline Wasn't Enough
We tried a single sync pipeline first. It didn't survive contact with real usage patterns.
The issue: project-level context (a Notion spec pasted once, read across dozens of sessions) and session-level integrations (the AI actively creating Linear issues mid-conversation) have fundamentally different consistency requirements. Forcing both through the same sync mechanism meant either over-fetching static docs or under-serving live interactions.
So we split into two pipelines:
Project Context handles long-lived, read-heavy data. A user drops a URL, we ingest it, and it's available across all sessions in that project. Freshness tolerance is minutes to hours — the content changes infrequently, and the cost of a slightly stale product spec is low.
Thread Integrations handle live, read-write connections scoped to a single Build session. When the AI writes a Linear issue, it needs to read that issue back on the very next turn. This demands read-after-write consistency — the same guarantee you'd expect from a database, except the "database" is a third-party API with its own latency, rate limits, and eventual consistency behavior.
Both feed into a single context assembly layer that merges what the project "knows" (attached documents) with what the thread "can do" (live tool connections). The AI sees one unified context. The freshness contracts behind that context are entirely different.
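As a rough sketch of that assembly step — all names here are illustrative, not Rocket's actual types — the merge itself is deliberately simple, because each half arrives with its freshness already managed by its own pipeline:

```python
from dataclasses import dataclass, field

@dataclass
class AttachedDoc:
    """Project Context: long-lived, read-heavy, refreshed on its own schedule."""
    source: str   # e.g. "notion", "gdocs", "sheets"
    url: str
    content: str  # ingested text

@dataclass
class ThreadTool:
    """Thread Integration: a live read-write connection scoped to one session."""
    source: str
    actions: list[str]  # e.g. ["create_issue", "read_issue"]

@dataclass
class AssembledContext:
    documents: list[AttachedDoc] = field(default_factory=list)
    tools: list[ThreadTool] = field(default_factory=list)

def assemble_context(project_docs: list[AttachedDoc],
                     thread_tools: list[ThreadTool]) -> AssembledContext:
    """Merge what the project 'knows' with what the thread 'can do'.

    The AI sees one unified context; the differing freshness contracts
    behind each half are enforced upstream, not here.
    """
    return AssembledContext(documents=list(project_docs),
                            tools=list(thread_tools))
```

The point of the sketch: the assembly layer stays dumb on purpose, so neither pipeline's consistency rules leak into the other.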
This split was the single most important architectural decision we made. Everything else — sync strategy, freshness checks, re-fetching — follows from it.
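For the read-heavy side, "freshness checks" can be as small as a staleness threshold per source — the tolerance value below is an assumption for illustration, not our actual setting:

```python
import time

# Project Context tolerates minutes-to-hours of staleness; a slightly
# stale product spec is cheap, so re-fetch only past the threshold.
STALENESS_TOLERANCE_S = 15 * 60  # illustrative default, tuned per source

def needs_refetch(last_fetched_at, now=None,
                  tolerance_s=STALENESS_TOLERANCE_S):
    """Return True when an attached doc has aged past its tolerance."""
    now = time.time() if now is None else now
    return (now - last_fetched_at) > tolerance_s
```

The session-side pipeline gets no such threshold: writes always trigger an immediate re-fetch, as described below.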
Bidirectional Sync: Writing Back to Connected Sources
Reading external sources is the straightforward half. Writing back introduces harder consistency problems:
- Notion: Push decisions and generated documentation back as properly structured blocks
- Google Docs: Append generated specs or analysis to existing documents
- Google Sheets: Populate structured output into specific cell ranges
- Linear: Create issues from build plans — titles, descriptions, labels, assignments — directly in the user's sprint board
After every write-back, we immediately re-fetch the affected source so the AI's context reflects the latest state. The AI must see its own writes on the next turn. Without this, you get duplicate Linear issues, repeated Sheets data, confused conversations. This was a non-negotiable design constraint from day one.
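The write-then-re-fetch rule is mechanical enough to sketch. The provider client here is a stub standing in for a real third-party API; the shape, not the transport, is the point:

```python
class LinearStub:
    """Stand-in for a third-party API client (hypothetical, in-memory)."""
    def __init__(self):
        self._issues = {}
        self._next_id = 1

    def create_issue(self, title, description):
        issue_id = f"ROC-{self._next_id}"
        self._next_id += 1
        self._issues[issue_id] = {"title": title, "description": description}
        return issue_id

    def get_issue(self, issue_id):
        return self._issues[issue_id]

def write_back_and_refresh(client, context, title, description):
    """Create the issue, then immediately re-read it into context.

    Read-after-write: the next conversational turn must see this issue,
    otherwise the AI re-creates it and you get duplicates.
    """
    issue_id = client.create_issue(title, description)
    context[issue_id] = client.get_issue(issue_id)
    return issue_id
```

Against a real provider, the re-read also has to contend with that API's own eventual consistency and rate limits — which is exactly why the refresh is baked into the write path rather than left to a background sync.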
Auth Lifecycle: The Silent Reliability Problem
More user-visible failures come from expired OAuth tokens than from any actual bug in our sync logic. Each provider has its own token expiry behavior — some last hours, some much longer, and any of them can be revoked at any time.
When a sync fails due to an auth error, the system detects it and prompts the user to re-authorize inline — right where they are, without navigating away. Once re-authorized, the failed operation retries automatically. No manual reconnection steps, no "go to settings and fix it" friction.
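The control flow is a detect-prompt-retry loop. This sketch assumes the sync callable raises a dedicated error type on an expired or revoked token — the names are hypothetical:

```python
class AuthError(Exception):
    """Raised when a provider rejects an expired or revoked token (assumed)."""

def sync_with_reauth(sync_op, prompt_reauthorize):
    """Run a sync; on an auth failure, re-authorize inline and retry.

    The user never leaves their current view, and the failed operation
    retries automatically once the token is fresh.
    """
    try:
        return sync_op()
    except AuthError:
        prompt_reauthorize()  # inline prompt, no trip to settings
        return sync_op()      # automatic retry of the failed operation
```

A production version would cap retries and distinguish "user declined re-auth" from "token still bad", but the shape is the same: auth failure is a recoverable state, not a dead end.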
This is a small detail that disproportionately affects reliability. Without it, users hit a stale-context problem, don't understand why, and blame the AI.
What We Took Away
The data model split was everything. Read-heavy project context and read-write session integrations need different consistency guarantees. Trying to serve both from one pipeline is a false economy.
Users don't think about sync. Linking a Notion page means "the AI knows this." Every sync failure that surfaces as a confusing AI response is a design failure on our end.
Write-back demands database-grade consistency across API boundaries. Read-after-write consistency is table stakes in databases. Achieving it across third-party APIs with rate limits and variable latency required rethinking how we keep context current after every write.