AI · Delivery Intelligence

Predictive delivery intelligence for engineering leaders

Stop reading status. Start reading delivery.

Fig. A — Engineering panel — Execution Health decomposed into four dimensions
I

Context

Engineering leadership runs on a delayed feedback loop. By the time a sprint review reveals a slip, the slip is two weeks old and three downstream decisions have already been made on top of it. Every existing tool — Jira, Linear, ClickUp, GitHub Insights — describes the past. None of them tell a director what is about to break, or what to do about it. The result is a leadership layer that is permanently reactive to its own organisation.

II

Problem

The signals that predict delivery risk (PR queue ageing, review-time drift, mid-sprint scope creep, ownership of a file concentrated in one engineer, after-hours spikes) already exist in the raw data. They are spread across two systems, the code host and the task tracker, expressed in different vocabularies, and visible to nobody whose job it is to act on them. Worse, the people closest to the signal (engineers) are the least incentivised to surface it. So risk becomes visible only after it has already cost something.
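These signals can be computed from ordinary PR records. Below is a minimal sketch of two of them, PR queue ageing and review-time drift. It assumes each PR carries an open timestamp and a first-review timestamp. The field names, the 48-hour threshold and the 14-day comparison window are hypothetical choices, not PLUTO's definitions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import List, Optional


@dataclass
class PullRequest:
    # Hypothetical record shape for illustration.
    number: int
    opened_at: datetime
    first_review_at: Optional[datetime]  # None while the PR is still waiting


def wait_hours(pr: PullRequest, now: datetime) -> float:
    """Hours between opening and first review (or now, if still unreviewed)."""
    end = pr.first_review_at or now
    return (end - pr.opened_at).total_seconds() / 3600


def ageing_prs(prs: List[PullRequest], now: datetime, threshold_hours: float = 48) -> List[int]:
    """PR queue ageing: unreviewed PRs past an assumed threshold, oldest first."""
    stale = [p for p in prs if p.first_review_at is None and wait_hours(p, now) > threshold_hours]
    return [p.number for p in sorted(stale, key=lambda p: p.opened_at)]


def review_time_drift(prs: List[PullRequest], now: datetime, window_days: int = 14) -> Optional[float]:
    """Median wait of PRs opened in the latest window divided by the window before.

    A value above 1.0 means reviews are slowing down.
    """
    recent_start = now - timedelta(days=window_days)
    prior_start = recent_start - timedelta(days=window_days)
    recent = [wait_hours(p, now) for p in prs if p.opened_at >= recent_start]
    prior = [wait_hours(p, now) for p in prs if prior_start <= p.opened_at < recent_start]
    if not recent or not prior or median(prior) == 0:
        return None  # not enough history to compare
    return median(recent) / median(prior)


now = datetime(2024, 5, 31, 9, 0)


def make_pr(number: int, days_ago: int, review_after_hours: Optional[float] = None) -> PullRequest:
    # Example helper: builds a PR opened `days_ago` days before `now`.
    opened = now - timedelta(days=days_ago)
    reviewed = None if review_after_hours is None else opened + timedelta(hours=review_after_hours)
    return PullRequest(number, opened, reviewed)


prs = [make_pr(101, 20, 4), make_pr(102, 18, 6), make_pr(103, 5, 10), make_pr(104, 3)]
print(ageing_prs(prs, now))         # [104]
print(review_time_drift(prs, now))  # 8.2
```

The drift ratio compares medians rather than means so that a single abandoned PR does not dominate the signal. That choice is also an assumption.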

III

Approach

PLUTO ingests GitHub and ClickUp into a unified Engineering State Graph. Tasks, PRs, commits, people, sprints and repos are the nodes; assignments, reviews, dependencies and file ownership are the edges. A forecasting layer runs on top of the graph and produces a single Execution Health score (0–100) with four named dimensions, a sprint-completion probability, and a Bottlenecks ranking that puts the slowest PRs and longest review queues at the top of the leadership view. Alerts are severity-ranked and linked to their evidence, and the alerting model learns from the manual feedback captured on every dismissal.
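The case study does not say how the four dimensions combine into the 0–100 score. Below is one plausible sketch. It assumes each dimension is already normalised to 0–100 and that the composite is a weighted mean, with equal weights by default. It also reports the weakest dimension, so the score can answer "why is it red?". The identifiers and the weighting scheme are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# The four dimensions named in the case study; the keys are hypothetical identifiers.
DIMENSIONS = ("code_flow", "planning_stability", "review_efficiency", "work_distribution")


@dataclass(frozen=True)
class ExecutionHealth:
    overall: int                  # composite score, 0-100
    dimensions: Dict[str, float]  # each dimension clamped to 0-100
    weakest: str                  # lowest dimension: the answer to "why is it red?"


def execution_health(scores: Dict[str, float],
                     weights: Optional[Dict[str, float]] = None) -> ExecutionHealth:
    """Weighted mean of four 0-100 dimension scores (equal weights unless given)."""
    missing = set(DIMENSIONS) - set(scores)
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    weights = weights or {d: 1.0 for d in DIMENSIONS}  # assumed default: equal weighting
    clamped = {d: min(100.0, max(0.0, float(scores[d]))) for d in DIMENSIONS}
    total_weight = sum(weights[d] for d in DIMENSIONS)
    overall = sum(clamped[d] * weights[d] for d in DIMENSIONS) / total_weight
    weakest = min(DIMENSIONS, key=lambda d: clamped[d])
    return ExecutionHealth(round(overall), clamped, weakest)


health = execution_health({"code_flow": 80, "planning_stability": 70,
                           "review_efficiency": 44, "work_distribution": 90})
print(health.overall, health.weakest)  # 71 review_efficiency
```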

IV

System

  01

    GitHub + ClickUp connectors with webhook-first ingestion and a typed event store: idempotent, deduplicated, schema-versioned.

  02

    A normalised Engineering State Graph linking task → PR → commit → file → person → sprint. Every edge is queryable, and every node is updated incrementally.

  03

    Execution Health score (0–100) decomposed into Code Flow, Planning Stability, Review Efficiency, and Work Distribution, with drill-down by team, sprint, and repo.

  04

    Sprint Status view with planned-vs-completed work, top blockers, and a completion probability that updates as the sprint unfolds.

  05

    Bottlenecks panel (slowest PRs, longest reviewer queues, blocked tasks) ranked by time lost, not by chronology.

  06

    Structural Risk heatmap exposing ownership concentration and single-person failure points before they turn into incidents.

  07

    Alert lifecycle (create → acknowledge → resolve) with feedback that trains the model on what counts as a real risk in this organisation.
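"Ranked by time lost, not by chronology" needs a cost model, and the case study does not give one. The sketch below assumes a hypothetical cost: hours waited beyond a 24-hour budget, multiplied by one plus the number of downstream items the bottleneck blocks. It also totals the PR waiting time sitting in each reviewer's queue. The `blocks` field, the budget and the weighting are all illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass
class WaitingItem:
    kind: str               # "pr" or "task"
    ref: str                # display label, e.g. "PR #2" or a task id
    owner: str              # reviewer for PRs, assignee for tasks
    waiting_since: datetime
    blocks: int = 0         # downstream items stuck behind this one (hypothetical field)


def hours_waiting(item: WaitingItem, now: datetime) -> float:
    return (now - item.waiting_since).total_seconds() / 3600


def rank_bottlenecks(items: List[WaitingItem], now: datetime,
                     budget_hours: float = 24.0, top: int = 5) -> List[Tuple[str, float]]:
    """Rank by estimated time lost, largest first, rather than by age.

    Assumed cost model: hours waited beyond the budget, multiplied by
    (1 + number of items it blocks).
    """
    scored = []
    for item in items:
        excess = max(0.0, hours_waiting(item, now) - budget_hours)
        if excess > 0:
            scored.append((excess * (1 + item.blocks), item.ref))
    scored.sort(reverse=True)
    return [(ref, round(lost, 1)) for lost, ref in scored[:top]]


def reviewer_queues(items: List[WaitingItem], now: datetime) -> List[Tuple[str, float]]:
    """Total hours of PR waiting time in each reviewer's queue, longest first."""
    totals = defaultdict(float)
    for item in items:
        if item.kind == "pr":
            totals[item.owner] += hours_waiting(item, now)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


now = datetime(2024, 5, 31, 12, 0)
items = [
    WaitingItem("pr", "PR #1", "ana", now - timedelta(hours=30)),
    WaitingItem("pr", "PR #2", "ana", now - timedelta(hours=28), blocks=3),
    WaitingItem("task", "T-9", "ben", now - timedelta(hours=50)),
    WaitingItem("pr", "PR #3", "ben", now - timedelta(hours=10), blocks=5),
]
print(rank_bottlenecks(items, now))  # [('T-9', 26.0), ('PR #2', 16.0), ('PR #1', 6.0)]
print(reviewer_queues(items, now))   # [('ana', 58.0), ('ben', 10.0)]
```

PR #2 outranks the older PR #1 because three items are stuck behind it. This is the difference between ranking by time lost and ranking by age.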

Fig. — schematic

How the pieces fit.

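Schematic: GitHub (PRs · commits · files) and ClickUp (tasks · sprints) feed the state graph (task, PR, commit, people, sprint, repo nodes), which feeds the forecast panel (health 72, sprint p 0.84, review 44) and raises a flag 5 days early. Caption: unified state graph · forecast · alert before the slip.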
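The state graph in the schematic can be approximated with typed edges. Below is a minimal in-memory sketch. It assumes edges are stored as (source, relation, target) and that nodes are prefixed strings. The relation names (`implemented_by`, `contains`, `authored_by`, `touches`) and the file name are hypothetical. The query computes ownership concentration, meaning the share of a file's commits made by its top author. This is the kind of single-person risk the Structural Risk heatmap surfaces.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple


class StateGraph:
    """Tiny in-memory stand-in for an engineering state graph (illustrative only)."""

    def __init__(self) -> None:
        self.out_edges: Dict[str, List[Tuple[str, str]]] = defaultdict(list)  # node -> [(relation, target)]
        self.in_edges: Dict[str, List[Tuple[str, str]]] = defaultdict(list)   # node -> [(relation, source)]

    def add_edge(self, src: str, relation: str, dst: str) -> None:
        self.out_edges[src].append((relation, dst))
        self.in_edges[dst].append((relation, src))

    def neighbours(self, node: str, relation: str, incoming: bool = False) -> List[str]:
        edges = self.in_edges[node] if incoming else self.out_edges[node]
        return [other for rel, other in edges if rel == relation]


def ownership_concentration(g: StateGraph, file_node: str) -> Tuple[Optional[str], float]:
    """Top author of a file and the share of its commits they made."""
    commits = g.neighbours(file_node, "touches", incoming=True)  # commit -touches-> file
    authors = Counter(a for c in commits for a in g.neighbours(c, "authored_by"))
    if not authors:
        return None, 0.0
    person, count = authors.most_common(1)[0]
    return person, count / sum(authors.values())


g = StateGraph()
g.add_edge("task:T-1", "implemented_by", "pr:412")
g.add_edge("pr:412", "contains", "commit:a1")
for sha, author in [("a1", "ana"), ("b2", "ana"), ("c3", "ana"), ("d4", "ben")]:
    g.add_edge(f"commit:{sha}", "authored_by", f"person:{author}")
    g.add_edge(f"commit:{sha}", "touches", "file:billing.py")

print(g.neighbours("task:T-1", "implemented_by"))      # ['pr:412']
print(ownership_concentration(g, "file:billing.py"))  # ('person:ana', 0.75)
```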
Fig. — execution
VI

Architecture

L01 · Frontend
  • Next.js
  • TypeScript
  • Recharts
  • Five-panel dashboard
L02 · API
  • FastAPI
  • Pydantic schemas
  • Role-aware queries
  • Webhook ingress
L03 · Intelligence
  • Forecast models
  • Bayesian sprint probability
  • Bottleneck ranking
  • Burnout signal detection
L04 · Data
  • PostgreSQL
  • Event store
  • Engineering State Graph
  • Per-tenant partitioning
L05 · Ingestion
  • GitHub connector
  • ClickUp connector
  • Webhook + polling fallback
  • Celery workers
L06 · Infra
  • Docker
  • Kubernetes (scale path)
  • Slack + email alerting
Fig. — system stratification
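The Intelligence layer lists "Bayesian sprint probability" but does not name the model. The sketch below swaps in one standard Bayesian choice, a Gamma-Poisson model. It assumes tasks close as a Poisson process with an unknown daily rate and puts a Gamma prior on that rate, with a hypothetical shape of 2 and rate of 1. The prior is updated with the closures observed so far. The probability of finishing is then read off the negative-binomial posterior predictive. The function name and prior values are illustrative and do not describe PLUTO's actual model.

```python
import math


def sprint_completion_probability(done: int, days_elapsed: float,
                                  remaining_tasks: int, days_remaining: float,
                                  prior_shape: float = 2.0,
                                  prior_rate: float = 1.0) -> float:
    """P(at least `remaining_tasks` more tasks close within `days_remaining`).

    Assumed model (a sketch, not PLUTO's): tasks close as a Poisson process with an
    unknown daily rate lam ~ Gamma(prior_shape, prior_rate). Observing `done`
    closures over `days_elapsed` days gives the conjugate posterior
    Gamma(prior_shape + done, prior_rate + days_elapsed), and the number of future
    closures in `days_remaining` days is then negative-binomially distributed.
    """
    if remaining_tasks <= 0:
        return 1.0
    if days_remaining <= 0:
        return 0.0
    a = prior_shape + done            # posterior shape
    b = prior_rate + days_elapsed     # posterior rate
    t = days_remaining
    log_p = math.log(b / (b + t))
    log_q = math.log(t / (b + t))

    def log_pmf(n: int) -> float:
        # Negative-binomial pmf: Gamma(n+a) / (Gamma(a) n!) * p^a * q^n
        return math.lgamma(n + a) - math.lgamma(a) - math.lgamma(n + 1) + a * log_p + n * log_q

    below = sum(math.exp(log_pmf(n)) for n in range(remaining_tasks))
    return max(0.0, 1.0 - below)      # clamp tiny negative rounding error


# Mid-sprint: 10 tasks closed in the first 5 days, 5 days left.
p_on_track = sprint_completion_probability(done=10, days_elapsed=5, remaining_tasks=8, days_remaining=5)
p_stretched = sprint_completion_probability(done=10, days_elapsed=5, remaining_tasks=16, days_remaining=5)
print(f"{p_on_track:.2f} {p_stretched:.2f}")
```

As the sprint unfolds, `done` and `days_elapsed` grow, and the posterior narrows around the team's observed pace. That gives a probability which updates with the evidence, in the spirit of the Sprint Status view.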
VII

Outcome

Pilot orgs running PLUTO catch at-risk deliverables about 5 days before the deadline; previously, the baseline was discovering them on the day itself. Median PR review time fell by more than half once the Bottlenecks panel made the longest queues impossible to ignore. Leadership reviews moved from "what happened" to "what are we doing about the three flags on the board", a different conversation entirely.

M01 · Early-warning coverage: 90% (target)
M02 · Milestone date error: ±10%
M03 · Median PR review time: −60%
M04 · Sprint predictability: 80% of commitments met
VIII

Learnings

  • /01

    A single Execution Health score collapses too much. Splitting it into four named dimensions, so that the panel answers "why is it red?", was the difference between a number leaders ignored and a number they acted on.

  • /02

    Webhook-first ingestion is the only sane default. Polling is a fallback, never the primary, because the lag destroys the early-warning property.

  • /03

    Burnout signals are real but politically dangerous. We kept them as a private manager view and never wired them into team-facing dashboards.

  • /04

    The graph turned out to be more valuable than the dashboards. Half the wins came from one-off queries over the state graph that the prebuilt panels did not anticipate.
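The webhook-first, idempotent ingestion described above (System 01, Learning /02) can be sketched with GitHub's webhook headers. `X-Hub-Signature-256` carries an HMAC-SHA256 of the raw body, `X-GitHub-Delivery` is a GUID identifying the delivery, and `X-GitHub-Event` names the event type. In this sketch an in-memory set stands in for a unique constraint in PostgreSQL, and headers are a plain case-sensitive dict for simplicity. The class and function names and the 202/200/401 status choices are hypothetical.

```python
import hashlib
import hmac
import json


class EventStore:
    """In-memory stand-in for an append-only event store (illustrative only)."""

    def __init__(self) -> None:
        self.events = []
        self._seen = set()  # delivery ids already stored; a UNIQUE constraint in a real DB

    def append_once(self, delivery_id: str, event_type: str, payload: dict) -> bool:
        if delivery_id in self._seen:
            return False    # duplicate delivery: ignore it so ingestion stays idempotent
        self._seen.add(delivery_id)
        self.events.append({"id": delivery_id, "type": event_type, "payload": payload})
        return True


def handle_github_webhook(store: EventStore, secret: bytes, headers: dict, body: bytes) -> int:
    """Verify the HMAC signature, then store the event at most once."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, headers.get("X-Hub-Signature-256", "")):
        return 401          # signature mismatch: reject
    stored = store.append_once(headers["X-GitHub-Delivery"], headers["X-GitHub-Event"], json.loads(body))
    return 202 if stored else 200


secret = b"dev-secret"
body = json.dumps({"action": "opened", "number": 412}).encode()
headers = {
    "X-Hub-Signature-256": "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest(),
    "X-GitHub-Delivery": "d-1",
    "X-GitHub-Event": "pull_request",
}
store = EventStore()
first = handle_github_webhook(store, secret, headers, body)
second = handle_github_webhook(store, secret, headers, body)
print(first, second, len(store.events))  # 202 200 1
```

Deduplicating on the delivery ID rather than on payload contents keeps the check cheap. It also means a polling fallback would need its own stable event key.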


Stack
Python · FastAPI · PostgreSQL · Next.js · Celery · Docker