DORA Metrics Without the Spreadsheet: Setting Up DevLake in Your Platform

Every engineering leader says they care about DORA metrics. Fewer actually measure them. And the ones who do often measure them wrong — counting deploys from a spreadsheet, estimating lead time by feel, and declaring the team “elite” based on vibes.

DevLake fixes this. It is an open-source engineering analytics platform that pulls data from your real tools — GitHub, GitLab, Jira, Jenkins, ArgoCD, PagerDuty — correlates it, and produces accurate DORA dashboards without anyone having to manually track anything.

This post covers what DORA metrics actually mean, why most teams measure them badly, and how to get a working DevLake setup in an afternoon.

What DORA Metrics Are (And Aren’t)

The DORA research program identified four metrics that consistently predict software delivery performance and organisational health:

Metric	What it measures
Deployment Frequency	How often you ship to production
Lead Time for Changes	Time from commit to production
Change Failure Rate	Percentage of deployments that cause an incident
Mean Time to Recovery	How long to restore service after a failure

The benchmarks matter less than the direction, and the exact thresholds shift from one annual State of DevOps report to the next. Roughly, an elite team deploys multiple times per day with sub-hour lead time, under 5% change failure rate, and recovers in under an hour. A low performer ships monthly, takes weeks from commit to prod, breaks things 46% of the time, and takes days to recover.

What DORA metrics are not: a tool for ranking engineers. Lead time measures your deployment pipeline, not individual velocity. Change failure rate measures the maturity of your testing and release process, not developer carelessness. Measuring these at an individual level is both wrong and counterproductive.

Why Most Teams Measure Them Badly

The typical approach: ask someone to pull deployment data from CI, pull incident data from PagerDuty, and join them in a spreadsheet. This fails for several reasons:

Deployment data lives in three different systems and none of them agree on what counts as a “deployment”
Lead time requires linking a production deployment back to the commits it contains, which requires knowing what SHA was deployed and when
Change failure rate requires correlating deployments with incidents in the same time window, which requires both systems to have accurate timestamps
Nobody keeps the spreadsheet updated
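To make the lead-time problem concrete, here is a minimal Python sketch of the linking step. It assumes a strictly linear main branch, commits listed oldest first, and deployments that each record the SHA they shipped; every type, function, and SHA below is hypothetical. Each deployment is credited with all commits after the previously deployed SHA, up to and including its own.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Commit:
    sha: str
    committed_at: datetime


@dataclass
class Deployment:
    sha: str               # commit SHA running in production after this deploy
    finished_at: datetime


def lead_times(history: list[Commit], deployments: list[Deployment]) -> dict[str, timedelta]:
    """Map each shipped commit SHA to its commit-to-production lead time.

    Assumes `history` is the linear main branch, oldest first, that every
    deployed SHA appears in it, and that `deployments` are chronological.
    """
    position = {c.sha: i for i, c in enumerate(history)}
    result: dict[str, timedelta] = {}
    shipped_up_to = -1  # index of the newest commit already in production
    for dep in deployments:
        idx = position[dep.sha]
        # Commits after the previously shipped one, up to this SHA, go live now.
        for commit in history[shipped_up_to + 1 : idx + 1]:
            result[commit.sha] = dep.finished_at - commit.committed_at
        # max() keeps a rollback to an older SHA from re-crediting live commits.
        shipped_up_to = max(shipped_up_to, idx)
    return result
```

On a real repository with merge commits, `git rev-list <previous-sha>..<deployed-sha>` gives the set of newly shipped commits in place of the list slice.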

DevLake automates all of this by treating your tools as data sources, polling their APIs on a schedule, and building a unified data model that already encodes the relationships between them.

How DevLake Works

DevLake has three layers:

Plugins: connectors for each data source (GitHub, Jira, Jenkins, etc.) that pull data via API and store it in source-specific tables
Domain layer: a common data model that maps source-specific concepts (GitHub PR, Jira ticket, ArgoCD deployment) to generic ones (pull request, issue, deployment, incident)
Grafana dashboards: pre-built DORA dashboards that query the domain layer
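The domain-layer idea can be sketched in a few lines of Python: two tools describe the same thing (an incident) with different field names, and small mapping functions turn both into one tool-agnostic record that dashboards can query. The input shapes below are simplified and hypothetical, not the real PagerDuty or Opsgenie API payloads, and `Incident` is an illustration rather than DevLake's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Incident:
    """Tool-agnostic incident record, in the spirit of a domain layer."""
    source: str
    key: str
    opened_at: datetime
    resolved_at: Optional[datetime]  # None while the incident is still open


def _ts(value: Optional[str]) -> Optional[datetime]:
    # Parse an RFC 3339 timestamp; a trailing "Z" means UTC.
    return datetime.fromisoformat(value.replace("Z", "+00:00")) if value else None


# Hypothetical, simplified payload shapes (NOT the real vendor schemas).
def from_pagerduty_like(raw: dict) -> Incident:
    return Incident("pagerduty", raw["id"], _ts(raw["created_at"]), _ts(raw.get("resolved_at")))


def from_opsgenie_like(raw: dict) -> Incident:
    return Incident("opsgenie", raw["id"], _ts(raw["createdAt"]), _ts(raw.get("closedAt")))
```

Once both sources land in the same shape, a metric such as MTTR is one query over `Incident`, regardless of which tool paged someone.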

The result: you configure your data sources once, and the DORA dashboards populate automatically.

Setting Up DevLake

Prerequisites

Docker and Docker Compose
Access to your data sources (GitHub token, Jira credentials, etc.)
About 30 minutes

Step 1 — Run DevLake with Docker Compose

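# Fetch the Compose file and the example env file. -f makes curl exit with an
# error on HTTP failures instead of saving the error page; -L follows redirects.
# Tracking main suits a quick trial; the official install docs download both
# files from a tagged release, which is safer for anything longer-lived.
curl -fsSL https://raw.githubusercontent.com/apache/incubator-devlake/main/docker-compose.yml \
  -o docker-compose.yml
curl -fsSL https://raw.githubusercontent.com/apache/incubator-devlake/main/env.example \
  -o .env
docker compose up -d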

DevLake exposes three services:

localhost:4000 — the DevLake config UI
localhost:3002 — Grafana (dashboards)
The MySQL database (internal)

Step 2 — Connect Your GitHub Data Source

In the DevLake UI at localhost:4000, go to Connections → GitHub and create a connection with a personal access token that has repo scope.
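Before pasting the token into DevLake, it can be worth checking what it is actually allowed to do. For classic personal access tokens, the GitHub REST API reports granted scopes in the `X-OAuth-Scopes` response header; the Python sketch below reads it from `GET /user`. Fine-grained tokens do not report scopes this way, so the check only applies to classic tokens.

```python
import urllib.request


def parse_scopes(header_value: str) -> set[str]:
    """Split an X-OAuth-Scopes value such as "repo, read:org" into a set."""
    return {s.strip() for s in header_value.split(",") if s.strip()}


def token_scopes(token: str) -> set[str]:
    """Return the scopes GitHub reports for a classic personal access token.

    Makes one authenticated request; an empty set means no scopes were
    reported (for example, a fine-grained token).
    """
    req = urllib.request.Request(
        "https://api.github.com/user",
        headers={"Authorization": f"Bearer {token}", "Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(req) as resp:
        return parse_scopes(resp.headers.get("X-OAuth-Scopes", ""))
```

For a token created with `repo` scope, `"repo" in token_scopes(token)` should be `True`.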

Then create a project and add a scope — this is where you specify which repos to include. For a monorepo setup, add the single repo. For a multi-repo org, add each repo that counts as a deployable unit.

The GitHub plugin pulls:

Pull requests (for lead time calculation)
Commits (for linking PRs to deployments)
Releases (which DevLake can treat as deployment events)
GitHub Actions workflow runs (which can also be treated as deployments)

Step 3 — Define Deployments

This is the step most setups get wrong. DevLake needs to know what counts as a deployment. There are three options depending on your stack:

Option A: GitHub releases

If you cut a GitHub release on every production deploy, DevLake can use that. In the scope config, set the deployment pattern to match your release tag format:

^v\d+\.\d+\.\d+$
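Whether the pattern is anchored matters. If it is applied as an unanchored search (which is how Go's `regexp.MatchString` behaves, and DevLake's backend is written in Go), `v\d+\.\d+\.\d+` also matches pre-release and nightly tags. The Python sketch below compares the unanchored and anchored forms on made-up tag names; for ASCII tags like these, Python's `re` treats the pattern the same way Go's `regexp` does.

```python
import re

# Unanchored: matches anywhere inside the tag, so pre-releases slip through.
loose = re.compile(r"v\d+\.\d+\.\d+")
# Anchored at both ends: only plain vMAJOR.MINOR.PATCH tags.
strict = re.compile(r"^v\d+\.\d+\.\d+$")

tags = ["v1.4.0", "v1.4.1-rc.1", "nightly-v1.4.1", "v1.5.0"]

loose_hits = [t for t in tags if loose.search(t)]
strict_hits = [t for t in tags if strict.search(t)]
print(loose_hits)   # ['v1.4.0', 'v1.4.1-rc.1', 'nightly-v1.4.1', 'v1.5.0']
print(strict_hits)  # ['v1.4.0', 'v1.5.0']
```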

Option B: CI/CD pipeline runs

If you use GitHub Actions with a named deployment workflow, configure the plugin to treat runs of that workflow as deployments:

deploy-production
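A minimal sketch of that filter in Python, working on workflow-run objects as returned by GitHub's REST API for listing workflow runs (`name`, `conclusion`, `head_sha`, `updated_at`). Counting only `success` runs and using `updated_at` as the finish time are assumptions of this sketch, not documented DevLake behaviour; the sample data in any test is made up.

```python
from datetime import datetime

DEPLOY_WORKFLOW = "deploy-production"  # the workflow name used in the text


def deployments_from_runs(runs: list[dict]) -> list[tuple[str, datetime]]:
    """Turn GitHub Actions workflow runs into (sha, finished_at) events.

    Keeps only successful runs of the deploy workflow, oldest first.
    """
    events = []
    for run in runs:
        # conclusion is null (None) while a run is still in progress.
        if run["name"] != DEPLOY_WORKFLOW or run["conclusion"] != "success":
            continue
        finished = datetime.fromisoformat(run["updated_at"].replace("Z", "+00:00"))
        events.append((run["head_sha"], finished))
    return sorted(events, key=lambda e: e[1])
```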

Option C: ArgoCD (recommended for Kubernetes shops)

Connect the ArgoCD plugin. It understands sync operations natively and maps them to deployments in DevLake’s domain layer. This gives you the most accurate data because ArgoCD records exactly which revision was synced to the production cluster and when.

# DevLake ArgoCD connection config
endpoint: https://argocd.your-cluster.internal
token: <argocd-api-token>
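To see the raw material this relies on, you can read it straight from the cluster: each Argo CD `Application` keeps a `status.history` list whose entries record the synced `revision` and a `deployedAt` timestamp. The Python sketch below turns that list into deployment events; it is a spot-check aid, not how the DevLake plugin is implemented.

```python
from datetime import datetime


def syncs_from_application(app: dict) -> list[tuple[int, str, datetime]]:
    """Extract (history id, revision, deployed_at) from an Argo CD Application.

    `app` is the parsed JSON of one Application resource; a missing
    status or history yields an empty list.
    """
    events = []
    for entry in app.get("status", {}).get("history", []):
        deployed_at = datetime.fromisoformat(entry["deployedAt"].replace("Z", "+00:00"))
        events.append((entry["id"], entry["revision"], deployed_at))
    return sorted(events, key=lambda e: e[2])
```

Feed it the parsed output of `kubectl get application my-app -n argocd -o json` (the application name is hypothetical). The history is capped (see `spec.revisionHistoryLimit`), so it suits spot checks rather than replacing continuous collection.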

Step 4 — Connect Incident Data

Change failure rate and MTTR require incident data. Connect PagerDuty, OpsGenie, or your incident management tool.

For PagerDuty:

Connection type: PagerDuty
API token: <your-pagerduty-token>

DevLake will correlate incidents with deployments by timestamp: if an incident opens within a configurable window after a deployment, it counts as a change failure.

The default window is 24 hours. For high-frequency teams, tighten this to 1–4 hours. For weekly-deploy teams, 72 hours is more realistic.
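Here is a minimal Python sketch of the window rule as described above. Attributing each incident to the latest deployment before it, and counting a deployment as failed at most once, are assumptions of this sketch rather than confirmed DevLake internals.

```python
from datetime import datetime, timedelta


def failed_deployments(deploys: list[datetime], incidents: list[datetime],
                       window: timedelta) -> set[datetime]:
    """Return the deployments that count as change failures.

    Each incident is attributed to the latest deployment that finished at or
    before the moment it opened, but only if the gap is within `window`.
    """
    ordered = sorted(deploys)
    failed: set[datetime] = set()
    for opened in incidents:
        earlier = [d for d in ordered if d <= opened]
        if earlier and opened - earlier[-1] <= window:
            failed.add(earlier[-1])
    return failed


def change_failure_rate(deploys: list[datetime], incidents: list[datetime],
                        window: timedelta) -> float:
    # Failed deployments over all deployments; several incidents caused by one
    # deployment still count that deployment once.
    return len(failed_deployments(deploys, incidents, window)) / len(deploys)
```

Running it on the same data with a 4-hour and then a 72-hour window shows why the choice matters: the longer window pulls in incidents that opened well after the deploy, which raises the change failure rate.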

Step 5 — Configure the DORA Dashboard

Open Grafana at localhost:3002 (default credentials: admin/admin). The DORA dashboard is pre-installed under Dashboards → DevLake → DORA.

Set the time range to at least 90 days for meaningful baselines. The four panels map directly to the four metrics:

Deployment Frequency: deployments per day/week/month
Lead Time for Changes: median and 95th percentile, from first commit to deployment
Change Failure Rate: deployments that caused at least one incident / total deployments
MTTR: median incident duration for deployment-correlated incidents
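For a sanity check against the dashboard, the four panels can be recomputed by hand once deployments and incidents are correlated. The Python sketch below does that arithmetic; the function names and inputs are hypothetical, and it uses a nearest-rank percentile, which may not match the percentile method in the dashboard's SQL, so expect small differences at the 95th percentile.

```python
from math import ceil
from statistics import median


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile (p between 0 and 100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def dora_summary(deploy_count: int, days: int, lead_times_h: list[float],
                 failed_deploys: int, recoveries_h: list[float]) -> dict[str, float]:
    """Compute the four dashboard panels from already-correlated data.

    deploy_count:   production deployments in the period
    days:           length of the period in days
    lead_times_h:   first-commit-to-deployment durations, in hours
    failed_deploys: deployments linked to at least one incident
    recoveries_h:   durations of deployment-correlated incidents, in hours
    """
    return {
        "deploys_per_week": deploy_count * 7 / days,
        "lead_time_median_h": median(lead_times_h),
        "lead_time_p95_h": percentile(lead_times_h, 95),
        # Failed deployments, not the raw incident count.
        "change_failure_rate": failed_deploys / deploy_count,
        "mttr_median_h": median(recoveries_h),
    }
```

With 12 deployments over 28 days, lead times of 2, 4, 6 and 30 hours, 3 failed deployments and recoveries of 0.5, 1.5 and 4 hours, it reports 3 deploys per week, a 5-hour median lead time, a 30-hour p95, a 25% change failure rate and a 1.5-hour median recovery.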

Interpreting the Numbers

A few things to calibrate before drawing conclusions:

Lead time will be longer than you expect. DevLake measures from the first commit in a PR, not from when the PR was created. If engineers commit early and iterate, that time counts. This is correct — it reflects actual cycle time.
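A worked example with hypothetical timestamps for a single pull request shows how large the difference can be:

```python
from datetime import datetime

# Hypothetical timeline for one pull request.
first_commit = datetime(2024, 5, 6, 10, 0)  # Monday 10:00
pr_opened = datetime(2024, 5, 8, 15, 0)     # Wednesday 15:00
deployed = datetime(2024, 5, 9, 11, 0)      # Thursday 11:00

from_first_commit = deployed - first_commit
from_pr_opened = deployed - pr_opened

print(from_first_commit.total_seconds() / 3600)  # 73.0
print(from_pr_opened.total_seconds() / 3600)     # 20.0
```

Measured from PR creation this change looks like a 20-hour lead time; measured from the first commit it is 73 hours.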

Change failure rate is not just about bugs. A deployment that triggers an alert because of a config change you intended counts as a failure if it opens an incident. This is also correct — the system experienced degradation.

MTTR includes detection time. If your monitoring takes 20 minutes to page someone after a bad deploy, that 20 minutes is in your MTTR. Improving alerting latency improves MTTR without changing anything about how you respond.
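The same effect in numbers, using three hypothetical incidents where only the detection delay changes:

```python
from statistics import median

# Hypothetical incidents: (minutes until someone is paged, minutes from page to restore).
incidents = [(20, 25), (20, 40), (20, 70)]

mttr_now = median(detect + restore for detect, restore in incidents)
# Identical response work, but alerting now pages after 5 minutes instead of 20.
mttr_faster_alerts = median(5 + restore for _, restore in incidents)

print(mttr_now, mttr_faster_alerts)  # 60 45
```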

Running DevLake in Production

The Docker Compose setup is fine for evaluation. For a persistent installation:

Use an external MySQL or PostgreSQL instance for the database
Set up a Kubernetes deployment using the official Helm chart
Configure data collection to run on a schedule (DevLake supports cron expressions per pipeline)
Put Grafana behind your SSO if engineers will use it directly

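# Example Helm values (illustrative: key names differ between chart
# versions, so check them against the chart's own values.yaml)
devlake:
  database:
    # Keep real credentials in a Kubernetes Secret rather than in this file
    externalUrl: 'mysql://devlake:password@mysql.internal:3306/devlake'
  grafana:
    enabled: true
    ingress:
      enabled: true
      host: devlake.internal.your-company.com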

The Bottom Line

DORA metrics are only useful if they’re accurate, and they’re only accurate if they’re automated. Manual measurement introduces bias, gaps, and the temptation to game the numbers.

DevLake takes about half a day to set up properly. After that, the data is continuous, consistent, and requires no human effort to maintain. The hard part isn’t the tooling — it’s deciding what your deployment event actually is and making sure every production change goes through a path that DevLake can see.

Get that right, and you have an honest baseline. From a baseline, you can improve. From vibes, you can’t.