Cross-team prompt updates for agents

1-minute install
Install https://artanis.ai/gravel/llms.txt
$ npx @artanis-ai/gravel init
$ uvx artanis-gravel init
$ curl -fsSL https://raw.githubusercontent.com/artanis-ai/gravel/main/install.sh | sh && gravel init
Free forever · Apache 2.0 · Star on GitHub
your-app.com/admin/ai/prompts
Search prompts

triage.md

file

# Clinical triage assistant
You are a clinical triage assistant. Be concise. Prioritise urgent symptoms over speculative diagnosis. For patients over 65, lower the threshold for in-person care.

agent.ts

SYSTEM_PROMPT

embedded

You are a precise document summarization assistant. Your job is to read the document supplied by the user and produce a concise, faithful summary that captures the essential meaning without invention.

discharge.md

file

Produce a discharge summary from the patient note. Include diagnosis, medications, follow-up plan, and any red flags the receiving clinician needs.

intake.md

file

Walk the patient through intake questions. Ask one at a time and confirm before moving on. If they mention pain over 7/10, route to triage immediately.

triage.md file draft

Edits show as suggestions: insertions underlined, deletions struck through. +3 -2

You are a clinical triage assistant.
Be concise. Prioritise ~~urgent~~ critical symptoms.
Avoid speculative diagnosis.
For patients over ~~65~~ 70, lower the threshold
for recommending in-person care.
If red flags appear, route to in-person care immediately.
Always disclose that you are an AI.
Submit
Search prompts, responses, models…
When ▾ Name Model Env Tokens Duration Status Feedback
2m ago fetch:openai.chat.completions gpt-4o prod 784 / 226 3.09s ok ↑ 1
5m ago fetch:openai.chat.completions gpt-4o prod 206 / 26 2.08s ok
12m ago triage.run gpt-4o-mini prod 512 / 128 1.21s ok ↓ 1
14m ago anthropic.messages sonnet-4-6 prod 1,847 / 312 2.31s err
22m ago summarize.note gpt-4o-mini prod 718 / 96 1.18s ok ↑ 2

Showing 1–5 of 12

openai.chat.completions · OK · prod · gpt-4o
When: 2m ago
Duration: 1.21s
In: 512 tok
Out: 128 tok

Input

user

My 6yo has a 39C fever and a rash spreading on the chest. Should I bring her in?

Output

assistant

Yes, please bring her in today. A spreading rash with fever in a child is a red flag for systemic infection.

Feedback

Wrong tone, should be calmer

The problem

Engineers and domain experts can't coordinate on prompts

No single source of truth for prompts

Prompts should live in git, alongside the code that runs them. But git is for engineers, so prompts end up scattered across whatever tools felt accessible, and they inevitably drift out of sync.

GitHub

Google Docs

Notion

Strapi

LangChain

Observability tools are for engineers

Langfuse and friends weren't built for non-technical domain experts. Nested spans, template strings full of curly brackets, and raw JSON dumps don't make sense to a clinician or a paralegal who just needs to know whether the model got it right.

langfuse / traces
trace_a91f · run
trace_882c · chain
trace_4d3e · run
trace_77b1 · embed
trace_5acd · run
trace_e102 · chain
trace_22fa · run
trace_a91f · 1.84s · 5 spans
▾ chain.run
1.84s
▾ prompt.fmt
0.02s
▾ llm.call
1.30s
▾ retry
0.31s
▾ parse
0.14s
{
  "input": { "messages": [{ "role": "user", "content": "{{patient_note}}\n\n{{red_flags}}" }],
    "vars": { "patient_note": "...", "red_flags": ["fever", "stiff_neck"] } },
  "usage": { "in": 512, "out": 128 },
  "metadata": { "trace_id": "a91f-77ce", "env": "prod" }
}

Prompt iteration runs on Slack and spreadsheets

Without a way to evaluate their own edits, domain experts send docs and spreadsheets back and forth on Slack, full of contradictory prompts and conflicting feedback. The iteration loop is slow and endless, and you can never fully hand them ownership of the prompts.

# ai-prompts
SK

Sarah K 9:42 AM

attached new prompt for triage, please ship today 🙏

W

triage_prompt_v3_FINAL.docx

Word Document · 24 KB

👍2 🚀1
JM

James M 10:08 AM

wait, this contradicts what we said last week about over-65s 😕

SK

Sarah K 10:31 AM

use this one instead, ignore the other

W

triage_prompt_v3_FINAL_v2.docx

Word Document · 26 KB

RT

Rachel T 11:04 AM

are we using v3 or v3_v2? I'm reviewing the doc now

X

triage-feedback-Rachel.xlsx

Excel Spreadsheet · 18 KB

YA

You 11:47 AM

…still merging conflicts. Will push something tomorrow.

The solution

What you see vs what they see

Prompts

Gravel will find all the prompts in your codebase (yes, even variables) and serve them in a familiar Google Docs-like UI.

You see

Explorer
YOUR-APP
📁 src
📁 agents
📄 triage.ts
📄 summarizer.ts
📄 red_flag_detector.ts
📁 prompts
📝 intake.md
📝 discharge.md
📝 followup.md
📁 lib
📁 tests
triage.ts ×
// triage agent
import { openai } from './client'

const SYSTEM_PROMPT = `You are a clinical triage
assistant. Be concise. Prioritise
urgent symptoms. Avoid speculative
diagnosis. If red flags appear,
route to in-person care.`

export async function triage(msg: string) {
  return openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [
      { role: 'system', content: SYSTEM_PROMPT },
      { role: 'user', content: msg },
    ],
  })
}

They see

Prompts

14 prompts
Triage Assistant EDITED

You are a clinical triage assistant. Be concise. Prioritise urgent symptoms over speculation.

2 min ago · Alice

Discharge Summary

Produce a discharge summary. Include diagnosis, medications, and follow-up plan.

3 days ago

Red-Flag Detector

Identify red-flag symptoms in the patient note. Output JSON with severity scores.

1 week ago

Intake Questionnaire

Walk the patient through intake questions. Ask one at a time and confirm.

2 weeks ago

Reviews

Hook into your model calls with one line of code and serve a much nicer UI straight from your app.
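The hook itself is the wrap-once pattern: wrap the client in one place and every call is traced, with no changes at call sites. A minimal sketch of that pattern using a Proxy (`withTracing`, `Trace`, and the stand-in `rawClient` are illustrative, not Gravel's actual API):

```typescript
// Illustrative sketch only — Gravel's real wrapper and export names may
// differ. Wrapping once means every method call is recorded with timing
// and status, without touching the code that makes the calls.
type Trace = { name: string; ms: number; status: "ok" | "err" };
const traces: Trace[] = [];

function withTracing<T extends object>(client: T, name: string): T {
  return new Proxy(client, {
    get(target, prop, receiver) {
      const value = Reflect.get(target, prop, receiver);
      if (typeof value !== "function") return value;
      return async (...args: unknown[]) => {
        const start = Date.now();
        try {
          const out = await value.apply(target, args);
          traces.push({ name: `${name}.${String(prop)}`, ms: Date.now() - start, status: "ok" });
          return out;
        } catch (err) {
          traces.push({ name: `${name}.${String(prop)}`, ms: Date.now() - start, status: "err" });
          throw err;
        }
      };
    },
  });
}

// Wrap once (the "one line"); call exactly as before.
const rawClient = { complete: async (msg: string) => `summary of ${msg}` };
const client = withTracing(rawClient, "openai");
```

In the real library the traced calls would feed the review UI below rather than an in-memory array.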

You see

platform.openai.com / Logs
Last 24h ▾ All models ▾ Status ▾ 427 results
Time Request Latency Cost
14:23:11 gpt-4o-mini · chat 1.21s $0.002
14:23:29 claude-sonnet-4-6 2.84s $0.014
14:24:02 gpt-4o-mini · chat 1.04s $0.001
14:24:18 gpt-4o-mini · chat 0.92s $0.001
14:25:03 claude-sonnet-4-6 2.31s $0.011
14:25:41 gpt-4o-mini · chat 1.13s $0.002
14:26:08 claude-sonnet-4-6 2.52s $0.013
14:26:34 gpt-4o-mini · chat 1.18s $0.002

They see

Patient note · #4827

Pending
patient_intake_4827.pdf 1 / 1

Riverside Family Clinic

Patient Intake Form

Date  22 Apr 2026

Time  14:23

Patient  Child, age 6

MRN  A-4827

Caller  Parent · parent@example.com

Presenting concern

"6-year-old, fever of 39°C for two days. Stiff neck and unusually drowsy. Paracetamol brought it down briefly. No rash. Two episodes of vomiting overnight. Eating very little. Hasn't wanted to play. Should I be worried?"

Form INT-04 · v1.3

AI extraction

v1.2 · 0.84s
Severity High
Triage Cat. 2 · < 10 min

Red flags

fever 39°C · stiff neck · drowsy · vomiting

Possible

  1. Meningitis
  2. Severe gastroenteritis
Confidence 0.92

Edits

Your team submits draft changes to prompts, which turn into PRs in the background. They never need to know what a PR is.

You see

Improve triage prompt for elderly patients #234 Open

gravel-bot wants to merge 1 commit into main

Conversation
Commits
Checks 3
Files 1
src/agents/triage.ts +3 −1
@@ -3,6 +3,8 @@
 You are a clinical triage assistant.
 Be concise. Prioritise urgent symptoms.
-Avoid speculative diagnosis.
+Avoid speculative diagnosis.
+For patients over 65, lower the threshold
+for recommending in-person care.
 If red flags appear, route to in-person
 care. Always disclose that you are an AI.
 Never speculate beyond the chart.
All checks passed · Alice edited · 4m ago

They see

Triage Assistant

No contradictions with past feedback.
"red flags": ambiguous. Link to the canonical list.
412 chars · 78 words · 2 connected

Evals

When output is reviewed or prompts change, Gravel automatically checks against all past feedback and corrections. The golden set builds itself as your team works.

You see

Improve triage prompt for elderly patients

on: pull_request · #234 · main · 3m 14s · failed

Jobs
run-evals 3m 14s
Steps
Set up runner
Checkout
Setup Node
pnpm install
gravel evals
# Run gravel evals
$ npx gravel evals run --against=production-labels

   9 / 12 outputs pass
   3 outputs regressed against ground truth

  trace 2026-04-22 / Alice      FAIL
  trace 2026-04-30 / Mei        FAIL
  trace 2026-05-02 / Alice      FAIL

Error: regression budget exceeded (3 > 1)
       see https://your-app.com/admin/ai/evals/run/8f31

##[error]Process completed with exit code 1.

They see

Conflicting feedback

AI answer

"This could be dehydration or a urinary infection. Try increasing fluids and monitor for a few hours…"

A Alice · 2 weeks ago

Sudden confusion in elderly = always urgent. Don't suggest waiting.

contradicts
B Bob · 4 days ago

For mild fluid imbalance, monitoring at home for a couple of hours is fine.

Nothing leaves your infra

Gravel installs into your existing app. Your database, your auth, your domain.

Contact us for managed deployments.

Your database

Postgres or SQLite. Tables prefixed gravel_*.

Your auth

Pluggable getUser. Clerk, Auth0, or your own.
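As a sketch of what a pluggable adapter might look like (the exact `getUser` signature, role names, and the `sessions` stub are assumptions for illustration, not Gravel's documented API):

```typescript
// Illustrative shape only — the real config keys and role names may differ.
// getUser maps a request from your app's existing auth to a user and role.
type GravelUser = { id: string; name: string; role: "viewer" | "editor" };

// Stand-in for your session store (Clerk, Auth0, your own table…).
const sessions: Record<string, { userId: string; name: string; canEdit: boolean }> = {
  "tok-abc": { userId: "u1", name: "Alice", canEdit: true },
};

async function getUser(req: { headers: Record<string, string> }): Promise<GravelUser | null> {
  const token = req.headers["authorization"]?.replace("Bearer ", "");
  const session = token ? sessions[token] : undefined;
  if (!session) return null; // unauthenticated → no access to the admin UI
  return { id: session.userId, name: session.name, role: session.canEdit ? "editor" : "viewer" };
}
```

Swapping in Clerk or Auth0 would mean replacing the `sessions` lookup with their SDK's session verification; the returned role decides who can edit prompts versus only review output.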

Your region

Wherever your app runs. EU, US, on-prem. We don't move data.

Apache 2.0

Audit every line. Fork it. Self-host with no caveats.

FAQ

Is it really free?

Yes, the library is free and open source forever (Apache 2.0). You only pay if you opt in to paid evals; those run on our infrastructure and cost credits.

How is this different from Langfuse / LangSmith?

Those are built for engineers. Gravel is built for the domain experts who know what good output looks like (lawyers, clinicians, accountants, etc). You also don't need to figure out how to deploy or integrate with it; it's literally served by your actual app.

Who built this?

The founders of Artanis, to better support their customers: three AI PhDs whose recent research on LLM evals is now used at Google and Nvidia.

How do I get support?

We answer in three places: GitHub issues, team@artanis.ai, or our Slack. Feel free to hop on a call if you have any questions!

How secure is it?

As secure as your actual app, since it's just part of it. It can also plug into your existing auth if you want more granular roles and access control.

Does it need a database?

If you don't want to capture feedback on traces, no. If you do, Gravel detects and piggybacks on your existing DB, adding two tables prefixed gravel_*.

Will it slow my LLM calls down?

No. Tracing is asynchronous, so it adds no latency to your calls.

Do my domain experts need GitHub accounts?

No. If you enable prompt PRs, they come from gravel[bot]; your experts just click Submit.