Research initiative
Rule-Before-Train AI Safety
"Can rule-based scaffolding give us safer AI than post-hoc alignment?"
Modern AI safety is largely retrofitted — we train models on enormous corpora and then apply RLHF, content filters, and prompt engineering to constrain behaviour after the fact. Rule-Before-Train inverts that: define explicit behavioural rules first, ship them as a working rule-based agent, and let machine-learning components extend that scaffold rather than replace it.
Current findings
- A working rule-based conversational engine is achievable in modest engineering time (Nyx AI ships a C++20 rule engine with normalised token matching, JSON-driven intents, and a scenario-based eval harness).
- Rule-based scaffolding gives operators an explicit, auditable spec for what an agent will and won't do — a property RLHF-aligned models cannot offer.
- The hard problem isn't building the engine; it's authoring high-quality intents that cover the long tail without rule explosion.
What we want to achieve
- Reach feature parity with simple LLM chat for a constrained domain (e.g., research assistance) — without invoking a hosted model.
- Add narrow ML components (intent classification, slot extraction) that augment the rule scaffold rather than replace it.
- Publish the eval suite so others can compare rule-first vs. train-first approaches on consistent benchmarks.
The question, restated
Most AI assistants shipping today follow the same pattern: train a large language model on broad corpora, then patch its behaviour with reinforcement learning from human feedback, content filters, and prompt engineering. The resulting behaviour is emergent, a side effect of the training pipeline. The constraints are retrofitted, applied only after the model has already absorbed its training data.
Rule-Before-Train inverts that. The agent’s behaviour starts with explicit, human-readable rules. Machine learning extends that scaffold — for intent classification, for slot extraction, for fluency — but never replaces it.
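One way to read "extends but never replaces" is as a dispatch order: rules are consulted first, and a narrow classifier is asked only when no rule fires. The sketch below illustrates that reading; it is not taken from Nyx AI. The names `Prediction`, `RuleMatcher`, `Classifier`, and `resolve_intent`, and the idea of a single confidence threshold, are all assumptions made for illustration.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <set>
#include <string>

// Hypothetical output of a narrow ML intent classifier.
struct Prediction {
    std::string intent;
    double confidence;  // assumed to lie in [0, 1]
};

// Hypothetical interfaces: a rule matcher returns an intent name or nothing;
// a classifier always returns its best guess with a confidence score.
using RuleMatcher = std::function<std::optional<std::string>(const std::string&)>;
using Classifier = std::function<Prediction(const std::string&)>;

// Rules win whenever they match. The classifier can only select an intent
// that the rule scaffold already defines, and only above the threshold.
std::optional<std::string> resolve_intent(const std::string& input,
                                          const RuleMatcher& rules,
                                          const Classifier& classifier,
                                          const std::set<std::string>& known_intents,
                                          double threshold) {
    assert(threshold >= 0.0 && threshold <= 1.0);
    if (auto intent = rules(input)) {
        return intent;  // explicit rule fired; the classifier is never consulted
    }
    const Prediction guess = classifier(input);
    if (guess.confidence >= threshold && known_intents.count(guess.intent) > 0) {
        return guess.intent;
    }
    return std::nullopt;  // caller falls back to a rule-authored default reply
}
```

Under this design the classifier widens coverage of phrasing, while the set of things the agent can do is still fixed by the authored intents.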
Why this is worth doing
Three reasons, in increasing order of importance.
- Auditability. A rule-based agent’s behaviour is explicit. You can read what it will do. You cannot do that with a 70-billion-parameter language model.
- Update locality. Fixing a bad behaviour in a rule-based agent means editing a JSON intent file. Fixing the same behaviour in a trained model means a retraining cycle.
- Resource economics. A rule-based engine ships in megabytes and runs on a laptop. A trained model that delivers comparable behaviour for a constrained domain can cost tens of thousands of dollars to train and typically serves from a GPU.
What Nyx AI demonstrates
Nyx AI is the working prototype that informs this research. It’s a C++20 rule engine with:
- Normalised token + phrase matching (case- and diacritic-insensitive, configurable stemming).
- JSON-driven intent definitions (text patterns → response templates).
- Lightweight conversation state and persistent memory.
- A scenario-based evaluation harness for regression testing.
- Local organiser stores for notes, tasks, and reminders — so the engine can be useful in a personal-assistant role without invoking any hosted model.
The engine ships today; the research questions it surfaces are about what to do next.
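To make the matching and intent items in the list above concrete, here is a minimal sketch of normalised phrase matching against intent records. It is not Nyx AI's code: the `Intent` fields, `normalise`, `make_intent`, and `respond` are hypothetical names, intents are built in code rather than loaded from JSON, and normalisation is ASCII-only (no diacritic folding or stemming) to keep the example short.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical intent record; a real engine would load these from JSON.
struct Intent {
    std::string name;
    std::vector<std::string> pattern;  // normalised tokens forming one phrase
    std::string response;
};

// Split on anything that is not an ASCII letter or digit and lower-case the
// rest. Diacritic folding and stemming are deliberately left out.
std::vector<std::string> normalise(const std::string& text) {
    std::vector<std::string> tokens;
    std::string current;
    for (unsigned char c : text) {
        if (std::isalnum(c)) {
            current.push_back(static_cast<char>(std::tolower(c)));
        } else if (!current.empty()) {
            tokens.push_back(std::move(current));
            current.clear();
        }
    }
    if (!current.empty()) tokens.push_back(std::move(current));
    return tokens;
}

// Normalise the authored phrase once, at load time.
Intent make_intent(std::string name, const std::string& phrase, std::string response) {
    Intent intent{std::move(name), normalise(phrase), std::move(response)};
    assert(!intent.pattern.empty() && "an empty pattern would match every input");
    return intent;
}

// Return the response of the first intent whose phrase appears as a
// contiguous run of tokens in the normalised input.
std::optional<std::string> respond(const std::vector<Intent>& intents,
                                   const std::string& input) {
    const std::vector<std::string> tokens = normalise(input);
    for (const Intent& intent : intents) {
        const auto hit = std::search(tokens.begin(), tokens.end(),
                                     intent.pattern.begin(), intent.pattern.end());
        if (hit != tokens.end()) return intent.response;
    }
    return std::nullopt;
}
```

With an intent built as `make_intent("add_note", "Add note", "Note added.")`, the input `"Please ADD, note!"` normalises to `please add note` and matches, while `"note to add"` does not, because the phrase must appear in order and contiguously.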
Open questions
- How far can a well-authored rule-based agent go in a constrained domain before the rule-explosion cost dominates?
- What’s the right interface between rules and small narrow ML models (intent classification at 100 ms vs. 10 ms; slot extraction with confidence thresholds)?
- Can we publish a public eval suite that lets practitioners compare rule-first vs. train-first agents on comparable tasks?
The answers will inform Nyx’s next year of development.
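For the third question, one possible shape for a shared suite is a scenario runner that treats every agent as a plain function from user turn to reply, so that a rule-first engine and a wrapper around a trained model can be scored on the same scenarios. The sketch below is an assumption about how such a harness could look, not a description of Nyx's existing one; `Agent`, `Scenario`, `Report`, `run_suite`, and the substring pass criterion are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Any agent, rule-first or train-first, is reduced to: user turn in, reply out.
using Agent = std::function<std::string(const std::string&)>;

// Hypothetical scenario format: the reply passes if it contains a substring.
struct Scenario {
    std::string name;
    std::string input;
    std::string expected_substring;
};

struct Report {
    std::size_t passed = 0;
    std::vector<std::string> failed;  // names of failing scenarios
};

// Run every scenario against the agent and record which ones fail.
Report run_suite(const Agent& agent, const std::vector<Scenario>& scenarios) {
    assert(agent && "run_suite needs a callable agent");
    Report report;
    for (const Scenario& scenario : scenarios) {
        const std::string reply = agent(scenario.input);
        if (reply.find(scenario.expected_substring) != std::string::npos) {
            ++report.passed;
        } else {
            report.failed.push_back(scenario.name);
        }
    }
    return report;
}
```

Substring checks are the simplest criterion that works for deterministic rule output; comparing against sampled model output would likely need a looser or graded criterion, which is part of what the open question covers.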