Safe Deployment
Getting AI deployment wrong in a defence and infrastructure context has real commercial and reputational consequences. This final module gives you a complete, practical governance framework — how to choose your first pilot, how to stage the rollout, what data controls to put in place, how to manage your team through the change, and how to keep improving once agents are running.
The three deployment mistakes that sink AI pilots
Most AI pilots that fail don't fail because the technology doesn't work — they fail because of how they were deployed. The same three mistakes appear repeatedly across industries. Normoyle can avoid all of them.
Giving agents permission to send emails, update registers, or issue documents before their accuracy has been verified on real work. The agent makes one bad call — an incorrect RFI, a wrong cost — and trust collapses across the whole team. Fix: start with read and draft only. Earn permissions progressively.
Deploying an agent without designating a specific person who owns its output. When something goes wrong, everyone assumes someone else reviewed it. Fix: every agent output has a named human owner who reviews and signs. No exceptions.
Using a free consumer AI tool for confidential project data — client names, pricing, defence specifications — without understanding where that data goes. Fix: match the tool to the data sensitivity. Consumer tools for non-confidential work only. API with data agreement for everything else.
Choosing the right first pilot
The first agent you deploy at Normoyle will shape the team's attitude toward all future AI tools. Get it right and you build confidence. Get it wrong — by picking something too complex, too sensitive, or too hard to verify — and you set the programme back months.
Use this scorecard to evaluate any proposed pilot. The higher the score, the better the candidate:
| Criterion | Score 1 — Poor fit | Score 3 — Good fit | Estimating pilot score |
|---|---|---|---|
| Frequency — how often does this task occur? | Once or twice a year | Weekly or more | 3 — multiple RFQs per week |
| Verifiability — can you check the output against a known answer? | Subjective — hard to say if right or wrong | Objective — past results to compare against | 3 — dozens of past quotes to test against |
| Inputs — how structured are the inputs? | Ambiguous, conversational, highly variable | Structured documents with consistent format | 3 — PDFs and DXFs with defined content |
| Consequence of error — what happens if the agent gets it wrong? | Immediate external consequence — safety, legal, client impact | Internal draft — caught in review before any external impact | 3 — estimator reviews before quote leaves business |
| Data sensitivity — what data does the agent touch? | Classified, legally privileged, or highly confidential | Internal commercial data with standard controls | 2 — cost rates are sensitive but manageable |
| Team readiness — is the relevant team willing to try? | Strong resistance — people feel threatened | Curious and open — at least one champion | 2 — mixed initially, but one estimator keen to try |
The estimating pilot scores 16 out of 18 — an excellent first pilot. Compliance documentation scores 14–15. Start with estimating; move to compliance once the first agent is proven.
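The scorecard arithmetic can be checked with a few lines of Python. This is a minimal sketch: the dictionary keys and the `score_pilot` helper are illustrative names, not part of any Normoyle system, and the values are copied from the "Estimating pilot score" column above.

```python
# Scores copied from the "Estimating pilot score" column of the scorecard.
# Key names are hypothetical labels chosen for this sketch.
ESTIMATING_PILOT = {
    "frequency": 3,
    "verifiability": 3,
    "inputs": 3,
    "consequence_of_error": 3,
    "data_sensitivity": 2,
    "team_readiness": 2,
}

MAX_PER_CRITERION = 3


def score_pilot(scores: dict[str, int]) -> tuple[int, int]:
    """Return (total, maximum possible) for a scorecard of 1-3 ratings."""
    for name, value in scores.items():
        if not 1 <= value <= MAX_PER_CRITERION:
            raise ValueError(f"{name}: score must be 1-3, got {value}")
    return sum(scores.values()), MAX_PER_CRITERION * len(scores)


total, maximum = score_pilot(ESTIMATING_PILOT)
print(f"Estimating pilot: {total} out of {maximum}")  # Estimating pilot: 16 out of 18
```

The same function can be reused for any other proposed pilot by filling in a new dictionary with that pilot's six scores.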
The four-stage pilot framework
Every new agent at Normoyle follows four stages before full deployment. This isn't bureaucracy — it's how you build the evidence base that justifies expanding the agent's permissions and the team's trust in its output.
- Stage 1: Read only (weeks 1–2). The agent reads documents and reports what it finds. No drafting, no writing to registers, no output that anyone acts on. The purpose is purely diagnostic: does the agent correctly understand your drawings, your cost database format, your register structure? Run it against 5–10 past documents and compare its extraction to the known correct answers. Success metric: >90% accuracy on information extraction.
- Stage 2: Draft with 100% review (weeks 3–8). The agent produces drafts. A designated reviewer checks every single output before it's used for anything, even internal purposes. The reviewer logs: accepted as-is, accepted with minor edits, accepted with major edits, or rejected. This log is your evidence base. Success metric: >90% of outputs accepted with minor or no edits over at least 4 weeks.
- Stage 3: Draft with spot-check review (weeks 9–16). Review rate drops to 20–30% of outputs, selected randomly. The remainder gets a quick human scan, not a full check, before use. Only move to Stage 3 after Stage 2 success metrics are met. Continue logging every error found. Success metric: error rate below 5% on spot-checked outputs.
- Stage 4: Full deployment with audit trail. The agent runs as a standard business tool. Every output is logged with a timestamp, the prompt version used, and the reviewing person. Monthly audit: the agent owner reviews a random sample of 10 outputs for quality and consistency. Immediate rollback protocol if error rate rises above threshold. Success metric: sustained <5% error rate; no undetected errors reaching clients.
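The Stage 2 review log lends itself to a simple calculation. The sketch below assumes the reviewer's four verdicts are recorded as short strings (the exact labels are an assumption made for illustration) and checks the Stage 2 success metric of more than 90% accepted with minor or no edits over at least 4 weeks.

```python
from collections import Counter

# The four reviewer verdicts named in Stage 2; the string labels are hypothetical.
VERDICTS = ("as_is", "minor_edits", "major_edits", "rejected")
ACCEPTABLE = {"as_is", "minor_edits"}


def acceptance_rate(verdicts: list[str]) -> float:
    """Share of outputs accepted with minor or no edits."""
    unknown = set(verdicts) - set(VERDICTS)
    if unknown:
        raise ValueError(f"unknown verdicts: {sorted(unknown)}")
    if not verdicts:
        raise ValueError("no reviews logged yet")
    counts = Counter(verdicts)
    return sum(counts[v] for v in ACCEPTABLE) / len(verdicts)


def ready_for_stage_3(verdicts: list[str], weeks_logged: int) -> bool:
    """Stage 2 metric: >90% accepted with minor or no edits over at least 4 weeks."""
    return weeks_logged >= 4 and acceptance_rate(verdicts) > 0.90


# Illustrative log of 50 reviewed drafts (made-up numbers, not real results).
log = ["as_is"] * 30 + ["minor_edits"] * 16 + ["major_edits"] * 3 + ["rejected"]
print(f"Acceptance rate: {acceptance_rate(log):.0%}")
print("Ready for Stage 3:", ready_for_stage_3(log, weeks_logged=6))
```

Note that the comparison is strict: a log sitting at exactly 90% does not pass, which matches the ">90%" wording of the metric.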
Before any agent goes into Stage 3 or 4, define in writing: what error rate triggers a rollback to the previous stage? Who makes that call? What does rollback look like in practice? A rollback is not a failure — it's the system working. The failure is discovering errors after they've left the business.
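A written rollback rule is easier to apply consistently if it is expressed as a calculation. The following is a sketch only: the function names, the 25% default sampling rate, and the 5% default threshold are assumptions drawn from the Stage 3 description, and the real threshold is whatever Normoyle defines in writing.

```python
import random


def pick_spot_checks(output_ids, rate=0.25, seed=None):
    """Randomly choose which outputs get a full review (Stage 3: 20-30%)."""
    if not output_ids:
        return []
    k = max(1, round(len(output_ids) * rate))
    return random.Random(seed).sample(list(output_ids), k)


def should_roll_back(errors_found, outputs_checked, threshold=0.05):
    """True when the spot-check error rate exceeds the written threshold."""
    if outputs_checked == 0:
        return False  # nothing reviewed yet, so no evidence either way
    return errors_found / outputs_checked > threshold


quotes = [f"Q-{n:03d}" for n in range(1, 41)]  # hypothetical output IDs
checked = pick_spot_checks(quotes, rate=0.25, seed=7)
print(len(checked), "of", len(quotes), "selected for full review")
print("Roll back to Stage 2:", should_roll_back(errors_found=3, outputs_checked=len(checked)))
```

Passing a `seed` makes the selection repeatable, which can help if someone later asks how a given month's sample was drawn. Whether an error rate of exactly 5% should trigger a rollback is a decision for the written rule; this sketch treats only rates above the threshold as a trigger.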
Data security rules for Normoyle
Not all AI tools handle data the same way. The wrong tool for the wrong data is one of the most common and most serious mistakes in AI deployment. This table defines Normoyle's rules — it should be shared with every team member who uses any AI tool for work.
| Data type | Consumer tools (ChatGPT free, Claude.ai free) | Business API tools (Claude Team/Business, API) | On-premises only |
|---|---|---|---|
| General engineering knowledge (standards explanations, general drafting help) | PERMITTED | PERMITTED | Not required |
| Anonymised drawings (client name and project removed) | PERMITTED | PERMITTED | Not required |
| Internal cost rates and margins | NOT PERMITTED | PERMITTED — with data processing agreement | Optional additional protection |
| Client names and live project details | NOT PERMITTED | PERMITTED — with data processing agreement | Not required for standard projects |
| Supplier pricing from NDAs | NOT PERMITTED | PERMITTED — check NDA terms first | Preferred for high-sensitivity pricing |
| Mill certificates and traceability data | NOT PERMITTED | PERMITTED — internal tools only | Required for some defence programmes |
| Defence project specifications | NOT PERMITTED | CHECK CLASSIFICATION FIRST | REQUIRED if classified |
| Personnel and HR data | NOT PERMITTED | NOT PERMITTED | NOT IN SCOPE — ever |
Anthropic's Claude Team and Business plans include a data processing agreement (DPA): a contractual commitment that your data won't be used to train their models and will be handled according to defined security standards. This is what makes them appropriate for confidential commercial data. The free consumer tiers of Claude.ai and ChatGPT do not include a DPA. If you're unsure which tier Normoyle is on, check with management before uploading any confidential data.
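The data rules table can also be expressed as a lookup that a team member or an internal tool could consult before uploading a document. This is a hedged sketch: the key names are invented for illustration, only the consumer and business API columns are encoded, and the default-deny behaviour for unlisted data types is an assumption rather than a stated Normoyle rule.

```python
# Rules copied from the table: (consumer tools, business API tools).
# Key names are hypothetical; the on-premises column is omitted for brevity.
RULES = {
    "general_engineering_knowledge": ("PERMITTED", "PERMITTED"),
    "anonymised_drawings": ("PERMITTED", "PERMITTED"),
    "internal_cost_rates": ("NOT PERMITTED", "PERMITTED with data processing agreement"),
    "client_names_live_projects": ("NOT PERMITTED", "PERMITTED with data processing agreement"),
    "supplier_pricing_nda": ("NOT PERMITTED", "PERMITTED, check NDA terms first"),
    "mill_certificates": ("NOT PERMITTED", "PERMITTED, internal tools only"),
    "defence_specifications": ("NOT PERMITTED", "CHECK CLASSIFICATION FIRST"),
    "personnel_hr": ("NOT PERMITTED", "NOT PERMITTED"),
}
TOOLS = ("consumer", "business_api")


def rule_for(data_type: str, tool: str) -> str:
    """Look up the rule for one data type and one tool category."""
    if tool not in TOOLS:
        raise ValueError(f"unknown tool category: {tool}")
    row = RULES.get(data_type)
    if row is None:
        return "NOT PERMITTED"  # assumption: anything unlisted is denied by default
    return row[TOOLS.index(tool)]


def check_upload(data_types: list[str], tool: str) -> tuple[bool, list[str]]:
    """Return (allowed, conditions) for a document touching several data types."""
    rules = [rule_for(d, tool) for d in data_types]
    allowed = all(r.startswith("PERMITTED") for r in rules)
    conditions = [r for r in rules if r != "PERMITTED"]
    return allowed, conditions


print(check_upload(["anonymised_drawings", "internal_cost_rates"], "business_api"))
print(check_upload(["defence_specifications"], "business_api"))
```

In this sketch "CHECK CLASSIFICATION FIRST" is treated as blocked until a person has resolved the classification question, which errs on the cautious side.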
Human-in-the-loop: the review gates
Every agent workflow at Normoyle must have defined review gates — specific points where a human checks the output before it proceeds. These gates are not optional and don't get removed as agents improve. They are the mechanism by which Normoyle maintains professional accountability for every document it produces.
GATE 1: Internal quality check
- Trigger: Agent produces any output (quote, NCR, RFI, report)
- Who: Designated reviewer for that agent type. Estimating: lead estimator. Compliance: project engineer. Delivery docs: PM or senior project engineer.
- Check: Does the output make sense? Are flagged items genuine issues? Are numbers, references, and dates correct?
- Time: 5–20 min depending on output complexity
- Action: Accept / edit / reject and regenerate
GATE 2: Before any external communication
- Trigger: Any document or email leaving Normoyle
- Who: Project engineer or PM (never delegated to junior)
- Check: Full content review, professional judgement applied. Tone appropriate for recipient and relationship? No admissions, commitments, or liability statements? Drawing references correct revision? Programme impact statements accurate?
- Time: Same as you would spend writing it from scratch. "The agent did it" is not a reason to review faster.
- Action: Approve and send / edit and send / do not send
GATE 3: Before expanding agent permissions
- Trigger: Request to give agent new capability (e.g. send emails, update live register, new data type)
- Who: Business owner + whoever manages IT/security
- Check: Stage 2 accuracy metrics met? Is the new permission genuinely necessary? What could go wrong? What is the rollback? Has the system prompt been updated and retested?
- Time: Formal meeting, minimum 30 min. Document the decision and the conditions.
GATE 4: Monthly performance audit
- Trigger: Monthly (calendar reminder; don't skip)
- Who: Named agent owner for each deployed agent
- Check: Random sample of 10 outputs from the past month. Any errors that weren't caught in review? Any pattern of near-misses or recurring issues? Has the system prompt drifted from the approved version? Any new risk factors (new project types, new clients)?
- Time: 1–2 hours per agent per month
- Output: One-page audit note filed in the quality system
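One way to make the gate triggers unambiguous is to write them down as data. The sketch below is illustrative only: the `Gate` class and `required_gates` function are hypothetical names, and Gate 4 is left out of the routing because it runs on a calendar rather than in response to an event.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    number: int
    name: str
    reviewer: str


# Names and reviewers copied from the gate definitions above.
GATES = {
    1: Gate(1, "Internal quality check", "Designated reviewer for that agent type"),
    2: Gate(2, "Before any external communication", "Project engineer or PM"),
    3: Gate(3, "Before expanding agent permissions", "Business owner + whoever manages IT/security"),
    4: Gate(4, "Monthly performance audit", "Named agent owner"),
}


def required_gates(is_agent_output: bool, leaves_business: bool, expands_permissions: bool) -> list[Gate]:
    """Return the event-driven gates that apply; Gate 4 is scheduled monthly instead."""
    gates = []
    if is_agent_output:
        gates.append(GATES[1])
    if leaves_business:
        gates.append(GATES[2])
    if expands_permissions:
        gates.append(GATES[3])
    return gates


# An agent-drafted RFI that will be emailed to a client passes Gates 1 and 2.
for gate in required_gates(is_agent_output=True, leaves_business=True, expands_permissions=False):
    print(f"Gate {gate.number}: {gate.name} ({gate.reviewer})")
```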
Managing the team through the change
Technology adoption fails most often not because the technology doesn't work — but because the people using it don't trust it, don't understand it, or feel threatened by it. At Normoyle, the change management approach is as important as the technical deployment.
The two fears — and how to answer them honestly
When you introduce AI agents at Normoyle, two concerns will come up. Don't dismiss them — they're legitimate. Address them directly:
"Will this replace my job?" The honest answer: For the tasks Normoyle is automating (data extraction, template filling, document drafting), the agent handles the mechanical work. The skilled work (the judgement calls, the client relationships, the engineering decisions) stays with people. What changes is that skilled people spend more time on skilled work. That's the point.
"What if the agent gets it wrong?" The honest answer: It will get things wrong, especially early. That's exactly why every output goes through a review gate before it's used. The agent's errors get caught. The review process is the safety net, and it's non-negotiable regardless of how good the agent gets.
What to say to each part of the team
To estimators: The estimating agent handles the part of quoting that takes hours but doesn't need your expertise: reading drawings, building the BOM, looking up rates. Your time moves to reviewing the flagged items, applying your experience to the hard calls, and focusing on the bids most worth winning. The agent makes you more productive at the work only you can do. Your sign-off on every quote means your professional judgement is still the last word.
To project engineers and PMs: You remain the professional owner of every document that leaves this business. The agent drafts; you decide, edit, and sign. Your NER registration, your professional responsibility, and your relationship with the client are unchanged. What changes is that you're not spending three hours writing an RFI from a blank template. You spend 10 minutes reviewing a well-structured draft and applying your judgement to the parts that need it.
To the workshop and site team: The agents operate in the office, on drawings, documents, and procurement data. The fabrication work, the welding, the installation: that's all unchanged. What you might notice is that the office team has more time to resolve issues quickly, chase deliveries more proactively, and get you the information you need on site faster.
Normoyle's recommended rollout plan
This is a concrete 12-month plan. Adjust timing based on actual Stage 2 results — don't advance before the success metrics are met, but also don't stay in earlier stages longer than necessary once they are.
| Period | Agent | Stage | Success metric to advance |
|---|---|---|---|
| Months 1–2 | Estimating agent | Stage 1–2: Read only, then 100% draft review | >90% BOM accuracy on test quotes; >90% drafts accepted with minor edits |
| Months 3–4 | Estimating agent | Stage 3: Spot-check review on standard job types | <5% error rate on spot-checked outputs; no errors reaching clients |
| Month 3 | Compliance agent | Stage 1–2: Read only on a current project (non-classified) | Findings match manual review; no missed non-conformances in testing |
| Months 5–6 | Compliance agent | Stage 2–3: Draft NCRs and checklists with full review | >90% of drafted NCRs accepted with minor edits |
| Month 5 | RFI drafting agent | Stage 1–2: Start on one active project with one PM champion | PM reports time saving; no RFIs sent with errors |
| Months 7–9 | Procurement agent | Stage 1–2: Read PO register, daily status report, draft emails | PM uses report daily; draft emails require minimal editing |
| Months 10–12 | All agents | Stage 3–4: Full deployment on standard work types | Monthly audit shows sustained accuracy; team uses agents without friction |
Keeping agents consistent: prompt version control
As agents are used in production, their system prompts will be edited — rules added, constraints clarified, output formats refined. Without version control, a single bad edit can silently degrade an agent that was working well, and you won't know until errors appear in output.
- Keep a dated prompt log for every agent. A simple text file or shared document: date, version number, what changed, and why. Before editing any system prompt, copy the current version to the log. This takes 2 minutes and means you can always roll back.
- Test before deploying any prompt change. Run the updated prompt against at least 3 past test cases before using it on live work. A prompt change that fixes one problem sometimes breaks another. Never deploy an untested prompt on a live project document.
- One person owns each agent's prompt. The "agent owner" is the only person who edits the system prompt. Others can request changes, but the owner reviews, tests, and deploys. Without clear ownership, prompts get edited by multiple people and drift into inconsistency.
- Document the approved prompt version in your QMS. For compliance and defence work especially: the version of the system prompt used to produce a document should be recorded alongside the document itself. This is your audit trail if a client or auditor questions how a document was produced.
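The prompt log described above can be as simple as a shared document, but the same rules can be sketched in code. The example below is an illustration under stated assumptions: the `PromptLog` and `PromptVersion` classes, the owner name, and the sample prompt text are all hypothetical, and the short SHA-256 fingerprint is just one possible way to detect drift between the logged prompt and the one in use.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import date


def fingerprint(text: str) -> str:
    """Short SHA-256 digest used to compare prompt texts."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


@dataclass(frozen=True)
class PromptVersion:
    version: int
    changed_on: date
    what_changed: str
    why: str
    text: str


@dataclass
class PromptLog:
    agent: str
    owner: str
    versions: list = field(default_factory=list)

    def record(self, editor, text, what_changed, why, changed_on):
        """Append a new version; only the agent owner may edit the prompt."""
        if editor != self.owner:
            raise PermissionError(f"{editor} is not the owner of the {self.agent} prompt")
        entry = PromptVersion(len(self.versions) + 1, changed_on, what_changed, why, text)
        self.versions.append(entry)
        return entry

    def current(self):
        """Latest logged version, i.e. the approved prompt."""
        return self.versions[-1]

    def has_drifted(self, live_text):
        """True if the prompt in use differs from the latest logged version."""
        return fingerprint(live_text) != fingerprint(self.current().text)


log = PromptLog(agent="estimating", owner="agent_owner")  # hypothetical names
log.record("agent_owner", "Extract the BOM from the drawing.", "Initial version", "Pilot start", date(2025, 1, 6))
print("Current version:", log.current().version)
print("Drifted:", log.has_drifted("Extract the BOM from the drawing. Also email the client."))
```

Because earlier versions are never overwritten, rolling back amounts to recording the older text again as a new entry, which keeps the history intact for audit.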
Measuring what matters
AI deployment at Normoyle should show measurable results. Track these metrics from week one so you have evidence of impact — both for internal confidence and to justify continued investment.
Track estimator time on each quote before and after the agent. Target: 70% reduction in data-extraction and drafting time. Measure from drawing receipt to draft-ready-for-review.
Percentage of agent drafts accepted with minor or no edits. Target: >90% in Stage 2 before advancing. Track by agent type — estimating, compliance, RFI — separately.
Errors that got through review and reached a client or external document. Target: zero. Any escape is a serious event requiring immediate prompt review and process debrief.
Number of POs past due date at any given time. The procurement agent should drive this down by catching at-risk items earlier. Track weekly — the trend matters as much as the number.
Time from NCR raised to NCR closed. The compliance agent speeds up the raising and documentation — but close-out still requires engineering resolution. Track to ensure agents aren't creating a backlog.
How many team members are actively using each agent tool after 3 months? Voluntary adoption is the real measure of success. If people aren't using it without being told to, something is wrong — with the tool, the training, or the change management.
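Several of these metrics reduce to short calculations once the raw data is being logged. The sketch below uses made-up numbers purely to show the arithmetic; the function names and the tuple layouts for NCRs and POs are assumptions, not a description of Normoyle's registers.

```python
from datetime import date
from statistics import mean


def time_reduction(before_minutes, after_minutes):
    """Fractional reduction in mean time per quote (0.70 means 70% less time)."""
    before, after = mean(before_minutes), mean(after_minutes)
    return (before - after) / before


def ncr_cycle_days(ncrs):
    """Mean days from raised to closed, ignoring NCRs that are still open."""
    closed = [(closed_on - raised_on).days for raised_on, closed_on in ncrs if closed_on is not None]
    return mean(closed) if closed else None


def overdue_pos(pos, today):
    """Count POs past their due date that have not yet been received."""
    return sum(1 for due, received in pos if not received and due < today)


# Illustrative data only, not real results.
print(f"Time reduction: {time_reduction([200, 220, 180], [60, 66, 54]):.0%}")
print("NCR cycle (days):", ncr_cycle_days([
    (date(2025, 3, 1), date(2025, 3, 11)),
    (date(2025, 3, 5), date(2025, 3, 9)),
    (date(2025, 3, 20), None),  # still open, excluded from the mean
]))
print("Overdue POs:", overdue_pos(
    [(date(2025, 3, 1), False), (date(2025, 3, 1), True), (date(2025, 4, 1), False)],
    today=date(2025, 3, 15),
))
```

Tracking these weekly and per agent type, as suggested above, gives a trend line rather than a single number.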
Hands-on exercise: your deployment plan
This is the capstone exercise for the course. Each person or small group produces a one-page deployment plan for the estimating agent pilot. Bring it to the final session for group review.
- Name your agent owner. Who at Normoyle will own the estimating agent? They write and maintain the system prompt, run the monthly audit, and are the escalation point if something goes wrong. Write down the name.
- Define your Stage 1 test set. List 5 past quotes you'll use to test the agent before it touches live work. Choose a range: two simple jobs, two medium complexity, one complex. Write down the job reference and what you'll measure.
- Set your Stage 2 success threshold. What acceptance rate do you need to see before moving to spot-check review? (Recommended: 90%.) What does "accepted with minor edits" mean in your context? Write it down so there's no ambiguity.
- Define your rollback trigger. At what error rate will you roll back from Stage 3 to Stage 2? Who makes that call? How quickly? Write it down.
- Identify your data controls. Which Claude plan will Normoyle use for the estimating agent? Is a data processing agreement in place? Which data types will the agent touch, and are they all permitted under the rules in this module?
- Write three sentences for your team. Using the change management guidance above, write what you'll say to: (a) the estimator who'll use the agent, (b) the PM who signs off quotes, and (c) any team member who asks "will this replace my job?"