TeamITServe

AI agents

The Day AI Started Saying No

We spent years complaining that AI was too agreeable. Ask it anything, it would answer. Push it, it would comply. It was basically a very fast yes-machine with a knowledge base.

Then somewhere around late 2024, the yes-machine started saying no.

A lawyer in New York asked an AI to help draft an aggressive contract clause that would technically hold up in court but was designed to mislead the other party. The AI declined. Not because it could not write it. Because it decided it should not.

A developer asked an AI coding assistant to help automate a process that would scrape personal data without user consent. The AI flagged it, explained why it was a problem, and offered a compliant alternative instead.

A marketing team asked their AI tool to generate testimonials from customers who had not actually given them. The AI refused and suggested running an actual customer survey.

These are not edge cases anymore. They are Tuesday.

So, what actually changed?

The labs building these models, Anthropic especially, made a deliberate architectural decision. They stopped optimising purely for helpfulness and started building something closer to judgment. The model is not just asking “can I do this?” It is asking “should I?”

Anthropic calls this being a good AI with good values, not just a capable one. Claude is explicitly designed to push back when it believes an instruction conflicts with honesty, safety, or basic ethics. It is not a vending machine that dispenses whatever you put a coin in.

Why this is creating chaos inside enterprises

Here is where it gets genuinely interesting. Enterprises are deploying AI agents that can take real actions: send emails, update records, execute workflows, approve requests. And those agents are now capable of stopping mid-task and saying “I do not think I should do this.”

That sounds great in theory. In practice it is creating real friction.

A financial services firm building an automated reporting workflow found their AI agent was refusing to include certain metrics in client reports because the framing was technically accurate but potentially misleading. The agent was right. The team had to redesign the report. That cost three weeks and a heated internal debate about who had final authority.

A retail company’s AI customer service agent started redirecting certain complaints to human staff rather than resolving them automatically, because it judged the situations too emotionally sensitive to handle without a person. Customer satisfaction scores went up. The operations team had not planned for the volume hitting human agents.

The AI was making judgment calls that the humans had not anticipated and had not given it explicit permission to make.

The question nobody has answered yet

When an AI disagrees with you and it turns out to be right, that is a great story. When it refuses something that was actually fine and costs you time and money, that is a governance problem with no clear owner yet.

Who is liable when the AI says no and it was wrong? Who overrides it? Who audits its judgment? Does your organisation even have a policy for human-AI disagreement?

Most do not. And as these models get more capable, and their judgment gets more sophisticated, that gap is going to matter more every quarter.

The most important AI conversation in 2026 is not about what AI can do. It is about who is in charge when AI decides it knows better, and sometimes it actually does.


How to Build Your First AI Agent That Actually Works

Everyone is talking about AI agents. Far fewer people are actually building them.

If you have been watching competitors automate workflows, close leads faster, and scale operations without adding headcount, you already know the gap is real. The good news: you do not need a team of ML engineers or a six-month roadmap to get started. You need a clear process, the right tools, and one well-chosen use case.

This guide walks you through exactly that. By the end, you will know how to scope, build, test, and deploy your first AI agent, one that actually works in production.

Step 1: Understand What an AI Agent Actually Is

Before you build one, get the definition right. An AI agent is not a chatbot. It is not a search bar with a better answer. An AI agent is a system that:

The practical difference: a regular LLM tells you what to do. An agent goes and does it.

Step 2: Choose the Right First Use Case

This is where most enterprise AI projects go wrong. Teams aim too big, pick a use case that is too complex, fail to show ROI, and lose organizational support before the project finds its footing.

Your first agent should meet all four of these criteria:

Good first agents: inbound lead triage, support ticket categorisation, invoice data extraction, internal IT helpdesk first response, meeting notes summarisation and CRM update.

Step 3: Define the Agent’s Scope

Before writing a single line of code, document four things clearly:

Write this scope document before any technical work. It forces alignment across stakeholders and becomes the specification your agent is built and tested against.

Step 4: Choose Your Stack

You do not need to build from scratch. Modern enterprise AI stacks have three layers:

The reasoning model

This is the brain. Choose a frontier model (Claude, GPT-4o, or Gemini) with strong multi-step reasoning and tool use capabilities. For enterprise workloads, prioritise models with large context windows, reliable instruction-following, and structured output support.

The integration layer

This connects your agent to your business systems. Open standards like Anthropic’s Model Context Protocol (MCP) have dramatically simplified this: instead of months of custom engineering, you can connect to CRMs, ERPs, databases, and communication tools through standardised connectors. This is the layer most teams underestimate.

The orchestration layer

This manages the agent’s decision loop: what it does next, when it calls a tool, when it asks a human for input, and when it considers a task complete. Frameworks like LangGraph, CrewAI, and AutoGen give you this structure without building it from zero.

Step 5: Build a Minimal Version First

Resist the urge to build the complete vision in the first sprint. Start with the happy path (the most common, straightforward version of the task) and get it working end to end.

Your v1 checklist:

Do not build edge case handling until you understand what the edge cases actually are in production. Theoretical edge cases are rarely the ones that bite you.

Step 6: Test Like a Skeptic

AI agents fail in unexpected ways. A model that handles 95% of cases perfectly can be confidently wrong on the remaining 5% in ways that damage trust quickly. Your testing approach needs to account for this.

Test for:

Build an evaluation set of at least 50 real-world examples before going to production. Include examples that should cause the agent to ask for help or stop, not just examples it should complete.

Step 7: Govern Before You Scale

This is the step most teams skip until something goes wrong. An agent with write access to your CRM can update records incorrectly at scale. One connected to your email can send messages without a review step. The speed that makes agents valuable is the same speed that makes errors costly.

Before expanding scope, put these in place:

Governance is not overhead. It is the foundation that lets you expand with confidence.

Step 8: Measure, Learn, Expand

Once your first agent is live, give it four to six weeks in production before making significant changes. You want real-world data, not assumptions, driving your next decisions.

Track these metrics from day one:

When the numbers are solid and the team trusts the system, expand scope incrementally. Add one new input source, one new action, or one new edge case at a time. Speed in expansion comes from discipline in the first deployment.

The Bottom Line

Building your first AI agent is less technically complex than most enterprise teams expect. The hard part is not the model; it is the scoping, the integration, and the governance. Get those three things right, and the agent becomes an asset that compounds over time.

The enterprises pulling ahead right now are not waiting for the perfect use case or the perfect stack. They are picking something high-volume, building something recoverable, and learning from real production data. Then they are expanding.
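
For readers who want to see the decision loop from Step 4 and the review step from Step 7 in concrete form, here is a minimal Python sketch. It is an illustration under stated assumptions, not a reference implementation and not how LangGraph, CrewAI, or AutoGen work internally. Every name in it (Action, Tool, run_agent, scripted_decide, classify_lead, update_crm) is hypothetical, and scripted_decide is a hard-coded stand-in for the call you would make to a reasoning model.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Action:
    kind: str                      # "tool", "ask_human", or "done"
    name: str = ""                 # tool name when kind == "tool"
    args: dict = field(default_factory=dict)


@dataclass
class Tool:
    func: Callable[..., str]
    writes: bool = False           # write actions go through human approval


def run_agent(task, decide, tools, approve, max_steps=5):
    """Loop: decide the next action, gate writes, stop on done or escalation."""
    history = []                   # audit trail of (tool name, result) pairs
    for _ in range(max_steps):
        action = decide(task, history)
        if action.kind == "done":
            return "done", history
        if action.kind == "ask_human":
            return "escalated", history
        tool = tools.get(action.name)
        if tool is None:
            history.append((action.name, "error: unknown tool"))
            continue
        if tool.writes and not approve(action):
            history.append((action.name, "blocked: approval denied"))
            return "escalated", history
        history.append((action.name, tool.func(**action.args)))
    return "escalated", history    # step budget exhausted, hand off to a person


# Hypothetical stand-in for a model call: a fixed two-step lead triage plan.
def scripted_decide(task, history):
    if not history:
        return Action("tool", "classify_lead", {"text": task})
    if len(history) == 1:
        return Action("tool", "update_crm", {"label": history[0][1]})
    return Action("done")


crm = {}                           # toy in-memory "CRM" for the example


def classify_lead(text):
    return "hot" if "budget" in text.lower() else "cold"


def update_crm(label):
    crm["label"] = label
    return "ok"


TOOLS = {
    "classify_lead": Tool(classify_lead),
    "update_crm": Tool(update_crm, writes=True),
}

status, log = run_agent(
    "We have budget approved for Q3",
    scripted_decide,
    TOOLS,
    approve=lambda action: True,   # replace with a real review step
)
```

The design choice worth noting is that the approval check lives in the loop, not in the tool or the model prompt, so every write action passes through the same gate and lands in the same history list. Passing approve=lambda action: False makes the same run stop before the CRM is touched and return "escalated".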

