Festi Market

Teaching AI not to hallucinate

Scaling B2B product content with AI while eliminating hallucinations through iterative prompt engineering. A case study in teaching language models not to invent plausible-sounding lies.

100 products · 8 prompt iterations · ~5% error rate after 8 iterations

The final prompt

We’re adapting products for a Shopify site. Style: stay professional—this is B2B. What readers want is reliable information in a professional tone. Before creating any output, use web_search to verify every festi-market.com URL with site:festi-market.com searches. Never guess or assume a URL exists. If it doesn’t exist, remove the entire related product mention, not just the link.

You can suggest better titles, but don’t remove any information. Adapt English text to French. Add nothing, invent nothing. Never create logo URLs: find real ones or omit them. If you can’t access a product page, notify me instead of inventing content. Don’t add labels like “optimized description” or “(bullet points)”; they’re not in the template. No emojis. This is B2B.

The template structure is invariant: H1 title, H2 subtitle, brand with verified URL only, description, main features (H3), key benefits (H3), related products with verified links (H3), financing section (H3), technical specifications table (H3) with power/voltage at top, dimensions/weight at bottom, “1 year parts warranty” always last.

Before finalizing, verify no invented links exist, meta tags have no labels, and all links have been tested via web_search—finding them in results isn’t enough. Meta elements: first line is meta-title, second line is meta-description, third line is URL anchor. No other text, no labels like “Meta-title:”.
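
Much of that closing checklist is mechanical, which is what made it reliable. As an illustration only, here is a minimal Python sketch of the same checks; the project enforced them through the prompt alone, and these function names are hypothetical:

```python
def validate_meta_block(meta: str) -> list[str]:
    """Check the three-line meta block the prompt requires:
    meta-title, then meta-description, then URL anchor, no labels."""
    errors = []
    lines = [l.strip() for l in meta.strip().splitlines() if l.strip()]
    if len(lines) != 3:
        errors.append(f"expected 3 meta lines, got {len(lines)}")
    forbidden = ("meta-title:", "meta-description:", "url:")
    for line in lines:
        if line.lower().startswith(forbidden):
            errors.append(f"label leaked into meta block: {line!r}")
    return errors

def validate_spec_rows(rows: list[tuple[str, str]]) -> list[str]:
    """Power/voltage at the top, dimensions/weight at the bottom,
    '1 year parts warranty' always the last row."""
    errors = []
    if rows and "power" not in rows[0][0].lower():
        errors.append("power/voltage row is not first")
    if rows and rows[-1][1] != "1 year parts warranty":
        errors.append("warranty row is not last")
    return errors
```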

What we did

Learning from failures, not avoiding them

Product one looked perfect until we spotted the logo URL. The AI had invented logo-neumarker-professional.png—following the site’s naming conventions so perfectly it looked real. Product two had emojis scattered throughout B2B documentation. Product seven invented collection links that pointed nowhere. Product twelve shortened technical titles, removing the exact specifications buyers search for.

We didn’t try to anticipate every failure upfront. Each mistake became a new rule. “Never invent logo URLs” got added after product one. “Preserve complete titles” came from product twelve. By product fifty, we had eight pages of defense mechanisms built from actual failures, not hypothetical problems. The prompt evolved into a forensic record of how the AI actually misbehaved.

Building verification into generation

The breakthrough was making the AI verify before claiming anything as true. Before any URL went into a product page, it had to search for it. Before listing specs, it had to fetch the manufacturer’s page. This wasn’t “generate then verify”—it was “verify while generating.”

We implemented mandatory web_search for every collection link. The AI couldn’t claim /collections/professional-griddles existed without searching site:festi-market.com first. For manufacturer data, web_fetch retrieved actual pages, constraining the AI to documented information rather than plausible guesses. The tradeoff was speed—each product took longer. But we eliminated thirty minutes of post-generation fact-checking per product.
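
The gate itself is simple to sketch. Assuming a web_search callable that returns result URLs (a stand-in for the real search tool), the rule reduces to: no URL enters the page unless a site:festi-market.com query returns that exact path.

```python
from urllib.parse import urlparse

SITE = "festi-market.com"

def verified_url(url: str, web_search) -> str | None:
    """Admit a URL only after a site-restricted search
    confirms the exact page exists."""
    path = urlparse(url).path
    results = web_search(f"site:{SITE} {path}")
    # Exact path match: appearing near a result isn't verification.
    if any(urlparse(r).path == path for r in results):
        return url
    return None

def related_products(candidates, web_search):
    """Drop the entire product mention when its link fails
    verification, not just the link."""
    kept = []
    for name, url in candidates:
        good = verified_url(url, web_search)
        if good is not None:
            kept.append((name, good))
    return kept
```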

Treating templates as code

The client’s HTML template was precise. Table heights: exactly 19.5938px per row. Specs order: power first, warranty last. Financing section: word-for-word, no paraphrasing. Early on, the AI would “improve” things—adjust heights for better spacing, reorder specs by importance, rephrase copy for clarity. Each change broke something.

We learned to treat the template like code, not guidelines. Not “follow the general structure” but “this is the exact output.” That rigidity seemed excessive—specifications like “height must equal 19.5938px times row count” felt pedantic. Until all 100 products uploaded to Shopify without a single formatting issue. Perfect adherence beat approximate matching every time.
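
The height rule, for instance, is a formula, not a suggestion. A minimal sketch of what “template as code” means in practice:

```python
ROW_HEIGHT_PX = 19.5938  # exact value from the client's template

def expected_table_height(row_count: int) -> str:
    # Not "roughly 20px per row": the template is the exact output.
    return f"{ROW_HEIGHT_PX * row_count:.4f}px"

# A 12-row spec table must come out at exactly this height:
assert expected_table_height(12) == "235.1256px"
```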

Compound learning through context

All 100 products were generated in a single Claude Project. Each correction—“don’t invent URLs,” “verify links,” “preserve titles”—propagated forward. Pattern recognition emerged across products. When the AI consistently struggled with items lacking certain specs, we added explicit handling: “If the manufacturer page doesn’t list dimensions, omit that table row rather than guessing.”
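
That rule is easy to picture as code. A hypothetical sketch, with illustrative spec names; the real handling lived in the prompt, not in a script:

```python
SPEC_ORDER = ["power", "voltage", "capacity", "dimensions", "weight"]

def spec_rows(fetched: dict[str, str]) -> list[tuple[str, str]]:
    """Build table rows only from specs the manufacturer page
    actually documents; a missing spec means no row, never a guess."""
    rows = [(key, fetched[key]) for key in SPEC_ORDER if key in fetched]
    rows.append(("warranty", "1 year parts warranty"))  # always last
    return rows
```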

Product fifty was generated with rules learned from products one through forty-nine. Product one hundred incorporated every correction made throughout the project. This compound learning made later products cleaner than earlier ones. The minimal client corrections validated that accumulated rules had captured actual requirements, not just our assumptions about what might go wrong.