UALR Chatbot: Why data engineering and careful curation are the most boring, and most successful, levers

The Dispatch

Marla Johnson recruited me to Tech Launch in the summer of 2025 to fix their retrieval-augmented chatbot. My mission: diagnose the issues, implement solutions, and get accuracy soaring past 80%, give or take.

Garbage In, Garbage Out—Shocking, I Know

First incision: the data. An audit revealed that the ingestion pipeline was indiscriminately harvesting undergraduate-facing pages, introducing lexical priors misaligned with graduate students' information needs. I built a domain-specific scraper that enforced graduate-level URL patterns and applied regex-based content filters, then re-indexed the corpus, cutting irrelevant tokens by 50%.
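The scraper itself isn't shown here, so what follows is a minimal Python sketch of the two filtering steps, assuming pages arrive as a mapping from URL to extracted text. The URL patterns, the noise-line regex, the `example.edu` addresses, and every function name (`is_graduate_url`, `clean_page`, `filter_corpus`, `token_count`) are hypothetical stand-ins, not the production rules. Whitespace splitting stands in for a real tokenizer when measuring the token reduction.

```python
import re

# Hypothetical URL rules: illustrative assumptions, not the real scraper's patterns.
GRAD_URL_PATTERNS = [
    re.compile(r"/graduate(-school)?/", re.IGNORECASE),
    re.compile(r"/grad/", re.IGNORECASE),
]
UNDERGRAD_URL_PATTERN = re.compile(r"/(undergraduate|freshman|first-year)/", re.IGNORECASE)

# Hypothetical content filter: whole lines of navigation chrome or undergrad boilerplate.
NOISE_LINE_PATTERN = re.compile(
    r"^(skip to (main )?content|back to top|undergraduate admissions.*)$",
    re.IGNORECASE,
)


def is_graduate_url(url: str) -> bool:
    """Keep a URL only if it matches no undergraduate rule and at least one graduate rule."""
    if UNDERGRAD_URL_PATTERN.search(url):
        return False
    return any(p.search(url) for p in GRAD_URL_PATTERNS)


def clean_page(text: str) -> str:
    """Drop blank lines and lines that match the noise pattern."""
    kept = [
        line.strip()
        for line in text.splitlines()
        if line.strip() and not NOISE_LINE_PATTERN.match(line.strip())
    ]
    return "\n".join(kept)


def token_count(text: str) -> int:
    """Rough proxy: whitespace tokens (a real pipeline would use the model's tokenizer)."""
    return len(text.split())


def filter_corpus(pages: dict[str, str]) -> dict[str, str]:
    """Return only graduate-facing pages, each with noise lines stripped."""
    return {url: clean_page(body) for url, body in pages.items() if is_graduate_url(url)}
```

Given `{"https://example.edu/graduate/admissions/": "Skip to content\nApplication deadlines for master's programs", "https://example.edu/undergraduate/admissions/": "Freshman orientation dates"}`, `filter_corpus` keeps only the first page and strips its navigation line, dropping the whitespace-token count from 11 to 5. Exclusion rules are checked before inclusion rules, so a URL that matches both is rejected.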

Chunking

Next pathology: chunk salad. The existing segmentation strategy produced atomic passages stripped of discourse context, which dragged down retrieval recall. We implemented a semantically coherent chunking policy (merging adjacent paragraphs, preserving section headings, and injecting manually curated metadata) that improved answer-level BLEU by 18% and raised user-reported satisfaction from 2.1 to 4.6 out of 5.
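A minimal Python sketch of this kind of chunking policy follows, assuming the cleaned page text uses Markdown-style `#` headings, each in its own blank-line-separated paragraph, and that the curated metadata arrives as a plain dict per page. The 200-word budget, the `Chunk` class, and `chunk_document` are hypothetical choices for illustration; the actual parameters aren't given here.

```python
import re
from dataclasses import dataclass, field

# Assumption: headings are Markdown-style lines such as "## Funding".
HEADING = re.compile(r"^#{1,6}\s+(.*)$")


@dataclass
class Chunk:
    heading: str
    text: str
    metadata: dict[str, str] = field(default_factory=dict)


def chunk_document(doc: str, metadata: dict[str, str], max_words: int = 200) -> list[Chunk]:
    """Merge adjacent paragraphs under the same heading until max_words would be exceeded."""
    chunks: list[Chunk] = []
    heading = ""
    buffer: list[str] = []

    def flush() -> None:
        if buffer:
            body = "\n\n".join(buffer)
            # Prefix the heading so each chunk carries its section context into the index.
            text = f"{heading}\n\n{body}" if heading else body
            # Copy the curated metadata so chunks don't share one mutable dict.
            chunks.append(Chunk(heading, text, dict(metadata)))
            buffer.clear()

    for para in (p.strip() for p in doc.split("\n\n")):
        if not para:
            continue
        match = HEADING.match(para)
        if match:
            flush()  # never merge paragraphs across a section boundary
            heading = match.group(1)
            continue
        words_so_far = sum(len(p.split()) for p in buffer)
        if buffer and words_so_far + len(para.split()) > max_words:
            flush()
        buffer.append(para)
    flush()
    return chunks
```

For `"# Assistantships\n\nFirst paragraph.\n\nSecond paragraph.\n\n# Deadlines\n\nThird paragraph."`, this yields two chunks: the first merges both Assistantships paragraphs under their heading, and the second holds the Deadlines paragraph. Each carries its own copy of the page's metadata.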

Diagnosed data ingestion pipeline; evicted undergrad noise.
Built grad-student-only scraper—think VIP list, but for knowledge.
Redesigned chunking & annotation schema; context no longer MIA.
Shipped a bot that finally answers questions without sounding like it skipped med school.