Gen-AI Notice Relevance Model
TL;DR
I led the product effort to design, deliver, and operate a Gen-AI notice-relevancy system that automatically detects and hides irrelevant web notices in a high-volume daily feed. The system combines a high-precision spam classifier, an allowlist/blocklist onboarding process, an LLM judge for second-pass verification, UX filtering controls, and operational automations. By default it hides roughly 30-40% of irrelevant notices while preserving auditability and giving researchers easy discoverability and undo controls.
The Problem
We monitor ~50k public sources daily to find tax-relevant updates. The crawl + diff pipeline produces many notices every day — a large portion of that volume is noise (site chrome changes, navigation updates, weather text, date tweaks). Manually triaging this noise consumed valuable researcher time and delayed identification of true regulatory changes.
Objective
Remove noise from researcher work queues while preserving visibility and auditability. Enable safe automation with clear controls and easy recovery.
My Remit & Constraints
As Senior Product Manager, I was responsible for delivering a production-grade relevancy feature that:
- Ships quickly with phased rollouts and measurable acceptance criteria
- Scales across thousands of sources with a clear human→AI feedback loop
- Operates safely in regulated workflows (no silent deletion of content)
- Reduces triage load while keeping rare relevant items recoverable
Approach & Execution
Re-scope and Align Stakeholders
Convened a cross-functional cadence with research leads, data scientists, engineering, ops, and QA to document core problems, define acceptance criteria, and agree on operational policies (hide vs. retain vs. auto-complete). Captured representative noise patterns as a prioritized testbed.
Built a Hybrid Relevance Pipeline
Spam Classifier
An LLM-prompted classifier tuned to label notices as 100% irrelevant with ~99.9% precision. Production runs on GPT-4o-mini; GPT-4.1-mini showed ~15-20% better recall in tests.
Content-Type & Tag Refinement
AI-suggested tags validated by researchers to improve classification precision and reduce label drift.
LLM Judge
A second LLM pass that verifies 100%-irrelevant labels and surfaces uncertain cases to researchers.
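The classifier-plus-judge pattern can be sketched as below. This is a minimal illustration, not the production implementation: `score_notice`, the stub models, the noise markers, and the demote-to-80 behavior are all hypothetical stand-ins for the real LLM calls and thresholds.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Notice:
    notice_id: str
    text: str
    irrelevancy_score: int = 0   # 0-100; 100 = fully irrelevant
    hidden: bool = False

def score_notice(notice: Notice,
                 classifier: Callable[[str], int],
                 judge: Callable[[str], bool]) -> Notice:
    """Two-stage scoring: the classifier assigns an irrelevancy score;
    a second 'judge' pass must confirm before a 100% label sticks."""
    score = classifier(notice.text)
    if score == 100 and not judge(notice.text):
        # Judge disagrees: demote so the notice stays visible to researchers.
        score = 80
    notice.irrelevancy_score = score
    notice.hidden = (score == 100)
    return notice

# Stub models standing in for the real LLM calls (hypothetical behavior).
NOISE_MARKERS = ("weather", "copyright year", "nav menu")

def stub_classifier(text: str) -> int:
    return 100 if any(m in text.lower() for m in NOISE_MARKERS) else 20

def stub_judge(text: str) -> bool:
    return "weather" in text.lower()  # only confirms obvious weather noise

n = score_notice(Notice("n1", "Weather widget updated"), stub_classifier, stub_judge)
print(n.irrelevancy_score, n.hidden)  # → 100 True
```

The point of the sketch is the control flow: a 100% label is only actioned when both passes agree, otherwise the item falls back into the visible range.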
Designed UX & Researcher Controls
Led design for a conservative, transparent UX: 100%-irrelevant notices hidden by default with a toast explanation, a double range slider for score-visibility control, and filters to reveal or include all items while keeping hidden items discoverable for audit.
Operational Guardrails & Automation
Balanced automation and safety: auto-complete for unassigned 100% irrelevant notices after 2 days, 30-day retention for hidden items, and phased onboarding where new sources are blocklisted until validated.
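The two time-based guardrails reduce to a small policy function. A sketch under assumptions: the notice record shape, field names, and the `apply_guardrails` helper are illustrative; only the 2-day auto-complete and 30-day retention windows come from the text.

```python
from datetime import datetime, timedelta

AUTO_COMPLETE_AFTER = timedelta(days=2)
RETENTION = timedelta(days=30)

def apply_guardrails(notices, now):
    """Auto-complete unassigned 100%-irrelevant notices after 2 days;
    mark hidden notices purgeable only after the 30-day retention window."""
    completed, purgeable = [], []
    for n in notices:
        age = now - n["created"]
        if n["score"] == 100 and n.get("assignee") is None and age >= AUTO_COMPLETE_AFTER:
            n["status"] = "completed"
            completed.append(n["id"])
        if n.get("hidden") and age >= RETENTION:
            purgeable.append(n["id"])
    return completed, purgeable
```

Keeping both thresholds as named constants makes the policy auditable and easy to adjust per phase of the rollout.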
Phased Rollout & Monitoring
Deployed to UAT, ran targeted reviews, then rolled out to production in phases. Instrumented dashboards for irrelevancy rates, per-source performance and false-positive reports with continuous feedback loops for model refinement.
How It Works — User Flow
Daily Ingest & Scoring
System crawls configured sources, computes diffs, creates notices and assigns relevance/irrelevancy scores.
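The crawl-and-diff step can be illustrated with Python's standard `difflib`. The function names and notice shape are hypothetical; the production pipeline's diffing, deduplication, and scoring are more involved than this minimal sketch.

```python
import difflib
import hashlib

def diff_snapshot(prev: str, curr: str):
    """Return the changed lines between two crawls of the same source.
    A non-empty diff becomes a notice; scoring happens downstream."""
    diff = difflib.unified_diff(prev.splitlines(), curr.splitlines(), lineterm="")
    return [line for line in diff
            if line.startswith(("+", "-"))
            and not line.startswith(("+++", "---"))]

def make_notice(source_url: str, changed_lines):
    """Wrap a non-empty diff in a notice record with a stable id."""
    if not changed_lines:
        return None
    body = "\n".join(changed_lines)
    return {
        "id": hashlib.sha1((source_url + body).encode()).hexdigest()[:12],
        "source": source_url,
        "diff": body,
    }
```

Identical snapshots produce no notice, so only genuine page changes enter the scoring stage.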
Inbox Landing
100%-irrelevant notices are hidden by default, with a toast explaining the behavior. The default slider range of 0-80 shows non-100% items.
Filter Controls
Double range slider lets researchers broaden view (0-100) or view only 100% irrelevant. Relevant items unaffected.
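The slider's effect on the inbox reduces to a range filter over irrelevancy scores. A minimal sketch: the field name `irrelevancy_score` and the `visible_notices` helper are assumptions, with the default 0-80 window taken from the text.

```python
def visible_notices(notices, lo=0, hi=80):
    """Return notices whose irrelevancy score falls inside the slider window.
    The default 0-80 window keeps 100%-irrelevant items out of the inbox."""
    return [n for n in notices if lo <= n["irrelevancy_score"] <= hi]

inbox = [
    {"id": "a", "irrelevancy_score": 0},
    {"id": "b", "irrelevancy_score": 40},
    {"id": "c", "irrelevancy_score": 100},
]
print([n["id"] for n in visible_notices(inbox)])  # → ['a', 'b']
```

Widening the window to 0-100 reveals everything, and setting both ends to 100 shows only the auto-hidden items, matching the "view only 100% irrelevant" mode.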
Review & Undo
Hidden items accessible for 30 days. Researchers can recover false positives or feed them back for retraining.
Auto-Complete
Unassigned 100% irrelevant items auto-complete after 2 days.
Feedback Loop
Researchers flag misclassifications; examples used to refine prompts, tags and model retraining.
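The feedback loop can be sketched as a store of flagged misclassifications keyed by (predicted, corrected) label pairs, from which examples are drawn for prompt refinement and retraining. `FeedbackStore` and its methods are illustrative, not the shipped interface.

```python
from collections import defaultdict

class FeedbackStore:
    """Collects researcher-flagged misclassifications so they can be
    folded back into prompt examples and retraining sets."""

    def __init__(self):
        self._by_label = defaultdict(list)

    def flag(self, notice_text: str, predicted: int, corrected: int):
        """Record one misclassification: the model's score vs. the researcher's."""
        self._by_label[(predicted, corrected)].append(notice_text)

    def examples(self, predicted: int, corrected: int):
        """Fetch all flagged texts for one error type, e.g. false positives
        where a 100% label was corrected to relevant (score 0)."""
        return list(self._by_label[(predicted, corrected)])
```

Grouping by error type makes the most damaging case, 100%-irrelevant labels overturned by researchers, easy to pull out as few-shot examples for the next prompt iteration.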
Key UX Decisions
Hidden by Default
100%-irrelevant notices are hidden in the To-Do inbox with a toast explaining why items disappeared.
Double Range Slider
Filters control which irrelevancy scores are visible (default 0-80 so 100% items are excluded).
30-Day Retention
Hidden items remain accessible for audit and recovery before permanent clearance.
Auto-Complete
Unassigned 100% irrelevant notices automatically marked completed after 2 days.
Outcomes & Impact
- Inbox reduction: System hides roughly 30-40% of irrelevant notices by default, substantially reducing triage workload.
- Faster sourcing: Researchers spend less time reviewing noise and more on high-value content discovery.
- Lower manual churn: Auto-complete and hide-by-default behavior reduced repetitive manual actions.
- Trust & auditability: Retention and discoverability rules preserved visibility and allowed confident automation.
Challenges & Trade-offs
False Positives Risk
Hiding items risks missing rare relevant notices. Mitigated by retention, filters, and judge processes.
Definition Ambiguity
Relevance judgments can be subjective. Created labeled examples and aligned stakeholders for consistency.
Model Drift
Source changes required monitoring and a human review cadence to detect drift early.
Automation vs Transparency
Chose conservative defaults (hide, not delete) to build trust before expanding automation.
Key Learnings
Curate explicit examples early. Representative noise samples accelerate prompt & model tuning.
Hybrid rules + ML is pragmatic. Deterministic rules handle known noise, ML handles fuzzy cases.
Provide discoverability & undo. Users must be able to find and recover hidden items for trust.
Operational policies matter. Retention, auto-complete and exportability enable safe automation in regulated contexts.
The product includes the feedback loop. Continuous retraining and tag curation are part of the shipped product, not an afterthought.