How AI Clustering Turns 500 Feedback Items into 12 Actionable Groups
We use vector embeddings and cosine similarity to automatically group similar feedback. Here's how the pipeline works — from raw text to prioritized clusters with sentiment analysis.
The problem: feedback overload
Your app has 500 feedback items. Some are about dark mode. Others mention "night theme" or "dark color scheme." They're all the same request — but without reading every single one, you'd never know. Manual categorization doesn't scale.
Vector embeddings: turning text into numbers
When a user submits feedback, we send the text to Voyage AI (voyage-3-lite model), which returns a 1024-dimensional vector. Taken together, the dimensions encode the meaning of the text. "Add dark mode" and "night theme option" produce vectors that are very close in this high-dimensional space, even though the words are different.
We store these vectors in PostgreSQL using the pgvector extension. This lets us run similarity searches directly in the database — no external vector store needed.
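As a rough sketch of what storage and lookup can look like with pgvector: the table name, column names, and psycopg-style `%s` placeholders below are assumptions for illustration, not the actual schema. The `vector(1024)` column type and the `<=>` cosine distance operator come from pgvector itself.

```python
# Hypothetical schema: the "feedback" table and its columns are illustrative names.
CREATE_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS feedback (
    id BIGSERIAL PRIMARY KEY,
    body TEXT NOT NULL,
    embedding vector(1024)  -- dimension quoted in the article
);
"""

# <=> is pgvector's cosine distance operator, so similarity = 1 - distance.
# Placeholders assume a psycopg-style driver.
NEAREST_SQL = """
SELECT id, body, 1 - (embedding <=> %s::vector) AS similarity
FROM feedback
ORDER BY embedding <=> %s::vector
LIMIT %s;
"""


def to_pgvector_literal(values):
    """Format a sequence of numbers as pgvector's text input, e.g. '[0.1,2.0]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"
```

The literal helper is only needed when the driver has no native vector adapter; the same string is passed for both `%s::vector` placeholders.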
Cosine similarity: finding related feedback
When new feedback arrives, we compute its cosine similarity against every existing cluster centroid. If the best match exceeds our threshold (0.82), the feedback joins that cluster. If not, a new cluster is created.
This threshold was tuned empirically. Too low (0.7) and unrelated feedback gets grouped. Too high (0.9) and obvious duplicates stay separate. 0.82 hits the sweet spot for product feedback.
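The assignment rule can be sketched in a few lines of plain Python. The 0.82 threshold is the value quoted above; the cluster dictionary layout, the "pick the best-scoring centroid" rule, and the running-mean centroid update are assumptions made for this example.

```python
import math

SIMILARITY_THRESHOLD = 0.82  # value quoted in the article


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def assign_to_cluster(embedding, clusters, threshold=SIMILARITY_THRESHOLD):
    """Join the most similar cluster if it beats the threshold, else create one.

    `clusters` is a list of dicts with 'centroid' and 'count' (hypothetical layout).
    Returns the index of the cluster the embedding ended up in.
    """
    best_index, best_score = None, -1.0
    for i, cluster in enumerate(clusters):
        score = cosine_similarity(embedding, cluster["centroid"])
        if score > best_score:
            best_index, best_score = i, score

    if best_index is not None and best_score > threshold:
        cluster = clusters[best_index]
        n = cluster["count"]
        # Assumption: the centroid is kept as a running mean of its members.
        cluster["centroid"] = [
            (c * n + e) / (n + 1) for c, e in zip(cluster["centroid"], embedding)
        ]
        cluster["count"] = n + 1
        return best_index

    clusters.append({"centroid": list(embedding), "count": 1})
    return len(clusters) - 1
```

Starting from an empty list, the first item always opens a cluster. Later items either merge into the nearest centroid or open their own.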
Auto-naming with Claude
Once a cluster has 2+ items, we send the feedback titles to Claude (haiku-4-5) and ask for a concise cluster name and summary. The prompt is simple:
- Given these feedback items, generate a short title (under 30 chars)
- Write a 1-sentence summary of what users are asking for
- Classify sentiment as positive, negative, or neutral
The result is stored and displayed immediately. Admins can edit the title and summary inline if the AI gets it wrong.
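To make the naming step concrete, here is a sketch of building the prompt and validating the reply. It does not call any API. The JSON reply format, the function names, and the fallback to "neutral" for unexpected sentiment values are assumptions for illustration; only the three instructions come from the prompt described above.

```python
import json

ALLOWED_SENTIMENTS = {"positive", "negative", "neutral"}


def build_naming_prompt(titles):
    """Assemble the three instructions plus the feedback titles into one prompt."""
    items = "\n".join(f"- {t}" for t in titles)
    return (
        "Given these feedback items:\n"
        f"{items}\n\n"
        "1. Generate a short title (under 30 chars).\n"
        "2. Write a 1-sentence summary of what users are asking for.\n"
        "3. Classify sentiment as positive, negative, or neutral.\n"
        # Assumption: asking for JSON keeps the reply easy to parse.
        'Respond with JSON only, using the keys "title", "summary", "sentiment".'
    )


def parse_naming_response(raw):
    """Parse the model's JSON reply and enforce the stated constraints."""
    data = json.loads(raw)
    title = str(data["title"]).strip()
    if len(title) >= 30:
        title = title[:29].rstrip()  # keep the title under 30 characters
    sentiment = str(data["sentiment"]).strip().lower()
    if sentiment not in ALLOWED_SENTIMENTS:
        sentiment = "neutral"  # assumed fallback for unexpected labels
    return {
        "title": title,
        "summary": str(data["summary"]).strip(),
        "sentiment": sentiment,
    }
```

Validating the reply before storing it means a malformed or overly long title never reaches the cluster view, which leaves inline editing for cases where the name is simply wrong.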
Priority scoring
Each cluster gets a priority score (0–100) based on three factors:
- Vote count — total votes across all feedback in the cluster (weighted 50%)
- Feedback count — how many separate users reported this (weighted 30%)
- Recency — average age of feedback items (weighted 20%)
High-priority clusters (70+) are highlighted in red, medium (40–69) in amber, and low (below 40) in green. This gives product managers an instant view of what matters most.
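A worked sketch of the scoring: the 50/30/20 weights and the 70/40 band cut-offs come from the text above, while the normalization (the caps on votes and feedback count, and the linear decay over 90 days) is an assumption chosen only to make the example computable.

```python
def priority_score(total_votes, feedback_count, avg_age_days,
                   vote_cap=100, count_cap=20, max_age_days=90):
    """Weighted 0-100 score. The caps and the 90-day decay window are hypothetical."""
    votes = min(total_votes / vote_cap, 1.0)
    count = min(feedback_count / count_cap, 1.0)
    # Assumption: newer feedback scores higher, decaying linearly to zero.
    recency = max(0.0, 1.0 - avg_age_days / max_age_days)
    return round(100 * (0.5 * votes + 0.3 * count + 0.2 * recency))


def priority_band(score):
    """Map a score to the display bands described in the article."""
    if score >= 70:
        return "high"    # red
    if score >= 40:
        return "medium"  # amber
    return "low"         # green
```

Under these assumptions, a cluster with 50 votes, 10 reports, and an average age of 45 days sits at half of each cap, so it scores 50 and lands in the medium band.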
The full pipeline
Putting it all together, every feedback submission triggers this async pipeline:
- User submits feedback via widget or public board
- Text is embedded using Voyage AI (1024-dim vector)
- Cosine similarity search against existing clusters
- Join existing cluster or create new one
- Claude generates/updates cluster name + summary + sentiment
- Priority score recalculated
The entire pipeline runs in under 2 seconds. Users see their feedback instantly, and admins see it auto-categorized in the cluster view.
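The orchestration itself can be sketched as a small async function. Every step is passed in as an async callable, so the signatures of `embed`, `assign`, `describe`, and `score`, as well as the cluster dictionary, are hypothetical interfaces for this example. The demo uses stand-in stubs and calls no real service.

```python
import asyncio


async def process_feedback(text, embed, assign, describe, score):
    """Run the pipeline steps in order for one feedback item."""
    embedding = await embed(text)              # text -> vector
    cluster = await assign(embedding, text)    # join or create a cluster
    if len(cluster["items"]) >= 2:             # naming starts at 2+ items
        cluster.update(await describe(cluster["items"]))
    cluster["priority"] = await score(cluster)
    return cluster


async def demo():
    """Exercise the pipeline with stub steps (no network, no database)."""
    store = {"items": []}

    async def embed(text):
        return [float(len(text)), 1.0]  # placeholder vector, not a real embedding

    async def assign(embedding, text):
        store["items"].append(text)     # stub: everything joins one cluster
        return store

    async def describe(items):
        return {"title": "Dark mode", "summary": "Users want a dark theme.",
                "sentiment": "neutral"}

    async def score(cluster):
        return 10 * len(cluster["items"])

    await process_feedback("Add dark mode", embed, assign, describe, score)
    return await process_feedback("Night theme option", embed, assign, describe, score)
```

Running `asyncio.run(demo())` pushes two items through the stubbed steps. The naming step only fires on the second item, once the cluster has reached two entries.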
Try FeedMission for free
Collect feedback, cluster with AI, and ship better products.