Free · No account needed
Content clustering via sitemaps
Workflow
How to create content clusters using sitemaps
From sitemap URL to stakeholder-ready cluster map — four steps, all in your browser after URLs are fetched.
- Step 1: Paste a sitemap URL or raw XML (index sitemaps and nested indexes are supported).
- Step 2: Tune clustering settings. Lexical groups by URL/title overlap; Semantic uses embeddings when wording differs.
- Step 3: Click Generate clusters, then explore treemap, force-directed tree, tidy tree, icicle, circle packing, or 3D layouts.
- Step 4: Export charts or create a share link when you are ready to hand results to stakeholders.
Controls
Settings available in the sitemap content clustering tool
Every control in the generator panel for this mode, and what it changes in your clusters.
Similarity threshold
How closely URLs must match to land in the same cluster (applies to Lexical, or to Semantic with Custom threshold).
Min / max cluster size
Sets the minimum and maximum number of pages each topical group can contain.
Lexical vs Semantic
Lexical uses TF-IDF in your browser; Semantic embeds page text via Gemini for meaning-based grouping.
Unfurl page titles
Fetches live title and meta description per URL for richer signals (uses credits when signed in).
K-means & AI labels
Semantic mode can set the cluster count with k-means and auto-name groups for presentations.
Visualization
Switch chart type after clustering without recomputing. Six layouts are available, including a 3D knowledge graph.
Why teams cluster straight from sitemaps
When stakeholders reference content clustering, they usually mean grouping live URLs — not hypothetical keywords. This workflow anchors clusters to what search engines already discover.
From a sitemap
Turn any sitemap into a cluster map
Enter a sitemap URL or paste raw XML. The tool fetches, parses, and clusters all your pages in one step, with no file export needed.
Open this mode →
From scratch (with AI)
Describe your niche, get a full cluster map
No content to upload? No problem. Tell the AI your main topic, industry, and goals, and it generates a complete set of pillar pages and supporting content ideas in seconds.
Open this mode →
From internal links
Visualize your existing link structure
Export internal link data from Screaming Frog and drop it in. See every page as a node, every link as an edge, sized, colored, and weighted by link type and authority.
Open this mode →
From URLs, titles & keywords
Cluster the content you already have
Paste in a list of titles and keywords, or export a CSV from your CMS or a crawler. Lexical TF-IDF clustering runs in your browser to group pages into topics; no data leaves your device.
Open this mode →
Turn clustered URLs into stakeholder-ready visuals
Swap layouts without recomputing clusters — ideal when executives want treemaps while practitioners prefer trees.
Treemap
See cluster sizes at a glance. The nested rectangles make it immediately obvious which topic groups dominate your site.
Force-directed tree
Explore how pages connect to each other. Drag nodes and zoom in to trace relationships across the full content graph.
Tidy tree
A clean hierarchical view that shows the depth and branching of every cluster, ideal for presenting site architecture.
Icicle chart
Space-efficient stacked bars that let you click into any cluster and zoom through the hierarchy level by level.
Circle packing
Circles within circles. A visually rich layout that makes nested cluster relationships intuitive to read at a glance.
3D knowledge graph
An immersive three-dimensional network where nodes float in space and edges flow between them. Rotate, zoom, and orbit to navigate clusters from any angle.
Purpose-built for large inventories
Designed for migrations, deduping initiatives, and sprint planning where you must defend IA decisions with evidence.
Whole-site discovery without exporting spreadsheets
Point at a standard or index sitemap. We enumerate URLs, normalize paths, and prepare text signals so clustering reflects how your CMS actually publishes pages.
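To make "normalize paths and prepare text signals" concrete, here is a minimal Python sketch of one way that step could look. The function names `normalize_path` and `path_tokens` and the specific rules (lowercasing, dropping query strings, stripping common file extensions) are illustrative assumptions, not the tool's actual implementation.

```python
import re
from typing import List
from urllib.parse import urlsplit


def normalize_path(url: str) -> str:
    """Lowercase the path, drop query/fragment, collapse and trim slashes.

    Hypothetical helper: the exact rules are an assumption for illustration.
    """
    path = urlsplit(url).path.lower()
    path = re.sub(r"/{2,}", "/", path)  # collapse repeated slashes
    if len(path) > 1 and path.endswith("/"):
        path = path[:-1]  # strip a trailing slash, but keep the root "/"
    return path or "/"


def path_tokens(url: str) -> List[str]:
    """Split a normalized path into word tokens usable as text signals."""
    path = normalize_path(url)
    path = re.sub(r"\.(html?|php|aspx?)$", "", path)  # drop common extensions
    return [tok for tok in re.split(r"[^a-z0-9]+", path) if tok]


tokens = path_tokens("https://example.com/blog/seo-tips/keyword_research.html")
```

Tokens like these can then be combined with page titles before clustering, so that slugs which encode topics contribute to the grouping.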
See overlapping topical footprints instantly
Content clustering surfaces semantic buckets directly from URLs and titles — ideal before pruning redirects or consolidating templates.
Privacy-first clustering pipeline
Lexical grouping executes locally once URLs are retrieved. Nothing is persisted unless you deliberately publish a sanitized share snapshot.
Frequently asked questions
Content clustering terminology, inputs, and hand-offs to other modes.
Which sitemap formats are supported?
Standard XML sitemaps plus nested sitemap indexes. Paste XML directly if crawl access is restricted.
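For readers curious how nested indexes can be flattened, the following Python sketch walks a sitemap index recursively and collects page URLs. It assumes the standard sitemaps.org namespace; `collect_urls` and the `fetch` callback are hypothetical names, and the sample XML is invented for the example. A real client would replace `fetch` with an HTTP request.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional, Set

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def collect_urls(
    xml_text: str,
    fetch: Callable[[str], str],
    _seen: Optional[Set[str]] = None,
) -> List[str]:
    """Return page URLs from a urlset, following sitemap indexes recursively.

    `fetch` is a hypothetical callback returning the XML text of a child
    sitemap; `_seen` guards against indexes that reference each other.
    """
    seen = _seen if _seen is not None else set()
    root = ET.fromstring(xml_text)
    is_index = root.tag == f"{SITEMAP_NS}sitemapindex"
    urls: List[str] = []
    for entry in root:  # <sitemap> entries in an index, <url> entries otherwise
        loc = entry.find(f"{SITEMAP_NS}loc")
        if loc is None or not loc.text:
            continue
        target = loc.text.strip()
        if is_index:
            if target in seen:
                continue
            seen.add(target)
            urls.extend(collect_urls(fetch(target), fetch, seen))
        else:
            urls.append(target)
    return urls


# Invented sample data: one index pointing at one child sitemap.
INDEX_XML = (
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<sitemap><loc>https://example.com/sitemap-blog.xml</loc></sitemap>"
    "</sitemapindex>"
)
BLOG_XML = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/blog/a</loc></url>"
    "<url><loc>https://example.com/blog/b</loc></url>"
    "</urlset>"
)
PAGES = {"https://example.com/sitemap-blog.xml": BLOG_XML}

all_urls = collect_urls(INDEX_XML, PAGES.__getitem__)
```

Passing the fetcher in as a callback keeps the parsing logic testable without network access, which also mirrors the paste-XML path when crawl access is restricted.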
What is the difference between Lexical and Semantic mode?
Lexical builds sparse vectors with TF-IDF over the text we derive from URL paths and titles gathered from your crawl. Items land in the same cluster when they literally share important words or stems — great for messy spreadsheets, overlapping product names, or URLs that encode topics in slugs. Semantic sends combined text snippets to Gemini embeddings so similarity reflects meaning, not spelling. Different wording about the same intent can still merge. Turning URL unfurling on adds richer page titles and descriptions for embeddings. Start with Lexical when you want deterministic, offline-friendly grouping; switch to Semantic when synonyms and paraphrases split the Lexical map too finely.
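As a rough illustration of the Lexical idea, the Python sketch below builds sparse TF-IDF vectors from tokenized URLs or titles and compares them with cosine similarity. It uses one common smoothed IDF variant; the weighting the tool actually applies is not documented here, so treat the formula, the function names, and the sample tokens as assumptions.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Build sparse TF-IDF vectors (term -> weight) for tokenized documents."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            # Relative term frequency times a smoothed IDF (assumed variant).
            term: (count / len(doc)) * (math.log((1 + n) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Invented sample: tokens derived from three URL slugs.
docs = [
    ["blog", "seo", "tips"],
    ["blog", "seo", "checklist"],
    ["pricing", "enterprise"],
]
vecs = tfidf_vectors(docs)
sim_related = cosine(vecs[0], vecs[1])    # shares "blog" and "seo"
sim_unrelated = cosine(vecs[0], vecs[2])  # shares no terms
```

Because the score depends only on literally shared terms, two pages with no words in common score zero, which is the behavior that makes Lexical deterministic and also why paraphrases can end up in separate clusters.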
What does K-means do in Semantic mode?
K-means partitions embedding vectors into groups automatically. Roughly speaking, cluster count scales with how many documents you have versus your minimum cluster size — raising minimum size yields fewer, broader themes without tuning a cosine cutoff by hand. Use it when you want the algorithm to infer group boundaries while embeddings stay fixed for your dataset.
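The relationship between document count, minimum cluster size, and the number of groups can be sketched in Python as follows. The `estimate_k` heuristic (documents divided by minimum size) is an assumption that matches the "roughly speaking" description above, not the tool's exact rule, and the toy 2D vectors stand in for real embeddings. The farthest-point initialization is an illustrative choice that keeps the example deterministic.

```python
from typing import List, Sequence

Vector = Sequence[float]


def estimate_k(n_docs: int, min_cluster_size: int) -> int:
    """Assumed heuristic: a larger minimum size yields fewer clusters."""
    return max(1, n_docs // max(1, min_cluster_size))


def _dist2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(points: List[Vector], k: int, max_iter: int = 50) -> List[int]:
    """Plain k-means; returns one cluster label per input point."""
    # Deterministic farthest-point initialization.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(_dist2(p, c) for c in centroids))
        centroids.append(list(farthest))
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every point.
        new_labels = [
            min(range(k), key=lambda i: _dist2(p, centroids[i])) for p in points
        ]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                centroids[i] = _mean(members)
    return labels


# Toy stand-ins for embedding vectors: two obvious groups.
points = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
k = estimate_k(len(points), min_cluster_size=2)
labels = kmeans(points, k)
```

With this heuristic, doubling the minimum cluster size halves the number of groups, which is the "fewer, broader themes" effect described above.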
What does Custom threshold do in Semantic mode?
Custom threshold skips automatic K-means and instead merges pages only when their embedding cosine similarity meets your cutoff — you effectively dial how aggressively clusters fuse. Higher values demand tighter semantic matches (fewer merges); lower values join looser neighborhoods. Around 0.95 cosine similarity is a sensible starting point with Gemini embeddings before you widen or tighten by hand.
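A small Python sketch can show how a cosine cutoff changes the result. It merges any two pages whose similarity meets the threshold, using union-find to track groups. This single-link merging rule is an assumption chosen for simplicity; the tool may combine pages differently, and the vectors below are invented stand-ins for embeddings.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def threshold_clusters(vectors: List[Sequence[float]], threshold: float) -> List[int]:
    """Label vectors so that any pair with similarity >= threshold shares a label.

    Single-link merging via union-find (an assumed, illustrative strategy).
    """
    parent = list(range(len(vectors)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(j)] = find(i)
    return [find(i) for i in range(len(vectors))]


# Invented vectors: the first two are near-duplicates, the third is unrelated.
vectors = [[1.0, 0.0], [0.98, 0.2], [0.0, 1.0]]
loose = threshold_clusters(vectors, 0.95)   # first two merge
strict = threshold_clusters(vectors, 0.99)  # nothing merges
```

The same three vectors yield two clusters at 0.95 and three at 0.99, which is the "higher values demand tighter matches" behavior in miniature.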
Does content clustering replace manual IA workshops?
It accelerates discovery by proposing statistically coherent groups. Strategists still validate overlap with search intent and business priorities.
Can I jump to keyword uploads instead?
Yes. Open Keyword clustering when you have already exported titles from Search Console or a crawler as a CSV.