Topic Cluster Generator

Free · No account needed

Content clustering via sitemaps

Enter a sitemap URL, adjust settings and hit "Generate clusters". That's it! The tool fetches the sitemap, unfurls the URLs, and clusters them all in one step.

Add your sitemap

Adjust settings


Cluster visualization

Your cluster map renders here. Export to PNG or CSV, or share a link.


Workflow

How to create content clusters using sitemaps

From sitemap URL to stakeholder-ready cluster map in four steps. Once URLs are fetched, Lexical clustering runs in your browser; Semantic mode sends text to Gemini for embeddings.

  1. Paste a sitemap URL or raw XML (index sitemaps and nested indexes are supported).

  2. Tune clustering settings: Lexical groups by URL/title overlap; Semantic uses embeddings when wording differs.

  3. Click "Generate clusters", then explore treemap, tree, icicle, circle packing, or 3D layouts.

  4. Export charts or create a share link when you are ready to hand results to stakeholders.

Controls

Settings available in the sitemap content clustering tool

Every control in the generator panel for this mode, and what it changes in your clusters.

Similarity threshold

How similar two URLs must be to land in the same cluster. Applies in Lexical mode, and in Semantic mode when Custom threshold is selected.

Min / max cluster size

Sets the minimum and maximum number of URLs allowed in each topical group.

Lexical vs Semantic

Lexical uses TF-IDF in your browser; Semantic embeds page text via Gemini for meaning-based grouping.

Unfurl page titles

Fetches live title and meta description per URL for richer signals (uses credits when signed in).
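As a rough illustration of what unfurling collects, the sketch below pulls the title and meta description out of an HTML string with Python's standard library. It is not the tool's own code: the names TitleMetaParser and extract_signals are hypothetical, and fetching the page is left out so the example stays self-contained.

```python
from html.parser import HTMLParser


class TitleMetaParser(HTMLParser):
    """Collects the <title> text and the meta description from an HTML page."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""
        self.description = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attr = dict(attrs)
            # Only <meta name="description" content="..."> is of interest here.
            if (attr.get("name") or "").lower() == "description":
                self.description = (attr.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text can arrive in several chunks, so append rather than assign.
        if self._in_title:
            self.title += data


def extract_signals(html: str) -> dict:
    """Hypothetical helper: return the text signals for one page."""
    parser = TitleMetaParser()
    parser.feed(html)
    return {"title": parser.title.strip(), "description": parser.description}
```

Pages without a title or description simply yield empty strings, so clustering would fall back to whatever the URL path provides.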

K-means & AI labels

In Semantic mode, K-means can set the number of clusters for you, and AI labels auto-name each group for presentations.

Visualization

Switch chart type after clustering without recomputing — six layouts including 3D graph.

Why teams cluster straight from sitemaps

When stakeholders reference content clustering, they usually mean grouping live URLs — not hypothetical keywords. This workflow anchors clusters to what search engines already discover.

Turn clustered URLs into stakeholder-ready visuals

Swap layouts without recomputing clusters — ideal when executives want treemaps while practitioners prefer trees.

Treemap

See cluster sizes at a glance. The nested rectangles make it immediately obvious which topic groups dominate your site.

Force-directed tree

Explore how pages connect to each other. Drag nodes and zoom in to trace relationships across the full content graph.

Tidy tree

A clean hierarchical view that shows the depth and branching of every cluster, ideal for presenting site architecture.

Icicle chart

Space-efficient stacked bars that let you click into any cluster and zoom through the hierarchy level by level.

Circle packing

Circles within circles. A visually rich layout that makes nested cluster relationships intuitive to read at a glance.

3D knowledge graph

An immersive three-dimensional network where nodes float in space and edges flow between them. Rotate, zoom, and orbit to navigate clusters from any angle.

Purpose-built for large inventories

Designed for migrations, deduping initiatives, and sprint planning where you must defend IA decisions with evidence.

Whole-site discovery without exporting spreadsheets

Point at a standard or index sitemap. We enumerate URLs, normalize paths, and prepare text signals so clustering reflects how your CMS actually publishes pages.
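To make "normalize paths and prepare text signals" concrete, here is one possible sketch. The exact rules the tool applies are not documented here, so the choices below (drop query strings and fragments, collapse repeated slashes, trim the trailing slash, split slugs into lowercase tokens) are assumptions, and normalize_url and path_tokens are hypothetical names.

```python
import re
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Assumed normalization: lowercase scheme and host, no query or fragment."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    # Collapse duplicate slashes and trim the trailing one; keep "/" for the root.
    path = re.sub(r"/{2,}", "/", parts.path).rstrip("/") or "/"
    return f"{parts.scheme.lower()}://{host}{path}"


def path_tokens(url: str) -> list[str]:
    """Split the URL path into lowercase word tokens usable as text signals."""
    path = urlsplit(url).path.lower()
    return [token for token in re.split(r"[^a-z0-9]+", path) if token]
```

For example, path_tokens("https://example.com/blog/seo-tips_2024.html") yields ["blog", "seo", "tips", "2024", "html"]. A real pipeline would probably also drop file extensions and stop words.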

See overlapping topical footprints instantly

Content clustering surfaces semantic buckets directly from URLs and titles — ideal before pruning redirects or consolidating templates.

Privacy-first clustering pipeline

Lexical grouping executes locally once URLs are retrieved. Nothing is persisted unless you deliberately publish a sanitized share snapshot.

Frequently asked questions

Content clustering terminology, inputs, and hand-offs to other modes.

Which sitemap formats are supported?

Standard XML sitemaps plus nested sitemap indexes. Paste XML directly if crawl access is restricted.
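For readers curious how a standard sitemap differs from a sitemap index, the sketch below parses both with Python's standard library. It relies on the sitemaps.org namespace and element names (urlset/url/loc and sitemapindex/sitemap/loc). The function names, the injected fetch callable, and the max_depth guard are assumptions made for illustration, not a description of the tool's internals.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def parse_sitemap(xml_text: str) -> tuple[str, list[str]]:
    """Return ("index", child sitemap URLs) or ("urlset", page URLs)."""
    root = ET.fromstring(xml_text)
    kind = "index" if root.tag == f"{SITEMAP_NS}sitemapindex" else "urlset"
    entry = "sitemap" if kind == "index" else "url"
    locs = [
        loc.text.strip()
        for loc in root.iterfind(f"{SITEMAP_NS}{entry}/{SITEMAP_NS}loc")
        if loc.text
    ]
    return kind, locs


def collect_urls(xml_text: str, fetch, max_depth: int = 3) -> list[str]:
    """Walk nested indexes; `fetch` is any callable mapping a URL to XML text."""
    kind, locs = parse_sitemap(xml_text)
    if kind == "urlset":
        return locs
    if max_depth == 0:
        return []  # stop descending rather than follow indexes forever
    urls = []
    for child in locs:
        urls.extend(collect_urls(fetch(child), fetch, max_depth - 1))
    return urls
```

Passing fetch in as a parameter keeps the parsing logic testable with pasted XML, which mirrors the paste-XML option when crawl access is restricted.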

What is the difference between Lexical and Semantic mode?

Lexical builds sparse vectors with TF-IDF over the text we derive from URL paths and titles gathered from your crawl. Items land in the same cluster when they literally share important words or stems, which is great for messy spreadsheets, overlapping product names, or URLs that encode topics in slugs.

Semantic sends combined text snippets to Gemini embeddings so similarity reflects meaning, not spelling. Different wording about the same intent can still merge. Turning URL unfurling on adds richer page titles and descriptions for embeddings.

Start with Lexical when you want deterministic, offline-friendly grouping; switch to Semantic when synonyms and paraphrases split the Lexical map too finely.
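The Lexical idea of sparse TF-IDF vectors compared by shared words can be sketched in a few lines of plain Python. This is a minimal illustration only: the smoothed IDF formula below is one common variant and an assumption here, not necessarily the weighting the tool uses, and tokenize, tfidf_vectors, and cosine are hypothetical names.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters and digits as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Build one sparse term -> weight mapping per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            # Smoothed IDF (assumed variant): rarer terms get larger weights.
            term: count * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

Two titles that share no words score exactly 0.0 under this scheme, which is why paraphrased pages can end up in separate Lexical clusters.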

What does K-means do in Semantic mode?

K-means partitions embedding vectors into groups automatically. Roughly speaking, cluster count scales with how many documents you have versus your minimum cluster size — raising minimum size yields fewer, broader themes without tuning a cosine cutoff by hand. Use it when you want the algorithm to infer group boundaries while embeddings stay fixed for your dataset.
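A small sketch may help show how cluster count can follow from document count and minimum cluster size. The formula in estimate_k is an assumption that matches the "roughly speaking" description above, not the tool's documented rule, and the k-means loop is a bare-bones textbook version with a deterministic start so results are repeatable. It expects a non-empty list of equal-length vectors and k no larger than the number of points.

```python
import math


def estimate_k(n_docs: int, min_cluster_size: int) -> int:
    """Assumed heuristic: about one cluster per min_cluster_size documents."""
    return max(1, n_docs // max(1, min_cluster_size))


def kmeans(points, k, iterations=20):
    """Plain k-means on small inputs; returns one cluster label per point."""
    # Deterministic init: use k evenly spaced points as starting centroids.
    step = len(points) / k
    centroids = [list(points[int(i * step)]) for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels
```

With 100 documents, estimate_k gives 20 clusters at a minimum size of 5 and 10 clusters at a minimum size of 10, which illustrates why raising the minimum yields fewer, broader themes.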

What does Custom threshold do in Semantic mode?

Custom threshold skips automatic K-means and instead merges pages only when their embedding cosine similarity meets your cutoff — you effectively dial how aggressively clusters fuse. Higher values demand tighter semantic matches (fewer merges); lower values join looser neighborhoods. Around 0.95 cosine similarity is a sensible starting point with Gemini embeddings before you widen or tighten by hand.
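To see how a cosine cutoff changes the grouping, consider the sketch below. It merges any two items whose similarity meets the threshold and lets merges chain (single linkage via union-find). That linkage rule is an assumption chosen for brevity, threshold_clusters is a hypothetical name, and the two-dimensional vectors stand in for real embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def threshold_clusters(vectors, threshold):
    """Group item indexes whose pairwise similarity reaches the threshold."""
    parent = list(range(len(vectors)))

    def find(i):
        # Follow parent links to the group root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(j)] = find(i)  # merge the two groups

    groups = {}
    for i in range(len(vectors)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


vectors = [(1, 0), (0.99, 0.1), (0, 1), (0.1, 0.99)]
print(threshold_clusters(vectors, 0.95))   # [[0, 1], [2, 3]]
print(threshold_clusters(vectors, 0.999))  # [[0], [1], [2], [3]]
print(threshold_clusters(vectors, 0.15))   # [[0, 1, 2, 3]]
```

The three calls show the dial in action: a very high cutoff leaves every page alone, 0.95 keeps only near-duplicates together, and a low cutoff fuses everything into one group. The pairwise loop is quadratic, so a large inventory would need a smarter neighbor search.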

Does content clustering replace manual IA workshops?

It accelerates discovery by proposing statistically coherent groups. Strategists still validate overlap with search intent and business priorities.

Can I jump to keyword uploads instead?

Yes. Open Keyword clustering if you have already exported titles from Search Console or a crawler CSV.