Copyright © President & Fellows of Harvard College.
Ravi Mynampaty
Categorizing Your Search Queries to Improve Findability
About this talk…
• Case study on how we are improving search and browse by performing clustering exercises on search query data
• Not rocket science
• High-level overview
• You can follow this method, with your own insights and tweaks
• You can kick this off next week at your work
Inspired by…
• Chapters 8 & 9
• The power of incrementalism
What is clustering?
A process for organizing and analyzing search log data that:
• Is repeatable, low-cost, scalable, and simple
• Yields actionable results
• Supports constant, incremental improvement to search
What’s clustering good for?
• Ensuring results for high-frequency queries
• Improving metadata and taxonomy
• Informing and validating site IA decisions
• Informing editorial/curatorial activities
• Providing feedback for search suggestions
  ◦ Autosuggest, synonym lists, no-hits page suggestions
• More on each of these later...
So how do I cluster search queries?
A simple set of steps:
1. Create a query report
2. Cluster the queries
3. Determine the number of queries to analyze
4. Analyze the clusters
5. Draw conclusions and ACT
Step 1: Create a query report
We started with the site with the most traffic
• Upper-bound limit
• One year’s data, broken out by quarter
• Cut off the long tail: drop queries with frequency < 10
HBS Working Knowledge FY12 Use Snapshot
Overall Traffic
Page Views: 6,439,485
Visits: 3,635,746
Unique visitors: 2,734,620
On-site searches: 174,425
Views per Visit: 1.77
Local Search visit rate: 5%
Organic Search visit rate: 46%
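A minimal Python sketch of Step 1, assuming the raw log can be read as (date, query) pairs; the input shape, the function names, and the use of calendar quarters (rather than HBS fiscal quarters) are all assumptions for illustration. It counts each query per quarter and cuts the tail below the frequency threshold.

```python
from collections import Counter, defaultdict
from datetime import date


def quarter_of(day: date) -> str:
    """Label a date with its calendar quarter, e.g. '2011-Q3'."""
    return f"{day.year}-Q{(day.month - 1) // 3 + 1}"


def build_query_report(log, min_freq=10):
    """Count queries per quarter and cut off the long tail.

    `log` is assumed to be an iterable of (date, raw_query) pairs.
    Returns {quarter: [(query, frequency), ...]}, most frequent first,
    keeping only queries searched at least `min_freq` times.
    """
    counts = defaultdict(Counter)
    for day, query in log:
        query = query.strip()
        if query:  # ignore empty searches
            counts[quarter_of(day)][query] += 1
    return {
        quarter: [(q, n) for q, n in counter.most_common() if n >= min_freq]
        for quarter, counter in sorted(counts.items())
    }
```

Case and spelling variants are deliberately left alone here; merging them is the job of narrow clustering in Step 2.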
Step 2: Cluster the queries
Step 2 (cont’d): Three levels of clustering
Level | Method | Example
Narrow | Simple normalization | Eliminate grammatical, spelling, typo, and punctuation differences
Mid-level | Group by subject | management, finance, decision making
Broad | Group by facet | topic, name, date, content type
Step 2 (cont’d): Levels → Tasks Enabled
Tasks: Improve your base for query analysis | Ensure representation of major clusters on your site | Improve Metadata/Index/Taxonomy | Improve Search Suggestions
Narrow (simple): X X X
Mid-level (group by subject): X X X
Broad (group by facet): X X
Step 2 (cont’d): Narrow Clustering Example
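Narrow clustering is simple normalization. The Python sketch below, under stated assumptions, lowercases each query, turns punctuation into spaces, collapses whitespace, and then merges frequencies of queries that end up identical. The `VARIANTS` table is hypothetical: spelling variants and typos still need a human to decide what they map to.

```python
import re
import string
from collections import Counter

# Hypothetical hand-maintained variant table: a reviewer maps spelling
# variants and typos to one form after reading the query report.
VARIANTS = {"balanced score card": "balanced scorecard"}

# Replace every ASCII punctuation character with a space.
_PUNCT_TO_SPACE = str.maketrans({c: " " for c in string.punctuation})


def normalize(query: str) -> str:
    """Collapse case, punctuation, and whitespace differences, then apply VARIANTS."""
    q = query.lower().translate(_PUNCT_TO_SPACE)
    q = re.sub(r"\s+", " ", q).strip()
    return VARIANTS.get(q, q)


def narrow_cluster(report):
    """Merge (query, frequency) pairs that normalize to the same string."""
    merged = Counter()
    for query, freq in report:
        merged[normalize(query)] += freq
    return merged.most_common()
```

For example, `narrow_cluster([("Balanced Scorecard", 50), ("balanced scorecard.", 20), ("Balanced score card", 5)])` merges all three rows into a single `("balanced scorecard", 75)` entry.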
Step 2 (cont’d): Mid-level Example
Cluster: brand
Query | Frequency
branding | 245
brand | 160
brand management | 73
consumer branding | 57
global brand | 32
service brands | 24
brand image retail bank | 17
employer branding | 16
brand management professional services | 16
global branding | 13
b2b branding | 13
importance of branding | 12
brand 2002 | 12
brand equity | 11
brand image | 11
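The slide groups these queries by hand. As a hedged illustration of how a first pass could be proposed automatically, the Python sketch below assigns a query to a subject cluster when any of its words starts with a cluster stem, then totals frequencies per cluster. The stem rules in `CLUSTER_STEMS` are hypothetical; only the query list and frequencies come from the slide.

```python
from collections import Counter

# Query frequencies as listed on the "brand" cluster slide above.
BRAND_QUERIES = [
    ("branding", 245), ("brand", 160), ("brand management", 73),
    ("consumer branding", 57), ("global brand", 32), ("service brands", 24),
    ("brand image retail bank", 17), ("employer branding", 16),
    ("brand management professional services", 16), ("global branding", 13),
    ("b2b branding", 13), ("importance of branding", 12), ("brand 2002", 12),
    ("brand equity", 11), ("brand image", 11),
]

# Hypothetical stem rules that propose a subject cluster; a reviewer
# still confirms or overrides every assignment.
CLUSTER_STEMS = {"brand": ("brand",), "innovation": ("innovat",)}


def assign_cluster(query):
    """Return the first cluster whose stem starts any word of the query, else None."""
    words = query.lower().split()
    for cluster, stems in CLUSTER_STEMS.items():
        if any(word.startswith(stem) for word in words for stem in stems):
            return cluster
    return None  # unassigned: leave for manual review


def cluster_totals(queries):
    """Sum query frequencies per proposed cluster."""
    totals = Counter()
    for query, freq in queries:
        cluster = assign_cluster(query)
        if cluster is not None:
            totals[cluster] += freq
    return dict(totals)
```

On the fifteen queries shown, every row lands in `brand`, and `cluster_totals(BRAND_QUERIES)` sums to 712. That is only the visible part of the cluster, not its full frequency.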
Step 2 (cont’d): Broad Clustering Example
Step 2 (cont’d): List of facets we used
Facet | Example
content type | case studies, cases, working papers, articles, newspaper
date | 2011, world in 2030
demographic characteristics | women, Gen Y, gender, baby boomers
event | economic crisis
format | podcast, video
geographic area | india, japan, mount everest
industry | global wine industry
job type/role | independent director, entrepreneur, ceo, phd economist
organization name | ikea, zara, toyota
person name | michael porter, kanter, sebenius
product name / brand name | ipad
product/commodity | coffee, wine, cement
topic | (covers the majority of keywords)
work | faculty work, e.g. publication name, title of a case
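Broad clustering tags each query with one facet. The Python sketch below assumes small lookup sets seeded from the example column above. Both the sets and the year regex are assumptions for illustration. Anything unmatched falls back to "topic", since the table notes that topic covers the majority of keywords.

```python
import re

# Hypothetical lookup sets seeded from the example column of the facet
# table above; a real list would be far longer.
FACET_TERMS = {
    "content type": {"case studies", "cases", "working papers", "articles"},
    "format": {"podcast", "video"},
    "geographic area": {"india", "japan", "mount everest"},
    "organization name": {"ikea", "zara", "toyota"},
    "person name": {"michael porter", "kanter", "sebenius"},
    "product/commodity": {"coffee", "wine", "cement"},
}

# Assumption: a four-digit year anywhere in the query signals the date facet.
YEAR = re.compile(r"\b(19|20)\d{2}\b")


def facet_of(query):
    """Assign one broad facet to a query."""
    q = query.lower().strip()
    for facet, terms in FACET_TERMS.items():
        if q in terms:
            return facet
    if YEAR.search(q):
        return "date"
    # "topic" covers the majority of keywords, so it is the fallback.
    return "topic"
```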
Step 3: Choose the number of clusters to analyze
Clusters analyzed | Analyze Top Hits | Improve Metadata/Taxonomy/Index | Supply Search Suggestions
50 | X | |
150 | X | X |
300+ | X | X | X
A small number of clusters can cover a lot of your data
Top clusters | % of total queries
Top 20 | 14%
Top 30 | 18%
Top 50 | 26%
Top 100 | 37%
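The coverage figures above are cumulative shares. A minimal Python sketch of that arithmetic, assuming you have each cluster's cumulative frequency and the total query volume of the report (clustered or not). The function name and inputs are illustrative.

```python
def coverage(cluster_freqs, total_queries, tops=(20, 30, 50, 100)):
    """Percentage of all query volume covered by the top-N clusters.

    cluster_freqs: cumulative frequency of each cluster.
    total_queries: total query volume in the report, clustered or not.
    """
    ranked = sorted(cluster_freqs, reverse=True)
    return {n: round(100 * sum(ranked[:n]) / total_queries, 1) for n in tops}
```

For example, with clusters of 50, 30, 10, 5, and 5 queries out of 200 total, `coverage([50, 30, 10, 5, 5], 200, tops=(1, 2, 5))` reports 25%, 40%, and 50%.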
Now you have your clusters…
What do you do with them?
TAKE ACTION!
Analyze Top (“Short Head”) Clusters
Clustering has created a condensed and reliable list of your top search queries:
• Are they what you thought they would be?
• Does the information on your site accurately represent the top searches?
• Are you fulfilling user needs?
Use your clusters: Improve Site Navigation
Examine the short head of clusters:
• For each cluster, add up the frequencies of its queries
• Reorder the clusters by cumulative frequency, descending
• Ensure the top clusters are accounted for in your navigation
• Use cluster topics as browse/navigation headers/footers for your website
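The first two steps above amount to a group-by and a sort. A minimal Python sketch, assuming each query has already been assigned to a cluster and is available as a (query, frequency, cluster) triple; that input shape is an assumption.

```python
from collections import Counter


def rank_clusters(assignments):
    """Roll query frequencies up to clusters and sort descending.

    assignments: iterable of (query, frequency, cluster) triples.
    Returns [(cluster, cumulative_frequency), ...], largest first.
    """
    totals = Counter()
    for _query, freq, cluster in assignments:
        totals[cluster] += freq
    return totals.most_common()
```

`Counter.most_common()` already returns counts in descending order, so no separate sort is needed.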
WK (Working Knowledge) Top Clusters
Cluster | Frequency
innovation | 867
balanced scorecard | 794
leadership | 570
cases | 545
social media | 508
negotiation | 470
knowledge management | 457
ethics | 448
apple | 430
corporate social responsibility | 398
Use your clusters: Improve Taxonomy
• Missing categories in browse taxonomy
  ◦ “Balanced Scorecard”
  ◦ “Ethics”
  ◦ “Social media”
• Second-level topics in the WK context
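Finding missing categories is a set difference between the top clusters and the existing taxonomy labels. The same check works against navigation labels. In the Python sketch below, the cluster frequencies come from the WK Top Clusters table. The `TAXONOMY` set is invented for this illustration and is not the actual WK taxonomy. Treating "cases" and "apple" as non-topic facets (content type, organization name) is also an assumption.

```python
# Cumulative frequencies from the "WK Top Clusters" table above.
WK_TOP_CLUSTERS = [
    ("innovation", 867), ("balanced scorecard", 794), ("leadership", 570),
    ("cases", 545), ("social media", 508), ("negotiation", 470),
    ("knowledge management", 457), ("ethics", 448), ("apple", 430),
    ("corporate social responsibility", 398),
]

# Hypothetical browse taxonomy, invented for this illustration.
TAXONOMY = {"innovation", "leadership", "negotiation",
            "knowledge management", "corporate social responsibility"}

# Assumed to belong to other facets (content type, organization name),
# so they are not candidates for topic categories.
NON_TOPIC = {"cases", "apple"}


def missing_categories(top_clusters, taxonomy, exclude=frozenset()):
    """Top clusters, in rank order, with no category of the same name."""
    known = {t.lower() for t in taxonomy}
    return [c for c, _ in top_clusters if c.lower() not in known and c not in exclude]
```

With these inputs, `missing_categories(WK_TOP_CLUSTERS, TAXONOMY, NON_TOPIC)` returns the clusters that have no matching category, in rank order.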
Mid-level clustering: Informs editorial/curatorial activities
• “Featured Topics”
  ◦ What topics to highlight this week/month/year
  ◦ News items to focus on
  ◦ What research guides to create
  ◦ How to formulate queries for the topics
How about improving search?
• The clustered list provides synonyms for the taxonomy
• This requires human judgment and standards/guidelines for synonyms; in our case, synonyms are exact
• Map each set of synonyms to one "like term" in the search engine
Example:
Balanced Scorecard, BSC, Balanced score card, kaplan and norton -> Balanced Scorecard
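One way to hand the mapping to a search engine is an explicit-mapping synonyms file. The Python sketch below assumes a Solr-style synonyms file, where a line `a, b => c` rewrites the left-hand variants to the right-hand term. Other engines use different formats. The variant list comes from the example above. The function names are illustrative.

```python
# Variants from the example above, mapped to one canonical "like term".
SYNONYMS = {
    "balanced scorecard": ["bsc", "balanced score card", "kaplan and norton"],
}


def to_solr_rules(synonyms):
    """Emit explicit mappings in Solr synonyms-file syntax: 'a, b => c'."""
    return [f"{', '.join(variants)} => {canon}" for canon, variants in sorted(synonyms.items())]


def canonicalize(query, synonyms):
    """Map an exact variant to its like term; leave other queries unchanged."""
    q = query.lower().strip()
    for canon, variants in synonyms.items():
        if q in variants:
            return canon
    return q
```

Because our synonyms are exact, the mapping is one-way (variant to like term) rather than a bidirectional equivalence list.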
Use your clusters: Improve no-hits page
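A no-hits page can offer the cluster labels that most resemble the failed query. The Python sketch below uses the standard library's `difflib.get_close_matches` as a stand-in for whatever spelling correction your engine provides. That substitution, the cutoff value, and the function name are assumptions.

```python
import difflib


def no_hits_suggestions(query, cluster_labels, n=3, cutoff=0.6):
    """Suggest cluster labels that look like a zero-result query.

    difflib's similarity ratio stands in for the engine's own
    spelling correction; results are best match first.
    """
    return difflib.get_close_matches(query.lower().strip(), cluster_labels, n=n, cutoff=cutoff)
```

For example, a search for "Inovation" against labels such as "innovation", "leadership", and "negotiation" lists "innovation" first.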
Time Commitment
• 2 hours to 2 weeks
• Variables include:
• What kind of information you want to gather
• How broad or narrow you want your clusters
• How many queries you analyze
• In our case ~2 person-weeks
Results vs. Time Invested
Time invested | Analyze top clusters | Update Taxonomy | Create New Metadata | Determine New Search Suggestions
2 Hours | X | X | |
6 Hours | X | X | X |
One Week | X | X | X | X
Next Steps: Autosuggest
• Your top clusters probably make up a large percentage of what people are looking for
  ◦ Use them to establish/supplement autosuggest!
Example: suggestions for “innovation”
  ◦ innovation and leadership
  ◦ disruptive innovation
  ◦ innovation management
  ◦ open innovation
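A minimal Python sketch of cluster-driven autosuggest, assuming the clustered queries are available as (query, frequency) pairs. It returns queries that contain the typed text, most frequent first. The frequencies in the usage example are hypothetical. Substring matching is a simplifying choice; many autosuggest systems use prefix matching instead.

```python
def autosuggest(typed, cluster_queries, limit=4):
    """Suggest clustered queries containing the typed text, most frequent first.

    cluster_queries: iterable of (query, frequency) pairs.
    The exact typed text itself is not suggested back.
    """
    t = typed.lower().strip()
    if not t:
        return []
    hits = [(q, f) for q, f in cluster_queries if t in q.lower() and q.lower() != t]
    hits.sort(key=lambda qf: (-qf[1], qf[0]))  # frequency desc, then alphabetical
    return [q for q, _ in hits[:limit]]
```

With hypothetical counts such as `[("innovation", 300), ("disruptive innovation", 90), ("open innovation", 60), ("innovation and leadership", 40), ("innovation management", 25)]`, typing "innovation" yields the four longer queries in frequency order.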
Next Steps: New Access Structures
• Needed an obvious way to search podcasts
  ◦ Put in best bets for now
• A lot of people are searching for article titles
  ◦ Considering a simple interface/approach for select field-specific search, e.g. “title”
• Consider adding other facets to the browse taxonomy where we have entities tagged
  ◦ “company name”, “job type/class”, etc.
Summary
• Establish a plan/process, but be willing to tweak it as you go
• Keep it very simple
• Play with your data: the more we played, the better we understood the benefits each level of clustering and effort could deliver
• Tuning process/results
  ◦ Build staging/working prototypes
  ◦ Repeat the process on other sites
Thank you! And remember…TAKE ACTION!
Constant dripping wears away the stone! (Polish proverb: “Kropla drąży skałę!”)
Questions?
searchguy@hbs.edu
@ravimynampaty
http://www.slideshare.net/mynampaty/

More related content

Similar to Improve Your Site Search Through Simple Query Clustering
• Introduction to Enterprise Search (Findwise)
• Optimising Your Content for Findability (Findwise)
• Marketing AI - How to Build a Keyword Ontology (Dan Segal)
• Optimising Your Content for findability (Kristian Norling)
• Adaptable Information Workshop slides (Louis Rosenfeld)
• Sosland Online Resource Center (guest46fdc3)
• Information Search (allerhed)
• Best Practices for Enterprise Search (Chris Risner)
• Let Your Conscience Be Your Guide: Taming Online Research Guides at the NCSU ... (Lillian Rigling)
• Data Mining and the Web_Past_Present and Future (feiwin)
• Recommender Systems @ Scale - PyData 2019 (Sonya Liberman)
• [UPDATE] Udacity webinar on Recommendation Systems (Axel de Romblay)
• Catégorisation automatisée de contenus documentaires : la ... (butest)
• Large language models in higher education (Peter Trkman)
• Organizational Learning in Practice
• CIS 336 (STRAYER) Entire Course NEW (shyamuopuop)
• Search Analytics for Fun and Profit (Louis Rosenfeld)
• Udacity webinar on Recommendation Systems (Axel de Romblay)
• Metadata Management In A Social Media World, Spsbos, 2 2010 (Christian Buckley)
• Tuning Up Site Search - IA Summit 2007 (Chris Farnum)

More from Ravi Mynampaty
• Build Your Own World Class Directory Search From Alpha to Omega
• Let Search Power Your Intranet!
• How we spiked the HBS water supply with Solr
• Building a Solr-driven Web Portal
• Developing a Search & Findability Practice for the Enterprise
• Unix for Librarians
• Clustering Search Log Data
• How We Incrementally Improved Search
• Findability Standards
• What to Feed Your Search Engine: The Evolution of Search Analytics at HBS
• Better Search UX
• Business owner findability interview questions
• Developing & Implementing Findability Standards

Mais de Ravi Mynampaty (13)

Build Your Own World Class Directory Search From Alpha to Omega
Build Your Own World Class Directory Search From Alpha to OmegaBuild Your Own World Class Directory Search From Alpha to Omega
Build Your Own World Class Directory Search From Alpha to Omega
 
Let Search Power Your Intranet!
Let Search Power Your Intranet!Let Search Power Your Intranet!
Let Search Power Your Intranet!
 
How we spiked the HBS water supply with Solr
How we spiked the HBS water supply with Solr How we spiked the HBS water supply with Solr
How we spiked the HBS water supply with Solr
 
Building a Solr-driven Web Portal
Building a Solr-driven Web PortalBuilding a Solr-driven Web Portal
Building a Solr-driven Web Portal
 
Developing a Search & Findability Practice for the Enterprise
Developing a Search & Findability Practice for the EnterpriseDeveloping a Search & Findability Practice for the Enterprise
Developing a Search & Findability Practice for the Enterprise
 
Unix for Librarians
Unix for LibrariansUnix for Librarians
Unix for Librarians
 
Clustering Search Log Data
Clustering Search Log DataClustering Search Log Data
Clustering Search Log Data
 
How We Incrementally Improved Search
How We Incrementally Improved SearchHow We Incrementally Improved Search
How We Incrementally Improved Search
 
Findability Standards
Findability StandardsFindability Standards
Findability Standards
 
What to Feed Your Search Engine: The Evolution of Search Analytics at HBS
What to Feed Your Search Engine:  The Evolution of Search Analytics at HBSWhat to Feed Your Search Engine:  The Evolution of Search Analytics at HBS
What to Feed Your Search Engine: The Evolution of Search Analytics at HBS
 
Better Search UX
Better Search UXBetter Search UX
Better Search UX
 
Business owner findability interview questions
Business owner findability interview questionsBusiness owner findability interview questions
Business owner findability interview questions
 
Developing & Implementing Findability Standards
Developing & Implementing Findability StandardsDeveloping & Implementing Findability Standards
Developing & Implementing Findability Standards
 

Último

SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 

Último (20)

SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 

Improve Your Site Search Through Simple Query Clustering

  • 1. Copyright © President & Fellows of Harvard College. Ravi Mynampaty Categorizing Your Search Queries to Improve Findability
  • 2. About this talk…  Case study on how we are improving search and browse by performing clustering exercises on search query data  Not rocket science  High-level overview  You can follow this method, with your own insights and tweaks  You can kick this off next week at your work
  • 3. Inspired by… • Chapters 8 & 9 • The power of incrementalism
  • 4. What is clustering? A process for organizing and analyzing search log data that:  Is repeatable, low-cost, scalable, simple  Yields actionable results  Supports constant incremental improvement to search
  • 5. What’s clustering good for?  Ensure results for high frequency queries  Improve Metadata and Taxonomy  Inform and validate decision making in site IA  Informs editorial/curatorial activities  Provides Feedback for Search Suggestions o Autosuggest, synonym lists, no-hits page suggestions  But more on this later...
  • 6. So how do I cluster search queries? A simple set of steps Create query report Cluster queries Determine # queries to analyze Analyze clusters Draw conclusions and ACT
  • 7. Step 1: Create a query report We started with the site with the most traffic • Upper-bound limit • One year’s data by quarter • Cut off tail at frequency < 10
  • 8. Step 1: Create a query report We started with the site with the most traffic • Upper-bound limit • One year’s data by quarter • Cut off tail at frequency < 10
  • 9. Step 1: Create a query report We started with the site with the most traffic • Upper-bound limit • One year’s data by quarter • Cut off tail at frequency < 10 HBS Working Knowledge FY12 Use Snapshot Overall Traffic Page Views: 6,439,485 Visits: 3,635,746 Unique visitors: 2,734,620 On-site searches: 174,425 Views per Visit: 1.77 Local Search visit rate: 5% Organic Search visit rate: 46%
  • 10. Step 2: Cluster the queries
  • 11. Step 2 (cont’d): Three levels of clustering Level Method Example Narrow Simple normalization Eliminate grammatical, spelling, typos, and punctuation differences Mid-level Group by subject management, finance, decision making Broad Group by facet topic, name, date, content type
  • 12. Step 2 (cont’d): Levels  Tasks Enabled Level Improve your base for query analysis Ensure representation of major clusters on your site Improve Metadata/Index /Taxonomy Improve Search Suggestions Narrow (simple) X X X Mid-level (group by subject) X X X Broad (group by facet) X X
  • 13. Step 2 (cont’d): Narrow Clustering Example
  • 14. Step 2 (cont’d): Mid-level Example Cluster brand branding 245 brand 160 brand management 73 consumer branding 57 global brand 32 service brands 24 brand image retail bank 17 employer branding 16 brand management professional services 16 global branding 13 b2b branding 13 importance of branding 12 brand 2002 12 brand equity 11 brand image 11
  • 15. Step 2 (cont’d): Broad Clustering Example
  • 16. Step 2 (cont’d): List of facets we used Facet Example content type case studies, cases, working papers, articles, newspaper date 2011, world in 2030 demographic characteristics women, Gen Y, gender, baby boomers event economic crisis format podcast, video geographic area india, japan, mount everest industry global wine industry job type/role independent director, entrepreneur, ceo, phd economist organization name ikea, zara, toyota person name michael porter, kanter, sebenius product name / brand name ipad product/commodity coffee, wine, cement topic this covers the majority of keywords work faculty work, ex: publication name, title of a case
  • 17. Step 3: Choose the number of clusters to analyze
    • 50 clusters: analyze top hits
    • 150 clusters: analyze top hits; improve metadata/taxonomy/index
    • 300+ clusters: analyze top hits; improve metadata/taxonomy/index; supply search suggestions
  • 18. A small number of clusters can cover a lot of your data
    Share of total queries covered:
    • Top 20 clusters: 14%
    • Top 30 clusters: 18%
    • Top 50 clusters: 26%
    • Top 100 clusters: 37%
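Figures like these come from ranking clusters by total frequency and taking a running sum. A minimal sketch with toy numbers (not from the talk); `total_queries` is whichever denominator you report against, e.g. all logged searches in the period, since the slide does not say which total it used. The second function answers the Step 3 question in reverse: how many clusters reach a target coverage.

```python
from itertools import accumulate
from typing import Optional

def coverage_table(cluster_totals: dict, total_queries: int,
                   cutoffs=(20, 30, 50, 100)) -> dict:
    """Percent of all queries covered by the top-N clusters, for each N in cutoffs."""
    ranked = sorted(cluster_totals.values(), reverse=True)
    return {n: round(100 * sum(ranked[:n]) / total_queries, 1) for n in cutoffs}

def clusters_needed(cluster_totals: dict, total_queries: int,
                    target_pct: float) -> Optional[int]:
    """Smallest N whose top-N clusters reach target_pct of all queries, or None."""
    ranked = sorted(cluster_totals.values(), reverse=True)
    for n, running in enumerate(accumulate(ranked), start=1):
        if 100 * running >= target_pct * total_queries:
            return n
    return None

# Toy numbers, not from the talk.
toy = {"cluster_a": 50, "cluster_b": 30, "cluster_c": 10, "cluster_d": 5}
print(coverage_table(toy, total_queries=200, cutoffs=(1, 2, 4)))  # {1: 25.0, 2: 40.0, 4: 47.5}
print(clusters_needed(toy, total_queries=200, target_pct=40))     # 2
```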
  • 19. Now you have your clusters… What do you do with them? TAKE ACTION!
  • 20. Analyze Top (“Short Head”) Clusters
    Clustering has created a condensed and reliable list of your top search queries.
    • Are they what you thought they would be?
    • Does the information on your site accurately represent the top searches?
    • Are you fulfilling user needs?
  • 21. Use your clusters: Improve Site Navigation
    Examine the short head of clusters:
    • For each cluster, add up the frequencies of its queries
    • Sort clusters by cumulative frequency, descending
    • Ensure the top clusters are accounted for in your navigation
    • Use cluster topics as browse/navigation headers or footers for your website
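The first three steps above can be sketched in a few lines of Python. This is a minimal illustration with toy, hypothetical data: `assignments` maps each query to its cluster, `freqs` holds query frequencies, and the navigation labels stand in for your site's real menu. Clusters are summed, sorted by cumulative frequency, and checked against the navigation.

```python
from collections import Counter

def rank_clusters(assignments: dict, freqs: dict) -> list:
    """Sum query frequencies per cluster; return (cluster, total) pairs, highest first."""
    totals = Counter()
    for query, cluster in assignments.items():
        totals[cluster] += freqs.get(query, 0)
    return totals.most_common()

def missing_from_navigation(ranked: list, nav_labels: list, top_n: int = 10) -> list:
    """Top-N clusters that have no matching (case-insensitive) navigation label."""
    nav = {label.lower() for label in nav_labels}
    return [cluster for cluster, _ in ranked[:top_n] if cluster.lower() not in nav]

# Toy data (hypothetical): query -> frequency, query -> cluster, current nav labels.
freqs = {"balanced scorecard": 100, "bsc": 40, "leadership": 90, "ethics": 70}
assignments = {"balanced scorecard": "balanced scorecard", "bsc": "balanced scorecard",
               "leadership": "leadership", "ethics": "ethics"}
ranked = rank_clusters(assignments, freqs)
print(ranked)  # [('balanced scorecard', 140), ('leadership', 90), ('ethics', 70)]
print(missing_from_navigation(ranked, ["Leadership", "Strategy"]))
# ['balanced scorecard', 'ethics']
```

Exact label matching is deliberately strict; a cluster flagged as missing may still be covered under a differently worded heading, so review the list by hand.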
  • 22. WK Top Clusters (cluster: frequency)
    • innovation: 867
    • balanced scorecard: 794
    • leadership: 570
    • cases: 545
    • social media: 508
    • negotiation: 470
    • knowledge management: 457
    • ethics: 448
    • apple: 430
    • corporate social responsibility: 398
  • 23. Use your clusters: Improve Taxonomy
    • Missing categories in browse taxonomy:
      o “Balanced Scorecard”
      o “Ethics”
      o “Social media”
    • Second-level topics in the WK context
  • 27. Mid-level clustering informs editorial/curatorial activities
    • “Featured Topics”
      o Which topics to highlight this week/month/year
      o Which news items to focus on
      o Which research guides to create
      o How to formulate queries for the topics
  • 28. How about improving search?
    • The clustered list provides synonyms for the taxonomy
    • This requires human judgment and standards/guidelines for synonyms; in our case, synonyms are exact
    • Map each set of variants to one "like term" in the search engine
    Example: Balanced Scorecard, BSC, balanced score card, kaplan and norton → Balanced Scorecard
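Mapping variants to one "like term" amounts to a lookup table from each variant to its preferred term. A minimal sketch using the slide's Balanced Scorecard example; any further entries would need the same human review. The talk does not name its search engine, so the `solr_synonym_lines` output (explicit `variants => preferred` lines in the style of a Solr synonyms file) is an assumption that only helps if your engine reads that format.

```python
import re

# Exact-synonym sets, each mapped to one preferred ("like") term. The entry below
# is the slide's Balanced Scorecard example; further entries need human review.
SYNONYMS = {
    "Balanced Scorecard": ["balanced scorecard", "bsc", "balanced score card",
                           "kaplan and norton"],
}

# Invert once so each variant looks up its preferred term directly.
VARIANT_TO_TERM = {v: term for term, variants in SYNONYMS.items() for v in variants}

def canonical_query(query: str) -> str:
    """Return the preferred term for a known variant, else the query unchanged."""
    key = re.sub(r"\s+", " ", query.strip().lower())
    return VARIANT_TO_TERM.get(key, query)

def solr_synonym_lines() -> list:
    """Render explicit 'variants => preferred' lines (Solr synonyms-file style)."""
    lines = []
    for term, variants in SYNONYMS.items():
        preferred = term.lower()
        others = [v for v in variants if v != preferred]
        lines.append(f"{', '.join(others)} => {preferred}")
    return lines

print(canonical_query("BSC"))  # Balanced Scorecard
print(solr_synonym_lines()[0])
# bsc, balanced score card, kaplan and norton => balanced scorecard
```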
  • 29. Use your clusters: Improve no-hits page
  • 30. Time Commitment
    • 2 hours to 2 weeks
    • Variables include:
      o What kind of information you want to gather
      o How broad or narrow you want your clusters to be
      o How many queries you analyze
    • In our case: ~2 person-weeks
  • 31. Results vs. Time Invested
    • 2 hours: analyze top clusters; update taxonomy
    • 6 hours: analyze top clusters; update taxonomy; create new metadata
    • One week: analyze top clusters; update taxonomy; create new metadata; determine new search suggestions
  • 32. Next Steps: Autosuggest
    • Your top clusters probably make up a large percentage of what people are looking for
      o Use them to establish or supplement autosuggest!
    Example: suggestions for “innovation”
      o innovation and leadership
      o disruptive innovation
      o innovation management
      o open innovation
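Note that the example suggestions include "disruptive innovation" and "open innovation", so matching cannot be prefix-only on the whole query. A minimal sketch, assuming suggestions are drawn from logged queries and ranked by frequency: a query qualifies if every typed word is the start of one of its words. The frequencies below are hypothetical, chosen only to reproduce the slide's order.

```python
def suggest(typed: str, query_freqs: dict, k: int = 4) -> list:
    """Up to k logged queries in which every typed word prefixes some word, most frequent first."""
    typed_clean = " ".join(typed.lower().split())
    typed_words = typed_clean.split()
    if not typed_words:
        return []
    matches = [
        (query, n) for query, n in query_freqs.items()
        if query != typed_clean  # don't suggest exactly what was typed
        and all(any(w.startswith(t) for w in query.split()) for t in typed_words)
    ]
    matches.sort(key=lambda qn: (-qn[1], qn[0]))  # frequency desc, then alphabetical
    return [query for query, _ in matches[:k]]

# Hypothetical frequencies for logged queries.
query_freqs = {
    "innovation": 300, "innovation and leadership": 40, "disruptive innovation": 35,
    "innovation management": 30, "open innovation": 25, "leadership": 90,
}
print(suggest("innovation", query_freqs))
# ['innovation and leadership', 'disruptive innovation', 'innovation management', 'open innovation']
```

Word-prefix matching also handles partial typing: "innov" already surfaces "innovation" itself as the top suggestion.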
  • 33. Next Steps: New Access Structures
    • We needed an obvious way to search podcasts
      o Put in best bets for now
    • A lot of people are searching for article titles
      o Considering a simple interface/approach for select field-specific search, e.g., “title”
    • Consider adding other facets to the browse taxonomy where we have entities tagged
      o “company name”, “job type/class”, etc.
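The talk only says a simple field-specific search is under consideration. One possible approach, sketched here as an assumption rather than the planned design, is a `field: terms` prefix typed into the ordinary search box. `SEARCHABLE_FIELDS` and the prefix syntax are hypothetical; anything that doesn't match an allowed field stays a plain query.

```python
import re
from typing import Optional, Tuple

# Hypothetical set of fields exposed to users; the slide only mentions "title".
SEARCHABLE_FIELDS = {"title"}
FIELD_PREFIX = re.compile(r"^\s*(\w+)\s*:\s*(.+)$")

def parse_field_query(raw: str) -> Tuple[Optional[str], str]:
    """Split 'title: some words' into ('title', 'some words'); other input stays plain."""
    match = FIELD_PREFIX.match(raw)
    if match and match.group(1).lower() in SEARCHABLE_FIELDS:
        return match.group(1).lower(), match.group(2).strip()
    return None, raw.strip()

print(parse_field_query("Title: disruptive innovation"))  # ('title', 'disruptive innovation')
print(parse_field_query("apple"))                         # (None, 'apple')
```

Restricting the prefix to an allow-list keeps ordinary queries that happen to contain a colon from being misread as field searches.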
  • 34. Summary
    • Establish a plan/process, but be willing to tweak it as you go
    • Keep it very simple
    • Play with your data: the more we played, the better we understood which benefits each level of clustering and effort could deliver
    • Tune the process/results
      o Build staging/working prototypes
      o Repeat the process on other sites
  • 35. Thank you! And remember… TAKE ACTION!
    Constant dripping wears away the stone! (Polish proverb: “Kropla drąży skałę”)
    Questions? searchguy@hbs.edu, @ravimynampaty, http://www.slideshare.net/mynampaty/