Discover how generative AI is changing the way Mayo Clinic employees find information. This presentation will cover practical uses of AI in internal search tools and show real examples of its impact. You'll learn about the importance of trust, transparency, and making sure AI understands the context. The session will also share best practices from testing and research, offer strategies for responsibly adding AI to your systems, and suggest ways to make AI-powered search tools easier to use.
Smart Search, Smarter Care: Powering Enterprise Search with Generative AI
1. SMART SEARCH,
SMARTER CARE
POWERING ENTERPRISE SEARCH WITH GENERATIVE AI
Gianna Pfister-LaPin
Center for Digital Health
Mayo Clinic
IntraNET Reloaded USA
Hyatt Regency Long Beach
March 20 – 22, 2024
Recovered from lunch?
Ready to talk about generative AI?
My name is Gianna Pfister-LaPin, from the Center For Digital Health at Mayo Clinic
Level-set expectations
I am a user experience (UX) researcher
I focus on the end users of a product
Work with Employee Platform product team
Conduct research to design and optimize products of Mayo Clinic’s digital workplace
This will be user-centered, non-technical POV.
Don’t try to read all the text on the slides
This work comes from many talented colleagues who are breaking new ground in this space.
Images throughout the slides come from Midjourney
Midjourney runs as a bot on a Discord server
Runs on prompts – all prompts are at the end of this deck.
This was created with “generate an illustration of generative ai” and it kept making human faces (female) then started making brains
Gartner publishes Hype Cycles – best guess on impact of disruptive technology
It tracks technology from inception > mainstream adoption, assumes all tech will follow the same pattern
“Innovation Trigger” > “Peak of Inflated Expectations” > “Trough of Disillusionment”
“Slope of Enlightenment” > “Plateau of Productivity”
And guess where Generative AI falls on this.
Here’s an emoji form of the Hype Cycle.
Everyone’s heard of ChatGPT
Here is where it sits in the AI family tree
Generative AI – a type of AI that creates original content mimicking its source material
Foundation Models – trained on massive unlabeled datasets (e.g. the entire Internet: text, images, videos, music, binary code) – all content, everywhere
Large Language Models (LLMs) are trained specifically on collections of text and designed to output written language (ex. BERT, PaLM)
GPTs (Generative Pretrained Transformers) – a group of models, created by OpenAI, that actually model how language works: grammar, sentence structure (ex. GPT-3, GPT-4)
ChatGPT – OpenAI's commercial product that lets people like you and me use a GPT model without having to learn how to program in Python.
Talking about search on the INTERnet
Tell a story:
Bluesky – GenZ can’t use Google
That’s because Google has changed over the last 15–20 years – for many queries it’s basically defunct
Results are irrelevant & full of spam and advertising
Academic studies have documented this decline
Search hack – put “reddit” in your search query
The public is lacking trust in Google’s results
“Trust” is a key theme throughout this presentation
Diagram by Dion Hinchcliffe
GenAI is a protective layer between an org’s knowledge and the AI-enabled knowledge worker
When an employee has access to this kind of tool they become more: efficient, strategic, creative, collaborative
It not only provides access to the knowledge, it helps them think about the knowledge in new & different ways
Generative AI is the new search.
Mayo Clinic’s 2030 strategy is to Cure, Connect, and Transform.
AI and machine learning technologies are key to the strategic plan
Mayo = innovation leader in the delivery of healthcare for years
Integrating digital transformation technologies like AI into all areas of the organization.
It’s estimated that around 30% of medicine can benefit from automation.
Mayo is reducing administrative burden / manual tasks for their workforce
Then they can focus on tasks that are best performed by humans.
Mayo Clinic is shaping the future of work through two case studies that feature genAI
We’re taking steps to ensure the outcomes are accurate and trustable
Both projects = internal search for employees.
Searching for information is one application of genAI technology that a lot of companies are exploring.
This first project deals with a clinical decision tool for clinicians.
AskMayoExpert is an internal decision-making resource that gives physicians quick access to medical guidelines and best practices at the point-of-care. It's part of a bigger ecosystem that helps “Mayo know what Mayo knows”
It contains treatment plans, care process models, which are flowcharts for making care decisions, also information for patients, and ways to find and connect with other Mayo specialists quickly.
It started as a way to ensure every patient gets the same high-quality care at all Mayo Clinic locations and their partner facilities.
AskMayoExpert has some links on the front page and some navigational options, but the primary starting point is this search box. You can see, it has predictive search and offers suggestions as you type.
So the team already knew that AME is a trusted source of knowledge that reduces clinicians’ anxiety about making a treatment/diagnosis mistake, but clinicians are frustrated with the way search currently works. It’s very “brittle.”
Here is the AME Product Team:
Daveena Tauber and Jill Meyerson are the brilliant researchers who designed and facilitated two research studies for this project. They authored the insights and findings that I’m going to share with you. Their work really enables the product team to move forward quickly in implementing this new technology.
Matt Gardner is AME’s product manager and is responsible for steering the product boat and keeping all the stakeholders happy. As you can imagine, Mayo’s leadership likes to keep a close eye on how things are progressing with this and Matt does a great job managing those relationships.
Dr Yunguo Yu (yooun-gou yoo) is an AI expert we brought in for this initiative. He built the project’s first working LLM prototype on his local computer – a rudimentary model, just to show it could be done. It wasn’t optimized or pretty, but it was essential for conducting the subsequent research.
These two research projects took place in the second half of last year and really helped the team understand how current AME users searched for information, how they phrased their search queries, and what they would expect a genAI-enabled search to do for them.
The first study also focused on finding out how search queries were formatted, which helped the team create a question bank to use in training the LLM for the proof-of-concept.
Both research projects used the same general methodology:
They recruited internal participants who had experience with the AME tool from a variety of specialties and roles
The product team also consulted with subject matter experts who happened to have specialized knowledge about AI
They conducted virtual interviews and had participants do things like look at screenshots, look at previous AME queries and talk about what they were thinking when they did them, or try generating queries for a working prototype of a genAI search tool.
For the first study, they asked participants to think about how they would frame the question if they were talking to a knowledgeable colleague.
And finally, the researchers talked quite a bit about how these participants used AME in their daily practice, how they searched for information, and what they thought about using genAI to access content within AME.
This is an example of the kinds of questions created after the first study, and the answers generated by the proof-of-concept model. This is what the participants were shown in order to get their feedback.
Please note that each answer included the content ID of the data source used to generate the response. The participants picked up on this right away.
Other questions we used to train the LLM model
- What questions do I need to ask my patient?
- What diagnostic tests do I need to run and in what order?
- What are the differential diagnoses for [insert symptoms]?
- What is the cause of [insert condition/symptom, etc.]?
- What is the treatment for [insert condition name]?
- What are the normal/abnormal ranges for this test result?
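Several of these questions are templates with fill-in slots. As a rough illustration of how such a bank might be expanded into concrete evaluation queries, here is a minimal Python sketch; the template syntax, condition values, and helper names are my own assumptions, not the team's actual tooling.

```python
# Hypothetical sketch: expanding templated clinical questions into a
# concrete question bank for evaluating an LLM-backed search tool.
# Templates and slot values are illustrative, not Mayo Clinic data.

TEMPLATES = [
    "What are the differential diagnoses for {symptoms}?",
    "What is the cause of {condition}?",
    "What is the treatment for {condition}?",
]

def build_question_bank(templates, slot_rows):
    """Fill each template with every row that provides its slot names."""
    bank = []
    for template in templates:
        for values in slot_rows:
            try:
                bank.append(template.format(**values))
            except KeyError:
                continue  # this row doesn't provide the slot this template needs
    return bank

slot_rows = [
    {"condition": "atrial fibrillation"},
    {"symptoms": "chest pain and shortness of breath"},
]

questions = build_question_bank(TEMPLATES, slot_rows)
```

Skipping template/row mismatches keeps the bank free of half-filled questions as new slot values are sourced from more user groups.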
And that fact really speaks to one of the biggest outcomes of these studies.
Trust Requires Transparency. The whole issue of data provenance or exposing the origin and history of a piece of content, is very important to our users. As I mentioned earlier, AME is a trusted source of knowledge that helps reduce anxiety about making the right care decisions, and we want them to know they can continue to trust this product.
Contextual Relevance: in the current state, participants liked the tool overall but didn’t think it would be immediately useful in their practice today. It still needs quite a bit of work in the form of additional functionality, like the ability to add patient details which would ideally generate recommendations customized to that patient
Expert Led: The design of clinical products – really, any product in a highly specialized industry – needs to be expert-led. Clinicians don’t have patience for unrealistic prototypes. The team relied on subject-matter experts to make sure things like terminology are accurate.
One Of Many Tools: Both stakeholders and participants agreed that this way of searching AME is a potentially very helpful tool in the care toolbox and deserves to be resourced appropriately so it can move forward quickly.
Lastly, coming back to trust again, the researchers learned that user trust is transitive. It can be transferred to a person, an organization, or a brand, but it is brittle. When participants saw typos or inaccurate answers, they reacted with distrust and worry.
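To make the provenance point concrete: one common pattern is to carry the source content ID alongside every generated answer, exactly the detail participants noticed. The sketch below is a heavily simplified, hypothetical version; the document store, IDs, and naive word-overlap retrieval are illustrative assumptions, not AME's implementation.

```python
# Hypothetical sketch of "trust requires transparency": every generated
# answer carries the content ID of the source document it came from.

DOCUMENTS = {
    "AME-1042": "Beta blockers are first-line therapy for condition X.",
    "AME-2213": "Order test Y before test Z when symptom W is present.",
}

def retrieve(query):
    """Naive retrieval: return the first document sharing a word with the query."""
    query_words = set(query.lower().split())
    for content_id, text in DOCUMENTS.items():
        if query_words & set(text.lower().split()):
            return content_id, text
    return None, None

def answer_with_provenance(query):
    content_id, text = retrieve(query)
    if content_id is None:
        return {"answer": None, "source": None}
    # A real system would pass `text` to an LLM; here we simply echo it.
    return {"answer": text, "source": content_id}

result = answer_with_provenance("What is the therapy for condition X?")
```

Surfacing `source` in the UI lets a clinician jump straight to the vetted document behind the answer.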
As you can imagine, the roadmap for this product is quite ambitious and moving ahead quickly.
Next Steps
The next thing the team wants to do is expand the question bank to increase the realism of the model; to do this, they’ll source more questions from a wider variety of user groups across the organization.
They would like to add in the other types of knowledge that clinicians already have access to in AME, including care process models which weren’t included in the training dataset.
There are plans to open the model up so people can directly interact with it in real time. That will greatly increase their understanding of how care providers actually want to use this tool.
Near Future
Looking a little further out, there will be a need to scale up the capacity of the model and its infrastructure to reduce the time spent waiting for an answer to be generated. Right now it’s a little on the slow side.
And they have been talking about conducting pilots actually in the clinic, which would help with understanding exactly how and why a clinician turns to AME and how they’d integrate this new tool into their workflow.
Lastly, discussions about how to best conduct training have come up, which is so important but also challenging when users are as busy as our clinicians are.
Of course every step forward will be taken carefully only after thorough testing. That is truly non-optional.
Moving on, this next case study has to do with how Mayo Clinic is leveraging genAI for enterprise search.
Mayo’s decentralized intranet is robust and, like many other intranets, provides access to tools and platforms needed for clinical and business operations
However, it’s really difficult to find what you need in this extensive knowledge system. Employees complain regularly that it doesn’t meet their needs and it really affects productivity and efficiency.
We’ve tried many different approaches to solve this problem. We’ve used different technology solutions, like Google Search Appliance and SearchBlox.
We’ve also tried to improve the underlying business processes that are causing problems with search.
So the team is now looking at how genAI might help make search better for employees.
The product team supporting Search consists of:
Lisa Semidey and Caroline Little are the talented UX researchers that created the findings I am sharing with you, and ensured they are robustly supported by user insights.
Craig Hobson designed and programmed the Figma prototypes that were used to test assumptions.
Brad Herr and Katie Mau are the product leaders on this team. Brad’s strategic collaboration with internal stakeholders has been invaluable, and Katie’s day-to-day guidance ensured the team kept on track and stayed true to the overall product vision
These studies were conducted in Q1 of this year and results literally just came in a few weeks ago.
Both studies were exploratory in nature and were conducted so the researchers could really understand how employees were currently searching on the intranet. They wanted to get beyond the complaints and the “search sucks” comments to understand what people really needed from search.
They also wanted to get some initial feedback on a potential interface for what search could look like when genAI is applied.
They followed this up with another round of interviews and testing, using the feedback from the first study to refine the concepts.
As with the first case study, both research projects used the same general methodology:
Participants came from a variety of job roles, both clinical and non-clinical. Both on-site and remote workers were included.
The researchers grouped participants into four different types depending on whether they had prior experience working with generative AI or products like ChatGPT or Bing or Copilot, and if they did, how comfortable they were with them.
Participants were interviewed about these previous experiences, what they thought of AI in general, and what their expectations might be to use genAI to search Mayo’s intranet
After the initial questions, participants were shown some concepts in the form of a Figma prototype that had limited interaction, and the facilitators observed how they explored the interface and had a dialog about it.
Craig the designer created three different “front doors” into the test scenario: it’s open enrollment time and you have questions about your benefits. This scenario is plausible for every employee, so we didn’t need multiple scenarios for different job roles.
Obviously these are flat screenshots so they don’t show the interactive nature of these prototypes, but I wanted to give you a little taste of how the team put together the concepts
So we have a version that’s essentially the same search we currently have, but it's enhanced with a featured result at the top of the results list. There’s also a chat button at the top that launches a chat dialog.
We have a version that adds a generated summary at the top
And we have one that’s purely a conversational chat interface, that shows a question-and-answer back and forth type experience.
Looking at them, you can see they’re inspired by other commercial products out in the world, like Bing or Google Bard, but they take our employees’ needs into consideration rather than blindly following the outside world with a mindset of “it’s good enough for OpenAI, so it must be the right way of doing things.”
AI Chat Expectations vs Reality: This was really interesting because it was very clear people have different ideas of how a “chat” worked, especially in relation to running a search. People who were familiar with products like ChatGPT just assumed it worked like that, but people who didn’t have that experience thought it would be more like a tech support chat or a customer-service type chat with a human on the other end. So, especially if you are thinking about multi-turn chat, which allows you to ask follow-up questions within the same conversation, what kind of chat experience you are expecting can make an impact on how much detail the user puts into the initial search query.
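Multi-turn chat, mentioned above, typically just means the client resends the growing conversation history with each new question so follow-ups are interpreted in context. A minimal sketch, assuming a stand-in `generate` function in place of a real LLM call:

```python
# Minimal sketch of multi-turn chat state. `generate` is a placeholder
# for an LLM backend; it only reports how much context it received.

def generate(history):
    """Stand-in for an LLM call; echoes how much context it was given."""
    return f"(answer using {len(history)} prior messages)"

class ChatSession:
    def __init__(self):
        self.history = []  # list of {"role", "content"} dicts

    def ask(self, question):
        # Append the user turn, generate with full context, record the reply.
        self.history.append({"role": "user", "content": question})
        reply = generate(self.history)
        self.history.append({"role": "assistant", "content": reply})
        return reply

session = ChatSession()
session.ask("When is open enrollment?")
followup = session.ask("What about dental coverage?")
```

Because the second call sees the first exchange, "What about dental coverage?" can be answered as a benefits follow-up rather than a brand-new query.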
Guidance and Training: It is very important that we provide employees with resources to help them understand what this new search can do for them. This can be both good instructions and labels within the tool, as well as structured tutorials, formal policies and guidelines, and potentially a sandbox environment where users can practice without worrying about making mistakes.
Personalization: The participants who noticed that the concepts showed they were effectively “logged in” or personally recognized liked this idea, and said it would be valuable if search could use that information about them to serve more relevant results. This could be based on where they were in the organization, what job role they had, or what they had searched for previously.
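As a sketch of how that kind of personalization could work, here is a toy re-ranker that boosts results matching the signed-in employee's role or site. The profile fields, boost weights, and result metadata are all illustrative assumptions.

```python
# Hypothetical personalization sketch: re-rank search results by adding
# small boosts for matches against the signed-in user's profile.

def personalized_rank(results, profile):
    """Sort results by base score plus a boost for profile matches."""
    def score(result):
        boost = 0.0
        if result.get("role") == profile.get("role"):
            boost += 0.5   # result targeted at the user's job role
        if result.get("site") == profile.get("site"):
            boost += 0.25  # result targeted at the user's location
        return result["score"] + boost
    return sorted(results, key=score, reverse=True)

results = [
    {"title": "Benefits overview", "score": 1.0, "role": None, "site": None},
    {"title": "Nurse benefits FAQ", "score": 0.8, "role": "nurse", "site": "Rochester"},
]
profile = {"role": "nurse", "site": "Rochester"}
ranked = personalized_rank(results, profile)
```

With both boosts applied, the role-specific FAQ overtakes the generic page, which is the behavior participants said they'd find valuable.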
Content Strategy: Finally – and this insight was more a byproduct of designing interface solutions for the prototypes – genAI won’t solve all our problems. It may do a great job of improving overall findability and of integrating collections of content that couldn’t be searched together previously, but it won’t ensure our content is properly vetted or tagged, or that it exists at all. It also won’t give overworked content authors time they don’t have to create content, or ensure it’s brand compliant.
Just like in the other project, this one is moving very quickly.
All the results obtained from the previous research are being incorporated into updated concepts and designs. The product team is working with the designers to speed this iterative process along.
At the same time, product leaders are working to get on the same page with our internal technology office on what the preferred solution will be. That decision will probably have an impact on how the next prototype looks and behaves. There is a desire to keep the same technology for our internal search as what’s on the external public site.
Once that decision is made, they can build a working model that can be used to run tests, and do both preliminary smoke testing as well as use it to run user research.
Using an interactive prototype in Figma is a great way to build something that kind of hints at functionality, but it can be time consuming to keep changing it and updating it. Having a working model would make the testing process much more realistic for the participants.
I’d like to summarize all these findings into a set of recommendations that you can use to help guide your genAI implementation project, if you are planning one in the near future or are still in the early stages.
Some of them may seem very obvious, but they are all research-based and important to consider during a project like this.
The first group is Trust and Reliability. There’s that Trust theme again!
Establish & Maintain Trust: To build and keep employee trust, make sure your genAI tool is seen as a reliable part of your organization's knowledge sources. That means training it on data that has been thoroughly cleaned and checked for errors. Ensuring the answers the tool generates are accurate, and being upfront about what the tool can and can't do, will keep your employees' trust.
Manage Perceptions: When it comes to shaping how people see and feel about this tool, it should align with what your organization sees as valuable. You will also want to be transparent about what steps you are taking to keep their data -- and the organization's data -- safe and the tool secure, because people are very concerned about these risks.
Evaluate Performance: Set clear benchmarks for how well you expect the tool to perform, especially when it comes to getting things right in critical areas like patient care. Plan to regularly check how it's doing compared to your expectations and get input from experts and any necessary regulatory or oversight groups.
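A benchmark like that can start as a scripted question set run on a schedule. This is a toy sketch: the question set and keyword-match criterion are stand-ins for the expert review a real clinical evaluation would require.

```python
# Illustrative accuracy check: run a benchmark question set through the
# tool and compare answers to expert-approved reference keywords.

BENCHMARK = [
    {"question": "First-line drug for condition X?", "must_contain": "beta blocker"},
    {"question": "First test for symptom W?", "must_contain": "test y"},
]

def evaluate(answer_fn, benchmark, threshold=0.9):
    """Return (accuracy, passed) for a callable that answers questions."""
    correct = sum(
        1 for item in benchmark
        if item["must_contain"] in answer_fn(item["question"]).lower()
    )
    accuracy = correct / len(benchmark)
    return accuracy, accuracy >= threshold

# A toy answerer that only knows about condition X:
accuracy, passed = evaluate(
    lambda q: "Beta blockers are first-line." if "condition X" in q else "Unknown.",
    BENCHMARK,
)
```

Failing the threshold would trigger a review with subject-matter experts before the model moves forward.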
The next recommendations have to do with the overall User Experience.
Usefulness & Relevance: You'll want to focus on making this tool useful, not just usable. It needs to give answers that are straight-to-the-point, that are exactly what someone is looking for when they want them. They should be easy to scan and lead straight to more detailed information if needed, without making people click around for it.
Training: Develop some solid training on how to get the best possible results out of the tool. And make sure employees have the bandwidth or space in their schedule to actually take it. They should understand what kinds of questions the tool can answer, how to format their queries the best way, and what it just can't help with.
Design for Non-Answers: A genAI-enabled search tool won't be able to answer every question, but it may hallucinate an answer if it can't find one in its data source. Be sure to account for the possibility that it doesn't know the answer, or isn't sure about the answer, and ideally have it direct users to someone in the organization who does have the answer, or to other reliable resources.
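One simple way to design for non-answers is a confidence gate: if retrieval confidence is below a threshold, return an explicit fallback that points to a human resource instead of letting the model guess. The threshold, retriever, and contact below are illustrative assumptions.

```python
# Sketch of "design for non-answers": surface an answer only when
# retrieval confidence clears a minimum bar; otherwise route to a human.

FALLBACK = "I couldn't find a reliable answer. Please contact the HR Connect team."

def safe_answer(query, retrieve_fn, min_confidence=0.7):
    """Return the retrieved answer, or an explicit fallback when unsure."""
    answer, confidence = retrieve_fn(query)
    if answer is None or confidence < min_confidence:
        return FALLBACK
    return answer

# A toy retriever with exactly one known topic:
def toy_retrieve(query):
    if "enrollment" in query.lower():
        return "Open enrollment runs during November.", 0.95
    return None, 0.0

good = safe_answer("When is open enrollment?", toy_retrieve)
bad = safe_answer("What is the meaning of life?", toy_retrieve)
```

An honest fallback preserves the trust theme: users would rather be redirected than be handed a hallucinated answer.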
Lastly, this group of recommendations deals with the actual implementation of genAI in your organization.
Transition Plan: Change management might be an outdated concept in today’s climate of constant disruption. Sometimes it feels like all we can do is hang on and survive, let alone have time and brainpower to actually make and follow a plan. But implementation of AI can really shake your workforce to the existential core of who and what they are, and employees who are nervous or suspicious are not as productive and efficient as they could be. You might want to think about bringing in a consultant who has experience putting together a cohesive plan that can take into account how disruptive AI can be in an organization.
Communicate Proactively: Not only should you be communicating with employees, but you should also bring your stakeholders on board very, very early. I’m talking all the way up to the CTO or CEO. Plan to present your progress regularly, and when you encounter resistance or you sense they are at all hesitant about your proposal, address it right away. And if they aren’t a little nervous or hesitant, then you probably aren’t doing it right.
Iteratively Improve: Finally, this is a product you will want to iteratively improve on. I may be biased because of the work I do, but I do believe that listening to your users is absolutely essential. Pay attention to what kinds of queries employees are putting into the tool, and what kinds of answers are coming out of it. I've recommended to the product team I support to interview at least five users every week to stay on top of how the tool is being used. You don't have to bring in dozens of people to get a sense of what's happening; usually five is enough.
That’s all! Thank you so much for your time and attention. We have X minutes for questions.
Frequently Asked Questions
Technology
AI is supported by an internal technology office (OCTO), which manages the platform and provides services to multiple customers and projects throughout Mayo Clinic
Using Google Vertex: https://cloud.google.com/vertex-ai