SlideShare a Scribd company logo
1 of 29
Dr. David Talby
NATURAL LANGUAGE UNDERSTANDING IN HEALTHCARE:
STATE OF THE ART NLP, MACHINE LEARNING AND DEEP
LEARNING WITH OPEN SOURCE SOFTWARE
CONTENTS
1. THE IMPORTANCE OF NLP IN HEALTHCARE AI
2. NLP IS ULTRA DOMAIN SPECIFIC
3. STATE OF THE ART RESEARCH RESULTS
4. SPARK NLP: PRODUCTION GRADE & AT SCALE
AI VS. DOCTORS
Deep Learning
Computer
Vision
Access to Care
Diagnostic
Accuracy
NLP IN HEALTHCARE
Deep Learning
NLP
Efficiency
Accuracy
Radiology Diagnostic
Mental
Health
Safety
Events
Inpatient
Pre-
Auth
Key
Opinion
Leaders
Research
Meta
Analysis
Clinical
Coding
Financial
Anti-
Fraud
Adverse
Events
Drug Development
Recruit
for Trials
RISK PREDICTION CASE STUDY: DETECTING SEPSIS
“Compared to previous work that only
used structured data such as vital signs
and demographic information, utilizing
free text drastically improves the
discriminatory ability (increase in AUC
from 0.67 to 0.86) of identifying
infection.”
COHORT SELECTION CASE STUDY: ONCOLOGY
“Using the combination of structured and
unstructured data, 8324 patients were
identified as having advanced NSCLC.
Of these patients, only 2472 were also in the
cohort generated using structured data only.
Further, 1090 patients would be included in the
structured data only cohort who should have
been excluded based on additional data.”
CONTENTS
1. THE IMPORTANCE OF NLP IN HEALTHCARE AI
2. NLP IS ULTRA DOMAIN SPECIFIC
3. STATE OF THE ART RESEARCH RESULTS
4. SPARK NLP: PRODUCTION GRADE & AT SCALE
WHAT MAKES LANGUAGE HARD
• Nuanced
– Sure / I agree / Absolutely! / Whatever / Yes sir / Just to see you smile ❤️
• Fuzzy
– Blue, New, Tall, Child, Tell, Do
• Contextual
– “Patient denies alcohol abuse”
• Medium specific
– “SGTM c u in 15”
• Domain specific
– All forward-looking statements included in this document are based on
information available to us on the date hereof, and we assume no obligation to
revise or publicly release any revision to any such forward-looking statement,
except as may otherwise be required by law.
We all speak many languages.
THE FIRST RULE OF NLP:
ED Triage Notes
states started last night, upper abd, took alka seltzer approx
0500, no relief. nausea no vomiting
Since yeatreday 10/10 "constant Tylenol 1 hr ago. +nausea.
diaphoretic. Mid abd radiates to back
Generalized abd radiating to lower x 3 days accompanied
by dark stools. Now with bloody stool this am. Denies dizzy,
sob, fatigue. Visiting from Japan on business.”
Features
Type of Pain
Intensity of Pain
Body part of region
Symptoms
Onset of symptoms
Attempted home remedy
EMERGENCY ROOM LANGUAGE
Different Vocabulary
Different Grammar
Different Context
Different Meaning
Different
Language Models
Tokenizer Normalizer
Lemmatizer Fact Extraction
Part of Speech Tagger Spell Checker
Coreference Resolution Dependency Parser
Sentence Splitting Negation Detection
Named Entity Recognition Sentiment Analysis
Intent Classification Summarization
Word Embeddings Emotion Detection
Question Answering Relevance Ranking
Best Next Action Translation
CONTENTS
1. THE IMPORTANCE OF NLP IN HEALTHCARE AI
2. NLP IS ULTRA DOMAIN SPECIFIC
3. STATE OF THE ART RESEARCH RESULTS
4. SPARK NLP: PRODUCTION GRADE & AT SCALE
NAMED ENTITY RECOGNITION
“DEEP LEARNED” NER
“Entity Recognition from Clinical Texts via Recurrent Neural Network”.
Liu et al., BMC Medical Informatics & Decision Making, July 2017.
F-Score Dataset Task
85.81% 2010 i2b2 Medical concept extraction
92.29% 2012 i2b2 Clinical event detection
94.37% 2014 i2b2 De-identification
ENTITY RESOLUTION
“DEEP LEARNED” ENTITY RESOLUTION
“CNN-based ranking for biomedical entity normalization”.
Li et al., BMC Bioinformatics, October 2017.
F-Score Dataset Task
90.30%
ShARe /
CLEF
Disease & problem norm.
92.29% NCBI Disease norm. in literature
ASSERTION STATUS DETECTION
Prescribing sick days due to diagnosis of influenza. Positive
Jane complains about flu-like symptoms. Speculative
Jane’s RIDT came back clean. Negative
Jane is at risk for flu if she’s not vaccinated. Conditional
Jane’s older brother had the flu last month. Family history
Jane had a severe case of flu last year. Patient history
“DEEP LEARNED” ASSERTION STATUS DETECTION
“Improving Classification of Medical Assertions in Clinical Notes“
Kim et al., In Proceedings of the 49th Annual Meeting of the Association for Computational
Linguistics: Human Language Technologies, 2011.
Dataset Metric
94.17%
4th i2b2/VA
Mirco-averaged F1
79.76% Marco-averaged F1
WORD EMBEDDINGS
BIOMEDICAL WORD EMBEDDINGS
“How to Train Good Word Embeddings for Biomedical NLP”.
Kim et al., In Proceedings of the 49th Annual Meeting of the Association for Computational
Linguistics: Human Language Technologies, 2011.
• Intrinsic measures (similarity between concepts)
• Extrinsic measures (improving NER)
• Hyperparameter optimization
• Sub-sampling
• Minimum-count
• Learning rate
• Vector dimension
• Context window size
• Available under an open license here
CONTENTS
1. THE IMPORTANCE OF NLP IN HEALTHCARE AI
2. NLP IS ULTRA DOMAIN SPECIFIC
3. STATE OF THE ART RESEARCH RESULTS
4. SPARK NLP: PRODUCTION GRADE & AT SCALE
NLP FOR APACHE SPARK
Data Sources API
Spark Core API (RDD’s, Project Tungsten)
Spark SQL API (DataFrame, Catalyst Optimizer)
Spark ML API (Pipeline, Transformer, Estimator)
Part of Speech Tagger
Named Entity Recognition
Sentiment Analysis
Spell Checker
Tokenizer
Stemmer
Lemmatizer
Entity Extraction
Topic Modeling
Word2Vec
TF-IDF
String distance calculation
N-grams calculation
Stop word removal
Train/Test & Cross-Validate
Ensembles
High Performance Natural Language Understanding at Scale
Design Goals
• State of the art Performance & Scale
• Frictionless Reuse
• Enterprise Grade
Built on the Spark ML API’s
Apache 2.0 Licensed
Active development & support
NLP FOR APACHE SPARK: COMBINED NLP & ML PIPELINES
pipeline = pyspark.ml.Pipeline(stages=[
document_assembler,
tokenizer,
stemmer,
normalizer,
stopword_remover,
tf,
idf,
lda])
topic_model = pipeline.fit(df)
Spark NLP annotators
Spark ML featurizers
Spark ML LDA implementation
Single execution plan for whole pipeline
PERFORMANCE BENCHMARKS
• Training was 80x faster to train on 2.6MB
• Training was 38x faster on 100k
• Training on 100k & 2.6MB took roughly the same
• Additional near-linear speedup on a cluster
• Prediction was 1.6x faster on 75MB
• Prediction was 1.4x faster on 15MB
• Adding NLP stages takes roughly the same
• Additional near-linear speedup on a cluster
HEALTHCARE EXTENSIONS
Data Sources API
Spark Core API (RDD’s, Project Tungsten)
Spark SQL API (DataFrame, Catalyst Optimizer)
Spark ML API (Pipeline, Transformer, Estimator)
Part of Speech Tagger
Named Entity Recognition
Sentiment Analysis
Spell Checker
Tokenizer
Stemmer
Lemmatizer
Entity Extraction
Topic Modeling
Word2Vec
TF-IDF
String distance calculation
N-grams calculation
Stop word removal
Train/Test & Cross-Validate
Ensembles
High Performance Natural Language Understanding at Scale
com.johnsnowlabs.nlp.clinical.*
Healthcare specific NLP
annotators for Spark in
Scala, Java or Python:
• Entity Recognition
• Value Extraction
• Word Embeddings
• Assertion Status
• Sentiment Analysis
• Spell Checking, …
data.johnsnowlabs.com/health
1,800+ Expert curated,
clean, linked, enriched &
always up to date data:
• Terminology
• Providers
• Demographics
• Clinical Guidelines
• Genes
• Measures, …
CASE STUDY: DEMAND FORECASTING OF ADMISSIONS FROM ED
Features from Structured Data
• How many patients will be admitted today?
• Data Source: EPIC Clarity data
Reason for visit
Age
Gender
Vital signs
Current wait time
Number of orders
Admit in past 30 days
Type of insurance
CASE STUDY: DEMAND FORECASTING OF ADMISSION FROM ED
Features from Natural Language Text
• A majority of the rich relevant content lies in unstructured notes that
are contributed by doctors and nurses from patient interactions.
• Data Source: Emergency Department Triage notes and other ED notes
Type of Pain
Intensity of Pain
Body part of region
Symptoms
Onset of symptoms
Attempted home remedy
Accuracy Baseline: Human manual prediction
ML with structured data
ML with NLP
USING SPARK NLP
• Homepage: https://nlp.johnsnowlabs.com
– Getting Started, Documentation, Examples, Videos, Blogs
– Join the Slack Community
• GitHub: https://github.com/johnsnowlabs/spark-nlp
– Open Issues & Feature Requests
– Contribute!
• The library has Scala and Python 2 & 3 API’s
• Get directly from maven-central or spark-packages
• Tested on all Spark 2.x versions
david@pacific.ai
@davidtalby
in/davidtalby
THANK YOU!

More Related Content

What's hot

Drug discovery using ai
Drug discovery using aiDrug discovery using ai
Drug discovery using aiSukant Khurana
 
Heart disease prediction system
Heart disease prediction systemHeart disease prediction system
Heart disease prediction systemSWAMI06
 
Machine Learning in Healthcare
Machine Learning in HealthcareMachine Learning in Healthcare
Machine Learning in HealthcareBigR.io
 
Drivers of Remote Patient Monitoring (RPM)
Drivers of Remote Patient Monitoring (RPM)Drivers of Remote Patient Monitoring (RPM)
Drivers of Remote Patient Monitoring (RPM)CRF Health
 
Data Mining : Healthcare Application
Data Mining : Healthcare ApplicationData Mining : Healthcare Application
Data Mining : Healthcare Applicationosman ansari
 
Machine Learning for Disease Prediction
Machine Learning for Disease PredictionMachine Learning for Disease Prediction
Machine Learning for Disease PredictionMustafa Oğuz
 
A study on “impact of artificial intelligence in covid19 diagnosis”
A study on “impact of artificial intelligence in covid19 diagnosis”A study on “impact of artificial intelligence in covid19 diagnosis”
A study on “impact of artificial intelligence in covid19 diagnosis”Dr. C.V. Suresh Babu
 
Data science in health care
Data science in health careData science in health care
Data science in health careChetan Khanzode
 
Artificial intelligence enters the medical field
Artificial intelligence enters the medical fieldArtificial intelligence enters the medical field
Artificial intelligence enters the medical fieldRuchi Jain
 
A Heart Disease Prediction Model using Logistic Regression
A Heart Disease Prediction Model using Logistic RegressionA Heart Disease Prediction Model using Logistic Regression
A Heart Disease Prediction Model using Logistic Regressionijtsrd
 
Electronic health records and machine learning
Electronic health records and machine learningElectronic health records and machine learning
Electronic health records and machine learningEman Abdelrazik
 
Artificial Intelligence in Medicine
Artificial Intelligence in MedicineArtificial Intelligence in Medicine
Artificial Intelligence in MedicineNancy Gertrudiz
 
Tools and techniques adopted for big data analytics
Tools and techniques adopted for big data analyticsTools and techniques adopted for big data analytics
Tools and techniques adopted for big data analyticsJOSEPH FRANCIS
 
Heart disease prediction
Heart disease predictionHeart disease prediction
Heart disease predictionAriful Haque
 
AI in Healthcare: From Hype to Impact (updated)
AI in Healthcare: From Hype to Impact (updated)AI in Healthcare: From Hype to Impact (updated)
AI in Healthcare: From Hype to Impact (updated)Mei Chen, PhD
 
Artificial Intelligence in Health Care
Artificial Intelligence in Health Care Artificial Intelligence in Health Care
Artificial Intelligence in Health Care 247 Labs Inc
 
AI in Healthcare: Defining New Health
AI in Healthcare: Defining New HealthAI in Healthcare: Defining New Health
AI in Healthcare: Defining New HealthKumaraguru Veerasamy
 

What's hot (20)

Drug discovery using ai
Drug discovery using aiDrug discovery using ai
Drug discovery using ai
 
Heart disease prediction system
Heart disease prediction systemHeart disease prediction system
Heart disease prediction system
 
Machine Learning in Healthcare
Machine Learning in HealthcareMachine Learning in Healthcare
Machine Learning in Healthcare
 
Drivers of Remote Patient Monitoring (RPM)
Drivers of Remote Patient Monitoring (RPM)Drivers of Remote Patient Monitoring (RPM)
Drivers of Remote Patient Monitoring (RPM)
 
Data Mining : Healthcare Application
Data Mining : Healthcare ApplicationData Mining : Healthcare Application
Data Mining : Healthcare Application
 
Machine Learning for Disease Prediction
Machine Learning for Disease PredictionMachine Learning for Disease Prediction
Machine Learning for Disease Prediction
 
A study on “impact of artificial intelligence in covid19 diagnosis”
A study on “impact of artificial intelligence in covid19 diagnosis”A study on “impact of artificial intelligence in covid19 diagnosis”
A study on “impact of artificial intelligence in covid19 diagnosis”
 
AI in Healthcare
AI in HealthcareAI in Healthcare
AI in Healthcare
 
ARTIFICIAL INTELLIGENCE ROLE IN HEALTH CARE Dr.T.V.Rao MD
ARTIFICIAL INTELLIGENCE ROLE IN HEALTH CARE  Dr.T.V.Rao MDARTIFICIAL INTELLIGENCE ROLE IN HEALTH CARE  Dr.T.V.Rao MD
ARTIFICIAL INTELLIGENCE ROLE IN HEALTH CARE Dr.T.V.Rao MD
 
Data science in health care
Data science in health careData science in health care
Data science in health care
 
Artificial intelligence enters the medical field
Artificial intelligence enters the medical fieldArtificial intelligence enters the medical field
Artificial intelligence enters the medical field
 
A Heart Disease Prediction Model using Logistic Regression
A Heart Disease Prediction Model using Logistic RegressionA Heart Disease Prediction Model using Logistic Regression
A Heart Disease Prediction Model using Logistic Regression
 
Electronic health records and machine learning
Electronic health records and machine learningElectronic health records and machine learning
Electronic health records and machine learning
 
Artificial Intelligence in Medicine
Artificial Intelligence in MedicineArtificial Intelligence in Medicine
Artificial Intelligence in Medicine
 
Tools and techniques adopted for big data analytics
Tools and techniques adopted for big data analyticsTools and techniques adopted for big data analytics
Tools and techniques adopted for big data analytics
 
Heart disease prediction
Heart disease predictionHeart disease prediction
Heart disease prediction
 
AI in Healthcare: From Hype to Impact (updated)
AI in Healthcare: From Hype to Impact (updated)AI in Healthcare: From Hype to Impact (updated)
AI in Healthcare: From Hype to Impact (updated)
 
Tesxt mining
Tesxt miningTesxt mining
Tesxt mining
 
Artificial Intelligence in Health Care
Artificial Intelligence in Health Care Artificial Intelligence in Health Care
Artificial Intelligence in Health Care
 
AI in Healthcare: Defining New Health
AI in Healthcare: Defining New HealthAI in Healthcare: Defining New Health
AI in Healthcare: Defining New Health
 

Similar to Natural Language Understanding in Healthcare

State of the Art Natural Language Processing at Scale with Alexander Thomas a...
State of the Art Natural Language Processing at Scale with Alexander Thomas a...State of the Art Natural Language Processing at Scale with Alexander Thomas a...
State of the Art Natural Language Processing at Scale with Alexander Thomas a...Databricks
 
Deep learning for natural language understanding
Deep learning for natural language understandingDeep learning for natural language understanding
Deep learning for natural language understandingDavid Talby
 
Applying NLP to Personalized Healthcare - 2021
Applying NLP to Personalized Healthcare - 2021Applying NLP to Personalized Healthcare - 2021
Applying NLP to Personalized Healthcare - 2021David Talby
 
Apache Spark NLP for Healthcare: Lessons Learned Building Real-World Healthca...
Apache Spark NLP for Healthcare: Lessons Learned Building Real-World Healthca...Apache Spark NLP for Healthcare: Lessons Learned Building Real-World Healthca...
Apache Spark NLP for Healthcare: Lessons Learned Building Real-World Healthca...Databricks
 
NLP tutorial at AIME 2020
NLP tutorial at AIME 2020NLP tutorial at AIME 2020
NLP tutorial at AIME 2020Rui Zhang
 
The Translational Medicine
The Translational MedicineThe Translational Medicine
The Translational MedicineJoanne Luciano
 
Recent Advances in Deep Learning Techniques for Electronic Health Record
Recent Advances in Deep Learning Techniques for Electronic Health RecordRecent Advances in Deep Learning Techniques for Electronic Health Record
Recent Advances in Deep Learning Techniques for Electronic Health Recordkingstdio
 
Apache Spark NLP: Extending Spark ML to Deliver Fast, Scalable & Unified Nat...
 Apache Spark NLP: Extending Spark ML to Deliver Fast, Scalable & Unified Nat... Apache Spark NLP: Extending Spark ML to Deliver Fast, Scalable & Unified Nat...
Apache Spark NLP: Extending Spark ML to Deliver Fast, Scalable & Unified Nat...Databricks
 
What if we never agree on a common health information model?
What if we never agree on a common health information model?What if we never agree on a common health information model?
What if we never agree on a common health information model?Koray Atalag
 
Synthetic Biology and Data-Driven Synthetic Biology for Personalized Medicine...
Synthetic Biology and Data-Driven Synthetic Biology for Personalized Medicine...Synthetic Biology and Data-Driven Synthetic Biology for Personalized Medicine...
Synthetic Biology and Data-Driven Synthetic Biology for Personalized Medicine...RussellHanson
 
2011 12 08 - LOINC Introduction
2011 12 08 - LOINC Introduction2011 12 08 - LOINC Introduction
2011 12 08 - LOINC Introductiondvreeman
 
Data Mining in Rediology reports
Data Mining in Rediology reportsData Mining in Rediology reports
Data Mining in Rediology reportsSaeed Mehrabi
 
Ontology-Driven Clinical Intelligence: Removing Data Barriers for Cross-Disci...
Ontology-Driven Clinical Intelligence: Removing Data Barriers for Cross-Disci...Ontology-Driven Clinical Intelligence: Removing Data Barriers for Cross-Disci...
Ontology-Driven Clinical Intelligence: Removing Data Barriers for Cross-Disci...Remedy Informatics
 
New Frontiers in Applied NLP​ - PAW Healthcare 2022
New Frontiers in Applied NLP​ - PAW Healthcare 2022New Frontiers in Applied NLP​ - PAW Healthcare 2022
New Frontiers in Applied NLP​ - PAW Healthcare 2022David Talby
 
How to Apply NLP to Analyze Clinical Trials
How to Apply NLP to Analyze Clinical TrialsHow to Apply NLP to Analyze Clinical Trials
How to Apply NLP to Analyze Clinical TrialsDavid Talby
 
171017 giab for giab grc workshop
171017 giab for giab grc workshop171017 giab for giab grc workshop
171017 giab for giab grc workshopGenomeInABottle
 
Qualitative analysis boot camp final presentation slides
Qualitative analysis boot camp final presentation slidesQualitative analysis boot camp final presentation slides
Qualitative analysis boot camp final presentation slidesAlexandra Howson MA, PhD, CHCP
 
Qualitative analysis boot camp final presentation slides
Qualitative analysis boot camp final presentation slidesQualitative analysis boot camp final presentation slides
Qualitative analysis boot camp final presentation slidesAlexandra Howson MA, PhD, CHCP
 
How deep learning reshapes medicine
How deep learning reshapes medicineHow deep learning reshapes medicine
How deep learning reshapes medicineHongyoon Choi
 

Similar to Natural Language Understanding in Healthcare (20)

State of the Art Natural Language Processing at Scale with Alexander Thomas a...
State of the Art Natural Language Processing at Scale with Alexander Thomas a...State of the Art Natural Language Processing at Scale with Alexander Thomas a...
State of the Art Natural Language Processing at Scale with Alexander Thomas a...
 
Deep learning for natural language understanding
Deep learning for natural language understandingDeep learning for natural language understanding
Deep learning for natural language understanding
 
Applying NLP to Personalized Healthcare - 2021
Applying NLP to Personalized Healthcare - 2021Applying NLP to Personalized Healthcare - 2021
Applying NLP to Personalized Healthcare - 2021
 
Apache Spark NLP for Healthcare: Lessons Learned Building Real-World Healthca...
Apache Spark NLP for Healthcare: Lessons Learned Building Real-World Healthca...Apache Spark NLP for Healthcare: Lessons Learned Building Real-World Healthca...
Apache Spark NLP for Healthcare: Lessons Learned Building Real-World Healthca...
 
NLP tutorial at AIME 2020
NLP tutorial at AIME 2020NLP tutorial at AIME 2020
NLP tutorial at AIME 2020
 
The Translational Medicine
The Translational MedicineThe Translational Medicine
The Translational Medicine
 
Recent Advances in Deep Learning Techniques for Electronic Health Record
Recent Advances in Deep Learning Techniques for Electronic Health RecordRecent Advances in Deep Learning Techniques for Electronic Health Record
Recent Advances in Deep Learning Techniques for Electronic Health Record
 
Apache Spark NLP: Extending Spark ML to Deliver Fast, Scalable & Unified Nat...
 Apache Spark NLP: Extending Spark ML to Deliver Fast, Scalable & Unified Nat... Apache Spark NLP: Extending Spark ML to Deliver Fast, Scalable & Unified Nat...
Apache Spark NLP: Extending Spark ML to Deliver Fast, Scalable & Unified Nat...
 
171017 giab for giab grc workshop
171017 giab for giab grc workshop171017 giab for giab grc workshop
171017 giab for giab grc workshop
 
What if we never agree on a common health information model?
What if we never agree on a common health information model?What if we never agree on a common health information model?
What if we never agree on a common health information model?
 
Synthetic Biology and Data-Driven Synthetic Biology for Personalized Medicine...
Synthetic Biology and Data-Driven Synthetic Biology for Personalized Medicine...Synthetic Biology and Data-Driven Synthetic Biology for Personalized Medicine...
Synthetic Biology and Data-Driven Synthetic Biology for Personalized Medicine...
 
2011 12 08 - LOINC Introduction
2011 12 08 - LOINC Introduction2011 12 08 - LOINC Introduction
2011 12 08 - LOINC Introduction
 
Data Mining in Rediology reports
Data Mining in Rediology reportsData Mining in Rediology reports
Data Mining in Rediology reports
 
Ontology-Driven Clinical Intelligence: Removing Data Barriers for Cross-Disci...
Ontology-Driven Clinical Intelligence: Removing Data Barriers for Cross-Disci...Ontology-Driven Clinical Intelligence: Removing Data Barriers for Cross-Disci...
Ontology-Driven Clinical Intelligence: Removing Data Barriers for Cross-Disci...
 
New Frontiers in Applied NLP​ - PAW Healthcare 2022
New Frontiers in Applied NLP​ - PAW Healthcare 2022New Frontiers in Applied NLP​ - PAW Healthcare 2022
New Frontiers in Applied NLP​ - PAW Healthcare 2022
 
How to Apply NLP to Analyze Clinical Trials
How to Apply NLP to Analyze Clinical TrialsHow to Apply NLP to Analyze Clinical Trials
How to Apply NLP to Analyze Clinical Trials
 
171017 giab for giab grc workshop
171017 giab for giab grc workshop171017 giab for giab grc workshop
171017 giab for giab grc workshop
 
Qualitative analysis boot camp final presentation slides
Qualitative analysis boot camp final presentation slidesQualitative analysis boot camp final presentation slides
Qualitative analysis boot camp final presentation slides
 
Qualitative analysis boot camp final presentation slides
Qualitative analysis boot camp final presentation slidesQualitative analysis boot camp final presentation slides
Qualitative analysis boot camp final presentation slides
 
How deep learning reshapes medicine
How deep learning reshapes medicineHow deep learning reshapes medicine
How deep learning reshapes medicine
 

More from David Talby

Building State-of-the-art Natural Language Processing Projects with Free Soft...
Building State-of-the-art Natural Language Processing Projects with Free Soft...Building State-of-the-art Natural Language Processing Projects with Free Soft...
Building State-of-the-art Natural Language Processing Projects with Free Soft...David Talby
 
Turning Medical Expert Knowledge into Responsible Language Models - K1st World
Turning Medical Expert Knowledge into Responsible Language Models - K1st WorldTurning Medical Expert Knowledge into Responsible Language Models - K1st World
Turning Medical Expert Knowledge into Responsible Language Models - K1st WorldDavid Talby
 
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...David Talby
 
Introducing the Open-Source Library for Testing NLP Models - Healthcare NLP S...
Introducing the Open-Source Library for Testing NLP Models - Healthcare NLP S...Introducing the Open-Source Library for Testing NLP Models - Healthcare NLP S...
Introducing the Open-Source Library for Testing NLP Models - Healthcare NLP S...David Talby
 
Architecting an Open Source AI Platform 2018 edition
Architecting an Open Source AI Platform   2018 editionArchitecting an Open Source AI Platform   2018 edition
Architecting an Open Source AI Platform 2018 editionDavid Talby
 
Build your open source data science platform
Build your open source data science platformBuild your open source data science platform
Build your open source data science platformDavid Talby
 
Natural Language Understanding with Machine Learned Annotators and Deep Learn...
Natural Language Understanding with Machine Learned Annotators and Deep Learn...Natural Language Understanding with Machine Learned Annotators and Deep Learn...
Natural Language Understanding with Machine Learned Annotators and Deep Learn...David Talby
 
Architecting a Predictive, Petabyte-Scale, Self-Learning Fraud Detection System
Architecting a Predictive,  Petabyte-Scale, Self-Learning Fraud Detection SystemArchitecting a Predictive,  Petabyte-Scale, Self-Learning Fraud Detection System
Architecting a Predictive, Petabyte-Scale, Self-Learning Fraud Detection SystemDavid Talby
 
Semantic Natural Language Understanding with Spark, UIMA & Machine Learned On...
Semantic Natural Language Understanding with Spark, UIMA & Machine Learned On...Semantic Natural Language Understanding with Spark, UIMA & Machine Learned On...
Semantic Natural Language Understanding with Spark, UIMA & Machine Learned On...David Talby
 

More from David Talby (9)

Building State-of-the-art Natural Language Processing Projects with Free Soft...
Building State-of-the-art Natural Language Processing Projects with Free Soft...Building State-of-the-art Natural Language Processing Projects with Free Soft...
Building State-of-the-art Natural Language Processing Projects with Free Soft...
 
Turning Medical Expert Knowledge into Responsible Language Models - K1st World
Turning Medical Expert Knowledge into Responsible Language Models - K1st WorldTurning Medical Expert Knowledge into Responsible Language Models - K1st World
Turning Medical Expert Knowledge into Responsible Language Models - K1st World
 
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
 
Introducing the Open-Source Library for Testing NLP Models - Healthcare NLP S...
Introducing the Open-Source Library for Testing NLP Models - Healthcare NLP S...Introducing the Open-Source Library for Testing NLP Models - Healthcare NLP S...
Introducing the Open-Source Library for Testing NLP Models - Healthcare NLP S...
 
Architecting an Open Source AI Platform 2018 edition
Architecting an Open Source AI Platform   2018 editionArchitecting an Open Source AI Platform   2018 edition
Architecting an Open Source AI Platform 2018 edition
 
Build your open source data science platform
Build your open source data science platformBuild your open source data science platform
Build your open source data science platform
 
Natural Language Understanding with Machine Learned Annotators and Deep Learn...
Natural Language Understanding with Machine Learned Annotators and Deep Learn...Natural Language Understanding with Machine Learned Annotators and Deep Learn...
Natural Language Understanding with Machine Learned Annotators and Deep Learn...
 
Architecting a Predictive, Petabyte-Scale, Self-Learning Fraud Detection System
Architecting a Predictive,  Petabyte-Scale, Self-Learning Fraud Detection SystemArchitecting a Predictive,  Petabyte-Scale, Self-Learning Fraud Detection System
Architecting a Predictive, Petabyte-Scale, Self-Learning Fraud Detection System
 
Semantic Natural Language Understanding with Spark, UIMA & Machine Learned On...
Semantic Natural Language Understanding with Spark, UIMA & Machine Learned On...Semantic Natural Language Understanding with Spark, UIMA & Machine Learned On...
Semantic Natural Language Understanding with Spark, UIMA & Machine Learned On...
 

Recently uploaded

Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBoston Institute of Analytics
 
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxNetworking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxHimangsuNath
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Thomas Poetter
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Boston Institute of Analytics
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesTimothy Spann
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Boston Institute of Analytics
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max PrincetonTimothy Spann
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataTecnoIncentive
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksdeepakthakur548787
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxTasha Penwell
 
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...KarteekMane1
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data VisualizationKianJazayeri1
 

Recently uploaded (20)

Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
 
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxNetworking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptx
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
Data Analysis Project: Stroke Prediction
Data Analysis Project: Stroke PredictionData Analysis Project: Stroke Prediction
Data Analysis Project: Stroke Prediction
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded data
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing works
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
 
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data Visualization
 

Natural Language Understanding in Healthcare

  • 1. Dr. David Talby NATURAL LANGUAGE UNDERSTANDING IN HEALTHCARE: STATE OF THE ART NLP, MACHINE LEARNING AND DEEP LEARNING WITH OPEN SOURCE SOFTWARE
  • 2. CONTENTS 1. THE IMPORTANCE OF NLP IN HEALTHCARE AI 2. NLP IS ULTRA DOMAIN SPECIFIC 3. STATE OF THE ART RESEARCH RESULTS 4. SPARK NLP: PRODUCTION GRADE & AT SCALE
  • 3. AI VS. DOCTORS Deep Learning Computer Vision Access to Care Diagnostic Accuracy
  • 4. NLP IN HEALTHCARE Deep Learning NLP Efficiency Accuracy Radiology Diagnostic Mental Health Safety Events Inpatient Pre- Auth Key Opinion Leaders Research Meta Analysis Clinical Coding Financial Anti- Fraud Adverse Events Drug Development Recruit for Trials
  • 5. RISK PREDICTION CASE STUDY: DETECTING SEPSIS “Compared to previous work that only used structured data such as vital signs and demographic information, utilizing free text drastically improves the discriminatory ability (increase in AUC from 0.67 to 0.86) of identifying infection.”
  • 6. COHORT SELECTION CASE STUDY: ONCOLOGY “Using the combination of structured and unstructured data, 8324 patients were identified as having advanced NSCLC. Of these patients, only 2472 were also in the cohort generated using structured data only. Further, 1090 patients would be included in the structured data only cohort who should have been excluded based on additional data.”
  • 7. CONTENTS 1. THE IMPORTANCE OF NLP IN HEALTHCARE AI 2. NLP IS ULTRA DOMAIN SPECIFIC 3. STATE OF THE ART RESEARCH RESULTS 4. SPARK NLP: PRODUCTION GRADE & AT SCALE
  • 8. WHAT MAKES LANGUAGE HARD • Nuanced – Sure / I agree / Absolutely! / Whatever / Yes sir / Just to see you smile ❤️ • Fuzzy – Blue, New, Tall, Child, Tell, Do • Contextual – “Patient denies alcohol abuse” • Medium specific – “SGTM c u in 15” • Domain specific – All forward-looking statements included in this document are based on information available to us on the date hereof, and we assume no obligation to revise or publicly release any revision to any such forward-looking statement, except as may otherwise be required by law.
  • 9. We all speak many languages. THE FIRST RULE OF NLP:
  • 10. ED Triage Notes states started last night, upper abd, took alka seltzer approx 0500, no relief. nausea no vomiting Since yeatreday 10/10 "constant Tylenol 1 hr ago. +nausea. diaphoretic. Mid abd radiates to back Generalized abd radiating to lower x 3 days accompanied by dark stools. Now with bloody stool this am. Denies dizzy, sob, fatigue. Visiting from Japan on business.” Features Type of Pain Intensity of Pain Body part of region Symptoms Onset of symptoms Attempted home remedy EMERGENCY ROOM LANGUAGE
  • 11. Different Vocabulary Different Grammar Different Context Different Meaning Different Language Models Tokenizer Normalizer Lemmatizer Fact Extraction Part of Speech Tagger Spell Checker Coreference Resolution Dependency Parser Sentence Splitting Negation Detection Named Entity Recognition Sentiment Analysis Intent Classification Summarization Word Embeddings Emotion Detection Question Answering Relevance Ranking Best Next Action Translation
  • 12. CONTENTS 1. THE IMPORTANCE OF NLP IN HEALTHCARE AI 2. NLP IS ULTRA DOMAIN SPECIFIC 3. STATE OF THE ART RESEARCH RESULTS 4. SPARK NLP: PRODUCTION GRADE & AT SCALE
  • 14. “DEEP LEARNED” NER “Entity Recognition from Clinical Texts via Recurrent Neural Network”. Liu et al., BMC Medical Informatics & Decision Making, July 2017. F-Score Dataset Task 85.81% 2010 i2b2 Medical concept extraction 92.29% 2012 i2b2 Clinical event detection 94.37% 2014 i2b2 De-identification
  • 16. “DEEP LEARNED” ENTITY RESOLUTION “CNN-based ranking for biomedical entity normalization”. Li et al., BMC Bioinformatics, October 2017. F-Score Dataset Task 90.30% ShARe / CLEF Disease & problem norm. 92.29% NCBI Disease norm. in literature
  • 17. ASSERTION STATUS DETECTION Prescribing sick days due to diagnosis of influenza. Positive Jane complains about flu-like symptoms. Speculative Jane’s RIDT came back clean. Negative Jane is at risk for flu if she’s not vaccinated. Conditional Jane’s older brother had the flu last month. Family history Jane had a severe case of flu last year. Patient history
  • 18. “DEEP LEARNED” ASSERTION STATUS DETECTION “Improving Classification of Medical Assertions in Clinical Notes“ Kim et al., In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, 2011. Dataset Metric 94.17% 4th i2b2/VA Mirco-averaged F1 79.76% Marco-averaged F1
  • 20. BIOMEDICAL WORD EMBEDDINGS “How to Train Good Word Embeddings for Biomedical NLP”. Kim et al., In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, 2011. • Intrinsic measures (similarity between concepts) • Extrinsic measures (improving NER) • Hyperparameter optimization • Sub-sampling • Minimum-count • Learning rate • Vector dimension • Context window size • Available under an open license here
  • 21. CONTENTS 1. THE IMPORTANCE OF NLP IN HEALTHCARE AI 2. NLP IS ULTRA DOMAIN SPECIFIC 3. STATE OF THE ART RESEARCH RESULTS 4. SPARK NLP: PRODUCTION GRADE & AT SCALE
  • 22. NLP FOR APACHE SPARK Data Sources API Spark Core API (RDD’s, Project Tungsten) Spark SQL API (DataFrame, Catalyst Optimizer) Spark ML API (Pipeline, Transformer, Estimator) Part of Speech Tagger Named Entity Recognition Sentiment Analysis Spell Checker Tokenizer Stemmer Lemmatizer Entity Extraction Topic Modeling Word2Vec TF-IDF String distance calculation N-grams calculation Stop word removal Train/Test & Cross-Validate Ensembles High Performance Natural Language Understanding at Scale Design Goals • State of the art Performance & Scale • Frictionless Reuse • Enterprise Grade Built on the Spark ML API’s Apache 2.0 Licensed Active development & support
  • 23. NLP FOR APACHE SPARK: COMBINED NLP & ML PIPELINES pipeline = pyspark.ml.Pipeline(stages=[ document_assembler, tokenizer, stemmer, normalizer, stopword_remover, tf, idf, lda]) topic_model = pipeline.fit(df) Spark NLP annotators Spark ML featurizers Spark ML LDA implementation Single execution plan for whole pipeline
  • 24. PERFORMANCE BENCHMARKS • Training was 80x faster to train on 2.6MB • Training was 38x faster on 100k • Training on 100k & 2.6MB took roughly the same • Additional near-linear speedup on a cluster • Prediction was 1.6x faster on 75MB • Prediction was 1.4x faster on 15MB • Adding NLP stages takes roughly the same • Additional near-linear speedup on a cluster
  • 25. HEALTHCARE EXTENSIONS Data Sources API Spark Core API (RDD’s, Project Tungsten) Spark SQL API (DataFrame, Catalyst Optimizer) Spark ML API (Pipeline, Transformer, Estimator) Part of Speech Tagger Named Entity Recognition Sentiment Analysis Spell Checker Tokenizer Stemmer Lemmatizer Entity Extraction Topic Modeling Word2Vec TF-IDF String distance calculation N-grams calculation Stop word removal Train/Test & Cross-Validate Ensembles High Performance Natural Language Understanding at Scale com.johnsnowlabs.nlp.clinical.* Healthcare specific NLP annotators for Spark in Scala, Java or Python: • Entity Recognition • Value Extraction • Word Embeddings • Assertion Status • Sentiment Analysis • Spell Checking, … data.johnsnowlabs.com/health 1,800+ Expert curated, clean, linked, enriched & always up to date data: • Terminology • Providers • Demographics • Clinical Guidelines • Genes • Measures, …
  • 26. CASE STUDY: DEMAND FORECASTING OF ADMISSIONS FROM ED Features from Structured Data • How many patients will be admitted today? • Data Source: EPIC Clarity data Reason for visit Age Gender Vital signs Current wait time Number of orders Admit in past 30 days Type of insurance
  • 27. CASE STUDY: DEMAND FORECASTING OF ADMISSION FROM ED Features from Natural Language Text • A majority of the rich relevant content lies in unstructured notes that are contributed by doctors and nurses from patient interactions. • Data Source: Emergency Department Triage notes and other ED notes Type of Pain Intensity of Pain Body part of region Symptoms Onset of symptoms Attempted home remedy Accuracy Baseline: Human manual prediction ML with structured data ML with NLP
  • 28. USING SPARK NLP • Homepage: https://nlp.johnsnowlabs.com – Getting Started, Documentation, Examples, Videos, Blogs – Join the Slack Community • GitHub: https://github.com/johnsnowlabs/spark-nlp – Open Issues & Feature Requests – Contribute! • The library has Scala and Python 2 & 3 API’s • Get directly from maven-central or spark-packages • Tested on all Spark 2.x versions

Editor's Notes

  1. There is not one “language” – every vertical and communication channel has its own jargon that includes vocabulary, grammar, assumptions and semantics. For example – in these ED triage notes, none of the sentences is in valid English, and the words “patient” and “pain” do not appear.