#BigMLSchool Twitter: @kliegr
Web: kliegr.eu
Citation patterns in COVID-19
related biomedical literature
Analyzing Text: Discovering Insights for
the Healthcare Industry
Why was this cited? Generating semantic explanations for the
CORD-19 corpus
Tomas Kliegr
Prague University of Economics and Business
Czech Republic
tomas.kliegr@vse.cz
Prague University of Economics and Business
• The largest university in the
field of economics, business
and information technology in
Czechia, 15,000 students (BSc, MSc,
MBA, PhD)
• English Master programmes
– Information Systems
Management
– Economic and Data Analysis
Data Science Curriculum
• Programming is a mandatory part of our Applied Informatics BSc.
• However, we use a visual, non-programming approach
in our introductory data science course
• Up to 200 students/semester
[Figure labels: 2012; 2014; credit risk case study; 2021; clustering]
The problem
Imagine you manage a research team.
You need to pair the expertise of your staff
with the needs of the wider research
community, which will build upon your
results.
How do you find out whether research made an impact?
One of the main KPIs in science is the
number of citations.
What made past research successful –
highly cited?
We will show how to leverage existing
freely available research articles to get the
answers.
Part of the CORD-19 dataset containing more than 400,000
articles related to SARS-CoV-2. Source: VOSviewer
Example
Pratelli, A. (2008). Canine coronavirus inactivation with physical and chemical
agents. The Veterinary Journal, 177(1), 71-79.
Increases citation probability Decreases citation probability
• Our original approach was implemented in Python and involved many trial-and-error
iterations and hundreds of hours of computation time.
Lucie Beranová, Marcin Joachimiak, Tomáš Kliegr, Gollam Rabby, Vilém Sklenák.
Why was this cited? Generating semantic explanations for the CORD-19 corpus.
Under preparation.
• In this tutorial, we show how the modelling part can be recreated without
programming skills using cloud-based machine learning.
Data preprocessing
[Figure: filtering of the corpus. Counts: > 300,000 articles; 36,000; 3,000. Labels: open citations available; CORD-on-FHIR subset]
A dive into the data
Target
Upload data
Processing of text data
Text processing options
"structure of coronavirus main proteinase
reveals"
• Tokenization and Unigrams
– structure;of;coronavirus;main;proteinase;reveals
• Bigrams:
– structure of; of coronavirus; coronavirus main;
main proteinase; proteinase reveals
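The unigram and bigram options can be sketched in a few lines of Python. This is only a minimal illustration, assuming lowercasing and whitespace tokenization; the helper names `tokenize` and `ngrams` are hypothetical and are not meant to describe how the cloud platform tokenizes text internally.

```python
def tokenize(text):
    """Split a title into lowercase unigrams on whitespace (a simplifying assumption)."""
    return text.lower().split()


def ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined by a single space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


title = "structure of coronavirus main proteinase reveals"
unigrams = tokenize(title)
bigrams = ngrams(unigrams, 2)

print(";".join(unigrams))
print("; ".join(bigrams))
```

A title with k tokens yields k unigrams and k - 1 bigrams, so bigrams enlarge the vocabulary but preserve some word order.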
Text processing options
"Structure of coronavirus main proteinase
reveals"
• Stemming: proteinase => protein
• Stop-word removal:
– structure;coronavirus;main;proteinase;reveals ("of" is removed)
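Stop-word removal can be sketched as a simple filter over the tokens. The stop-word list below is a tiny illustrative assumption (real lists are much longer), and `remove_stop_words` is a hypothetical helper name. Stemming is left out of the sketch because it normally relies on a dedicated stemming library.

```python
# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "in", "of", "the", "to"}


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Keep only the tokens that are not in the stop-word list."""
    return [t for t in tokens if t not in stop_words]


tokens = "Structure of coronavirus main proteinase reveals".lower().split()
filtered = remove_stop_words(tokens)
print(";".join(filtered))
```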
Adding a new column
• Since we aim for a classification task, we will
add a new column indicating whether a paper is
above or below the median citation count.
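Outside a point-and-click tool, the same derived column could be computed as follows. This is a sketch under stated assumptions: the field names `paper_id`, `citations` and `above_median` are hypothetical, the citation counts are made-up illustrative values, and papers exactly at the median are assigned to the lower class, which is one possible convention rather than the one necessarily used in the tutorial.

```python
from statistics import median

# Hypothetical records with made-up citation counts (illustration only).
papers = [
    {"paper_id": "p1", "citations": 0},
    {"paper_id": "p2", "citations": 3},
    {"paper_id": "p3", "citations": 12},
    {"paper_id": "p4", "citations": 1},
    {"paper_id": "p5", "citations": 40},
]


def add_median_label(rows, count_field="citations", label_field="above_median"):
    """Return copies of the rows with a boolean column: count strictly above the median."""
    m = median(row[count_field] for row in rows)
    return [{**row, label_field: row[count_field] > m} for row in rows]


labelled = add_median_label(papers)
for row in labelled:
    print(row["paper_id"], row["citations"], row["above_median"])
```

Splitting at the median keeps the two classes roughly balanced, which makes plain accuracy easier to interpret later on.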
Creating train/test split
• To evaluate our model, we need to
create a train/test split
Setting up the split
• Using a random seed will ensure we get the
same split each time
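The effect of a fixed seed can be demonstrated with a small sketch using only the Python standard library. The function name `train_test_split`, the 80/20 ratio and the seed value are assumptions chosen for illustration, not settings taken from the tutorial.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=1):
    """Shuffle a copy of the rows with a fixed seed, then cut it into train and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(10))
train_a, test_a = train_test_split(rows, seed=1)
train_b, test_b = train_test_split(rows, seed=1)

# Same seed, same split: the two calls produce identical partitions.
print(train_a == train_b and test_a == test_b)
```

Using a dedicated `random.Random(seed)` instance, rather than seeding the global generator, keeps the split reproducible even if other code draws random numbers in between.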
Creating a logistic regression model
Setting up the model
• Some fields are suggested for removal based on
automatic feature importance assessment
• Manually remove the paper id
• Check that the target is set correctly
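In code, the same setup amounts to separating predictors from the target and dropping identifier columns, since an id is unique per row and carries no signal that generalizes to new papers. The sketch below is illustrative only: the function `select_predictors`, the field names and the two example rows are all hypothetical.

```python
def select_predictors(rows, target, drop=("paper_id",)):
    """Split each row into (features, label), leaving out identifier columns."""
    excluded = set(drop) | {target}
    features = [{k: v for k, v in row.items() if k not in excluded} for row in rows]
    labels = [row[target] for row in rows]
    return features, labels


# Hypothetical rows with made-up values (illustration only).
rows = [
    {"paper_id": "p1", "title": "title a", "authors": 3, "above_median": False},
    {"paper_id": "p2", "title": "title b", "authors": 7, "above_median": True},
]
features, labels = select_predictors(rows, target="above_median")

# Sanity check that the target is set correctly: a binary label with both classes present.
assert set(labels) == {True, False}
print(sorted(features[0]))
```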
Remove predictors
Evaluation
Inspecting model – which topics?
How NOT to phrase the title?
Inspecting model – which journals?
Size of author team
... and patience will also help
Decision tree ensemble
Evaluating the ensemble
Disadvantages of ensembles:
• Lower interpretability (accuracy-interpretability trade-off)
• Longer learning time
• Longer time required to apply the model
Comparing models
Deployment
Other insights from the full analysis
Lucie Beranová, Marcin Joachimiak, Tomáš Kliegr*, Gollam Rabby, Vilém Sklenák.
Why was this cited? Generating semantic explanations for the CORD-19 corpus.
Under preparation. Preprints shared on request to the author marked *.
• Citation "biases"
• Articles with authors who have Western-sounding
names are cited more
• Phylogenetic distance from the human virus: feline
(FIPV, FCoV) and canine coronaviruses (CCoV) are
cited less, possibly because these viruses are
more distant from the human virus than camel
and bat viruses.
• Accuracy-interpretability trade-off
• Embeddings-based language models, TF-IDF
• Random Forests, Rule learning, Neural networks
• Directly explainable models (rules) vs explanation
algorithms like LIME or Shapley values
Credits
• Colleagues and Ph.D. students at VSE:
– Ing. Lucie Beranová (help with mining in Python)
– Gollam Rabby, MSc. (citation counts, proof of concept)
– prof. Vilém Sklenák – bibliometric expert
• Dr. Marcin Joachimiak - Computational Biosciences, LBL
Berkeley
– Interpretation of results
Thanks for your
attention!
Tomáš Kliegr, UEP
tomas.kliegr@vse.cz
Open Doors Day on
YouTube March 3, 2021
https://fis.vse.cz/english/

More Related Content

What's hot

BIG DATA ANALYTICS-OPPORTUNITIES,CHALLENGES AND THE FUTURE:CERTIFICATE OF ACH...
BIG DATA ANALYTICS-OPPORTUNITIES,CHALLENGES AND THE FUTURE:CERTIFICATE OF ACH...BIG DATA ANALYTICS-OPPORTUNITIES,CHALLENGES AND THE FUTURE:CERTIFICATE OF ACH...
BIG DATA ANALYTICS-OPPORTUNITIES,CHALLENGES AND THE FUTURE:CERTIFICATE OF ACH...LOS BANOS NATIONAL HIGH SCHOOL
 
Data science in 10 steps
Data science in 10 stepsData science in 10 steps
Data science in 10 stepsQuantUniversity
 
Self Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docxSelf Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docxShanmugasundaram M
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data scienceKoo Ping Shung
 
Fundamental of data analytics
Fundamental of data analyticsFundamental of data analytics
Fundamental of data analyticsEhsanMalik17
 
Data Science For Beginners | Who Is A Data Scientist? | Data Science Tutorial...
Data Science For Beginners | Who Is A Data Scientist? | Data Science Tutorial...Data Science For Beginners | Who Is A Data Scientist? | Data Science Tutorial...
Data Science For Beginners | Who Is A Data Scientist? | Data Science Tutorial...Edureka!
 
New professional careers in data
New professional careers in dataNew professional careers in data
New professional careers in dataDavid Rostcheck
 
Machine learning in action at Pipedrive
Machine learning in action at PipedriveMachine learning in action at Pipedrive
Machine learning in action at PipedriveAndré Karpištšenko
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data scienceMahir Haque
 
Data Science for Business Managers - The bare minimum a manager should know
Data Science for Business Managers - The bare minimum a manager should knowData Science for Business Managers - The bare minimum a manager should know
Data Science for Business Managers - The bare minimum a manager should knowAkin Osman Kazakci
 
Datasciencein E-commerce industry
Datasciencein E-commerce industryDatasciencein E-commerce industry
Datasciencein E-commerce industryRakuten Group, Inc.
 
data scientist the sexiest job of the 21st century
data scientist the sexiest job of the 21st centurydata scientist the sexiest job of the 21st century
data scientist the sexiest job of the 21st centuryFrank Kienle
 
Data science & data scientist
Data science & data scientistData science & data scientist
Data science & data scientistVijayMohan Vasu
 
Adding Open Data Value to 'Closed Data' Problems
Adding Open Data Value to 'Closed Data' ProblemsAdding Open Data Value to 'Closed Data' Problems
Adding Open Data Value to 'Closed Data' ProblemsSimon Price
 
Introduction to Data Science and Analytics
Introduction to Data Science and AnalyticsIntroduction to Data Science and Analytics
Introduction to Data Science and AnalyticsSrinath Perera
 
Data Science for Business Managers - An intro to ROI for predictive analytics
Data Science for Business Managers - An intro to ROI for predictive analyticsData Science for Business Managers - An intro to ROI for predictive analytics
Data Science for Business Managers - An intro to ROI for predictive analyticsAkin Osman Kazakci
 
E-commerce企業におけるビッグデータ活用の取り組みと今後の展望
E-commerce企業におけるビッグデータ活用の取り組みと今後の展望E-commerce企業におけるビッグデータ活用の取り組みと今後の展望
E-commerce企業におけるビッグデータ活用の取り組みと今後の展望Rakuten Group, Inc.
 
Planning Your Data Science Projects
Planning Your Data Science ProjectsPlanning Your Data Science Projects
Planning Your Data Science ProjectsSpotle.ai
 

What's hot (20)

BIG DATA ANALYTICS-OPPORTUNITIES,CHALLENGES AND THE FUTURE:CERTIFICATE OF ACH...
BIG DATA ANALYTICS-OPPORTUNITIES,CHALLENGES AND THE FUTURE:CERTIFICATE OF ACH...BIG DATA ANALYTICS-OPPORTUNITIES,CHALLENGES AND THE FUTURE:CERTIFICATE OF ACH...
BIG DATA ANALYTICS-OPPORTUNITIES,CHALLENGES AND THE FUTURE:CERTIFICATE OF ACH...
 
Data science in 10 steps
Data science in 10 stepsData science in 10 steps
Data science in 10 steps
 
Data science 101
Data science 101Data science 101
Data science 101
 
Self Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docxSelf Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docx
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data science
 
Fundamental of data analytics
Fundamental of data analyticsFundamental of data analytics
Fundamental of data analytics
 
Data Science For Beginners | Who Is A Data Scientist? | Data Science Tutorial...
Data Science For Beginners | Who Is A Data Scientist? | Data Science Tutorial...Data Science For Beginners | Who Is A Data Scientist? | Data Science Tutorial...
Data Science For Beginners | Who Is A Data Scientist? | Data Science Tutorial...
 
New professional careers in data
New professional careers in dataNew professional careers in data
New professional careers in data
 
Machine learning in action at Pipedrive
Machine learning in action at PipedriveMachine learning in action at Pipedrive
Machine learning in action at Pipedrive
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data science
 
Data Science for Business Managers - The bare minimum a manager should know
Data Science for Business Managers - The bare minimum a manager should knowData Science for Business Managers - The bare minimum a manager should know
Data Science for Business Managers - The bare minimum a manager should know
 
Datasciencein E-commerce industry
Datasciencein E-commerce industryDatasciencein E-commerce industry
Datasciencein E-commerce industry
 
data scientist the sexiest job of the 21st century
data scientist the sexiest job of the 21st centurydata scientist the sexiest job of the 21st century
data scientist the sexiest job of the 21st century
 
Data science & data scientist
Data science & data scientistData science & data scientist
Data science & data scientist
 
Adding Open Data Value to 'Closed Data' Problems
Adding Open Data Value to 'Closed Data' ProblemsAdding Open Data Value to 'Closed Data' Problems
Adding Open Data Value to 'Closed Data' Problems
 
Introduction to Data Science and Analytics
Introduction to Data Science and AnalyticsIntroduction to Data Science and Analytics
Introduction to Data Science and Analytics
 
Data Science for Business Managers - An intro to ROI for predictive analytics
Data Science for Business Managers - An intro to ROI for predictive analyticsData Science for Business Managers - An intro to ROI for predictive analytics
Data Science for Business Managers - An intro to ROI for predictive analytics
 
E-commerce企業におけるビッグデータ活用の取り組みと今後の展望
E-commerce企業におけるビッグデータ活用の取り組みと今後の展望E-commerce企業におけるビッグデータ活用の取り組みと今後の展望
E-commerce企業におけるビッグデータ活用の取り組みと今後の展望
 
Planning Your Data Science Projects
Planning Your Data Science ProjectsPlanning Your Data Science Projects
Planning Your Data Science Projects
 
DataMind Pitch August 2013
DataMind Pitch August 2013DataMind Pitch August 2013
DataMind Pitch August 2013
 

Similar to Analyzing Citation Patterns in COVID-19 Literature

Big Data and AI in Fighting Against COVID-19
Big Data and AI in Fighting Against COVID-19Big Data and AI in Fighting Against COVID-19
Big Data and AI in Fighting Against COVID-19Bill Liu
 
Big Data and AI for Covid-19
Big Data and AI for Covid-19Big Data and AI for Covid-19
Big Data and AI for Covid-19Andrew Zhang
 
Capturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven ScienceCapturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven Sciencedgarijo
 
Model repositories and standard formats for model reusability
Model repositories and standard formats for model reusabilityModel repositories and standard formats for model reusability
Model repositories and standard formats for model reusabilityUniversity Medicine Greifswald
 
Data-Driven Discovery Science with FAIR Knowledge Graphs
Data-Driven Discovery Science with FAIR Knowledge GraphsData-Driven Discovery Science with FAIR Knowledge Graphs
Data-Driven Discovery Science with FAIR Knowledge GraphsMichel Dumontier
 
Democratising biodiversity and genomics research: open and citizen science to...
Democratising biodiversity and genomics research: open and citizen science to...Democratising biodiversity and genomics research: open and citizen science to...
Democratising biodiversity and genomics research: open and citizen science to...GigaScience, BGI Hong Kong
 
Knowledge graphs for knowing more and knowing for sure
Knowledge graphs for knowing more and knowing for sureKnowledge graphs for knowing more and knowing for sure
Knowledge graphs for knowing more and knowing for sureSteffen Staab
 
PubChem as a resource for chemical information education
PubChem as a resource for chemical information educationPubChem as a resource for chemical information education
PubChem as a resource for chemical information educationSunghwan Kim
 
Weak supervised learning - Kristina Khvatova
Weak supervised learning - Kristina KhvatovaWeak supervised learning - Kristina Khvatova
Weak supervised learning - Kristina KhvatovaData Science Milan
 
A global integrative ecosystem for digital pathology: how can we get there?
A global integrative ecosystem for digital pathology: how can we get there?A global integrative ecosystem for digital pathology: how can we get there?
A global integrative ecosystem for digital pathology: how can we get there?Yves Sucaet
 
The Ubiquity Partner Network: Global Support for Publishing
The Ubiquity Partner Network: Global Support for PublishingThe Ubiquity Partner Network: Global Support for Publishing
The Ubiquity Partner Network: Global Support for PublishingBrian Hole
 
From Open Access to Open Data
From Open Access to Open DataFrom Open Access to Open Data
From Open Access to Open DataBrian Hole
 
Whole slide imaging: beyond pathology (Pittsburgh Computational Pathology Lec...
Whole slide imaging: beyond pathology (Pittsburgh Computational Pathology Lec...Whole slide imaging: beyond pathology (Pittsburgh Computational Pathology Lec...
Whole slide imaging: beyond pathology (Pittsburgh Computational Pathology Lec...Yves Sucaet
 
Gene Wiki and Mark2Cure update for BD2K
Gene Wiki and Mark2Cure update for BD2KGene Wiki and Mark2Cure update for BD2K
Gene Wiki and Mark2Cure update for BD2KBenjamin Good
 
Adding value to scientific results: COMBINE standards & guidelines for system...
Adding value to scientific results: COMBINE standards & guidelines for system...Adding value to scientific results: COMBINE standards & guidelines for system...
Adding value to scientific results: COMBINE standards & guidelines for system...University Medicine Greifswald
 
sbv IMPROVER: an industry initiative to harness the wisdom of the crowd in sc...
sbv IMPROVER: an industry initiative to harness the wisdom of the crowd in sc...sbv IMPROVER: an industry initiative to harness the wisdom of the crowd in sc...
sbv IMPROVER: an industry initiative to harness the wisdom of the crowd in sc...Crowdsourcing Week
 

Similar to Analyzing Citation Patterns in COVID-19 Literature (20)

Big Data and AI in Fighting Against COVID-19
Big Data and AI in Fighting Against COVID-19Big Data and AI in Fighting Against COVID-19
Big Data and AI in Fighting Against COVID-19
 
Big Data and AI for Covid-19
Big Data and AI for Covid-19Big Data and AI for Covid-19
Big Data and AI for Covid-19
 
Capturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven ScienceCapturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven Science
 
Model repositories and standard formats for model reusability
Model repositories and standard formats for model reusabilityModel repositories and standard formats for model reusability
Model repositories and standard formats for model reusability
 
Data-Driven Discovery Science with FAIR Knowledge Graphs
Data-Driven Discovery Science with FAIR Knowledge GraphsData-Driven Discovery Science with FAIR Knowledge Graphs
Data-Driven Discovery Science with FAIR Knowledge Graphs
 
Omprn 2018 module1_final
Omprn 2018 module1_finalOmprn 2018 module1_final
Omprn 2018 module1_final
 
Democratising biodiversity and genomics research: open and citizen science to...
Democratising biodiversity and genomics research: open and citizen science to...Democratising biodiversity and genomics research: open and citizen science to...
Democratising biodiversity and genomics research: open and citizen science to...
 
Knowledge graphs for knowing more and knowing for sure
Knowledge graphs for knowing more and knowing for sureKnowledge graphs for knowing more and knowing for sure
Knowledge graphs for knowing more and knowing for sure
 
Kohlmeier "Innovations in Academic Search & Discovery - A Case Study From the...
Kohlmeier "Innovations in Academic Search & Discovery - A Case Study From the...Kohlmeier "Innovations in Academic Search & Discovery - A Case Study From the...
Kohlmeier "Innovations in Academic Search & Discovery - A Case Study From the...
 
PubChem as a resource for chemical information education
PubChem as a resource for chemical information educationPubChem as a resource for chemical information education
PubChem as a resource for chemical information education
 
Weak supervised learning - Kristina Khvatova
Weak supervised learning - Kristina KhvatovaWeak supervised learning - Kristina Khvatova
Weak supervised learning - Kristina Khvatova
 
Data and model management in Systems Biology
Data and model management in Systems BiologyData and model management in Systems Biology
Data and model management in Systems Biology
 
A global integrative ecosystem for digital pathology: how can we get there?
A global integrative ecosystem for digital pathology: how can we get there?A global integrative ecosystem for digital pathology: how can we get there?
A global integrative ecosystem for digital pathology: how can we get there?
 
The Ubiquity Partner Network: Global Support for Publishing
The Ubiquity Partner Network: Global Support for PublishingThe Ubiquity Partner Network: Global Support for Publishing
The Ubiquity Partner Network: Global Support for Publishing
 
From Open Access to Open Data
From Open Access to Open DataFrom Open Access to Open Data
From Open Access to Open Data
 
Whole slide imaging: beyond pathology (Pittsburgh Computational Pathology Lec...
Whole slide imaging: beyond pathology (Pittsburgh Computational Pathology Lec...Whole slide imaging: beyond pathology (Pittsburgh Computational Pathology Lec...
Whole slide imaging: beyond pathology (Pittsburgh Computational Pathology Lec...
 
Gene Wiki and Mark2Cure update for BD2K
Gene Wiki and Mark2Cure update for BD2KGene Wiki and Mark2Cure update for BD2K
Gene Wiki and Mark2Cure update for BD2K
 
Adding value to scientific results: COMBINE standards & guidelines for system...
Adding value to scientific results: COMBINE standards & guidelines for system...Adding value to scientific results: COMBINE standards & guidelines for system...
Adding value to scientific results: COMBINE standards & guidelines for system...
 
Pine education-platform
Pine education-platformPine education-platform
Pine education-platform
 
sbv IMPROVER: an industry initiative to harness the wisdom of the crowd in sc...
sbv IMPROVER: an industry initiative to harness the wisdom of the crowd in sc...sbv IMPROVER: an industry initiative to harness the wisdom of the crowd in sc...
sbv IMPROVER: an industry initiative to harness the wisdom of the crowd in sc...
 

More from BigML, Inc

Digital Transformation and Process Optimization in Manufacturing
Digital Transformation and Process Optimization in ManufacturingDigital Transformation and Process Optimization in Manufacturing
Digital Transformation and Process Optimization in ManufacturingBigML, Inc
 
DutchMLSchool 2022 - Automation
DutchMLSchool 2022 - AutomationDutchMLSchool 2022 - Automation
DutchMLSchool 2022 - AutomationBigML, Inc
 
DutchMLSchool 2022 - ML for AML Compliance
DutchMLSchool 2022 - ML for AML ComplianceDutchMLSchool 2022 - ML for AML Compliance
DutchMLSchool 2022 - ML for AML ComplianceBigML, Inc
 
DutchMLSchool 2022 - Multi Perspective Anomalies
DutchMLSchool 2022 - Multi Perspective AnomaliesDutchMLSchool 2022 - Multi Perspective Anomalies
DutchMLSchool 2022 - Multi Perspective AnomaliesBigML, Inc
 
DutchMLSchool 2022 - My First Anomaly Detector
DutchMLSchool 2022 - My First Anomaly Detector DutchMLSchool 2022 - My First Anomaly Detector
DutchMLSchool 2022 - My First Anomaly Detector BigML, Inc
 
DutchMLSchool 2022 - Anomaly Detection
DutchMLSchool 2022 - Anomaly DetectionDutchMLSchool 2022 - Anomaly Detection
DutchMLSchool 2022 - Anomaly DetectionBigML, Inc
 
DutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in MLDutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in MLBigML, Inc
 
DutchMLSchool 2022 - End-to-End ML
DutchMLSchool 2022 - End-to-End MLDutchMLSchool 2022 - End-to-End ML
DutchMLSchool 2022 - End-to-End MLBigML, Inc
 
DutchMLSchool 2022 - A Data-Driven Company
DutchMLSchool 2022 - A Data-Driven CompanyDutchMLSchool 2022 - A Data-Driven Company
DutchMLSchool 2022 - A Data-Driven CompanyBigML, Inc
 
DutchMLSchool 2022 - ML in the Legal Sector
DutchMLSchool 2022 - ML in the Legal SectorDutchMLSchool 2022 - ML in the Legal Sector
DutchMLSchool 2022 - ML in the Legal SectorBigML, Inc
 
DutchMLSchool 2022 - Smart Safe Stadiums
DutchMLSchool 2022 - Smart Safe StadiumsDutchMLSchool 2022 - Smart Safe Stadiums
DutchMLSchool 2022 - Smart Safe StadiumsBigML, Inc
 
DutchMLSchool 2022 - Process Optimization in Manufacturing Plants
DutchMLSchool 2022 - Process Optimization in Manufacturing PlantsDutchMLSchool 2022 - Process Optimization in Manufacturing Plants
DutchMLSchool 2022 - Process Optimization in Manufacturing PlantsBigML, Inc
 
DutchMLSchool 2022 - Anomaly Detection at Scale
DutchMLSchool 2022 - Anomaly Detection at ScaleDutchMLSchool 2022 - Anomaly Detection at Scale
DutchMLSchool 2022 - Anomaly Detection at ScaleBigML, Inc
 
DutchMLSchool 2022 - Citizen Development in AI
DutchMLSchool 2022 - Citizen Development in AIDutchMLSchool 2022 - Citizen Development in AI
DutchMLSchool 2022 - Citizen Development in AIBigML, Inc
 
Democratizing Object Detection
Democratizing Object DetectionDemocratizing Object Detection
Democratizing Object DetectionBigML, Inc
 
BigML Release: Image Processing
BigML Release: Image ProcessingBigML Release: Image Processing
BigML Release: Image ProcessingBigML, Inc
 
Machine Learning in Retail: Know Your Customers' Customer. See Your Future
Machine Learning in Retail: Know Your Customers' Customer. See Your FutureMachine Learning in Retail: Know Your Customers' Customer. See Your Future
Machine Learning in Retail: Know Your Customers' Customer. See Your FutureBigML, Inc
 
Machine Learning in Retail: ML in the Retail Sector
Machine Learning in Retail: ML in the Retail SectorMachine Learning in Retail: ML in the Retail Sector
Machine Learning in Retail: ML in the Retail SectorBigML, Inc
 
ML in GRC: Machine Learning in Legal Automation, How to Trust a Lawyerbot
ML in GRC: Machine Learning in Legal Automation, How to Trust a LawyerbotML in GRC: Machine Learning in Legal Automation, How to Trust a Lawyerbot
ML in GRC: Machine Learning in Legal Automation, How to Trust a LawyerbotBigML, Inc
 
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...BigML, Inc
 

More from BigML, Inc (20)

Digital Transformation and Process Optimization in Manufacturing
Digital Transformation and Process Optimization in ManufacturingDigital Transformation and Process Optimization in Manufacturing
Digital Transformation and Process Optimization in Manufacturing
 
DutchMLSchool 2022 - Automation
DutchMLSchool 2022 - AutomationDutchMLSchool 2022 - Automation
DutchMLSchool 2022 - Automation
 
DutchMLSchool 2022 - ML for AML Compliance
DutchMLSchool 2022 - ML for AML ComplianceDutchMLSchool 2022 - ML for AML Compliance
DutchMLSchool 2022 - ML for AML Compliance
 
DutchMLSchool 2022 - Multi Perspective Anomalies
DutchMLSchool 2022 - Multi Perspective AnomaliesDutchMLSchool 2022 - Multi Perspective Anomalies
DutchMLSchool 2022 - Multi Perspective Anomalies
 
DutchMLSchool 2022 - My First Anomaly Detector
DutchMLSchool 2022 - My First Anomaly Detector DutchMLSchool 2022 - My First Anomaly Detector
DutchMLSchool 2022 - My First Anomaly Detector
 
DutchMLSchool 2022 - Anomaly Detection
DutchMLSchool 2022 - Anomaly DetectionDutchMLSchool 2022 - Anomaly Detection
DutchMLSchool 2022 - Anomaly Detection
 
DutchMLSchool 2022 - History and Developments in ML

Analyzing Citation Patterns in COVID-19 Literature

  • 1. #BigMLSchool Twitter: @kliegr Web: kliegr.eu Citation patterns in COVID-19 related biomedical literature. Analyzing Text: Discovering Insights for the Healthcare Industry. Why was this cited? Generating semantic explanations for the CORD-19 corpus. Tomas Kliegr, Prague University of Economics and Business, Czech Republic, tomas.kliegr@vse.cz
  • 2. Prague University of Economics and Business • The largest university in the field of economics, business and information technology in Czechia; 15,000 students (BSc, MSc, MBA, PhD) • English Master programmes – Information Systems Management – Economic and Data Analysis
  • 3. Data Science Curriculum • Programming is a mandatory part of our Applied Informatics BSc. • However, we use a visual, non-programming approach in our introductory data science course • Up to 200 students/semester [Figure labels: 2012; 2014; credit risk case study; 2021; clustering]
  • 4. The problem • Imagine you manage a research team. You need to pair the expertise of your staff with the needs of the wider research community, which will build upon your results. How can you find out whether research made an impact? One of the main KPIs in science is the number of citations. What made past research successful, that is, highly cited? We will show how to leverage existing, freely available research articles to get the answers. [Figure: part of the CORD-19 dataset, which contains more than 400,000 articles related to SARS-CoV-2. Source: VOSviewer]
  • 9. Example: Pratelli, A. (2008). Canine coronavirus inactivation with physical and chemical agents. The Veterinary Journal, 177(1), 71-79. [Figure legend: increases citation probability; decreases citation probability] • Our original approach was implemented in Python and involved many trial-and-error iterations and hundreds of hours of computation time. Lucie Beranová, Marcin Joachimiak, Tomáš Kliegr, Gollam Rabby, Vilém Sklenák. Why was this cited? Generating semantic explanations for the CORD-19 corpus. Under preparation. • In this tutorial, we show how the modelling part can be recreated without programming skills using cloud-based machine learning.
  • 10. Data preprocessing: > 300,000 articles → 36,000 → 3,000 [Figure labels: Open citations available; CORD-on-FHIR subset]
  • 11. A dive into the data [Figure label: Target]
  • 12. Upload data
  • 13. Processing of text data
  • 14. Text processing options: "structure of coronavirus main proteinase reveals" • Tokenization and unigrams – structure; of; coronavirus; main; proteinase; reveals • Bigrams – structure of; of coronavirus; coronavirus main; main proteinase; proteinase reveals
  • 15. Text processing options: "Structure of coronavirus main proteinase reveals" • Stemming: proteinase => protein • Stop-word removal – structure; of; coronavirus; main; proteinase; reveals
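
The text processing options on slides 14 and 15 can be illustrated with a minimal Python sketch. It is not the cloud platform's implementation: the stop-word list and the function names are assumptions made for this example, and stemming is left out because its output depends on the stemmer used.

```python
import re

# Illustrative stop-word list (an assumption, not the platform's actual list).
STOP_WORDS = {"of", "the", "a", "an", "in", "and"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic unigrams."""
    return re.findall(r"[a-z]+", text.lower())


def bigrams(tokens):
    """Join each token with its successor to form bigrams."""
    return [f"{first} {second}" for first, second in zip(tokens, tokens[1:])]


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [token for token in tokens if token not in STOP_WORDS]


title = "Structure of coronavirus main proteinase reveals"
unigrams = tokenize(title)
print(unigrams)
print(bigrams(unigrams))
print(remove_stop_words(unigrams))
```

With this stop-word list, only "of" is dropped from the example title.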
  • 16. Adding a new column • Since we aim for a classification task, we will add a new column indicating whether a paper is above or below the median citation count.
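
The median-based target column from slide 16 could be derived as in the sketch below. The records, the field names `citations` and `highly_cited`, and the choice to put papers exactly at the median into the lower class are all assumptions for illustration; the slides do not specify them.

```python
from statistics import median

# Hypothetical records; ids and citation counts are made up for illustration.
papers = [
    {"paper_id": "p1", "citations": 0},
    {"paper_id": "p2", "citations": 3},
    {"paper_id": "p3", "citations": 12},
    {"paper_id": "p4", "citations": 45},
    {"paper_id": "p5", "citations": 7},
]


def add_median_label(rows, count_field="citations", label_field="highly_cited"):
    """Return copies of the rows with a boolean column: count above the median?"""
    threshold = median(row[count_field] for row in rows)
    labeled_rows = [
        # Assumption: a paper exactly at the median counts as "not above".
        {**row, label_field: row[count_field] > threshold}
        for row in rows
    ]
    return labeled_rows, threshold


labeled, threshold = add_median_label(papers)
for row in labeled:
    print(row["paper_id"], row["citations"], row["highly_cited"])
```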
  • 18. Creating train/test split • To evaluate our model, we need to create a train/test split
  • 19. Setting up the split • Using a fixed random seed ensures we get the same split each time
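
The effect of the random seed on the split can be shown with a short Python sketch. The 80/20 ratio, the seed value, and the function name are assumptions for this example, not settings taken from the slides.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=1):
    """Shuffle a copy of the rows with a seeded generator, then cut it in two."""
    rng = random.Random(seed)  # a fixed seed makes the shuffle reproducible
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(10))  # stand-in for the paper records
train_a, test_a = train_test_split(rows, seed=1)
train_b, test_b = train_test_split(rows, seed=1)
print(test_a == test_b)  # the same seed yields the same split
```

Because the generator is created inside the function, the split does not depend on any other random calls made elsewhere in a program.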
  • 20. Creating a logistic regression model
  • 21. Setting up the model [Figure annotations: suggested to be removed based on automatic feature importance assessment; manually remove paper id; check that target is set correctly]
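
For readers who want to see what a logistic regression does under the hood, here is a minimal pure-Python sketch trained with batch gradient descent. It is not the algorithm the cloud platform uses; the two binary features (whether a title contains a given term), the toy labels, the learning rate, and the number of epochs are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, learning_rate=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for features, label in zip(X, y):
            score = sum(w * x for w, x in zip(weights, features)) + bias
            error = sigmoid(score) - label
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def predict(weights, bias, features):
    """Predict True when the estimated probability is at least 0.5."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(score) >= 0.5


# Hypothetical features: [title contains term A, title contains term B].
X = [[1, 0], [1, 0], [0, 1], [0, 1], [1, 1], [0, 0]]
y = [1, 1, 0, 0, 1, 0]  # toy target: 1 = above median citation count

weights, bias = fit_logistic(X, y)
print(weights, bias)
print([predict(weights, bias, row) for row in X])
```

A positive weight means the feature increases the predicted citation probability and a negative weight decreases it, which is what makes this kind of model easy to inspect.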
  • 22. Remove predictors
  • 23. Evaluation
  • 24. Evaluation
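
The transcript does not preserve which measures the evaluation slides displayed. As a sketch under that caveat, the following Python code computes a few common classification metrics from test-set predictions; the labels and predictions are made up for illustration.

```python
def confusion_counts(actual, predicted, positive=True):
    """Count true/false positives and negatives for a binary task."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    return tp, fp, fn, tn


def classification_metrics(actual, predicted, positive=True):
    """Accuracy, precision and recall; ratios with a zero denominator become 0.0."""
    tp, fp, fn, tn = confusion_counts(actual, predicted, positive)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical test-set labels (True = above median citation count).
actual = [True, True, False, False, True, False]
predicted = [True, False, False, True, True, False]

print(confusion_counts(actual, predicted))
print(classification_metrics(actual, predicted))
```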
  • 25. Inspecting model – which topics?
  • 26. How to NOT phrase the title?
  • 27. Inspecting model – which journals?
  • 28. Size of author team
  • 29. ... and patience will also help
  • 30. Decision tree ensemble
  • 31. Evaluating the ensemble • Disadvantages of ensembles: – Lower interpretability (accuracy-interpretability trade-off) – Longer learning time – Longer time required to apply the model
  • 32. Comparing models
  • 33. Deployment
  • 34. Other insights from the full analysis: Lucie Beranová, Marcin Joachimiak, Tomáš Kliegr*, Gollam Rabby, Vilém Sklenák. Why was this cited? Generating semantic explanations for the CORD-19 corpus. Under preparation. Preprints are shared on request to the author marked *. • Citation "biases" – Articles with authors who have Western-sounding names are better cited – Phylogenetic distance from the human virus: feline (FIPV, FCoV) and canine (CCoV) coronaviruses are cited less, possibly because these viruses are more distant from the human virus than camel and bat viruses. • Accuracy-interpretability trade-off – Embeddings-based language models, TF-IDF – Random forests, rule learning, neural networks – Directly explainable models (rules) vs explanation algorithms like LIME or Shapley values
  • 35. Credits • Colleagues and Ph.D. students at VSE: – Ing. Lucie Beranová (help with mining in Python) – Gollam Rabby, MSc. (citation counts, proof of concept) – prof. Vilém Sklenák (bibliometric expert) • Dr. Marcin Joachimiak, Computational Biosciences, LBL Berkeley (interpretation of results)
  • 36. Thanks for your attention! Tomáš Kliegr, UEP, tomas.kliegr@vse.cz • Open Doors Day on YouTube, March 3, 2021 • https://fis.vse.cz/english/