REVIEW OF EVALUATION METRICS USED IN LITERATURE + PROPOSED IDEA
DREAMS, EDCC 2021
13 September 2021
Evaluation of Human-in-the-Loop Learning based Autonomous Systems
Prajit Thazhurazhikath Rajendran, Huascar Espinoza, Chokri Mraidha (CEA, DILS-LSEA),
Agnes Delaborde (LNE)
| 2
DREAMS 2021 | Prajit T Rajendran
Safety challenges of DL/AI components
• The use of DL/AI components in autonomous systems comes with various challenges:
• Vulnerability to out-of-distribution data
• Adversarial inputs
• Anomalies
• Lack of transparency
• Stochastic nature
• Unknown unknowns
• Uncertainty
• Safety is an emergent property: it is not a property of any particular component individually
• Regulation/qualification/certification of such DL/AI components is ongoing work in the community
• Traditional approaches do not facilitate safe learning
• Humans can guide the system to safe behavior with their knowledge, experience and adaptability
[Figure: Out-of-Distribution Samples (labels: normal, anomaly)]
Categories of human-in-the-loop learning methods
Active learning
• Semi-supervised ML where only a subset of the training data is labelled
• Human queried interactively to label data points of interest from the unlabelled set
• PROS: Reduces data labelling requirement
• CONS: Performance depends on selecting the right points to query
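The query-selection step can be illustrated with uncertainty sampling, one common active learning strategy (the slides do not say which strategy is intended). This is a minimal sketch: the function names and the example probabilities are hypothetical, and the model is assumed to output P(class 1) for each unlabelled point.

```python
import math

def binary_entropy(p):
    """Entropy (in bits) of a Bernoulli prediction with P(class 1) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def select_queries(probabilities, budget):
    """Return indices of the `budget` unlabelled points the model is least sure about."""
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: binary_entropy(probabilities[i]),
                    reverse=True)
    return ranked[:budget]

# Hypothetical model outputs P(class 1) for five unlabelled points.
unlabelled_probs = [0.98, 0.52, 0.10, 0.45, 0.80]
to_label = select_queries(unlabelled_probs, budget=2)
print(to_label)  # indices of the points to send to the human labeller
```

The points whose predictions lie closest to 0.5 are sent to the human, so the labelling budget is spent where the model is most uncertain.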
Categories of human-in-the-loop learning methods
Demonstration
• Human is in full control and provides demonstrations to train the agent
• Agent can mimic human data to use as a safe starting point
• PROS: Leads to safer policies
• CONS: More human effort needed, may be subjective, train-test distribution shift
Categories of human-in-the-loop learning methods
Intervention
• Human and agent share control; the human intervenes when necessary
• Human takes over control to avoid catastrophic states, and the agent learns from these interventions
• PROS: Leads to safer policies
• CONS: Human must stay in the loop for long periods; human response time can be slow
Categories of human-in-the-loop learning methods
Evaluation
• Agent is in full control and the human provides feedback on tasks
• Human gives feedback based on a known objective or preference, which the agent learns
• PROS: Leads to safer policies
• CONS: Human must stay in the loop for long periods, credit attribution problem, subjective feedback
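Learning from evaluative feedback can be sketched in its simplest form as a running average of binary human feedback per action. The class, action names and feedback values below are hypothetical. The sketch is single-step, so it sidesteps the credit attribution problem mentioned above rather than solving it.

```python
from collections import defaultdict

class FeedbackLearner:
    """Keeps a running mean of human feedback per action (hypothetical helper)."""

    def __init__(self, actions):
        self.actions = list(actions)
        self.total = defaultdict(float)
        self.count = defaultdict(int)

    def update(self, action, feedback):
        # feedback: +1 (approve) or -1 (disapprove), given by the human evaluator
        self.total[action] += feedback
        self.count[action] += 1

    def preference(self, action):
        n = self.count[action]
        return self.total[action] / n if n else 0.0

    def best_action(self):
        return max(self.actions, key=self.preference)

learner = FeedbackLearner(["brake", "swerve", "accelerate"])
log = [("brake", +1), ("swerve", -1), ("brake", +1), ("accelerate", -1), ("swerve", +1)]
for action, feedback in log:
    learner.update(action, feedback)
print(learner.best_action())
```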
Common metrics in HITL learning methods
Metric categories: Safety, Performance, Time, Data requirement, User trust, User satisfaction
Measures: rate of task completion, rate of catastrophes, response time, training time, subjective measures, Likert scale, binary feedback, type of interactions, number of queries, average reward, deviation from thresholds, number of interventions
Common metrics in HITL learning methods
Safety
• Learning from intervention used
• Human intervenes to avoid undesirable events or catastrophes
• Policy constrained to safer regions
• Evaluated based on the number of occurrences of catastrophes
Trial without Error: Towards Safe RL via Human Intervention, William Saunders et al.
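Counting catastrophes and interventions can be sketched on a toy problem. The environment below (a walk along a line with a cliff) and the function name are assumptions made for illustration and are not the setup of the cited paper. The human overseer is modelled as a rule that blocks any action that would reach the cliff.

```python
def run_episode(actions, cliff=5, human_oversight=True):
    """Walk along a line from position 0; reaching `cliff` is a catastrophe.

    Returns (catastrophes, interventions) for one episode.
    """
    position, catastrophes, interventions = 0, 0, 0
    for step in actions:              # step is +1 or -1, proposed by the agent
        proposed = position + step
        if proposed >= cliff:
            if human_oversight:
                interventions += 1    # human blocks the action; agent stays put
                continue
            catastrophes += 1
            position = 0              # agent is reset after a catastrophe
            continue
        position = max(0, proposed)
    return catastrophes, interventions

agent_actions = [+1] * 7 + [-1] * 2 + [+1] * 4
print(run_episode(agent_actions, human_oversight=False))
print(run_episode(agent_actions, human_oversight=True))
```

Comparing the two runs gives the safety metric (catastrophes) alongside its cost in human effort (interventions).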
Common metrics in HITL learning methods
Performance
One-Shot Imitation Learning, Yan Duan et al.
• Learning from demonstration + meta-learning used
• Train networks that are not specific to one task and can adapt to new tasks
• Evaluated based on average rate of success/task completion
Common metrics in HITL learning methods
Time
A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning
• DAgger approaches: Learning from demonstration + intervention
• Start with imitation of expert policy, collect data
• Train the next policy on the aggregate of all collected datasets
• Hand over control to the expert if necessary, based on rule sets
• Evaluated based on the number of training iterations needed to reach a significant level of performance
DropoutDAgger: A Bayesian Approach to Safe Imitation Learning, Kunal Menda et al.
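The dataset-aggregation loop and the "iterations needed" metric can be sketched on a toy 1-D task. Everything here is an illustrative assumption: the expert, the tabular "learner", the start states and the stopping rule are hypothetical. The sketch also omits the expert hand-over rule and the policy mixing used in the cited work: the first rollout uses the expert, and later rollouts use the learner alone.

```python
def expert_policy(state, goal=5):
    """Hypothetical expert: step towards the goal on a 1-D line."""
    return (goal > state) - (goal < state)   # +1, -1 or 0

def rollout(policy, start, horizon=10, lo=0, hi=9):
    """Return the list of states visited when following `policy` from `start`."""
    states, state = [], start
    for _ in range(horizon):
        states.append(state)
        state = min(hi, max(lo, state + policy(state)))
    return states

def train(dataset, default_action=0):
    """'Train' a tabular policy: use the expert label if known, else stay put."""
    table = dict(dataset)
    return lambda s: table.get(s, default_action)

def dagger(starts, target_states, max_iters=10):
    dataset = {}                       # aggregated {state: expert action}
    policy = expert_policy             # iteration 1 rolls out the expert itself
    for iteration in range(1, max_iters + 1):
        start = starts[(iteration - 1) % len(starts)]
        for s in rollout(policy, start):
            dataset[s] = expert_policy(s)      # expert relabels visited states
        policy = train(dataset)
        if all(policy(s) == expert_policy(s) for s in target_states):
            return policy, iteration           # iterations needed to match expert
    return policy, None                        # target not reached

policy, iterations_needed = dagger(starts=[0, 9, 9, 9, 9], target_states=range(10))
print(iterations_needed)
```

Each later iteration lets the learner wander into states the expert never visited from the first start state, and the expert's labels for those states are added to the aggregate dataset.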
Common metrics in HITL learning methods
Data requirement
Overcoming Blind Spots in the Real World: Leveraging Complementary Abilities for Joint Execution, Ramya Ramakrishnan et al.
• Learning from demonstration + intervention used
• Both agent and human are considered to have blind spots
• Choose actor (human vs agent) based on blind spot activation level
• Evaluated based on the number of human queries needed vs average reward
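The trade-off between human queries and average reward can be sketched as follows. The selection rule (hand control to whichever actor has the lower estimated blind spot probability) and all numbers are assumptions for illustration, not the method of the cited paper.

```python
def choose_actor(agent_blindspot_prob, human_blindspot_prob):
    """Hand control to whichever actor is less likely to be in a blind spot."""
    return "human" if agent_blindspot_prob > human_blindspot_prob else "agent"

def evaluate(trajectory):
    """trajectory: list of (agent_prob, human_prob, agent_reward, human_reward)."""
    queries, total_reward = 0, 0.0
    for agent_p, human_p, agent_r, human_r in trajectory:
        if choose_actor(agent_p, human_p) == "human":
            queries += 1                 # each hand-over costs one human query
            total_reward += human_r
        else:
            total_reward += agent_r
    return queries, total_reward / len(trajectory)

# Hypothetical per-step blind spot estimates and rewards.
trajectory = [
    (0.1, 0.3, 1.0, 0.8),
    (0.9, 0.2, -5.0, 1.0),
    (0.2, 0.7, 1.0, -1.0),
    (0.6, 0.1, 0.0, 1.0),
]
queries, avg_reward = evaluate(trajectory)
print(queries, avg_reward)
```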
Common metrics in HITL learning methods
User trust
• Trust ~ extent to which the human agrees with the AI
• Measured via questionnaires about use of the system, biological data, number of interventions, “humanness” etc.
• Users can be quick to distrust AI after an easily identifiable incorrect result
• Interpretability improves trust
User satisfaction
• Satisfaction w.r.t. interaction, performance, design
• Can be subjective
• Measured via questionnaires, evaluative feedback
• Necessary for successful adoption and widespread use in society
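Two of these measures are easy to compute once the data is logged. The sketch below treats trust as the human-AI agreement rate and satisfaction as a rescaled mean Likert response. Both formulas, and the example data, are assumptions made for illustration.

```python
from statistics import mean

def agreement_rate(human_decisions, ai_decisions):
    """Fraction of cases where the human's decision matches the AI's (a trust proxy)."""
    matches = sum(h == a for h, a in zip(human_decisions, ai_decisions))
    return matches / len(ai_decisions)

def likert_score(responses, scale_max=5):
    """Mean questionnaire response on a 1..scale_max Likert scale, rescaled to [0, 1]."""
    return (mean(responses) - 1) / (scale_max - 1)

# Hypothetical logged decisions and questionnaire answers.
human = ["stop", "go", "go", "stop", "go"]
ai = ["stop", "go", "stop", "stop", "go"]
trust = agreement_rate(human, ai)
satisfaction = likert_score([4, 5, 3, 4])
print(trust, satisfaction)
```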
Limitations of prior approaches
• Humans (even experts) are assumed to be always correct
• Interactions between human and AI may not always be flawless
• Uncertainty of DL components not considered
• Presence of errors in data
• No existing measure for data quality
• Data quality may be defined in terms of completeness, accuracy and efficiency
[Figure labels: cognitive overload, slow response, incorrect response, lack of attention; errors in perception, errors in planning, errors in execution]
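Two of the data quality dimensions named above can be given simple working definitions. The slides do not define them, so the formulas below are assumptions: completeness is the fraction of required fields that are filled, and accuracy is the fraction of samples that agree with a trusted reference. Efficiency is left out because the source gives no basis for a formula.

```python
def completeness(samples, required_fields):
    """Fraction of required fields that are present and not None across all samples."""
    filled = sum(s.get(f) is not None for s in samples for f in required_fields)
    return filled / (len(samples) * len(required_fields))

def accuracy(samples, reference_actions):
    """Fraction of samples whose recorded action matches a trusted reference action."""
    correct = sum(s.get("action") == ref for s, ref in zip(samples, reference_actions))
    return correct / len(samples)

# Hypothetical demonstration records.
demos = [
    {"state": 0.1, "action": "left"},
    {"state": 0.4, "action": None},      # missing label
    {"state": None, "action": "right"},  # missing observation
    {"state": 0.9, "action": "right"},
]
c = completeness(demos, ["state", "action"])
a = accuracy(demos, ["left", "left", "right", "left"])
print(c, a)
```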
Proposed approach
• Hypothesis: Bad demonstration samples affect safety; full self-exploration by the system is also infeasible
• Premise: It is infeasible to start training afresh due to large training time and unsafe exploration
[Figure: proposed architecture, with blocks: data store (historical data, demonstrations etc.), feature extractor, unsupervised anomaly detector, candidate samples, human-in-the-loop, correct samples, erroneous samples, environment dynamics, anomaly predictor, policy learning module, environment; grouped into a non-exploratory training phase and an exploratory training phase]
Proposed approach
• Non-exploratory training phase:
• Data from the data store is used to train the anomaly predictor and policy learning modules
• Human-in-the-loop can be used to classify outliers as correct or erroneous
• Correct samples can be used directly for policy training
• Erroneous samples can be used to predict future anomalies/faults by combining them with a model of the environment dynamics
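The sample triage in this phase can be sketched as follows. A z-score test stands in for the unsupervised anomaly detector, and a callback stands in for the human. Both are assumptions, as is the choice to treat unflagged samples as correct, and the function names and data are hypothetical.

```python
from statistics import mean, pstdev

def flag_candidates(features, z_threshold=2.0):
    """Unsupervised detector (z-score stand-in): indices of outlying samples."""
    mu, sigma = mean(features), pstdev(features)
    if sigma == 0:
        return []
    return [i for i, x in enumerate(features) if abs(x - mu) / sigma > z_threshold]

def triage(features, ask_human, z_threshold=2.0):
    """Split the data store into correct samples and erroneous samples."""
    candidates = set(flag_candidates(features, z_threshold))
    correct, erroneous = [], []
    for i in range(len(features)):
        # Only flagged samples are shown to the human; the rest are assumed correct.
        if i in candidates and not ask_human(i):
            erroneous.append(i)
        else:
            correct.append(i)
    return correct, erroneous

# Hypothetical speed readings; the human accepts sample 9 and rejects sample 11.
speeds = [10, 11, 9, 10, 12, 10, 11, 9, 10, 40, 10, -20]
correct, erroneous = triage(speeds, ask_human=lambda i: i == 9)
print(erroneous)
```

The `correct` indices would feed policy training, while the `erroneous` ones would become training data for the anomaly predictor.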
Proposed approach
• Exploratory training phase:
• System interacts with the environment but chooses actions based on predicted anomaly score
• Facilitates safe exploration by taking previous human feedback into consideration
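Action selection guided by a predicted anomaly score could look like the sketch below. The threshold, the fallback rule and the value and risk tables are assumptions. The slides only state that actions are chosen based on the predicted anomaly score.

```python
def select_action(state, actions, q_value, anomaly_score, max_anomaly=0.5):
    """Pick the highest-value action among those predicted to be safe.

    Falls back to the least anomalous action if none passes the threshold.
    """
    safe = [a for a in actions if anomaly_score(state, a) <= max_anomaly]
    if not safe:
        return min(actions, key=lambda a: anomaly_score(state, a))
    return max(safe, key=lambda a: q_value(state, a))

# Hypothetical learned estimates for a single state.
q = {"accelerate": 1.0, "keep_speed": 0.6, "brake": 0.2}
risk = {"accelerate": 0.9, "keep_speed": 0.3, "brake": 0.1}
action = select_action("s0", list(q), lambda s, a: q[a], lambda s, a: risk[a])
print(action)
```

The highest-value action is rejected here because its predicted anomaly score exceeds the threshold, which is how earlier human feedback on erroneous samples would shape exploration.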
Future work
• Evaluation of suitable datasets used in autonomous systems policy/control development
• Development of experimental procedure for design and test of proposed model
• Implementation of the human-in-the-loop sample classifier and the anomaly predictor
• Evaluation of system on pre-decided metrics on target domain
Conclusions
• Identified necessity of human-in-the-loop learning, discussed its categories
• Explored the various evaluation metrics of human-in-the-loop approaches presented in literature
• Defined the requirements for “quality data” with characteristics such as accuracy, completeness or efficiency
• Proposed a method to measure and improve data quality in human-in-the-loop approaches
Commissariat à l’énergie atomique et aux énergies alternatives
Institut List | CEA SACLAY NANO-INNOV | BAT. 861 – PC142
91191 Gif-sur-Yvette Cedex - FRANCE
www-list.cea.fr
Établissement public à caractère industriel et commercial | RCS Paris B 775 685 019
Prajit Thazhurazhikath Rajendran
prajit.thazhurazhikath@cea.fr
Thank you
