DREAMS 2021 presentation: Evaluation metrics in human-in-the-loop learning for autonomous systems
REVIEW OF EVALUATION METRICS USED IN LITERATURE + PROPOSED IDEA
DREAMS, EDCC 2021
13 September 2021
Evaluation of Human-in-the-Loop Learning-based Autonomous Systems
Prajit Thazhurazhikath Rajendran, Huascar Espinoza, Chokri Mraidha (CEA, DILS-LSEA),
Agnes Delaborde (LNE)
DREAMS 2021 | Prajit T Rajendran
Safety challenges of DL/AI components
• The use of DL/AI components in autonomous
systems comes with various challenges:
• Vulnerable to out of distribution data
• Adversarial inputs
• Anomalies
• Lack of transparency
• Stochastic nature
• Unknown unknowns
• Uncertainty
• Safety is an emergent property: it is not a property of any particular component individually
• Regulation/qualification/certification of such DL/AI components is ongoing work in the community
• Traditional approaches do not facilitate safe learning
• Humans can guide the system to safe behavior with
their knowledge, experience and adaptability
[Figure: out-of-distribution samples, contrasting normal and anomalous data]
Categories of human-in-the-loop learning methods
Active learning
• Semi-supervised ML where only a subset of the training data is labelled
• Human queried interactively to label data points of interest from the unlabelled set
• PROS: Reduces data labelling requirement
• CONS: Selecting the right points to query is nontrivial
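As a minimal illustrative sketch (not from the slides), the "right points to query" are often chosen by uncertainty sampling; the `select_queries` helper and the toy class probabilities below are hypothetical:

```python
import numpy as np

def select_queries(probs, k):
    """Pick the k most uncertain unlabelled points (highest predictive entropy)."""
    probs = np.asarray(probs, dtype=float)
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    return np.argsort(entropy)[::-1][:k]  # most uncertain first

# Predicted class probabilities for four unlabelled points.
probs = [[0.5, 0.5],    # maximally uncertain
         [0.9, 0.1],
         [0.6, 0.4],
         [0.99, 0.01]]  # nearly certain
print(select_queries(probs, 2))  # indices of the two most ambiguous points
```

The human would then be asked to label only the returned indices, which is how active learning reduces the labelling burden.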
Demonstration
• Human is in full control and provides demonstrations to train the agent
• Agent can mimic human data to use as a safe starting point
• PROS: Leads to safer policies
• CONS: More human effort needed, may be subjective, train-test distribution shift
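A toy sketch of learning from demonstration (illustrative only): here behaviour cloning is reduced to a nearest-neighbour lookup over (state, action) pairs; the `behaviour_clone` function and the demonstrations are hypothetical stand-ins for supervised policy training:

```python
import numpy as np

def behaviour_clone(states, actions):
    """Return a policy that mimics the action of the nearest demonstrated state.
    A toy stand-in for fitting a supervised model to (state, action) pairs."""
    states = np.asarray(states, dtype=float)
    actions = list(actions)
    def policy(s):
        dists = np.linalg.norm(states - np.asarray(s, dtype=float), axis=1)
        return actions[int(np.argmin(dists))]
    return policy

# Hypothetical demonstrations: steer left near the left wall, right otherwise.
demo_states = [[0.0, 1.0], [1.0, 0.0]]
demo_actions = ["left", "right"]
policy = behaviour_clone(demo_states, demo_actions)
print(policy([0.1, 0.9]))  # left
```

The train-test distribution shift mentioned above shows up exactly here: states far from every demonstration get extrapolated actions of unknown quality.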
Intervention
• Human and agent share control; the human intervenes when necessary
• Human takes over control to avoid catastrophic states and agent learns from these
• PROS: Leads to safer policies
• CONS: Need to keep human in the loop for long, slow response time
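The shared-control loop above can be sketched as follows (a minimal illustration, not any specific paper's method); `shared_control`, the catastrophe check and the toy 1-D cliff are assumptions:

```python
def shared_control(agent_action, state, is_catastrophic, human_policy, log):
    """Agent acts by default; the human intervenes when the agent's choice
    would enter a catastrophic state, and the override is logged so the
    agent can later learn to avoid it."""
    if is_catastrophic(state, agent_action):
        safe_action = human_policy(state)
        log.append((state, agent_action, safe_action))  # training signal
        return safe_action
    return agent_action

# Toy 1-D example: moving left from position 0 falls off a cliff.
log = []
action = shared_control(
    agent_action=-1,
    state=0,
    is_catastrophic=lambda s, a: s + a < 0,
    human_policy=lambda s: +1,
    log=log,
)
print(action, len(log))  # 1 1
```

The logged (state, bad action, safe action) triples are what the agent learns from, and their count is the "number of interventions" metric seen later.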
Evaluation
• Agent is in full control and the human provides feedback on tasks
• Human gives feedback based on a known objective or preference, which the agent learns
• PROS: Leads to safer policies
• CONS: Need to keep human in the loop for long, credit attribution problem,
subjective feedback
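A minimal sketch of learning from evaluative feedback (illustrative, not from the slides): binary human feedback nudges the agent's preference for an action up or down; `update_preferences` and the driving actions are hypothetical:

```python
def update_preferences(prefs, action, feedback, lr=0.5):
    """Shift the agent's preference for an action from binary human
    feedback (+1 approve, -1 disapprove), scaled by a learning rate."""
    prefs = dict(prefs)
    prefs[action] = prefs.get(action, 0.0) + lr * feedback
    return prefs

prefs = {"swerve": 0.0, "brake": 0.0}
prefs = update_preferences(prefs, "brake", +1)   # human approves braking
prefs = update_preferences(prefs, "swerve", -1)  # human disapproves swerving
best = max(prefs, key=prefs.get)
print(best)  # brake
```

The credit-attribution problem noted above arises because real feedback usually arrives after a whole trajectory, not per action as in this toy loop.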
Common metrics in HITL learning methods
• Safety: rate of catastrophes, deviation from thresholds, number of interventions
• Performance: rate of task completion, average reward
• Time: response time, training time
• Data requirement: number of queries
• User trust and user satisfaction: subjective measures such as Likert scales, binary feedback and type of interactions
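Several of these metrics can be aggregated mechanically from episode logs; the sketch below is an assumed log format and helper, not part of the presentation:

```python
def hitl_metrics(episodes):
    """Aggregate common HITL metrics from episode logs. Each episode is a
    dict with keys: reward, catastrophe (bool), completed (bool), queries (int)."""
    n = len(episodes)
    return {
        "average_reward":   sum(e["reward"] for e in episodes) / n,
        "catastrophe_rate": sum(e["catastrophe"] for e in episodes) / n,
        "completion_rate":  sum(e["completed"] for e in episodes) / n,
        "total_queries":    sum(e["queries"] for e in episodes),
    }

episodes = [
    {"reward": 10.0, "catastrophe": False, "completed": True,  "queries": 3},
    {"reward": -2.0, "catastrophe": True,  "completed": False, "queries": 5},
]
print(hitl_metrics(episodes))
```

Subjective measures (trust, satisfaction) do not reduce to such logs and need questionnaires, as the later slides discuss.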
Safety
• Learning from intervention is used
• Human intervenes to avoid undesirable events or catastrophes
• Policy is constrained to safer regions
• Evaluated based on the number of occurrences of catastrophes
Trial without Error: Towards Safe RL via Human Intervention, William Saunders et al.
Performance
• Learning from demonstration + meta-learning is used
• Networks are trained that are not specific to one task and can adapt to new tasks
• Evaluated based on the average rate of success/task completion
One-Shot Imitation Learning, Yan Duan et al.
Time
• DAgger approaches: learning from demonstration + intervention
• Start with imitation of the expert policy and collect data
• Train the next policy on the aggregate of all collected datasets
• Hand over control to the expert if necessary, based on rulesets
• Evaluated based on the number of training iterations needed to reach a significant level of performance
A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning, Stéphane Ross et al.
DropoutDAgger: A Bayesian Approach to Safe Imitation Learning, Kunal Menda et al.
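The DAgger loop described above can be sketched as follows; the toy sign-labelling expert, nearest-neighbour trainer and fixed rollout are illustrative assumptions, not taken from the cited papers:

```python
def dagger(expert, train, rollout, init_states, n_iters):
    """DAgger sketch: roll out the current policy, have the expert relabel
    the visited states, aggregate, and retrain on everything collected."""
    dataset = [(s, expert(s)) for s in init_states]  # bootstrap from expert
    policy = train(dataset)
    for _ in range(n_iters):
        visited = rollout(policy)                        # states the learner reaches
        dataset += [(s, expert(s)) for s in visited]     # expert relabels them
        policy = train(dataset)                          # retrain on the aggregate
    return policy

# Toy instantiation: 1-D states labelled by their sign; training fits a
# nearest-neighbour policy; the rollout visits states the initial data missed.
expert = lambda s: 1 if s >= 0 else -1

def train(dataset):
    def policy(s):
        nearest = min(dataset, key=lambda pair: abs(pair[0] - s))
        return nearest[1]
    return policy

policy = dagger(expert, train, rollout=lambda p: [-2.0, -1.0],
                init_states=[0.5, 1.5], n_iters=1)
print(policy(-1.5))  # -1: corrected by expert relabelling
```

The number of outer iterations needed before the policy stops being corrected is exactly the "time" metric the slide refers to.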
Data requirement
• Learning from demonstration + intervention is used
• Both the agent and the human are considered to have blindspots
• The actor (human vs. agent) is chosen based on the blindspot activation level
• Evaluated based on the number of human queries needed vs. average reward
Overcoming Blind Spots in the Real World: Leveraging Complementary Abilities for Joint Execution, Ramya Ramakrishnan et al.
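The actor-selection rule can be illustrated with a small sketch (the threshold rule and blindspot scores are hypothetical simplifications of the cited work):

```python
def choose_actor(state, agent_blindspot, human_blindspot, threshold=0.5):
    """Hand control to the human only when the agent's blindspot activation
    is high and the human's is not; otherwise the agent keeps control."""
    if agent_blindspot(state) > threshold and human_blindspot(state) <= threshold:
        return "human"
    return "agent"

# Hypothetical blindspot scores in [0, 1]: the agent is blind in glare.
agent_bs = lambda s: 0.9 if s == "glare" else 0.1
human_bs = lambda s: 0.2
print(choose_actor("glare", agent_bs, human_bs))  # human
print(choose_actor("clear", agent_bs, human_bs))  # agent
```

Counting how often `"human"` is returned gives the query count traded off against average reward in the evaluation.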
User trust
• Trust ≈ the extent to which the human agrees with the AI
• Measured with questionnaires about use of the system, biological data, number of interventions, "humanness", etc.
• Users can be quick to distrust an AI that produces an easily identifiable incorrect result
• Interpretability improves trust
User satisfaction
• Satisfaction with respect to interaction, performance and design
• Can be subjective
• Measured with questionnaires and evaluative feedback
• Necessary for successful adoption and widespread use in society
Limitations of prior approaches
• Humans (even experts) are assumed to always be correct
• Interactions between human and AI may not always be flawless
• Uncertainty of DL components is not considered
• Errors may be present in the data
• No existing measure for data quality
• Data quality may be defined in terms of completeness, accuracy and efficiency
Possible human errors include cognitive overload, slow response, incorrect response and lack of attention, leading to errors in perception, planning and execution.
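Since no existing data-quality measure is cited, here is a toy sketch of what scoring completeness and accuracy could look like; the sample schema, `is_valid` check and field names are invented for illustration, and "efficiency" is omitted as domain-specific:

```python
def data_quality(samples, is_valid, required_fields):
    """Toy data-quality score: completeness = fraction of samples with no
    missing required fields; accuracy = fraction passing a validity check."""
    n = len(samples)
    complete = sum(all(f in s and s[f] is not None for f in required_fields)
                   for s in samples)
    accurate = sum(bool(is_valid(s)) for s in samples)
    return {"completeness": complete / n, "accuracy": accurate / n}

# Hypothetical driving-log samples with one incomplete and one implausible entry.
samples = [
    {"speed": 10.0,  "steer": 0.1},
    {"speed": None,  "steer": 0.0},   # incomplete
    {"speed": 400.0, "steer": 0.2},   # implausible speed
]
score = data_quality(samples, is_valid=lambda s: (s["speed"] or 0) < 60,
                     required_fields=["speed", "steer"])
print(score)
```

Such a score would let the quality of human-provided demonstrations be tracked over time rather than assumed perfect.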
Proposed approach
• Hypothesis: Bad demonstration samples affect safety; full self-exploration by the system is also infeasible
• Premise: It is infeasible to start training afresh, due to long training times and unsafe exploration
[Architecture diagram: historical data and demonstrations populate a data store. A feature extractor feeds an unsupervised anomaly detector, which routes candidate samples to the human-in-the-loop. Correct samples go to the policy learning module (non-exploratory training phase); erroneous samples are combined with environment dynamics to train an anomaly predictor. The policy learning module then interacts with the environment in the exploratory training phase.]
• Non-exploratory training phase:
• Data from the data store is used to train the anomaly predictor and policy learning modules
• Can use human-in-the-loop to classify outliers as correct or erroneous
• Correct samples can directly be used for policy training
• Erroneous samples can be used to predict future anomalies/faults by combining with model of
environment dynamics
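The routing step in this phase can be sketched as follows (illustrative only: `triage_samples`, the scalar outlier score and the labelling rule are assumptions, not the actual detector or human interface):

```python
def triage_samples(samples, anomaly_score, human_label, threshold=0.8):
    """Route samples: inliers go straight to policy training; outliers are
    shown to the human, who marks them as correct or erroneous."""
    for_policy, for_anomaly_predictor = [], []
    for s in samples:
        if anomaly_score(s) < threshold:
            for_policy.append(s)                 # trusted as-is
        elif human_label(s) == "correct":
            for_policy.append(s)                 # human-vetted outlier
        else:
            for_anomaly_predictor.append(s)      # erroneous sample
    return for_policy, for_anomaly_predictor

samples = [0.1, 0.2, 5.0, 9.0]
score = lambda s: 1.0 if s > 1.0 else 0.0        # toy outlier score
label = lambda s: "correct" if s < 6.0 else "erroneous"
ok, bad = triage_samples(samples, score, label)
print(ok, bad)  # [0.1, 0.2, 5.0] [9.0]
```

Only flagged outliers reach the human, which bounds the labelling effort in this phase.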
• Exploratory training phase:
• System interacts with the environment but chooses actions based on predicted anomaly score
• Facilitates safe exploration by taking previous human feedback into consideration
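Anomaly-aware action selection in this phase could look like the following sketch (a hypothetical penalty formulation; the value function, anomaly predictor and penalty weight are all assumptions):

```python
def safe_action(state, candidate_actions, q_value, predicted_anomaly, penalty=10.0):
    """Pick the action maximising expected value minus an anomaly penalty,
    steering exploration away from states humans flagged as erroneous."""
    return max(candidate_actions,
               key=lambda a: q_value(state, a) - penalty * predicted_anomaly(state, a))

# Toy example: 'fast' has higher value but a high predicted anomaly score.
q = lambda s, a: {"fast": 5.0, "slow": 3.0}[a]
anom = lambda s, a: {"fast": 0.9, "slow": 0.0}[a]
print(safe_action("s0", ["fast", "slow"], q, anom))  # slow
```

The penalty weight trades off exploration against safety; with the anomaly predictor trained on human-classified erroneous samples, earlier feedback shapes which actions are even considered.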
Future work
• Evaluation of suitable datasets used in autonomous systems policy/control development
• Development of experimental procedure for design and test of proposed model
• Implementation of human-in-the-loop sample classifier, and anomaly predictor
• Evaluation of system on pre-decided metrics on target domain
Conclusions
• Identified necessity of human-in-the-loop learning, discussed its categories
• Explored the various evaluation metrics of human-in-the-loop approaches presented in
literature
• Defined the requirements for "quality data" with characteristics such as accuracy, completeness or efficiency
• Proposed a method to measure and improve data quality in human-in-the-loop
approaches
Commissariat à l'énergie atomique et aux énergies alternatives
Institut List | CEA SACLAY NANO-INNOV | BAT. 861 – PC142
91191 Gif-sur-Yvette Cedex - FRANCE
www-list.cea.fr
Établissement public à caractère industriel et commercial | RCS Paris B 775 685 019
Prajit Thazhurazhikath Rajendran
prajit.thazhurazhikath@cea.fr

Thank you