Have you heard of machine learning but aren't quite sure what it is or what it's good for? Learn more about use cases in security, IT operations, business analytics and the Internet of Things / Industry 4.0. See machine learning in action in Splunk and find out where your data journey goes next.
Disclaimer
During the course of this presentation, we may make forward-looking statements regarding future events
or the expected performance of the company. We caution you that such statements reflect our current
expectations and estimates based on factors currently known to us and that actual events or results
could differ materially. For important factors that may cause actual results to differ from those contained
in our forward-looking statements, please review our filings with the SEC. The forward-looking
statements made in this presentation are being made as of the time and date of its live presentation.
If reviewed after its live presentation, this presentation may not contain current or accurate information.
We do not assume any obligation to update any forward-looking statements we may make.
In addition, any information about our roadmap outlines our general product direction and is subject to
change at any time without notice. It is for informational purposes only and shall not be incorporated
into any contract or other commitment. Splunk undertakes no obligation either to develop the features
or functionality described or to include any such feature or functionality in a future release.
Humans are good at learning,
but get lost in the volume and
detail of large data sets…
ML is all around us
EXAMPLES
• Face recognition: classifying faces in photos
• Spam filters: identifying spam messages
• Recommendation systems: suggesting / predicting what
customers are likely to buy (or want) next
• Fraud detection: identifying transactions that
are fraudulent
• Weather forecasting: predicting whether it will rain tomorrow or
not; estimating daily minimum and maximum temperatures.
Why do we need machine learning?
- Decision-making
- Forecasting KPIs
- Alerting on deviations
- Uncovering unknown patterns
and relationships
- Automation
All of this requires varied
data from many sources: large volumes
of unstructured data and real-time
data.
Platform for Machine Data
Search, Alert, Predict, Visualize, Develop
Users: Engineers, Data Analysts, Security Analysts, Business Users
• Native Inputs: TCP, UDP, Logs, Scripts, Wire, Mobile
• Industrial Data: SCADA, AMI, Meter Reads
• Modular Inputs: MQTT, AMQP, COAP, REST, JMS
• HTTP Event Collector: Token Authenticated Events, Real Time
• Technology Partnerships: Kepware, AWS IoT, Cisco, Palo Alto
• External Lookups/Enrichment: Maintenance Info, Asset Info, Data Stores
Sources: OT (Industrial Assets) and IT (Consumer and Mobile Devices)
Real-time business insights
Historical data | Real-time data | Statistical prediction
Time axis: from T minus a few days to T plus a few days
Security Operations Center
IT Operations Center
Business Operations Center
Predictive (models)
Descriptive (BI tools, data lakes)
Grey zone
What is machine learning?
Was: “Field of study that gives computers the ability to learn
without being explicitly programmed” – A. Samuel, 1959
How: generalization (learning) from examples (data)
Simplified ML workflow:
– EXPLORE the data
– FIT models to the data
– APPLY models in production
– VALIDATE the models
– REPEAT
How machines learn
Prediction
• When we see thick clouds and an overcast sky,
we predict that it will (probably) rain.
Estimation (regression)
• Estimating how much an apartment costs based on
location, surroundings, market prices, etc.
Classification and clustering
• Determining a person's gender based on attributes
such as hair color, clothing, etc.
Anomaly detection
• Identifying outliers or other
unusual data points
Reinforcement learning
• Improving behavior by learning from mistakes
We have all gained experience
with learning. But what lies
behind that experience?
How do we translate this
knowledge into code?
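As a small, hedged illustration of "generalization from examples", the rain example can be turned into code: instead of hand-writing a rule, the program picks the cloud-cover threshold that best separates past rainy and dry days. The data and the names `learn_threshold` and `predict_rain` are hypothetical; this is a one-feature decision stump, not a method from the slides.

```python
# Hypothetical toy data: (cloud cover in %, did it rain the next day?)
observations = [
    (10, False), (25, False), (40, False), (55, False),
    (60, True), (70, True), (85, True), (95, True),
]


def learn_threshold(examples):
    """Pick the cloud-cover threshold that misclassifies the fewest examples.

    The learned rule is: predict rain if cover >= threshold.
    """
    best_threshold, best_errors = None, len(examples) + 1
    for candidate, _ in examples:  # try every observed value as a threshold
        errors = sum((cover >= candidate) != rained for cover, rained in examples)
        if errors < best_errors:
            best_threshold, best_errors = candidate, errors
    return best_threshold


def predict_rain(cover, threshold):
    """Apply the learned rule to a new observation."""
    return cover >= threshold


threshold = learn_threshold(observations)
print(threshold)                    # 60
print(predict_rain(80, threshold))  # True
```

The rule is never written by hand: change the examples and the learned threshold changes with them.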
Main types of machine learning
1. Supervised learning:
Generalization from described, classified (labeled) data points
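A minimal sketch of supervised learning, assuming a one-nearest-neighbor classifier (one of many possible algorithms): every training point carries a label, and a new point receives the label of its closest labeled neighbor. The features, values and labels are hypothetical.

```python
import math

# Hypothetical labeled training data: (features, label).
# Features: (requests per minute, error rate in %) of a web server.
training = [
    ((120.0, 0.5), "normal"),
    ((130.0, 0.8), "normal"),
    ((110.0, 0.3), "normal"),
    ((400.0, 7.5), "incident"),
    ((380.0, 9.0), "incident"),
]


def classify(point, labeled):
    """1-nearest-neighbor: return the label of the closest training example."""
    _, label = min(labeled, key=lambda item: math.dist(point, item[0]))
    return label


print(classify((125.0, 0.6), training))  # normal
print(classify((390.0, 8.0), training))  # incident
```

In practice, features on very different scales (such as these two) would usually be normalized first so that neither one dominates the distance.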
Main types of machine learning
2. Unsupervised learning:
Generalization from undescribed (unlabeled) data points
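Unsupervised learning works without labels. As an illustrative sketch, here is plain k-means on one-dimensional data: it discovers two groups of response times by itself. The data and starting centers are hypothetical.

```python
def kmeans_1d(values, centers, iterations=10):
    """Plain k-means on one-dimensional data, starting from the given centers."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            # assign each value to its nearest center
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # move each center to the mean of its cluster (keep it if the cluster is empty)
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters


# Hypothetical response times in ms: no labels, just values.
response_times = [98, 102, 100, 101, 99, 510, 495, 505]
centers, clusters = kmeans_1d(response_times, centers=[50, 600])
print(centers)   # [100.0, 503.33...]
print(clusters)  # [[98, 102, 100, 101, 99], [510, 495, 505]]
```

Nobody told the algorithm which values are "fast" or "slow"; the two groups emerge from the data.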
Main types of machine learning
3. Reinforcement learning:
• The system is rewarded (or penalized) based on the quality of its results
• Its actions change the world and come with an error measure
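To make "rewarded based on the quality of its results" concrete, here is a hedged sketch of an epsilon-greedy agent (a classic multi-armed bandit learner, chosen here for brevity). It only ever sees the reward of the action it takes, and it gradually prefers the action that pays best. The environment and its fixed payoffs are hypothetical; real rewards are usually noisy, which the running average handles.

```python
import random


def epsilon_greedy(reward_of, n_actions, steps=1000, epsilon=0.1, seed=0):
    """Learn which action pays best purely from the rewards it receives."""
    rng = random.Random(seed)
    counts = [0] * n_actions
    estimates = [0.0] * n_actions

    def update(action):
        reward = reward_of(action)  # feedback from the "world"
        counts[action] += 1
        # incremental mean of the rewards observed for this action
        estimates[action] += (reward - estimates[action]) / counts[action]

    for action in range(n_actions):  # try each action once to start
        update(action)
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore
        else:
            action = max(range(n_actions), key=lambda a: estimates[a])  # exploit
        update(action)
    return estimates, counts


# Hypothetical environment: fixed payoff per action, unknown to the learner.
payoffs = [0.2, 0.5, 0.9]
estimates, counts = epsilon_greedy(lambda a: payoffs[a], n_actions=3)
best = max(range(3), key=lambda a: estimates[a])
print(best)  # 2
```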
Overview of Machine Learning in Splunk
Splunk Enterprise
Search commands (SPL) | Splunk Premium Apps | Custom ML
Platform for Operational Intelligence
Machine Learning in SPL
Splunk's Search Processing Language (SPL) is a powerful, flexible and
extensible search language that includes machine learning commands,
for example anomalydetection
Splunk IT Service Intelligence
1. Collect data
2. Define services, entities and KPIs
3. Monitor and troubleshoot
4. Analyze and detect
Data-Defined, Data-Driven Service Insights
Packaged ML:
Adaptive Thresholds and Anomaly Detection
One of the Splunk Premium Apps
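The idea behind adaptive thresholds can be sketched as follows: instead of one fixed limit, each time slot (here, hour of day) gets its own band learned from historical KPI values. This is an illustrative mean ± k·stdev sketch under our own assumptions (hourly slots, k = 3, hypothetical data and function names), not ITSI's actual implementation.

```python
import statistics
from collections import defaultdict


def adaptive_thresholds(history, k=3.0):
    """Per hour-of-day band: mean +/- k standard deviations of past KPI values."""
    by_hour = defaultdict(list)
    for hour, value in history:
        by_hour[hour].append(value)
    thresholds = {}
    for hour, values in by_hour.items():
        mean = statistics.fmean(values)
        spread = statistics.stdev(values) if len(values) > 1 else 0.0
        thresholds[hour] = (mean - k * spread, mean + k * spread)
    return thresholds


def is_anomalous(hour, value, thresholds):
    """A KPI value is anomalous if it falls outside the band for its hour."""
    low, high = thresholds[hour]
    return not (low <= value <= high)


# Hypothetical KPI history: (hour of day, response time in ms)
history = [(3, 40), (3, 42), (3, 38), (3, 41),
           (14, 200), (14, 210), (14, 190), (14, 205)]
thresholds = adaptive_thresholds(history)
print(is_anomalous(3, 195, thresholds))   # True: far too slow for 3 a.m.
print(is_anomalous(14, 195, thresholds))  # False: normal at 2 p.m.
```

The same value of 195 ms is alarming at night and normal during the afternoon peak, which a single static threshold cannot express.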
Splunk Machine Learning Toolkit (App)
Assistants: guided model building, testing and
deployment for basic ML approaches
Showcases: interactive examples for typical
use cases from IT, security, business, IoT
Algorithms: 25+ standard algorithms available
SPL ML Commands: new search commands to build,
test and operationalize models
Python for Scientific Computing Library: 300+
open-source algorithms available for immediate use
Build custom analytics for any use case
Extends the Splunk platform with UI-guided modeling
Customized Machine Learning: Success Formula
Domain/expert knowledge (IT, Security, …) | ITSI, UBA
• Identify use cases relevant to decision-making
• Prioritize them by business value
Splunk expertise | SPL
• Data preparation: cleaning, transformation…
Data science expertise | Statistical / mathematical background
• Selecting suitable algorithms
• Building models
Splunk ML Toolkit: enables and simplifies this with examples and assistants
Result: operational success
Summary: ML Workflow
Problem: <something in the world> causes high time or cost overhead. Hypothesis of the value.
Solution: Build an ML model to predict <possible incidents> so that you can act proactively.
Operationalization
1. Gather all data relevant to the problem; explore the data
2. Select algorithms and fit them to the data to generate a
model
3. Apply & validate the models until the
desired prediction quality for solving the problem is reached
4. Deliver the model to department X, which works with the results
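Steps 2 and 3 form a loop: fit a candidate model, apply it to data it has not seen, measure the error, and repeat until the quality target is met. A minimal sketch, assuming moving-average forecasters of different window sizes as the candidate models and mean absolute error on a holdout as the quality measure (both our choices; the series, target and function names are hypothetical).

```python
def moving_average_forecast(series, window):
    """'Fit' = keep the last `window` values; 'apply' = forecast their mean."""
    return sum(series[-window:]) / window


def validate(series, window, holdout):
    """Mean absolute error of one-step-ahead forecasts over the last `holdout` points."""
    errors = []
    for t in range(len(series) - holdout, len(series)):
        forecast = moving_average_forecast(series[:t], window)
        errors.append(abs(series[t] - forecast))
    return sum(errors) / len(errors)


def select_window(series, candidates, holdout, target_mae):
    """Try candidate models until one reaches the desired prediction quality."""
    for window in candidates:  # REPEAT: fit, apply, validate
        mae = validate(series, window, holdout)
        if mae <= target_mae:
            return window, mae
    return None, None  # nothing good enough: back to data exploration


# Hypothetical KPI series with a level shift halfway through.
series = [10, 10, 10, 10, 20, 20, 20, 20, 20, 20]
print(validate(series, 5, holdout=3))                          # 2.0
print(select_window(series, [5, 3, 1], holdout=3, target_mae=1.0))  # (3, 0.0)
```

Only a model that passes validation moves on to step 4; if no candidate is good enough, the loop sends you back to the data.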
Machine learning process in Splunk
• Collect data: add-ons from Splunkbase, etc.
• Cleaning, transformation: props.conf, transforms.conf, data models
• Exploration, visualization: Pivot, Dataset UI, SPL
• Modeling and evaluation: ML Toolkit
• Deployment: alerting, dashboards, reports
Machine learning at Splunk customers
• Network Incident Detection
• Service Degradation Detection
• Security / Fraud Prevention
• Prioritize Website Issues and Predict Root Cause
• Predict Gaming Outages
• Fraud Prevention
• Cell Tower Incident Detection
• Optimize Repair Operations
• Machine Learning Consulting Services
• Analytics App built on ML Toolkit
• Optimizing operations and business results
Entertainment Company
ML Toolkit Customer Use Cases
Speeding website problem resolution by automatically ranking actions for support engineers
Reducing customer service disruption with early identification of difficult-to-detect network incidents
Minimizing cell tower degradation and downtime with improved issue detection sensitivity
Improving cell tower uptime and reducing repair truck rolls with anomaly detection
and root cause analysis
Predicting and averting potential gaming outage conditions with finer-grained detection
Ensuring mobile device security by detecting anomalies in ID authentication
Preventing fraud by identifying malicious accounts and suspicious activities
Entertainment
Company
Q: What is a statistical model? A: A model is a little copy of the world you can hold in your hands.
Formal: A model is a parametrized relationship between variables.
FITTING a model sets the parameters using feature variables & observed values
APPLYING a model fills in predicted values using feature variables
Image source: http://phdp.github.io/posts/2013-07-05-dtl.html
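The formal definition maps directly onto code. In this sketch the model is the straight line y = a + b·x; fitting sets a and b by least squares from the feature values and observed values, and applying fills in predictions from the features alone. The functions `fit` and `apply` are plain Python named after the concepts, not the ML Toolkit commands, and the apartment data is hypothetical.

```python
def fit(xs, ys):
    """FIT: set the parameters (intercept, slope) of y = a + b*x by least squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def apply(model, xs):
    """APPLY: fill in predicted values using only the feature variable."""
    intercept, slope = model
    return [intercept + slope * x for x in xs]


# Hypothetical data: apartment size in m^2 -> price in thousands
sizes = [50, 70, 90]
prices = [150, 210, 270]
model = fit(sizes, prices)
print(model)               # (0.0, 3.0)
print(apply(model, [100])) # [300.0]
```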
Getting data into Splunk is designed to be as flexible and easy as possible. The indexing engine is flexible and generally requires no configuration for most machine data generated by the devices, control systems, sensors, SCADA, networks, applications and end users connected by industrial networks. There are many options:
Splunk can directly monitor hundreds or thousands of local files, index them and detect changes. Additionally, many customers use our out-of-the-box scripts and tools to generate data; common examples include performance polling scripts on *nix hosts, API polling and more.
You can onboard data directly from any application or device, opening up new types of machine data to the benefits of Splunk analysis. The Event Collector makes it simple and efficient to collect this data, scaling to millions of events per second, using a developer-friendly, standard HTTP/JSON API and logging libraries.
The HTTP Event Collector (HEC) uses a standard API and a high-volume Splunk endpoint so that events can be sent directly to Splunk at very high velocity. The data volumes Splunk supports make it well suited to IoT and industrial data.
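A hedged sketch of what a device or application sends to the HTTP Event Collector: a JSON body with an `event` field, posted to the `/services/collector/event` endpoint with an `Authorization: Splunk <token>` header. The host name, token, `sourcetype` value and the function name are placeholders; 8088 is HEC's default port, but your deployment may differ. The request is only built here, not sent.

```python
import json
import urllib.request


def build_hec_request(host, token, event, sourcetype="iot:sensor", port=8088):
    """Build (but do not send) a POST to the HEC event endpoint."""
    url = f"https://{host}:{port}/services/collector/event"
    body = json.dumps({"event": event, "sourcetype": sourcetype}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Authorization": f"Splunk {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )


# Placeholder host and token; a real token is created in Splunk's HEC settings.
req = build_hec_request("splunk.example.com",
                        "00000000-0000-0000-0000-000000000000",
                        {"sensor": "pump-7", "temperature": 71.3})
# urllib.request.urlopen(req) would send it (TLS verification depends on your setup).
```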
There are many free add-ons and Apps for Splunk software that simplify the connection and collection of data from both industrial systems and the Internet of Things. These include:
Protocol Data Inputs: Receive data via a number of different data protocols such as TCP, TCP(s), HTTP(s) PUT/POST/File Upload, UDP, Websockets, SockJS.
Rest API Modular Input: Poll local and remote REST APIs and index the responses.
Amazon Kinesis Modular Input: Index data from Amazon Kinesis, a fully managed service for real-time streaming data.
Apache Kafka Modular Input: Index messages from Apache Kafka messaging brokers, including clusters managed by Zookeeper.
DB Connect 2: Integrate structured data sources with your Splunk real-time machine data collection.
MQTT Modular Input: Index messages from MQTT, a machine-to-machine connectivity protocol, by subscribing Splunk software to MQTT Broker Topics.
AMQP Modular Input: Index data from message queues provided by AMQP brokers.
JMS Modular Input: Poll and index messages from queues and topics, including MQTT messages, provided by message providers such as TibcoEMS, Weblogic JMS and ActiveMQ.
COAP Modular Input: Index messages from a COAP (Constrained Application Protocol) Server.
SNMP Modular Input: Collect data by polling SNMP attributes and catching SNMP traps from datacenter infrastructure devices providing cooling and power distribution.
Splunk App for Stream: Capture, filter and index real-time streaming wire data and network events.
Splunk isn’t the only technology that can benefit from collecting machine data, so let Splunk help send the data to those systems that need it. For those systems that want a direct tap into the raw data, Splunk can forward all or a subset of data in real time via TCP as raw text or RFC-compliant syslog. This can be done on the forwarder or centrally via the indexer without incrementing your daily indexing volume. Separately, Splunk can schedule sophisticated correlation searches and configure them to open tickets or insert events into SIEMs or operation event consoles. This allows you to summarize, mash-up and transform the data with the full power of the search language and import data into these other systems in a controlled fashion, even if they don’t natively support all the data types Splunk does.
What's needed: A solution that can monitor conditions of interest and analyze behaviors of interest across all business processes, and deliver actionable insights to business decision-makers
Splunk handles the full continuum: past, present & future.
DATA IS STILL IN MOTION, still in a BUSINESS PROCESS.
Enrich real-time MACHINE DATA with structured HISTORICAL DATA
Make decisions IN REAL TIME using ALL THE DATA
Example:
So, let's look at a simple visual to see how it works.
In four simple steps, customers can achieve data driven service insights.
They get the data in (all the data…).
They quickly define services, entities, and KPIs
They monitor and troubleshoot
They analyze and detect
Through these steps, customers are able to realize the value of Data-Defined, Data-Driven Service Insights.
Machine learning is bringing data analysis into a new era, allowing companies to use predictive analytics that continually “learn” from historical data. These analytics can optimize IT, security and business operations—helping to detect incidents, reduce resolution times, and predict and prevent undesired outcomes.
The Splunk platform makes it easy for you to harness the power of machine learning by offering a rich set of machine learning commands and a guided workbench to create custom models for any use case.
Assistants: Assistants let you choose the algorithm and then guide you through model creation, testing and deployment for common objectives like forecasting values, predicting numeric or categorical fields, and detecting numeric or categorical outliers.
Showcases: Walk through interactive examples of model creation organized by common use cases for IT, security, IoT and business analytics. Examples include predicting disk failures, finding outliers in response time, predicting VPN usage and forecasting internet traffic.
SPL ML Commands: The Splunk platform offers over 20 machine learning commands that can be applied directly to your data for detection, alerting or analysis. Commands such as outlier, predict, cluster and correlate use fixed algorithms, while others such as anomalydetection allow you to choose between several algorithms to best fit your needs.
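For intuition on what a fixed-algorithm outlier command does, here is a sketch of the interquartile-range (IQR) rule, a common choice for this job: values far outside the middle 50% of the data are flagged. The multiplier of 2.5, the data and the function name are illustrative assumptions; this is not the source of Splunk's `outlier` command, and its quartile calculation may differ.

```python
import statistics


def iqr_outliers(values, param=2.5):
    """Flag values outside [Q1 - param*IQR, Q3 + param*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - param * iqr, q3 + param * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical response times in ms with one obvious spike.
response_times = [101, 99, 103, 98, 100, 102, 97, 350]
print(iqr_outliers(response_times))  # [350]
```

The IQR rule is robust because a single extreme value barely moves the quartiles, unlike a mean and standard deviation.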
Want more flexibility? With the Machine Learning Toolkit, you get access to additional commands and open source algorithms to create custom models for any use case.
Python for Scientific Computing Library: Use machine learning SPL commands like fit and apply to directly build, test and operationalize models using open source Python algorithms from the Splunk Python for Scientific Computing Add-on.
MS: This slide needs some work and structure around all the types of algos we’re supporting – pre-processing, feature extraction, classification, regression, clustering, time-series forecasting, outlier detection, text analytics, etc.
The ML process is itself a generalization of the different use cases. ML spans domains!
The arrow means OPERATIONALIZE. Feed back incident data & other high-level analysis back into the ML Process. Keep exploring that data & fitting better models to align with reality. Loop Step #5 (Act) back to Step #1 (Data).
Our Early Adopter customers have had much success creating and operationalizing ML models. Some examples include:
Zillow makes hundreds of website updates daily, including content from several partners nationally. These updates can often cause issues in the site. Zillow built an ML model that predicts which of these changes is likely to result in an issue to allow the team to fix them proactively. Once a potential or actual issue has been identified, the model can also provide guidance on likely root cause and resolution.
TELUS has thousands of mobile phone towers across Canada; when one of these goes offline it can cause significant disruption for their customers. TELUS built a model to predict which towers are likely to fail so that they can proactively fix issues before they occur.
Time for ML demo!
Get the ML App: http://tiny.cc/splunkmlapp
Want more? Take Splunk’s Analytics & Data Science course!
Course prework: http://bit.ly/splunkanalytics
Re: ML App v0.9. To be updated after new release. Stay tuned! Lots to come w/ Splunk ML.
Image modified from cover of book Protecting Study Volunteers in Research
Publisher: CenterWatch LLC; 4th edition (June 15, 2012)
NEXT: either leave slide & discuss OR show ML demo
A direct customer-Splunk engagement focused on real-world use of the Splunk Enterprise Machine Learning Toolkit and Showcase app and related SPL commands
Objectives
• Help the customer to be successful in the impactful use of ML
• Help Splunk to understand customer use cases and product requirements
Details
• Splunk Account SE plus PM/Engineering work directly with the customer to guide usage, provide support, note analytics and product requirements and refine the product where feasible
• Customer participates in the above, developing 1 or more models and putting them in production
• Customer agrees to be referenced publicly, sharing reasonable detail and business impact
• Customer agrees to participate in a set of activities that may include: case study, press quote, use of logo, PR/AR reference call, video profile
Alerts are triggered when the results of the search on which they are based meet certain conditions. Alerts can be based on both historical and real-time searches.
When an alert is triggered, it performs an alert action. This action can be the sending of the alert information to a designated set of email addresses, or the posting of the alert information to an RSS feed. Alerts can also be set up to run a custom script when they are triggered.
You can base these alerts on a wide range of threshold and trend-based scenarios.
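The two families of alert conditions can be sketched as simple predicates over search results: a threshold condition compares a value against a fixed limit, while a trend condition compares the current window with the previous one. The function names, the 50% default and the example counts are hypothetical; in Splunk these conditions are configured on the saved search rather than written as code.

```python
def threshold_alert(count, limit):
    """Threshold-based: fire when the result count exceeds a fixed limit."""
    return count > limit


def trend_alert(previous, current, max_increase=0.5):
    """Trend-based: fire when the current window exceeds the previous one
    by more than max_increase (0.5 = 50%)."""
    if previous == 0:
        return current > 0
    return (current - previous) / previous > max_increase


print(threshold_alert(120, limit=100))  # True
print(trend_alert(100, 160))            # True: +60%
print(trend_alert(100, 140))            # False: +40%
```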
Custom Alert Actions provide the ability to use Splunk Alerts to trigger custom actions or pre-packaged integrations with 3rd party products such as work order management systems, trouble ticketing or support systems. Splunk and partners provide a growing set of integrations including ServiceNow, xMatters, Webhooks and more. With custom alert actions you can:
Send message to IM clients (HipChat, Slack)
Send SMS
Automate the creation of tickets (ServiceNow, Jira)
Take action or send events to firewalls, devices, management consoles
Trigger device-level actions (change lights, sound an alarm, send an action to a device)
Trigger any organization-specific action (restart application, integrate with homegrown service, and more)
This way you can set alerts on data coming from ICS, SCADA, sensors, etc. and alert operators or trigger actions in third-party applications, enabling you to sense anomalous conditions in the data and respond to them.
One other consideration is that Splunk is an analytics platform. It doesn’t know that the data is “security” data, or “IT” data, or “web” data. The same data may be used for all kinds of purposes. It’s up to you to decide how the data will be used and who can see it.
One of the barriers to seeing what is possible is preconceptions about what data can be used for, based on who collects it and where. Some of the best creative workshops we've had started with everyone sharing what data they are working with and what information it contains, with others realizing that the piece they were missing for their analysis was already accessible; it just needed to be loaded into Splunk.
Next we will give some examples on how to think about Splunk in different business oriented ways.