SlideShare uma empresa Scribd logo
1 de 35
Baixar para ler offline
Introduction to 
Machine Learning 
September 2014 Meetup 
Rahul Jain 
@rahuldausa 
Join us @ For Solr, Lucene, Elasticsearch, Machine Learning, IR 
http://www.meetup.com/Hyderabad-Apache-Solr-Lucene-Group/ 
http://www.meetup.com/DataAnalyticsGroup/ 
Join us @ For Hadoop, Spark, Cascading, Scala, NoSQL, Crawlers and all cutting edge technologies. 
http://www.meetup.com/Hyderabad-Programming-Geeks-Group/
Agenda 
• Introduction 
• Basics 
• Classification 
• Clustering 
• Regression 
• Use-Cases 
2
Quick Questionnaire 
How many people have heard about Machine Learning 
How many people know about Machine Learning 
How many people are using Machine Learning
About 
• subfield of Artificial Intelligence (AI) 
• name is derived from the concept that it deals with 
“construction and study of systems that can learn from data” 
• can be seen as building blocks to make computers learn to 
behave more intelligently 
• It is a theoretical concept. There are various techniques with 
various implementations. 
• http://en.wikipedia.org/wiki/Machine_learning
In other words… 
“A computer program is said to learn from 
experience (E) with some class of tasks (T) and a 
performance measure (P) if its performance at tasks 
in T as measured by P improves with E”
Terminology 
• Features 
– The number of features or distinct traits that can be used to describe 
each item in a quantitative manner. 
• Samples 
– A sample is an item to process (e.g. classify). It can be a document, a 
picture, a sound, a video, a row in database or CSV file, or whatever 
you can describe with a fixed set of quantitative traits. 
• Feature vector 
– is an n-dimensional vector of numerical features that represent some 
object. 
• Feature extraction 
– Preparation of feature vector 
– transforms the data in the high-dimensional space to a space of 
fewer dimensions. 
• Training/Evolution set 
– Set of data to discover potentially predictive relationships.
Let’s dig deep into it… 
What do you mean by 
Apple
Learning (Training) 
Features: 
1. Color: Radish/Red 
2. Type : Fruit 
3. Shape 
etc… 
Features: 
1. Sky Blue 
2. Logo 
3. Shape 
etc… 
Features: 
1. Yellow 
2. Fruit 
3. Shape 
etc…
Workflow
Categories 
• Supervised Learning 
• Unsupervised Learning 
• Semi-Supervised Learning 
• Reinforcement Learning
Supervised Learning 
• the correct classes of the training data are 
known 
Credit: http://us.hudson.com/legal/blog/postid/513/predictive-analytics-artificial-intelligence-science-fiction-e-discovery-truth
Unsupervised Learning 
• the correct classes of the training data are not 
known 
Credit: http://us.hudson.com/legal/blog/postid/513/predictive-analytics-artificial-intelligence-science-fiction-e-discovery-truth
Semi-Supervised Learning 
• A Mix of Supervised and Unsupervised learning 
Credit: http://us.hudson.com/legal/blog/postid/513/predictive-analytics-artificial-intelligence-science-fiction-e-discovery-truth
Reinforcement Learning 
• allows the machine or software agent to learn its 
behavior based on feedback from the environment. 
• This behavior can be learnt once and for all, or keep on 
adapting as time goes by. 
Credit: http://us.hudson.com/legal/blog/postid/513/predictive-analytics-artificial-intelligence-science-fiction-e-discovery-truth
Machine Learning Techniques
Techniques 
• classification: predict class from observations 
• clustering: group observations into 
“meaningful” groups 
• regression (prediction): predict value from 
observations
Classification 
• classify a document into a predefined category. 
• documents can be text, images 
• Popular one is Naive Bayes Classifier. 
• Steps: 
– Step1 : Train the program (Building a Model) using a 
training set with a category for e.g. sports, cricket, news, 
– Classifier will compute probability for each word, the 
probability that it makes a document belong to each of 
considered categories 
– Step2 : Test with a test data set against this Model 
• http://en.wikipedia.org/wiki/Naive_Bayes_classifier
Clustering 
• clustering is the task of grouping a set of objects in 
such a way that objects in the same group (called 
a cluster) are more similar to each other 
• objects are not predefined 
• For e.g. these keywords 
– “man’s shoe” 
– “women’s shoe” 
– “women’s t-shirt” 
– “man’s t-shirt” 
– can be cluster into 2 categories “shoe” and “t-shirt” or 
“man” and “women” 
• Popular ones are K-means clustering and Hierarchical 
clustering
K-means Clustering 
• partition n observations into k clusters in which each observation belongs 
to the cluster with the nearest mean, serving as a prototype of the cluster. 
• http://en.wikipedia.org/wiki/K-means_clustering 
http://pypr.sourceforge.net/kmeans.html
Hierarchical clustering 
• method of cluster analysis which seeks to build 
a hierarchy of clusters. 
• There can be two strategies 
– Agglomerative: 
• This is a "bottom up" approach: each observation starts in its own 
cluster, and pairs of clusters are merged as one moves up the 
hierarchy. 
• Time complexity is O(n^3) 
– Divisive: 
• This is a "top down" approach: all observations start in one cluster, 
and splits are performed recursively as one moves down the 
hierarchy. 
• Time complexity is O(2^n) 
• http://en.wikipedia.org/wiki/Hierarchical_clustering
Regression 
• is a measure of the relation between 
the mean value of one variable (e.g. 
output) and corresponding values of 
other variables (e.g. time and cost). 
• regression analysis is a statistical 
process for estimating the 
relationships among variables. 
• Regression means to predict the 
output value using training data. 
• Popular one is Logistic regression 
(binary regression) 
• http://en.wikipedia.org/wiki/Logistic_regression
Classification vs Regression 
• Classification means to 
group the output into 
a class. 
• classification to predict 
the type of tumor i.e. 
harmful or not harmful 
using training data 
• if it is 
discrete/categorical 
variable, then it is 
classification problem 
• Regression means to 
predict the output 
value using training 
data. 
• regression to predict 
the house price from 
training data 
• if it is a real 
number/continuous, 
then it is regression 
problem.
Let’s see the usage in Real life
Use-Cases 
• Spam Email Detection 
• Machine Translation (Language Translation) 
• Image Search (Similarity) 
• Clustering (KMeans) : Amazon 
Recommendations 
• Classification : Google News 
continued…
Use-Cases (contd.) 
• Text Summarization - Google News 
• Rating a Review/Comment: Yelp 
• Fraud detection : Credit card Providers 
• Decision Making : e.g. Bank/Insurance sector 
• Sentiment Analysis 
• Speech Understanding – iPhone with Siri 
• Face Detection – Facebook’s Photo tagging
Classification in Action 
isn’t it easy?
it’s not (Snapshot of Spam folder) 
Not a 
Spam 
Not a 
Spam
NER (Named Entity Recognition) 
http://nlp.stanford.edu:8080/ner/process
Similar/Duplicate Images 
Remember 
Features ? 
(Feature Extraction) 
Can be : 
• Width 
• Height 
• Contrast 
• Brightness 
• Position 
• Hue 
• Colors 
Credit: https://www.google.co.in/ 
Check this : 
LIRE (Lucene Image REtrieval) 
library - 
https://code.google.com/p/lire/
Recommendations 
http://www.webdesignerdepot.com/2009/10/an-analysis-of-the-amazon-shopping-experience/
Popular Frameworks/Tools 
• Weka 
• Carrot2 
• Gate 
• OpenNLP 
• LingPipe 
• Stanford NLP 
• Mallet – Topic Modelling 
• Gensim – Topic Modelling (Python) 
• Apache Mahout 
• MLib – Apache Spark 
• scikit-learn - Python 
• LIBSVM : Support Vector Machines 
• and many more…
Advanced concepts (related to IR) 
• Topic Modelling 
• Latent Dirichlet allocation (LDA) 
• Latent semantic analysis (LSA/LSI) - Semantic 
Search 
• Singular Value Decomposition (SVD) 
• Summarization (without Training)
Solr/Lucene Meetup 
• Case study of Rujhaan.com 
(A social news app ) 
• Saturday, Sep 27, 2014 10:00 AM 
• IIIT Hyderabad 
• URL: http://www.meetup.com/Hyderabad- 
Apache-Solr-Lucene-Group/events/203434032/ 
OR 
• Search on Google … 
Topics of Talk 
 Crawler(Crawler4j) 
 MongoDB 
 Solr 
 Nginx, ApacheTomcat 
 Redis 
 Machine Learning 
1. Classification - Classification 
of News, Tweets - Lingpipe 
2. Clustering, - Similar Items - 
carrot2 (Near Future: Hadoop 
and Apache Spark ) 
3. Summarization - Extracting 
the main text with Automatic 
Summary of article 
4. Topics Extraction from text
Questions ? 
34
Thanks! 
@rahuldausa on twitter and slideshare 
http://www.linkedin.com/in/rahuldausa 
Interested in Search/Information Retrieval ? 
Join us @ http://www.meetup.com/Hyderabad-Apache-Solr-Lucene-Group/ 
35

Mais conteúdo relacionado

Mais procurados

Supervised and Unsupervised Learning In Machine Learning | Machine Learning T...
Supervised and Unsupervised Learning In Machine Learning | Machine Learning T...Supervised and Unsupervised Learning In Machine Learning | Machine Learning T...
Supervised and Unsupervised Learning In Machine Learning | Machine Learning T...Simplilearn
 
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...Simplilearn
 
Machine Learning and Real-World Applications
Machine Learning and Real-World ApplicationsMachine Learning and Real-World Applications
Machine Learning and Real-World ApplicationsMachinePulse
 
Classification and Regression
Classification and RegressionClassification and Regression
Classification and RegressionMegha Sharma
 
Machine learning Algorithms
Machine learning AlgorithmsMachine learning Algorithms
Machine learning AlgorithmsWalaa Hamdy Assy
 
Machine Learning
Machine LearningMachine Learning
Machine LearningRahul Kumar
 
Machine Learning
Machine LearningMachine Learning
Machine LearningShrey Malik
 
Introduction to ML (Machine Learning)
Introduction to ML (Machine Learning)Introduction to ML (Machine Learning)
Introduction to ML (Machine Learning)SwatiTripathi44
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine LearningLior Rokach
 
Machine Learning presentation.
Machine Learning presentation.Machine Learning presentation.
Machine Learning presentation.butest
 
MachineLearning.ppt
MachineLearning.pptMachineLearning.ppt
MachineLearning.pptbutest
 

Mais procurados (20)

Machine Learning
Machine LearningMachine Learning
Machine Learning
 
supervised learning
supervised learningsupervised learning
supervised learning
 
Supervised and Unsupervised Learning In Machine Learning | Machine Learning T...
Supervised and Unsupervised Learning In Machine Learning | Machine Learning T...Supervised and Unsupervised Learning In Machine Learning | Machine Learning T...
Supervised and Unsupervised Learning In Machine Learning | Machine Learning T...
 
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...
 
Machine learning
Machine learningMachine learning
Machine learning
 
Machine Learning and Real-World Applications
Machine Learning and Real-World ApplicationsMachine Learning and Real-World Applications
Machine Learning and Real-World Applications
 
machine learning
machine learningmachine learning
machine learning
 
Classification and Regression
Classification and RegressionClassification and Regression
Classification and Regression
 
Machine learning Algorithms
Machine learning AlgorithmsMachine learning Algorithms
Machine learning Algorithms
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
K - Nearest neighbor ( KNN )
K - Nearest neighbor  ( KNN )K - Nearest neighbor  ( KNN )
K - Nearest neighbor ( KNN )
 
Introduction to ML (Machine Learning)
Introduction to ML (Machine Learning)Introduction to ML (Machine Learning)
Introduction to ML (Machine Learning)
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Deep learning presentation
Deep learning presentationDeep learning presentation
Deep learning presentation
 
K Nearest Neighbors
K Nearest NeighborsK Nearest Neighbors
K Nearest Neighbors
 
Machine Learning presentation.
Machine Learning presentation.Machine Learning presentation.
Machine Learning presentation.
 
Machine learning clustering
Machine learning clusteringMachine learning clustering
Machine learning clustering
 
PAC Learning
PAC LearningPAC Learning
PAC Learning
 
MachineLearning.ppt
MachineLearning.pptMachineLearning.ppt
MachineLearning.ppt
 

Destaque

Introduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningIntroduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningLars Marius Garshol
 
Automating Machine Learning Workflows: A Report from the Trenches - Jose A. O...
Automating Machine Learning Workflows: A Report from the Trenches - Jose A. O...Automating Machine Learning Workflows: A Report from the Trenches - Jose A. O...
Automating Machine Learning Workflows: A Report from the Trenches - Jose A. O...PAPIs.io
 
Presentation on supervised learning
Presentation on supervised learningPresentation on supervised learning
Presentation on supervised learningTonmoy Bhagawati
 
Data Science, Machine Learning and Neural Networks
Data Science, Machine Learning and Neural NetworksData Science, Machine Learning and Neural Networks
Data Science, Machine Learning and Neural NetworksBICA Labs
 
Introduction to Digital Image Processing Using MATLAB
Introduction to Digital Image Processing Using MATLABIntroduction to Digital Image Processing Using MATLAB
Introduction to Digital Image Processing Using MATLABRay Phan
 
"Deep Learning and Vision Algorithm Development in MATLAB Targeting Embedded ...
"Deep Learning and Vision Algorithm Development in MATLAB Targeting Embedded ..."Deep Learning and Vision Algorithm Development in MATLAB Targeting Embedded ...
"Deep Learning and Vision Algorithm Development in MATLAB Targeting Embedded ...Edge AI and Vision Alliance
 
Image Processing Basics
Image Processing BasicsImage Processing Basics
Image Processing BasicsNam Le
 
8085 Paper Presentation slides,ppt,microprocessor 8085 ,guide, instruction set
8085 Paper Presentation slides,ppt,microprocessor 8085 ,guide, instruction set8085 Paper Presentation slides,ppt,microprocessor 8085 ,guide, instruction set
8085 Paper Presentation slides,ppt,microprocessor 8085 ,guide, instruction setSaumitra Rukmangad
 
8085 microprocessor architecture ppt
8085 microprocessor architecture ppt8085 microprocessor architecture ppt
8085 microprocessor architecture pptParvesh Gautam
 
Artificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep LearningArtificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep LearningSujit Pal
 
Linux command ppt
Linux command pptLinux command ppt
Linux command pptkalyanineve
 
Digital Image Processing
Digital Image ProcessingDigital Image Processing
Digital Image ProcessingSahil Biswas
 
Neural network & its applications
Neural network & its applications Neural network & its applications
Neural network & its applications Ahmed_hashmi
 
Stock exchange simple ppt
Stock exchange simple pptStock exchange simple ppt
Stock exchange simple pptAvinash Varun
 

Destaque (20)

Machine Learning for Dummies
Machine Learning for DummiesMachine Learning for Dummies
Machine Learning for Dummies
 
Introduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningIntroduction to Big Data/Machine Learning
Introduction to Big Data/Machine Learning
 
Automating Machine Learning Workflows: A Report from the Trenches - Jose A. O...
Automating Machine Learning Workflows: A Report from the Trenches - Jose A. O...Automating Machine Learning Workflows: A Report from the Trenches - Jose A. O...
Automating Machine Learning Workflows: A Report from the Trenches - Jose A. O...
 
Machine Learning and Artificial Intelligence
Machine Learning and Artificial IntelligenceMachine Learning and Artificial Intelligence
Machine Learning and Artificial Intelligence
 
Basic web dev
Basic web devBasic web dev
Basic web dev
 
Machine learning
Machine learningMachine learning
Machine learning
 
Presentation on supervised learning
Presentation on supervised learningPresentation on supervised learning
Presentation on supervised learning
 
Data Science, Machine Learning and Neural Networks
Data Science, Machine Learning and Neural NetworksData Science, Machine Learning and Neural Networks
Data Science, Machine Learning and Neural Networks
 
Introduction to Digital Image Processing Using MATLAB
Introduction to Digital Image Processing Using MATLABIntroduction to Digital Image Processing Using MATLAB
Introduction to Digital Image Processing Using MATLAB
 
"Deep Learning and Vision Algorithm Development in MATLAB Targeting Embedded ...
"Deep Learning and Vision Algorithm Development in MATLAB Targeting Embedded ..."Deep Learning and Vision Algorithm Development in MATLAB Targeting Embedded ...
"Deep Learning and Vision Algorithm Development in MATLAB Targeting Embedded ...
 
Future technology
Future technologyFuture technology
Future technology
 
Image Processing Basics
Image Processing BasicsImage Processing Basics
Image Processing Basics
 
8085 Paper Presentation slides,ppt,microprocessor 8085 ,guide, instruction set
8085 Paper Presentation slides,ppt,microprocessor 8085 ,guide, instruction set8085 Paper Presentation slides,ppt,microprocessor 8085 ,guide, instruction set
8085 Paper Presentation slides,ppt,microprocessor 8085 ,guide, instruction set
 
8085 microprocessor architecture ppt
8085 microprocessor architecture ppt8085 microprocessor architecture ppt
8085 microprocessor architecture ppt
 
Artificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep LearningArtificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep Learning
 
Linux command ppt
Linux command pptLinux command ppt
Linux command ppt
 
Digital Image Processing
Digital Image ProcessingDigital Image Processing
Digital Image Processing
 
Neural network & its applications
Neural network & its applications Neural network & its applications
Neural network & its applications
 
IoT architecture
IoT architectureIoT architecture
IoT architecture
 
Stock exchange simple ppt
Stock exchange simple pptStock exchange simple ppt
Stock exchange simple ppt
 

Semelhante a Introduction to Machine Learning

Introduction to Machine learning ppt
Introduction to Machine learning pptIntroduction to Machine learning ppt
Introduction to Machine learning pptshubhamshirke12
 
Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...
Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...
Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...Lucidworks
 
Lucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine LearningLucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine LearningJoaquin Delgado PhD.
 
Lucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine LearningLucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine LearningS. Diana Hu
 
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning... RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...S. Diana Hu
 
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...Joaquin Delgado PhD.
 
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...Ilkay Altintas, Ph.D.
 
Machine Learning for Everyone
Machine Learning for EveryoneMachine Learning for Everyone
Machine Learning for EveryoneAly Abdelkareem
 
Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...
Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...
Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...Lucidworks
 
Facets and Pivoting for Flexible and Usable Linked Data Exploration
Facets and Pivoting for Flexible and Usable Linked Data ExplorationFacets and Pivoting for Flexible and Usable Linked Data Exploration
Facets and Pivoting for Flexible and Usable Linked Data ExplorationRoberto García
 
Handout on Object orienetd Analysis and Design
Handout on Object orienetd Analysis and DesignHandout on Object orienetd Analysis and Design
Handout on Object orienetd Analysis and DesignSAFAD ISMAIL
 
Exploring the Semantic Web
Exploring the Semantic WebExploring the Semantic Web
Exploring the Semantic WebRoberto García
 
Case study of Rujhaan.com (A social news app )
Case study of Rujhaan.com (A social news app )Case study of Rujhaan.com (A social news app )
Case study of Rujhaan.com (A social news app )Rahul Jain
 
Analyzing a system and specifying the requirements
Analyzing a system and specifying the requirementsAnalyzing a system and specifying the requirements
Analyzing a system and specifying the requirementsvikramgopale2
 
Object Modelling Technique " ooad "
Object Modelling Technique  " ooad "Object Modelling Technique  " ooad "
Object Modelling Technique " ooad "AchrafJbr
 
Machine Learning Innovations
Machine Learning InnovationsMachine Learning Innovations
Machine Learning InnovationsHPCC Systems
 
Building a real time, solr-powered recommendation engine
Building a real time, solr-powered recommendation engineBuilding a real time, solr-powered recommendation engine
Building a real time, solr-powered recommendation engineTrey Grainger
 
Object Oriented Programming
Object Oriented ProgrammingObject Oriented Programming
Object Oriented ProgrammingManish Pandit
 
Scaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solrScaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solrTrey Grainger
 

Semelhante a Introduction to Machine Learning (20)

Introduction to Machine learning ppt
Introduction to Machine learning pptIntroduction to Machine learning ppt
Introduction to Machine learning ppt
 
Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...
Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...
Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...
 
Lucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine LearningLucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine Learning
 
Lucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine LearningLucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine Learning
 
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning... RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
 
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
 
Machine Learning for Everyone
Machine Learning for EveryoneMachine Learning for Everyone
Machine Learning for Everyone
 
Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...
Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...
Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...
 
Facets and Pivoting for Flexible and Usable Linked Data Exploration
Facets and Pivoting for Flexible and Usable Linked Data ExplorationFacets and Pivoting for Flexible and Usable Linked Data Exploration
Facets and Pivoting for Flexible and Usable Linked Data Exploration
 
Handout on Object orienetd Analysis and Design
Handout on Object orienetd Analysis and DesignHandout on Object orienetd Analysis and Design
Handout on Object orienetd Analysis and Design
 
Exploring the Semantic Web
Exploring the Semantic WebExploring the Semantic Web
Exploring the Semantic Web
 
Case study of Rujhaan.com (A social news app )
Case study of Rujhaan.com (A social news app )Case study of Rujhaan.com (A social news app )
Case study of Rujhaan.com (A social news app )
 
Analyzing a system and specifying the requirements
Analyzing a system and specifying the requirementsAnalyzing a system and specifying the requirements
Analyzing a system and specifying the requirements
 
Object Modelling Technique " ooad "
Object Modelling Technique  " ooad "Object Modelling Technique  " ooad "
Object Modelling Technique " ooad "
 
Machine Learning Innovations
Machine Learning InnovationsMachine Learning Innovations
Machine Learning Innovations
 
Ooad unit – 1 introduction
Ooad unit – 1 introductionOoad unit – 1 introduction
Ooad unit – 1 introduction
 
Building a real time, solr-powered recommendation engine
Building a real time, solr-powered recommendation engineBuilding a real time, solr-powered recommendation engine
Building a real time, solr-powered recommendation engine
 
Object Oriented Programming
Object Oriented ProgrammingObject Oriented Programming
Object Oriented Programming
 
Scaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solrScaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solr
 

Mais de Rahul Jain

Flipkart Strategy Analysis and Recommendation
Flipkart Strategy Analysis and RecommendationFlipkart Strategy Analysis and Recommendation
Flipkart Strategy Analysis and RecommendationRahul Jain
 
Emerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big DataEmerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big DataRahul Jain
 
Building a Large Scale SEO/SEM Application with Apache Solr
Building a Large Scale SEO/SEM Application with Apache SolrBuilding a Large Scale SEO/SEM Application with Apache Solr
Building a Large Scale SEO/SEM Application with Apache SolrRahul Jain
 
Real time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkReal time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkRahul Jain
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache SparkRahul Jain
 
Introduction to Scala
Introduction to ScalaIntroduction to Scala
Introduction to ScalaRahul Jain
 
What is NoSQL and CAP Theorem
What is NoSQL and CAP TheoremWhat is NoSQL and CAP Theorem
What is NoSQL and CAP TheoremRahul Jain
 
Introduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of LuceneIntroduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of LuceneRahul Jain
 
Introduction to Apache Lucene/Solr
Introduction to Apache Lucene/SolrIntroduction to Apache Lucene/Solr
Introduction to Apache Lucene/SolrRahul Jain
 
Introduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and UsecasesIntroduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and UsecasesRahul Jain
 
Introduction to Kafka and Zookeeper
Introduction to Kafka and ZookeeperIntroduction to Kafka and Zookeeper
Introduction to Kafka and ZookeeperRahul Jain
 
Hadoop & HDFS for Beginners
Hadoop & HDFS for BeginnersHadoop & HDFS for Beginners
Hadoop & HDFS for BeginnersRahul Jain
 
Hibernate tutorial for beginners
Hibernate tutorial for beginnersHibernate tutorial for beginners
Hibernate tutorial for beginnersRahul Jain
 

Mais de Rahul Jain (14)

Flipkart Strategy Analysis and Recommendation
Flipkart Strategy Analysis and RecommendationFlipkart Strategy Analysis and Recommendation
Flipkart Strategy Analysis and Recommendation
 
Emerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big DataEmerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big Data
 
Building a Large Scale SEO/SEM Application with Apache Solr
Building a Large Scale SEO/SEM Application with Apache SolrBuilding a Large Scale SEO/SEM Application with Apache Solr
Building a Large Scale SEO/SEM Application with Apache Solr
 
Real time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkReal time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache Spark
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
Introduction to Scala
Introduction to ScalaIntroduction to Scala
Introduction to Scala
 
What is NoSQL and CAP Theorem
What is NoSQL and CAP TheoremWhat is NoSQL and CAP Theorem
What is NoSQL and CAP Theorem
 
Introduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of LuceneIntroduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of Lucene
 
Introduction to Apache Lucene/Solr
Introduction to Apache Lucene/SolrIntroduction to Apache Lucene/Solr
Introduction to Apache Lucene/Solr
 
Introduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and UsecasesIntroduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and Usecases
 
Introduction to Kafka and Zookeeper
Introduction to Kafka and ZookeeperIntroduction to Kafka and Zookeeper
Introduction to Kafka and Zookeeper
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Hadoop & HDFS for Beginners
Hadoop & HDFS for BeginnersHadoop & HDFS for Beginners
Hadoop & HDFS for Beginners
 
Hibernate tutorial for beginners
Hibernate tutorial for beginnersHibernate tutorial for beginners
Hibernate tutorial for beginners
 

Último

Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observabilityitnewsafrica
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxAna-Maria Mihalceanu
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxfnnc6jmgwh
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructureitnewsafrica
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 

Último (20)

Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance Toolbox
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 

Introduction to Machine Learning

  • 1. Introduction to Machine Learning September 2014 Meetup Rahul Jain @rahuldausa Join us @ For Solr, Lucene, Elasticsearch, Machine Learning, IR http://www.meetup.com/Hyderabad-Apache-Solr-Lucene-Group/ http://www.meetup.com/DataAnalyticsGroup/ Join us @ For Hadoop, Spark, Cascading, Scala, NoSQL, Crawlers and all cutting edge technologies. http://www.meetup.com/Hyderabad-Programming-Geeks-Group/
  • 2. Agenda • Introduction • Basics • Classification • Clustering • Regression • Use-Cases 2
  • 3. Quick Questionnaire How many people have heard about Machine Learning How many people know about Machine Learning How many people are using Machine Learning
  • 4. About • subfield of Artificial Intelligence (AI) • name is derived from the concept that it deals with “construction and study of systems that can learn from data” • can be seen as building blocks to make computers learn to behave more intelligently • It is a theoretical concept. There are various techniques with various implementations. • http://en.wikipedia.org/wiki/Machine_learning
  • 5. In other words… “A computer program is said to learn from experience (E) with some class of tasks (T) and a performance measure (P) if its performance at tasks in T as measured by P improves with E”
  • 6. Terminology • Features – The number of features or distinct traits that can be used to describe each item in a quantitative manner. • Samples – A sample is an item to process (e.g. classify). It can be a document, a picture, a sound, a video, a row in database or CSV file, or whatever you can describe with a fixed set of quantitative traits. • Feature vector – is an n-dimensional vector of numerical features that represent some object. • Feature extraction – Preparation of feature vector – transforms the data in the high-dimensional space to a space of fewer dimensions. • Training/Evolution set – Set of data to discover potentially predictive relationships.
  • 7. Let’s dig deep into it… What do you mean by Apple
  • 8. Learning (Training) Features: 1. Color: Radish/Red 2. Type : Fruit 3. Shape etc… Features: 1. Sky Blue 2. Logo 3. Shape etc… Features: 1. Yellow 2. Fruit 3. Shape etc…
  • 10. Categories • Supervised Learning • Unsupervised Learning • Semi-Supervised Learning • Reinforcement Learning
  • 11. Supervised Learning • the correct classes of the training data are known Credit: http://us.hudson.com/legal/blog/postid/513/predictive-analytics-artificial-intelligence-science-fiction-e-discovery-truth
  • 12. Unsupervised Learning • the correct classes of the training data are not known Credit: http://us.hudson.com/legal/blog/postid/513/predictive-analytics-artificial-intelligence-science-fiction-e-discovery-truth
  • 13. Semi-Supervised Learning • A Mix of Supervised and Unsupervised learning Credit: http://us.hudson.com/legal/blog/postid/513/predictive-analytics-artificial-intelligence-science-fiction-e-discovery-truth
  • 14. Reinforcement Learning • allows the machine or software agent to learn its behavior based on feedback from the environment. • This behavior can be learnt once and for all, or keep on adapting as time goes by. Credit: http://us.hudson.com/legal/blog/postid/513/predictive-analytics-artificial-intelligence-science-fiction-e-discovery-truth
  • 16. Techniques • classification: predict class from observations • clustering: group observations into “meaningful” groups • regression (prediction): predict value from observations
  • 17. Classification • classify a document into a predefined category. • documents can be text, images • Popular one is Naive Bayes Classifier. • Steps: – Step1 : Train the program (Building a Model) using a training set with a category for e.g. sports, cricket, news, – Classifier will compute probability for each word, the probability that it makes a document belong to each of considered categories – Step2 : Test with a test data set against this Model • http://en.wikipedia.org/wiki/Naive_Bayes_classifier
  • 18. Clustering • clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar to each other • objects are not predefined • For e.g. these keywords – “man’s shoe” – “women’s shoe” – “women’s t-shirt” – “man’s t-shirt” – can be cluster into 2 categories “shoe” and “t-shirt” or “man” and “women” • Popular ones are K-means clustering and Hierarchical clustering
  • 19. K-means Clustering • partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster. • http://en.wikipedia.org/wiki/K-means_clustering http://pypr.sourceforge.net/kmeans.html
  • 20. Hierarchical clustering • method of cluster analysis which seeks to build a hierarchy of clusters. • There can be two strategies – Agglomerative: • This is a "bottom up" approach: each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy. • Time complexity is O(n^3) – Divisive: • This is a "top down" approach: all observations start in one cluster, and splits are performed recursively as one moves down the hierarchy. • Time complexity is O(2^n) • http://en.wikipedia.org/wiki/Hierarchical_clustering
  • 21. Regression • is a measure of the relation between the mean value of one variable (e.g. output) and corresponding values of other variables (e.g. time and cost). • regression analysis is a statistical process for estimating the relationships among variables. • Regression means to predict the output value using training data. • Popular one is Logistic regression (binary regression) • http://en.wikipedia.org/wiki/Logistic_regression
  • 22. Classification vs Regression • Classification means to group the output into a class. • classification to predict the type of tumor i.e. harmful or not harmful using training data • if it is discrete/categorical variable, then it is classification problem • Regression means to predict the output value using training data. • regression to predict the house price from training data • if it is a real number/continuous, then it is regression problem.
  • 23. Let’s see the usage in Real life
  • 24. Use-Cases • Spam Email Detection • Machine Translation (Language Translation) • Image Search (Similarity) • Clustering (KMeans) : Amazon Recommendations • Classification : Google News continued…
  • 25. Use-Cases (contd.) • Text Summarization - Google News • Rating a Review/Comment: Yelp • Fraud detection : Credit card Providers • Decision Making : e.g. Bank/Insurance sector • Sentiment Analysis • Speech Understanding – iPhone with Siri • Face Detection – Facebook’s Photo tagging
  • 26. Classification in Action isn’t it easy?
  • 27. it’s not (Snapshot of Spam folder) Not a Spam Not a Spam
  • 28. NER (Named Entity Recognition) http://nlp.stanford.edu:8080/ner/process
  • 29. Similar/Duplicate Images Remember Features ? (Feature Extraction) Can be : • Width • Height • Contrast • Brightness • Position • Hue • Colors Credit: https://www.google.co.in/ Check this : LIRE (Lucene Image REtrieval) library - https://code.google.com/p/lire/
  • 31. Popular Frameworks/Tools • Weka • Carrot2 • Gate • OpenNLP • LingPipe • Stanford NLP • Mallet – Topic Modelling • Gensim – Topic Modelling (Python) • Apache Mahout • MLib – Apache Spark • scikit-learn - Python • LIBSVM : Support Vector Machines • and many more…
  • 32. Advanced concepts (related to IR) • Topic Modelling • Latent Dirichlet allocation (LDA) • Latent semantic analysis (LSA/LSI) - Semantic Search • Singular Value Decomposition (SVD) • Summarization (without Training)
  • 33. Solr/Lucene Meetup • Case study of Rujhaan.com (A social news app ) • Saturday, Sep 27, 2014 10:00 AM • IIIT Hyderabad • URL: http://www.meetup.com/Hyderabad- Apache-Solr-Lucene-Group/events/203434032/ OR • Search on Google … Topics of Talk  Crawler(Crawler4j)  MongoDB  Solr  Nginx, ApacheTomcat  Redis  Machine Learning 1. Classification - Classification of News, Tweets - Lingpipe 2. Clustering, - Similar Items - carrot2 (Near Future: Hadoop and Apache Spark ) 3. Summarization - Extracting the main text with Automatic Summary of article 4. Topics Extraction from text
  • 35. Thanks! @rahuldausa on twitter and slideshare http://www.linkedin.com/in/rahuldausa Interested in Search/Information Retrieval ? Join us @ http://www.meetup.com/Hyderabad-Apache-Solr-Lucene-Group/ 35