(Deep) Neural Network & Text Mining
Piji Li 
lipiji.pz@gmail.com

Outline
• Deep Learning
  - Story
• DL for NLP & Text Mining
  - Words
  - Sentences
  - Documents
9/3/2014

BP (1986)
D. E. Rumelhart, G. E. Hinton, R. J. Williams. "Learning representations by back-propagating errors." Nature 323 (1986), pp. 533–536.
Solved learning problem
Biological system
…
Hard to train (non-convex, tricks)
Hard to do theoretical analysis
Small training sets...

DBN (2006)
Hinton, Geoffrey, Simon Osindero, and Yee-Whye Teh. "A fast learning algorithm for deep belief nets." Neural Computation 18, no. 7 (2006): 1527-1554.
Hinton, Geoffrey E., and Ruslan R. Salakhutdinov. "Reducing the dimensionality of data with neural networks." Science 313, no. 5786 (2006): 504-507.
• Unsupervised layer-wise pre-training
• Supervised fine-tuning
• Feature learning & distributed representations
• Large scale datasets
• Tricks (learning rate, mini-batch size, momentum, etc.)
• https://github.com/lipiji/PG_DEEP

CD-DNN-HMM (2011)
Dahl, George E., Dong Yu, Li Deng, and Alex Acero. "Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition." Audio, Speech, and Language Processing, IEEE Transactions on 20, no. 1 (2012): 30-42.
Winner of the 2013 IEEE Signal Processing Society Best Paper Award
PhD candidate of Prof. Hinton, intern at MSR
GPU

ImageNet Classification
[Figure: bar chart "ImageNet Classification Error", vertical axis from 0 to 0.4, for 2012, 2013 and 2014. Annotations on the chart: "DNN", "First Blood", "Deep", "Network in Network", "Deeper".]
http://image-net.org/
• 1000 categories and 1.2 million training images
Li Fei-Fei: ImageNet Large Scale Visual Recognition Challenge, 2014

Nowadays
People: Geoffrey Hinton, Yann LeCun, Yoshua Bengio, Andrew Ng
Industry: Google, Baidu, Facebook, IBM
Paper: ICCV, CVPR, ECCV; ICML, NIPS, AAAI, ICLR; ACL, EMNLP, COLING
Startup Companies using (Deep) Machine Learning:

NN & Words
• Distributed Representation
  - A Neural Probabilistic Language Model
  - Word2Vec
• Application
  - Semantic Hierarchies of Words
  - Machine Translation
V(king) - V(queen) + V(woman) ≈ V(man)

Neural Probabilistic Language Model
• Bengio, Yoshua, Réjean Ducharme, Pascal Vincent, and Christian Jauvin. "A Neural Probabilistic Language Model." Journal of Machine Learning Research 3 (2003): 1137-1155.
[Figure: network diagram; labels: H, d; U, b; W, b; |V|; concatenate]
- Maximizes the penalized log-likelihood:
- Softmax
- where
- Training: BP
Better than n-gram model:
1. Word vectors;
2. No smoothing needed, p(w|context) ∈ (0, 1)
Time-consuming
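
The equations on this slide did not survive extraction. As given in the cited paper (Bengio et al., 2003), the output scores are y = b + Wx + U tanh(d + Hx), where x is the concatenation of the context word vectors, and the prediction is a softmax over y. Below is a minimal pure-Python sketch of that forward pass. The toy sizes, the random initialization, and the names `nplm_forward` and `rand_matrix` are assumptions for illustration only.

```python
import math
import random

def nplm_forward(context_ids, C, H, d, U, W, b):
    """Return P(w | context) for every word w in the vocabulary."""
    # Look up and concatenate the context word vectors.
    x = [v for i in context_ids for v in C[i]]
    # Hidden layer: a = tanh(d + H x)
    a = [math.tanh(d[j] + sum(H[j][k] * x[k] for k in range(len(x))))
         for j in range(len(d))]
    # Output scores: y = b + W x + U a
    y = [b[i]
         + sum(W[i][k] * x[k] for k in range(len(x)))
         + sum(U[i][j] * a[j] for j in range(len(a)))
         for i in range(len(b))]
    # Softmax over the whole vocabulary (subtract the max for stability).
    top = max(y)
    e = [math.exp(v - top) for v in y]
    z = sum(e)
    return [v / z for v in e]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
V, m_dim, n_ctx, h = 5, 3, 2, 4          # toy sizes (assumptions)
C = rand_matrix(V, m_dim, rng)           # word feature vectors
H = rand_matrix(h, n_ctx * m_dim, rng)   # input -> hidden
d = [0.0] * h                            # hidden bias
U = rand_matrix(V, h, rng)               # hidden -> output
W = rand_matrix(V, n_ctx * m_dim, rng)   # direct input -> output
b = [0.0] * V                            # output bias

probs = nplm_forward([1, 3], C, H, d, U, W, b)
```

The final normalization touches every word in the vocabulary, which is why the slide calls the model time-consuming: the cost of one prediction grows linearly with |V|.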

Word2Vec
• Mikolov, Tomas, Kai Chen, Greg Corrado, and Jeffrey Dean. "Efficient estimation of word representations in vector space." ICLR (2013).
  - Large improvements in accuracy, lower computational cost.
  - It takes less than a day to train on a 1.6-billion-word dataset.

Word2Vec - Hierarchical Softmax
• Morin, Frederic, and Yoshua Bengio. "Hierarchical probabilistic neural network language model." In AISTATS, vol. 5, pp. 246-252. 2005.
• Mnih, Andriy, and Geoffrey E. Hinton. "A scalable hierarchical distributed language model." In Advances in Neural Information Processing Systems, pp. 1081-1088. 2009.
[Figure: binary tree with node weights 5, 3, 2, 2, 1, inner-node parameters θ1, θ2 and leaf words w1, w2, w3]
• Hierarchical Strategies:
  O(|V|) → O(log|V|)
• Word2Vec:
  Huffman Tree - short codes to frequent words
Time-Consuming
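
To illustrate "short codes to frequent words", here is a minimal sketch of Huffman coding with Python's standard `heapq`. The word counts and the function name `huffman_codes` are hypothetical; this shows the general technique, not the word2vec source code.

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Map each word to a binary code; frequent words get shorter codes."""
    tie = count()  # tie-breaker so the heap never compares dicts
    heap = [(f, next(tie), {w: ""}) for w, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees under a new inner node.
        f0, _, left = heapq.heappop(heap)
        f1, _, right = heapq.heappop(heap)
        merged = {w: "0" + c for w, c in left.items()}
        merged.update({w: "1" + c for w, c in right.items()})
        heapq.heappush(heap, (f0 + f1, next(tie), merged))
    return heap[0][2]

freqs = {"the": 50, "cat": 10, "sat": 8, "zygote": 1}  # hypothetical counts
codes = huffman_codes(freqs)
```

Each code bit corresponds to one binary decision at an inner node of the tree, so the length of a word's code is the number of decisions needed to predict it.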

Word2vec - CBOW
• Log-likelihood
Blog: http://blog.csdn.net/itplus/article/details/37969979
• Conditional Probability
• SGD
Hierarchical Softmax
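
The formulas for this slide are not in the extracted text. As a hedged sketch of the usual formulation, CBOW combines the context word vectors into one vector x (averaged here; some descriptions sum instead), and hierarchical softmax scores a word as a product of sigmoid decisions along its tree path. The tree, the parameter values, and the left/right bit convention below are assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cbow_hidden(context_vectors):
    """CBOW projection: average the context word vectors."""
    n = len(context_vectors)
    return [sum(col) / n for col in zip(*context_vectors)]

def hs_probability(x, path):
    """P(word | x) as a product of binary decisions along the tree path.

    `path` is a list of (theta, bit) pairs, one per inner node from the
    root to the word's leaf; bit 0 means "go left", bit 1 means "go right".
    """
    p = 1.0
    for theta, bit in path:
        s = sigmoid(sum(a * b for a, b in zip(x, theta)))
        p *= s if bit == 0 else 1.0 - s
    return p

# Toy tree over three words (all values are made up for illustration):
# root (theta1) -> left: w1, right: inner node (theta2) -> left: w2, right: w3
theta1, theta2 = [0.5, -0.2], [-0.3, 0.8]
paths = {
    "w1": [(theta1, 0)],
    "w2": [(theta1, 1), (theta2, 0)],
    "w3": [(theta1, 1), (theta2, 1)],
}
x = cbow_hidden([[0.1, 0.4], [0.3, 0.0]])
probs = {w: hs_probability(x, p) for w, p in paths.items()}
```

Because the two branches at every inner node have probabilities that add up to one, the leaf probabilities form a valid distribution without an explicit sum over the vocabulary.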

Word2vec - Skip-gram
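
The Skip-gram slide consists of a figure only. As a rough illustration of how the two architectures differ in what they predict, the sketch below builds training examples from one sentence. The helper names, the window size, and the sentence are hypothetical.

```python
def skipgram_pairs(tokens, window):
    """(center, context) pairs: predict each context word from the center."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        pairs.extend((center, tokens[j]) for j in range(lo, hi) if j != i)
    return pairs

def cbow_examples(tokens, window):
    """(context list, center) examples: predict the center from its context."""
    examples = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        examples.append((context, center))
    return examples

sentence = ["the", "cat", "sat", "down"]
sg = skipgram_pairs(sentence, window=1)
cb = cbow_examples(sentence, window=1)
```

Skip-gram produces one example per (center, context) pair, while CBOW produces one example per position.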

CBOW vs Skip-gram
Semantic Relation
Syntactic Relation

Word2Vec - Negative Sampling
Mikolov, Tomas, et al. "Distributed representations of words and phrases and their compositionality." NIPS (2013).
CBOW & Skip-gram
Blog: http://blog.csdn.net/itplus/article/details/37969979
Phrase Skip-Gram Results
Negative Sampling:
Subsampling:
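
The two formulas that followed these labels are missing from the extracted text. As described in the cited paper, negatives are drawn from the unigram distribution raised to the 3/4 power, and each occurrence of a word w is discarded with probability 1 - sqrt(t / f(w)), where f(w) is the word's relative frequency. A minimal sketch, with hypothetical counts and function names:

```python
import math
import random

def noise_distribution(counts, power=0.75):
    """Unigram distribution raised to `power` and renormalized."""
    weights = {w: c ** power for w, c in counts.items()}
    total = sum(weights.values())
    return {w: v / total for w, v in weights.items()}

def discard_probability(word, counts, t=1e-5):
    """Probability of dropping one occurrence of `word`: 1 - sqrt(t / f(w))."""
    f = counts[word] / sum(counts.values())
    return max(0.0, 1.0 - math.sqrt(t / f))

def draw_negatives(noise, k, exclude, rng):
    """Draw k negative words from the noise distribution, skipping `exclude`."""
    words = [w for w in noise if w != exclude]
    weights = [noise[w] for w in words]
    return rng.choices(words, weights=weights, k=k)

counts = {"the": 1000, "cat": 10, "zygote": 1}  # hypothetical counts
noise = noise_distribution(counts)
negatives = draw_negatives(noise, k=5, exclude="cat", rng=random.Random(0))
# A larger threshold than the default is used here because the toy corpus is tiny.
drop = {w: discard_probability(w, counts, t=1e-3) for w in counts}
```

Raising the counts to the 3/4 power flattens the distribution, so rare words are sampled as negatives somewhat more often than their raw frequency would suggest.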

Word2Vec Properties
• Vector spaces
  - Mapping different vector spaces
  - Machine Translation
  - …
• Embedding offsets
  - V(king) - V(queen) + V(woman) ≈ V(man)
  - Complicated Semantic Relations: Hierarchies
  - …
• ?
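
The embedding-offset property can be checked with a nearest-neighbour search under cosine similarity. The sketch below uses hand-made 2-d vectors rather than trained embeddings, so it only demonstrates the arithmetic; the function names and all vector values are assumptions.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def analogy(a, b, c, vectors):
    """Return the word whose vector is closest to V(a) - V(b) + V(c)."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(target, vectors[w]))

# Hand-made 2-d vectors (axes loosely "royalty" and "gender"); not trained.
vectors = {
    "king":  [0.9, 0.9],
    "queen": [0.9, -0.9],
    "man":   [0.1, 0.9],
    "woman": [0.1, -0.9],
    "apple": [-0.8, 0.1],
}
result = analogy("king", "queen", "woman", vectors)
```

The three query words are excluded from the candidates, as is common practice when evaluating analogies.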

Machine Translation
• Mikolov, Tomas, Quoc V. Le, and Ilya Sutskever. "Exploiting similarities among languages for machine translation." arXiv preprint arXiv:1309.4168 (2013).
Translation Matrix
English (left), Spanish (right).
• Devlin, Jacob, Rabih Zbib, Zhongqiang Huang, Thomas Lamar, Richard Schwartz, and John Makhoul. "Fast and Robust Neural Network Joint Models for Statistical Machine Translation." ACL 2014 Best Long Paper.
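
The translation matrix of Mikolov et al. (2013) is a linear map W from the source-language vector space to the target-language space, fitted on known word pairs (x_i, z_i) by minimizing the sum of ||W x_i - z_i||^2. A minimal sketch with gradient descent follows; the 2-d vectors are synthetic and the function names are assumptions.

```python
def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def fit_translation_matrix(pairs, dim, lr=0.1, epochs=200):
    """Gradient descent on sum_i ||W x_i - z_i||^2 over (x_i, z_i) pairs."""
    W = [[0.0] * dim for _ in range(dim)]
    for _ in range(epochs):
        for x, z in pairs:
            err = [p - t for p, t in zip(matvec(W, x), z)]
            for r in range(dim):
                for c in range(dim):
                    # d/dW[r][c] of ||W x - z||^2 is 2 * err[r] * x[c]
                    W[r][c] -= lr * 2.0 * err[r] * x[c]
    return W

# Hypothetical 2-d "source" vectors and their "target" counterparts,
# generated by a known 90-degree rotation so the result can be checked.
source = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
target = [[-x[1], x[0]] for x in source]
W = fit_translation_matrix(list(zip(source, target)), dim=2)
translated = matvec(W, [0.2, 0.7])
```

In the real setting, a translated vector is mapped to a word by looking up its nearest neighbour in the target-language space.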

Semantic Hierarchies
• Fu, Ruiji, Jiang Guo, Bing Qin, Wanxiang Che, Haifeng Wang, and Ting Liu. "Learning semantic hierarchies via word embeddings." ACL, 2014.
hypernym–hyponym
More Relations?
Clustering y - x, and then for each cluster:

Relation Classification
• Zeng, Daojian, Kang Liu, Siwei Lai, Guangyou Zhou, and Jun Zhao. "Relation Classification via Convolutional Deep Neural Network." COLING 2014 Best Paper.
Cause-Effect relationship: "The [fire]e1 inside WTC was caused by exploding [fuel]e2"
Steps: 1, word vectors; 2, lexical-level features; 3, sentence features via CNN; 4, concatenate features and feed into softmax classifier
SENNA

NN & Sentences
• Kalchbrenner, Nal, Edward Grefenstette, and Phil Blunsom. "A Convolutional Neural Network for Modelling Sentences." ACL 2014.
• Hermann, Karl Moritz, and Phil Blunsom. "Multilingual Models for Compositional Distributed Semantics." arXiv preprint arXiv:1404.4641 (2014).
• Hermann, Karl Moritz, and Phil Blunsom. "Multilingual Distributed Representations without Word Alignment." arXiv preprint arXiv:1312.6173 (2013).
• Kim, Yoon. "Convolutional Neural Networks for Sentence Classification." arXiv, 2014.
• Le, Quoc V., and Tomas Mikolov. "Distributed Representations of Sentences and Documents." ICML (2014).
Phil Blunsom
Oxford
http://www.cs.ox.ac.uk/people/phil.blunsom/

Convolutional Neural Networks
• Best Results in Computer Vision (LeNet5, ImageNet 2012~now)
• Shared Weight: Convolutional Filter → Feature Map
• * Invariance: pooling (mean-, max-, stochastic)
http://yann.lecun.com/exdb/lenet/
C1: (5*5+1)*6 = 156 parameters;
156*(28*28) = 122,304 connections
Convolutional Windows in Sentences
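
The C1 arithmetic above can be reproduced with a small helper. It assumes a stride-1 convolution without padding and LeNet-5's 32x32 input, which is what yields the 28x28 feature maps; the function name is hypothetical.

```python
def conv_layer_stats(input_size, kernel_size, num_maps, in_channels=1):
    """Parameter and connection counts for a stride-1, unpadded conv layer."""
    out_size = input_size - kernel_size + 1
    # Shared weights: one kernel plus one bias per feature map.
    params_per_map = kernel_size * kernel_size * in_channels + 1
    params = params_per_map * num_maps
    # Every output position reuses the same parameters.
    connections = params * out_size * out_size
    return out_size, params, connections

# LeNet-5 layer C1: 32x32 input, six 5x5 filters.
out_size, params, connections = conv_layer_stats(32, 5, 6)
```

Weight sharing is what keeps the parameter count at 156 even though the layer has over a hundred thousand connections.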

CNN for Modelling Sentences
• Kalchbrenner, Nal, Edward Grefenstette, and Phil Blunsom. "A Convolutional Neural Network for Modelling Sentences." ACL 2014.
• Supervision depends on the task
• Dynamic k-Max Pooling
• Folding
  - Sums every two rows
• Multiple Feature Maps
  - Different features
  - Word order, n-gram
  - Sentence Feature
  - Many tasks
• Word vector: learning
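
A minimal sketch of the pooling and folding operations listed above. k-max pooling keeps the k largest activations in their original order; the dynamic k follows the formula given in the cited paper, k_l = max(k_top, ceil((L - l) / L * s)), for layer l of L and sentence length s. Function names and toy inputs are assumptions.

```python
def k_max_pooling(values, k):
    """Keep the k largest values, preserving their original order."""
    top = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]
    return [values[i] for i in sorted(top)]

def dynamic_k(layer, total_layers, sentence_len, k_top):
    """k_l = max(k_top, ceil((L - l) / L * s)), using integer arithmetic."""
    numerator = (total_layers - layer) * sentence_len
    return max(k_top, -(-numerator // total_layers))  # integer ceiling

def fold(rows):
    """Folding: sum every two adjacent rows (assumes an even number of rows)."""
    return [[a + b for a, b in zip(rows[i], rows[i + 1])]
            for i in range(0, len(rows), 2)]

pooled = k_max_pooling([3, 1, 5, 2, 4], k=3)
ks = [dynamic_k(l, 3, 18, 3) for l in (1, 2, 3)]
folded = fold([[1, 2], [3, 4], [5, 6], [7, 8]])
```

Keeping the selected values in sentence order is what lets later layers still see relative word order after pooling.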

CNN for Sentence Classification
• Kim, Yoon. "Convolutional Neural Networks for Sentence Classification." arXiv, 2014.
New: Multi-Channel
A model with two sets of word vectors. Each set of vectors is treated as a 'channel' and each filter is applied to both channels.
word2vec
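
A rough sketch of the multi-channel idea: the same filter slides over both sets of word vectors and the two responses are added before the nonlinearity. The ReLU and the max-over-time pooling step are assumptions made for this illustration, and all numbers are invented.

```python
def multichannel_feature(channels, filt, bias=0.0):
    """One filter over a sentence with several embedding channels.

    channels: list of channels, each a list of word vectors (one per token)
    filt:     list of h weight vectors (the filter spans h consecutive words)
    Returns the max-over-time value of the ReLU feature map.
    """
    h, n = len(filt), len(channels[0])
    feature_map = []
    for i in range(n - h + 1):
        total = bias
        for channel in channels:              # same filter on every channel
            for j in range(h):
                total += sum(w * v for w, v in zip(filt[j], channel[i + j]))
        feature_map.append(max(0.0, total))   # ReLU
    return max(feature_map)                   # max-over-time pooling

# Two hypothetical channels for a three-word sentence, 2-d vectors.
static = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
tuned = [[0.5, 0.0], [0.0, 0.5], [0.5, 0.5]]
filt = [[1.0, 0.0], [0.0, 1.0]]               # window of h = 2 words
feature = multichannel_feature([static, tuned], filt)
```

Each filter contributes one number to the sentence representation, so a model with many filters yields a fixed-length feature vector regardless of sentence length.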

Multilingual Models
• Hermann, Karl Moritz, and Phil Blunsom. "Multilingual Models for Compositional Distributed Semantics." arXiv preprint arXiv:1404.4641 (2014).
Sentences
Documents
Word vector: learning
Objective:
CVM: SUM or

Word2Vec Extension
• Le, Quoc V., and Tomas Mikolov. "Distributed Representations of Sentences and Documents." ICML (2014).
Sentiment Analysis and Document Classification

SENNA
Collobert, Ronan, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. "Natural language processing (almost) from scratch." The Journal of Machine Learning Research 12 (2011).
Word embeddings
ConvNet

NN & Documents
• Le, Quoc V., and Tomas Mikolov. "Distributed Representations of Sentences and Documents." ICML (2014).
• Denil, Misha, Alban Demiraj, Nal Kalchbrenner, Phil Blunsom, and Nando de Freitas. "Modelling, Visualising and Summarising Documents with a Single Convolutional Neural Network." arXiv preprint arXiv:1406.3830 (2014).
[Figure panels: Layer 1, Layer 4, Layer 5]
Zeiler, Matthew D., and Rob Fergus. "Visualizing and understanding convolutional networks." ECCV 2014.
How about Text?
- Word salience? Word/entity/topic extraction
- Sentence salience?

Web Search | Learning to Rank
• Huang, Po-Sen, Xiaodong He, Jianfeng Gao, Li Deng, Alex Acero, and Larry Heck. "Learning deep structured semantic models for web search using clickthrough data." CIKM, 2013.

NN & Topic Model
• Wan, Li, Leo Zhu, and Rob Fergus. "A hybrid neural network-latent topic model." AISTATS, 2012.
• Larochelle, Hugo, and Stanislas Lauly. "A neural autoregressive topic model." NIPS, 2012.

Hybrid NN & Topic Model
• Wan, Li, Leo Zhu, and Rob Fergus. "A hybrid neural network-latent topic model." AISTATS, 2012.

Tuning
• #layers, #hidden units
• Learning rate
• Momentum
• Weight decay
• Mini-batch size
• …
TO BE A GOOD TUNER
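
To show where each of these hyperparameters enters the update, here is a minimal sketch of mini-batch SGD with momentum and L2 weight decay on a one-parameter toy problem. The data, the default values, and the function name are assumptions, not recommended settings.

```python
import random

def sgd_momentum(data, lr=0.05, momentum=0.9, weight_decay=0.0,
                 batch_size=4, epochs=200, seed=0):
    """Fit y = w * x by mini-batch SGD with momentum and L2 weight decay."""
    rng = random.Random(seed)
    data = list(data)
    w, velocity = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Mean gradient of 0.5 * (w*x - y)^2 over the mini-batch.
            grad = sum((w * x - y) * x for x, y in batch) / len(batch)
            grad += weight_decay * w              # L2 penalty term
            velocity = momentum * velocity - lr * grad
            w += velocity
    return w

data = [(x / 10.0, 3.0 * x / 10.0) for x in range(-10, 11)]  # toy data, y = 3x
w = sgd_momentum(data)
```

With weight decay left at zero the fit recovers the slope of the toy data; a positive `weight_decay` would pull `w` toward zero.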
Thanks! 
Q&A

(Deep) Neural Networks在 NLP 和 Text Mining 总结

  • 1. (Deep) Neural Network& Text Mining Piji Li lipiji.pz@gmail.com
  • 2. •Deep Learning -Story •DL for NLP & Text Mining -Words -Sentences -Documents 9/3/2014 2 lipiji.pz@gmail.comOutline
  • 3. •Deep Learning -Story •DL for NLP & Text Mining -Words -Sentences -Documents 9/3/2014 3 lipiji.pz@gmail.comOutline
  • 4. 9/3/2014 4 lipiji.pz@gmail.comBP D.E. Rumelhart, G.E. Hinton, R.J. Williams Learning representation by back-propagating errors. Nature, 323 (1986), pp. 533–536 Solved learning problem Biological system … Hard to train (non-convex, tricks) Hard to do theoretical analysis Small training sets... 1986
  • 5. 9/3/2014 5 lipiji.pz@gmail.comDBN Hinton,Geoffrey,SimonOsindero,andYee-WhyeTeh."Afastlearningalgorithmfordeepbeliefnets."Neuralcomputation18,no.7(2006):1527-1554. Hinton,GeoffreyE.,andRuslanR.Salakhutdinov. "Reducingthedimensionalityofdatawithneuralnetworks."Science313,no.5786(2006):504-507. 2006 •Unsupervised layer-wised pre-training •Supervised fine-tuning •Feature learning & distributed representations •Large scale datasets •Tricks (learning rate, mini-batch size, momentum, etc.) •https://github.com/lipiji/PG_DEEP
  • 6. 9/3/2014 6 lipiji.pz@gmail.comCD-DNN-HMM Dahl,GeorgeE.,DongYu,LiDeng,andAlexAcero. "Context-dependentpre-traineddeepneuralnetworksforlarge-vocabularyspeechrecognition."Audio, Speech,andLanguageProcessing,IEEETransactionson20,no.1(2012):30-42. Winnerofthe2013IEEESignalProcessingSocietyBestPaperAward 2011 PhD candidate of Prof. Hinton, intern at MSR GPU
  • 7. 9/3/2014 7 lipiji.pz@gmail.comImageNet Classification 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 ImageNet Classification Error 2012 2013 2014 Deeper Network in Network Deep DNN First Blood http://image-net.org/ •1000categories and 1.2million training images Li Fei-Fei: ImageNet Large Scale Visual Recognition Challenge, 2014
  • 8. 9/3/2014 8 lipiji.pz@gmail.comNowadays People: Geoffrey Hinton, Yann LeCun, YoshuaBengio, Andrew Ng Industry: Google, Baidu, Facebook, IBM Paper: ICCV CVPR ECCV; ICML, NIPS, AAAI, ICLR; ACL, EMNLP, COLING Startup Companies using (Deep) Machine Learning:
  • 9. •Deep Learning -Story •DL for NLP & Text Mining -Words -Sentences -Documents 9/3/2014 9 lipiji.pz@gmail.comOutline
  • 10. •Distributed Representation -A Neural Probabilistic Language Model -Word2Vec •Application -Semantic Hierarchies of Words -Machine Translation 9/3/2014 10 lipiji.pz@gmail.comNN & Words V(king) –V(queen) + V(woman) ≈ V(man)
  • 11. Neural Probabilistic Language Model. Bengio, Yoshua, Réjean Ducharme, Pascal Vincent, and Christian Jauvin. "A Neural Probabilistic Language Model." Journal of Machine Learning Research 3 (2003): 1137-1155. [Figure labels: concatenate; H, d; U, b; W, b; |V|] - Maximizes the penalized log-likelihood - Softmax output layer - Training: BP. Better than the n-gram model: 1, word vectors; 2, no smoothing needed, p(w|context) ∈ (0,1). Time-consuming.
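The forward pass of the model on this slide can be sketched as follows. In Bengio et al. (2003) the output scores are y = b + Wx + U tanh(d + Hx), where x is the concatenation of the context word vectors, and a softmax over |V| scores gives p(w|context). The toy sizes, the random parameters, and the name `nplm_forward` are assumptions made for illustration only.

```python
import numpy as np

def nplm_forward(context_ids, C, H, d, U, W, b):
    """Return p(w | context) over the whole vocabulary."""
    x = C[context_ids].reshape(-1)      # concatenate the context word vectors
    hidden = np.tanh(d + H @ x)         # hidden layer
    y = b + W @ x + U @ hidden          # one unnormalized score per vocabulary word
    y = y - y.max()                     # shift scores for numerical stability
    p = np.exp(y)
    return p / p.sum()                  # softmax: cost is O(|V|) per prediction

rng = np.random.default_rng(0)
V, m, n_ctx, h = 10, 4, 3, 5            # toy sizes (assumed): vocab, vector dim, context, hidden
C = rng.normal(size=(V, m))             # word vector table
H = rng.normal(size=(h, n_ctx * m)); d = np.zeros(h)
U = rng.normal(size=(V, h))
W = rng.normal(size=(V, n_ctx * m)); b = np.zeros(V)

probs = nplm_forward([1, 4, 7], C, H, d, U, W, b)
```

Every prediction normalizes over all |V| words, which is the cost the slide labels "time-consuming".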
  • 13. Word2Vec - Hierarchical Softmax. Morin, Frederic, and Yoshua Bengio. "Hierarchical probabilistic neural network language model." In AISTATS, vol. 5, pp. 246-252. 2005. Mnih, Andriy, and Geoffrey E. Hinton. "A scalable hierarchical distributed language model." In Advances in Neural Information Processing Systems, pp. 1081-1088. 2009. [Figure: binary tree with node counts 5, 3, 2, 2, 1; inner-node parameters θ1, θ2; leaf words w1, w2, w3] • Hierarchical strategies: O(|V|) → O(log|V|) • Word2Vec: Huffman tree - short codes for frequent words. Time-consuming
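A minimal sketch of how a Huffman tree gives short codes to frequent words, using only the standard library. The word counts are hypothetical, and `huffman_codes` is an illustrative helper, not the word2vec implementation.

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Map each word to a binary code string; frequent words get shorter codes."""
    tie = count()                               # tie-breaker so the heap never compares dicts
    heap = [(f, next(tie), {w: ""}) for w, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f0, _, left = heapq.heappop(heap)       # the two least frequent subtrees
        f1, _, right = heapq.heappop(heap)
        merged = {w: "0" + c for w, c in left.items()}
        merged.update({w: "1" + c for w, c in right.items()})
        heapq.heappush(heap, (f0 + f1, next(tie), merged))
    return heap[0][2]

# Hypothetical word counts, for illustration only.
codes = huffman_codes({"the": 50, "cat": 10, "sat": 8, "mat": 3})
```

The code length of a word is its depth in the tree, so a prediction touches one inner node per code bit instead of all |V| outputs.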
  • 14. Word2vec - CBOW. • Log-likelihood • Conditional probability • SGD. Hierarchical Softmax. Blog: http://blog.csdn.net/itplus/article/details/37969979
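A sketch of the conditional probability under hierarchical softmax: the probability of a word is the product of binary (sigmoid) decisions along its path in the tree, so the probabilities of all leaves sum to 1 without an explicit normalization. The codes, the vector size, and the choice to average the context vectors are assumptions; which branch counts as "0" is a convention.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cbow_input(context_vectors):
    """CBOW projection: combine the context word vectors (here by averaging)."""
    return np.mean(context_vectors, axis=0)

def hs_probability(code, x, theta):
    """p(word | x) as a product of binary decisions along the word's tree path."""
    p = 1.0
    for j, bit in enumerate(code):
        s = sigmoid(theta[code[:j]] @ x)        # inner node is identified by the path prefix
        p *= s if bit == "0" else 1.0 - s       # take one branch or the other
    return p

rng = np.random.default_rng(0)
codes = {"the": "1", "cat": "00", "mat": "010", "sat": "011"}   # hypothetical Huffman codes
prefixes = sorted({c[:j] for c in codes.values() for j in range(len(c))})
theta = {p: rng.normal(size=4) for p in prefixes}               # one vector per inner node
x = cbow_input(rng.normal(size=(6, 4)))                         # six toy context vectors
probs = {w: hs_probability(c, x, theta) for w, c in codes.items()}
```

Training with SGD would then ascend log p(w | context) with respect to `theta` and the context word vectors; that step is omitted here.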
  • 16. CBOW vs Skip-gram. Semantic relation; syntactic relation
  • 17. Word2Vec - Negative Sampling. Mikolov, Tomas, et al. "Distributed representations of words and phrases and their compositionality." NIPS (2013). CBOW & Skip-gram. Blog: http://blog.csdn.net/itplus/article/details/37969979 Phrase Skip-Gram results. Negative sampling; subsampling
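The two techniques named on this slide can be sketched from the formulas in Mikolov et al. (2013): the negative-sampling objective log σ(v'_pos · v_in) + Σ_k log σ(−v'_neg_k · v_in), and the subsampling rule that discards a word with probability 1 − sqrt(t / f(w)). Function names and toy inputs are assumptions; drawing the negative words from the noise distribution is left out.

```python
import numpy as np

def log_sigmoid(z):
    return -np.logaddexp(0.0, -z)               # numerically stable log(sigmoid(z))

def negative_sampling_loss(v_in, v_out_pos, v_out_negs):
    """Negative of: log s(v_pos . v_in) + sum_k log s(-v_neg_k . v_in)."""
    pos = log_sigmoid(v_out_pos @ v_in)
    neg = log_sigmoid(-(v_out_negs @ v_in)).sum()
    return -(pos + neg)

def discard_probability(freq, t=1e-5):
    """Subsampling: discard w with probability 1 - sqrt(t / f(w)); rare words are always kept."""
    return max(0.0, 1.0 - np.sqrt(t / freq))

# With a zero input vector every sigmoid is 0.5, so 1 positive + 3 negatives cost 4 * log 2.
loss = negative_sampling_loss(np.zeros(4), np.ones(4), np.ones((3, 4)))
```

Only k + 1 output vectors are touched per training pair, instead of the full vocabulary.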
  • 18. Word2Vec Properties. • Vector spaces - Mapping different vector spaces - Machine Translation - … • Embedding offsets - V(king) – V(queen) + V(woman) ≈ V(man) - Complicated semantic relations: hierarchies - … • ?
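The embedding-offset property can be illustrated with hand-made toy vectors (an assumption: axis 0 encodes gender, axis 1 encodes royalty). Real word2vec vectors are learned and only satisfy the relation approximately.

```python
import numpy as np

def analogy(a, b, c, vectors):
    """Return the word closest (by cosine) to V(a) - V(b) + V(c), excluding a, b, c."""
    target = vectors[a] - vectors[b] + vectors[c]

    def cosine(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

# Toy 2-d vectors, constructed by hand for illustration.
toy = {
    "king":  np.array([1.0, 1.0]),
    "queen": np.array([-1.0, 1.0]),
    "man":   np.array([1.0, 0.0]),
    "woman": np.array([-1.0, 0.0]),
    "apple": np.array([0.0, -1.0]),
}
result = analogy("king", "queen", "woman", toy)
```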
  • 20. Machine Translation. Mikolov, Tomas, Quoc V. Le, and Ilya Sutskever. "Exploiting similarities among languages for machine translation." arXiv preprint arXiv:1309.4168 (2013). Translation matrix. [Figure: word vectors in English (left) and Spanish (right)] Devlin, Jacob, Rabih Zbib, Zhongqiang Huang, Thomas Lamar, Richard Schwartz, and John Makhoul. "Fast and Robust Neural Network Joint Models for Statistical Machine Translation." ACL 2014 Best Long Paper.
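A sketch of the translation-matrix idea: given paired word vectors (x_i, z_i) from two languages, find a linear map W that minimizes Σ_i ||W x_i − z_i||². Mikolov et al. optimize this with stochastic gradient descent; the closed-form least-squares solve below is a swapped-in shortcut, and the data are synthetic.

```python
import numpy as np

def fit_translation_matrix(X, Z):
    """Least-squares W minimizing sum_i ||W x_i - z_i||^2 (rows of X and Z are paired vectors)."""
    W_t, *_ = np.linalg.lstsq(X, Z, rcond=None)   # solves X @ W.T ~ Z
    return W_t.T

rng = np.random.default_rng(0)
W_true = rng.normal(size=(3, 4))    # hypothetical map from a 4-d source space to a 3-d target space
X = rng.normal(size=(50, 4))        # toy "source language" vectors
Z = X @ W_true.T                    # toy "target language" vectors
W = fit_translation_matrix(X, Z)
```

To translate a new word, one would compute `W @ x` and look up the nearest target-language vector.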
  • 22. Semantic Hierarchies. Fu, Ruiji, Jiang Guo, Bing Qin, Wanxiang Che, Haifeng Wang, and Ting Liu. "Learning semantic hierarchies via word embeddings." ACL, 2014. Hypernym–hyponym. More relations? Clustering y - x, and then for each cluster:
  • 23. Relation Classification. Zeng, Daojian, Kang Liu, Siwei Lai, Guangyou Zhou, and Jun Zhao. "Relation Classification via Convolutional Deep Neural Network." COLING 2014 Best Paper. Cause-Effect relationship: "The [fire]e1 inside WTC was caused by exploding [fuel]e2". Steps: 1, word vectors; 2, lexical-level features; 3, sentence-level features via CNN; 4, concatenate the features and feed them into a softmax classifier. SENNA
  • 24. NN & Sentences. • Blunsom, Phil, Edward Grefenstette, and Nal Kalchbrenner. "A Convolutional Neural Network for Modelling Sentences." ACL 2014. • Hermann, Karl Moritz, and Phil Blunsom. "Multilingual Models for Compositional Distributed Semantics." arXiv preprint arXiv:1404.4641 (2014). • Hermann, Karl Moritz, and Phil Blunsom. "Multilingual Distributed Representations without Word Alignment." arXiv preprint arXiv:1312.6173 (2013). • Kim, Yoon. "Convolutional Neural Networks for Sentence Classification." arXiv, 2014. • Le, Quoc V., and Tomas Mikolov. "Distributed Representations of Sentences and Documents." ICML (2014). Phil Blunsom, Oxford: http://www.cs.ox.ac.uk/people/phil.blunsom/
  • 25. Convolutional Neural Networks. • Best results in computer vision (LeNet5, ImageNet 2012~now) • Shared weights: convolutional filter, feature map • Invariance: pooling (mean-, max-, stochastic). http://yann.lecun.com/exdb/lenet/ C1: (5*5+1)*6 = 156 parameters; 156*(28*28) = 122,304 connections. Convolutional windows in sentences
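The C1 arithmetic on this slide can be checked with a few lines. The helper name `conv_layer_size` is made up for illustration; the numbers are LeNet-5's first convolutional layer (5x5 filters, 6 feature maps, 28x28 outputs).

```python
def conv_layer_size(kernel, n_maps, out_h, out_w, in_maps=1):
    """Trainable parameters and connections of a convolutional layer with shared weights."""
    params = (kernel * kernel * in_maps + 1) * n_maps   # +1 is the bias of each feature map
    connections = params * out_h * out_w                # every output position reuses the filters
    return params, connections

params, connections = conv_layer_size(kernel=5, n_maps=6, out_h=28, out_w=28)
print(params, connections)  # 156 122304
```

Weight sharing is why 122,304 connections need only 156 parameters.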
  • 26. CNN for Modelling Sentences. Blunsom, Phil, Edward Grefenstette, and Nal Kalchbrenner. "A Convolutional Neural Network for Modelling Sentences." ACL 2014. • Supervision depends on the task • Dynamic k-max pooling • Folding - sums every two rows • Multiple feature maps - different features - word order, n-gram - sentence feature - many tasks • Word vectors: learned
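A sketch of the two operations named on this slide, assuming a feature map stored as a NumPy array with one row per embedding dimension and one column per sentence position. The `dynamic_k` formula, k = max(k_top, ceil((L − l) / L · s)), follows the paper; the function names and the toy matrix are assumptions.

```python
import math
import numpy as np

def k_max_pooling(feature_map, k):
    """Keep the k largest values in each row, preserving their original order."""
    idx = np.argsort(feature_map, axis=1)[:, -k:]   # column indices of the top-k values per row
    idx = np.sort(idx, axis=1)                      # restore left-to-right order
    return np.take_along_axis(feature_map, idx, axis=1)

def dynamic_k(layer, total_layers, sentence_len, k_top):
    """k for convolutional layer `layer` (1-based) out of `total_layers`."""
    return max(k_top, math.ceil((total_layers - layer) * sentence_len / total_layers))

def fold(feature_map):
    """Sum every two adjacent rows (the row count must be even), halving the number of rows."""
    return feature_map[0::2] + feature_map[1::2]

fm = np.array([[1.0, 5.0, 2.0, 4.0, 3.0],
               [9.0, 0.0, 8.0, 7.0, 1.0]])
pooled = k_max_pooling(fm, 3)     # rows become [5, 4, 3] and [9, 8, 7]
folded = fold(fm)                 # single row [10, 5, 10, 11, 4]
```

Unlike plain max pooling, k-max pooling keeps the relative order of the selected values, which preserves some word-order information.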
  • 27. CNN for Sentence Classification. Kim, Yoon. "Convolutional Neural Networks for Sentence Classification." arXiv, 2014. New: Multi-Channel. A model with two sets of word vectors. Each set of vectors is treated as a 'channel' and each filter is applied to both channels. word2vec
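A sketch of one multi-channel feature, under the reading that the same filter slides over both channels and the two responses are added before the nonlinearity, followed by max-over-time pooling. The tanh nonlinearity, the toy matrices, and the name `multichannel_feature` are assumptions.

```python
import numpy as np

def multichannel_feature(channels, filt, bias=0.0):
    """Apply one filter to every channel, add the responses, then max-pool over positions."""
    h = filt.shape[0]                   # window size in words
    n = channels[0].shape[0]            # sentence length in words
    responses = [
        np.tanh(sum(np.sum(ch[i:i + h] * filt) for ch in channels) + bias)
        for i in range(n - h + 1)       # one response per window position
    ]
    return max(responses)               # max-over-time pooling: one number per filter

static = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]])   # toy word vectors, kept fixed
tuned = 2.0 * static                                                  # toy copy, fine-tuned in training
feature = multichannel_feature([static, tuned], filt=np.ones((2, 2)))
```

In a full model, many filters with several window sizes would each yield one such feature, and the concatenated features would feed a softmax classifier.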
  • 28. Multilingual Models. Hermann, Karl Moritz, and Phil Blunsom. "Multilingual Models for Compositional Distributed Semantics." arXiv preprint arXiv:1404.4641 (2014). Sentences; documents. Word vectors: learned. Objective: CVM: SUM or
  • 29. Word2Vec Extension. Le, Quoc V., and Tomas Mikolov. "Distributed Representations of Sentences and Documents." ICML (2014). Sentiment analysis and document classification
  • 30. SENNA. Collobert, Ronan, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. "Natural language processing (almost) from scratch." The Journal of Machine Learning Research 12 (2011). Word embeddings; ConvNet
  • 31. NN & Documents. • Le, Quoc V., and Tomas Mikolov. "Distributed Representations of Sentences and Documents." ICML (2014). • Denil, Misha, Alban Demiraj, Nal Kalchbrenner, Phil Blunsom, and Nando de Freitas. "Modelling, Visualising and Summarising Documents with a Single Convolutional Neural Network." arXiv preprint arXiv:1406.3830 (2014). [Figure: Layer 1, Layer 4, Layer 5] Zeiler, Matthew D., and Rob Fergus. "Visualizing and understanding convolutional networks." ECCV 2014. How about text? - Word salience? (word/entity/topic extraction) - Sentence salience?
  • 32. Web Search | Learning to Rank. Huang, Po-Sen, Xiaodong He, Jianfeng Gao, Li Deng, Alex Acero, and Larry Heck. "Learning deep structured semantic models for web search using clickthrough data." CIKM, 2013.
  • 33. NN & Topic Model. • Wan, Li, Leo Zhu, and Rob Fergus. "A hybrid neural network-latent topic model." ICAIS. 2012. • Larochelle, Hugo, and Stanislas Lauly. "A neural autoregressive topic model." NIPS. 2012.
  • 34. Hybrid NN & Topic Model. Wan, Li, Leo Zhu, and Rob Fergus. "A hybrid neural network-latent topic model." ICAIS. 2012.
  • 35. Tuning. • #layers, #hidden units • Learning rate • Momentum • Weight decay • Mini-batch size • … TO BE A GOOD TUNER
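To show where three of the listed hyperparameters enter, here is a minimal sketch of one mini-batch update using classical momentum and L2 weight decay. The default values are arbitrary placeholders, not recommendations from the slides.

```python
import numpy as np

def sgd_step(w, grad, velocity, lr=0.01, momentum=0.9, weight_decay=1e-4):
    """One mini-batch update with classical momentum and L2 weight decay."""
    grad = grad + weight_decay * w                 # gradient of the L2 penalty
    velocity = momentum * velocity - lr * grad     # running average of past steps
    return w + velocity, velocity

# Toy problem: minimize 0.5 * ||w||^2, whose gradient is w itself.
w, v = np.array([1.0]), np.zeros(1)
w, v = sgd_step(w, grad=w, velocity=v, lr=0.1, momentum=0.9, weight_decay=0.0)
```

`grad` is assumed to be averaged over one mini-batch, so the mini-batch size controls how noisy each step is.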