2. Contents
• This deck is condensed from the following chapters in the Getting Started part of Elasticsearch: The Definitive Guide:
• You Know, for Search…
• Life Inside a Cluster
• Distributed Document Store
• Mapping and Analysis
• Index Management
• Inside a Shard
• It also introduces three Elasticsearch gems for Rails
15. Employee Directory Tutorial
• Enable data to contain multivalue tags, numbers, and full text.
• Retrieve the full details of any employee.
• Allow structured search, such as finding employees over the age of 30.
• Allow simple full-text search and more-complex phrase searches.
• Return highlighted search snippets from the text in the matching documents.
• Enable management to build analytic dashboards over the data.
• For details, see: http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_finding_your_feet.html
30. • Suppose 12 documents have a date of the form 2014-xx-xx, but only one of them has the date 2014-09-15. We send the following requests:
GET /_search?q=2014 # 12 results
GET /_search?q=2014-09-15 # 12 results !
GET /_search?q=date:2014-09-15 # 1 result
GET /_search?q=date:2014 # 0 results !
Why are these results so strange?
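31. Searching Across Fields
• When a document is indexed, Elasticsearch by default also builds an extra _all field. By default this field is the concatenation of all the document's fields, and it is indexed with an inverted index to support full-text search. For example: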
{
"tweet": "However did I manage before Elasticsearch?",
"date": "2014-09-14",
"name": "Mary Jones",
"user_id": 1
}
"However did I manage before Elasticsearch? 2014-09-14 Mary Jones 1"
該⽂文件的 _all 欄位如下
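The following Python sketch mimics the _all behavior on this example. It assumes a simplified analyzer (lowercase, then split on anything that is not a letter or digit) in place of Elasticsearch's real standard analyzer, and OR semantics across query terms; the function names are hypothetical. It suggests why `q=2014-09-15` matched 12 documents: the query is split into `2014`, `09` and `15`, and the term `2014` alone is enough for a match.

```python
import re


def build_all_field(doc: dict) -> str:
    """Concatenate every field value into one string, like the _all example above."""
    return " ".join(str(value) for value in doc.values())


def analyze(text: str) -> list[str]:
    """Simplified analyzer (assumption): lowercase, then keep runs of letters/digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches_all_field(doc: dict, query: str) -> bool:
    """True if ANY analyzed query term appears in the analyzed _all field (OR semantics)."""
    all_terms = set(analyze(build_all_field(doc)))
    return any(term in all_terms for term in analyze(query))


doc = {
    "tweet": "However did I manage before Elasticsearch?",
    "date": "2014-09-14",
    "name": "Mary Jones",
    "user_id": 1,
}

print(build_all_field(doc))
# However did I manage before Elasticsearch? 2014-09-14 Mary Jones 1
print(analyze("2014-09-15"))
# ['2014', '09', '15']
print(matches_all_field(doc, "2014-09-15"))
# True: the term "2014" alone is enough to match
```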
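33. Exact Value vs. Full Text
• Elasticsearch divides values into two categories: exact values and full text.
• Searching an exact-value field is a Boolean test: the value either matches or it does not. For example, Foo != foo.
• Searching a full-text field instead calculates how relevant each document is. For example, UK is related to United Kingdom, and jumping is related to leap.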
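A toy Python contrast between the two kinds of search. Exact values are compared as whole, unanalyzed strings; full text is analyzed first and scored by the fraction of query terms found. The synonym map and the scoring formula are hypothetical stand-ins, not Lucene's relevance scoring. Treating the date field as an exact value also hints at why `date:2014` returned nothing: the single term `2014-09-15` is not equal to `2014`.

```python
import re

# Hypothetical synonym map for illustration; real setups configure synonyms in an analyzer.
SYNONYMS = {"uk": ["united", "kingdom"], "jumping": ["jump"], "leap": ["jump"]}


def exact_match(field_value: str, query: str) -> bool:
    """Exact value: a yes/no test on the whole, unanalyzed value."""
    return field_value == query


def analyze(text: str) -> list[str]:
    """Full text: lowercase, split into words, then expand the toy synonyms."""
    terms: list[str] = []
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        terms.extend(SYNONYMS.get(token, [token]))
    return terms


def relevance(document: str, query: str) -> float:
    """Toy score: fraction of analyzed query terms present in the analyzed document."""
    doc_terms = set(analyze(document))
    query_terms = analyze(query)
    if not query_terms:
        return 0.0
    return sum(term in doc_terms for term in query_terms) / len(query_terms)


print(exact_match("Foo", "foo"))                          # False
print(exact_match("2014-09-15", "2014"))                  # False
print(relevance("Flights to the United Kingdom", "UK"))   # 1.0
print(relevance("The fox was jumping", "leap"))           # 1.0
```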
34. Inverted Index
• Elasticsearch uses an inverted index to support full-text search. Consider the following two documents:
• The quick brown fox jumped over the lazy dog
• Quick brown foxes leap over lazy dogs in summer
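A minimal Python sketch of building such an index, assuming documents are split on whitespace with case preserved (no analysis yet), which is how the first table on the next slide treats them. Sorting the terms in plain code-point order puts the capitalized `Quick` and `The` first, matching that table's order.

```python
from collections import defaultdict

docs = {
    "Doc_1": "The quick brown fox jumped over the lazy dog",
    "Doc_2": "Quick brown foxes leap over lazy dogs in summer",
}


def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of document IDs containing it (whitespace split, case kept)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():
            index[term].add(doc_id)
    return dict(index)


index = build_inverted_index(docs)
for term in sorted(index):
    print(f"{term:<8}", sorted(index[term]))
```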
35. Inverted Index
• The resulting inverted index looks roughly like the first table below.
• The result of searching for "quick brown" is shown in the second table.
Term      Doc_1  Doc_2
-------------------------
Quick   |       |  X
The     |   X   |
brown   |   X   |  X
dog     |   X   |
dogs    |       |  X
fox     |   X   |
foxes   |       |  X
in      |       |  X
jumped  |   X   |
lazy    |   X   |  X
leap    |       |  X
over    |   X   |  X
quick   |   X   |
summer  |       |  X
the     |   X   |
------------------------
Term      Doc_1  Doc_2
-------------------------
brown   |   X   |  X
quick   |   X   |
------------------------
Total   |   2   |  1
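The "quick brown" lookup can be sketched in Python by counting, for each document, how many query terms it contains. This uses the same whitespace-split, case-sensitive index as the first table, so `Quick` in Doc_2 is not matched; the simple count is only an illustration, not how Elasticsearch scores results.

```python
from collections import Counter, defaultdict

docs = {
    "Doc_1": "The quick brown fox jumped over the lazy dog",
    "Doc_2": "Quick brown foxes leap over lazy dogs in summer",
}


def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of document IDs containing it (whitespace split, case kept)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():
            index[term].add(doc_id)
    return dict(index)


def count_matches(index: dict[str, set[str]], query: str) -> dict[str, int]:
    """For each document, count how many of the query's terms appear in it."""
    counts = Counter()
    for term in query.split():
        for doc_id in index.get(term, set()):
            counts[doc_id] += 1
    return dict(counts)


index = build_inverted_index(docs)
print(sorted(count_matches(index, "quick brown").items()))
# [('Doc_1', 2), ('Doc_2', 1)]
```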
36. Inverted Index
• This index can be optimized further, for example:
• Quick can become quick
• foxes and dogs can become fox and dog
• jumped and leap can become jump
• This process of tokenization and normalization is called analysis.
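A minimal Python sketch of such an analysis step. The stem and synonym maps are hypothetical, hand-written to reproduce exactly the three examples above; real analyzers use a stemmer and a configured synonym list instead.

```python
# Hypothetical normalization rules covering only the examples above.
STEMS = {"foxes": "fox", "dogs": "dog", "jumped": "jump"}
SYNONYMS = {"leap": "jump"}


def analyze(text: str) -> list[str]:
    """Tokenize on whitespace, lowercase, then apply the toy stem and synonym maps."""
    terms = []
    for token in text.split():            # tokenization
        term = token.lower()              # normalization: Quick -> quick
        term = STEMS.get(term, term)      # stemming: foxes -> fox, jumped -> jump
        term = SYNONYMS.get(term, term)   # synonyms: leap -> jump
        terms.append(term)
    return terms


print(analyze("Quick brown foxes leap over lazy dogs in summer"))
# ['quick', 'brown', 'fox', 'jump', 'over', 'lazy', 'dog', 'in', 'summer']
```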
37. Inverted Index
• The optimized result:
Term      Doc_1  Doc_2
-------------------------
brown   |   X   |  X
dog     |   X   |  X
fox     |   X   |  X
in      |       |  X
jump    |   X   |  X
lazy    |   X   |  X
over    |   X   |  X
quick   |   X   |  X
summer  |       |  X
the     |   X   |
------------------------
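Putting the two steps together, this Python sketch runs both documents through the same toy analyzer (hypothetical hand-written stem and synonym maps) before indexing, and analyzes the query the same way. With both sides normalized, a search for "Quick brown" finds both terms in both documents, and `the` is indexed only for Doc_1, since Doc_2 never contains it.

```python
from collections import Counter, defaultdict

# Hypothetical normalization rules, hand-written for these two documents only.
STEMS = {"foxes": "fox", "dogs": "dog", "jumped": "jump"}
SYNONYMS = {"leap": "jump"}


def analyze(text: str) -> list[str]:
    """Whitespace tokenization, lowercasing, then the toy stem and synonym maps."""
    terms = []
    for token in text.split():
        term = STEMS.get(token.lower(), token.lower())
        terms.append(SYNONYMS.get(term, term))
    return terms


def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Index the analyzed terms of every document."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return dict(index)


def count_matches(index: dict[str, set[str]], query: str) -> dict[str, int]:
    """Analyze the query the same way, then count matching terms per document."""
    counts = Counter()
    for term in analyze(query):
        for doc_id in index.get(term, set()):
            counts[doc_id] += 1
    return dict(counts)


docs = {
    "Doc_1": "The quick brown fox jumped over the lazy dog",
    "Doc_2": "Quick brown foxes leap over lazy dogs in summer",
}
index = build_inverted_index(docs)
print(sorted(index))
# ['brown', 'dog', 'fox', 'in', 'jump', 'lazy', 'over', 'quick', 'summer', 'the']
print(sorted(count_matches(index, "Quick brown").items()))
# [('Doc_1', 2), ('Doc_2', 2)]
```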