Elasticsearch presentation
1
Indexing a restaurant directory
● Title
● Description
● Price
● Address
● Type
2
Creating an index without a mapping
PUT restaurant
{
"settings": {
"index": {
"number_of_shards": 3,
"number_of_replicas": 2
}
}
}
3
Indexing without a mapping
PUT restaurant/restaurant/1
{
"title": 42,
"description": "Un restaurant gastronomique où tout plat coûte 42 euros",
"price": 42,
"adresse": "10 rue de l'industrie, 31000 TOULOUSE",
"type": "gastronomie"
}
4
Risk of indexing without a mapping
PUT restaurant/restaurant/2
{
"title": "Pizza de l'ormeau",
"description": "Dans cette pizzeria on trouve des pizzas très bonnes et très variés",
"price": 10,
"adresse": "1 place de l'ormeau, 31400 TOULOUSE",
"type": "italien"
}
{
"error": {
"root_cause": [
{
"type": "mapper_parsing_exception",
"reason": "failed to parse [title]"
}
],
"type": "mapper_parsing_exception",
"reason": "failed to parse [title]",
"caused_by": {
"type": "number_format_exception",
"reason": "For input string: \"Pizza de l'ormeau\""
}
},
"status": 400
}
5
Inferred mapping
GET /restaurant/_mapping
{
"restaurant": {
"mappings": {
"restaurant": {
"properties": {
"adresse": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"description": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"price": {
"type": "long"
},
"title": {
"type": "long"
},
"type": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
6
Creating a mapping
PUT :url/restaurant
{
"settings": {
"index": {"number_of_shards": 3, "number_of_replicas": 2}
},
"mappings": {
"restaurant": {
"properties": {
"title": {"type": "text"},
"description": {"type": "text"},
"price": {"type": "integer"},
"adresse": {"type": "text"},
"type": { "type": "keyword"}
}
}
}
}
7
Indexing a few restaurants
POST :url/restaurant/restaurant/_bulk
{"index": {"_id": 1}}
{"title": 42, "description": "Un restaurant gastronomique où tout plat coûte 42 euros", "price": 42, "adresse": "10 rue de l'industrie, 31000 TOULOUSE", "type": "gastronomie"}
{"index": {"_id": 2}}
{"title": "Pizza de l'ormeau", "description": "Dans cette pizzeria on trouve des pizzas très bonnes et très variés", "price": 10, "adresse": "1 place de l'ormeau, 31400 TOULOUSE", "type": "italien"}
{"index": {"_id": 3}}
{"title": "Chez l'oncle chan", "description": "Restaurant asiatique très copieux", "price": 14, "adresse": "13 route de labège, 31400 TOULOUSE", "type": "asiatique"}
8
Basic search
GET :url/restaurant/_search
{
"query": {
"match": {
"description": "asiatique"
}
}
}
{
"hits": {
"total": 1,
"max_score": 0.6395861,
"hits": [
{
"_source": {
"title": "Chez l'oncle chan",
"description": "Restaurant asiatique très copieux pour un prix contenu",
"price": 14,
"adresse": "13 route de labège, 31400 TOULOUSE",
"type": "asiatique"
}
}
]
}
}
9
Where our mapping falls short
GET :url/restaurant/_search
{
"query": {
"match": {
"description": "asiatiques"
}
}
}
{
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
10
What is an analyzer?
● It turns a string into tokens
○ Ex: "Le chat est rouge" -> ["le", "chat", "est", "rouge"]
● The tokens are used to build an inverted index
11
What is an inverted index?
12
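As a rough illustration of the idea, an inverted index maps each token to the documents that contain it. The sketch below is a simplified stand-in, not how Elasticsearch or Lucene store their index: the `tokenize` helper (lowercase, split on whitespace) and the two sample sentences are assumptions made for the example.

```python
from collections import defaultdict


def tokenize(text):
    """Very rough stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def build_inverted_index(docs):
    """Map each token to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return {token: sorted(ids) for token, ids in index.items()}


# Hypothetical sample documents
docs = {
    1: "Le chat est rouge",
    2: "Le chien est noir",
}
inverted = build_inverted_index(docs)
print(inverted["le"])    # ids of the documents containing "le"
print(inverted["chat"])  # ids of the documents containing "chat"
```

Looking up a search term then amounts to reading one entry of the dictionary instead of scanning every document.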
Explanation: the default analyzer
GET /_analyze
{
"analyzer": "standard",
"text": "Un restaurant asiatique très copieux"
}
{
"tokens": [{
"token": "un",
"start_offset": 0, "end_offset": 2,
"type": "<ALPHANUM>", "position": 0
},{
"token": "restaurant",
"start_offset": 3, "end_offset": 13,
"type": "<ALPHANUM>", "position": 1
},{
"token": "asiatique",
"start_offset": 14, "end_offset": 23,
"type": "<ALPHANUM>", "position": 2
},{
"token": "très",
"start_offset": 24, "end_offset": 28,
"type": "<ALPHANUM>", "position": 3
},{
"token": "copieux",
"start_offset": 29, "end_offset": 36,
"type": "<ALPHANUM>", "position": 4
}
]
}
13
Explanation: the "french" analyzer
GET /_analyze
{
"analyzer": "french",
"text": "Un restaurant asiatique très copieux"
}
{
"tokens": [
{
"token": "restaurant",
"start_offset": 3, "end_offset": 13,
"type": "<ALPHANUM>", "position": 1
},{
"token": "asiat",
"start_offset": 14, "end_offset": 23,
"type": "<ALPHANUM>", "position": 2
},{
"token": "trè",
"start_offset": 24, "end_offset": 28,
"type": "<ALPHANUM>", "position": 3
},{
"token": "copieu",
"start_offset": 29, "end_offset": 36,
"type": "<ALPHANUM>", "position": 4
}
]
}
14
Anatomy of an analyzer
Elasticsearch breaks analysis down into three steps:
● Character filtering (e.g. stripping HTML tags)
● Tokenization (splitting the text into tokens)
● Token filtering:
○ Removing tokens (stop words such as "un", "le", "la")
○ Transforming tokens (stemming, lemmatization...)
○ Adding tokens (synonyms)
15
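The three steps can be sketched in a few lines of Python. This is only a toy model of the pipeline, under stated assumptions: the regular expressions, the stop word subset and the `SYNONYMS` table are made up for the example, and the `\w+` tokenizer does not reproduce the behaviour of the Elasticsearch standard tokenizer (it splits on apostrophes, for instance).

```python
import re

STOPWORDS = {"un", "le", "la"}        # illustrative subset, as on the slide
SYNONYMS = {"resto": ["restaurant"]}  # hypothetical synonym table


def char_filter(text):
    """Step 1, character filtering: replace anything that looks like an HTML tag."""
    return re.sub(r"<[^>]+>", " ", text)


def tokenize(text):
    """Step 2, tokenization: keep runs of word characters."""
    return re.findall(r"\w+", text)


def token_filters(tokens):
    """Step 3, token filtering: lowercase, drop stop words, add synonyms."""
    out = []
    for token in tokens:
        token = token.lower()
        if token in STOPWORDS:
            continue
        out.append(token)
        out.extend(SYNONYMS.get(token, []))
    return out


def analyze(text):
    return token_filters(tokenize(char_filter(text)))


print(analyze("<b>Un</b> resto asiatique"))
```

Each stage only sees the output of the previous one, which is why the order of the token filters matters.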
Anatomy of the french analyzer
GET /_analyze
{
"tokenizer": "standard",
"filter": [
{
"type": "elision",
"articles_case": true,
"articles": [
"l", "m", "t", "qu", "n", "s", "j", "d", "c",
"jusqu", "quoiqu", "lorsqu", "puisqu"
]
}, {
"type": "stop", "stopwords": "_french_"
}, {
"type": "stemmer", "language": "french"
}
],
"text": "ce n'est qu'un restaurant asiatique très copieux"
}
"ce n'est qu'un restaurant asiatique très copieux"
standard tokenizer -> ["ce", "n'est", "qu'un", "restaurant", "asiatique", "très", "copieux"]
elision -> ["ce", "est", "un", "restaurant", "asiatique", "très", "copieux"]
stopwords -> ["restaurant", "asiatique", "très", "copieux"]
french stemming -> ["restaurant", "asiat", "trè", "copieu"]
16
Specifying the analyzer in the mapping
{
"settings": {
"index": {
"number_of_shards": 3,
"number_of_replicas": 2
}
},
"mappings": {
"restaurant": {
"properties": {
"title": {"type": "text", "analyzer": "french"},
"description": {"type": "text", "analyzer": "french"},
"price": {"type": "integer"},
"adresse": {"type": "text", "analyzer": "french"},
"type": { "type": "keyword"}
}
}
}
}
17
Typo-tolerant search
GET /restaurant/restaurant/_search
{
"query": {
"match": {
"description": "asiatuques"
}
}
}
{
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
18
One solution: the ngram token filter
GET /_analyze
{
"tokenizer": "standard",
"filter": [
{
"type": "ngram",
"min_gram": 3,
"max_gram": 7
}
],
"text": "asiatuque"
}
[
"asi",
"asia",
"asiat",
"asiatu",
"asiatuq",
"sia",
"siat",
"siatu",
"siatuq",
"siatuqu",
"iat",
"iatu",
"iatuq",
"iatuqu",
"iatuque",
"atu",
"atuq",
"atuqu",
"atuque",
"tuq",
"tuqu",
"tuque",
"uqu",
"uque",
"que"
]
19
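To see why ngrams help with typos, the token list above can be reproduced with a small Python function, and the ngrams of the misspelled word can be compared with those of the correct one. This is a sketch of the idea only; the function names are made up for the example and the real filter is implemented in Lucene.

```python
def ngrams(token, min_gram=3, max_gram=7):
    """All substrings of length min_gram..max_gram, grouped by start position."""
    grams = []
    for start in range(len(token)):
        for size in range(min_gram, max_gram + 1):
            if start + size > len(token):
                break
            grams.append(token[start:start + size])
    return grams


def shared_ngrams(a, b):
    """Ngrams that two words have in common: these are what lets a typo still match."""
    return sorted(set(ngrams(a)) & set(ngrams(b)))


print(len(ngrams("asiatuque")))
print(shared_ngrams("asiatuque", "asiatique"))
```

The misspelled and the correct word still share several ngrams, so a match query on an ngram-analyzed field finds the document despite the typo.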
Creating a custom analyzer that uses the ngram filter
PUT /restaurant
{
"settings": {
"analysis": {
"filter": {"custom_ngram": {"type": "ngram", "min_gram": 3, "max_gram": 7}},
"analyzer": {"ngram_analyzer": {"tokenizer": "standard", "filter": ["asciifolding", "custom_ngram"]}}
}
},
"mappings": {
"restaurant": {
"properties": {
"title": {"type": "text", "analyzer": "ngram_analyzer"},
"description": {"type": "text", "analyzer": "ngram_analyzer"},
"price": {"type": "integer"},
"adresse": {"type": "text", "analyzer": "ngram_analyzer"},
"type": {"type": "keyword"}
}
}
}
}
20
GET /restaurant/restaurant/_search
{
"query": {
"match": {
"description": "asiatuques"
}
}
}
{
"hits": {
"hits": [
{
"_score": 0.60128295,
"_source": {
"title": "Chez l'oncle chan",
"description": "Restaurant asiatique très copieux pour un prix contenu",
"price": 14,
"adresse": "13 route de labège, 31400 TOULOUSE",
"type": "asiatique"
}
}, {
"_score": 0.46237043,
"_source": {
"title": 42,
"description": "Un restaurant gastronomique où tout plat coûte 42 euros",
"price": 42,
"adresse": "10 rue de l'industrie, 31000 TOULOUSE",
"type": "gastronomie"
21
Noise introduced by ngrams
GET /restaurant/restaurant/_search
{
"query": {
"match": {
"description": "gastronomique"
}
}
}
{
"hits": {
"hits": [
{
"_score": 0.6277555,
"_source": {
"title": 42,
"description": "Un restaurant gastronomique où tout plat coûte 42 euros",
"price": 42,
"adresse": "10 rue de l'industrie, 31000 TOULOUSE",
"type": "gastronomie"
}
},{
"_score": 0.56373334,
"_source": {
"title": "Chez l'oncle chan",
"description": "Restaurant asiatique très copieux pour un prix contenu",
"price": 14,
"adresse": "13 route de labège, 31400 TOULOUSE",
"type": "asiatique"
}
},
22
Specifying several analyzers for one field
PUT /restaurant
{
"settings": {
"analysis": {
"filter": {"custom_ngram": {"type": "ngram", "min_gram": 3, "max_gram": 7}},
"analyzer": {"ngram_analyzer": {"tokenizer": "standard", "filter": ["asciifolding", "custom_ngram"]}
}
}
},
"mappings": {
"restaurant": {
"properties": {
"title": {"type": "text", "analyzer": "french"},
"description": {
"type": "text", "analyzer": "french",
"fields": {
"ngram": { "type": "text", "analyzer": "ngram_analyzer"}
}
},
"price": {"type": "integer"},
23
Using several fields in one search
GET /restaurant/restaurant/_search
{
"query": {
"multi_match": {
"query": "gastronomique",
"fields": [
"description^4",
"description.ngram"
]
}
}
}
{
"hits": {
"hits": [
{
"_score": 2.0649285,
"_source": {
"title": 42,
"description": "Un restaurant gastronomique où tout plat coûte 42 euros",
"price": 42,
"adresse": "10 rue de l'industrie, 31000 TOULOUSE",
"type": "gastronomie"
}
},
{
"_score": 0.56373334,
"_source": {
"title": "Chez l'oncle chan",
"description": "Restaurant asiatique très copieux pour un prix contenu",
"price": 14,
"adresse": "13 route de labège, 31400 TOULOUSE",
"type": "asiatique"
}
},
{
"_index": "restaurant",
24
To ignore or not to ignore stop words, that is the question
POST :url/restaurant/restaurant/_bulk
{"index": {"_id": 1}}
{"title": 42, "description": "Un restaurant gastronomique donc cher ou tout plat coûte cher (42 euros)", "price": 42, "adresse": "10 rue de l'industrie, 31000 TOULOUSE", "type": "gastronomie"}
{"index": {"_id": 2}}
{"title": "Pizza de l'ormeau", "description": "Dans cette pizzeria on trouve des pizzas très bonnes et très variés", "price": 10, "adresse": "1 place de l'ormeau, 31400 TOULOUSE", "type": "italien"}
{"index": {"_id": 3}}
{"title": "Chez l'oncle chan", "description": "Restaurant asiatique très copieux et pas cher", "price": 14, "adresse": "13 route de labège, 31400 TOULOUSE", "type": "asiatique"}
25
Stop words are not necessarily meaningless
GET /restaurant/restaurant/_search
{
"query": {
"match_phrase": {
"description": "pas cher"
}
}
}
{
"hits": {
"hits": [
{
"_source": {
"title": 42,
"description": "Un restaurant gastronomique donc cher ou tout plat coûte cher (42 euros)",
"price": 42,
"adresse": "10 rue de l'industrie, 31000 TOULOUSE",
"type": "gastronomie"
}
},{
"_source": {
"title": "Chez l'oncle chan",
"description": "Restaurant asiatique très copieux et pas cher",
"price": 14,
"adresse": "13 route de labège, 31400 TOULOUSE",
"type": "asiatique"
}
}
26
Modifying the french analyzer to keep stop words
PUT /restaurant
{
"settings": {
"analysis": {
"filter": {
"french_elision": {
"type": "elision",
"articles_case": true,
"articles": ["l", "m", "t", "qu", "n", "s", "j", "d", "c", "jusqu", "quoiqu", "lorsqu", "puisqu"]
},
"french_stemmer": {"type": "stemmer", "language": "light_french"}
},
"analyzer": {
"custom_french": {
"tokenizer": "standard",
"filter": [
"french_elision",
"lowercase",
"french_stemmer"
]
}
27
GET /restaurant/restaurant/_search
{
"query": {
"match_phrase": {
"description": "pas cher"
}
}
}
{
"hits": {
"hits": [
{
"_source": {
"title": "Chez l'oncle chan",
"description": "Restaurant asiatique très copieux et pas cher",
"price": 14,
"adresse": "13 route de labège, 31400 TOULOUSE",
"type": "asiatique"
}
}
]
}
}
28
Searching with stop words without hurting performance
GET /restaurant/restaurant/_search
{
"query": {
"match": {
"description": {
"query": "restaurant pas cher",
"cutoff_frequency": 0.01
}
}
}
}
GET /restaurant/restaurant/_search
{
"query": {
"bool": {
"must": {
"bool": {
"should": [
{"term": {"description": "restaurant"}},
{"term": {"description": "cher"}}]
}
},
"should": [
{"match": {
"description": "pas"
}}
]
}
29
Customizing scoring
GET /restaurant/restaurant/_search
{
"query": {
"function_score": {
"query": {
"match": {
"adresse": "toulouse"
}
},
"functions": [{
"filter": { "terms": { "type": ["asiatique", "italien"]}},
"weight": 2
}]
}
}
}
30
Customizing scoring
GET /restaurant/restaurant/_search
{
"query": {
"function_score": {
"query": {
"match": {
"adresse": "toulouse"
}
},
"script_score": {
"script": {
"lang": "painless",
"inline": "_score * ( 1 + 10/doc['price'].value)"
}
}
}
}
}
{
"hits": {
"hits": [
{
"_score": 0.53484553,
"_source": {
"title": "Pizza de l'ormeau",
"price": 10,
"adresse": "1 place de l'ormeau, 31400 TOULOUSE",
"type": "italien"
}
}, {
"_score": 0.26742277,
"_source": {
"title": 42,
"price": 42,
"adresse": "10 rue de l'industrie, 31000 TOULOUSE",
"type": "gastronomie"
}
}, {
"_score": 0.26742277,
"_source": {
"title": "Chez l'oncle chan",
"price": 14,
"adresse": "13 route de labège, 31400 TOULOUSE",
"type": "asiatique"
}
}
]
}
}
31
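The scores returned by the script are worth a second look: 0.53484553 is twice 0.26742277, and the restaurant at 14 euros scores no better than the one at 42 euros. This is consistent with the division `10/price` being carried out on integers, since Painless follows Java arithmetic for whole-number operands. The Python sketch below reproduces the effect; it assumes the three documents share the same base score (which the output suggests), and the `10.0` variant is a possible adjustment that is not taken from the slides.

```python
def boost_integer(score, price):
    """Mirrors `_score * (1 + 10/price)` with integer division: 10 // price truncates."""
    return score * (1 + 10 // price)


def boost_float(score, price):
    """Same formula with a floating-point division, e.g. writing 10.0 in the script."""
    return score * (1 + 10.0 / price)


base_score = 0.26742277  # assumed common base score for the match on "toulouse"
for title, price in [("Pizza de l'ormeau", 10), ("42", 42), ("Chez l'oncle chan", 14)]:
    print(title, boost_integer(base_score, price), round(boost_float(base_score, price), 4))
```

With the integer version, only prices of 10 or less receive any boost; the floating-point version ranks the three restaurants by price as intended.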
How to index multilingual documents
Three cases:
● A single field mixing several languages (ex: {"message": "warning | attention | cuidado"})
○ Ngram
○ Analyze the same field several times, with one analyzer per language
● One field per language:
○ Easy, since a different analyzer can be specified for each language
○ Take care not to end up with a sparse index
● One version of the document per language (preferred)
○ One index per language
○ Above all, do not use one type per language within the same index (it skews the term statistics)
32
Handling synonyms
PUT /restaurant
{
"settings": {
"analysis": {
"filter": {
"french_elision": {
"type": "elision", "articles_case": true,
"articles": ["l", "m", "t", "qu", "n", "s", "j", "d", "c", "jusqu", "quoiqu", "lorsqu", "puisqu"]
},
"french_stemmer": {"type": "stemmer", "language": "light_french"},
"french_synonym": {"type": "synonym", "synonyms": ["sou marin => sandwitch", "formul, menu"]}
},
"analyzer": {
"french_with_synonym": {
"tokenizer": "standard",
"filter": ["french_elision", "lowercase", "french_stemmer", "french_synonym"]
}
}
}
},
"mappings": {
"restaurant": {
"properties": {
"title": {"type": "text", "analyzer": "french"},
"description": { "type": "text", "analyzer": "french", "search_analyzer": "french_with_synonym"},
"price": {"type": "integer"},
"adresse": {"type": "text", "analyzer": "french"},
"coord": {"type": "geo_point"},
33
Handling synonyms
GET /restaurant/restaurant/_search
{
"query": {
"match": {"description": "sous-marins"}
}
}
{
"hits": {
"hits": [
{
"_source": {
"title:": "Subway",
"description": "service très rapide, rapport qualité/prix médiocre mais on peut choisir la composition de son sandwitch",
"price": 8,
"adresse": "211 route de narbonne, 31520 RAMONVILLE",
"type": "fastfood",
"coord": "43.5577519,1.4625753"
}
}
]
}
}
34
Geolocated data
PUT /restaurant
{
"mappings": {
"restaurant": {
"properties": {
"title": {"type": "text", "analyzer": "french"},
"description": {"type": "text", "analyzer": "french"
},
"price": {"type": "integer"},
"adresse": {"type": "text","analyzer": "french"},
"coord": {"type": "geo_point"},
"type": { "type": "keyword"}
}
}
}
}
35
Geolocated data
POST restaurant/restaurant/_bulk
{"index": {"_id": 1}}
{"title": "bistronomique", "description": "Un restaurant bon mais un petit peu cher, les desserts sont excellents", "price": 17, "adresse": "73 route de revel, 31400 TOULOUSE", "type": "français", "coord": "43.57417,1.4905748"}
{"index": {"_id": 2}}
{"title": "Pizza de l'ormeau", "description": "Dans cette pizzeria on trouve des pizzas très bonnes et très variés", "price": 10, "adresse": "1 place de l'ormeau, 31400 TOULOUSE", "type": "italien", "coord": "43.579225,1.4835248"}
{"index": {"_id": 3}}
{"title": "Chez l'oncle chan", "description": "Restaurant asiatique très copieux pour un prix contenu", "price": 14, "adresse": "18 rue des cosmonautetes, 31400 TOULOUSE", "type": "asiatique", "coord": "43.5612759,1.4936073"}
{"index": {"_id": 4}}
{"title:": "Un fastfood très connu", "description": "service très rapide, rapport qualité/prix médiocre", "price": 8, "adresse": "210 route de narbonne, 31520 RAMONVILLE", "type": "fastfood", "coord": "43.5536343,1.476165"}
{"index": {"_id": 5}}
{"title:": "Subway", "description": "service très rapide, rapport qualité/prix médiocre mais on peut choisir la composition de son sandwitch", "price": 8, "adresse": "211 route de narbonne, 31520 RAMONVILLE", "type": "fastfood", "coord": "43.5577519,1.4625753"}
{"index": {"_id": 6}}
{"title:": "L'évidence", "description": "restaurant copieux et pas cher, cependant c'est pas bon", "price": 12, "adresse": "38 route de revel, 31400 TOULOUSE", "type": "français", "coord": "43.5770109,1.4846573"}
36
Filtering and sorting on geolocated data
GET /restaurant/restaurant/_search
{
"query": {
"bool": {
"filter": [
{"term": {"type":"français"}},
{"geo_distance": {
"distance": "1km",
"coord": {"lat": 43.5739329, "lon": 1.4893669}
}}
]
}
},
"sort": [{
"geo_distance": {
"coord": {"lat": 43.5739329, "lon": 1.4893669},
"unit": "km"
}
}]
}
{
"hits": {
"hits": [
{
"_source": {
"title": "bistronomique",
"description": "Un restaurant bon mais un petit peu cher, les desserts sont
"price": 17,
"adresse": "73 route de revel, 31400 TOULOUSE",
"type": "français",
"coord": "43.57417,1.4905748"
},
"sort": [0.10081529266640063]
},{
"_source": {
"title:": "L'évidence",
"description": "restaurant copieux et pas cher, cependant c'est pas bon",
"price": 12,
"adresse": "38 route de revel, 31400 TOULOUSE",
"type": "français",
"coord": "43.5770109,1.4846573"
},
"sort": [0.510960087579506]
},{
"_source": {
"title:": "Chez Ingalls",
"description": "Contemporain et rustique, ce restaurant avec cheminée sert
savoyardes et des grillades",
37
The bool query explained
GET /restaurant/restaurant/_search
{
"query": {
"bool": {
"must": {"match": {"description": "sandwitch"}},
"should" : [
{"match": {"description": "bon"}},
{"match": {"description": "excellent"}}
],
"must_not": [
{"match_phrase": {
"description": "pas bon"
}}
],
"filter": [
{"range": {"price": {
"lte": "20"
}}}
]
}
}
}
38
The bool query explained
GET /restaurant/restaurant/_search
{
"query": {
"bool": {
"should" : [
{"match": {"description": "bon"}},
{"match": {"description": "excellent"}},
{"match": {"description": "service rapide"}}
],
"minimum_should_match": 2
}
}
}
39
Offering advanced search to your users
GET /restaurant/restaurant/_search
{
"query": {
"simple_query_string": {
"fields": ["description", "title^2", "adresse", "type"],
"query": "-\"pas bon\" +(pizzi~2 | sandwitch)"
}
}
}
GET /restaurant/restaurant/_search
{
"query": {
"bool": {
"must_not": {
"multi_match": {
"fields": ["description", "title^2", "adresse", "type"],
"type": "phrase",
"query": "pas bon"
}
},
"should": [
{"multi_match": {
"fields": ["description", "title^2", "adresse", "type"],
"fuzziness": 2,
"max_expansions": 50,
"query": "pizzi"
}
},
{"multi_match": {
"fields": ["description", "title^2", "adresse", "type"],
"query": "sandwitch"
}
40
Aliases: how to give yourself room to manoeuvre
PUT /restaurant_v1/
{
"mappings": {
"restaurant": {
"properties": {
"title": {"type": "text"},
"lat": {"type": "double"},
"lon": {"type": "double"}
}
}
}
}
POST /_aliases
{
"actions": [
{"add": {"index": "restaurant_v1", "alias": "restaurant_search"}},
{"add": {"index": "restaurant_v1", "alias": "restaurant_write"}}
]
}
41
Aliases, pipelines and reindexing
PUT /restaurant_v2
{
"mappings": {
"restaurant": {
"properties": {
"title": {"type": "text", "analyzer": "french"},
"position": {"type": "geo_point"}
}
}
}
}
PUT /_ingest/pipeline/fixing_position
{
"description": "move lat lon into position parameter",
"processors": [
{"rename": {"field": "lat", "target_field": "position.lat"}},
{"rename": {"field": "lon", "target_field": "position.lon"}}
]
}
POST /_aliases
{
"actions": [
{"remove": {"index": "restaurant_v1", "alias": "restaurant_search"}},
{"remove": {"index": "restaurant_v1", "alias": "restaurant_write"}},
{"add": {"index": "restaurant_v2", "alias": "restaurant_search"}},
{"add": {"index": "restaurant_v2", "alias": "restaurant_write"}}
]
}
POST /_reindex
{
"source": {"index": "restaurant_v1"},
"dest": {"index": "restaurant_v2", "pipeline": "fixing_position"}
}
42
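What the `fixing_position` pipeline does to each document during the reindex can be sketched in Python. This is an illustrative model of the two rename processors only, not Elasticsearch code; the sample document is hypothetical.

```python
def fixing_position(doc):
    """Mimic the two rename processors: lat and lon move under position."""
    out = dict(doc)
    position = dict(out.get("position", {}))
    for field in ("lat", "lon"):
        # Like the rename processor with default settings, this fails
        # (here with a KeyError) when the source field is missing.
        position[field] = out.pop(field)
    out["position"] = position
    return out


# Hypothetical document as stored in restaurant_v1
v1_doc = {"title": "Chez l'oncle chan", "lat": 43.5612759, "lon": 1.4936073}
v2_doc = fixing_position(v1_doc)
print(v2_doc)
```

The resulting `position` object with `lat` and `lon` keys is one of the formats accepted by a `geo_point` field, which is what the `restaurant_v2` mapping declares.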
Analysis of fire department intervention data from 2005 to 2014
PUT /pompier
{
"mappings": {
"interventions": {
"properties": {
"date": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss"},
"type_incident": { "type": "keyword" },
"description_groupe": { "type": "keyword" },
"caserne": { "type": "integer"},
"ville": { "type": "keyword"},
"arrondissement": { "type": "keyword"},
"division": {"type": "integer"},
"position": {"type": "geo_point"},
"nombre_unites": {"type": "integer"}
}
}
}
}
43
Listing the different incident types
GET /pompier/interventions/_search
{
"size": 0,
"aggs": {
"type_incident": {
"terms": {"field": "type_incident", "size": 100}
}
}
}
{
"aggregations": {
"type_incident": {
"buckets": [
{"key": "Premier répondant", "doc_count": 437891},
{"key": "Appel de Cie de détection", "doc_count": 76157},
{"key": "Alarme privé ou locale", "doc_count": 60879},
{"key": "Ac.véh./1R/s.v./ext/29B/D", "doc_count": 41734},
{"key": "10-22 sans feu", "doc_count": 29283},
{"key": "Acc. sans victime sfeu - ext.", "doc_count": 27663},
{"key": "Inondation", "doc_count": 26801},
{"key": "Problèmes électriques", "doc_count": 23495},
{"key": "Aliments surchauffés", "doc_count": 23428},
{"key": "Odeur suspecte - gaz", "doc_count": 21158},
{"key": "Déchets en feu", "doc_count": 18007},
{"key": "Ascenseur", "doc_count": 12703},
{"key": "Feu de champ *", "doc_count": 11518},
{"key": "Structure dangereuse", "doc_count": 9958},
{"key": "10-22 avec feu", "doc_count": 9876},
{"key": "Alarme vérification", "doc_count": 8328},
{"key": "Aide à un citoyen", "doc_count": 7722},
{"key": "Fuite ext.:hydrocar. liq. div.", "doc_count": 7351},
{"key": "Ac.véh./1R/s.v./V.R./29B/D", "doc_count": 6232},
{"key": "Feu de véhicule extérieur", "doc_count": 5943},
{"key": "Fausse alerte 10-19", "doc_count": 4680},
{"key": "Acc. sans victime sfeu - v.r", "doc_count": 3494},
{"key": "Assistance serv. muni.", "doc_count": 3431},
{"key": "Avertisseur de CO", "doc_count": 2542},
{"key": "Fuite gaz naturel 10-22", "doc_count": 1928},
{"key": "Matières dangereuses / 10-22", "doc_count": 1905},
{"key": "Feu de bâtiment", "doc_count": 1880},
{"key": "Senteur de feu à l'extérieur", "doc_count": 1566},
{"key": "Surchauffe - véhicule", "doc_count": 1499},
{"key": "Feu / Agravation possible", "doc_count": 1281},
{"key": "Fuite gaz naturel 10-09", "doc_count": 1257},
{"key": "Acc.véh/1rép/vict/ext 29D04", "doc_count": 1015},
{"key": "Acc. véh victime sfeu - (ext.)", "doc_count": 971},
44
Nested aggregations
GET /pompier/interventions/_search
{
"size": 0,
"aggs": {
"ville": {
"terms": {"field": "ville"},
"aggs": {
"arrondissement": {
"terms": {"field": "arrondissement"}
}
}
}
}
}
{
"aggregations": {"ville": {"buckets": [
{
"key": "Montréal", "doc_count": 768955,
"arrondissement": {"buckets": [
{"key": "Ville-Marie", "doc_count": 83010},
{"key": "Mercier / Hochelaga-Maisonneuve", "doc_count": 67272},
{"key": "Côte-des-Neiges / Notre-Dame-de-Grâce", "doc_count": 65933},
{"key": "Villeray / St-Michel / Parc Extension", "doc_count": 60951},
{"key": "Rosemont / Petite-Patrie", "doc_count": 59213},
{"key": "Ahuntsic / Cartierville", "doc_count": 57721},
{"key": "Plateau Mont-Royal", "doc_count": 53344},
{"key": "Montréal-Nord", "doc_count": 40757},
{"key": "Sud-Ouest", "doc_count": 39936},
{"key": "Rivière-des-Prairies / Pointe-aux-Trembles", "doc_count": 38139}
]}
}, {
"key": "Dollard-des-Ormeaux", "doc_count": 17961,
"arrondissement": {"buckets": [
{"key": "Indéterminé", "doc_count": 13452},
{"key": "Dollard-des-Ormeaux / Roxboro", "doc_count": 4477},
{"key": "Pierrefonds / Senneville", "doc_count": 10},
{"key": "Dorval / Ile Dorval", "doc_count": 8},
{"key": "Pointe-Claire", "doc_count": 8},
{"key": "Ile-Bizard / Ste-Geneviève / Ste-A-de-B", "doc_count": 6}
]}
}, {
"key": "Pointe-Claire", "doc_count": 17925,
"arrondissement": {"buckets": [
{"key": "Indéterminé", "doc_count": 13126},
{"key": "Pointe-Claire", "doc_count": 4766},
{"key": "Dorval / Ile Dorval", "doc_count": 12},
{"key": "Dollard-des-Ormeaux / Roxboro", "doc_count": 7},
{"key": "Kirkland", "doc_count": 7},
{"key": "Beaconsfield / Baie d'Urfé", "doc_count": 5},
{"key": "Ile-Bizard / Ste-Geneviève / Ste-A-de-B", "doc_count": 1},
{"key": "St-Laurent", "doc_count": 1}
45
Computing an average and sorting an aggregation
GET /pompier/interventions/_search
{
"size": 0,
"aggs": {
"avg_nombre_unites_general": {
"avg": {"field": "nombre_unites"}
},
"type_incident": {
"terms": {
"field": "type_incident",
"size": 5,
"order" : {"avg_nombre_unites": "desc"}
},
"aggs": {
"avg_nombre_unites": {
"avg": {"field": "nombre_unites"}
}
}
}
}
}
{
"aggregations": {
"type_incident": {
"buckets": [
{
"key": "Feu / 5e Alerte", "doc_count": 162,
"avg_nombre_unites": {"value": 70.9074074074074}
}, {
"key": "Feu / 4e Alerte", "doc_count": 100,
"avg_nombre_unites": {"value": 49.36}
}, {
"key": "Troisième alerte/autre que BAT", "doc_count": 1,
"avg_nombre_unites": {"value": 43.0}
}, {
"key": "Feu / 3e Alerte", "doc_count": 173,
"avg_nombre_unites": {"value": 41.445086705202314}
}, {
"key": "Deuxième alerte/autre que BAT", "doc_count": 8,
"avg_nombre_unites": {"value": 37.5}
}
]
},
"avg_nombre_unites_general": {"value": 2.1374461758713728}
}
}
46
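The terms aggregation ordered by a sub-aggregation can be pictured as a group-by followed by a sort on the per-group average. The Python sketch below is a minimal in-memory equivalent; the function name and the four sample interventions are made up for the example and are not taken from the real data set.

```python
from collections import defaultdict


def top_types_by_avg_units(interventions, size=5):
    """Group by type_incident, average nombre_unites, keep the `size` highest averages."""
    totals = defaultdict(lambda: [0, 0])  # type -> [sum of units, document count]
    for doc in interventions:
        entry = totals[doc["type_incident"]]
        entry[0] += doc["nombre_unites"]
        entry[1] += 1
    buckets = [
        {"key": key, "doc_count": count, "avg_nombre_unites": total / count}
        for key, (total, count) in totals.items()
    ]
    buckets.sort(key=lambda bucket: bucket["avg_nombre_unites"], reverse=True)
    return buckets[:size]


# Made-up sample data
sample = [
    {"type_incident": "Feu / 5e Alerte", "nombre_unites": 70},
    {"type_incident": "Feu / 5e Alerte", "nombre_unites": 72},
    {"type_incident": "Inondation", "nombre_unites": 1},
    {"type_incident": "Inondation", "nombre_unites": 2},
]
result = top_types_by_avg_units(sample)
print(result)
```

As in the response above, a rare incident type with a high average ranks first regardless of how few documents it contains.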
Percentile
GET /pompier/interventions/_search
{
"size": 0,
"aggs": {
"unites_percentile": {
"percentiles": {
"field": "nombre_unites",
"percents": [25, 50, 75, 100]
}
}
}
}
{
"aggregations": {
"unites_percentile": {
"values": {
"25.0": 1.0,
"50.0": 1.0,
"75.0": 3.0,
"100.0": 275.0
}
}
}
}
47
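To make the reading of these values concrete, here is a small Python sketch using the nearest-rank definition of a percentile. It is an assumption chosen for simplicity: Elasticsearch computes percentiles with an approximation algorithm, so its results on large data sets may differ slightly. The sample values are invented.

```python
import math


def percentile(values, percent):
    """Nearest-rank percentile: smallest value with at least `percent` % of values at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(percent / 100 * len(ordered)))
    return ordered[rank - 1]


# Invented sample: most interventions need a single unit, one needs many
units = [1, 1, 1, 1, 1, 2, 3, 3, 4, 275]
for p in (25, 50, 75, 100):
    print(p, percentile(units, p))
```

The 100th percentile is simply the maximum, which is why a single exceptional intervention dominates the last value while the median stays at 1.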
Histogram
GET /pompier/interventions/_search
{
"size": 0,
"query": {
"term": {"type_incident": "Inondation"}
},
"aggs": {
"unites_histogram": {
"histogram": {
"field": "nombre_unites",
"order": {"_key": "asc"},
"interval": 1
},
"aggs": {
"ville": {
"terms": {"field": "ville", "size": 1}
}
}
}
}
}
{
"aggregations": {
"unites_histogram": {
"buckets": [
{
"key": 1.0, "doc_count": 23507,
"ville": {"buckets": [{"key": "Montréal", "doc_count": 19417}]}
},{
"key": 2.0, "doc_count": 1550,
"ville": {"buckets": [{"key": "Montréal", "doc_count": 1229}]}
},{
"key": 3.0, "doc_count": 563,
"ville": {"buckets": [{"key": "Montréal", "doc_count": 404}]}
},{
"key": 4.0, "doc_count": 449,
"ville": {"buckets": [{"key": "Montréal", "doc_count": 334}]}
},{
"key": 5.0, "doc_count": 310,
"ville": {"buckets": [{"key": "Montréal", "doc_count": 253}]}
},{
"key": 6.0, "doc_count": 215,
"ville": {"buckets": [{"key": "Montréal", "doc_count": 173}]}
},{
"key": 7.0, "doc_count": 136,
"ville": {"buckets": [{"key": "Montréal", "doc_count": 112}]}
},{
"key": 8.0, "doc_count": 35,
"ville": {"buckets": [{"key": "Montréal", "doc_count": 30}]}
},{
"key": 9.0, "doc_count": 10,
"ville": {"buckets": [{"key": "Montréal", "doc_count": 8}]}
},{
"key": 10.0, "doc_count": 11,
"ville": {"buckets": [{"key": "Montréal", "doc_count": 8}]}
},{
"key": 11.0, "doc_count": 2,
"ville": {"buckets": [{"key": "Montréal", "doc_count": 2}]}
48
“Significant term”
GET /pompier/interventions/_search
{
"size": 0,
"query": {
"term": {"type_incident": "Inondation"}
},
"aggs": {
"ville": {
"significant_terms": {"field": "ville", "size": 5, "percentage": {}}
}
}
}
{
"aggregations": {
"ville": {
"doc_count": 26801,
"buckets": [
{
"key": "Ile-Bizard",
"score": 0.10029498525073746,
"doc_count": 68, "bg_count": 678
},
{
"key": "Montréal-Nord",
"score": 0.0826544804291675,
"doc_count": 416, "bg_count": 5033
},
{
"key": "Roxboro",
"score": 0.08181818181818182,
"doc_count": 27, "bg_count": 330
},
{
"key": "Côte St-Luc",
"score": 0.07654825526563974,
"doc_count": 487, "bg_count": 6362
},
{
"key": "Saint-Laurent",
"score": 0.07317073170731707,
"doc_count": 465, "bg_count": 6355
49
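With the `percentage` heuristic, the score shown for each bucket appears to be the share of a city's interventions that are floods, that is `doc_count / bg_count`. The following Python sketch recomputes the scores from the counts in the response above; the function name is made up for the example.

```python
def percentage_score(doc_count, bg_count):
    """Foreground count (floods in the city) divided by background count (all interventions in the city)."""
    return doc_count / bg_count


# (city, doc_count, bg_count) copied from the response
buckets = [
    ("Ile-Bizard", 68, 678),
    ("Montréal-Nord", 416, 5033),
    ("Roxboro", 27, 330),
    ("Côte St-Luc", 487, 6362),
    ("Saint-Laurent", 465, 6355),
]
scores = [(city, percentage_score(fg, bg)) for city, fg, bg in buckets]
for city, score in scores:
    print(city, round(score, 4))
```

A small city such as Ile-Bizard can therefore come first with only 68 floods, because they represent about 10 % of its interventions.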
Aggregations and geolocated data
GET :url/pompier/interventions/_search
{
"size": 0,
"query": {
"regexp": {"type_incident": "Feu.*"}
},
"aggs": {
"distance_from_here": {
"geo_distance": {
"field": "position",
"unit": "km",
"origin": {
"lat": 45.495902,
"lon": -73.554263
},
"ranges": [
{ "to": 2},
{"from":2, "to": 4},
{"from":4, "to": 6},
{"from": 6, "to": 8},
{"from": 8}]
}
}
}
}
{
"aggregations": {
"distance_from_here": {
"buckets": [
{
"key": "*-2.0",
"from": 0.0,
"to": 2.0,
"doc_count": 80
},
{
"key": "2.0-4.0",
"from": 2.0,
"to": 4.0,
"doc_count": 266
},
{
"key": "4.0-6.0",
"from": 4.0,
"to": 6.0,
"doc_count": 320
},
{
"key": "6.0-8.0",
"from": 6.0,
"to": 8.0,
"doc_count": 326
},
{
"key": "8.0-*",
"from": 8.0,
"doc_count": 1720
}
]
}
}
}
50
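The geo_distance aggregation can be pictured as computing the distance from the origin to each document, then dropping it into the matching range. The Python sketch below uses the haversine formula with a spherical Earth of radius 6371 km, which is an assumption: the distance computation used by Elasticsearch may differ slightly. Ranges include their lower bound and exclude their upper bound, as range aggregations do.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Same ranges as the request: None stands for an open bound
RANGES = [(None, 2), (2, 4), (4, 6), (6, 8), (8, None)]


def bucket_key(distance_km):
    """Return the key of the range containing the distance (from inclusive, to exclusive)."""
    for lower, upper in RANGES:
        above = lower is None or distance_km >= lower
        below = upper is None or distance_km < upper
        if above and below:
            low = "*" if lower is None else str(float(lower))
            high = "*" if upper is None else str(float(upper))
            return f"{low}-{high}"


origin = (45.495902, -73.554263)
print(bucket_key(haversine_km(*origin, 45.5, -73.56)))
```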
Any questions?
51
Offering advanced search to your users
GET /restaurant/restaurant/_search
{
"query": {
"simple_query_string": {
"fields": ["description", "title^2", "adresse", "type"],
"query": "\"service rapide\"~2"
}
}
}
"hits": {
"hits": [
{
"_source": {
"title:": "Un fastfood très connu",
"description": "service très rapide, rapport qualité/prix médiocre",
"price": 8,
"adresse": "210 route de narbonne, 31520 RAMONVILLE",
"type": "fastfood",
"coord": "43.5536343,1.476165"
}
},{
"_source": {
"title:": "Subway",
"description": "service très rapide, rapport qualité/prix médiocre mais on peut choisir la composition de son sandwitch",
"price": 8,
"adresse": "211 route de narbonne, 31520
GET /restaurant/restaurant/_search
{
"query": {
"match_phrase": {
"description": {
"slop": 2,
"query": "service rapide"
}
}
}
}
52

Mais conteúdo relacionado

Mais procurados

Casos reales usando osint
Casos reales usando osintCasos reales usando osint
Casos reales usando osintQuantiKa14
 
IoT Security, Threats and Challenges By V.P.Prabhakaran
IoT Security, Threats and Challenges By V.P.PrabhakaranIoT Security, Threats and Challenges By V.P.Prabhakaran
IoT Security, Threats and Challenges By V.P.PrabhakaranKoenig Solutions Ltd.
 
Quantum Cryptography
Quantum  CryptographyQuantum  Cryptography
Quantum CryptographyBise Mond
 
Internet of things (IOT) connects physical to digital
Internet of things (IOT) connects physical to digitalInternet of things (IOT) connects physical to digital
Internet of things (IOT) connects physical to digitalEslam Nader
 
Application Layer Protocols for the IoT
Application Layer Protocols for the IoTApplication Layer Protocols for the IoT
Application Layer Protocols for the IoTDamien Magoni
 
IP security Part 1
IP security   Part 1IP security   Part 1
IP security Part 1CAS
 
Internet of things - challenges scopes and solutions
Internet of things - challenges scopes and solutionsInternet of things - challenges scopes and solutions
Internet of things - challenges scopes and solutionsShivam Kumar
 
OCS352-IOT -UNIT-1.pdf
OCS352-IOT -UNIT-1.pdfOCS352-IOT -UNIT-1.pdf
OCS352-IOT -UNIT-1.pdfgopinathcreddy
 
Digital Guardianship in Self-Sovereign Identity
Digital Guardianship in Self-Sovereign IdentityDigital Guardianship in Self-Sovereign Identity
Digital Guardianship in Self-Sovereign IdentityEvernym
 
Creator IoT Framework
Creator IoT FrameworkCreator IoT Framework
Creator IoT FrameworkPaul Evans
 
Internet of Things: A Hands-On Approach
Internet of Things: A Hands-On ApproachInternet of Things: A Hands-On Approach
Internet of Things: A Hands-On ApproachArshdeep Bahga
 
Security issues and solutions : IoT
Security issues and solutions : IoTSecurity issues and solutions : IoT
Security issues and solutions : IoTJinia Bhowmik
 
Internet of Things (IoT)
Internet of Things (IoT)Internet of Things (IoT)
Internet of Things (IoT)Akanksha Prasad
 

How to fine-tune your Elasticsearch mappings - LINAGORA

  • 2. Indexing a restaurant directory
      ● Title
      ● Description
      ● Price
      ● Address
      ● Type
  • 3. Creating an index without a mapping
      PUT restaurant
      {
        "settings": {
          "index": {
            "number_of_shards": 3,
            "number_of_replicas": 2
          }
        }
      }
  • 4. Indexing without a mapping
      PUT restaurant/restaurant/1
      {
        "title": 42,
        "description": "Un restaurant gastronomique où tout plat coûte 42 euros",
        "price": 42,
        "adresse": "10 rue de l'industrie, 31000 TOULOUSE",
        "type": "gastronomie"
      }
  • 5. Risk of indexing without a mapping
      PUT restaurant/restaurant/2
      {
        "title": "Pizza de l'ormeau",
        "description": "Dans cette pizzeria on trouve des pizzas très bonnes et très variés",
        "price": 10,
        "adresse": "1 place de l'ormeau, 31400 TOULOUSE",
        "type": "italien"
      }

      Response:
      {
        "error": {
          "root_cause": [
            {
              "type": "mapper_parsing_exception",
              "reason": "failed to parse [title]"
            }
          ],
          "type": "mapper_parsing_exception",
          "reason": "failed to parse [title]",
          "caused_by": {
            "type": "number_format_exception",
            "reason": "For input string: \"Pizza de l'ormeau\""
          }
        },
        "status": 400
      }
  • 6. Inferred mapping
      GET /restaurant/_mapping
      {
        "restaurant": {
          "mappings": {
            "restaurant": {
              "properties": {
                "adresse": {
                  "type": "text",
                  "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}
                },
                "description": {
                  "type": "text",
                  "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}
                },
                "price": {"type": "long"},
                "title": {"type": "long"},
                "type": {
                  "type": "text",
                  "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}
                }
              }
            }
          }
        }
      }
  • 7. Creating a mapping
      PUT :url/restaurant
      {
        "settings": {
          "index": {"number_of_shards": 3, "number_of_replicas": 2}
        },
        "mappings": {
          "restaurant": {
            "properties": {
              "title": {"type": "text"},
              "description": {"type": "text"},
              "price": {"type": "integer"},
              "adresse": {"type": "text"},
              "type": {"type": "keyword"}
            }
          }
        }
      }
  • 8. Indexing a few restaurants
      POST :url/restaurant/restaurant/_bulk
      {"index": {"_id": 1}}
      {"title": 42, "description": "Un restaurant gastronomique où tout plat coûte 42 euros", "price": 42, "adresse": "10 rue de l'industrie, 31000 TOULOUSE", "type": "gastronomie"}
      {"index": {"_id": 2}}
      {"title": "Pizza de l'ormeau", "description": "Dans cette pizzeria on trouve des pizzas très bonnes et très variés", "price": 10, "adresse": "1 place de l'ormeau, 31400 TOULOUSE", "type": "italien"}
      {"index": {"_id": 3}}
      {"title": "Chez l'oncle chan", "description": "Restaurant asiatique très copieux", "price": 14, "adresse": "13 route de labège, 31400 TOULOUSE", "type": "asiatique"}
  • 9. Basic search
      GET :url/restaurant/_search
      {
        "query": {
          "match": {"description": "asiatique"}
        }
      }

      Response:
      {
        "hits": {
          "total": 1,
          "max_score": 0.6395861,
          "hits": [
            {
              "_source": {
                "title": "Chez l'oncle chan",
                "description": "Restaurant asiatique très copieux pour un prix contenu",
                "price": 14,
                "adresse": "13 route de labège, 31400 TOULOUSE",
                "type": "asiatique"
              }
            }
          ]
        }
      }
  • 10. Where our mapping falls short: the plural "asiatiques" matches nothing
      GET :url/restaurant/_search
      {
        "query": {
          "match": {"description": "asiatiques"}
        }
      }

      Response:
      {
        "hits": {
          "total": 0,
          "max_score": null,
          "hits": []
        }
      }
  • 11. What is an analyzer?
      ● It turns a character string into tokens
          ○ Ex: "Le chat est rouge" -> ["le", "chat", "est", "rouge"]
      ● Tokens are used to build an inverted index
  • 12. What is an inverted index?
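The idea behind slides 11 and 12 can be sketched in a few lines of Python. This is only an illustration, not how Elasticsearch or Lucene implement it: `tokenize` is a crude stand-in for an analyzer (lowercasing plus a word split), and `build_inverted_index` is a hypothetical helper that maps each token to the ids of the documents containing it.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens (toy analyzer)."""
    return re.findall(r"\w+", text.lower())


def build_inverted_index(docs):
    """Map each token to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return {token: sorted(ids) for token, ids in index.items()}


# Descriptions taken from the bulk request on slide 8.
docs = {
    1: "Un restaurant gastronomique où tout plat coûte 42 euros",
    3: "Restaurant asiatique très copieux",
}
index = build_inverted_index(docs)

print(tokenize("Le chat est rouge"))
print(index["restaurant"])          # both documents contain this token
print(index.get("asiatiques", []))  # the plural form was never indexed
```

A search is then a lookup of the query's tokens in this structure, which is why a query token that differs from the indexed token, even by one letter, finds nothing.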
  • 13. Explanation: the default analyzer
      GET /_analyze
      {
        "analyzer": "standard",
        "text": "Un restaurant asiatique très copieux"
      }

      Response:
      {
        "tokens": [
          {"token": "un", "start_offset": 0, "end_offset": 2, "type": "<ALPHANUM>", "position": 0},
          {"token": "restaurant", "start_offset": 3, "end_offset": 13, "type": "<ALPHANUM>", "position": 1},
          {"token": "asiatique", "start_offset": 14, "end_offset": 23, "type": "<ALPHANUM>", "position": 2},
          {"token": "très", "start_offset": 24, "end_offset": 28, "type": "<ALPHANUM>", "position": 3},
          {"token": "copieux", "start_offset": 29, "end_offset": 36, "type": "<ALPHANUM>", "position": 4}
        ]
      }
  • 14. Explanation: the "french" analyzer
      GET /_analyze
      {
        "analyzer": "french",
        "text": "Un restaurant asiatique très copieux"
      }

      Response:
      {
        "tokens": [
          {"token": "restaurant", "start_offset": 3, "end_offset": 13, "type": "<ALPHANUM>", "position": 1},
          {"token": "asiat", "start_offset": 14, "end_offset": 23, "type": "<ALPHANUM>", "position": 2},
          {"token": "trè", "start_offset": 24, "end_offset": 28, "type": "<ALPHANUM>", "position": 3},
          {"token": "copieu", "start_offset": 29, "end_offset": 36, "type": "<ALPHANUM>", "position": 4}
        ]
      }
  • 15. Breaking down an analyzer
      Elasticsearch splits analysis into three steps:
      ● Character filtering (e.g. stripping HTML tags)
      ● Splitting into tokens
      ● Token filtering:
          ○ Removing tokens (stopwords that carry little meaning: "un", "le", "la")
          ○ Transforming tokens (stemming...)
          ○ Adding tokens (synonyms)
  • 16. Breaking down the french analyzer
      GET /_analyze
      {
        "tokenizer": "standard",
        "filter": [
          {
            "type": "elision",
            "articles_case": true,
            "articles": ["l", "m", "t", "qu", "n", "s", "j", "d", "c", "jusqu", "quoiqu", "lorsqu", "puisqu"]
          },
          {"type": "stop", "stopwords": "_french_"},
          {"type": "stemmer", "language": "french"}
        ],
        "text": "ce n'est qu'un restaurant asiatique très copieux"
      }

      Input: "ce n'est qu'un restaurant asiatique très copieux"
      standard tokenizer -> ["ce", "n'est", "qu'un", "restaurant", "asiatique", "très", "copieux"]
      elision            -> ["ce", "est", "un", "restaurant", "asiatique", "très", "copieux"]
      stopwords          -> ["restaurant", "asiatique", "très", "copieux"]
      french stemming    -> ["restaurant", "asiat", "trè", "copieu"]
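The first three stages of slide 16 can be imitated in plain Python to make the order of operations concrete. This is a rough sketch and not Lucene's implementation: the tokenizer is a simple regular expression, `STOPWORDS` is an assumed three-word subset of the real French list, and the stemming stage is left out because its rules are too involved to reproduce faithfully here.

```python
import re

# Articles copied from the elision filter on slide 16.
ELISION_ARTICLES = {"l", "m", "t", "qu", "n", "s", "j", "d", "c",
                    "jusqu", "quoiqu", "lorsqu", "puisqu"}
# Assumption: a tiny subset of the French stopword list, enough for this sentence.
STOPWORDS = {"ce", "est", "un"}


def standard_tokenize(text):
    """Split on non-word characters but keep apostrophes inside words."""
    return re.findall(r"\w+(?:['’]\w+)*", text)


def elision(tokens):
    """Strip a leading elided article such as n' or qu' from each token."""
    result = []
    for token in tokens:
        match = re.match(r"(\w+)['’](.+)", token)
        if match and match.group(1).lower() in ELISION_ARTICLES:
            result.append(match.group(2))
        else:
            result.append(token)
    return result


def remove_stopwords(tokens):
    """Drop tokens that appear in the stopword set."""
    return [token for token in tokens if token.lower() not in STOPWORDS]


text = "ce n'est qu'un restaurant asiatique très copieux"
tokens = standard_tokenize(text)
after_elision = elision(tokens)
after_stop = remove_stopwords(after_elision)
print(tokens)
print(after_elision)
print(after_stop)
```

The order matters: elision has to run before the stop filter, otherwise "n'est" and "qu'un" would never be recognised as the stopwords "est" and "un".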
  • 17. Specifying the analyzer in the mapping
      {
        "settings": {
          "index": {"number_of_shards": 3, "number_of_replicas": 2}
        },
        "mappings": {
          "restaurant": {
            "properties": {
              "title": {"type": "text", "analyzer": "french"},
              "description": {"type": "text", "analyzer": "french"},
              "price": {"type": "integer"},
              "adresse": {"type": "text", "analyzer": "french"},
              "type": {"type": "keyword"}
            }
          }
        }
      }
  • 18. Typo-tolerant search: the misspelling "asiatuques" matches nothing
      GET /restaurant/restaurant/_search
      {
        "query": {
          "match": {"description": "asiatuques"}
        }
      }

      Response:
      {
        "hits": {
          "total": 0,
          "max_score": null,
          "hits": []
        }
      }
  • 19. One solution: the ngram token filter
      GET /_analyze
      {
        "tokenizer": "standard",
        "filter": [
          {"type": "ngram", "min_gram": 3, "max_gram": 7}
        ],
        "text": "asiatuque"
      }

      Resulting tokens:
      [
        "asi", "asia", "asiat", "asiatu", "asiatuq",
        "sia", "siat", "siatu", "siatuq", "siatuqu",
        "iat", "iatu", "iatuq", "iatuqu", "iatuque",
        "atu", "atuq", "atuqu", "atuque",
        "tuq", "tuqu", "tuque",
        "uqu", "uque",
        "que"
      ]
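To see where the 25 tokens of slide 19 come from, and why they help with typos, here is a small Python sketch. The `ngrams` function is a hypothetical helper written for this example; it enumerates substrings in the same order as the slide, but it is not the Lucene filter itself.

```python
def ngrams(token, min_gram=3, max_gram=7):
    """List every substring of `token` whose length is between min_gram and max_gram."""
    grams = []
    for start in range(len(token)):
        for size in range(min_gram, max_gram + 1):
            if start + size <= len(token):
                grams.append(token[start:start + size])
    return grams


grams = ngrams("asiatuque")
print(len(grams), grams)

# The misspelt word still shares several n-grams with the correctly spelt one,
# which is what lets a match query find the document despite the typo.
shared_with_correct = sorted(set(grams) & set(ngrams("asiatique")))
print(shared_with_correct)
```

The shared n-grams are mostly the ones that come before the typo ("asi", "asia", "asiat", ...), so the longer the common prefix, the stronger the match.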
  • 20. Creating a custom analyzer that uses the ngram filter
      PUT /restaurant
      {
        "settings": {
          "analysis": {
            "filter": {
              "custom_ngram": {"type": "ngram", "min_gram": 3, "max_gram": 7}
            },
            "analyzer": {
              "ngram_analyzer": {"tokenizer": "standard", "filter": ["asciifolding", "custom_ngram"]}
            }
          }
        },
        "mappings": {
          "restaurant": {
            "properties": {
              "title": {"type": "text", "analyzer": "ngram_analyzer"},
              "description": {"type": "text", "analyzer": "ngram_analyzer"},
              "price": {"type": "integer"},
              "adresse": {"type": "text", "analyzer": "ngram_analyzer"},
              "type": {"type": "keyword"}
            }
          }
        }
      }
  • 21.
      GET /restaurant/restaurant/_search
      {
        "query": {
          "match": {"description": "asiatuques"}
        }
      }

      Response:
      {
        "hits": {
          "hits": [
            {
              "_score": 0.60128295,
              "_source": {
                "title": "Chez l'oncle chan",
                "description": "Restaurant asiatique très copieux pour un prix contenu",
                "price": 14,
                "adresse": "13 route de labège, 31400 TOULOUSE",
                "type": "asiatique"
              }
            },
            {
              "_score": 0.46237043,
              "_source": {
                "title": 42,
                "description": "Un restaurant gastronomique où tout plat coûte 42 euros",
                "price": 42,
                "adresse": "10 rue de l'industrie, 31000 TOULOUSE",
                "type": "gastronomie"
      ...
  • 22. Noise introduced by ngrams
      GET /restaurant/restaurant/_search
      {
        "query": {
          "match": {"description": "gastronomique"}
        }
      }

      Response:
      {
        "hits": {
          "hits": [
            {
              "_score": 0.6277555,
              "_source": {
                "title": 42,
                "description": "Un restaurant gastronomique où tout plat coûte 42 euros",
                "price": 42,
                "adresse": "10 rue de l'industrie, 31000 TOULOUSE",
                "type": "gastronomie"
              }
            },
            {
              "_score": 0.56373334,
              "_source": {
                "title": "Chez l'oncle chan",
                "description": "Restaurant asiatique très copieux pour un prix contenu",
                "price": 14,
                "adresse": "13 route de labège, 31400 TOULOUSE",
                "type": "asiatique"
              }
            },
      ...
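One plausible source of the noise on slide 22 is that unrelated words share short n-grams. The sketch below, with a hypothetical `ngram_set` helper using the same 3 to 7 character window as the custom filter, lists what "gastronomique" and "asiatique" have in common. It only looks at these two words and ignores scoring, so it illustrates the mechanism rather than explaining the exact scores.

```python
def ngram_set(token, min_gram=3, max_gram=7):
    """Return the set of character n-grams of `token` with lengths min_gram..max_gram."""
    return {
        token[start:start + size]
        for start in range(len(token))
        for size in range(min_gram, max_gram + 1)
        if start + size <= len(token)
    }


shared = ngram_set("gastronomique") & ngram_set("asiatique")
print(sorted(shared))
```

The common suffix "-ique" alone is enough to make the Asian restaurant match a search for "gastronomique", which is why the ngram field is better used as a low-weight complement than as the only analyzer.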
  • 23. Specifying several analyzers for one field
      PUT /restaurant
      {
        "settings": {
          "analysis": {
            "filter": {
              "custom_ngram": {"type": "ngram", "min_gram": 3, "max_gram": 7}
            },
            "analyzer": {
              "ngram_analyzer": {"tokenizer": "standard", "filter": ["asciifolding", "custom_ngram"]}
            }
          }
        },
        "mappings": {
          "restaurant": {
            "properties": {
              "title": {"type": "text", "analyzer": "french"},
              "description": {
                "type": "text",
                "analyzer": "french",
                "fields": {
                  "ngram": {"type": "text", "analyzer": "ngram_analyzer"}
                }
              },
              "price": {"type": "integer"},
      ...
  • 24. Using several fields in one search
      GET /restaurant/restaurant/_search
      {
        "query": {
          "multi_match": {
            "query": "gastronomique",
            "fields": ["description^4", "description.ngram"]
          }
        }
      }

      Response:
      {
        "hits": {
          "hits": [
            {
              "_score": 2.0649285,
              "_source": {
                "title": 42,
                "description": "Un restaurant gastronomique où tout plat coûte 42 euros",
                "price": 42,
                "adresse": "10 rue de l'industrie, 31000 TOULOUSE",
                "type": "gastronomie"
              }
            },
            {
              "_score": 0.56373334,
              "_source": {
                "title": "Chez l'oncle chan",
                "description": "Restaurant asiatique très copieux pour un prix contenu",
                "price": 14,
                "adresse": "13 route de labège, 31400 TOULOUSE",
                "type": "asiatique"
              }
            },
            {
              "_index": "restaurant",
      ...
  • 25. To ignore or not to ignore stopwords, that is the question
      POST :url/restaurant/restaurant/_bulk
      {"index": {"_id": 1}}
      {"title": 42, "description": "Un restaurant gastronomique donc cher ou tout plat coûte cher (42 euros)", "price": 42, "adresse": "10 rue de l'industrie, 31000 TOULOUSE", "type": "gastronomie"}
      {"index": {"_id": 2}}
      {"title": "Pizza de l'ormeau", "description": "Dans cette pizzeria on trouve des pizzas très bonnes et très variés", "price": 10, "adresse": "1 place de l'ormeau, 31400 TOULOUSE", "type": "italien"}
      {"index": {"_id": 3}}
      {"title": "Chez l'oncle chan", "description": "Restaurant asiatique très copieux et pas cher", "price": 14, "adresse": "13 route de labège, 31400 TOULOUSE", "type": "asiatique"}
  • 26. Stopwords are not necessarily meaningless: searching for "pas cher" (not expensive) also returns the expensive restaurant
      GET /restaurant/restaurant/_search
      {
        "query": {
          "match_phrase": {"description": "pas cher"}
        }
      }

      Response:
      {
        "hits": {
          "hits": [
            {
              "_source": {
                "title": 42,
                "description": "Un restaurant gastronomique donc cher ou tout plat coûte cher (42 euros)",
                "price": 42,
                "adresse": "10 rue de l'industrie, 31000 TOULOUSE",
                "type": "gastronomie"
              }
            },
            {
              "_source": {
                "title": "Chez l'oncle chan",
                "description": "Restaurant asiatique très copieux et pas cher",
                "price": 14,
                "adresse": "13 route de labège, 31400 TOULOUSE",
                "type": "asiatique"
              }
            }
      ...
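The behaviour on slide 26 can be reproduced with a toy phrase matcher. In this sketch, `STOPWORDS` is an assumed four-word subset of the French list, stemming is ignored, and `phrase_match` simply looks for the query tokens as a contiguous run in the document tokens. Lucene's phrase queries work on token positions instead, so treat this as an approximation of the effect, not of the mechanism.

```python
import re

# Assumption: a small subset of the French stopword list, for illustration only.
STOPWORDS = {"un", "et", "ou", "pas"}


def tokenize(text, remove_stopwords):
    """Lowercase, split into words, and optionally drop stopwords."""
    tokens = re.findall(r"\w+", text.lower())
    if remove_stopwords:
        tokens = [token for token in tokens if token not in STOPWORDS]
    return tokens


def phrase_match(doc, phrase, remove_stopwords):
    """Return True if the analyzed phrase occurs as a contiguous run in the analyzed doc."""
    doc_tokens = tokenize(doc, remove_stopwords)
    query_tokens = tokenize(phrase, remove_stopwords)
    size = len(query_tokens)
    return any(doc_tokens[i:i + size] == query_tokens
               for i in range(len(doc_tokens) - size + 1))


expensive = "Un restaurant gastronomique donc cher ou tout plat coûte cher (42 euros)"
cheap = "Restaurant asiatique très copieux et pas cher"

for keep_or_drop in (True, False):
    print("stopwords removed:", keep_or_drop,
          "| expensive matches:", phrase_match(expensive, "pas cher", keep_or_drop),
          "| cheap matches:", phrase_match(cheap, "pas cher", keep_or_drop))
```

With stopwords removed, the phrase "pas cher" is reduced to the single token "cher", so the expensive restaurant matches too; keeping them restores the distinction.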
  • 27. Modifying the french analyzer to keep stopwords
      PUT /restaurant
      {
        "settings": {
          "analysis": {
            "filter": {
              "french_elision": {
                "type": "elision",
                "articles_case": true,
                "articles": ["l", "m", "t", "qu", "n", "s", "j", "d", "c", "jusqu", "quoiqu", "lorsqu", "puisqu"]
              },
              "french_stemmer": {"type": "stemmer", "language": "light_french"}
            },
            "analyzer": {
              "custom_french": {
                "tokenizer": "standard",
                "filter": ["french_elision", "lowercase", "french_stemmer"]
              }
      ...
  • 28.
      GET /restaurant/restaurant/_search
      {
        "query": {
          "match_phrase": {"description": "pas cher"}
        }
      }

      Response:
      {
        "hits": {
          "hits": [
            {
              "_source": {
                "title": "Chez l'oncle chan",
                "description": "Restaurant asiatique très copieux et pas cher",
                "price": 14,
                "adresse": "13 route de labège, 31400 TOULOUSE",
                "type": "asiatique"
              }
            }
          ]
        }
      }
  • 29. Searching with stopwords without hurting performance GET /restaurant/restaurant/_search { "query": { "match": { "description": { "query": "restaurant pas cher", "cutoff_frequency": 0.01 } } } } GET /restaurant/restaurant/_search { "query": { "bool": { "must": { "bool": { "should": [ {"term": {"description": "restaurant"}}, {"term": {"description": "cher"}}] } }, "should": [ {"match": { "description": "pas" }} ] } } } 29
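The two requests on this slide are related: with `cutoff_frequency`, a match query treats terms whose document frequency exceeds the cutoff as optional, score-only terms, which is roughly what the hand-written bool query spells out. A minimal Python sketch of that split follows. The document frequencies, the index size and the helper names `split_by_cutoff` and `to_bool_query` are hypothetical and only illustrate the idea; they are not read from the restaurant index.

```python
def split_by_cutoff(terms, doc_freq, total_docs, cutoff=0.01):
    """Split query terms into (low_freq, high_freq) by relative document frequency."""
    low, high = [], []
    for term in terms:
        ratio = doc_freq.get(term, 0) / total_docs
        (high if ratio > cutoff else low).append(term)
    return low, high


def to_bool_query(field, low, high):
    """Rare terms select documents; frequent terms only contribute to the score."""
    return {
        "bool": {
            "must": {"bool": {"should": [{"term": {field: t}} for t in low]}},
            "should": [{"match": {field: t}} for t in high],
        }
    }


# Hypothetical document frequencies in an index of 10,000 documents.
doc_freq = {"restaurant": 80, "pas": 4200, "cher": 60}
low, high = split_by_cutoff(["restaurant", "pas", "cher"], doc_freq, 10_000)
query = to_bool_query("description", low, high)
```

With these assumed frequencies, "pas" lands in the optional group while "restaurant" and "cher" drive the selection, mirroring the second request on the slide.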
  • 30. Customizing the “scoring” GET /restaurant/restaurant/_search { "query": { "function_score": { "query": { "match": { "adresse": "toulouse" } }, "functions": [{ "filter": { "terms": { "type": ["asiatique", "italien"]}}, "weight": 2 }] } } } 30
  • 31. Customizing the “scoring” GET /restaurant/restaurant/_search { "query": { "function_score": { "query": { "match": { "adresse": "toulouse" } }, "script_score": { "script": { "lang": "painless", "inline": "_score * ( 1 + 10/doc['price'].value)" } } } } } { "hits": { "hits": [ { "_score": 0.53484553, "_source": { "title": "Pizza de l'ormeau", "price": 10, "adresse": "1 place de l'ormeau, 31400 TOULOUSE", "type": "italien" } }, { "_score": 0.26742277, "_source": { "title": 42, "price": 42, "adresse": "10 rue de l'industrie, 31000 TOULOUSE", "type": "gastronomie" } }, { "_score": 0.26742277, "_source": { "title": "Chez l'oncle chan", "price": 14, "adresse": "13 route de labège, 31400 TOULOUSE", "type": "asiatique" } } ] } } 31
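In the response above, only the 10-euro restaurant gets a higher score, and that score is exactly double the others. One plausible reading, assuming the price is stored as an integer so that the script's `10/price` is evaluated as integer division, is sketched below in Python. The function name `boosted_score` is illustrative; the base score is copied from the response.

```python
def boosted_score(score, price):
    # 10 // price mimics integer division: 1 for price 10, 0 for prices 14 and 42.
    return score * (1 + 10 // price)


base = 0.26742277  # score of the plain match on "toulouse", taken from the response
scores = {price: boosted_score(base, price) for price in (10, 14, 42)}

# With a floating-point numerator (10.0 / price) the boost would be graded instead:
# every restaurant gets some boost, and cheaper ones get more.
graded = {price: base * (1 + 10.0 / price) for price in (10, 14, 42)}
```

Under that assumption, writing `10.0` in the script would be the way to get a boost that decreases smoothly with the price.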
  • 32. How to index multilingual documents
    Three cases:
    ● A single field mixing several languages (e.g. {"message": "warning | attention | cuidado"})
      ○ Use n-grams
      ○ Or analyze the same field several times, with one analyzer per language
    ● One field per language:
      ○ Easy, since a different analyzer can be set for each language
      ○ Take care not to end up with a sparse index
    ● One version of the document per language (preferred)
      ○ One index per language
      ○ Above all, do not use one type per language inside the same index (term statistics are shared across the index, which skews relevance)
    32
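The option "analyze the same field several times, with one analyzer per language" can be sketched as a multi-field mapping. The Python helpers below only build the request bodies; the helper names and the choice of languages are assumptions for illustration, while `english`, `french` and `spanish` are built-in Elasticsearch language analyzers.

```python
def per_language_fields(languages):
    """Return a text field definition with one sub-field per language analyzer."""
    return {
        "type": "text",
        "fields": {lang: {"type": "text", "analyzer": lang} for lang in languages},
    }


def multi_language_query(field, text, languages):
    """Search the main field and every per-language sub-field, summing their scores."""
    return {
        "multi_match": {
            "query": text,
            "type": "most_fields",
            "fields": [field] + [f"{field}.{lang}" for lang in languages],
        }
    }


languages = ["english", "french", "spanish"]
mapping = {"properties": {"message": per_language_fields(languages)}}
query = multi_language_query("message", "attention", languages)
```

The `mapping` dictionary would go under the type name in the `mappings` section of an index creation request, and `query` under the `query` key of a search request.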
  • 33. Handling synonyms PUT /restaurant { "settings": { "analysis": { "filter": { "french_elision": { "type": "elision", "articles_case": true, "articles": ["l", "m", "t", "qu", "n", "s", "j", "d", "c", "jusqu", "quoiqu", "lorsqu", "puisqu"] }, "french_stemmer": {"type": "stemmer", "language": "light_french"}, "french_synonym": {"type": "synonym", "synonyms": ["sou marin => sandwitch", "formul, menu"]} }, "analyzer": { "french_with_synonym": { "tokenizer": "standard", "filter": ["french_elision", "lowercase", "french_stemmer", "french_synonym"] } } } }, "mappings": { "restaurant": { "properties": { "title": {"type": "text", "analyzer": "french"}, "description": { "type": "text", "analyzer": "french", "search_analyzer": "french_with_synonym"}, "price": {"type": "integer"}, "adresse": {"type": "text", "analyzer": "french"}, "coord": {"type": "geo_point"}, 33
  • 34. Handling synonyms GET /restaurant/restaurant/_search { "query": { "match": {"description": "sous-marins"} } } { "hits": { "hits": [ { "_source": { "title": "Subway", "description": "service très rapide, rapport qualité/prix médiocre mais on peut choisir la composition de son sandwitch", "price": 8, "adresse": "211 route de narbonne, 31520 RAMONVILLE", "type": "fastfood", "coord": "43.5577519,1.4625753" } } ] } } 34
  • 35. Geolocated data PUT /restaurant { "mappings": { "restaurant": { "properties": { "title": {"type": "text", "analyzer": "french"}, "description": {"type": "text", "analyzer": "french" }, "price": {"type": "integer"}, "adresse": {"type": "text","analyzer": "french"}, "coord": {"type": "geo_point"}, "type": { "type": "keyword"} } } } } 35
  • 36. Geolocated data POST restaurant/restaurant/_bulk {"index": {"_id": 1}} {"title": "bistronomique", "description": "Un restaurant bon mais un petit peu cher, les desserts sont excellents", "price": 17, "adresse": "73 route de revel, 31400 TOULOUSE", "type": "français", "coord": "43.57417,1.4905748"} {"index": {"_id": 2}} {"title": "Pizza de l'ormeau", "description": "Dans cette pizzeria on trouve des pizzas très bonnes et très variés", "price": 10, "adresse": "1 place de l'ormeau, 31400 TOULOUSE", "type": "italien", "coord": "43.579225,1.4835248"} {"index": {"_id": 3}} {"title": "Chez l'oncle chan", "description": "Restaurant asiatique très copieux pour un prix contenu", "price": 14, "adresse": "18 rue des cosmonautes, 31400 TOULOUSE", "type": "asiatique", "coord": "43.5612759,1.4936073"} {"index": {"_id": 4}} {"title": "Un fastfood très connu", "description": "service très rapide, rapport qualité/prix médiocre", "price": 8, "adresse": "210 route de narbonne, 31520 RAMONVILLE", "type": "fastfood", "coord": "43.5536343,1.476165"} {"index": {"_id": 5}} {"title": "Subway", "description": "service très rapide, rapport qualité/prix médiocre mais on peut choisir la composition de son sandwitch", "price": 8, "adresse": "211 route de narbonne, 31520 RAMONVILLE", "type": "fastfood", "coord": "43.5577519,1.4625753"} {"index": {"_id": 6}} {"title": "L'évidence", "description": "restaurant copieux et pas cher, cependant c'est pas bon", "price": 12, "adresse": "38 route de revel, 31400 TOULOUSE", "type": "français", "coord": "43.5770109,1.4846573"} 36
  • 37. Filtering and sorting on geolocated data GET /restaurant/restaurant/_search { "query": { "bool": { "filter": [ {"term": {"type":"français"}}, {"geo_distance": { "distance": "1km", "coord": {"lat": 43.5739329, "lon": 1.4893669} }} ] } }, "sort": [{ "geo_distance": { "coord": {"lat": 43.5739329, "lon": 1.4893669}, "unit": "km" } }] } { "hits": { "hits": [ { "_source": { "title": "bistronomique", "description": "Un restaurant bon mais un petit peu cher, les desserts sont excellents", "price": 17, "adresse": "73 route de revel, 31400 TOULOUSE", "type": "français", "coord": "43.57417,1.4905748" }, "sort": [0.10081529266640063] },{ "_source": { "title": "L'évidence", "description": "restaurant copieux et pas cher, cependant c'est pas bon", "price": 12, "adresse": "38 route de revel, 31400 TOULOUSE", "type": "français", "coord": "43.5770109,1.4846573" }, "sort": [0.510960087579506] },{ "_source": { "title": "Chez Ingalls", "description": "Contemporain et rustique, ce restaurant avec cheminée sert savoyardes et des grillades", 37
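The `sort` values in the response are distances in kilometres from the query point. As a sanity check, a plain haversine computation in Python gives values close to the ones returned. This is a sketch: the Earth radius constant is an assumption, and Elasticsearch's own distance computation may differ in its last digits.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumption, not read from Elasticsearch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


origin = (43.5739329, 1.4893669)                         # point used in the query
bistronomique = haversine_km(*origin, 43.57417, 1.4905748)
evidence = haversine_km(*origin, 43.5770109, 1.4846573)
```

Both results fall inside the 1 km filter, and their order matches the ascending distance sort in the response.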
  • 38. The bool query explained GET /restaurant/restaurant/_search { "query": { "bool": { "must": {"match": {"description": "sandwitch"}}, "should" : [ {"match": {"description": "bon"}}, {"match": {"description": "excellent"}} ], "must_not": [ {"match_phrase": { "description": "pas bon" }} ], "filter": [ {"range": {"price": { "lte": 20 }}} ] } } } 38
  • 39. The bool query explained GET /restaurant/restaurant/_search { "query": { "bool": { "should" : [ {"match": {"description": "bon"}}, {"match": {"description": "excellent"}}, {"match": {"description": "service rapide"}} ], "minimum_should_match": 2 } } } 39
  • 40. Offering advanced search to your users GET /restaurant/restaurant/_search { "query": { "simple_query_string": { "fields": ["description", "title^2", "adresse", "type"], "query": "-\"pas bon\" +(pizzi~2 | sandwitch)" } } } GET /restaurant/restaurant/_search { "query": { "bool": { "must_not": { "multi_match": { "fields": [ "description", "title^2", "adresse", "type"], "type": "phrase", "query": "pas bon" } }, "should": [ {"multi_match": { "fields": [ "description", "title^2", "adresse", "type"], "fuzziness": 2, "max_expansions": 50, "query": "pizzi" } }, {"multi_match": { "fields": [ "description", "title^2", "adresse", "type"], "query": "sandwitch" } 40
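The `pizzi~2` syntax and the `fuzziness` parameter both accept terms within an edit distance of 2. A minimal Python sketch of plain Levenshtein distance shows why the misspelling can still reach "pizza". This is a simplification: Elasticsearch compares against analyzed (stemmed) tokens and by default also counts a transposition of two adjacent characters as a single edit, which this sketch does not do.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def within_fuzziness(term, candidate, max_edits=2):
    return levenshtein(term, candidate) <= max_edits


close = within_fuzziness("pizzi", "pizza")       # one substitution
too_far = within_fuzziness("pizzi", "pizzeria")  # three insertions
```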
  • 41. Aliases: giving yourself room to maneuver PUT /restaurant_v1/ { "mappings": { "restaurant": { "properties": { "title": {"type": "text"}, "lat": {"type": "double"}, "lon": {"type": "double"} } } } } POST /_aliases { "actions": [ {"add": {"index": "restaurant_v1", "alias": "restaurant_search"}}, {"add": {"index": "restaurant_v1", "alias": "restaurant_write"}} ] } 41
  • 42. Aliases, pipelines and reindexing PUT /restaurant_v2 { "mappings": { "restaurant": { "properties": { "title": {"type": "text", "analyzer": "french"}, "position": {"type": "geo_point"} } } } } PUT /_ingest/pipeline/fixing_position { "description": "move lat lon into position parameter", "processors": [ {"rename": {"field": "lat", "target_field": "position.lat"}}, {"rename": {"field": "lon", "target_field": "position.lon"}} ] } POST /_aliases { "actions": [ {"remove": {"index": "restaurant_v1", "alias": "restaurant_search"}}, {"remove": {"index": "restaurant_v1", "alias": "restaurant_write"}}, {"add": {"index": "restaurant_v2", "alias": "restaurant_search"}}, {"add": {"index": "restaurant_v2", "alias": "restaurant_write"}} ] } POST /_reindex { "source": {"index": "restaurant_v1"}, "dest": {"index": "restaurant_v2", "pipeline": "fixing_position"} } 42
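The effect of the `fixing_position` pipeline on each reindexed document can be pictured with a small Python function. This is only an illustration of the two rename processors, not how Elasticsearch runs them; the sample document is hypothetical.

```python
def fixing_position(doc):
    """Mimic the two rename processors: lat -> position.lat, lon -> position.lon."""
    # Like a rename processor without ignore_missing, this fails if lat or lon is absent.
    out = {key: value for key, value in doc.items() if key not in ("lat", "lon")}
    out["position"] = {"lat": doc["lat"], "lon": doc["lon"]}
    return out


# Hypothetical document as it might be stored in restaurant_v1.
v1_doc = {"title": "Pizza de l'ormeau", "lat": 43.579225, "lon": 1.4835248}
v2_doc = fixing_position(v1_doc)
```

The resulting `position` object with `lat` and `lon` keys is one of the formats accepted by the `geo_point` field declared in `restaurant_v2`.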
  • 43. Analyzing fire department intervention data from 2005 to 2014 PUT /pompier { "mappings": { "interventions": { "properties": { "date": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss"}, "type_incident": { "type": "keyword" }, "description_groupe": { "type": "keyword" }, "caserne": { "type": "integer"}, "ville": { "type": "keyword"}, "arrondissement": { "type": "keyword"}, "division": {"type": "integer"}, "position": {"type": "geo_point"}, "nombre_unites": {"type": "integer"} } } } } 43
  • 44. Listing the different incident types GET /pompier/interventions/_search { "size": 0, "aggs": { "type_incident": { "terms": {"field": "type_incident", "size": 100} } } } { "aggregations": { "type_incident": { "buckets": [ {"key": "Premier répondant", "doc_count": 437891}, {"key": "Appel de Cie de détection", "doc_count": 76157}, {"key": "Alarme privé ou locale", "doc_count": 60879}, {"key": "Ac.véh./1R/s.v./ext/29B/D", "doc_count": 41734}, {"key": "10-22 sans feu", "doc_count": 29283}, {"key": "Acc. sans victime sfeu - ext.", "doc_count": 27663}, {"key": "Inondation", "doc_count": 26801}, {"key": "Problèmes électriques", "doc_count": 23495}, {"key": "Aliments surchauffés", "doc_count": 23428}, {"key": "Odeur suspecte - gaz", "doc_count": 21158}, {"key": "Déchets en feu", "doc_count": 18007}, {"key": "Ascenseur", "doc_count": 12703}, {"key": "Feu de champ *", "doc_count": 11518}, {"key": "Structure dangereuse", "doc_count": 9958}, {"key": "10-22 avec feu", "doc_count": 9876}, {"key": "Alarme vérification", "doc_count": 8328}, {"key": "Aide à un citoyen", "doc_count": 7722}, {"key": "Fuite ext.:hydrocar. liq. div.", "doc_count": 7351}, {"key": "Ac.véh./1R/s.v./V.R./29B/D", "doc_count": 6232}, {"key": "Feu de véhicule extérieur", "doc_count": 5943}, {"key": "Fausse alerte 10-19", "doc_count": 4680}, {"key": "Acc. sans victime sfeu - v.r", "doc_count": 3494}, {"key": "Assistance serv. muni.", "doc_count": 3431}, {"key": "Avertisseur de CO", "doc_count": 2542}, {"key": "Fuite gaz naturel 10-22", "doc_count": 1928}, {"key": "Matières dangereuses / 10-22", "doc_count": 1905}, {"key": "Feu de bâtiment", "doc_count": 1880}, {"key": "Senteur de feu à l'extérieur", "doc_count": 1566}, {"key": "Surchauffe - véhicule", "doc_count": 1499}, {"key": "Feu / Agravation possible", "doc_count": 1281}, {"key": "Fuite gaz naturel 10-09", "doc_count": 1257}, {"key": "Acc.véh/1rép/vict/ext 29D04", "doc_count": 1015}, {"key": "Acc. véh victime sfeu - (ext.)", "doc_count": 971}, 44
  • 45. Nested aggregations GET /pompier/interventions/_search { "size": 0, "aggs": { "ville": { "terms": {"field": "ville"}, "aggs": { "arrondissement": { "terms": {"field": "arrondissement"} } } } } } { "aggregations": {"ville": {"buckets": [ { "key": "Montréal", "doc_count": 768955, "arrondissement": {"buckets": [ {"key": "Ville-Marie", "doc_count": 83010}, {"key": "Mercier / Hochelaga-Maisonneuve", "doc_count": 67272}, {"key": "Côte-des-Neiges / Notre-Dame-de-Grâce", "doc_count": 65933}, {"key": "Villeray / St-Michel / Parc Extension", "doc_count": 60951}, {"key": "Rosemont / Petite-Patrie", "doc_count": 59213}, {"key": "Ahuntsic / Cartierville", "doc_count": 57721}, {"key": "Plateau Mont-Royal", "doc_count": 53344}, {"key": "Montréal-Nord", "doc_count": 40757}, {"key": "Sud-Ouest", "doc_count": 39936}, {"key": "Rivière-des-Prairies / Pointe-aux-Trembles", "doc_count": 38139} ]} }, { "key": "Dollard-des-Ormeaux", "doc_count": 17961, "arrondissement": {"buckets": [ {"key": "Indéterminé", "doc_count": 13452}, {"key": "Dollard-des-Ormeaux / Roxboro", "doc_count": 4477}, {"key": "Pierrefonds / Senneville", "doc_count": 10}, {"key": "Dorval / Ile Dorval", "doc_count": 8}, {"key": "Pointe-Claire", "doc_count": 8}, {"key": "Ile-Bizard / Ste-Geneviève / Ste-A-de-B", "doc_count": 6} ]} }, { "key": "Pointe-Claire", "doc_count": 17925, "arrondissement": {"buckets": [ {"key": "Indéterminé", "doc_count": 13126}, {"key": "Pointe-Claire", "doc_count": 4766}, {"key": "Dorval / Ile Dorval", "doc_count": 12}, {"key": "Dollard-des-Ormeaux / Roxboro", "doc_count": 7}, {"key": "Kirkland", "doc_count": 7}, {"key": "Beaconsfield / Baie d'Urfé", "doc_count": 5}, {"key": "Ile-Bizard / Ste-Geneviève / Ste-A-de-B", "doc_count": 1}, {"key": "St-Laurent", "doc_count": 1} 45
  • 46. Computing averages and sorting aggregations GET /pompier/interventions/_search { "size": 0, "aggs": { "avg_nombre_unites_general": { "avg": {"field": "nombre_unites"} }, "type_incident": { "terms": { "field": "type_incident", "size": 5, "order" : {"avg_nombre_unites": "desc"} }, "aggs": { "avg_nombre_unites": { "avg": {"field": "nombre_unites"} } } } } } { "aggregations": { "type_incident": { "buckets": [ { "key": "Feu / 5e Alerte", "doc_count": 162, "avg_nombre_unites": {"value": 70.9074074074074} }, { "key": "Feu / 4e Alerte", "doc_count": 100, "avg_nombre_unites": {"value": 49.36} }, { "key": "Troisième alerte/autre que BAT", "doc_count": 1, "avg_nombre_unites": {"value": 43.0} }, { "key": "Feu / 3e Alerte", "doc_count": 173, "avg_nombre_unites": {"value": 41.445086705202314} }, { "key": "Deuxième alerte/autre que BAT", "doc_count": 8, "avg_nombre_unites": {"value": 37.5} } ] }, "avg_nombre_unites_general": {"value": 2.1374461758713728} } } 46
  • 47. Percentile GET /pompier/interventions/_search { "size": 0, "aggs": { "unites_percentile": { "percentiles": { "field": "nombre_unites", "percents": [25, 50, 75, 100] } } } } { "aggregations": { "unites_percentile": { "values": { "25.0": 1.0, "50.0": 1.0, "75.0": 3.0, "100.0": 275.0 } } } } 47
  • 48. Histogram GET /pompier/interventions/_search { "size": 0, "query": { "term": {"type_incident": "Inondation"} }, "aggs": { "unites_histogram": { "histogram": { "field": "nombre_unites", "order": {"_key": "asc"}, "interval": 1 }, "aggs": { "ville": { "terms": {"field": "ville", "size": 1} } } } } } { "aggregations": { "unites_histogram": { "buckets": [ { "key": 1.0, "doc_count": 23507, "ville": {"buckets": [{"key": "Montréal", "doc_count": 19417}]} },{ "key": 2.0, "doc_count": 1550, "ville": {"buckets": [{"key": "Montréal", "doc_count": 1229}]} },{ "key": 3.0, "doc_count": 563, "ville": {"buckets": [{"key": "Montréal", "doc_count": 404}]} },{ "key": 4.0, "doc_count": 449, "ville": {"buckets": [{"key": "Montréal", "doc_count": 334}]} },{ "key": 5.0, "doc_count": 310, "ville": {"buckets": [{"key": "Montréal", "doc_count": 253}]} },{ "key": 6.0, "doc_count": 215, "ville": {"buckets": [{"key": "Montréal", "doc_count": 173}]} },{ "key": 7.0, "doc_count": 136, "ville": {"buckets": [{"key": "Montréal", "doc_count": 112}]} },{ "key": 8.0, "doc_count": 35, "ville": {"buckets": [{"key": "Montréal", "doc_count": 30}]} },{ "key": 9.0, "doc_count": 10, "ville": {"buckets": [{"key": "Montréal", "doc_count": 8}]} },{ "key": 10.0, "doc_count": 11, "ville": {"buckets": [{"key": "Montréal", "doc_count": 8}]} },{ "key": 11.0, "doc_count": 2, "ville": {"buckets": [{"key": "Montréal", "doc_count": 2}]} 48
  • 49. “Significant term” GET /pompier/interventions/_search { "size": 0, "query": { "term": {"type_incident": "Inondation"} }, "aggs": { "ville": { "significant_terms": {"field": "ville", "size": 5, "percentage": {}} } } } { "aggregations": { "ville": { "doc_count": 26801, "buckets": [ { "key": "Ile-Bizard", "score": 0.10029498525073746, "doc_count": 68, "bg_count": 678 }, { "key": "Montréal-Nord", "score": 0.0826544804291675, "doc_count": 416, "bg_count": 5033 }, { "key": "Roxboro", "score": 0.08181818181818182, "doc_count": 27, "bg_count": 330 }, { "key": "Côte St-Luc", "score": 0.07654825526563974, "doc_count": 487, "bg_count": 6362 }, { "key": "Saint-Laurent", "score": 0.07317073170731707, "doc_count": 465, "bg_count": 6355 49
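With the `percentage` heuristic requested above, the scores in the response are consistent with a simple ratio: the number of flood ("Inondation") interventions in a city divided by the total number of interventions in that city. A short Python check on three of the returned buckets, using only figures copied from the response:

```python
# (key, doc_count among "Inondation" interventions, bg_count across all interventions)
buckets = [
    ("Ile-Bizard", 68, 678),
    ("Montréal-Nord", 416, 5033),
    ("Roxboro", 27, 330),
]

# Ratio of foreground count to background count for each city.
scores = {key: doc_count / bg_count for key, doc_count, bg_count in buckets}

# Cities ranked by how over-represented floods are among their interventions.
ranking = sorted(scores, key=scores.get, reverse=True)
```

This explains why small cities such as Ile-Bizard rank above Montréal: about 10% of their interventions are floods, even though the absolute counts are low.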
  • 50. Aggregations and geolocated data GET :url/pompier/interventions/_search { "size": 0, "query": { "regexp": {"type_incident": "Feu.*"} }, "aggs": { "distance_from_here": { "geo_distance": { "field": "position", "unit": "km", "origin": { "lat": 45.495902, "lon": -73.554263 }, "ranges": [ { "to": 2}, {"from":2, "to": 4}, {"from":4, "to": 6}, {"from": 6, "to": 8}, {"from": 8}] } } } } { "aggregations": { "distance_from_here": { "buckets": [ { "key": "*-2.0", "from": 0.0, "to": 2.0, "doc_count": 80 }, { "key": "2.0-4.0", "from": 2.0, "to": 4.0, "doc_count": 266 }, { "key": "4.0-6.0", "from": 4.0, "to": 6.0, "doc_count": 320 }, { "key": "6.0-8.0", "from": 6.0, "to": 8.0, "doc_count": 326 }, { "key": "8.0-*", "from": 8.0, "doc_count": 1720 } ] } } } 50
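The `ranges` in this aggregation behave like half-open intervals: `from` is included and `to` is excluded, so a fire exactly 2 km away counts in the 2.0-4.0 bucket. A minimal Python sketch of that assignment, with bucket keys formatted like the ones in the response (the function name `bucket_key` is illustrative):

```python
# Same ranges as in the request, in kilometres; None means "unbounded".
RANGES = [(None, 2.0), (2.0, 4.0), (4.0, 6.0), (6.0, 8.0), (8.0, None)]


def bucket_key(distance_km):
    """Return the key of the range containing distance_km (from inclusive, to exclusive)."""
    for lo, hi in RANGES:
        if (lo is None or distance_km >= lo) and (hi is None or distance_km < hi):
            return f"{'*' if lo is None else lo}-{'*' if hi is None else hi}"
    return None


keys = [bucket_key(d) for d in (1.5, 2.0, 7.99, 8.0, 42.0)]
```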
  • 51. Any questions? 51
  • 52. Offering advanced search to your users GET /restaurant/restaurant/_search { "query": { "simple_query_string": { "fields": ["description", "title^2", "adresse", "type"], "query": "\"service rapide\"~2" } } } { "hits": { "hits": [ { "_source": { "title": "Un fastfood très connu", "description": "service très rapide, rapport qualité/prix médiocre", "price": 8, "adresse": "210 route de narbonne, 31520 RAMONVILLE", "type": "fastfood", "coord": "43.5536343,1.476165" } },{ "_source": { "title": "Subway", "description": "service très rapide, rapport qualité/prix médiocre mais on peut choisir la composition de son sandwitch", "price": 8, "adresse": "211 route de narbonne, 31520 GET /restaurant/restaurant/_search { "query": { "match_phrase": { "description": { "slop": 2, "query": "service rapide" } } } } 52