
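Elasticsearch is an out-of-the-box full-text search engine written in Java and built on Lucene. It is operated through a REST API. These notes are adapted from Ruan Yifeng's tutorial.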

Installation

Requires a Java 8 environment.

Install Elasticsearch 5.5.1

# Download the archive into the current directory
$ wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.5.1.zip
# Unzip it
$ unzip elasticsearch-5.5.1.zip
# Enter the directory
$ cd elasticsearch-5.5.1/

Chinese word-segmentation plugin: ik

# Download and install the plugin with the Elasticsearch plugin tool
$ ./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v5.5.1/elasticsearch-analysis-ik-5.5.1.zip
# Restart Elasticsearch to load the plugin

Getting started

Start

$ ./bin/elasticsearch

If startup fails with the error max virtual memory areas vm.max_map_count [65530] is too low, raise the limit:

$ sudo sysctl -w vm.max_map_count=262144

View node information

# Elasticsearch listens on port 9200 by default
$ curl localhost:9200

{
  "name" : "atntrTf",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "tf9250XhQ6ee4h7YI11anA",
  "version" : {
    "number" : "5.5.1",
    "build_hash" : "19c13d0",
    "build_date" : "2017-07-18T20:44:24.823Z",
    "build_snapshot" : false,
    "lucene_version" : "6.6.0"
  },
  "tagline" : "You Know, for Search"
}

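Allow access from other machines
Edit config/elasticsearch.yml, uncomment network.host, and set its value:

# Anyone can connect (not recommended)
network.host: 0.0.0.0
# Or bind to a specific IP address (keep only one network.host line)
network.host: 192.168.1.1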

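Basic concepts

Cluster: multiple Elasticsearch instances form a cluster. The instances can run on several servers or on the same one.
Node: a single Elasticsearch instance is a node. A group of nodes forms a cluster.
Index: the top-level unit for looking up data. Elasticsearch indexes every field and, after processing, writes it to an inverted index. An Index can be thought of as a single database.
Index names must be lowercase.
Document: a single record in an Index is called a Document. Many Documents make up an Index. A Document is represented as JSON and can be thought of as a JSON object.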
{
"title":"Elastic分布式全文搜索引擎",
"keyword":"java,search",
"body":"这是内容"
}
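Documents in the same Index should ideally share the same structure (schema).
Type: Documents can be grouped. A Type is a virtual, logical grouping, for example technical blog posts versus essay posts, and is used to filter Documents.

According to the roadmap, Elasticsearch 6.x allows only one Type per Index, and 7.x will remove Types.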
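
To make the inverted index idea more concrete, here is a minimal sketch in Python. It is only an illustration of the concept, not how Lucene implements it: the function name build_inverted_index, the sample documents, and the whitespace tokenizer are all assumptions made for this example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive lowercase + whitespace tokenizer; a real analyzer such as ik
        # does far more work than this.
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


# Hypothetical sample documents, keyed by id.
docs = {
    1: "Elastic distributed search engine",
    2: "Lucene search library",
}
inverted = build_inverted_index(docs)
print(inverted["search"])  # ids of the documents that contain "search"
```

Looking up a term then becomes a dictionary access rather than a scan over every document, which is the point of the structure.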

Operations

Create an Index

# Send a PUT request to create the Index
$ curl -X PUT 'localhost:9200/weather'

The server responds:

# acknowledged: true means the operation succeeded
{
   "acknowledged":true,
   "shards_acknowledged":true
}

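Delete an Index

# Send a DELETE request to delete the Index
$ curl -X DELETE 'localhost:9200/weather'

Create an Index with a mapping

# Create an Index named accounts with one Type, person.
# The user and title fields are both of type text.
# analyzer is the tokenizer applied to the field, search_analyzer the one applied to the search text.
# ik_max_word, provided by the ik plugin, splits text into as many terms as possible.
# JSON does not allow comments, so the request body must not contain any.
$ curl -X PUT 'localhost:9200/accounts' -d '
{
  "mappings": {
    "person": {
      "properties": {
        "user": {
          "type": "text",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_max_word"
        },
        "title": {
          "type": "text",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_max_word"
        }
      }
    }
  }
}'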
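
When several fields share the same analyzer settings, the request body can be generated instead of written by hand. The following is a rough sketch in Python; the helper names text_field and build_mapping are made up for this example, and the Type-keyed layout assumes the 5.x mapping format used in this article.

```python
import json


def text_field(analyzer="ik_max_word"):
    """Settings for one text field that uses the same analyzer for indexing and searching."""
    return {"type": "text", "analyzer": analyzer, "search_analyzer": analyzer}


def build_mapping(type_name, field_names):
    """Build the body of a create-Index request with one Type and several text fields."""
    properties = {name: text_field() for name in field_names}
    return {"mappings": {type_name: {"properties": properties}}}


# json.dumps always emits valid JSON, so stray comments or full-width commas cannot slip in.
body = json.dumps(build_mapping("person", ["user", "title"]), indent=2)
print(body)
```

The printed text could then be passed to curl with -d, as in the request above.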

Add a record (with a specified id)

# PUT a record into the person Type of the accounts Index
# 1 is the record's id; it can be any string
$ curl -X PUT 'localhost:9200/accounts/person/1' -d '
{
  "user": "张三",
  "title": "工程师",
  "desc": "数据库管理"
}'

{
  "_index":"accounts",
  "_type":"person",
  "_id":"1",
  "_version":1,
  "result":"created",
  "_shards":{"total":2,"successful":1,"failed":0},
  "created":true
}

Add a record (without specifying an id)

# Use POST; the server generates a random string as the id
$ curl -X POST 'localhost:9200/accounts/person' -d '
{
  "user": "李四",
  "title": "工程师",
  "desc": "系统管理"
}'

If the Index has not been created yet, inserting a record creates it automatically.
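
The difference between the two insert styles comes down to the HTTP method and the path. A small sketch in Python, under the assumption of the Index/Type/Id URL layout used in this article (the function name index_request is hypothetical, and nothing is actually sent):

```python
import json
from urllib.parse import quote


def index_request(index, doc_type, document, doc_id=None):
    """Return (method, path, body) for inserting a document.

    With an id: PUT /index/type/id. Without one: POST /index/type,
    and the server picks the id.
    """
    path = "/{}/{}".format(index, doc_type)
    if doc_id is None:
        method = "POST"
    else:
        method = "PUT"
        # Percent-encode the id so that arbitrary strings are safe in a URL path.
        path += "/" + quote(str(doc_id), safe="")
    body = json.dumps(document, ensure_ascii=False)
    return method, path, body


method, path, body = index_request(
    "accounts", "person", {"user": "张三", "title": "工程师"}, doc_id=1
)
print(method, path)
```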

View a record

# GET Index/Type/Id returns the record; pretty=true asks for a human-readable response
$ curl 'localhost:9200/accounts/person/1?pretty=true'

{
  "_index" : "accounts",
  "_type" : "person",
  "_id" : "1",
  "_version" : 1,
  "found" : true,  // false when the record does not exist
  "_source" : {
    "user" : "张三",
    "title" : "工程师",
    "desc" : "数据库管理"
  }
}

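Delete a record

# Delete the record by id
$ curl -X DELETE 'localhost:9200/accounts/person/1'

Update a record

# Simply send the PUT request again with the new data
$ curl -X PUT 'localhost:9200/accounts/person/1' -d '...'

The response differs from that of the first insert in these fields:

"_version" : 2,  // version is incremented by 1
"result" : "updated",  // result changes from created to updated
"created" : false  // created becomes false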

Query all records

# A plain GET request to _search returns all records
$ curl 'localhost:9200/accounts/person/_search'

{
  "took":2,
  "timed_out":false,
  "_shards":{"total":5,"successful":5,"failed":0},
  "hits":{
    "total":2,
    "max_score":1.0,
    "hits":[
      {
        "_index":"accounts",
        "_type":"person",
        "_id":"AV3qGfrC6jMbsbXb6k1p",
        "_score":1.0,
        "_source": {
          "user": "李四",
          "title": "工程师",
          "desc": "系统管理"
        }
      },
      {
        "_index":"accounts",
        "_type":"person",
        "_id":"1",
        "_score":1.0,
        "_source": {
          "user" : "张三",
          "title" : "工程师",
          "desc" : "数据库管理，软件开发"
        }
      }
    ]
  }
}

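took is the time the operation took (in milliseconds), timed_out indicates whether it timed out, and hits holds the matched records.
Fields inside hits:

total: the number of matching records, 2 in this example.
max_score: the highest relevance score, 1.0 in this example.
hits: an array of the returned records.
Within each hit, _score is the relevance of the match; by default the results are sorted by this field in descending order.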
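
As a rough illustration of how a client might read these fields, the Python sketch below parses a trimmed copy of the response shown above. The function name summarize is an assumption for this example, and it relies on hits.total being a plain number, as it is in the 5.x response in this article.

```python
import json

# Trimmed copy of the search response shown above.
response_text = """
{
  "took": 2,
  "timed_out": false,
  "hits": {
    "total": 2,
    "max_score": 1.0,
    "hits": [
      {"_id": "AV3qGfrC6jMbsbXb6k1p", "_score": 1.0, "_source": {"user": "李四"}},
      {"_id": "1", "_score": 1.0, "_source": {"user": "张三"}}
    ]
  }
}
"""


def summarize(response):
    """Pull the total, the ids, and the user field out of a search response."""
    hits = response["hits"]
    return {
        "total": hits["total"],
        "ids": [hit["_id"] for hit in hits["hits"]],
        "users": [hit["_source"]["user"] for hit in hits["hits"]],
    }


summary = summarize(json.loads(response_text))
print(summary["total"], summary["ids"])
```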

Full-text search

# Find records whose desc field contains the term 软件 (software)
$ curl 'localhost:9200/accounts/person/_search'  -d '
{
  "query" : { "match" : { "desc" : "软件" }}
}'

Limit the number of results

# size: 1 returns a single result (the default is 10)
$ curl 'localhost:9200/accounts/person/_search'  -d '
{
  "query" : { "match" : { "desc" : "管理" }},
  "size": 1
}'

Offset: skip a given number of results

# from: 1 skips the first result (the default is 0)
$ curl 'localhost:9200/accounts/person/_search'  -d '
{
  "query" : { "match" : { "desc" : "管理" }},
  "from": 1,
  "size": 1
}'
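
Applications usually think in page numbers rather than offsets. One possible conversion, sketched in Python with hypothetical helper names (page_to_from_size, paged_match_query) and 1-based page numbers as an assumption:

```python
def page_to_from_size(page, page_size):
    """Convert a 1-based page number into the from/size pair used by _search."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be at least 1")
    return {"from": (page - 1) * page_size, "size": page_size}


def paged_match_query(field, text, page, page_size):
    """Build a match query body restricted to one page of results."""
    body = {"query": {"match": {field: text}}}
    body.update(page_to_from_size(page, page_size))
    return body


# Page 2 with one result per page gives the same from/size as the request above.
print(paged_match_query("desc", "管理", page=2, page_size=1))
```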

OR and AND conditions
OR

# Match 软件 (software) OR 系统 (system): terms separated by a space are combined with OR
$ curl 'localhost:9200/accounts/person/_search'  -d '
{
  "query" : { "match" : { "desc" : "软件 系统" }}
}'

AND

# desc must contain both 软件 and 系统
$ curl 'localhost:9200/accounts/person/_search'  -d '
{
  "query": {
    "bool": {
      "must": [
        { "match": { "desc": "软件" } },
        { "match": { "desc": "系统" } }
      ]
    }
  }
}'
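
The two request bodies above follow a regular pattern, so they can be generated from a list of keywords. A minimal sketch in Python; the function names match_any and match_all are invented for this example:

```python
def match_any(field, terms):
    """OR: a single match query whose terms are separated by spaces."""
    return {"query": {"match": {field: " ".join(terms)}}}


def match_all(field, terms):
    """AND: a bool query with one match clause per term under must."""
    clauses = [{"match": {field: term}} for term in terms]
    return {"query": {"bool": {"must": clauses}}}


print(match_any("desc", ["软件", "系统"]))
print(match_all("desc", ["软件", "系统"]))
```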

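Running in the background

Generic Linux approach

# nohup makes the process ignore the hangup signal, so it keeps running after the terminal is closed
# & runs the process in the background, so the same terminal can be used for other work
nohup bin/elasticsearch &
# View the log (nohup writes output to nohup.out by default)
tail -fn 200 nohup.out
# To discard the output instead: >/dev/null sends standard output to /dev/null,
# and 2>&1 redirects standard error (2) to wherever standard output (1) goes
nohup bin/elasticsearch >/dev/null 2>&1 &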

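Elasticsearch option

# -d runs Elasticsearch in the background as a daemon
bin/elasticsearch -d

Stop the background process

# Find the process
ps -ef|grep elastic
# Kill the process (41496 is the PID reported by ps in this example)
kill 41496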

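Fuzzy query with pagination

// Repository method: matches usernames containing the given text, ordered by createTime descending
Page<User> findByUsernameContainingOrderByCreateTimeDesc(String username, Pageable pageable);

Usage

// First page (index 0), 2 records per page
Pageable page = PageRequest.of(0, 2);
Page<User> list = userRepository.findByUsernameContainingOrderByCreateTimeDesc("丽", page);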

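Optimization

Elasticsearch stores data on disk, and queries pull it into the filesystem cache automatically. Give the filesystem cache more memory, and for data you expect to be accessed heavily, query it yourself on a schedule ahead of time so that it is loaded into memory. Separate hot data from cold data so that hot data is not evicted by cold data.
Avoid join, nested, and parent-child operations as far as possible.
Default pagination performs poorly and gets slower the deeper you page. Use the Scroll API to scroll through results (it cannot jump to an arbitrary page).
Use Elasticsearch alongside another database. An Elasticsearch query returns whole documents, most of whose fields may not be needed. Store only the searchable fields and the id in Elasticsearch, then use the id to fetch the remaining fields from MySQL or HBase.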
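
The last point can be sketched as a two-step lookup. The Python below uses in-memory stand-ins only: search_response imitates a trimmed Elasticsearch result and primary_store is a plain dictionary playing the role of MySQL or HBase. Every name and value here is hypothetical; no real database client is involved.

```python
def ids_from_search(response):
    """Step 1: keep only the ids from the search hits, in relevance order."""
    return [hit["_id"] for hit in response["hits"]["hits"]]


def fetch_rows(primary_store, ids):
    """Step 2: load the full records by id from the primary store, preserving order."""
    return [primary_store[doc_id] for doc_id in ids if doc_id in primary_store]


# Stand-in for a search response; only the fields used here are included.
search_response = {
    "hits": {"hits": [{"_id": "2", "_score": 1.3}, {"_id": "1", "_score": 0.8}]}
}

# Stand-in for the primary database, keyed by the same ids.
primary_store = {
    "1": {"user": "张三", "title": "工程师", "address": "..."},
    "2": {"user": "李四", "title": "工程师", "address": "..."},
    "3": {"user": "王五", "title": "设计师", "address": "..."},
}

rows = fetch_rows(primary_store, ids_from_search(search_response))
print([row["user"] for row in rows])
```

Keeping the relevance order from the search step matters, because a lookup by id in the primary store does not know anything about scores.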