    Elasticsearch: using pinyin and Chinese word segmentation together

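    The previous article covered IK Chinese word segmentation. The actual goal is for searches to return results for both pinyin and Chinese input, like the input suggestions in the Baidu and Taobao search boxes.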

    1. Install and configure analysis-pinyin

    // download
    $ git clone https://github.com/medcl/elasticsearch-analysis-pinyin.git
    $ cd elasticsearch-analysis-pinyin
    $ git branch -a
    * master // master is at 6.2.3, which matches ES 6.2.3
     remotes/origin/0.16.x
     remotes/origin/1.x
     remotes/origin/2.x
     remotes/origin/5.3.x
     remotes/origin/5.x
     remotes/origin/6.1.x
     remotes/origin/HEAD -> origin/master
     remotes/origin/master
    
    $ mvn package  // build the plugin package
    
    $ ll target/releases/
    total 4400
    drwxr-xr-x 3 zhangying staff 102 4 24 13:46 ./
    drwxr-xr-x 11 zhangying staff 374 4 24 13:32 ../
    -rw-r--r-- 1 zhangying staff 4501993 4 24 13:32 elasticsearch-analysis-pinyin-6.2.3.zip
    
    $ cd target/releases/ && unzip elasticsearch-analysis-pinyin-6.2.3.zip
    
    $ brew info elasticsearch
    elasticsearch: stable 6.2.3, HEAD
    Distributed search & analytics engine
    
    https://www.elastic.co/products/elasticsearch
    
    /usr/local/Cellar/elasticsearch/6.2.3 (112 files, 30.8MB) *
     Built from source on 2018-04-24 at 14:17:01
    From: https://github.com/Homebrew/homebrew-core/blob/master/Formula/elasticsearch.rb
    ==> Requirements
    Required: java = 1.8 ✔
    ==> Options
    --HEAD
     Install HEAD version
    ==> Caveats
    Data: /usr/local/var/lib/elasticsearch/elasticsearch_zhangying/
    Logs: /usr/local/var/log/elasticsearch/elasticsearch_zhangying.log
    Plugins: /usr/local/var/elasticsearch/plugins/   // plugin directory
    Config: /usr/local/etc/elasticsearch/
    
    To have launchd start elasticsearch now and restart at login:
     brew services start elasticsearch
    Or, if you don't want/need a background service you can just run:
     elasticsearch
    
    // move the plugin built by mvn into the ES plugin directory
    $ mv elasticsearch /usr/local/var/elasticsearch/plugins/pinyin
    
    $ elasticsearch  // start Elasticsearch
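
Before moving on, it may be worth confirming that the node actually loaded the plugin. The sketch below is not from the original post: `plugin_listed` is a hypothetical helper, and it assumes that the plugin registers under the name `analysis-pinyin` and that Elasticsearch listens on localhost:9200.

```shell
# Hypothetical helper (not part of the original post).
# Succeeds if the plugin listing read from stdin contains the name given in $1
# as a whole word.
plugin_listed() {
  grep -qw -- "$1"
}

# Possible usage against a running node (assumed address):
#   curl -s 'http://localhost:9200/_cat/plugins' | plugin_listed analysis-pinyin \
#     && echo "pinyin plugin loaded"
```

If the check fails, the plugin directory and the plugin version are the first things to compare against the installed Elasticsearch version.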
    

    2. Test the pinyin analyzer

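    2.1 Test tokenization

    The requests below go through the pinyin index, which is only created in section 2.2, so on a fresh node create that index first.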

    $ curl -XPOST 'http://localhost:9200/pinyin/_analyze?pretty=true' -H 'Content-Type: application/json' -d '
    {
    "analyzer":"pinyin",
    "text":"gaotie"
    }'
    {
     "tokens" : [
     {
     "token" : "gao",
     "start_offset" : 0,
     "end_offset" : 0,
     "type" : "word",
     "position" : 0
     },
     {
     "token" : "gaotie",
     "start_offset" : 0,
     "end_offset" : 0,
     "type" : "word",
     "position" : 0
     },
     {
     "token" : "tie",
     "start_offset" : 0,
     "end_offset" : 0,
     "type" : "word",
     "position" : 1
     }
     ]
    }
    
    $ curl -XPOST 'http://localhost:9200/pinyin/_analyze?pretty=true' -H 'Content-Type: application/json' -d '
    {
    "analyzer":"pinyin",
    "text":"高铁"
    }'
    {
     "tokens" : [
     {
     "token" : "gao",
     "start_offset" : 0,
     "end_offset" : 0,
     "type" : "word",
     "position" : 0
     },
     {
     "token" : "gt",
     "start_offset" : 0,
     "end_offset" : 0,
     "type" : "word",
     "position" : 0
     },
     {
     "token" : "tie",
     "start_offset" : 0,
     "end_offset" : 0,
     "type" : "word",
     "position" : 1
     }
     ]
    }

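    As the output above shows, the pinyin analyzer tokenizes both pinyin and Chinese input, and the results differ: "gaotie" yields gao, gaotie, and tie, whereas "高铁" yields gao, gt, and tie.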
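
The full `_analyze` responses are verbose, which makes them awkward to compare side by side. As a minimal sketch (the helper name `analyze_tokens` is hypothetical, and it relies on the pretty-printed layout shown above, with one `"token"` entry per line), the token values can be condensed onto a single line:

```shell
# Hypothetical helper (not part of the original post).
# Reads a pretty-printed _analyze response from stdin and prints the
# token values on one line, separated by spaces.
analyze_tokens() {
  sed -n 's/.*"token" *: *"\([^"]*\)".*/\1/p' | paste -sd ' ' -
}

# Possible usage against a running node (assumed address):
#   curl -s -XPOST 'http://localhost:9200/pinyin/_analyze?pretty=true' \
#     -H 'Content-Type: application/json' \
#     -d '{"analyzer":"pinyin","text":"高铁"}' | analyze_tokens
```

Applied to the two responses above, this would print `gao gaotie tie` and `gao gt tie`.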

    2.2 Create the index and mapping, then insert data

    curl -XPUT "http://127.0.0.1:9200/pinyin?pretty"
    curl -XPOST "http://127.0.0.1:9200/pinyin/test/_mapping?pretty" -H "Content-Type: application/json" -d '
    {
        "test": {
                "_all":{
                  "enabled":false
                },
                "properties": {
                    "id": {
                        "type": "integer"
                    },
                    "username": {
                        "type": "text",
                        "analyzer": "pinyin"
                    },
                    "description": {
                        "type": "text",
                        "analyzer": "pinyin"
                    }
                }
            }
      }
    '
    curl -XPOST "http://127.0.0.1:9200/pinyin/test/?pretty"  -H "Content-Type: application/json" -d '
    {
        "id" : 1,
        "username" :  "中国高铁速度很快",
        "description" :  "如果要修改一个字段的类型"
    }'
    
    curl -XPOST "http://127.0.0.1:9200/pinyin/test/?pretty"   -H "Content-Type: application/json" -d '
    {
        "id" : 2,
        "username" :  "动车和复兴号,都属于高铁",
        "description" :  "现在想要修改为string类型"
    }'

    2.3 Full-pinyin search

    $ curl -XPOST "http://127.0.0.1:9200/pinyin/test/_search?pretty"  -H "Content-Type: application/json"  -d '
    {
        "query": {
            "match": {
                "username": "gao tie"
            }
        }
    }
    '
    {
      "took" : 13,
      "timed_out" : false,
      "_shards" : {
        "total" : 5,
        "successful" : 5,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : 2,
        "max_score" : 0.4039931,
        "hits" : [
          {
            "_index" : "pinyin",
            "_type" : "test",
            "_id" : "TGZ2AWMBlEkarXCPb7ED",
            "_score" : 0.4039931,
            "_source" : {
              "id" : 1,
              "username" : "中国高铁速度很快",
              "description" : "如果要修改一个字段的类型"
            }
          },
          {
            "_index" : "pinyin",
            "_type" : "test",
            "_id" : "TWZ2AWMBlEkarXCPb7En",
            "_score" : 0.35767543,
            "_source" : {
              "id" : 2,
              "username" : "动车和复兴号,都属于高铁",
              "description" : "现在想要修改为string类型"
            }
          }
        ]
      }
    }
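
The search requests in this article differ only in the index name and the query string, so they can be wrapped in a small function. This is a sketch under stated assumptions: `match_body` and `search_username` are hypothetical names, the host and the `test` type are taken from the commands above, and the query string is inserted into the JSON verbatim without escaping.

```shell
# Hypothetical helpers (not part of the original post).
# match_body QUERY builds a match query on the username field. QUERY is
# inserted verbatim, so it must not contain double quotes or backslashes.
match_body() {
  printf '{"query":{"match":{"username":"%s"}}}' "$1"
}

# search_username INDEX QUERY runs the query against the test type of INDEX.
search_username() {
  curl -s -XPOST "http://127.0.0.1:9200/$1/test/_search?pretty" \
    -H 'Content-Type: application/json' \
    -d "$(match_body "$2")"
}
```

With these defined, `search_username pinyin 'gao tie'` sends the same request as the one above.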

    2.3 Pinyin analyzer, searching with Chinese characters

    $ curl -XPOST "http://127.0.0.1:9200/pinyin/test/_search?pretty"  -H "Content-Type: application/json"  -d '
    {
        "query": {
            "match": {
                "username": "中国高铁"
            }
        }
    }
    '
    {
      "took" : 3,
      "timed_out" : false,
      "_shards" : {
        "total" : 5,
        "successful" : 5,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : 2,
        "max_score" : 1.9398875,
        "hits" : [
          {
            "_index" : "pinyin",
            "_type" : "test",
            "_id" : "TGZ2AWMBlEkarXCPb7ED",
            "_score" : 1.9398875,
            "_source" : {
              "id" : 1,
              "username" : "中国高铁速度很快",
              "description" : "如果要修改一个字段的类型"
            }
          },
          {
            "_index" : "pinyin",
            "_type" : "test",
            "_id" : "TWZ2AWMBlEkarXCPb7En",
            "_score" : 0.35767543,
            "_source" : {
              "id" : 2,
              "username" : "动车和复兴号,都属于高铁",
              "description" : "现在想要修改为string类型"
            }
          }
        ]
      }
    }

    2.4 Partial first-letter search

    $ curl -XPOST "http://127.0.0.1:9200/pinyin/test/_search?pretty"  -H "Content-Type: application/json"  -d '
    {
        "query": {
            "match": {
                "username": "Gaot"
            }
        }
    }
    '
    {
      "took" : 5,
      "timed_out" : false,
      "_shards" : {
        "total" : 5,
        "successful" : 5,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : 2,
        "max_score" : 0.20199655,
        "hits" : [
          {
            "_index" : "pinyin",
            "_type" : "test",
            "_id" : "TGZ2AWMBlEkarXCPb7ED",
            "_score" : 0.20199655,
            "_source" : {
              "id" : 1,
              "username" : "中国高铁速度很快",
              "description" : "如果要修改一个字段的类型"
            }
          },
          {
            "_index" : "pinyin",
            "_type" : "test",
            "_id" : "TWZ2AWMBlEkarXCPb7En",
            "_score" : 0.17883772,
            "_source" : {
              "id" : 2,
              "username" : "动车和复兴号,都属于高铁",
              "description" : "现在想要修改为string类型"
            }
          }
        ]
      }
    }
    
    // same result as above
    $ curl -XPOST "http://127.0.0.1:9200/pinyin/test/_search?pretty"  -H "Content-Type: application/json"  -d '
    {
        "query": {
            "match": {
                "username": "gtie"
            }
        }
    }
    '

    2.5 All-first-letters search

    $ curl -XPOST "http://127.0.0.1:9200/pinyin/test/_search?pretty"  -H "Content-Type: application/json"  -d '
    {
        "query": {
            "match": {
                "username": "gt"
            }
        }
    }
    '
    {
      "took" : 3,
      "timed_out" : false,
      "_shards" : {
        "total" : 5,
        "successful" : 5,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : 0,
        "max_score" : null,
        "hits" : [ ]
      }
    }

    Searching with only the first letters of 高铁 (gt) returns nothing.
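
A likely explanation, stated here as an assumption rather than something the post verifies, is that the pinyin analyzer builds its first-letter token from the whole field value, so the index holds one long abbreviation for the sentence but no standalone `gt`. One way to check is to run the stored text through `_analyze` and look for the token. `has_token` below is a hypothetical helper that relies on the pretty-printed response layout from section 2.1.

```shell
# Hypothetical helper (not part of the original post).
# Succeeds if the _analyze response read from stdin contains a token
# that is exactly equal to $1 (plain letters or digits only).
has_token() {
  grep -q "\"token\" *: *\"$1\""
}

# Possible usage against a running node (assumed address):
#   curl -s -XPOST 'http://localhost:9200/pinyin/_analyze?pretty=true' \
#     -H 'Content-Type: application/json' \
#     -d '{"analyzer":"pinyin","text":"中国高铁速度很快"}' \
#     | has_token gt || echo "gt is not among the indexed tokens"
```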

    3. Using the pinyin and Chinese analyzers together

    3.1 Define a custom analyzer and configure its filter

    $ curl -XPUT "http://localhost:9200/pinyin_ik/?pretty" -H "Content-Type: application/json" -d'
    {
        "index": {
            "analysis": {
                "analyzer": {
                    "ik_pinyin_analyzer": {
                        "type": "custom",
                        "tokenizer": "ik_max_word",
                        "filter": ["my_pinyin", "word_delimiter"]
                    }
                },
                "filter": {
                    "my_pinyin": {
                        "type": "pinyin"
                    }
                }
            }
        }
    }'
    
    $ curl -XPOST "http://127.0.0.1:9200/pinyin_ik/test/_mapping?pretty" -H "Content-Type: application/json" -d '
    {
        "test": {
                "_all":{
                  "enabled":false
                },
                "properties": {
                    "id": {
                        "type": "integer"
                    },
                    "username": {
                        "type": "text",
                        "analyzer": "ik_pinyin_analyzer"
                    },
                    "description": {
                        "type": "text",
                        "analyzer": "ik_pinyin_analyzer"
                    }
                }
            }
      }
    '  
    
    $ curl -XPOST "http://127.0.0.1:9200/pinyin_ik/test/?pretty"  -H "Content-Type: application/json" -d '
    {
        "id" : 1,
        "username" :  "中国高铁速度很快",
        "description" :  "如果要修改一个字段的类型"
    }'
    
    $ curl -XPOST "http://127.0.0.1:9200/pinyin_ik/test/?pretty"   -H "Content-Type: application/json" -d '
    {
        "id" : 2,
        "username" :  "动车和复兴号,都属于高铁",
        "description" :  "现在想要修改为string类型"
    }'
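
Before searching, it can be useful to see which tokens `ik_pinyin_analyzer` produces for a stored value. The sketch below wraps the same `_analyze` API used in section 2.1. The function names are hypothetical, the host is taken from the commands above, and the text is inserted into the JSON verbatim without escaping.

```shell
# Hypothetical helpers (not part of the original post).
# analyze_body ANALYZER TEXT builds the _analyze request body. Both values are
# inserted verbatim, so they must not contain double quotes or backslashes.
analyze_body() {
  printf '{"analyzer":"%s","text":"%s"}' "$1" "$2"
}

# analyze INDEX ANALYZER TEXT runs TEXT through an analyzer defined on INDEX.
analyze() {
  curl -s -XPOST "http://localhost:9200/$1/_analyze?pretty=true" \
    -H 'Content-Type: application/json' \
    -d "$(analyze_body "$2" "$3")"
}
```

For example, `analyze pinyin_ik ik_pinyin_analyzer '中国高铁速度很快'` lists the tokens stored for the first document, which shows whether a `gt` token exists for 高铁.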

    3.2 All-first-letters search

    $ curl -XPOST "http://127.0.0.1:9200/pinyin_ik/test/_search?pretty"  -H "Content-Type: application/json"  -d '
    {
        "query": {
            "match": {
                "username": "gt"
            }
        }
    }
    '
    {
      "took" : 2,
      "timed_out" : false,
      "_shards" : {
        "total" : 5,
        "successful" : 5,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : 2,
        "max_score" : 0.6935897,
        "hits" : [
          {
            "_index" : "pinyin_ik",
            "_type" : "test",
            "_id" : "S2ZzAWMBlEkarXCPu7Hq",
            "_score" : 0.6935897,
            "_source" : {
              "id" : 2,
              "username" : "动车和复兴号,都属于高铁",
              "description" : "现在想要修改为string类型"
            }
          },
          {
            "_index" : "pinyin_ik",
            "_type" : "test",
            "_id" : "SmZzAWMBlEkarXCPubHw",
            "_score" : 0.6827974,
            "_source" : {
              "id" : 1,
              "username" : "中国高铁速度很快",
              "description" : "如果要修改一个字段的类型"
            }
          }
        ]
      }
    }
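
To compare the two indexes at a glance, the hit count can be pulled out of each search response. This is a minimal sketch: `hit_total` is a hypothetical helper, and it assumes the pretty-printed 6.x response layout shown above, where `hits.total` is a plain number on its own line.

```shell
# Hypothetical helper (not part of the original post).
# Reads a pretty-printed search response from stdin and prints hits.total,
# skipping the unrelated "total" that appears inside "_shards".
hit_total() {
  awk '/"hits" *: *[{]/ { in_hits = 1 }
       in_hits && /"total"/ { gsub(/[^0-9]/, ""); print; exit }'
}

# Possible usage against a running node (assumed address):
#   curl -s -XPOST 'http://127.0.0.1:9200/pinyin_ik/test/_search?pretty' \
#     -H 'Content-Type: application/json' \
#     -d '{"query":{"match":{"username":"gt"}}}' | hit_total
```

For the `gt` responses shown in section 2.5 and above, it would print 0 for the pinyin index and 2 for the pinyin_ik index.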

