Elasticsearch (4) – Built-in Analyzers
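- Standard Analyzer – the default analyzer; splits text on word boundaries and lowercases the tokens
- Simple Analyzer – splits on any non-letter character (symbols are discarded) and lowercases the tokens
- Stop Analyzer – lowercases the tokens and removes stop words (the, a, is)
- Whitespace Analyzer – splits on whitespace only; does not lowercase
- Keyword Analyzer – no tokenization; the whole input is emitted as a single token
- Pattern Analyzer – splits using a regular expression; the default is \W+ (split on non-word characters)
- Language – analyzers for more than 30 common languages
- Custom Analyzer – a user-defined analyzer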
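To make the differences in the list above concrete, here is a rough Python approximation of five of these analyzers using only regular expressions. This is a sketch for building intuition, not how Elasticsearch implements them: the function names are hypothetical, the letter matching is ASCII-only (an assumption; the real analyzers are Unicode-aware), and the stop-word set contains just the three words named above rather than the actual default list.

```python
import re

# Only the three stop words named in the list above; an assumption for
# illustration, not Elasticsearch's real default stop-word list.
STOP_WORDS = {"the", "a", "is"}


def whitespace_analyzer(text):
    """Split on whitespace; keep case and punctuation untouched."""
    return text.split()


def simple_analyzer(text):
    """Split on anything that is not an ASCII letter, then lowercase."""
    return [t.lower() for t in re.split(r"[^A-Za-z]+", text) if t]


def stop_analyzer(text):
    """Simple analyzer output with stop words removed."""
    return [t for t in simple_analyzer(text) if t not in STOP_WORDS]


def keyword_analyzer(text):
    """No tokenization: the whole input is one token."""
    return [text]


def pattern_analyzer(text, pattern=r"\W+"):
    """Split on a regular expression (non-word characters by default), then lowercase."""
    return [t.lower() for t in re.split(pattern, text) if t]


if __name__ == "__main__":
    sample = "He is-a Boy"
    for analyzer in (whitespace_analyzer, simple_analyzer, stop_analyzer,
                     keyword_analyzer, pattern_analyzer):
        print(analyzer.__name__, analyzer(sample))
```

Running the functions on the same input side by side shows which analyzers keep the hyphen, which change case, and which drop words.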
# Test the whitespace analyzer
GET _analyze
{
  "analyzer": "whitespace",
  "text": "he is-a boy"
}
# The response is:
{
  "tokens" : [
    {
      "token" : "he",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "is-a",
      "start_offset" : 3,
      "end_offset" : 7,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "boy",
      "start_offset" : 8,
      "end_offset" : 11,
      "type" : "word",
      "position" : 2
    }
  ]
}
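The fields in the response can be reproduced with a few lines of Python, which may help in reading them: `start_offset` and `end_offset` are character positions in the original text (the end is exclusive), and `position` is the token's index in the stream. The function name `whitespace_tokens` is hypothetical, and this is only a sketch of the whitespace analyzer's behavior for plain ASCII input, not Elasticsearch's own code.

```python
import re


def whitespace_tokens(text):
    """Return tokens shaped like the _analyze response for the whitespace analyzer."""
    tokens = []
    # Every run of non-whitespace characters becomes one token.
    for position, match in enumerate(re.finditer(r"\S+", text)):
        tokens.append({
            "token": match.group(),
            "start_offset": match.start(),  # index of the first character
            "end_offset": match.end(),      # index just past the last character
            "type": "word",
            "position": position,
        })
    return tokens


if __name__ == "__main__":
    for token in whitespace_tokens("he is-a boy"):
        print(token)
```

Because only whitespace separates tokens, "is-a" stays in one piece, with offsets 3 to 7.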