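Sphinx MySQL incremental index
I. Incremental index
As I understand it, what is an incremental (delta) index actually for, and how is it used? See the figure below.
A brief explanation: when rows are inserted into a table, those newly inserted rows are the increment. Sphinx finds data through its indexes, so if the index has not been updated, newly added rows cannot be found. That is why we have to update the main index and the delta index, and why the choice of delta condition matters so much. I looked at some delta-index setups published online, tried them myself, and found some problems: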
source src1
{
    sql_query_pre = SET NAMES utf8
    sql_query_pre = SET SESSION query_cache_type=OFF
    sql_query_pre = REPLACE INTO sph_counter SELECT 1, MAX(id) FROM documents
    sql_query = SELECT id, group_id, UNIX_TIMESTAMP(date_added) AS date_added, title, content FROM documents \
        WHERE id<=( SELECT max_doc_id FROM sph_counter WHERE counter_id=1 )
    # other settings can keep their defaults
    …
}
# Note: the number of sql_query_pre lines must match src1, otherwise searches may not return the expected results
source src1throttled : src1
{
    sql_ranged_throttle = 100
    sql_query_pre = SET NAMES utf8
    sql_query_pre = SET SESSION query_cache_type=OFF
    sql_query_pre =
    sql_query = SELECT id, group_id, UNIX_TIMESTAMP(date_added) AS date_added, title, content FROM documents \
        WHERE id>( SELECT max_doc_id FROM sph_counter WHERE counter_id=1 )
}
# main index
index test1
{
    source = src1
    …
}
# delta index
index test1stemmed : test1
{
    source = src1throttled
    …
}
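The trouble is the condition WHERE id>( SELECT max_doc_id FROM sph_counter WHERE counter_id=1 ): as far as I can tell, the comparison here has to be "equal to" or "less than", not "greater than".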
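One way to see why the `>` condition can come back empty is to replay the queries against a small in-memory database. The sketch below uses Python's `sqlite3` instead of MySQL and Sphinx, and it assumes the counter has just been refreshed to `MAX(id)` when the delta query runs; the three sample rows are made up for illustration.

```python
import sqlite3

# In-memory stand-in for the MySQL tables used in the config (assumption: SQLite, not MySQL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE sph_counter (counter_id INTEGER PRIMARY KEY, max_doc_id INTEGER);
    INSERT INTO documents (title) VALUES ('a'), ('b'), ('c');
""")

# Same statement as the third sql_query_pre of src1: remember the current highest id.
conn.execute("REPLACE INTO sph_counter SELECT 1, MAX(id) FROM documents")

def select_ids(op):
    """Return the ids matched when the delta condition uses comparison operator `op`."""
    sql = (
        f"SELECT id FROM documents WHERE id {op} "
        "(SELECT max_doc_id FROM sph_counter WHERE counter_id = 1) ORDER BY id"
    )
    return [row[0] for row in conn.execute(sql)]

greater = select_ids(">")         # nothing is newer than the freshly stored maximum
equal = select_ids("=")           # only the row holding the maximum id
less_or_equal = select_ids("<=")  # everything the main index would cover

print(greater, equal, less_or_equal)
```

Under that assumption the `>` query has nothing to return, because no row can have an id above a maximum that was recorded a moment earlier.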
2, Next I changed the "greater than" in WHERE id>( SELECT max_doc_id FROM sph_counter WHERE counter_id=1 ) to "equal to", and ran into another problem with the following operation:
INSERT INTO `documents` ( `group_id`, `group_id2`, `date_added`, `title`, `content`) VALUES
(1, 5, '2009-08-15 19:13:27', 'test one', 'this is my test document number one. also checking search within phrases.'),
(1, 6, '2009-08-15 19:13:27', 'test two', 'this is my test document number two')
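Here I inserted two rows in a single statement, but the delta condition only records the last inserted row, and that is where the problem starts.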
[root@BlackGhost log]# /usr/local/sphinx/bin/indexer --rotate --config /usr/local/sphinx/etc/sphinx.conf test1stemmed
Sphinx 0.9.8-rc2 (r1234)
Copyright (c) 2001-2008, Andrew Aksyonoff
using config file '/usr/local/sphinx/etc/sphinx.conf'...
indexing index 'test1stemmed'...
collected 1 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 1 docs, 10 bytes
total 0.010 sec, 1000.00 bytes/sec, 100.00 docs/sec
rotating indices: succesfully sent SIGHUP to searchd (pid=10807).
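When the delta index is rebuilt, it only picks up the last row. The row inserted just before it is lost unless the main index has been rebuilt as well.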
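The lost row can be reproduced outside Sphinx. The following is a minimal sketch using Python's `sqlite3` as a stand-in for MySQL; it assumes the counter is refreshed to `MAX(id)` right before the delta query runs, which is what the "collected 1 docs" output suggests. The three pre-existing rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE sph_counter (counter_id INTEGER PRIMARY KEY, max_doc_id INTEGER);
    INSERT INTO documents (title) VALUES ('old 1'), ('old 2'), ('old 3');
""")

# Rows already present are assumed to be covered by the main index.
ids_before = {row[0] for row in conn.execute("SELECT id FROM documents")}

# The two-row insert from the text (only the titles are kept here).
conn.execute("INSERT INTO documents (title) VALUES ('test one'), ('test two')")
new_ids = sorted(
    row[0] for row in conn.execute("SELECT id FROM documents") if row[0] not in ids_before
)

# Assumption: the counter is refreshed just before the delta query runs.
conn.execute("REPLACE INTO sph_counter SELECT 1, MAX(id) FROM documents")

# Delta condition with "=" instead of ">".
delta_ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM documents "
        "WHERE id = (SELECT max_doc_id FROM sph_counter WHERE counter_id = 1)"
    )
]

# New rows that the delta query never sees.
missed = [doc_id for doc_id in new_ids if doc_id not in delta_ids]
print(new_ids, delta_ids, missed)
```

Two rows were added, but an equality test against a single stored id can match at most one of them, so the first of the two never reaches the delta index.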
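III. Solution
1, In the table that stores the delta condition, do not store the ID of a single row. Store the time window in which rows were created instead. I set mine to one hour, that is, I record the hour in which a row was created. Sphinx then uses that hourly window to pick up the rows newly added to the data table. You can rebuild the delta index every 3 minutes, for example; the interval is up to you.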
[root@BlackGhost conf]# /usr/local/sphinx/bin/indexer --rotate --config /usr/local/sphinx/etc/sphinx.conf test1stemmed
Sphinx 0.9.8-rc2 (r1234)
Copyright (c) 2001-2008, Andrew Aksyonoff
using config file '/usr/local/sphinx/etc/sphinx.conf'...
indexing index 'test1stemmed'...
collected 22 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 22 docs, 950 bytes
total 0.026 sec, 36187.72 bytes/sec, 838.03 docs/sec
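The time-window idea from step 1 can be sketched as follows. This is only one possible reading of it, simulated with Python's `sqlite3` instead of MySQL and Sphinx: the table `sph_counter_time`, its column `window_start`, and the helper `hour_window_start` are hypothetical names, and the "current time" is fixed so the result is reproducible.

```python
import sqlite3
from datetime import datetime

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE documents (id INTEGER PRIMARY KEY, date_added TEXT, title TEXT);
    -- Hypothetical counter table: stores the start of the current window, not a row id.
    CREATE TABLE sph_counter_time (counter_id INTEGER PRIMARY KEY, window_start TEXT);
    INSERT INTO documents (date_added, title) VALUES
        ('2009-08-15 18:40:00', 'older row'),
        ('2009-08-15 19:13:27', 'test one'),
        ('2009-08-15 19:13:27', 'test two');
""")

def hour_window_start(now):
    """Truncate a datetime to the start of its hour, formatted like a MySQL DATETIME."""
    return now.replace(minute=0, second=0, microsecond=0).strftime("%Y-%m-%d %H:%M:%S")

# Fixed "now" for the example; a real run would use the current time.
now = datetime(2009, 8, 15, 19, 20, 0)
conn.execute("REPLACE INTO sph_counter_time VALUES (1, ?)", (hour_window_start(now),))

# Delta condition: every row created inside the current window, however many there are.
delta_titles = [
    row[0]
    for row in conn.execute("""
        SELECT title FROM documents
        WHERE date_added >= (SELECT window_start FROM sph_counter_time WHERE counter_id = 1)
        ORDER BY id
    """)
]
print(delta_titles)
```

Because the condition is a range over creation time rather than a match on one id, both rows from the two-row insert fall inside the window. In this sketch, rows created before the window start are assumed to be covered by the main index already.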
IV. References
http://www.sphinxsearch.com/docs/current.html#features
http://www.jcan.19dog.com/sphinx/#Install.linux
http://os.cnfan.net/linux/2421_3.html