Using the same data I tested in the previous post, I tried searching with the ngram parser using IN BOOLEAN MODE.
Natural Language
Natural language processing
By default or with the IN NATURAL LANGUAGE MODE modifier,
the MATCH() function performs a natural language search for a string against a text collection.
– text searched for is converted to a union of n-gram values. For example, ‘sql’ is converted to ‘sq ql’ (with a default token size of 2 or bigram).
Boolean
A type that handles only two values, true and false (conditions can be set with "+" and "-")
MySQL can perform boolean full-text searches using the IN BOOLEAN MODE modifier.
With this modifier, certain characters have special meaning at the beginning or end of words in the search string.
– text searched for is converted to an n-gram phrase search. For example, 'sql' is converted to '"sq ql"':
■ Difference in results between NATURAL LANGUAGE MODE and BOOLEAN MODE
root@localhost [ngram]> SELECT * FROM N_DEMO WHERE MATCH (title) AGAINST ('sql' IN NATURAL LANGUAGE MODE);
+----------+-------+
| FTS_N_ID | title |
+----------+-------+
|        1 | mysql |
|        2 | MYSQL |
|        3 | MySQL |
|        9 | sq    |
|       11 | ql    |
+----------+-------+
5 rows in set (0.00 sec)

root@localhost [ngram]> SELECT * FROM N_DEMO WHERE MATCH(title) AGAINST('sql' IN BOOLEAN MODE);
+----------+-------+
| FTS_N_ID | title |
+----------+-------+
|        1 | mysql |
|        2 | MYSQL |
|        3 | MySQL |
+----------+-------+
3 rows in set (0.00 sec)
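The difference between the two modes can be modeled roughly in a few lines of Python. This is only a sketch and not how InnoDB implements the search: it assumes a token size of 2, approximates the case-insensitive collation with lower(), ignores scoring, and includes only the alphabetic rows of N_DEMO. The helper names (bigrams, natural_language_match, boolean_match) are made up for illustration.

```python
# Rough model of ngram full-text matching with ngram_token_size = 2.
# Hypothetical helpers for illustration only; these are not MySQL internals.

# Alphabetic rows of N_DEMO (FTS_N_ID -> title); the Japanese rows are omitted.
ROWS = {1: "mysql", 2: "MYSQL", 3: "MySQL", 9: "sq", 10: "sl", 11: "ql"}


def bigrams(text, n=2):
    """Split text into overlapping n-character tokens (case-folded)."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def natural_language_match(term, rows=ROWS):
    """Natural language mode: the term becomes a union (OR) of its n-grams."""
    wanted = set(bigrams(term))
    return sorted(i for i, title in rows.items() if wanted & set(bigrams(title)))


def boolean_match(term, rows=ROWS):
    """Boolean mode: the term becomes an n-gram phrase, so its n-grams
    must appear consecutively and in order."""
    phrase = bigrams(term)
    hits = []
    for i, title in rows.items():
        tokens = bigrams(title)
        last_start = len(tokens) - len(phrase)
        if any(tokens[k:k + len(phrase)] == phrase for k in range(last_start + 1)):
            hits.append(i)
    return sorted(hits)


print(natural_language_match("sql"))  # [1, 2, 3, 9, 11]
print(boolean_match("sql"))           # [1, 2, 3]
```

Rows 9 (sq) and 11 (ql) each contain only one of the two bigrams, which is why they show up in the union but not in the phrase search.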
■ Checking the behavior of BOOLEAN mode below
– text searched
– wildcard searches
– phrase searches
text searched for is converted to an n-gram phrase search.
For example, 'sql' is converted to '"sq ql"':
12.9.2 Boolean Full-Text Searches
http://dev.mysql.com/doc/refman/5.7/en/fulltext-boolean.html
Basic settings and test data
root@localhost [ngram]> show variables like 'ngram_token_size';
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| ngram_token_size | 2     |
+------------------+-------+
1 row in set (0.00 sec)

root@localhost [ngram]> select * from N_DEMO;
+----------+-----------------------------+
| FTS_N_ID | title                       |
+----------+-----------------------------+
|        1 | mysql                       |
|        2 | MYSQL                       |
|        3 | MySQL                       |
|        4 | マイエスキューエル          |
|        5 | マイエスキューエル          |
|        6 | まいえすきゅーえる          |
|        7 | まい                        |
|        8 | えす                        |
|        9 | sq                          |
|       10 | sl                          |
|       11 | ql                          |
+----------+-----------------------------+
11 rows in set (0.00 sec)
■ example)
's*' matches every row that has an n-gram token starting with s (in this data set, every row that contains s)
root@localhost [ngram]> SELECT * FROM N_DEMO WHERE MATCH(title) AGAINST('s*' IN BOOLEAN MODE);
+----------+-------+
| FTS_N_ID | title |
+----------+-------+
|        1 | mysql |
|        2 | MYSQL |
|        3 | MySQL |
|        9 | sq    |
|       10 | sl    |
+----------+-------+
5 rows in set (0.00 sec)
'sq*' is converted to '"sq"'
root@localhost [ngram]> SELECT * FROM N_DEMO WHERE MATCH(title) AGAINST('sq*' IN BOOLEAN MODE);
+----------+-------+
| FTS_N_ID | title |
+----------+-------+
|        1 | mysql |
|        2 | MYSQL |
|        3 | MySQL |
|        9 | sq    |
+----------+-------+
4 rows in set (0.00 sec)
'sql*' is equivalent to '"sq ql"':
root@localhost [ngram]> SELECT * FROM N_DEMO WHERE MATCH(title) AGAINST('sql*' IN BOOLEAN MODE);
+----------+-------+
| FTS_N_ID | title |
+----------+-------+
|        1 | mysql |
|        2 | MYSQL |
|        3 | MySQL |
+----------+-------+
3 rows in set (0.00 sec)

root@localhost [ngram]> SELECT * FROM N_DEMO WHERE MATCH(title) AGAINST('"sq ql"' IN BOOLEAN MODE);
+----------+-------+
| FTS_N_ID | title |
+----------+-------+
|        1 | mysql |
|        2 | MYSQL |
|        3 | MySQL |
+----------+-------+
3 rows in set (0.00 sec)
■ Adding data and checking with Japanese
The behavior is the same, and I confirmed that the target rows can be extracted.
I still need to add more data and test further.
root@localhost [ngram]> select * from N_DEMO;
+----------+--------------------------------+
| FTS_N_ID | title                          |
+----------+--------------------------------+
|        1 | mysql                          |
|        2 | MYSQL                          |
|        3 | MySQL                          |
|        4 | マイエスキューエル             |
|        5 | マイエスキューエル             |
|        6 | まいえすきゅーえる             |
|        7 | まい                           |
|        8 | えす                           |
|        9 | sq                             |
|       10 | sl                             |
|       11 | ql                             |
|       12 | まいーえすきゅーえる           |
|       13 | まいえーすきゅーえる           |
+----------+--------------------------------+
13 rows in set (0.00 sec)

root@localhost [ngram]> SELECT * FROM N_DEMO WHERE MATCH(title) AGAINST('ま*' IN BOOLEAN MODE);
+----------+--------------------------------+
| FTS_N_ID | title                          |
+----------+--------------------------------+
|        6 | まいえすきゅーえる             |
|        7 | まい                           |
|       12 | まいーえすきゅーえる           |
|       13 | まいえーすきゅーえる           |
+----------+--------------------------------+
4 rows in set (0.01 sec)

root@localhost [ngram]> SELECT * FROM N_DEMO WHERE MATCH(title) AGAINST('まい*' IN BOOLEAN MODE);
+----------+--------------------------------+
| FTS_N_ID | title                          |
+----------+--------------------------------+
|        6 | まいえすきゅーえる             |
|        7 | まい                           |
|       12 | まいーえすきゅーえる           |
|       13 | まいえーすきゅーえる           |
+----------+--------------------------------+
4 rows in set (0.00 sec)

root@localhost [ngram]> SELECT * FROM N_DEMO WHERE MATCH(title) AGAINST('まいえ*' IN BOOLEAN MODE);
+----------+--------------------------------+
| FTS_N_ID | title                          |
+----------+--------------------------------+
|        6 | まいえすきゅーえる             |
|       13 | まいえーすきゅーえる           |
+----------+--------------------------------+
2 rows in set (0.01 sec)

root@localhost [ngram]> SELECT * FROM N_DEMO WHERE MATCH(title) AGAINST('"まい いえ"' IN BOOLEAN MODE);
+----------+--------------------------------+
| FTS_N_ID | title                          |
+----------+--------------------------------+
|        6 | まいえすきゅーえる             |
|       13 | まいえーすきゅーえる           |
+----------+--------------------------------+
2 rows in set (0.01 sec)
Note: confirmed from the scores that AGAINST('まいえ*' IN BOOLEAN MODE) = AGAINST('"まい いえ"' IN BOOLEAN MODE).
root@localhost [ngram]> SELECT FTS_N_ID,MATCH (title) AGAINST('まいえ*' IN BOOLEAN MODE) AS score FROM N_DEMO;
+----------+--------------------+
| FTS_N_ID | score              |
+----------+--------------------+
|        1 |                  0 |
|        2 |                  0 |
|        3 |                  0 |
|        4 |                  0 |
|        5 |                  0 |
|        6 | 0.9228526949882507 |
|        7 |                  0 |
|        8 |                  0 |
|        9 |                  0 |
|       10 |                  0 |
|       11 |                  0 |
|       12 |                  0 |
|       13 | 0.9228526949882507 |
+----------+--------------------+
13 rows in set (0.00 sec)

root@localhost [ngram]> SELECT FTS_N_ID,MATCH (title) AGAINST('"まい いえ"' IN BOOLEAN MODE) AS score FROM N_DEMO;
+----------+--------------------+
| FTS_N_ID | score              |
+----------+--------------------+
|        1 |                  0 |
|        2 |                  0 |
|        3 |                  0 |
|        4 |                  0 |
|        5 |                  0 |
|        6 | 0.9228526949882507 |
|        7 |                  0 |
|        8 |                  0 |
|        9 |                  0 |
|       10 |                  0 |
|       11 |                  0 |
|       12 |                  0 |
|       13 | 0.9228526949882507 |
+----------+--------------------+
13 rows in set (0.01 sec)
Other additional checks
root@localhost [ngram]> SELECT * FROM N_DEMO WHERE MATCH(title) AGAINST('"mysql"' IN BOOLEAN MODE);
+----------+-------+
| FTS_N_ID | title |
+----------+-------+
|        1 | mysql |
|        2 | MYSQL |
|        3 | MySQL |
+----------+-------+
3 rows in set (0.01 sec)

root@localhost [ngram]> SELECT * FROM N_DEMO WHERE MATCH(title) AGAINST('"sql"' IN BOOLEAN MODE);
+----------+-------+
| FTS_N_ID | title |
+----------+-------+
|        1 | mysql |
|        2 | MYSQL |
|        3 | MySQL |
+----------+-------+
3 rows in set (0.00 sec)

root@localhost [ngram]> SELECT * FROM N_DEMO WHERE MATCH(title) AGAINST('"mysql"' IN BOOLEAN MODE);
+----------+-------+
| FTS_N_ID | title |
+----------+-------+
|        1 | mysql |
|        2 | MYSQL |
|        3 | MySQL |
+----------+-------+
3 rows in set (0.00 sec)

root@localhost [ngram]> SELECT * FROM N_DEMO WHERE MATCH(title) AGAINST('"きゅー"' IN BOOLEAN MODE);
+----------+--------------------------------+
| FTS_N_ID | title                          |
+----------+--------------------------------+
|        6 | まいえすきゅーえる             |
|       12 | まいーえすきゅーえる           |
|       13 | まいえーすきゅーえる           |
+----------+--------------------------------+
3 rows in set (0.00 sec)

root@localhost [ngram]> SELECT * FROM N_DEMO WHERE MATCH(title) AGAINST('"きゅーえ"' IN BOOLEAN MODE);
+----------+--------------------------------+
| FTS_N_ID | title                          |
+----------+--------------------------------+
|       12 | まいーえすきゅーえる           |
|        6 | まいえすきゅーえる             |
|       13 | まいえーすきゅーえる           |
+----------+--------------------------------+
3 rows in set (0.01 sec)

root@localhost [ngram]> SELECT * FROM N_DEMO WHERE MATCH(title) AGAINST('きゅー' IN BOOLEAN MODE);
+----------+--------------------------------+
| FTS_N_ID | title                          |
+----------+--------------------------------+
|        6 | まいえすきゅーえる             |
|       12 | まいーえすきゅーえる           |
|       13 | まいえーすきゅーえる           |
+----------+--------------------------------+
3 rows in set (0.00 sec)
MySQL can perform boolean full-text searches using the IN BOOLEAN MODE modifier.
With this modifier, certain characters have special meaning at the beginning or
end of words in the search string. In the following query, the + and - operators indicate
that a word must be present or absent, respectively, for a match to occur.
In implementing this feature, MySQL uses what is sometimes referred to as implied Boolean logic, in which
+ stands for AND
- stands for NOT
[no operator] implies OR
Reference: http://dev.mysql.com/doc/refman/5.7/en/fulltext-boolean.html
root@localhost [ngram]> SELECT * FROM N_DEMO WHERE MATCH(title) AGAINST('きゅー' IN BOOLEAN MODE);
+----------+--------------------------------+
| FTS_N_ID | title                          |
+----------+--------------------------------+
|        6 | まいえすきゅーえる             |
|       12 | まいーえすきゅーえる           |
|       13 | まいえーすきゅーえる           |
+----------+--------------------------------+
3 rows in set (0.00 sec)

root@localhost [ngram]> SELECT * FROM N_DEMO WHERE MATCH (title) AGAINST ('+きゅー -まいー' IN BOOLEAN MODE);
+----------+--------------------------------+
| FTS_N_ID | title                          |
+----------+--------------------------------+
|        6 | まいえすきゅーえる             |
|       13 | まいえーすきゅーえる           |
+----------+--------------------------------+
2 rows in set (0.00 sec)

root@localhost [ngram]> SELECT * FROM N_DEMO WHERE MATCH (title) AGAINST ('+きゅー -えー' IN BOOLEAN MODE);
+----------+--------------------------------+
| FTS_N_ID | title                          |
+----------+--------------------------------+
|        6 | まいえすきゅーえる             |
|       12 | まいーえすきゅーえる           |
+----------+--------------------------------+
2 rows in set (0.00 sec)
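The + and - results above can be reproduced with a small Python model of the implied Boolean logic. It is a sketch under assumptions, not the server's algorithm: each search word is treated as an n-gram phrase (token size 2), only the hiragana rows of N_DEMO are included, ranking is ignored, and boolean_search and its helpers are hypothetical names.

```python
# Rough model of + / - operators in boolean mode with ngram_token_size = 2.
# Hypothetical helpers for illustration only; these are not MySQL internals.

# Hiragana rows of N_DEMO (FTS_N_ID -> title); other rows are omitted.
ROWS = {
    6: "まいえすきゅーえる",
    7: "まい",
    8: "えす",
    12: "まいーえすきゅーえる",
    13: "まいえーすきゅーえる",
}


def ngrams(text, n=2):
    """Split text into overlapping n-character tokens."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def contains_phrase(tokens, phrase):
    """True if the tokens of phrase occur consecutively inside tokens."""
    span = len(phrase)
    return any(tokens[k:k + span] == phrase for k in range(len(tokens) - span + 1))


def boolean_search(query, rows=ROWS):
    """'+word' must be present, '-word' must be absent, a bare word is optional."""
    required, excluded, optional = [], [], []
    for word in query.split():
        if word.startswith("+"):
            required.append(ngrams(word[1:]))
        elif word.startswith("-"):
            excluded.append(ngrams(word[1:]))
        else:
            optional.append(ngrams(word))
    hits = []
    for row_id, title in rows.items():
        tokens = ngrams(title)
        if any(contains_phrase(tokens, p) for p in excluded):
            continue  # NOT: an excluded phrase is present
        if not all(contains_phrase(tokens, p) for p in required):
            continue  # AND: a required phrase is missing
        if not required and not any(contains_phrase(tokens, p) for p in optional):
            continue  # OR: without required words, one optional word must match
        hits.append(row_id)
    return sorted(hits)


print(boolean_search("きゅー"))          # [6, 12, 13]
print(boolean_search("+きゅー -まいー"))  # [6, 13]
print(boolean_search("+きゅー -えー"))    # [6, 12]
```

Row 12 drops out of the second query because it contains the phrase "まい いー", and row 13 drops out of the third because it contains the token "えー".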
root@localhost [ngram]> SET GLOBAL innodb_ft_aux_table="ngram/N_DEMO";
Query OK, 0 rows affected (0.00 sec)

root@localhost [ngram]> SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE;
+--------+--------------+-------------+-----------+--------+----------+
| WORD   | FIRST_DOC_ID | LAST_DOC_ID | DOC_COUNT | DOC_ID | POSITION |
+--------+--------------+-------------+-----------+--------+----------+
| いえ   |           15 |          15 |         1 |     15 |        3 |
| いー   |           14 |          14 |         1 |     14 |        3 |
| えす   |           14 |          14 |         1 |     14 |        9 |
| える   |           14 |          15 |         2 |     14 |       24 |
| える   |           14 |          15 |         2 |     15 |       24 |
| えー   |           15 |          15 |         1 |     15 |        6 |
| きゅ   |           14 |          15 |         2 |     14 |       15 |
| きゅ   |           14 |          15 |         2 |     15 |       15 |
| すき   |           14 |          15 |         2 |     14 |       12 |
| すき   |           14 |          15 |         2 |     15 |       12 |
| まい   |           14 |          15 |         2 |     14 |        0 |
| まい   |           14 |          15 |         2 |     15 |        0 |
| ゅー   |           14 |          15 |         2 |     14 |       18 |
| ゅー   |           14 |          15 |         2 |     15 |       18 |
| ーえ   |           14 |          15 |         2 |     14 |        6 |
| ーえ   |           14 |          15 |         2 |     14 |       15 |
| ーえ   |           14 |          15 |         2 |     15 |       21 |
| ーす   |           15 |          15 |         1 |     15 |        9 |
+--------+--------------+-------------+-----------+--------+----------+
18 rows in set (0.00 sec)
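The WORD and DOC_COUNT columns of the cache output can be cross-checked with a short Python sketch that splits the two most recently added titles into bigrams and counts how many documents contain each one. The mapping of DOC_ID 14 and 15 to the rows with FTS_N_ID 12 and 13 is an assumption inferred from the tokens listed (for example いー appears only in DOC_ID 14), and doc_counts is a hypothetical helper. POSITION is deliberately left out of the model.

```python
# Rough cross-check of WORD / DOC_COUNT for the two rows added last.
# Hypothetical helper for illustration only; this is not a MySQL internal.
from collections import Counter

# Assumed mapping: DOC_ID 14 and 15 are the rows with FTS_N_ID 12 and 13.
DOCS = {
    14: "まいーえすきゅーえる",
    15: "まいえーすきゅーえる",
}


def ngrams(text, n=2):
    """Split text into overlapping n-character tokens."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def doc_counts(docs=DOCS):
    """Map each distinct bigram to the number of documents containing it."""
    counts = Counter()
    for text in docs.values():
        # set() so a bigram repeated inside one document is counted once
        counts.update(set(ngrams(text)))
    return dict(counts)


for word, count in sorted(doc_counts().items()):
    print(word, count)
```

Under these assumptions the sketch yields 11 distinct bigrams, with DOC_COUNT 1 for the tokens that occur in only one of the two titles (いえ, いー, えす, えー, ーす) and 2 for the rest.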
Reference: http://mysqlserverteam.com/innodb-full-text-n-gram-parser/