Channel: Planet MySQL

MySQL5.7 with FTS NGRAM(IN BOOLEAN MODE)

Using the same data I tested in the previous post, I tried searching with NGRAM in IN BOOLEAN MODE.

Natural Language
Natural language processing
By default or with the IN NATURAL LANGUAGE MODE modifier,
the MATCH() function performs a natural language search for a string against a text collection.
– text searched for is converted to a union of n-gram values. For example, 'sql' is converted to 'sq ql' (with a default token size of 2, i.e. bigram).

Boolean
A type that handles only two kinds of values, true and false (conditions can be set with "+" and "-")
MySQL can perform boolean full-text searches using the IN BOOLEAN MODE modifier.
With this modifier, certain characters have special meaning at the beginning or end of words in the search string.
– text searched for is converted to an n-gram phrase search. For example, 'sql' is converted to '"sq ql"'.
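
The two conversions can be mimicked outside MySQL with a rough sketch. The Python below is not MySQL's implementation: `ngrams`, `natural_match`, and `boolean_match` are hypothetical helper names, case-insensitive matching is an assumption, and the titles are the ASCII rows of the N_DEMO table used in this post.

```python
def ngrams(text, n=2):
    """Split text into overlapping n-character tokens (bigrams by default)."""
    text = text.lower()  # assumption: the column uses a case-insensitive collation
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def natural_match(title, term, n=2):
    """Natural language mode: match if ANY n-gram of the term occurs in the title."""
    return bool(set(ngrams(term, n)) & set(ngrams(title, n)))

def boolean_match(title, term, n=2):
    """Boolean mode: the term's n-grams must occur consecutively, as a phrase."""
    doc, query = ngrams(title, n), ngrams(term, n)
    if not query:
        return False
    return any(doc[i:i + len(query)] == query
               for i in range(len(doc) - len(query) + 1))

# ASCII titles taken from the N_DEMO table in this post
titles = ["mysql", "MYSQL", "MySQL", "sq", "sl", "ql"]
natural_hits = [t for t in titles if natural_match(t, "sql")]
boolean_hits = [t for t in titles if boolean_match(t, "sql")]
print(natural_hits)  # rows sharing at least one of the tokens 'sq', 'ql'
print(boolean_hits)  # rows containing 'sq' immediately followed by 'ql'
```

Under these assumptions the union search also picks up the short rows 'sq' and 'ql', while the phrase search keeps only the three 'mysql' variants.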

■ Difference in results between NATURAL LANGUAGE MODE and BOOLEAN MODE

root@localhost [ngram]> SELECT * FROM N_DEMO WHERE MATCH (title) AGAINST ('sql' IN NATURAL LANGUAGE MODE);
+----------+-------+
| FTS_N_ID | title |
+----------+-------+
|        1 | mysql |
|        2 | MYSQL |
|        3 | MySQL |
|        9 | sq    |
|       11 | ql    |
+----------+-------+
5 rows in set (0.00 sec)

root@localhost [ngram]> SELECT * FROM N_DEMO WHERE MATCH(title) AGAINST('sql' IN BOOLEAN MODE);
+----------+-------+
| FTS_N_ID | title |
+----------+-------+
|        1 | mysql |
|        2 | MYSQL |
|        3 | MySQL |
+----------+-------+
3 rows in set (0.00 sec)

root@localhost [ngram]> 

■ Below, checking the behavior of BOOLEAN mode
– text searched
– wildcard searches
– phrase searches

text searched for is converted to an n-gram phrase search.
For example, 'sql' is converted to '"sq ql"'.

12.9.2 Boolean Full-Text Searches

http://dev.mysql.com/doc/refman/5.7/en/fulltext-boolean.html

Basic settings and test data

root@localhost [ngram]> show variables like 'ngram_token_size';
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| ngram_token_size | 2     |
+------------------+-------+
1 row in set (0.00 sec)

root@localhost [ngram]> select * from N_DEMO;
+----------+-----------------------------+
| FTS_N_ID | title                       |
+----------+-----------------------------+
|        1 | mysql                       |
|        2 | MYSQL                       |
|        3 | MySQL                       |
|        4 | マイエスキューエル          |
|        5 | マイエスキューエル                   |
|        6 | まいえすきゅーえる          |
|        7 | まい                        |
|        8 | えす                        |
|        9 | sq                          |
|       10 | sl                          |
|       11 | ql                          |
+----------+-----------------------------+
11 rows in set (0.00 sec)

■ Example
's*' matches every row containing an n-gram token that starts with s


root@localhost [ngram]> SELECT * FROM N_DEMO WHERE MATCH(title) AGAINST('s*' IN BOOLEAN MODE);
+----------+-------+
| FTS_N_ID | title |
+----------+-------+
|        1 | mysql |
|        2 | MYSQL |
|        3 | MySQL |
|        9 | sq    |
|       10 | sl    |
+----------+-------+
5 rows in set (0.00 sec)

root@localhost [ngram]> 

'sq*' is converted to '"sq"'

root@localhost [ngram]> SELECT * FROM N_DEMO WHERE MATCH(title) AGAINST('sq*' IN BOOLEAN MODE);
+----------+-------+
| FTS_N_ID | title |
+----------+-------+
|        1 | mysql |
|        2 | MYSQL |
|        3 | MySQL |
|        9 | sq    |
+----------+-------+
4 rows in set (0.00 sec)

root@localhost [ngram]> 

'sql*' is equivalent to '"sq ql"':

root@localhost [ngram]> SELECT * FROM N_DEMO WHERE MATCH(title) AGAINST('sql*' IN BOOLEAN MODE);
+----------+-------+
| FTS_N_ID | title |
+----------+-------+
|        1 | mysql |
|        2 | MYSQL |
|        3 | MySQL |
+----------+-------+
3 rows in set (0.00 sec)

root@localhost [ngram]> SELECT * FROM N_DEMO WHERE MATCH(title) AGAINST('"sq ql"' IN BOOLEAN MODE);
+----------+-------+
| FTS_N_ID | title |
+----------+-------+
|        1 | mysql |
|        2 | MYSQL |
|        3 | MySQL |
+----------+-------+
3 rows in set (0.00 sec)

root@localhost [ngram]> 
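
The three wildcard results above are consistent with the rule in the MySQL manual: a prefix shorter than ngram_token_size matches rows containing a token that starts with the prefix, while a longer prefix is converted to an n-gram phrase and the wildcard is ignored. The Python below is only a sketch of that rule, not MySQL's code. `ngrams` and `wildcard_match` are hypothetical names, case-insensitive matching is assumed, and a prefix of exactly the token size is treated as a one-token phrase, following the "'sq*' is converted to '"sq"'" note in this post.

```python
def ngrams(text, n=2):
    """Split text into overlapping n-character tokens."""
    text = text.lower()  # assumption: case-insensitive collation
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def wildcard_match(title, prefix, n=2):
    """Emulate 'prefix*' in boolean mode under the ngram parser (sketch)."""
    doc = ngrams(title, n)
    prefix = prefix.lower()
    if len(prefix) < n:
        # Short prefix: any indexed token starting with it is a hit.
        return any(token.startswith(prefix) for token in doc)
    # Otherwise the wildcard is dropped and the prefix becomes a phrase.
    query = ngrams(prefix, n)
    return any(doc[i:i + len(query)] == query
               for i in range(len(doc) - len(query) + 1))

# ASCII titles taken from the N_DEMO table in this post
titles = ["mysql", "MYSQL", "MySQL", "sq", "sl", "ql"]
results = {p: [t for t in titles if wildcard_match(t, p)]
           for p in ("s", "sq", "sql")}
for prefix, hits in results.items():
    print(prefix + "*", hits)
```

With these assumptions the sketch returns the same 5, 4, and 3 titles as the three queries above.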

■ Adding data and checking with Japanese
The behavior is the same, and I confirmed that the target data can be extracted.
I still need to add more data and test further.

root@localhost [ngram]> select * from N_DEMO;
+----------+--------------------------------+
| FTS_N_ID | title                          |
+----------+--------------------------------+
|        1 | mysql                          |
|        2 | MYSQL                          |
|        3 | MySQL                          |
|        4 | マイエスキューエル             |
|        5 | マイエスキューエル                      |
|        6 | まいえすきゅーえる             |
|        7 | まい                           |
|        8 | えす                           |
|        9 | sq                             |
|       10 | sl                             |
|       11 | ql                             |
|       12 | まいーえすきゅーえる           |
|       13 | まいえーすきゅーえる           |
+----------+--------------------------------+
13 rows in set (0.00 sec)

root@localhost [ngram]> SELECT * FROM N_DEMO WHERE MATCH(title) AGAINST('ま*' IN BOOLEAN MODE);
+----------+--------------------------------+
| FTS_N_ID | title                          |
+----------+--------------------------------+
|        6 | まいえすきゅーえる             |
|        7 | まい                           |
|       12 | まいーえすきゅーえる           |
|       13 | まいえーすきゅーえる           |
+----------+--------------------------------+
4 rows in set (0.01 sec)

root@localhost [ngram]> SELECT * FROM N_DEMO WHERE MATCH(title) AGAINST('まい*' IN BOOLEAN MODE);
+----------+--------------------------------+
| FTS_N_ID | title                          |
+----------+--------------------------------+
|        6 | まいえすきゅーえる             |
|        7 | まい                           |
|       12 | まいーえすきゅーえる           |
|       13 | まいえーすきゅーえる           |
+----------+--------------------------------+
4 rows in set (0.00 sec)

root@localhost [ngram]> SELECT * FROM N_DEMO WHERE MATCH(title) AGAINST('まいえ*' IN BOOLEAN MODE);
+----------+--------------------------------+
| FTS_N_ID | title                          |
+----------+--------------------------------+
|        6 | まいえすきゅーえる             |
|       13 | まいえーすきゅーえる           |
+----------+--------------------------------+
2 rows in set (0.01 sec)

root@localhost [ngram]> SELECT * FROM N_DEMO WHERE MATCH(title) AGAINST('"まい いえ"' IN BOOLEAN MODE);
+----------+--------------------------------+
| FTS_N_ID | title                          |
+----------+--------------------------------+
|        6 | まいえすきゅーえる             |
|       13 | まいえーすきゅーえる           |
+----------+--------------------------------+
2 rows in set (0.01 sec)

root@localhost [ngram]> 

Note: confirming by score that AGAINST('まいえ*' IN BOOLEAN MODE) = AGAINST('"まい いえ"' IN BOOLEAN MODE).

root@localhost [ngram]> SELECT FTS_N_ID,MATCH (title) AGAINST('まいえ*' IN BOOLEAN MODE) AS score FROM N_DEMO;
+----------+--------------------+
| FTS_N_ID | score              |
+----------+--------------------+
|        1 |                  0 |
|        2 |                  0 |
|        3 |                  0 |
|        4 |                  0 |
|        5 |                  0 |
|        6 | 0.9228526949882507 |
|        7 |                  0 |
|        8 |                  0 |
|        9 |                  0 |
|       10 |                  0 |
|       11 |                  0 |
|       12 |                  0 |
|       13 | 0.9228526949882507 |
+----------+--------------------+
13 rows in set (0.00 sec)

root@localhost [ngram]> SELECT FTS_N_ID,MATCH (title) AGAINST('"まい いえ"' IN BOOLEAN MODE) AS score FROM N_DEMO;
+----------+--------------------+
| FTS_N_ID | score              |
+----------+--------------------+
|        1 |                  0 |
|        2 |                  0 |
|        3 |                  0 |
|        4 |                  0 |
|        5 |                  0 |
|        6 | 0.9228526949882507 |
|        7 |                  0 |
|        8 |                  0 |
|        9 |                  0 |
|       10 |                  0 |
|       11 |                  0 |
|       12 |                  0 |
|       13 | 0.9228526949882507 |
+----------+--------------------+
13 rows in set (0.01 sec)

root@localhost [ngram]> 

Other additional behavior checks

root@localhost [ngram]> SELECT * FROM N_DEMO WHERE MATCH(title) AGAINST('"mysql"' IN BOOLEAN MODE);
+----------+-------+
| FTS_N_ID | title |
+----------+-------+
|        1 | mysql |
|        2 | MYSQL |
|        3 | MySQL |
+----------+-------+
3 rows in set (0.01 sec)

root@localhost [ngram]> SELECT * FROM N_DEMO WHERE MATCH(title) AGAINST('"sql"' IN BOOLEAN MODE);
+----------+-------+
| FTS_N_ID | title |
+----------+-------+
|        1 | mysql |
|        2 | MYSQL |
|        3 | MySQL |
+----------+-------+
3 rows in set (0.00 sec)

root@localhost [ngram]> SELECT * FROM N_DEMO WHERE MATCH(title) AGAINST('"きゅー"' IN BOOLEAN MODE);
+----------+--------------------------------+
| FTS_N_ID | title                          |
+----------+--------------------------------+
|        6 | まいえすきゅーえる             |
|       12 | まいーえすきゅーえる           |
|       13 | まいえーすきゅーえる           |
+----------+--------------------------------+
3 rows in set (0.00 sec)

root@localhost [ngram]> SELECT * FROM N_DEMO WHERE MATCH(title) AGAINST('"きゅーえ"' IN BOOLEAN MODE);
+----------+--------------------------------+
| FTS_N_ID | title                          |
+----------+--------------------------------+
|       12 | まいーえすきゅーえる           |
|        6 | まいえすきゅーえる             |
|       13 | まいえーすきゅーえる           |
+----------+--------------------------------+
3 rows in set (0.01 sec)

root@localhost [ngram]> SELECT * FROM N_DEMO WHERE MATCH(title) AGAINST('きゅー' IN BOOLEAN MODE);
+----------+--------------------------------+
| FTS_N_ID | title                          |
+----------+--------------------------------+
|        6 | まいえすきゅーえる             |
|       12 | まいーえすきゅーえる           |
|       13 | まいえーすきゅーえる           |
+----------+--------------------------------+
3 rows in set (0.00 sec)

root@localhost [ngram]> 

MySQL can perform boolean full-text searches using the IN BOOLEAN MODE modifier.
With this modifier, certain characters have special meaning at the beginning or
end of words in the search string. In the following query, the + and - operators indicate
that a word must be present or absent, respectively, for a match to occur.

In implementing this feature, MySQL uses what is sometimes referred to as implied Boolean logic, in which
+ stands for AND
- stands for NOT
[no operator] implies OR

Reference: http://dev.mysql.com/doc/refman/5.7/en/fulltext-boolean.html
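
The implied Boolean logic can be sketched in a few lines. The Python below is a simplified model, not MySQL's parser: `phrase_match` and `boolean_search` are hypothetical names, only the + and - operators are handled, and each word is matched as a bigram phrase as described earlier in this post. The titles are the hiragana rows of N_DEMO.

```python
def ngrams(text, n=2):
    """Split text into overlapping n-character tokens."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def phrase_match(title, term, n=2):
    """True if the term's n-grams occur consecutively in the title."""
    doc, query = ngrams(title, n), ngrams(term, n)
    return bool(query) and any(
        doc[i:i + len(query)] == query
        for i in range(len(doc) - len(query) + 1))

def boolean_search(titles, expression):
    """Evaluate '+word -word word' style expressions (sketch only)."""
    required, excluded, optional = [], [], []
    for word in expression.split():
        if word.startswith("+"):
            required.append(word[1:])   # + : must be present (AND)
        elif word.startswith("-"):
            excluded.append(word[1:])   # - : must be absent (NOT)
        else:
            optional.append(word)       # no operator: any one suffices (OR)
    hits = []
    for title in titles:
        if any(phrase_match(title, w) for w in excluded):
            continue
        if not all(phrase_match(title, w) for w in required):
            continue
        if not required and not any(phrase_match(title, w) for w in optional):
            continue
        hits.append(title)
    return hits

# Hiragana titles taken from the N_DEMO table in this post
titles = ["まいえすきゅーえる", "まい", "えす",
          "まいーえすきゅーえる", "まいえーすきゅーえる"]
print(boolean_search(titles, "きゅー"))
print(boolean_search(titles, "+きゅー -まいー"))
print(boolean_search(titles, "+きゅー -えー"))
```

In this model an expression made only of - terms returns nothing, since exclusion needs something to exclude from.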

root@localhost [ngram]> SELECT * FROM N_DEMO WHERE MATCH(title) AGAINST('きゅー' IN BOOLEAN MODE);
+----------+--------------------------------+
| FTS_N_ID | title                          |
+----------+--------------------------------+
|        6 | まいえすきゅーえる             |
|       12 | まいーえすきゅーえる           |
|       13 | まいえーすきゅーえる           |
+----------+--------------------------------+
3 rows in set (0.00 sec)

root@localhost [ngram]> SELECT * FROM N_DEMO WHERE MATCH (title) AGAINST ('+きゅー -まいー' IN BOOLEAN MODE);
+----------+--------------------------------+
| FTS_N_ID | title                          |
+----------+--------------------------------+
|        6 | まいえすきゅーえる             |
|       13 | まいえーすきゅーえる           |
+----------+--------------------------------+
2 rows in set (0.00 sec)

root@localhost [ngram]> SELECT * FROM N_DEMO WHERE MATCH (title) AGAINST ('+きゅー -えー' IN BOOLEAN MODE);
+----------+--------------------------------+
| FTS_N_ID | title                          |
+----------+--------------------------------+
|        6 | まいえすきゅーえる             |
|       12 | まいーえすきゅーえる           |
+----------+--------------------------------+
2 rows in set (0.00 sec)

root@localhost [ngram]> 
root@localhost [ngram]> SET GLOBAL innodb_ft_aux_table="ngram/N_DEMO";
Query OK, 0 rows affected (0.00 sec)

root@localhost [ngram]> SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE;
+--------+--------------+-------------+-----------+--------+----------+
| WORD   | FIRST_DOC_ID | LAST_DOC_ID | DOC_COUNT | DOC_ID | POSITION |
+--------+--------------+-------------+-----------+--------+----------+
| いえ   |           15 |          15 |         1 |     15 |        3 |
| いー   |           14 |          14 |         1 |     14 |        3 |
| えす   |           14 |          14 |         1 |     14 |        9 |
| える   |           14 |          15 |         2 |     14 |       24 |
| える   |           14 |          15 |         2 |     15 |       24 |
| えー   |           15 |          15 |         1 |     15 |        6 |
| きゅ   |           14 |          15 |         2 |     14 |       15 |
| きゅ   |           14 |          15 |         2 |     15 |       15 |
| すき   |           14 |          15 |         2 |     14 |       12 |
| すき   |           14 |          15 |         2 |     15 |       12 |
| まい   |           14 |          15 |         2 |     14 |        0 |
| まい   |           14 |          15 |         2 |     15 |        0 |
| ゅー   |           14 |          15 |         2 |     14 |       18 |
| ゅー   |           14 |          15 |         2 |     15 |       18 |
| ーえ   |           14 |          15 |         2 |     14 |        6 |
| ーえ   |           14 |          15 |         2 |     14 |       15 |
| ーえ   |           14 |          15 |         2 |     15 |       21 |
| ーす   |           15 |          15 |         1 |     15 |        9 |
+--------+--------------+-------------+-----------+--------+----------+
18 rows in set (0.00 sec)

root@localhost [ngram]> 
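
One possible reading of the POSITION column, assumed here rather than confirmed by this post, is that it holds the byte offset at which each token starts. With UTF-8, each kana occupies 3 bytes, so consecutive bigrams would start 3 bytes apart. The sketch below uses a hypothetical helper, `bigram_byte_positions`, to compute those offsets for the row 'まいーえすきゅーえる'. It is not how InnoDB builds the index.

```python
def bigram_byte_positions(text, n=2, encoding="utf-8"):
    """Return (token, byte_offset) pairs for each overlapping n-gram."""
    positions = []
    offset = 0
    for i in range(len(text) - n + 1):
        positions.append((text[i:i + n], offset))
        # Advance by the encoded size of the character just passed.
        offset += len(text[i].encode(encoding))
    return positions

for token, position in bigram_byte_positions("まいーえすきゅーえる"):
    print(token, position)
```

Most rows listed for DOC_ID 14 agree with the offsets this prints (for example まい at 0, いー at 3, える at 24), although not every row fits this simple model, so treat it as an approximation.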

Reference: http://mysqlserverteam.com/innodb-full-text-n-gram-parser/

