phoenix 二级索引

概述

在HBase中,只有一个单一的按照字典序排序的rowKey索引,当使用rowKey来进行数据查询的时候速度较快,但是如果不使用rowKey来查询的话就会使用filter来对全表进行扫描,很大程度上降低了检索性能。而Phoenix提供了二级索引技术来应对这种使用rowKey之外的条件进行检索的场景。

Phoenix支持两种类型的索引技术:Global Indexing和Local Indexing,这两种索引技术分别适用于不同的业务场景(主要是偏重于读还是偏重于写)。下面分别对这两种索引技术简单使用一下,具体性能方面没有进行测试。

以上文字摘自官方文档!!!

Covered indexes(覆盖索引)

Phoenix is particularly powerful in that we provide covered indexes - we do not need to go back to the primary table once we have found the index entry. Instead, we bundle the data we care about right in the index rows, saving read-time overhead.

phoenix的强大之处在于提供了覆盖索引 – 通过走索引表,我们将不在需要查看主表。只要在索引表中包含我们所需的字段,就可以提高查询的效率。

for example (例如):

CREATE INDEX my_index ON my_table (v1,v2) INCLUDE(v3)

global indexes (全局索引)

Local indexing targets write heavy, space constrained use cases. Just like with global indexes, Phoenix will automatically select whether or not to use a local index at query-time. With local indexes, index data and table data co-reside on same server preventing any network overhead during writes. Local indexes can be used even when the query isn’t fully covered (i.e. Phoenix automatically retrieve the columns not in the index through point gets against the data table). Unlike global indexes, all local indexes of a table are stored in a single, separate shared table prior to 4.8.0 version. From 4.8.0 onwards we are storing all local index data in the separate shadow column families in the same data table. At read time when the local index is used, every region must be examined for the data as the exact region location of index data cannot be predetermined. Thus some overhead occurs at read-time.

全局索引设用于读多写少的场景,因为所有对表进行更新的操作(insert value and delete and insert select)都需要同时更新索引表,然而索引表分布在不同的节点上,更新的时候需要跨节点的通信去更新数据(会带来性能消耗)。在读数据的时候,如果查询的字段包含在索引表之中,会通过查询索引表直接返回查询结果,否则会扫描全表。

for example :

CREATE INDEX B_C_D_IDX ON DATA_TABLE(B,C,D);

-----------------------------------------------------------------------------
| select * from DATA_TABLE where B='B' and C='C' and D='D'|   索引程度最高   |
-----------------------------------------------------------------------------
| select * from DATA_TABLE where B='B' and C='C'          |   索引程度较高   |
-----------------------------------------------------------------------------
| select * from DATA_TABLE where B='B'                    |   索引程度中     |
-----------------------------------------------------------------------------
HBase Rowkey都是通过二进制数据的字典序排列存储。
也就意味着Row key前缀匹配度越高就越容易排在一起。
查询尽量遵循前缀匹配原则,避免触发全表扫描。

Functional Indexes (函数索引)

Functional indexes (available in 4.3 and above) allow you to create an index not just on columns, but on an arbitrary expressions. Then when a query uses that expression, the index may be used to retrieve the results instead of the data table. For example, you could create an index on UPPER(FIRST_NAME||‘ ’||LAST_NAME) to allow you to do case insensitive searches on the combined first name and last name of a person.
从Phoeinx4.3以上就支持函数索引,其索引不局限于列,可以合适任意的表达式来创建索引,当在查询时用到了这些表达式时就直接返回表达式结果。
创建函数索引:

CREATE INDEX UPPERINDEX ON CSVTABLES (UPPER(USERID || '  ' || URL))

查询: select * from (UPPER(USERID || '  ' || URL)) limit 1  

Local indexes(本地索引)

Local indexing targets write heavy, space constrained use cases. Just like with global indexes, Phoenix will automatically select whether or not to use a local index at query-time.
本地索引适用于写多读少,空间有限的场景,和全局索引一样,Phoneix在查询时会自动选择是否使用本地索引,使用本地索引,为避免进行写操作所带来的网络开销,索引数据和表数据都存放在相同的服务器中,当查询的字段不完全是索引字段时本地索引也会被使用,与全局索引不同的是,所有的本地索引都单独存储在同一张共享表中,由于无法预先确定region的位置,所以在读取数据时会检查每个region上的数据因而带来一定性能开销。
创建索引:

CREATE LOCAL INDEX MYINDEX ON CSVTABLES(USERID);

查询:
select * from CSVTABLES where USERID='USERID'

删除索引

DROP INDEX my_index ON my_table

如果在数据表中删除索引列,则索引将自动删除。此外,如果数据表中删除了一个被覆盖的列,那么它也将自动从索引中删除。

Life is more than the present.