I have a table with 1000+ partitions.

"Show partitions" command only lists a small number of partitions.

How can I show all partitions?

Update:

  1. I found "show partitions" command only lists exactly 500 partitions.

  2. "select ... where ..." only processes the 500 partitions!


Best Answer


The CLI has a limit on how much output it displays. I suggest exporting the output to a local file:

$hive -e 'show partitions table;' > partitions

hive> show partitions table_name;
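
Once the list is in a file, ordinary shell tools will confirm you actually got everything (a quick check, assuming the file name used above):

$ wc -l partitions    # count the partitions
$ tail partitions     # inspect the last few partition specs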

Okay, I'm writing this answer by extending wmky's answer above, and also assuming that you've configured MySQL for your metastore instead of Derby.

select PART_NAME FROM PARTITIONS WHERE TBL_ID=(SELECT TBL_ID FROM TBLS WHERE TBL_NAME='<table_name>');

The above query gives you all possible values of the partition columns.

Example:

hive> desc clicks_fact;
OK
time                    timestamp
..
day                     date
file_date               varchar(8)

# Partition Information
# col_name              data_type               comment

day                     date
file_date               varchar(8)
Time taken: 1.075 seconds, Fetched: 28 row(s)

I'm going to fetch the values of partition columns.

mysql> select PART_NAME FROM PARTITIONS WHERE TBL_ID=(SELECT TBL_ID FROM TBLS WHERE TBL_NAME='clicks_fact');
+-----------------------------------+
| PART_NAME                         |
+-----------------------------------+
| day=2016-08-16/file_date=20160816 |
| day=2016-08-17/file_date=20160816 |
........
| day=2017-09-09/file_date=20170909 |
| day=2017-09-08/file_date=20170909 |
| day=2017-09-09/file_date=20170910 |
| day=2017-09-10/file_date=20170910 |
+-----------------------------------+
1216 rows in set (0.00 sec)

This returns all the partition values (1216 rows in this example).

Note: join the DBS table ON DB_ID when a specific database matters (i.e., when multiple databases have a table with the same table_name); see the sketch below.
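
A minimal sketch of that join, assuming the standard MySQL-backed metastore schema (DBS.NAME holds the database name; 'my_database' is a placeholder):

-- List partitions of clicks_fact in a specific database
SELECT p.PART_NAME
FROM PARTITIONS p
JOIN TBLS t ON p.TBL_ID = t.TBL_ID
JOIN DBS d ON t.DB_ID = d.DB_ID
WHERE d.NAME = 'my_database'
  AND t.TBL_NAME = 'clicks_fact';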

You can see the Hive Metastore's partition information in the PARTITIONS table. You can join TBLS with PARTITIONS to query a specific table's partitions.

Yet another option is to communicate with the Hive Metastore via the Thrift protocol.
If you write code in Python, you may benefit from the hmsclient library:

Hive cli:

hive> create table test_table_with_partitions(f1 string, f2 int) partitioned by (dt string);
OK
Time taken: 0.127 seconds
hive> alter table test_table_with_partitions add partition(dt=20210504) partition(dt=20210505);
OK
Time taken: 0.152 seconds

Python cli:

>>> from hmsclient import hmsclient
>>> client = hmsclient.HMSClient(host='hive.metastore.location', port=9083)
>>> with client as c:
...     all_partitions = c.get_partitions(db_name='default',
...                                        tbl_name='test_table_with_partitions',
...                                        max_parts=24 * 365 * 3)
...
>>> print([{'dt': part.values[0]} for part in all_partitions])
[{'dt': '20210504'}, {'dt': '20210505'}]

NB: max_parts is a parameter that cannot be greater than 32767 (the Java short max value).
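
If you simply want as many partitions as the API allows, you can pass that maximum explicitly (a minimal variation of the call above, using the same hypothetical host):

>>> client = hmsclient.HMSClient(host='hive.metastore.location', port=9083)
>>> with client as c:
...     all_partitions = c.get_partitions(db_name='default',
...                                        tbl_name='test_table_with_partitions',
...                                        max_parts=32767)  # Java short max value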

If you have Airflow installed together with the apache.hive extra, you can create an hmsclient quite easily:

hive_hook = HiveMetastoreHook()
with hive_hook.metastore as hive_client:
    ... your code goes here ...
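
For example, to list the same partitions through the hook (a sketch, assuming the apache-airflow-providers-apache-hive package and the test table created above; the import path may differ in older Airflow versions):

from airflow.providers.apache.hive.hooks.hive import HiveMetastoreHook

# Connection details come from the Airflow metastore connection (metastore_default by default)
hive_hook = HiveMetastoreHook()
with hive_hook.metastore as hive_client:
    # hive_client speaks the same Thrift API as hmsclient above
    partitions = hive_client.get_partitions('default', 'test_table_with_partitions', 32767)
    print([p.values for p in partitions])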

This seems like a more robust way of communicating with the Hive Metastore than accessing the DB directly (and it is database-engine agnostic, BTW).