If you write code in Python, you may benefit from the hmsclient library:
Hive cli:
hive> create table test_table_with_partitions(f1 string, f2 int) partitioned by (dt string);
OK
Time taken: 0.127 seconds
hive> alter table test_table_with_partitions add partition(dt=20210504) partition(dt=20210505);
OK
Time taken: 0.152 seconds
Python cli:
>>> from hmsclient import hmsclient
>>> client = hmsclient.HMSClient(host='hive.metastore.location', port=9083)
>>> with client as c:
...     all_partitions = c.get_partitions(db_name='default',
...         tbl_name='test_table_with_partitions',
...         max_parts=24 * 365 * 3)
...
>>> print([{'dt': part.values[0]} for part in all_partitions])
[{'dt': '20210504'}, {'dt': '20210505'}]
NB: max_parts is a parameter that cannot be greater than 32767 (the Java short max value, since it is a Thrift i16 field).
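Because of that limit, it can be worth clamping whatever partition-count estimate you compute before passing it to get_partitions. A minimal sketch (the helper name clamp_max_parts is mine, not part of hmsclient):

```python
JAVA_SHORT_MAX = 32767  # max_parts is a Thrift i16, so this is the hard ceiling


def clamp_max_parts(requested: int) -> int:
    """Clamp a requested partition count to the Thrift short limit."""
    return min(requested, JAVA_SHORT_MAX)


# Three years of hourly partitions fits under the limit as-is:
print(clamp_max_parts(24 * 365 * 3))  # 26280
# Anything larger gets clamped:
print(clamp_max_parts(100_000))       # 32767
```

The clamped value can then be passed as max_parts in the get_partitions call shown above.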
If you have Airflow installed together with the apache.hive extra, you can create an hmsclient quite easily:
from airflow.providers.apache.hive.hooks.hive import HiveMetastoreHook

hive_hook = HiveMetastoreHook()
with hive_hook.metastore as hive_client:
    ...  # your code goes here
This seems a more robust way of communicating with the Hive Metastore than querying its backing database directly (and it is database-engine agnostic, by the way).