Hive is a Hadoop component that provides an SQL-like interface for accessing data stored in HDFS, adding data-warehousing facilities on top of it. The Hive service breaks HiveQL (HQL) statements down into MapReduce jobs and executes them across a Hadoop cluster. If you have a SQL or relational database background, this section will look very familiar. As with any database management system (DBMS), you can run Hive queries in many ways.
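To make the MapReduce translation more concrete, here is a conceptual Python sketch of how a query such as SELECT country, COUNT(*) FROM employees GROUP BY country could be split into a map phase, a shuffle that groups records by key, and a reduce phase. It is a simplified illustration under stated assumptions, not Hive's actual execution plan; the sample rows, the input splits and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical (name, country) rows standing in for table data.
ROWS: List[Tuple[str, str]] = [("John", "US"), ("Mary", "US"), ("Raj", "IN")]
# Hypothetical input splits; each would be processed by its own mapper task.
SPLITS = [ROWS[:2], ROWS[2:]]


def map_phase(rows: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, int]]:
    """Emit (key, 1) per row, keyed by the GROUP BY column."""
    for _name, country in rows:
        yield country, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group mapper output by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the values for each key, i.e. COUNT(*) per group."""
    return {key: sum(values) for key, values in groups.items()}


def run_query(splits: List[List[Tuple[str, str]]]) -> Dict[str, int]:
    # Mappers run over the splits independently; their output is merged.
    mapped = chain.from_iterable(map_phase(split) for split in splits)
    return reduce_phase(shuffle(mapped))


print(run_query(SPLITS))  # {'US': 2, 'IN': 1}
```

The point of the split into phases is that each mapper only needs its own slice of the data, and each reducer only needs the values for its keys, which is what lets the work spread across a cluster.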
Create Database syntax:
CREATE DATABASE IF NOT EXISTS <dbname>
COMMENT 'Holds all db tables'
LOCATION '/lib/warehouse/sample'
WITH DBPROPERTIES ('Use' = 'Demos',
'SchemaInfo' = 'db schema information');
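· The IF NOT EXISTS clause is useful in scripts that should create a database on the fly, if necessary, without failing when the database already exists.
· LOCATION overrides the default location of the database directory, which is otherwise <dbname>.db under the warehouse directory set by hive.metastore.warehouse.dir (/user/hive/warehouse by default).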
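The two clauses above can be modeled with a toy in-memory catalog. This Python sketch is a hypothetical stand-in for the Hive metastore, not its real API; it assumes hive.metastore.warehouse.dir is left at /user/hive/warehouse and shows IF NOT EXISTS returning quietly while LOCATION replaces the default <dbname>.db path.

```python
from typing import Dict, Optional

# Assumption: hive.metastore.warehouse.dir is left at its usual default.
DEFAULT_WAREHOUSE_DIR = "/user/hive/warehouse"


class ToyCatalog:
    """Hypothetical in-memory stand-in for the metastore's database list."""

    def __init__(self, warehouse_dir: str = DEFAULT_WAREHOUSE_DIR) -> None:
        self.warehouse_dir = warehouse_dir.rstrip("/")
        self.databases: Dict[str, str] = {}  # database name -> directory

    def create_database(self, name: str, if_not_exists: bool = False,
                        location: Optional[str] = None) -> str:
        if name in self.databases:
            if if_not_exists:
                # IF NOT EXISTS: keep the existing database and do not fail.
                return self.databases[name]
            raise ValueError(f"Database {name} already exists")
        # LOCATION, when given, replaces the default <warehouse>/<name>.db path.
        if location is not None:
            path = location
        else:
            path = f"{self.warehouse_dir}/{name}.db"
        self.databases[name] = path
        return path


catalog = ToyCatalog()
print(catalog.create_database("sample", if_not_exists=True,
                              location="/lib/warehouse/sample"))
# /lib/warehouse/sample
print(catalog.create_database("mydb"))
# /user/hive/warehouse/mydb.db
print(catalog.create_database("mydb", if_not_exists=True))
# /user/hive/warehouse/mydb.db
```

The last call returns the existing path instead of raising, which is what makes a setup script safe to rerun.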
Create Table syntax:
CREATE TABLE employees (
name STRING,
salary FLOAT,
subordinates ARRAY<STRING>,
deductions MAP<STRING, FLOAT>,
addr STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>)
COMMENT 'Description of the table'
PARTITIONED BY (country STRING, state STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '\002'
MAP KEYS TERMINATED BY '\003'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/user/hive/warehouse/mydb.db/employees'
TBLPROPERTIES ('creator'='me',
'created_at'='2012-01-02', ...);
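With PARTITIONED BY (country STRING, state STRING), Hive stores each partition in its own subdirectory of the table location, one level per partition column, named column=value; the country and state values come from these directory names rather than from the data files. A minimal Python sketch of that naming convention follows. It assumes simple values that need no escaping (Hive escapes special characters in partition values, which is omitted here), and the function names are hypothetical.

```python
from typing import Dict, Mapping

TABLE_LOCATION = "/user/hive/warehouse/mydb.db/employees"


def partition_path(table_location: str, partition: Mapping[str, str]) -> str:
    """Build one partition's directory, e.g. .../country=US/state=CA.

    Columns must be passed in PARTITIONED BY order (dicts keep insertion order).
    """
    parts = [f"{column}={value}" for column, value in partition.items()]
    return "/".join([table_location.rstrip("/")] + parts)


def partition_values(path: str) -> Dict[str, str]:
    """Recover partition column values from a partition directory path."""
    return dict(part.split("=", 1) for part in path.split("/") if "=" in part)


path = partition_path(TABLE_LOCATION, {"country": "US", "state": "CA"})
print(path)
# /user/hive/warehouse/mydb.db/employees/country=US/state=CA
print(partition_values(path))
# {'country': 'US', 'state': 'CA'}
```

Because the partition values are encoded in the path, a query filtering on country or state only needs to read the matching subdirectories.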
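· STRING, FLOAT, ARRAY, MAP and STRUCT are some of the data types Hive supports; STRING and FLOAT are primitive types, while ARRAY, MAP and STRUCT are collection (complex) types.
· A STRUCT groups named fields, each with its own type: addr holds street, city and state (STRING) and zip (INT), and a field is read with dot notation, e.g. addr.city.
· deductions is a MAP, i.e. a set of key-value pairs; here each key is a STRING and each value is a FLOAT, and a value is looked up by its key with deductions['<key>'].
· For ARRAY<STRING>, every item in subordinates is a STRING; items are read by zero-based index, e.g. subordinates[0].
· Because fields are terminated by ',', each row of the data files is comma-separated, much like a CSV file (although Hive's delimited text format does not understand CSV-style quoting). Within a field, array elements, struct fields and map entries are separated by '\002', and each map key is separated from its value by '\003'.
· Each record (row) is terminated by a newline ('\n').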
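The following Python sketch (not Hive's own SerDe code) parses one line laid out with the delimiters declared in the table definition, to show how the ',', '\002' and '\003' separators nest. '\002' and '\003' are octal escapes for the control characters Ctrl-B and Ctrl-C, and Python accepts the same octal spelling. The sample record and helper names are hypothetical, and country and state are absent because partition columns live in the directory path, not in the file.

```python
from typing import Dict, List, NamedTuple

FIELD_SEP = ","          # FIELDS TERMINATED BY ','
COLLECTION_SEP = "\002"  # COLLECTION ITEMS TERMINATED BY '\002'
MAP_KEY_SEP = "\003"     # MAP KEYS TERMINATED BY '\003'
LINE_SEP = "\n"          # LINES TERMINATED BY '\n'


class Address(NamedTuple):
    street: str
    city: str
    state: str
    zip: int


class Employee(NamedTuple):
    name: str
    salary: float
    subordinates: List[str]
    deductions: Dict[str, float]
    addr: Address


def parse_employee(line: str) -> Employee:
    # Top level: the five columns, separated by ','.
    name, salary, subs, deds, addr = line.split(FIELD_SEP)
    # ARRAY<STRING>: items separated by '\002'.
    subordinates = subs.split(COLLECTION_SEP) if subs else []
    # MAP<STRING, FLOAT>: entries separated by '\002', key/value by '\003'.
    deductions: Dict[str, float] = {}
    if deds:
        for entry in deds.split(COLLECTION_SEP):
            key, value = entry.split(MAP_KEY_SEP)
            deductions[key] = float(value)
    # STRUCT: fields separated by '\002', in declaration order.
    street, city, state, zip_code = addr.split(COLLECTION_SEP)
    return Employee(name, float(salary), subordinates, deductions,
                    Address(street, city, state, int(zip_code)))


def parse_file(text: str) -> List[Employee]:
    """Each non-empty line is one record."""
    return [parse_employee(line) for line in text.split(LINE_SEP) if line]


# Hypothetical sample record built from the separators above.
SAMPLE = FIELD_SEP.join([
    "John Doe",
    "100000.0",
    COLLECTION_SEP.join(["Mary Smith", "Todd Jones"]),
    COLLECTION_SEP.join(["Federal Taxes" + MAP_KEY_SEP + "0.2",
                         "State Taxes" + MAP_KEY_SEP + "0.05"]),
    COLLECTION_SEP.join(["1 Michigan Ave.", "Chicago", "IL", "60600"]),
])

emp = parse_employee(SAMPLE)
print(emp.subordinates[0], emp.deductions["Federal Taxes"], emp.addr.city)
# Mary Smith 0.2 Chicago
```

The accessors mirror the Hive expressions subordinates[0], deductions['<key>'] and addr.city. The sketch also shows why a comma inside a value is a problem for this format: the first split would produce too many columns.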