
# BigSQL cheat sheet

overview: how to run a query

## Warehouse

default directory in HDFS (`/special-folder/hive/warehouse`) to store tables

## Schema

directory (`/special-folder/hive/warehouse/user-personal-schema`) to store tables; tables can be organized into schemas

## Table

directory (`/special-folder/hive/warehouse/user-personal-schema/my-table-01`) with zero or more files

## Partitions

additional sub-directories that split the table's data by partition-column value, e.g.:

```
/special-folder/hive/warehouse/user-personal-schema/my-table-01/my-column-country=USA
/special-folder/hive/warehouse/user-personal-schema/my-table-01/my-column-country=GERMANY
```
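A layout like the one above is produced by a partitioned table; a minimal sketch of the DDL (table and column names are illustrative, written with underscores because hyphens are not valid SQL identifiers):

```sql
-- each distinct value of my_column_country gets its own sub-directory
create hadoop table if not exists my_table_01 (
    id int,
    name varchar(50)
)
partitioned by (my_column_country varchar(30))
stored as parquetfile;
```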

## commands

```
bigsql/bin start
bigsql/bin stop
bigsql/bin status
```

## configuration

`bigsql-conf.xml`

### switch on compatibility mode

run Big SQL 1.0 syntax inside Big SQL:

```sql
set syshadoop.compatibility_mode=1;
```

## Data types

compare data types:

* Declared type
* SQL type
* Hive type

### using strings

* avoid using STRING: its default length is 32k
* change the default string length:
  `set hadoop property bigsql.string.size=128`
* use VARCHAR instead (see the sketch below)
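A minimal sketch of declaring explicit VARCHAR lengths instead of STRING (table and column names are illustrative):

```sql
-- varchar(255) instead of string, which would default to 32k
create hadoop table users (
    id int,
    email varchar(255)
);
```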

### datetime ( not date! )

```
2003-12-23 00:00:00.0
```
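Such a value can be written as a timestamp literal (a minimal sketch; the table and column names are illustrative):

```sql
create hadoop table events (id int, created_at timestamp);
insert into events values (1, '2003-12-23 00:00:00.0');
```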

### boolean

boolean needs a workaround (see the sketch below)
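One common workaround, offered here as an assumption rather than the sheet's own recipe, is to store flags as SMALLINT with 0/1 values:

```sql
-- assumption: emulate boolean with smallint, 0 = false, 1 = true
create hadoop table flags (
    id int,
    is_active smallint
);
insert into flags values (1, 1), (2, 0);
```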

## create schema

```sql
create schema "my_schema";
use "my_schema";
drop schema "my_schema" cascade;
```

## create table ( @see hive.md )

```sql
create hadoop table if not exists my_table_into_my_schema (
    col1 int not null primary key,
    col2 varchar(50)
)
row format delimited
fields terminated by ','
lines terminated by '\n'
escaped by '\\'
null defined as '%THIS_IS_NULL%'
stored as sequencefile;  -- optionally: stored as [text | binary] sequencefile
-- other storage formats:
-- PARQUETFILE
-- ORC
-- RCFILE
-- TEXTFILE
```

avro table creation: avro table

## insert

* insert values (do not use in production): each command creates its own file with the inserted records
  ```sql
  insert into my_table_into_my_schema values (1,'first'), (2,'second'), (3,'third');
  ```
* file insert: copy a file (with delimiters between columns) into the table's folder, then sync the catalog
  ```sql
  call syshadoop.hcat_cache_sync('my_schema', 'my_table_into_my_schema');
  ```
* create table based on select
  ```sql
  CREATE HADOOP TABLE new_my_table STORED AS PARQUETFILE AS SELECT * FROM my_table_into_my_schema;
  ```
* load data
  ```sql
  LOAD HADOOP USING FILE URL 'sftp://my_host:22/sub-folder/file1'
  WITH SOURCE PROPERTIES ('field.delimiter' = '|')
  INTO TABLE my_table_into_my_schema APPEND;
  ```

## null value

* BigSQL 1.0: `''`
* BigSQL: `\N`
* can be specified as part of table creation:
  `NULL DEFINED AS 'null_representation'`

## JSqsh

CLI tool for working with any JDBC driver ( **J**ava **SQ**L **SH**ell )
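A typical first session might look like this (a sketch: `--setup` launches JSqsh's interactive connection wizard; the connection name `bigsql` is an assumed example):

```
jsqsh --setup     # define and save a named JDBC connection interactively
jsqsh bigsql      # connect using the saved connection name (assumed)
```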