Search SQL Server
in the community
from expert resources
on a platform
Code Tester for Oracle
SQL Optimizer for IBM DB2
SQL Optimizer for Oracle
SQL Optimizer for SQL Server
SQL Optimizer for Sybase
Spotlight on DB2 LUW
Spotlight on Oracle
Spotlight on SQL Server
Spotlight on Sybase
Toad Business Intelligence
Toad Data Modeler
Toad Data Point
Toad Extension for Eclipse
Toad for Cloud Databases
Toad for Hadoop
Toad for IBM DB2
Toad for MySQL
Toad for Oracle
Toad for SQL Server
Toad for Sybase
Toad Mac Edition
Eating the elephant one bite at a time: knowing your tables
SQL Server Community
Comment RSS Feed
Eating the elephant one bite at a time: dropping databases
7.16.2013 at 10:42am
Eating the elephant one bite at a time: creating a database using Hive
7.15.2013 at 6:00pm
Eating the elephant, one bite at a time: Loading data using Hive
7.27.2013 at 7:12pm
Eating the elephant one bite at a time: Some tips in setting up the Hortonworks Sandbox VM
7.14.2013 at 4:56pm
Eating the Elephant: Totally free videos showing an introduction to Visualising Big Data for Business Intelligence Professionals
10.5.2013 at 6:15pm
Hadoop Summit Europe 2014 Call for Abstracts is now open
9.21.2013 at 3:44pm
SQL Server 2008
SQL Server 2012
Eating the elephant one bite at a time: knowing your tables
7.19.2013 at 11:42pm
In this segment, we will look at the most basic unit of the database: tables. We will look at their implementation in SQL Server and also in Hive.
Tables, in SQL Server, contain the data in a database. SQL Server has a number of different types of database as well as the tables that are defined by the user. These table types include partitioned tables, temporary tables, system tables and wide tables. If you need more information on these types, please visit the
official Microsoft documentation here
In Hive, there are two types of tables: internal and external tables. In
our last topic
, we touched on tables and how we need to drop them in Hive before we can delete the database (unless you use the CASCADE command).
Cindy Gross has written an excellent blog topic on internal vs external tables, and you can
access it here
For SQL Server people, it is important not to think that the internal/external point is similar to the local / global temporary tables discussion. Local temporary tables are visible only to the current session, whereas global temporary tables are visible to all sessions. Temporary tables in SQL Server cannot be partitioned.
It is important to note that Hive tables are not akin to SQL Server system tables, where SQL Server stores its internal knowledge about the configuration of the server. In System tables, users cannot query or update these tables directly, and view the information via
Instead, the Internal/External point refers to the level at which Hive manages the table, and whether Hive sees the data as being shared across different tools, or simply used by Hive only. Internal and external tables in Hive can be partitioned, and we will look at this point further in a later post. Unlike system tables, internal tables are used to store data which isn't about the system itself, as in a SQL Server system table.
For our purposes here of eating the Hadoop elephant, one bite at a time, we will look at the differences between these table types. Here is a summary below:
Managed / Internal Tables - Hive owns the data, dropping the table deletes the table metadata and the actual data.
External tables - points at the data but does not own it. Dropping the table deletes the metadata, not the actual data.
The data for this table is taken from the UNICEF Report Card, which details childrens' wellbeing in different countries. Here, the data looks at childrens' wellbeing in rich countries, and
you can read the report here
Creating an External Table
In Hue, you need to be sure that you are pointing at the correct database.
On the left hand side, you will see a drop down list of databases.
We are going to choose the database 'IncomeInequality'
The external table can be created using the following statement, which is terminated by a semi-colon ";".
CREATE EXTERNAL TABLE IF NOT EXISTS UNICEFReportCard ( CountryName STRING, AvgRankPosition FLOAT, MaterialWellBeing FLOAT, HealthAndSafety FLOAT, EducationalWellBeing FLOAT, FamilyPeersRelationships FLOAT, BehavioursAndRisks FLOAT, SubjectiveWellBeing FLOAT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION '/user/hue/UNICEF';
This query can be copied and pasted into the Query button, and then you can execute the query
simply by pressing the Execute button.
In order to check that the table has been created, you can also do a simple SELECT statement in
Beeswax Query Editor in order to see if the data exists:
SELECT * FROM IncomeInequality.UNICEFReportCard;
This will return no data, but if it does not error, then we know that the table has been found.
Alternatively, you can click on the Beeswax (Hive UI) and then select the 'Tables' button.
Make sure you've selected the correct database, IncomeInequality, and then you will see the table on the right hand side.
When you click on the table, you can see two options: you can see the columns, or view a sample of the data.
So, how do we get data into the table?
In SQL Server, we'd use SQL Server Integration Services.
In Hive, using the Beeswax GUI, we load up some data files. The external table is essentially a structure over the data.
So, if we drop the table, we don't delete the data; it still resides in the data files.
There are a number of ways to load files:
The simplest is through the GUI. Simply select 'Upload' and insert your file.
If you have more than one file, you could select a zip archive and your files will be loaded.
One caveat here: When I was using Internet Explorer version 10.9,
the 'Upload' button didn't seem to work properly. I switched to FireFox, and it worked fine.
Once you've uploaded the file, you can see it in Beeswax.
Now, if you go back and view the table, you can see it contains the data from the file. Here is an example below:
You could also execute the following command from the Hive Query Editor in Beeswax:
LOAD DATA INPATH '/user/hue/UNICEF/UNICEFreportcard.csv' OVERWRITE INTO TABLE `IncomeInequality.unicefreportcard`
Using the interface or queries, how can you distinguish between an internal table and an external table?
How can you tell if a table is managed or external? Hive has a command which is like
in SQL Server is used to give you details about a table.
The Hive command is given here:
DESCRIBE EXTENDED and then the name of the table, for example:
DESCRIBE EXTENDED UNICEFReportCard;
We can see the output below. If you click on the image, it will pop up the original and you can read it better:
However, this command isn't very readable, so you might want to try the following:
DESCRIBE FORMATTED default.sample_07;
This produces a more pretty format, which is easier to read.
and you can see that this sample table, in the default database, is a managed or internal table.
The data files for an internal table are stored in the subdirectory 'warehouse',
which is where data for internal tables is stored. Hive looks after the data in these tables.
To summarise, we have looked at the different table types and how to create them, and how to upload data into them.
Thank you for joining me in this series so far. Our next step is to look at partitioning tables, and then we will start to look at analysing data.
Any questions in the meantime, please email me at email@example.com
Read the complete post at
About Toad World
Ads and Emails
Toad World is Sponsored by
Copyright © 2013 Dell Software Inc.
Translation provided by