SQL – Table Skew(Teradata)

Ok this one is a Teradata speciality. Skew….

Before I go into any detail on Skew just a quick “how teradata works” to get everyone on the same page.

Teradata bills itself as a data warehouse rather than a database because it’s designed be able to store and run huge clumps of data, it does this by spreading the data out over multi servers(amps) sort of like a RAID array. So for example you will have a 10 Amp system in theory each Amp should store about 10% of each table, that way any queries will be spread across all Amps therefore reducing load on any 1 Amp and making it run faster(parallels rather than serial if you will).(There is also redundancy so each amp has multiple table cross linked etc… but for now leave it)

What happens in reality though is that data can be SKEWed meaning that instead of a table been spread evenly across all AMP’s it will be clumped on 1 amp (normally your last amp I’ll explain later). This happens for two reasons, 1 Teradata groups data on each AMP based on Primary INDEX and 2. People write crap SQL using poor Primary INDEX.

As I mentioned before you last AMP will normally cop the brunt of SKEW this is because Teradata will plonk any Primary Indexes which are null all on the last AMP, so if you have create a badly indexed table which has say 100,000 rows and 90,000 have a null index, guess what that’s right 90,000 rows will all be sitting on Amp 10 (or what ever you last AMP is) and the other 10,000 spread nicely over the remaining 9 amps.

This means any queries you run on the table will run like a dog as all the processing will be on AMP 10 rather than across all AMPS. Also it means you will use alot of space again due to the way Teradata store data. I used the analogy before about Teradata been like RAID as it is true Teradata cross store data for redundancy just like raid, which means that if amp 10 has 90,000 rows of data Teradata will reserve 90,000 rows of data on each amp(not just the 10,000 left over) meaning that each amp has any additional 90,000 rows reserved for no reason(as a kicker it also won’t show in table size view so you will have 50gb free but be out of space…. nice huh).

To check the skew on a table you will first need it’s index.

use:

HELP
INDEX database.tablename

Once you have the table index use

SELECT
HASHAMP (HASHBUCKET (HASHROW (inserttableindexhere)))
,COUNT(1)
FROM inserttablenamehere
GROUPBY1
ORDERBY2DESC;

This will show a list of all the amp on your system and how much table data is stored on each, they should all be roughly the same give or take. But if 1 amp shows 10,000 and another shows 100….. well guess what your skew’d.
Skew is easy to fix, just make sure you primary indexes are as unique as possible, rembember indexes don’t have to be a single field you can combine fields to create an index

e.g.

CREATE
SELECT
whatever
,anotherfield

FROM

)
WITH DATA
PRIMARY
INDEX(whatever,whocares);

Anyway hope you find this handy, post up any tips you might have around improving table skew or table performance in general.

Leave a Reply

Your email address will not be published. Required fields are marked *