Ok this one is a Teradata speciality. Skew….
Before I go into any detail on Skew just a quick “how teradata works” to get everyone on the same page.
Teradata bills itself as a data warehouse rather than a database because it’s designed be able to store and run huge clumps of data, it does this by spreading the data out over multi servers(amps) sort of like a RAID array. So for example you will have a 10 Amp system in theory each Amp should store about 10% of each table, that way any queries will be spread across all Amps therefore reducing load on any 1 Amp and making it run faster(parrellel rather than serial if you will).(There is also redundacy so each amp has mulitple table cross linked etc… but for now leave it)
What happens in reality though is that data can be SKEWed meaning that instead of a table been spread evenly across all AMP’s it will be clumped on 1 amp (normally your last amp i’ll explain later). This happens for two reasons, 1 Teradata groups data on each AMP based on Primary INDEX and 2. People write crap SQL using poor Primary INDEX.
As I mentioned before you last AMP will normally cop the brunt of SKEW this is because Teradata will plonk any Primary Indexes which are null all on the last AMP, so if you have create a badly indexed table which has say 100,000 rows and 90,000 have a null index, guess what that’s right 90,000 rows will all be sitting on Amp 10 (or what ever you last AMP is) and the other 10,000 spread nicely over the remaining 9 amps.
This means any queries you run on the table will run like a dog as all the processing will be on AMP 10 rather than across all AMPS. Also it means you will use alot of space again due to the way Teradata store data. I used the analogy before about Teradata been like RAID as it is true teradata cross store data for reduntacy just like raid, which means that if amp 10 has 90,000 rows of data teradata will reserve 90,000 rows of data on each amp(not just the 10,000 left over) meaning that each amp has any additional 90,000 rows reserved for no reason(as a kicker it also won’t show in table size view so you will have 50gb free but be out of space…. nice huh).
To check the skew on a table you will first need it’s index.
Once you have the talbe index use
HASHAMP (HASHBUCKET (HASHROW (inserttableindexhere)))
Skew is easy to fix, just make sure you primary indexes are as unique as possible, rembember indexes don’t have to be a single field you can combine fields to create an index
Anyway hope you find this handy, post up any tips you might have around improving table skew or table performance in general.