Partitioning & Creating Hardware Tablespaces for Performance

1 Partitioning & Creating Hardware Tablespaces for Performance
By Lloyd Albin Partitioning & Creating Hardware Tablespaces for Performance 4/27/2014

2 What you will be learning
Partitioning
- How to split a single table into multiple tables by date, or by some other criterion that you use in your WHERE clauses, so that queries do not need to read all the data.
- How to truncate and reload child tables quickly without affecting the other tables.
- How to remove child tables and archive them, and how to add archived tables back in with a single command.
Tablespaces
- How to split your data across multiple media such as SSDs, hard drive clusters, long-term tape storage, etc.
- How to put your temp tables, used for sorting, etc., onto your fastest media.
Partitioning & Tablespaces
- How to combine partitioning and tablespaces to create a really efficient system for accessing your most frequently needed information.

3 Partitioning Creating Parent / Child Table Relationships

4 Partitioning
Postgres Documentation:
Partitions can be created using either ranges of values, such as date ranges, or lists of values. Here are some examples:
- By Month
- By Quarter
- By Year
- By Customer
- By Study
Parent and child tables do not need to live in the same schema; each child table can even be in a separate schema. For example, each study could be in its own schema, with the parent table linking all the child studies together.
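To make the one-schema-per-study idea concrete, here is a minimal sketch. It assumes a cut-down parent table named journal with a numeric dfstudy column; the schema names (study_101, study_102) and the study numbers are hypothetical and exist only for illustration.

```sql
-- Cut-down parent table (assumption: only the columns needed for the example).
CREATE TABLE journal (
  key BIGSERIAL,
  record_timestamp TIMESTAMP,
  dfstudy NUMERIC
);

-- Hypothetical schemas, one per study.
CREATE SCHEMA study_101;
CREATE SCHEMA study_102;

-- Each child lives in its own schema and holds exactly one study (list-style CHECK).
CREATE TABLE study_101.journal (
  CHECK (dfstudy = 101)
) INHERITS (public.journal);

CREATE TABLE study_102.journal (
  CHECK (dfstudy = 102)
) INHERITS (public.journal);

-- Querying the parent reads across all the per-study children.
SELECT dfstudy, count(*) FROM public.journal GROUP BY dfstudy;
```

The children can share the name journal because each one sits in a different schema; the parent is what ties them together.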

5 Create your parent table
This is really simple: just create a normal table.
    CREATE TABLE journal (
      key BIGSERIAL,
      record_timestamp TIMESTAMP,
      dfstudy NUMERIC,
      …
    );

6 Creating a child table
Now we can create a child table based on the format of the parent table. Let's create two child tables, one for this month and one for last month. It is a nicety to name the constraints, but Postgres will also auto-name them for you.
    CREATE TABLE journal_201404 (
      CHECK (
        record_timestamp >= '2014-04-01 00:00:00'::timestamp AND
        record_timestamp < '2014-05-01 00:00:00'::timestamp
      ),
      PRIMARY KEY (key)
    ) INHERITS (journal);
    CREATE INDEX journal_201404_dfstudy_idx ON journal_201404 (dfstudy);
    CREATE TABLE journal_201403 (
      CONSTRAINT journal_201403_check CHECK (
        record_timestamp >= '2014-03-01 00:00:00'::timestamp AND
        record_timestamp < '2014-04-01 00:00:00'::timestamp
      ),
      CONSTRAINT journal_201403_pkey PRIMARY KEY (key)
    ) INHERITS (journal);
    CREATE INDEX journal_201403_dfstudy_idx ON journal_201403 (dfstudy);
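The overview promises that child tables can be removed for archiving and added back with a single command. With inheritance-based partitioning, that command is most likely ALTER TABLE with NO INHERIT and INHERIT. The sketch below assumes the journal parent and the journal_201403 child from this slide.

```sql
-- Detach: journal_201403 keeps its rows but is no longer visible through journal.
ALTER TABLE journal_201403 NO INHERIT journal;

-- The detached table can now be dumped, moved, or dropped without touching the parent.
SELECT count(*) FROM journal_201403;

-- Re-attach: the table must still have all of the parent's columns with matching types.
ALTER TABLE journal_201403 INHERIT journal;
```

Detaching is a catalog change only, so it is fast no matter how much data the child table holds.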

7 Inserting Data
There are three ways to insert data into the child tables:
- Directly into the child table
- Parent table trigger
- Parent table rules

8 Inserting Data Directly
If you reload the data on a schedule, such as from a cron job, inserting directly has some benefits: we can quickly truncate the child table and then reload the data via a COPY command. Any query that tries to read this child table during the transaction will wait until the COMMIT completes.
    BEGIN;
    TRUNCATE TABLE journal_201403;
    COPY journal_201403 FROM STDIN WITH (FORMAT csv, HEADER true, DELIMITER '|');
    COMMIT;

9 Data via Trigger
A BEFORE trigger on the parent table routes each write to the correct child table and then returns NULL, so that nothing is written to the parent table itself.
    CREATE OR REPLACE FUNCTION journal_trigger_func() RETURNS TRIGGER AS $$
    DECLARE
      subtable TEXT;
      old_subtable TEXT;
    BEGIN
      IF (TG_OP = 'INSERT') THEN
        …
      ELSIF (TG_OP = 'DELETE') THEN
        …
      ELSIF (TG_OP = 'UPDATE') THEN
        …
      ELSE
        RAISE EXCEPTION '% is not supported via the journal table!', TG_OP;
      END IF;
      RETURN NULL; -- Abort write to parent table
    EXCEPTION WHEN OTHERS THEN
      -- Re-raise with the underlying error message attached
      RAISE EXCEPTION 'Writing journal child table failed! (%)', SQLERRM;
    END;
    $$ LANGUAGE plpgsql;
    CREATE TRIGGER journal_trigger
      BEFORE INSERT OR UPDATE OR DELETE ON journal
      FOR EACH ROW EXECUTE PROCEDURE journal_trigger_func();

10 Inserting Data via Trigger
We need to build the name of the table that we want to insert into and then insert the data into that table. You can also create the child tables automatically from within this function if they do not exist.
    IF (TG_OP = 'INSERT') THEN
      -- to_char zero-pads the month, giving e.g. journal_201404 (14 characters)
      subtable := 'journal_' || to_char(NEW.record_timestamp, 'YYYYMM');
      -- A CREATE TABLE IF NOT EXISTS for the child table could go here
      IF length(subtable) = 14 THEN
        -- Identifiers cannot be passed via USING, so build the statement with format()
        EXECUTE format('INSERT INTO %I SELECT ($1).*', subtable) USING NEW;
      ELSE
        RAISE EXCEPTION 'Failed to properly generate journal subtable name!';
      END IF;
    ELSIF (TG_OP = 'DELETE') THEN
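The slide mentions creating child tables automatically from within the trigger function. One way to do that is sketched below as an insert-only alternative to journal_trigger_func, not as an addition to it. The function and trigger names are hypothetical, and the sketch assumes monthly children named journal_YYYYMM with the same CHECK and primary key layout as the hand-made child tables.

```sql
CREATE OR REPLACE FUNCTION journal_insert_autocreate() RETURNS TRIGGER AS $$
DECLARE
  month_start TIMESTAMP;
  subtable    TEXT;
BEGIN
  IF NEW.record_timestamp IS NULL THEN
    RAISE EXCEPTION 'record_timestamp is required to pick a child table';
  END IF;

  month_start := date_trunc('month', NEW.record_timestamp);
  subtable    := 'journal_' || to_char(month_start, 'YYYYMM');

  -- Create the monthly child on first use; %I quotes the identifier,
  -- %L quotes the boundary timestamps as literals.
  EXECUTE format(
    'CREATE TABLE IF NOT EXISTS %I (
       CHECK (record_timestamp >= %L::timestamp AND
              record_timestamp <  %L::timestamp),
       PRIMARY KEY (key)
     ) INHERITS (journal)',
    subtable, month_start, month_start + interval '1 month');

  EXECUTE format('INSERT INTO %I SELECT ($1).*', subtable) USING NEW;
  RETURN NULL;  -- Abort write to parent table
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER journal_insert_autocreate_trigger
  BEFORE INSERT ON journal
  FOR EACH ROW EXECUTE PROCEDURE journal_insert_autocreate();
```

Running CREATE TABLE IF NOT EXISTS on every row adds overhead and emits a notice whenever the child already exists, and two sessions creating the same month at the same moment can still collide. Creating the next month's table ahead of time from a scheduled job is a common alternative.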

11 Deleting Data via Trigger
You should also check to make sure the child table exists, so that you can trap that error specifically instead of relying on the general error handler provided by WHEN OTHERS.
    ELSIF (TG_OP = 'DELETE') THEN
      old_subtable := 'journal_' || to_char(OLD.record_timestamp, 'YYYYMM');
      -- Check to make sure the table exists
      IF length(old_subtable) = 14 THEN
        EXECUTE format('DELETE FROM %I WHERE key = $1', old_subtable) USING OLD.key;
      ELSE
        RAISE EXCEPTION 'Failed to properly generate journal old_subtable name!';
      END IF;
    ELSIF (TG_OP = 'UPDATE') THEN

12 Updating Data via Trigger
You should also check to make sure the child table exists, so that you can trap that error specifically instead of relying on the general error handler provided by WHEN OTHERS. If the update moves the row into a different month, it has to be deleted from the old child table and inserted into the new one.
    ELSIF (TG_OP = 'UPDATE') THEN
      subtable := 'journal_' || to_char(NEW.record_timestamp, 'YYYYMM');
      old_subtable := 'journal_' || to_char(OLD.record_timestamp, 'YYYYMM');
      -- Check to make sure subtable exists
      IF subtable = old_subtable THEN
        EXECUTE format('UPDATE %I SET record_timestamp = $1, dfstudy = $2, … WHERE key = $3', old_subtable)
          USING NEW.record_timestamp, NEW.dfstudy, …, OLD.key;
      ELSE
        EXECUTE format('DELETE FROM %I WHERE key = $1', old_subtable) USING OLD.key;
        EXECUTE format('INSERT INTO %I SELECT ($1).*', subtable) USING NEW;
      END IF;
    ELSE

13 Inserting Data via Rule
Rules are ignored by COPY commands, which is why trigger functions are normally used. A rule has significantly more overhead than a trigger, but that overhead is paid once per query rather than once per row, so rules can be faster for bulk INSERTs. One rule is needed per child table.
    CREATE RULE journal_insert_201404 AS
    ON INSERT TO journal WHERE (
      record_timestamp >= '2014-04-01 00:00:00'::timestamp AND
      record_timestamp < '2014-05-01 00:00:00'::timestamp
    )
    DO INSTEAD
      INSERT INTO journal_201404 VALUES (NEW.*);

14 Selecting Data
The first example gets all the data from the parent and child tables. The second example gets the data only from the parent table and nothing from the child tables. The third and fourth examples get data directly from the child tables. The fifth example gets the data from only the journal and journal_201404 tables, because only that child's CHECK constraint can match the timestamp.
    SELECT * FROM journal;
    SELECT * FROM ONLY journal;
    SELECT * FROM journal_201403;
    SELECT * FROM journal_201404;
    SELECT * FROM journal WHERE record_timestamp = '2014-04-01 00:00:00';

15 Query Plan
In our example we are using 18 child tables, one per month, with a total of 6,425,250 rows of data at the time this example was run. 368 rows returned in 31 ms.
    SELECT * FROM journal
    WHERE record_timestamp >= '…:00:00'::timestamp
      AND record_timestamp < '…:00:00'::timestamp;
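To see which child tables a query like this touches, EXPLAIN can be run with constraint exclusion enabled. The following is a sketch only: the one-day date range is a hypothetical stand-in, and it assumes the journal parent with monthly child tables described above.

```sql
-- 'partition' is the default setting; it applies constraint exclusion
-- to inheritance children and UNION ALL subqueries.
SHOW constraint_exclusion;
SET constraint_exclusion = partition;

-- Hypothetical one-day range that falls inside a single monthly child.
EXPLAIN
SELECT * FROM journal
WHERE record_timestamp >= '2014-04-01 00:00:00'::timestamp
  AND record_timestamp <  '2014-04-02 00:00:00'::timestamp;
```

With exclusion working, the plan should mention only the parent table and the child whose CHECK constraint overlaps the range. If every child shows up, check the constraint_exclusion setting and the CHECK constraints.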

16 Traditional Single Table Query Plan
In our example we are using one table with a total of 6,425,250 rows of data at the time this example was run. The same 368 rows are returned, but in seconds rather than milliseconds: the traditional table version is more than 100 times slower.
    CREATE TABLE journal_all AS SELECT * FROM journal;
    SELECT * FROM journal_all
    WHERE record_timestamp >= '…:00:00'::timestamp
      AND record_timestamp < '…:00:00'::timestamp;

17 Caveats
- There is no built-in way to check whether two of the CHECK constraints overlap, for example when both cover the same day.
- You can't use run-time values such as NOW() in queries, because constraint exclusion only works with constants known at plan time; otherwise all child tables are searched.
- The CHECK constraint of every child table must be examined whenever a query is executed against the parent table, so planning time grows with the number of partitions.
- Use up to hundreds of partitions, but not thousands.
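The first two caveats can be explored from psql. The sketch below assumes the journal parent and its child tables; the literal date is a hypothetical stand-in for a value the application would compute itself before sending the query.

```sql
-- Run-time value: the planner cannot exclude any child, so all are scanned.
EXPLAIN
SELECT * FROM journal
WHERE record_timestamp >= now() - interval '1 day';

-- Constant supplied by the application: non-matching children can be excluded.
EXPLAIN
SELECT * FROM journal
WHERE record_timestamp >= '2014-04-26 00:00:00'::timestamp;

-- List each child's CHECK constraints side by side, so that
-- overlapping ranges can at least be spotted by eye.
SELECT c.relname AS child,
       con.conname AS constraint_name,
       pg_get_constraintdef(con.oid) AS definition
FROM pg_inherits i
JOIN pg_class c ON c.oid = i.inhrelid
JOIN pg_constraint con ON con.conrelid = c.oid AND con.contype = 'c'
WHERE i.inhparent = 'journal'::regclass
ORDER BY c.relname;
```

The catalog query only displays the constraint definitions; it does not detect overlaps by itself.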

18 Tablespaces

19 Multiple Disks & Directories
Tablespaces
- Tablespaces are physical disk locations that Postgres may use to store information.
- Tablespaces are cluster-wide: the definition of a tablespace lives at the cluster level and is not backed up by pg_dump.
- Multiple databases can use the same or different tablespaces. You may even select different tablespaces per table, index, etc.
- Tablespaces can only be created by superusers, but may be owned by a specified user.
[Diagrams: tablespaces A, B, C on multiple disks; A, B, C in multiple directories; A through E across multiple disks and directories]

20 Defining Tablespaces
    CREATE TABLESPACE fastspace LOCATION '/mnt/sda1/postgresql/data';
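Once a tablespace exists, individual tables and indexes can be placed in it. The sketch below assumes the fastspace tablespace defined on this slide; the role app_user and the table hot_lookup are hypothetical names used only for illustration.

```sql
-- Let a non-superuser role create objects in the tablespace (hypothetical role).
GRANT CREATE ON TABLESPACE fastspace TO app_user;

-- Place a table and one of its indexes on the fast media (hypothetical table).
CREATE TABLE hot_lookup (
  id BIGINT,
  label TEXT
) TABLESPACE fastspace;

CREATE INDEX hot_lookup_id_idx ON hot_lookup (id) TABLESPACE fastspace;

-- List the tablespaces known to the cluster and where they live on disk.
SELECT spcname, pg_tablespace_location(oid) AS location
FROM pg_tablespace;
```

An index does not automatically follow its table, so each index needs its own TABLESPACE clause unless default_tablespace is set.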

21 Default Tablespace
For any connection, you may change the default tablespace.
    SET default_tablespace = 'tbname';
    RESET default_tablespace;

22 Tablespaces for your temp tables
    SET temp_tablespaces = 'tbname,…';
    RESET temp_tablespaces;

23 Tablespaces for different Drive types
Example:
- SSD storage for data that you query all the time, large data sets, indexes, temp tables.
- Local storage for data that you access regularly.
- NAS storage for data accessed less often.
- Tape library storage for data that is only accessed once a year or less.
[Diagram: multiple disks: SSD, Local, NAS, Tape]

24 Caveats
- pg_dump will not dump your tablespace information. You must use pg_dumpall to get it.
- Tablespaces can't be created inside transactions.
- Tablespaces are only supported on systems that support symbolic links.

25 Using Partitions & Tablespaces

26 Tablespaces for different Drive types
Example:
- SSD for the last two months of data
- Local storage for 3 months to 1 year
- NAS storage for 1 to 5 years
- Tape storage for 6 years +
[Diagram: multiple disks: SSD, Local, NAS, Tape]
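Aging a monthly partition from one storage tier to the next can be done with ALTER TABLE ... SET TABLESPACE. This is a sketch under stated assumptions: the tablespace localspace and the child table journal_201402 (with its auto-named indexes) are hypothetical and must already exist.

```sql
-- Move a partition that has aged out of the two-month SSD window
-- to local storage (hypothetical tablespace name).
ALTER TABLE journal_201402 SET TABLESPACE localspace;

-- Indexes are moved separately from their table.
ALTER INDEX journal_201402_pkey SET TABLESPACE localspace;
ALTER INDEX journal_201402_dfstudy_idx SET TABLESPACE localspace;
```

Each statement physically copies the data and holds an exclusive lock on the object while it runs, so a move like this is best scheduled for a quiet period.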

