1
Blog: http://Lloyd.TheAlbins.com/ViewingNewRecords
Slides:
2
Viewing New Records By Lloyd Albin
3
In this presentation we will cover:
What you thought you knew about copying new records from a table with a serial id is all wrong once multiple writes are happening to your database inside multiple transactions. We ran into a situation with MS SQL where we missed some records from an append-only table. The table has a serial primary key, and we would grab any new serials since the last grab of data; we found that we missed some records because the inserts were being performed by more than one thread. Our developers asked how we could prevent this problem with PostgreSQL. This presentation is my response to them on what caused the problem and how to avoid it in PostgreSQL. © Fred Hutchinson Cancer Research Center
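To make the failure mode concrete, here is a minimal sketch of the serial-id grab pattern described above (the table and column names are illustrative, not from the original system):

-- Hypothetical append-only source table with a serial primary key
CREATE TABLE source_log (
    id SERIAL PRIMARY KEY,
    payload TEXT
);

-- Hypothetical destination table for the incremental copies
CREATE TABLE source_log_copy (LIKE source_log);

-- The naive incremental grab: take everything with an id newer than what we
-- already copied. This is the pattern that silently loses rows when a lower
-- id commits after a higher id has already been copied.
INSERT INTO source_log_copy
SELECT *
FROM source_log
WHERE id > (SELECT COALESCE(max(id), 0) FROM source_log_copy);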
4
Creating the Problem
5
Normal aka Single-Threaded Inserts
Here we demo single-threaded, non-transactional inserts. This is exactly what most developers expect.

CREATE TABLE public.testing (
  id SERIAL,
  val TEXT,
  PRIMARY KEY(id)
) WITH (oids = true);

INSERT INTO testing (val) VALUES ('test');
INSERT INTO testing (val) VALUES ('testing');

CREATE TABLE test_copy AS SELECT * FROM testing;

INSERT INTO testing (val) VALUES ('tested');

INSERT INTO test_copy SELECT * FROM testing WHERE id > (SELECT max(id) FROM test_copy);

SELECT * FROM testing;
SELECT * FROM test_copy;

public.testing:
 id | val
  1 | test
  2 | testing
  3 | tested

public.test_copy:
 id | val
  1 | test
  2 | testing
  3 | tested
6
Multi-Threaded Inserts
Here we demo multi-threaded, multi-transaction inserts. The results are not what most developers expect. Let's create our table, insert some data via two transactions, make a copy of the data, then insert some more data, finish our transaction, and copy the rest of the data.

-- Thread 3
CREATE TABLE public.testing (
  id SERIAL,
  val TEXT,
  PRIMARY KEY(id)
) WITH (oids = true);

-- Thread 1
BEGIN;
INSERT INTO testing (val) VALUES ('test');

-- Thread 2
INSERT INTO testing (val) VALUES ('testing');
COMMIT;

-- Thread 3
CREATE TABLE test_copy AS SELECT * FROM testing;

-- Thread 1
INSERT INTO testing (val) VALUES ('tested');
COMMIT;

-- Thread 3
INSERT INTO test_copy SELECT * FROM testing WHERE id > (SELECT max(id) FROM test_copy);
7
Multi-Threaded Inserts
When we compare the two tables, we can see that they are not the same. This is because record 1 was not visible at the time we copied record 2, since its transaction had not yet committed.

-- Thread 3
SELECT * FROM testing;
SELECT * FROM test_copy;

public.testing:
 id | val
  1 | test
  2 | testing
  3 | tested

public.test_copy:
 id | val
  2 | testing
  3 | tested
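A quick way to see what was lost (a sketch of my own, not from the slides): anti-join the source against the copy and list any rows that never made it across. In the scenario above this returns record 1.

-- Rows present in testing but missing from test_copy
SELECT t.*
FROM testing AS t
LEFT JOIN test_copy AS c ON c.id = t.id
WHERE c.id IS NULL;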
8
Looking for the Solution
This section goes through what I did to find a solution.
9
System Columns

Let's take a look at the system columns to see if they can help. If you created the table with oids turned on, the oid has the same issue as our serial id and transaction id; it just hits the problem faster because it is shared across many tables. The tableoid can be joined to pg_class.oid to find the table name and schema oid, so it is also no help to us. The xmin is our inserting transaction id. We can't use the transaction id either, because record 1's xmin is numerically lower than record 2's xmin even though record 1 committed later.

-- Thread 3
SELECT oid, tableoid, xmin, cmin, xmax, cmax, ctid, * FROM public.testing;

public.testing (the oid, tableoid and transaction id values are omitted here):
 ctid  | id | val
 (0,1) |  1 | test
 (0,2) |  2 | testing
 (0,3) |  3 | tested
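For completeness, a small sketch of the tableoid lookup mentioned above, resolving each row's tableoid to its table and schema via pg_class and pg_namespace:

SELECT n.nspname AS schema_name,
       c.relname AS table_name,
       t.*
FROM public.testing AS t
JOIN pg_catalog.pg_class AS c ON c.oid = t.tableoid
JOIN pg_catalog.pg_namespace AS n ON n.oid = c.relnamespace;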
10
System Columns

The cmin tells us the command number within the inserting transaction, but that is no help either. xmax and cmax are the deleting transaction id and command number respectively, and are no help to us. The ctid is our unique record locator, but it is in the format (page, line on page), and since record 1 was physically written before record 2, it reflects insert order rather than commit order, so it is also no help. This means that there is nothing in the system columns that will tell us we still need to grab record 1.

-- Thread 3
SELECT oid, tableoid, xmin, cmin, xmax, cmax, ctid, * FROM public.testing;
-- (same output as on the previous slide)
11
Page Item Attributes

This requires the pageinspect extension to be installed, plus my heap_page_item_attrs_details function, which can be obtained from my GitHub page or from the SQL file on the slide download page.

-- Thread 3
SELECT * FROM public.heap_page_item_attrs_details('public.testing');

The output of public.heap_page_item_attrs_details includes, per tuple: p, lp, lp_off, lp_flags, lp_len, t_xmin, t_xmax, t_field3, t_ctid, heap_hasnull, heap_hasvarwidth, heap_hasexternal, heap_hasoid, heap_xmax_keyshr_lock, heap_combocid, heap_xmax_excl_lock, heap_xmax_lock_only, heap_xmax_shr_lock and heap_lock_mask. For our three rows, t_ctid is (0,1), (0,2) and (0,3); the remaining values are omitted here.
12
Page Item Attributes (continued)

-- Thread 3
SELECT * FROM public.heap_page_item_attrs_details('public.testing');

Further output columns: heap_xmin_committed, heap_xmin_invalid, heap_xmax_committed, heap_xmax_invalid, heap_xmin_frozen, heap_xmax_is_multi, heap_updated, heap_moved_off, heap_moved_in, heap_moved, heap_xact_mask, heap_natts_mask, heap_keys_updated and heap_hot_updated; the values are omitted here.
13
Page Item Attributes (continued)

While this is a lot of great information for debugging issues with bloat, etc., none of it is actually useful in this case.

-- Thread 3
SELECT * FROM public.heap_page_item_attrs_details('public.testing');

Remaining output columns: heap_only_tuple, heap2_xact_mask, t_hoff, t_bits, t_oid and t_attrs; the raw byte values are omitted here.
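If you only have the stock pageinspect extension and not the heap_page_item_attrs_details wrapper, roughly the same raw information can be pulled directly (a sketch; the output columns differ from the wrapper's):

CREATE EXTENSION IF NOT EXISTS pageinspect;

-- Inspect the line pointers and tuple headers of block 0 of the table
SELECT *
FROM heap_page_items(get_raw_page('public.testing', 0));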
14
Transaction Commit Timestamps
Let's take a look at our transaction commit timestamps. Oops, we don't have that turned on in our config. This means we need to change "track_commit_timestamp" from off to on, either by editing postgresql.conf or with ALTER SYSTEM. Once you have done this, you will need to restart PostgreSQL. From then on, every new transaction will record a timestamp when it commits. There is a disk space price to pay for this feature, but for most people it is a small price to pay.

SELECT pg_catalog.pg_xact_commit_timestamp(xmin), * FROM public.testing;
-- ERROR: could not get commit timestamp data
-- HINT: Make sure the configuration parameter "track_commit_timestamp" is set.

ALTER SYSTEM SET track_commit_timestamp TO 'on';
-- Now restart postgres

-- Clean up and reset the test
DROP FUNCTION public.heap_page_item_attrs_details(table_name regclass);
DROP EXTENSION pageinspect;
DROP TABLE public.test_copy;
DROP TABLE public.testing;
-- Go to the slide "Multi-Threaded Inserts" and start the testing over.
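After the restart, a quick sanity check (my own sketch, not from the slides) confirms the setting took effect and that commit timestamps are being recorded:

SHOW track_commit_timestamp;   -- should now report 'on'

-- Once at least one transaction has committed after the restart,
-- this returns the xid and commit timestamp of the latest commit.
SELECT * FROM pg_catalog.pg_last_committed_xact();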
15
Transaction Commit Timestamps
Now we should get results that look like this. Since the pg_xact_commit_timestamp for record 2 is earlier than the pg_xact_commit_timestamp for record 1, a solution is possible. Let's reset our test, starting at the slide "Multi-Threaded Inserts", and then test our solution.

SELECT pg_catalog.pg_xact_commit_timestamp(xmin), * FROM public.testing;

public.testing (timestamps abbreviated):
 pg_xact_commit_timestamp | id | val
 …:22:…                   |  1 | test
 …:21:…                   |  2 | testing
                          |  3 | tested

-- Reset the test
DROP TABLE public.test_copy;
TRUNCATE TABLE public.testing;
16
Solution #1 Transaction Commit Timestamps
17
Solution #1 - Transaction Commit Timestamps
Let's run our test again. This time we will create two tables: one that is our copy of the new records, and one that tracks which records to grab next.

-- Thread 1
BEGIN;
INSERT INTO testing (val) VALUES ('test');

-- Thread 2
INSERT INTO testing (val) VALUES ('testing');
COMMIT;

-- Thread 3
-- Prepping for our copy
CREATE TABLE test_copy (LIKE public.testing);

CREATE TABLE test_copy_last_record (
  t_type TEXT,
  t_time TIMESTAMP WITH TIME ZONE,
  PRIMARY KEY(t_type)
);

INSERT INTO test_copy_last_record VALUES ('next', NULL), ('last', NULL);
18
Solution #1 - Transaction Commit Timestamps
First we get the latest transaction's commit timestamp and store it as the 'next' timestamp we want to copy up to. Then we do our data copy, reading everything committed after the 'last' timestamp up to and including the 'next' timestamp; the IS NULL check handles the very first run. Finally we move the 'last' timestamp forward to the 'next' timestamp. Note: pg_xact_commit_timestamp is a stable function, so calling it with the same transaction id within a transaction returns the same result without having to re-compute it; this means that if many rows were inserted by a single transaction, it only has to look up the value once.

-- Thread 3
-- Grab the latest transaction commit timestamp
INSERT INTO test_copy_last_record (t_type, t_time)
SELECT 'next', max(pg_catalog.pg_xact_commit_timestamp(xmin))
FROM public.testing
ON CONFLICT (t_type) DO UPDATE SET t_time = EXCLUDED.t_time;

-- Insert the new data committed after the last timestamp, up to and including the next timestamp
INSERT INTO test_copy
SELECT *
FROM public.testing
WHERE pg_catalog.pg_xact_commit_timestamp(xmin) <=
      (SELECT t_time FROM test_copy_last_record WHERE t_type = 'next')
  AND (pg_catalog.pg_xact_commit_timestamp(xmin) >
       (SELECT t_time FROM test_copy_last_record WHERE t_type = 'last')
       OR (SELECT t_time FROM test_copy_last_record WHERE t_type = 'last') IS NULL);

-- Update the last timestamp
INSERT INTO test_copy_last_record (t_type, t_time)
SELECT 'last', t_time FROM public.test_copy_last_record WHERE t_type = 'next'
ON CONFLICT (t_type) DO UPDATE SET t_time = EXCLUDED.t_time;
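If you want to run this repeatedly, one way to package the three steps is a small plpgsql function. This is a sketch of mine, not from the slides, and the function name copy_new_records is made up:

CREATE OR REPLACE FUNCTION public.copy_new_records() RETURNS void AS $$
BEGIN
    -- Step 1: remember the newest commit timestamp currently visible on the source table.
    INSERT INTO test_copy_last_record (t_type, t_time)
    SELECT 'next', max(pg_catalog.pg_xact_commit_timestamp(xmin))
    FROM public.testing
    ON CONFLICT (t_type) DO UPDATE SET t_time = EXCLUDED.t_time;

    -- Step 2: copy rows committed after 'last' and up to and including 'next'.
    INSERT INTO test_copy
    SELECT *
    FROM public.testing
    WHERE pg_catalog.pg_xact_commit_timestamp(xmin) <=
          (SELECT t_time FROM test_copy_last_record WHERE t_type = 'next')
      AND (pg_catalog.pg_xact_commit_timestamp(xmin) >
           (SELECT t_time FROM test_copy_last_record WHERE t_type = 'last')
           OR (SELECT t_time FROM test_copy_last_record WHERE t_type = 'last') IS NULL);

    -- Step 3: advance 'last' to 'next' for the following run.
    INSERT INTO test_copy_last_record (t_type, t_time)
    SELECT 'last', t_time FROM test_copy_last_record WHERE t_type = 'next'
    ON CONFLICT (t_type) DO UPDATE SET t_time = EXCLUDED.t_time;
END;
$$ LANGUAGE plpgsql;

Calling SELECT public.copy_new_records(); then performs one incremental copy pass.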
19
Solution #1 - Transaction Commit Timestamps
Now we finish our longer-running transaction and run the copy routine again. EXCLUDED is a special table name that references the values originally proposed for the insert.

-- Thread 1
INSERT INTO testing (val) VALUES ('tested');
COMMIT;

-- Thread 3
-- Grab the latest transaction commit timestamp
INSERT INTO test_copy_last_record (t_type, t_time)
SELECT 'next', max(pg_catalog.pg_xact_commit_timestamp(xmin))
FROM public.testing
ON CONFLICT (t_type) DO UPDATE SET t_time = EXCLUDED.t_time;

-- Insert the new data committed after the last timestamp, up to and including the next timestamp
INSERT INTO test_copy
SELECT *
FROM public.testing
WHERE pg_catalog.pg_xact_commit_timestamp(xmin) <=
      (SELECT t_time FROM test_copy_last_record WHERE t_type = 'next')
  AND (pg_catalog.pg_xact_commit_timestamp(xmin) >
       (SELECT t_time FROM test_copy_last_record WHERE t_type = 'last')
       OR (SELECT t_time FROM test_copy_last_record WHERE t_type = 'last') IS NULL);

-- Update the last timestamp
INSERT INTO test_copy_last_record (t_type, t_time)
SELECT 'last', t_time FROM public.test_copy_last_record WHERE t_type = 'next'
ON CONFLICT (t_type) DO UPDATE SET t_time = EXCLUDED.t_time;
20
Solution #1 - Transaction Commit Timestamps
With both copy passes done, test_copy now contains all three records, including record 1 that the serial-id approach missed.

-- Thread 3
SELECT * FROM test_copy;

public.test_copy:
 id | val
  1 | test
  2 | testing
  3 | tested
21
Solution #1 - Transaction Commit Timestamps
Here we can see the original code and an alternate version that runs faster because it does not have to scan the table.

-- Thread 3
-- Original code
-- Grab the latest transaction commit timestamp on the table
INSERT INTO test_copy_last_record (t_type, t_time)
SELECT 'next', max(pg_catalog.pg_xact_commit_timestamp(xmin))
FROM public.testing
ON CONFLICT (t_type) DO UPDATE SET t_time = EXCLUDED.t_time;

-- Alternate faster code
-- Grab the latest committed transaction's timestamp
INSERT INTO test_copy_last_record (t_type, t_time)
SELECT 'next', "timestamp"
FROM pg_catalog.pg_last_committed_xact()
ON CONFLICT (t_type) DO UPDATE SET t_time = EXCLUDED.t_time;
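One thing to keep in mind (my note, not from the slides): pg_last_committed_xact() reports the most recently committed transaction in the whole cluster, not just on public.testing, so the stored 'next' timestamp may be newer than any commit on this table; it should still work as an upper bound for the copy. Its output can be inspected directly:

-- Returns one row with the xid of the last committed transaction and its commit timestamp
SELECT * FROM pg_catalog.pg_last_committed_xact();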
22
Solution #2 Universal SQL Solution
23
Solution #2 - Universal SQL Solution
Let's reset our test again and then insert our first two records.

-- Thread 3
DROP TABLE public.test_copy;
TRUNCATE TABLE public.testing;

-- Thread 1
BEGIN;
INSERT INTO testing (val) VALUES ('test');

-- Thread 2
INSERT INTO testing (val) VALUES ('testing');
COMMIT;
24
Solution #2 - Universal SQL Solution
This time we will create a table that stores all the ids we have already copied. We add to test_copy any records that do not yet appear in copied_ids, then update copied_ids with the new ids from test_copy. The primary key could instead be a unique key; that matters especially if you are using a composite key and one or more of its fields can be null.

-- Thread 3
BEGIN;

-- Set up our two tables
CREATE TABLE public.copied_ids (
  id INTEGER,
  CONSTRAINT copied_ids_idx PRIMARY KEY(id)
);

CREATE TABLE test_copy (LIKE public.testing);

-- Add new records
INSERT INTO test_copy
SELECT testing.*
FROM public.testing
LEFT JOIN copied_ids ON testing.id = copied_ids.id
WHERE copied_ids.id IS NULL;

-- Add the ids that were just copied, so that we don't copy them again
INSERT INTO copied_ids
SELECT test_copy.id
FROM public.test_copy
LEFT JOIN copied_ids ON test_copy.id = copied_ids.id
WHERE copied_ids.id IS NULL;

COMMIT;
25
Solution #2 - Universal SQL Solution
Now we can commit our last record in thread 1 and then copy the new records. Finally we view the test_copy table to make sure it has the correct data.

-- Thread 1
INSERT INTO testing (val) VALUES ('tested');
COMMIT;

-- Thread 3
-- Add new records
INSERT INTO test_copy
SELECT testing.*
FROM public.testing
LEFT JOIN copied_ids ON testing.id = copied_ids.id
WHERE copied_ids.id IS NULL;

-- Add the ids that were just copied, so that we don't copy them again
INSERT INTO copied_ids
SELECT test_copy.id
FROM public.test_copy
LEFT JOIN copied_ids ON test_copy.id = copied_ids.id
WHERE copied_ids.id IS NULL;

SELECT * FROM test_copy;

public.test_copy:
 id | val
  1 | test
  2 | testing
  3 | tested
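As with Solution #1, the two statements can be packaged for repeated runs. This is a sketch of mine, not from the slides, and the function name copy_new_records_by_id is made up; both inserts execute inside the function's single transaction, so copied_ids cannot get out of sync with test_copy:

CREATE OR REPLACE FUNCTION public.copy_new_records_by_id() RETURNS void AS $$
BEGIN
    -- Copy any source rows whose ids we have not recorded yet.
    INSERT INTO test_copy
    SELECT testing.*
    FROM public.testing
    LEFT JOIN copied_ids ON testing.id = copied_ids.id
    WHERE copied_ids.id IS NULL;

    -- Record the ids that are now in the copy, so they are skipped next time.
    INSERT INTO copied_ids
    SELECT test_copy.id
    FROM public.test_copy
    LEFT JOIN copied_ids ON test_copy.id = copied_ids.id
    WHERE copied_ids.id IS NULL;
END;
$$ LANGUAGE plpgsql;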
26
Solution #2 - Universal SQL Solution
While the previous code is usable on just about any SQL database, there are some more PostgreSQL-specific versions. Version 1 uses USING instead of ON; Version 2 uses EXCEPT instead of a LEFT JOIN.

-- Thread 3

-- Version 1: USING instead of ON
-- Add new records
INSERT INTO test_copy
SELECT testing.*
FROM public.testing
LEFT JOIN copied_ids USING (id)
WHERE copied_ids.id IS NULL;

-- Add the ids that were just copied, so that we don't copy them again
INSERT INTO copied_ids
SELECT test_copy.id
FROM public.test_copy
LEFT JOIN copied_ids USING (id)
WHERE copied_ids.id IS NULL;

-- Version 2: EXCEPT instead of LEFT JOIN (replaces the SELECT in the "Add new records" step)
SELECT testing.*
FROM (
  SELECT id FROM public.testing
  EXCEPT
  SELECT id FROM public.copied_ids
) AS new_ids
LEFT JOIN public.testing USING (id);
27
MS SQL MS SQL Solution
28
MS SQL – MS SQL Solution

Use one of the views below to read commit_ts, which is assigned when a transaction commits and can be used to tell what order transactions were committed in. The xdes_id is the transaction id, which you will need to match up to the transaction that committed each record. When I tested these views today they were empty, but this is where Microsoft says the information should be; I may need to turn on a feature that I have not turned on.

-- SQL Server (starting with 2008)
-- Azure SQL Database
SELECT * FROM sys.dm_tran_commit_table;

-- Azure SQL Data Warehouse
-- Parallel Data Warehouse
SELECT * FROM sys.dm_pdw_nodes_tran_commit_table;

Columns include: commit_ts, xdes_id, commit_lbn, commit_csn, commit_time, pdw_node_id.