DMTN-099: Options for Generating Unique IDs in the LSST Gen3 Butler Registry

  • Chris Stephens

Latest Revision: 2018-11-15

1   Overview

In the Gen3 Butler Registry (RFC-484), IDs must be universally unique across all Gen3 Butler instances. This is because the design requires taking the union of tables from multiple registry instances in order to search across multiple sources of data. Additionally, the key generation method must work without connectivity to LSST central services. In the case of local Gen3 Butler instances, with or without connectivity, IDs do not have to be universally unique at generation time; they can be made unique later when required. Strong preference should be given to an ID generation method that works in all use cases.

Examples of Gen3 Butler usage requiring universally unique IDs:

  • Data Backbone may do ingestions at any endpoint and must not depend upon a particular endpoint being available to generate unique IDs for all endpoints.
  • Staff may want to use data from the Observatory Operations Data Service and the Data Backbone in the same task.
  • Users may generate their own data in their own Registry and want to use it and Data Backbone data in the same task.
  • Users may generate their own data in their own Registry and want to share with other users.

This document describes performance and scalability findings for three candidate key generation methods used to support the universally unique IDs requirement for the Gen3 Butler software.

Based on these findings, we recommend a two-column (SiteID, ID) database-generated key method. This option scales as well as or better than any other option and is least likely to cause additional issues.

Note

In the following discussions there are references to a merge process, which takes data from a source Registry (say, one inside a compute job) and loads it into a destination Registry. The merge makes IDs universally unique when necessary.

2   Description of ID generation methods

2.1   Registry software generates UUID1 key values.

  • Pros:
    1. Does not rely on any database implementation for key generation.
    2. Merge is just (bulk) inserts as IDs are already unique.
  • Cons:
    1. UUID1 values depend on the hardware used to generate them. Values are pseudo-sequential, with some chance of duplicates when separate processes on the same hardware generate values at the same time (demonstrated in the sketch after this list).
    2. Given enough concurrent sessions and high enough insertion rates, the sequential nature of the values results in index block contention.
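
A minimal demonstration of the pseudo-sequential structure of UUID1 values, using Python's standard uuid module (the exact values will vary by host and time):

import uuid

# UUID1 embeds a 60-bit, 100 ns resolution timestamp plus the host's
# MAC address, so values generated close together on one machine are
# nearly sequential and share most of their bytes.
for _ in range(3):
    u = uuid.uuid1()
    print(u, hex(u.time), hex(u.node))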

2.2   Registry software generates UUID4 key values.

  • Pros:
    1. Does not rely on any database implementation for key generation.
    2. Merge is just (bulk) inserts as IDs are already unique.
    3. UUID4 values are effectively random; duplicate values are not a practical concern.
  • Cons:
    1. Due to the randomness of the values, UUID4 primary key maintenance becomes an increasing burden as table and index size grow relative to the size of the SGA (Oracle's System Global Area, which includes the buffer cache). This is illustrated in the sketch below.
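
A minimal illustration of the scattering effect, using Python's uuid module (nothing here is specific to the Registry):

import uuid

# Consecutive UUID4 keys are uniformly distributed over the key space,
# so successive inserts tend to land in unrelated, likely cold, leaf
# blocks of the primary key index.
keys = [uuid.uuid4().hex for _ in range(5)]
for k in sorted(keys):
    print(k[:8])   # leading bytes share no common prefix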

2.3   Registry uses local database sequences.

A database sequence is the usual method to generate unique IDs for use within a database. However, a sequence alone will not meet the universally unique ID requirement across all Butler registries as there cannot be a central source of sequence values. A universally unique ID can be generated with the combination of local database sequences and an ingestion site ID that uniquely identifies each Gen3 Butler registry.

We considered several different approaches to implementing this. The most obvious implementation stores these IDs as a two-column primary key. Other implementations store these IDs as a single column value.

The variations in using local database sequences and site IDs for key generation all have similar pros and cons with respect to performance and scalability of database activity. For this reason, only one test is shown for this class of key generation methods (SEQ, the two-column variant). Where relevant, differences are noted in the “Test results and discussion” section of this document.

  • Pros:
    1. Inserts perform better than with the UUID-based methods, even when tables are large.
  • Cons:
    1. Keeping site IDs unique is mostly policy. Some IDs would be reserved for the Data Backbone endpoints.
    2. Merge becomes more complicated: to avoid a proliferation of site IDs for a particular endpoint or user (e.g., one per batch job), IDs in the source registry must be remapped to appropriate IDs in the destination registry (see the sketch after this list).
    3. As insertion rates and concurrency increase, blocks become “hot” and sessions begin to wait to acquire locks and latches used to manage concurrency.
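
To make this merge complication concrete, here is a hypothetical sketch of the remapping step; the row layout and helper names are illustrative assumptions, not part of the actual Registry schema.

import itertools

# Hypothetical sketch: rows arriving from a source registry are
# assigned fresh IDs from the destination registry's sequence, and a
# mapping is kept so that references to the old IDs can be rewritten
# consistently.
def merge_rows(source_rows, dest_seq, dest_site_id):
    """source_rows: iterable of (site_id, id, payload) tuples."""
    id_map = {}   # (src_site, src_id) -> (dest_site, dest_id)
    merged = []
    for src_site, src_id, payload in source_rows:
        dest_id = next(dest_seq)
        id_map[(src_site, src_id)] = (dest_site_id, dest_id)
        merged.append((dest_site_id, dest_id, payload))
    return merged, id_map

rows = [(9999, 1, "a"), (9999, 2, "b")]   # rows from a user's registry
merged, id_map = merge_rows(rows, itertools.count(501), dest_site_id=1)
print(merged)   # [(1, 501, 'a'), (1, 502, 'b')]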

Note

Special consideration must be given to raw images. They will often be ingested by more than a single Butler registry independent of the Data Backbone. Raw file key generation should be deterministic across all Butler registries. This makes sense for a number of reasons including easing migration of output datasets between Gen3 Butler registries.

Similar to raw images, certain EFD LFA files have the same requirements for the same reasons.
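
This document does not prescribe a particular scheme for deterministic raw-file keys, but one hypothetical approach is to derive the key from stable file metadata rather than from a sequence; the metadata fields used below are illustrative assumptions.

import hashlib

def raw_file_key(instrument, exposure_id, detector):
    """Derive a deterministic 64-bit key from stable metadata, so every
    registry that ingests the same raw file computes the same ID.
    The metadata fields used here are illustrative assumptions."""
    text = "{}:{}:{}".format(instrument, exposure_id, detector)
    digest = hashlib.sha256(text.encode()).digest()
    return int.from_bytes(digest[:8], "big")

print(raw_file_key("LSSTCam", 2018111500123, 42))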

Note

Also considered was an external generator of unique IDs, such as a central Oracle database providing unique sequence values or an open-source service such as Twitter Snowflake. This was not included in the testing because the Gen3 Butler requires the ability to perform registry inserts without network connectivity to any particular service; such a generator would also introduce additional dependencies with availability consequences.
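
For context, a Snowflake-style generator packs a millisecond timestamp, a worker ID, and a per-worker sequence number into a single 64-bit integer. A minimal sketch follows (the bit layout and epoch follow Twitter's published design; treat the details as illustrative):

import time

TWEPOCH_MS = 1288834974657   # Twitter's custom epoch, in milliseconds

def snowflake_id(worker_id, sequence, now_ms=None):
    # 41 bits of timestamp | 10 bits of worker ID | 12 bits of sequence.
    # Uniqueness still depends on a central authority assigning worker
    # IDs, which is why this class of solution was ruled out here.
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return ((now_ms - TWEPOCH_MS) << 22) | (worker_id << 12) | sequence

print(snowflake_id(worker_id=1, sequence=0))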

3   Description of tests performed

All tests were performed on a 3-node Oracle 12.2.0.1 RAC database with a 4 GB SGA per instance. Each instance runs on a server with 20 Intel Xeon E5-2650 v3 cores at 2.30 GHz and 128 GB of RAM. The database instances were purposely made small to highlight differences between key generation methods.

Separate tables were created for each ID generation method. Each table was seeded with approximately 20 million rows. The initial seeding of data was necessary to demonstrate the scalability issues inherent in UUID4-based IDs.

Database DDL database_ddl.sql

Each ID generation method was tested by calling a bash script that runs 30 concurrent copies of the Python script supplied as its first argument ($1). Each Python script performs 10,000 inserts using its respective key generation method.

./test_harness.sh uuid1_inserts.py
./test_harness.sh uuid4_inserts.py
./test_harness.sh seq_inserts.py

Bash control script test_harness.sh

UUID1 key generation script uuid1_inserts.py

UUID4 key generation script uuid4_inserts.py

SEQ key generation script seq_inserts.py

4   Test results and discussion

Active session history data was collected for each key generation method that was tested. Tanel Poder’s ashtop.sql script was used to highlight differences in the workload profile for each method.

Of primary concern is the total time taken to complete each test. The output below clearly shows the UUID4 approach takes significantly more time to complete than the other two methods.

SQL> @ashtop module "module in ('SEQ','UUID1','UUID4')" sysdate-1 sysdate

Seconds  MODULE
   2033  UUID4
    397  UUID1
    220  SEQ

The following output breaks down the total time for each test by instrumented wait event per database object. More detailed discussion follows, but you can see that database work involving index maintenance accounts for most of the processing time regardless of key generation method.

SQL> @ashtop event2,objt "module = 'UUID4'" sysdate-1 sysdate

Seconds  %This  EVENT2                                OBJT
   1975    97%  db file sequential read               UUID4_PK [INDEX]
     29     1%  ON CPU                                UUID4_PK [INDEX]
     12     1%  gc current grant 2-way                UUID4_PK [INDEX]
      5     0%  db file scattered read                UUID4_PK [INDEX]
      5     0%  gc buffer busy acquire [data block]   UUID4 [TABLE]
      2     0%  read by other session                 UUID4 [TABLE]
      1     0%  ON CPU                                UUID4 [TABLE]
      1     0%  db file scattered read                UUID4 [TABLE]
      1     0%  db file sequential read               UUID4 [TABLE]
      1     0%  gc current multi block request        UUID4 [TABLE]
      1     0%  read by other session                 UUID4_PK [INDEX]

11 rows selected.

SQL> @ashtop event2,objt "module = 'UUID1'" sysdate-1 sysdate

Seconds  %This  EVENT2                                OBJT
    180    45%  buffer busy waits [data block]        UUID1_PK [INDEX]
     91    23%  read by other session                 UUID1_PK [INDEX]
     45    11%  ON CPU                                UUID1_PK [INDEX]
     43    11%  enq: TX - index contention            UUID1_PK [INDEX]
     20     5%  gc buffer busy acquire [data block]   UUID1_PK [INDEX]
      9     2%  latch: ges resource hash list         UUID1_PK [INDEX]
      6     2%  db file sequential read               UUID1_PK [INDEX]
      1     0%  ON CPU                                UUID1 [TABLE]
      1     0%  db file scattered read                UUID1 [TABLE]
      1     0%  db file sequential read               UUID1 [TABLE]

SQL> @ashtop event2,objt "module = 'SEQ'" sysdate-1 sysdate

Seconds  %This  EVENT2                                OBJT
     92    42%  buffer busy waits [data block]        SEQ_PK [INDEX]
     60    27%  ON CPU                                SEQ_PK [INDEX]
     38    17%  enq: TX - index contention            SEQ_PK [INDEX]
     16     7%  latch: ges resource hash list         SEQ_PK [INDEX]
      6     3%  library cache: mutex X                SEQ_PK [INDEX]
      4     2%  ON CPU                                SEQ [TABLE]
      1     0%  buffer busy waits [data block]        SEQ [TABLE]
      1     0%  buffer deadlock                       SEQ_PK [INDEX]
      1     0%  enq: FB - contention                  SEQ_PK [INDEX]
      1     0%  latch: ges resource hash list

Not shown are the workload profiles for each key generation test with minimal preexisting data in the target tables. Those tests showed UUID4 to be faster than the other two methods, because the randomness of the key values prevents “hot” blocks from becoming an issue during index maintenance: inserts are spread across all the blocks in the index.

However, once enough data accumulates in the UUID4 table relative to SGA size, the PK index blocks needed for inserts are less and less likely to be found in cache. A tipping point is eventually reached and the UUID4 workload profile becomes dominated by physical I/O. As shown in the output above, 97% of database time was associated with the “db file sequential read” physical I/O wait event, causing the UUID4 test to take roughly five times as long as UUID1 and nine times as long as SEQ when the target tables hold approximately 20 million preexisting rows. The problem worsens as data accumulates, preventing the UUID4 method from scaling. It also has negative performance side effects for simultaneous database activity, since the cache is constantly being filled with index blocks needed for UUID4 key maintenance. Due to this lack of scalability and the negative side effects on other concurrent workloads, we do not recommend this option.

UUID1 and database-sequence key generation methods do not suffer from the same physical I/O and scalability problems as UUID4, but they do suffer from concurrency issues associated with “hot” blocks: when sessions are not on CPU getting work done, they are waiting for other sessions to release the concurrency controls on the index blocks that need modification. These concurrency issues are independent of existing table and index size and can be alleviated to some extent with well-known techniques, including hash-partitioned indexes, the “pctfree” index attribute, the “minimize records_per_block” table attribute, per-instance sequence caching, and possibly Oracle's new scalable sequence type.

UUID1 key generation suffers from the possibility of duplicate values when separate processes on the same hardware generate values at the same time. Duplicate keys were generated on three occasions during performance testing. For this reason we do not recommend the use of UUID1-based keys.

Requirements for the Gen3 Butler software mandate universally unique IDs across all registries. We can satisfy this requirement by allocating a unique site ID to each registry, so that keys can be generated independently at each site; the combination of site ID and local sequence value is universally unique.

There was additional discussion on whether to implement the local database sequence and site ID as a single column or two-column primary key.

One single-column implementation uses the decimal portion of a number datatype to hold a static site ID for each Butler registry and the digits to the left of the decimal point for sequence-generated uniqueness within a site. This strategy was abandoned because of concerns about implicit rounding as values are passed between variables in different programming languages (demonstrated below). The method also violates first normal form and requires extra interpretation to extract the full meaning of the value.
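
The rounding concern is easy to demonstrate. In the hypothetical encoding below, a four-digit site ID is stored in the decimal places; passing the combined value through a 64-bit float makes the fractional part inexact:

# Hypothetical single-column encoding: integer part = sequence value,
# decimal part = site ID.
seq_val, site_id = 123456789, 9999
combined = seq_val + site_id / 10000.0

# Recovering the site ID through float arithmetic is lossy and only
# works at all because of explicit rounding.
print(combined - seq_val)                     # not exactly 0.9999
print(round((combined - seq_val) * 10000))    # 9999, via rounding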

Another single-column implementation converts the sequence value and a fixed-width site ID to strings, appends the two together, and converts the result back to a number (sketched below). While this works logically, the datatype conversions add unnecessary overhead, and it offers no real advantage over the two-column implementation. As with the previous method, it violates first normal form. We do not recommend this option.
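
A sketch of this string-concatenation variant, assuming a fixed four-digit site ID width:

def make_key(seq_val, site_id, site_width=4):
    # Zero-pad the site ID to a fixed width, append it to the sequence
    # value, and convert the result back to a single integer.
    return int("{}{:0{}d}".format(seq_val, site_id, site_width))

def split_key(key, site_width=4):
    # Inverse operation: recover (sequence value, site ID).
    return divmod(key, 10 ** site_width)

key = make_key(123456789, 42)
print(key)              # 1234567890042
print(split_key(key))   # (123456789, 42)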

Finally, the two-column database-generated approach was considered. This solution does not suffer from the performance overhead, rounding concerns, or normalization violations of the other database ID generation methods. It is, however, the option that requires the most development work to accommodate the merge process.

5   SQLite key generation method

SQLite’s concurrency model and standard ID generation method are significantly different from those of the Oracle database. Here we provide sample code to demonstrate the feasibility of the two-column universally unique ID approach in an SQLite database.

5.1   SQLite DDL

CREATE TABLE t(id integer not null,
    site_id integer not null,
    primary key (id, site_id));

CREATE TABLE id_tracker(table_name varchar(100),
    curr_max_id integer, primary key (table_name));

INSERT INTO id_tracker VALUES('t',1);


5.2   SQLite ID generation script

#!/usr/bin/env python

import sqlite3

site_id          = 9999
batch_id_size    = 100
table_name       = 't'
sqlite_filename  = 'test.sqlite3'

sql_upd_max_id   = """update id_tracker 
                        set curr_max_id = curr_max_id + ?
                        where table_name = ?"""

sql_get_max_id   = """select curr_max_id 
                        from id_tracker 
                        where table_name = ?"""

sql_ins_t        = "insert into t(id, site_id) values(?,?)"

trying_to_insert = True

while trying_to_insert:
    try:
        # The connection context manager commits on success and rolls
        # back if an exception escapes the block.
        with sqlite3.connect(sqlite_filename) as conn:
            cur = conn.cursor()
            # Take an exclusive lock so that concurrent writers
            # serialize on the id_tracker row.
            cur.execute("begin exclusive transaction")
            # Reserve a batch of IDs by advancing the tracked maximum,
            # then read back the new maximum.
            cur.execute(sql_upd_max_id, [batch_id_size, table_name])
            cur.execute(sql_get_max_id, [table_name])
            max_id = cur.fetchone()[0] + 1
            # Pair each reserved local ID with this registry's site ID.
            t_id_batch = [(new_id, site_id)
                          for new_id in range(max_id - batch_id_size, max_id)]
            cur.executemany(sql_ins_t, t_id_batch)
            trying_to_insert = False

    except sqlite3.OperationalError:
        # Another writer holds the lock; retry until the batch succeeds.
        print("Database was locked")


5.3   SQLite test harness bash script

#!/bin/bash

# Run 50 concurrent copies of the script supplied as $1, discarding
# their output.
for i in {1..50}
do
  ./"$1" > /dev/null 2>&1 &
done

6   Scripts used for testing

6.1   Database DDL

create table uuid1 
   (pk_col raw(16) not null enable, 
    uuid varchar2(128 byte), 
    constraint uuid1_pk primary key (pk_col) using index);
    
create table uuid4 
   (pk_col raw(16) not null enable, 
    uuid varchar2(128 byte), 
    constraint uuid4_pk primary key (pk_col) using index);
    
create or replace package pkg_butler_id as 
  c_butler_site_id constant number := 9999;
end pkg_butler_id;
/

create table seq
   (pk_col1 number not null enable, 
    pk_col2 number not null enable, 
    uuid varchar2(128 byte), 
    constraint seq_pk primary key (pk_col1,pk_col2) using index);

create sequence  seq_seq1 
  increment by 1 
  start with 1 cache 1000 
  noorder  nocycle  nokeep  noscale  global ;

create or replace trigger seq_trg 
before insert on seq 
for each row 
begin
  <<column_sequences>>
  begin
    if inserting and :new.pk_col1 is null then
      select pkg_butler_id.c_butler_site_id, seq_seq1.nextval 
        into :new.pk_col1, :new.pk_col2 
        from sys.dual;
    end if;
  end column_sequences;
end;
/

alter trigger seq_trg enable;

6.2   Test harness bash script

#!/bin/bash

# Run 30 concurrent copies of the Python script supplied as $1,
# discarding their output.
for i in {1..30}
do
  ./"$1" > /dev/null 2>&1 &
done

6.3   uuid1_inserts.py

#!/usr/bin/env python

import uuid
import cx_Oracle

# The same UUID1 value is bound to both the raw primary key column
# and the varchar2 column.
sql = "insert into uuid1(pk_col, uuid) values (:uuid1, :uuid1)"

connection = cx_Oracle.connect("/@lsst_id_test")
# Tag the session so ashtop can attribute its activity.
connection.module = "UUID1"
connection.action = "UUID1_30x10000"
cur = connection.cursor()

for i in range(0,10000):
    cur.execute(sql,uuid1=str(uuid.uuid1()).replace('-',''))

cur.close()
connection.commit()

6.4   uuid4_inserts.py

#!/usr/bin/env python

import uuid
import cx_Oracle

# The same UUID4 value is bound to both the raw primary key column
# and the varchar2 column.
sql = "insert into uuid4(pk_col, uuid) values (:uuid4, :uuid4)"

connection = cx_Oracle.connect("/@lsst_id_test")
# Tag the session so ashtop can attribute its activity.
connection.module = "UUID4"
connection.action = "UUID4_30x10000"
cur = connection.cursor()

for i in range(0,10000):
    cur.execute(sql,uuid4=str(uuid.uuid4()).replace('-',''))

cur.close()
connection.commit()

6.5   seq_inserts.py

#!/usr/bin/env python

import cx_Oracle

# pk_col1 and pk_col2 are populated by the seq_trg trigger using the
# site ID constant and the seq_seq1 sequence.
sql = "insert into seq(uuid) values (:uuid4)"

connection = cx_Oracle.connect("/@lsst_id_test")
# Tag the session so ashtop can attribute its activity.
connection.module = "SEQ"
connection.action = "SEQ_30x10000"
cur = connection.cursor()

# A fixed dummy payload keeps row size comparable to the UUID tests.
dummy_uid = 'd79ec82f952243529d9a2d09be9f7ed9'

for i in range(0,10000):
    cur.execute(sql,uuid4=dummy_uid)

cur.close()
connection.commit()