Surrogate Key

1. Common Key Terminology

Let's start by describing some common terminology pertaining to keys and then work through an example. These terms are:
Key. A key is one or more data attributes that uniquely identify an entity. In a physical database a key would be formed of one or more table columns whose value(s) uniquely identify a row within a relational table.
Composite key. A key that is composed of two or more attributes.
Natural key. A key that is formed of attributes that already exist in the real world. For example, U.S. citizens are issued a Social Security Number (SSN) that is unique to them (this isn't guaranteed to be true, but it's pretty darn close in practice). SSN could be used as a natural key, assuming privacy laws allow it, for a Person entity (assuming the scope of your organization is limited to the U.S.).
Surrogate key. A key with no business meaning.
Candidate key. An entity type in a logical data model will have zero or more candidate keys, also referred to simply as unique identifiers (note: some people don't believe in identifying candidate keys in LDMs, so there are no hard and fast rules). For example, if we only interact with American citizens then SSN is one candidate key for the Person entity type, and the combination of name and phone number (assuming the combination is unique) is potentially a second candidate key. Both are called candidate keys because they are candidates to be chosen as the primary key, an alternate key, or perhaps not even a key at all within a physical data model.
Primary key. The preferred key for an entity type.
Alternate key. Also known as a secondary key, an alternate key is another unique identifier of a row within a table.
Foreign key. One or more attributes in an entity type that represent a key, either primary or secondary, in another entity type.
I prefer using surrogate keys because natural keys are by definition subject to change, which is bad behavior for a row identifier. But let's dig a bit deeper into each key type to see why this is. Here's a little table with column names that tell us what kind of key each column is.
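As a hypothetical illustration of the terms above, the following sketch builds a small schema in an in-memory SQLite database using Python's standard `sqlite3` module. The table names, column names, and sample values are assumptions made up for this example, and SQLite stands in for whatever database you actually use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE Person (
    PersonId INTEGER PRIMARY KEY,   -- surrogate key, chosen as the primary key
    SSN      TEXT NOT NULL UNIQUE,  -- natural key, kept as an alternate key
    Name     TEXT NOT NULL,
    Phone    TEXT NOT NULL,
    UNIQUE (Name, Phone)            -- composite candidate key
);
CREATE TABLE PersonOrder (
    OrderId  INTEGER PRIMARY KEY,   -- surrogate key
    PersonId INTEGER NOT NULL REFERENCES Person(PersonId)  -- foreign key
);
""")

def rejected(sql, params):
    """Return True if the database refuses the statement with a constraint error."""
    try:
        conn.execute(sql, params)
        return False
    except sqlite3.IntegrityError:
        return True

insert_person = "INSERT INTO Person (SSN, Name, Phone) VALUES (?, ?, ?)"
# All values below are made up for the example.
conn.execute(insert_person, ("000-00-0001", "Ann Lee", "555-0100"))

dup_ssn = rejected(insert_person, ("000-00-0001", "Bob Ray", "555-0101"))
dup_name_phone = rejected(insert_person, ("000-00-0002", "Ann Lee", "555-0100"))
orphan_order = rejected("INSERT INTO PersonOrder (PersonId) VALUES (?)", (999,))

print(dup_ssn, dup_name_phone, orphan_order)
```

Each uniqueness rule is declared once in the schema, so the database itself refuses a duplicate SSN, a duplicate name and phone pair, and an order that points at a person who does not exist.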
Surrogate keys

A surrogate key is a row identifier that has no connection to the data attributes in the row; it simply makes the whole row unique. That property is also its downside: because the key has no connection to the data attributes, we can have two rows with exactly the same data in every column except the key column. This is usually handled on the application side and is an acceptable downside. Examples of a surrogate key are an integer identity and a GUID unique identifier. I've never seen another data type used successfully as a surrogate key. Both have their pros and cons, though.

GUID unique identifier

A GUID is a globally unique, 16-byte data type that can have 2^128 different values. This makes it ideal for scenarios in which multiple servers move data from one to another, such as replication. However, 16 bytes is really a lot for a key. Fewer rows fit on a single data page, which in turn causes extra IO activity because more data pages have to be retrieved. Another issue is that it causes page splits in a clustered index, because its values are random, with 100% selectivity, across the entire range of the data type.

Integer identity

An integer identity is either a 4-byte INT with a range from -2,147,483,648 to 2,147,483,647 or an 8-byte BIGINT with a range from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807. In 99.9% of cases only half of this range is used, because by default an identity column in SQL Server starts at 1 and counts up. As this is a surrogate key, that makes no sense, and there's no reason it shouldn't start from the minimum value.
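The sizes and ranges quoted above can be checked with a few lines of Python. This is only a sketch: the helper names are made up for the example, and `appended_fraction` is a rough stand-in for page-split behavior (it counts how often a new key sorts after every earlier key), not a model of SQL Server's storage engine or of how it orders uniqueidentifier values.

```python
import uuid

def signed_range(num_bytes):
    """Smallest and largest value of a two's-complement integer of this size."""
    bits = 8 * num_bytes
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

def usable_values(seed, max_value):
    """How many identity values are available when counting up by 1 from seed."""
    return max_value - seed + 1

def appended_fraction(keys):
    """Fraction of inserts whose key sorts after every key inserted before it."""
    appended, current_max = 0, None
    for key in keys:
        if current_max is None or key > current_max:
            appended += 1
            current_max = key
    return appended / len(keys)

GUID_BYTES = len(uuid.uuid4().bytes)
GUID_VALUES = 2 ** (8 * GUID_BYTES)
INT_MIN, INT_MAX = signed_range(4)
BIGINT_MIN, BIGINT_MAX = signed_range(8)

print(GUID_BYTES, GUID_VALUES == 2 ** 128)   # 16 True
print(usable_values(1, INT_MAX))             # 2147483647
print(usable_values(INT_MIN, INT_MAX))       # 4294967296

# Ever-increasing keys always land at the end of the index; random keys rarely do.
sequential = appended_fraction(list(range(1, 1001)))
random_guids = appended_fraction([uuid.uuid4().bytes for _ in range(1000)])
print(sequential, random_guids)
```

Seeding an INT identity at the minimum value roughly doubles the number of keys available before the column overflows, and the last line shows why random keys tend to be inserted into the middle of an ordered index rather than at its end.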
It is a small data type, which gives it the advantage of fitting more rows on each data page and thus needing less IO for the same amount of data. Unlike the GUID unique identifier, the integer identity is ever increasing, with 100% selectivity across its entire range. This makes it a perfect candidate for a clustered index because it doesn't cause page splits. Whether it actually is an appropriate candidate for a clustered index is a different matter. Its downside is that it is not ideal for multi-server scenarios, although these can be handled by adding a tinyint column that identifies a location and making the row identifier a composite covering the ID and LocationId columns. And remember: never tie any business logic to the surrogate key other than simple CRUD operations.

Natural Keys

A natural key is a row identifier composed of data that uniquely describes the row using its own attributes. An example of a natural key is a social security number or another government-issued number. However, this presents a huge problem from the point of view of the physical database implementation. In most databases a row identifier is usually also the basis for the clustered index and the non-clustered indexes. But natural keys are by definition subject to change. When the clustered index key changes, all indexes have to be updated, because non-clustered indexes contain the full key of the clustered index. So every time a natural key that is also the clustered index key changes, every index on the table is affected. And this doesn't even include changing the data type or its size, just the key value.

At this point someone might say: "Yes Mladen, you're right about the theory, but how many times have you seen a natural key really change?" Well, so far I've seen it twice, both times with heavy consequences. That was two times too many.

Natural Key Fail Case 1:

It was a standard customer, product, order type of application. The key in this case was the 7-character customer ID.
It was a mix of the first 3 letters of the customer name plus 4 digits that also had some business meaning. The company was acquired by another company and a new customer numbering scheme was introduced. Every key in that database had to be changed. Because the database changes were fully breaking, the whole application had to be modified and the store went offline for 3 months, costing the company a lot of profit. None of this would have happened if they had used surrogate keys.

Natural Key Fail Case 2:

This one was even more far-reaching. In Slovenia (my home country) we have something called a Tax ID. It is unique for companies and individuals, so every person and every company has one for tax purposes. Many systems in Slovenia used it as a natural, never-changing key, which sounded like a reasonable thing at the time. And it was, for over 30 years. Applications came and went. But in 2004 Slovenia entered the European Union, so we had to modify the TaxId to European standards, which meant that every application using it had to be changed. I know of at least one company that went out of business because of this change. Again, had they used a surrogate key, the only change would have been the length of the TaxId column.

Because of all this I've come to prefer surrogate keys in the majority of cases. Hopefully this gives you some insight into why surrogates are, in my opinion, better suited as row identifiers. Whichever you choose is still a matter of common sense and your business problem. The answer is always "It depends."
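To make the two fail cases more concrete, here is a minimal sketch of a renumbering under both designs, again using Python's standard `sqlite3` module with an in-memory database. The table names, the old code `ACM1234`, and the new code `C-000042` are all hypothetical. Foreign key enforcement is left at SQLite's default (off) so that the natural-key design can be updated in two plain steps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Design A: the natural key is the row identifier and is repeated in child rows.
CREATE TABLE CustomerA (CustomerCode TEXT PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE OrderA (
    OrderId      INTEGER PRIMARY KEY,
    CustomerCode TEXT NOT NULL REFERENCES CustomerA(CustomerCode)
);
-- Design B: a surrogate key identifies the row; the natural key is plain data.
CREATE TABLE CustomerB (
    CustomerId   INTEGER PRIMARY KEY,
    CustomerCode TEXT NOT NULL UNIQUE,
    Name         TEXT NOT NULL
);
CREATE TABLE OrderB (
    OrderId    INTEGER PRIMARY KEY,
    CustomerId INTEGER NOT NULL REFERENCES CustomerB(CustomerId)
);
""")

conn.execute("INSERT INTO CustomerA VALUES ('ACM1234', 'Acme')")
conn.execute("INSERT INTO CustomerB VALUES (1, 'ACM1234', 'Acme')")
for _ in range(3):
    conn.execute("INSERT INTO OrderA (CustomerCode) VALUES ('ACM1234')")
    conn.execute("INSERT INTO OrderB (CustomerId) VALUES (1)")

def renumber(statements, old_code, new_code):
    """Run each UPDATE and return the total number of rows it rewrote."""
    return sum(conn.execute(sql, (new_code, old_code)).rowcount for sql in statements)

rows_a = renumber(
    ["UPDATE CustomerA SET CustomerCode = ? WHERE CustomerCode = ?",
     "UPDATE OrderA SET CustomerCode = ? WHERE CustomerCode = ?"],
    "ACM1234", "C-000042")
rows_b = renumber(
    ["UPDATE CustomerB SET CustomerCode = ? WHERE CustomerCode = ?"],
    "ACM1234", "C-000042")

print(rows_a, rows_b)  # 4 1
```

With the natural key as the identifier, the change touches the customer row and every order that references it. With a surrogate key, only the one customer row changes and the orders are left alone, which is the difference that grows painful as the number of referencing tables and rows increases.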
