Here’s my second article on database design basics. I’m hoping to tackle other database design topics in future articles – let me know if there’s anything you’d like me to cover.
In part 1, I covered “CRUD’ – Creating, Reading, Updating and Deleting records.
I’m going to cover the basic building block for data storage in a database – the table. A database usually has more than one table. Tables can have relationships with other tables – and this is what makes a ‘relational database’.
I’ll be illuminating the theory with an example of an online shop. The shop has customers and products, and the customers place orders for the products.
A table comprises a set of records. A record relates to one ‘thing’. In our example, we’d have a table of customers. Each record in the table would correspond to a single customer. If we had 100 customers, there would be 100 records in our table of customers.
The records in this table all have the same format – which is why we can store them in the same table. The format is just a bunch of fields; for our customer table, we might have fields like ‘first name’, ‘surname’, ‘email address’ etc. Fields are also called columns.
If you’re familiar with spreadsheets, then it’s useful to think of a table as a single sheet in the spreadsheet. Each row corresponds to a record; each column is a field.
Now, consider the products that we want to sell. They might have fields such as ‘Product name’, and ‘unit price’. These are different fields to those in the customer table, so it’s not appropriate to store them in the same table. We need a new table, ‘products’, with different fields. Each record in this ‘products’ table corresponds to a single product that we might sell.
We’ll also have a new table for the customer orders.
As an aside, there are two schools of thought about whether a table should be named in the singular (e.g. ‘customer’, ‘order’, ‘product’) or if plural names are more appropriate. I’ve gone for plural but this is definitely open to challenge!
A key design principle for databases is to avoid duplicating data that we hold in the database. Duplication is sometimes required or unavoidable, but these are the exceptions rather than the rule.
For example, we don’t want to store a customer’s address with every order that they make; this would be wasteful of space and would also give us more work to do if they moved – we’d have to update all the places where we stored the address.
We do however want to link an order to the customer who placed it. We can do this by having one table link to another with a relationship. The order table needs to contain a field which references the customer table, so that we know who placed the order. That field needs to uniquely identify the customer; it’s no good linking on first name as there could be more than one customer with the same first name.
This brings me on to the concept of keys. In the customer table, we could identify each customer with a unique number – a key. If we add a new customer, we could get the database to automatically generate a new unique number. Usually this key is called a ‘primary key’. The primary key can be a single field or it can be made up of multiple fields – we’ll just use a single field primary key to keep things simple.
We would then use this key in the order table, so we can uniquely identify the customer for a given order. As the ID is a key in another table, we refer to this as a ‘foreign key’.
Bingo! We’ve linked the two tables together and have avoided duplicating data. This is called a relationship.
We also need to link the order to the product(s). So the order table actually links to two other tables – both the customer table and the product table.
I’ve simplified this example, but imagine if we wanted more than one product in an order (e.g. if someone checked out two things in their shopping basket). We could easily deal with that by having two rows in the order table (and re-purposing the order ID to be something like an order line ID), or by having another table which linked orders to individual lines in the order.
As well as using keys to link records in two different tables together, we can also enforce such a rule. This is done by configuring the database such that an order record always has to have a link to a valid customer record, otherwise the database won’t allow the change. This is called enforcing ‘referential integrity’. It’s a good idea to do this so that we don’t end up with incomplete data by accident (in this case, having an order without a customer would be nonsensical).
There are a few flavours of relationships:
- 1-1. One record in table A can only be linked to one record in table B.
- 1-to-Many, or Many-to-1. One record in table A can be linked to one more records in table B. This is the relationship between customer and order; one customer can have many orders, but an order can only have one customer
- Many-to-Many relationship. One or more records in table A can be linked to one or more records in table B. This applies for the link between orders and products; one order can link to multiple products, and a product can be linked to multiple orders.
There are formal ways of representing these links on diagrams. Instead of having arrowheads (like in the diagrams above), we use a plain single line to show a ‘1 to’ end of a relationship. A ‘crows foot’ – a split of 3 lines at the end of the connecting line – shows a ‘many to’ end of a relationship. These relationships look like this:
When we link tables, we need to be careful when we delete records. If we delete a product (e.g. because it’s now unavailable), and that product was included in order records (because it was previously ordered, when it was available), then the order records won’t be complete any more. Furthermore, if we enforce referential integrity in the database, the database won’t even allow us to delete the record if it is linked to other records in other tables.
This is a good reason not to delete records – instead we could have a field in the products table which shows if the product is unavailable or obsolete. We could use this field to filter out products from our catalogue, so we don’t show these to the user and they can’t be added to new orders; however, they’re still there in the database so that we have a record of what was ordered previously.
This article builds on the basic concepts introduced in part 1.
Tables contain records of identical format, and are the building blocks for data in a database. To avoid duplication of data, we use relationships between tables to link different types of records from different tables. Deleting records might have consequences for these relationships, so it’s worth considering marking obsolete records as such, rather than removing them from the database.
Neil Tubman, Terzo Digital, November 2017