Table of Contents
1. Introduction
2. Using PostgreSQL
3. Data Organization
4. Getting Data Out
5. Data Aggregation
6. Integrity Constraints
7. Joining Tables
8. Indexing & Performance
9. Postgres-Specific Types
10. Database Administration
11. Appendix
Using PostgreSQL
Interacting with PostgreSQL directly
There are many ways to interact with your database; sometimes the best way is
to send commands directly. This is useful when troubleshooting or when testing
out a new query.
Using psql
The psql utility gives you access to your database via a command-line
interface similar to irb. To open a connection to the database using the default
user (postgres), use this command:
psql -U postgres
In Ubuntu that may not work out of the box so try this instead:
sudo su postgres
psql
Now you should see a prompt that looks something like this:
postgres=#
This means postgres is waiting for you to start typing commands: either a
postgres-specific command or a SQL query. Postgres commands are preceded
by a backslash character. For example, to get the help menu type \? and to exit
type \q .
Here is a table with the most useful commands:
Command        Description
\l             List databases
\dt            List tables
\dn            List schemas
\d table_name  Describe a table (columns, indexes, constraints)
\q             Quit psql
\s             Show history
\du            List users (roles)
\x             Toggle expanded output
\password      Change password
You can also dump the output of a query to a file. For
example:
SELECT * FROM users \g /tmp/users.txt
Note: you should practice on your own local database to avoid causing any
problems. Refer to the Appendix to learn how to install and set up PostgreSQL.
It's good to know that there is a ~/.psqlrc file you can use to enable certain
options every time you start psql. For example, in mine I have \x auto to let
postgres choose the best output format depending on the situation. Check out the
help menu (with \? ) for other options that you may find useful.
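For illustration, here is what a minimal ~/.psqlrc might look like. The \x auto line is the one mentioned above; \timing and the HISTSIZE variable are optional extras you may or may not want:

```
\x auto
\timing on
\set HISTSIZE 2000
```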
Using pgAdmin
If you like graphical tools then pgAdmin might be for you. If you have a debian-based system you should be able to install pgAdmin like this:
apt-get install pgadmin3
To connect to a new server click on the plug icon or use the menu option File ->
Add server. At a minimum you need to provide:
A name to identify this server
Hostname or ip address
User and password
Now you can right-click on it and select Connect; this will show all the databases
in your server. You can navigate by expanding the trees (click on the + symbol).
You will find your tables under Schemas. You will also see entries for things you
might not be familiar with (like domains or collations); don't worry about those.
Using the pg gem
From Ruby we need to connect to the database, execute our query and output the
results.
Here is an example:
require 'pg'

OUTPUT_FORMAT = "%7d | %-16s | %-8s | %s"
OUTPUT_FIELDS = ['pid', 'application_name', 'state', 'query']

# Establish a database connection
conn = PG.connect(dbname: 'postgres', user: 'postgres')

# Our query, ready to go!
query = "SELECT * FROM pg_stat_activity"

# Execute the query
conn.exec(query) do |result|
  result.each { |row| puts OUTPUT_FORMAT % row.values_at(*OUTPUT_FIELDS) }
end
This example will connect to our local database using the postgres user, then
execute a query and nicely output the results.
Note: if you get an authentication error when trying out this code you may
need to set up host-based authentication (instructions are in the Database
Administration chapter).
The connect method accepts other options if you need them: host,
port, user, password, dbname, connect_timeout.
The complete documentation for pg is available here:
http://deveiate.org/code/pg/index.html
ActiveRecord
ActiveRecord is what is known as an ORM (Object-Relational Mapping). An ORM
maps database tables to classes and objects in our application. It does this by
generating SQL queries for us and running them on the database on our behalf.
Once everything is set up correctly, getting all the users from our database is as
easy as User.all .
ActiveRecord is very popular because it plays a central role in the Ruby on Rails
framework.
Alternative ORM frameworks for Ruby include: Sequel and ROM.
The examples in this book assume that we have a users table containing one user with an id of 1.
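To make the mapping idea concrete, here is a toy sketch of what an ORM does under the hood. This is plain Ruby for illustration only; ToyRecord, find_sql and all_sql are made-up names, not part of ActiveRecord:

```ruby
# A toy illustration of the core ORM idea: map a class to a table
# name and generate SQL on the caller's behalf.
class ToyRecord
  # User -> "users" (a much simplified version of Rails' pluralization)
  def self.table_name
    name.downcase + 's'
  end

  # The SQL a finder like User.find(1) would generate
  def self.find_sql(id)
    "SELECT * FROM #{table_name} WHERE id = #{Integer(id)}"
  end

  # The SQL behind User.all
  def self.all_sql
    "SELECT * FROM #{table_name}"
  end
end

class User < ToyRecord; end

puts User.all_sql     # SELECT * FROM users
puts User.find_sql(1) # SELECT * FROM users WHERE id = 1
```

A real ORM adds much more (connection handling, proper pluralization, query escaping), but this table-to-class mapping is the core idea.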
Data Organization
Before we can insert any data into our database we need to understand how data
is organized in PostgreSQL (and in relational databases in general).
Data is stored in rows, and each row is composed of columns. A group of rows
using the same columns is called a table. And finally, tables are grouped into
databases.
After learning about data types we will see how we can create our own tables.
Right now let's see how we can explore what tables are available on our system.
Open a database connection via psql and type \dt , this will list all the tables. If
you have loaded the example data (instructions in the Appendix) you should be
seeing something like this:
        List of relations
 Schema |    Name   | Type  |  Owner
--------+-----------+-------+----------
 public | countries | table | postgres
 public | users     | table | postgres
In postgres there is also the concept of schemas, which are just namespaces for a
group of tables. By default all tables are created in the public schema. You can
get a list of schemas in the current database by using the \dn command in psql.
If you create tables in another schema you will have to access them using this
format: schema.table . You can avoid this if you add your schema to the schema
search path.
To view the current search path:
SHOW search_path;
To change it, putting a schema of our own first (my_new_schema is just an example name):
SET search_path TO my_new_schema, public;
After changing the search_path, new tables created during the current psql
session will be created under the schema that's first on the list. In this case
my_new_schema .
Data Types
Every column in our database can hold one specific data type. There are many
data types available in PostgreSQL. Here is a table covering the most useful
ones:
Type         Valid values
integer      -2147483648 to +2147483647
bigint       -9223372036854775808 to +9223372036854775807
decimal      exact numbers with user-specified precision
serial       auto-incrementing integer, 1 to 2147483647
text         variable-length character strings
bool         t, true, f, false
timestamp    date and time (no time zone)
timestamptz  date and time, with time zone
time         time of day
Creating Tables
Now that we know how PostgreSQL is structured we are ready to create our own
tables. We can do this using CREATE TABLE.
This is the general form for CREATE TABLE :
CREATE TABLE <table_name> (<columns>);
For example, to create the testing table shown below:
CREATE TABLE testing (id serial, name text, age integer, score numeric, active boolean);
Once we send this query to the database via psql we should be able to see our
new table listed when we do \dt .
        List of relations
 Schema |  Name   | Type  |  Owner
--------+---------+-------+----------
 public | testing | table | postgres
To get more details about our new table we can use \d <table_name> . This will
give us all the columns in this table and their data types. It will also list indexes
and constraints if we have any (covered later in the book).
 Column |  Type   | Modifiers
--------+---------+---------------------------------------------------
 id     | integer | not null default nextval('test_id_seq'::regclass)
 name   | text    |
 age    | integer |
 score  | numeric |
 active | boolean |
Altering Tables
After we have created some tables we may want to change them. For example,
we may want to add a new column or rename an existing table.
An ALTER TABLE query is what we need in those cases. Here are some
examples.
Adding a column:
ALTER TABLE <table_name> ADD COLUMN <column_name> <column_type>;
Renaming a column:
ALTER TABLE <table_name> RENAME COLUMN <column_name> TO <new_column_name>;
Renaming a table:
ALTER TABLE <table_name> RENAME TO <new_table_name>;
Inserting Data
A database isn't too useful unless we put data in it, so let's add some data!
This is what an INSERT query looks like:
INSERT INTO <table> VALUES (<values>);
We need to provide a value for all the columns in the correct order. For an auto-increment field like id we can just say default and postgres will do the right
thing.
For example, we can add a new user to our database like this:
INSERT INTO users VALUES (default, 'Peter', 'peter@gmail.com', 30, 10);
If you just want to set certain fields you can use this syntax:
INSERT INTO <table> (<column list>) VALUES (<values>);
Getting Data Out
To read data back we use SELECT queries. If we want all the columns we can use
an asterisk * , but this is generally discouraged. Often we don't need every
column and we can just fetch the ones we want.
For example, to get all the country names and their id we can run this query:
SELECT id, name FROM countries;
Conditions are very similar to those you would use in Ruby; the main difference
is that you use a single equals sign to test for equality.
Here are some examples:
SELECT <columns> FROM <table> WHERE <column> > <value>
SELECT <columns> FROM <table> WHERE <column> < <value>
SELECT <columns> FROM <table> WHERE <column> = <value>
SELECT <columns> FROM <table> WHERE <column> != <value>
You can also check for multiple conditions at the same time by using AND :
SELECT * FROM users
WHERE age > 30 AND name != 'Curtis';
Beyond WHERE
The WHERE clause is nice, but there are related keywords that can supplement it:
BETWEEN, IN and LIKE. Let's see an example of each.
Find rows in a certain range:
... WHERE age BETWEEN 0 AND 100
Find rows matching any value in a list:
... WHERE id IN (1, 2, 3)
Find rows matching a pattern (% matches any sequence of characters):
... WHERE name LIKE 'A%'
Order!
Another common database operation is sorting. In SQL we use ORDER BY and a
column name to get sorted results.
This is the general form for ORDER BY:
SELECT <columns> FROM <table> ORDER BY <column> <ASC/DESC>
Changing Data
Updates
This is what an update query looks like:
UPDATE <table> SET <column> = <new_value> WHERE <condition>
If you want to add to the current value (on integer columns) you can do this:
UPDATE <table> SET <column> = <column> + <new_value> WHERE <condition>
Deletes
DELETE FROM <table> WHERE <condition>
When you run a DELETE query from psql you will get a report of the number of
rows that were deleted. For example, if the query didn't delete any records you will
see:
DELETE 0
Note: Pay special attention when writing a DELETE query. Don't miss that
WHERE clause or you will end up deleting the whole table!
Data Aggregation
The database is not limited to just getting rows out; it can also aggregate data for
us, for example calculating the sum of all the values in one column.
Maximum value:
SELECT max(age) FROM users;
Counting Rows
Counting rows is another kind of aggregation we may want to do; in PostgreSQL
this operation is slow, because the rows have to be scanned to produce an exact
count. In a Rails app we can mitigate this by using a counter cache
(which is an extra column on the table we want a count for).
This will give us the total number of users in our database.
SELECT count(id) FROM users;
We can combine this with a WHERE clause to only count those rows that meet
certain criteria.
Group By
Grouping is a very interesting operation; with grouping we can (for example) get a
summary of the values for some column in our database.
This query will tell us how many countries we have from every continent:
SELECT
count(id) as count,
continent
FROM countries
GROUP BY continent
ORDER BY count DESC;
Isn't that cool? One more thing: to filter results by the grouped column you will
need to use the HAVING keyword.
For example, to only see continents which have more than 5 countries:
SELECT
count(id) as count,
continent
FROM countries
GROUP BY continent
HAVING count(id) > 5
ORDER BY count DESC;
This will only return Europe for our example database, since it's the only continent
with more than 5 countries listed.
Note that every column in the SELECT list must either appear in the GROUP BY
clause or be wrapped in an aggregate function; otherwise postgres reports an error.
For example, selecting created_at while grouping only by content fails: there are
multiple values for the created_at column and the database doesn't know which
to pick, so instead of just picking one at random it throws an error.
You need to tell the database how to handle this via an aggregation function. For
example we may want the maximum value for this column, which in the case of a
time column it means the newest one.
This query should fix the error:
SELECT content, max(created_at) FROM tags GROUP BY content;
Integrity Constraints
Over time our data will grow and unless we take good care of it there is a good
chance that it will become inconsistent. Using constraints we can define a few
rules that will help keep our data under control.
Note: constraints are different from application-level validations.
We can require a column to always have a value by adding a NOT NULL constraint:
ALTER TABLE users ALTER COLUMN password SET NOT NULL;
If we try to insert a null value after enabling the constraint we will get this error:
ERROR: null value in column "password" violates not-null constraint
Another way to deal with this issue is to set default values. For example, setting
the default as 0 for an integer column may be a good idea.
ALTER TABLE <table_name> ALTER COLUMN <column> SET DEFAULT <default_value>
One more thing that constraints can help us avoid is having duplicate data. We
can accomplish this by declaring a UNIQUE constraint.
ALTER TABLE <table_name> ADD CONSTRAINT <constraint_name> UNIQUE (<columns>)
Check Constraints
Another type of constraint is the CHECK constraint. CHECK constraints allow us to
validate data before it makes its way into the database. For example, if we have
an age column we may want to make sure that it can't be negative.
ALTER TABLE users ADD CONSTRAINT positive_age CHECK (age > 0);
If we check the table details now using \d users we should see our new
constraint listed:
Check constraints:
    "positive_age" CHECK (age > 0)
Joining Tables
Table relationships are the reason we call PostgreSQL a relational database.
Relations are not explicitly declared in our database (other than having foreign
keys, which we will talk about later).
What they allow us to do is to separate data into different tables (for example, a user
has many orders and an order has many products).
A note on semantics: traditional SQL literature uses the term relation to refer
to what we call tables.
Joining Data
To put together data from two or more related tables we will have to use joins.
There are different types of joins we can use. For example, the most basic join is
performed via a WHERE clause.
Here is the syntax:
SELECT * FROM <table1, table2, tableN>
WHERE <table1.id = table2.table1_id>
Suppose we store our users' country in a separate table. Using this query we can
retrieve the actual country name, instead of the country id.
SELECT users.name, countries.name as country
FROM users, countries
WHERE users.country_id = countries.id;
Another type of JOIN is the INNER JOIN. This join will return rows that match the
condition in both tables. Here is the same example but using the JOIN clause:
SELECT users.name, countries.name as country
FROM users
INNER JOIN countries ON users.country_id = countries.id;
The output is the same because both types of joins are very similar.
Other types of joins include: LEFT OUTER JOIN , RIGHT OUTER JOIN , FULL OUTER JOIN
and the CROSS JOIN (which joins every row in one table with every row in the other;
this produces 600 rows with our example users table!).
SELECT * FROM users CROSS JOIN countries;
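For intuition, here is the cross-join effect reproduced in plain Ruby, using Array#product on two tiny made-up in-memory "tables":

```ruby
# A CROSS JOIN pairs every row of one table with every row of the other.
users     = ['Alice', 'Bob']
countries = ['Spain', 'France', 'Japan']

pairs = users.product(countries)
puts pairs.length  # 2 users x 3 countries = 6 pairs
```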
The main difference between the types of joins is in the rows that are returned.
Most of the time your standard INNER JOIN will be what you want.
Types of Associations
There are 3 different ways that a pair of tables can be related:
One-to-one
One-to-many
Many-to-many
The one-to-one relationship is probably the least common of the three. Every
row has exactly one corresponding row in the other table. In ActiveRecord we
use belongs_to for the model with the foreign key and has_one for the
associated model.
The one-to-many association is more common and we have an example of it in
the schema included with this book. Users belong to one country, but countries
have many users.
In ActiveRecord terms:
Country -> has_many :users
User -> belongs_to :country
To visualize your associations you can use the rails-erd gem. Add it to your
Gemfile, then run bundle install and once it's installed you can run bundle exec
erd to generate a visualization of your database schema.
Loading application in 'country-data'...
Generating entity-relationship diagram for 2 models
Diagram saved to 'erd.pdf'.
This is what the output looks like for a simple app with 2 models:
You don't need to worry about the details, but it's still good to know how it works.
Indexing & Performance
Other types of index available in postgres are: Hash, GiST and GIN.
Most of the time you just want B-tree indexes, which are the default. For
advanced data types (like arrays, hstore and JSON) you will need a GIN index; a
GIN index supports fields with multiple values.
Notice the Indexes section at the end of the table description. Each line tells us
the name of the index, the type of index and the indexed column(s).
Unfortunately you won't be able to see an index in action with our little test database,
since it requires at least 10,000 rows and some database activity. In your own
database you will have to investigate which queries are the most
popular to see if an index could speed them up (this is covered in the
Database Administration chapter).
Once you know what you need you can create the index like this:
CREATE INDEX <index_name> ON <table> (<columns>);
Please be aware that creating an index will block all write operations on the table.
Creating an index on a large table can take a long time so you may want to
schedule a maintenance period.
There is also an option to create an index without blocking writes, using the
CONCURRENTLY option.
CREATE INDEX CONCURRENTLY <index_name> ON <table> (<columns>);
If you choose that option, be aware that the operation can be interrupted and
generate an invalid index; just drop the index and start again if that happens. You
will also see increased server load during the operation.
Index selectivity
One concept we need to be familiar with when thinking about indexes is index
selectivity. The selectivity of an index is the number of different values the
candidate column can have divided by the total number of rows. The perfect
selectivity is 1.0, which happens when every value in the column is unique.
We are looking for a selectivity ratio of at least 0.80; if it's lower than that, in
most cases we don't want to create an index.
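As a sketch, the selectivity calculation looks like this in Ruby. The input numbers are hypothetical; in postgres you could obtain them with count(DISTINCT col) and count(*):

```ruby
# selectivity = distinct values / total rows
def selectivity(distinct_values, total_rows)
  distinct_values.to_f / total_rows
end

puts selectivity(9_800, 10_000)  # ~0.98 -> good candidate for an index
puts selectivity(2, 10_000)      # 0.0002 -> poor candidate (e.g. a boolean column)
```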
Removing indexes
Indexes are cool and all, but having too many of them can be disastrous for
performance: every time you insert or update a record the indexes have to be
updated too.
This is how you drop an index:
DROP INDEX <index_name>;
Good candidates for removal are low-usage indexes (you can find these with
pghero_unused_indexes or by inspecting the pg_stat_user_indexes table) and
low-selectivity indexes.
Explain it to me!
Slow database queries can make your application perform very poorly. The good
news is that there are ways to analyze and find out which queries are slow
and why.
If we want to analyze a query we prepend EXPLAIN to it. Postgres
will not run the query but it will give us the query plan. This plan is composed of
the operations postgres needs to perform to execute the query and get our data ready.
Postgres also calculates an estimated cost and an estimated number of rows
that need to be read.
Here is the output of explain for a simple select query (EXPLAIN SELECT *
FROM users):
                       QUERY PLAN
-------------------------------------------------------
 Seq Scan on users  (cost=0.00..1.40 rows=40 width=92)
(1 row)
The explain output is composed of what postgres calls nodes. Every node is an
operation on the database: scanning a table, scanning an index, sorting, filtering
output by a where clause, etc.
Every operation has an estimated cost, rows and width (the average size in bytes
of every row). To optimize a query we want to look for high-cost operations.
Using EXPLAIN ANALYZE we can get more accurate results, since the query will
actually be run and real stats can be gathered.
This is the query plan for the same query, but this time using EXPLAIN ANALYZE :
                          QUERY PLAN
--------------------------------------------------------------
 Seq Scan on users  (cost=0.00..1.40 rows=40 width=92)
                    (actual time=0.009..0.066 rows=40 loops=1)
 Planning time: 0.277 ms
 Execution time: 0.150 ms
(3 rows)
A great site that can help analyze your EXPLAIN output is explain.depesz.com.
There is a self-hosted version of this site on GitHub if you are worried about
sharing your query data with an external entity.
The site will parse your query and use color-coding to indicate the slowest parts.
Postgres-Specific Types
PostgreSQL is a very powerful database, to the point that it can substitute for what
many people would use a NoSQL database for (schema-less data).
In this chapter we are going to explore the following postgres-specific data types:
ARRAY, HSTORE and JSON.
To add an array column we append square brackets to the element type:
ALTER TABLE testing ADD COLUMN tags text[];
This will create a tags array which can contain values of type text. Arrays can be
composed of any postgres built-in type.
Query
When doing a query for an array field you have to use another array to compare
against. There are two main conditions we can check for: exact equality and
inclusion.
Example: find by equality
SELECT * FROM testing WHERE tags = array['a', 'b', 'c']
Update
To update an array column we have a few options: replacing the entire array,
changing a single value or appending extra values.
Example: using the concatenation operator
UPDATE testing SET tags = tags || array['f', 'g'] WHERE age = 1;
Delete
The array_remove function is the easiest way to delete an element from an array
field. You can see it working using this query (which doesn't change any data):
SELECT array_remove(ARRAY[1,2,3,2], 2)
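If you are more at home in Ruby, Array#- behaves much like array_remove in this example: it removes every occurrence of the given element.

```ruby
# postgres: array_remove(ARRAY[1,2,3,2], 2) -> {1,3}
# ruby: subtracting an array removes all occurrences of its elements
p [1, 2, 3, 2] - [2]  # [1, 3]
```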
Indexing arrays
As I mentioned before, it's possible to index arrays. Here is the syntax for that:
CREATE INDEX <index_name> ON <table> USING gin(<column>);
Array functions
There are some array-specific functions we can use to help us. For example,
using the unnest function we can treat an array field like a temporary table /
regular rows.
SELECT unnest(tags) FROM testing WHERE tags is NOT NULL;
Gives us:
 unnest
--------
 a
 c
 b
(3 rows)
One practical use for unnest is to aggregate array values across the table. This
query gives us a count of every tag we have:
SELECT
unnest(tags),
count(*)
FROM testing
GROUP BY unnest(tags)
ORDER BY 2 DESC;
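The unnest-plus-GROUP BY aggregation above can be mimicked in plain Ruby, which may help build intuition (the sample rows here are made up):

```ruby
# Each element of rows stands for one row's tags array.
rows = [['a', 'b'], ['a'], ['b', 'c']]

# flatten plays the role of unnest; tally plays count(*) + GROUP BY
counts = rows.flatten.tally
p counts  # a appears twice, b twice, c once
```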
HSTORE
The hstore type lets us store sets of key/value pairs in a single column. It ships as
an extension, so you may need to enable it first with CREATE EXTENSION hstore.
CRUD operations
Creating a table with an hstore field:
CREATE TABLE products (id serial, attributes hstore);
Querying
The main operations we may want to do with hstore fields are:
Getting the value for a specific key
Finding all rows that have a specific key defined
Finding all rows which have a specific key/value pair defined

Operation       Description
hstore -> key   Get the value for a key
hstore ? key    Check if a key is present

Examples:
SELECT attributes -> 'brand' FROM testing;
If you need more info about the hstore type you can find the official documentation
here: http://www.postgresql.org/docs/9.4/static/hstore.html
JSON operations
Here are a couple examples of using a JSONB column.
Adding a JSONB column:
ALTER TABLE testing ADD COLUMN json JSONB;
Getting a value:
SELECT json -> 'test' FROM testing;
Finding rows that contain a given key/value pair:
SELECT *
FROM testing
WHERE json @> '{"total": 500}'::jsonb;
Note: if you want to update a single value you will need some custom
functions; you can find them here:
https://gist.github.com/matugm/f12c5f28d40d83d65a2f#file-json-update-sql
There is no built-in delete operation for json columns.
Database Administration
Once your application is up and running there are some tasks you need to do to
keep it in top condition.
Listing users
In psql we can use \du to get a list of users.
                            List of roles
 Role name |                   Attributes                   | Member of
-----------+------------------------------------------------+-----------
 postgres  | Superuser, Create role, Create DB, Replication | {}
In this output we can see that we only have one user (postgres) and that it has
Superuser privileges. Ideally you will want a low-privileged user for your
application to use, which gives you better security in case your application
server is compromised or the secrets are leaked.
Authentication
The pg_hba.conf file in postgres allows you to configure who can connect to the
database and how. There are a number of authentication methods you can
configure. One that I find very useful in a development setup is the trust method.
The trust method will allow you to log in without a password as long as the
connection is coming from the configured ip address. To enable this, add or
uncomment this line in your pg_hba file:
host all all 127.0.0.1/32 trust
Then reload your postgres service, using sudo service postgresql reload or sudo
systemctl reload postgresql .
Here are the most common authentication methods:

Method    Description
trust     Allow the connection without asking for a password
reject    Reject the connection unconditionally
md5       Require an md5-hashed password
password  Require a password, sent in clear text
peer      Use the operating system user name (local connections)
Configuration
Config files & config info
To find the location of your configuration file you can use the following query:
SHOW config_file;
Another interesting query is this one, which will give you a count of the current
configuration parameters grouped by their source.
SELECT count(*), source FROM pg_settings GROUP BY source ORDER BY count;
Output:
 count |        source
-------+----------------------
     1 | environment variable
     2 | client
    12 | override
    19 | configuration file
   205 | default
(5 rows)
In my case you can see that I'm running mostly default values, with exactly 19
parameters coming from the configuration file.
Configuration Tuning
There are a few parameters in the postgres configuration file that we can tweak
for better performance. Note that the ideal numbers depend on your hardware and
use case.
Setting               Recommended value
work_mem              depends on query complexity; start low (a few MB)
shared_buffers        around 25% of system memory
effective_cache_size  50% to 75% of system memory
checkpoint_segments   increase (e.g. to 32) for write-heavy workloads
Logging
Reading the logs can be useful when we are trying to troubleshoot our database
or just doing a regular health check. In Ubuntu the logs can be found under
/var/log/postgresql by default. If you are on a different distribution, check the
log_directory setting in postgresql.conf to find out where they are.
Database statistics
If you are wondering how your database is doing, there are some queries you
can run to gather information.
Active queries
Using this simple query you can get a list of the current activity on your database.
Watch out for long-running queries consuming too many resources!
SELECT application_name, query_start, state, query FROM pg_stat_activity;
Output:
-[ RECORD 1 ]----+------------------------------------------------------
application_name | bin/rails
query_start      | 2015-08-15 02:08:32.775124+02
state            | idle
query            | SELECT 1
-[ RECORD 2 ]----+------------------------------------------------------
application_name | psql
query_start      | 2015-08-15 19:50:53.504786+02
state            | active
query            | SELECT application_name,query_start,state,query FROM
In this case we can see an idle query from rails. This query is used to check that
the database connection is alive, so we don't need to worry about it.
Here is the relevant source code from ActiveRecord:
https://github.com/rails/rails/blob/v4.1.0/activerecord/lib/active_record/connection_adapters/postgresql_adapter.rb#L587
Table stats
A good thing about postgres is that it keeps a lot of metadata about its
operations. With this query we can get per-table stats (like the number of inserts
and deletes).
select * from pg_stat_user_tables;
That query will give you a lot of information, but it may be hard to visualize,
especially if you have a lot of tables. Since this is just like a regular table we can
query and filter it. So I wrote this complicated-looking query, which prints a
summary of all your tables.
SELECT *,
round(100 *
(
float4(total_reads) /
float4(total_writes + total_reads))::numeric, 2
) AS read_percent,
round(100 *
(
float4(total_writes) /
float4(total_writes + total_reads))::numeric, 2
) AS write_percent
FROM
(SELECT relname,
n_tup_ins AS inserts,
n_tup_upd AS updates,
n_tup_del AS deletes,
(n_tup_ins + n_tup_upd + n_tup_del) AS total_writes,
(seq_scan + idx_scan) AS total_reads
FROM pg_stat_user_tables) AS T;
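The percentage arithmetic inside that query is easy to check in plain Ruby (the sample numbers are made up):

```ruby
# round(100 * reads / (reads + writes), 2), as in the SQL above
def read_write_percent(total_reads, total_writes)
  total = (total_reads + total_writes).to_f
  [(100 * total_reads / total).round(2),
   (100 * total_writes / total).round(2)]
end

p read_write_percent(300, 100)  # 75% reads, 25% writes
```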
Table size
If you want to check out the disk space your tables are using pghero has a
function to help you:
SELECT * FROM pghero_relation_sizes WHERE type = 'table';
Note: for this to work you must have installed pghero as instructed in the
Adding new indexes section.
Alternatively, you can use a bash script to get the size of all your database files.
For this script to work you must be inside the base directory, which is under the
data directory.
One easy way to find your data directory is by using this command: pg_lsclusters
(debian / ubuntu only) or ps -U postgres -f .
du -sh * | while read SIZE OID; do
  echo "$SIZE `oid2name -q | tr -s ' ' | grep -E "^ $OID " | cut -f 3 -d ' '`"
done | sort -rn | column -t
Note: this is all one command; copy and paste it in full.
Query stats
Before we can use query stats we have to do some setup first.
1. Add to postgresql.conf:
shared_preload_libraries = 'pg_stat_statements'
2. Reload the service
Depending on your system you may need to run:
sudo service postgresql reload
or
sudo systemctl reload postgresql
3. Enable the extension
CREATE EXTENSION pg_stat_statements;
4. Get the stats
Before you can get some meaningful stats you have to leave your app running
(assuming it's in production) for a few hours. Then run this query:
SELECT
total_time as total,
(total_time/calls) as avg, calls, query
FROM pg_stat_statements
ORDER BY total DESC
LIMIT 10;
Backups
You put a lot of effort into your database, but you need to take another step to
make sure you can recover in case of a disaster. Having a backup plan is
important, as you may imagine. In this section I will explain how you can create
backups of your database and how to restore them.
Plain-text Backup
Using the pg_dump utility you can create a backup file of your database. The
format of this file is plain text; you can open it and see how it's composed of SQL
commands.
This command will generate the backup as backup-<current_date>.sql :
pg_dump -U postgres > backup-$(date +%Y-%m-%d).sql
The utility has a few useful options. For example, -s lets you get an empty copy
of your tables (schema only), which can be useful when setting up the application
in a testing environment. Using -n and -t you can dump specific schemas and tables.
To restore a pg_dump backup:
psql -U postgres -1 database_name < backup.sql
You will need to create the database if it doesn't exist, using the createdb
command.
Tip: always test your recovery procedure in advance (in a dev server with the
same postgres version) to make sure it will work when it's needed.
The only problem with pg_dump is that recovery is not very efficient for large
databases (> 1GB). In the next section we will explore another backup method.
Physical Backup
Another way to take a backup of your database is to have a copy of the data files.
You can't just copy the files directly, since there are changes happening to them on
a live server.
If you use the pg_basebackup utility it will take care of everything for you. Here is
the command:
pg_basebackup -x -P -D backup$(date +%Y-%m-%d) -h localhost -U postgres
Note: you need to have the replication permission set in pg_hba.conf for this
to work.
This will produce a base.tar.gz file inside a folder named after the current date.
To restore a backup done with pg_basebackup follow these steps:
1. Stop the postgres service
2. Rename the old data folder to data.old
3. Create a new data folder, make sure it belongs to the postgres user and it
has 0700 as permissions
4. Change into the new data folder and copy over your base.tar.gz backup file
5. Uncompress with tar zxvf base.tar.gz
6. Start the postgres service
After recovery you will see this in the logs:
LOG: database system was interrupted; last known up at 2015-08-23
LOG: redo starts at 0/38000084
LOG: consistent recovery state reached at 0/380000A8
LOG: redo done at 0/380000A8
Note: even if you are using pg_basebackup as your main backup I would still
recommend having a plain-text backup, just in case there are any issues with
the recovery process.
Appendix
Installing Postgres
In Ubuntu you can use this command:
sudo apt-get install postgresql postgresql-contrib
An important thing to check here is the locale: you want it to be UTF-8 to avoid
encoding issues. If you run into locale issues, two commands to investigate are
locale and locale-gen .
Note: Ubuntu doesn't always ship the latest version of postgres. For example,
in 14.04 LTS the version of postgres is 9.3, which doesn't support the JSONB
data type. The postgres developers maintain an APT repository which
contains the latest version. You can find the instructions here:
https://wiki.postgresql.org/wiki/Apt
ACID
Relational databases like PostgreSQL offer the so-called ACID guarantees:
Atomicity, Consistency, Isolation and Durability.
A - Atomicity
Changes in the database happen in an atomic way, meaning that either the full
change is applied or none of it.
C - Consistency
Transactions will always leave the database in a valid state. This means (for
example) that you will never have duplicated rows if you have defined a UNIQUE
constraint.
I - Isolation
If multiple database operations run at the same time, isolation ensures that they
do not interfere with each other and behave as if executed in order. In postgres
this is done via the MVCC (Multiversion Concurrency Control) system.
You can learn more about MVCC here:
http://www.postgresql.org/docs/9.4/static/mvcc-intro.html
D - Durability
Once an operation is committed it stays like that, even in the event of a system
crash or software error. Postgres can do this thanks to the WAL (Write-Ahead
Log) system.
You can read more about the WAL here:
http://www.postgresql.org/docs/9.4/static/wal-intro.html
Readline Cheatsheet
Psql uses the readline library to read user input; this means that you can use a
few shortcuts to be faster.
Shortcut    Description
CTRL + a    Move to the beginning of the line
CTRL + e    Move to the end of the line
CTRL + w    Delete the previous word
CTRL + u    Delete to the beginning of the line
CTRL + k    Delete to the end of the line
CTRL + y    Paste back (yank) the last deleted text
CTRL + l    Clear the screen
CTRL + r    Search the command history
Tip: readline is very popular so you can use these shortcuts in other
command-line interfaces like Bash.
SQL                                            ActiveRecord
SELECT * FROM countries;                       Country.all
SELECT * FROM countries WHERE name = 'Spain';  Country.where(name: 'Spain')
SELECT email FROM users;                       User.pluck(:email)
SELECT max(age) FROM users;                    User.maximum(:age)
SELECT * FROM users LIMIT 5;                   User.all.limit(5)
INSERT INTO users (name) VALUES ('Peter');     User.create(name: 'Peter')
DELETE FROM users WHERE id = 1;                User.first.destroy
Tip: using the rails db command inside a rails project will open a psql
session.