|
|
PostgreSQL
Status: Just notes so far
Pagecode: T->2 A->SAml H->trsa[t,a,si]d[t,a,si] C->SA[ccceji]
Last changed: Friday 2010-09-03 [15:24 UTC]
Abstract:
PostgreSQL is a ORDBMS (Object-Relational Database Management System) i.e. it is a system for managing data stored in "relations" (essentially a mathematical term for table) plus, it has the notion of an object-oriented database model. The notion of storing data in tables is so commonplace today that it might seem inherently obvious, but there are a number of other ways of organizing databases. Files and directories on Unix-like operating systems form an example of a hierarchical database. A more modern development is the object-oriented database. Each table is a named collection of rows. Each row of a given table has the same set of named columns, and each column is of a specific data type. Whereas columns have a fixed order in each row, it is important to remember that SQL (Structured Query Language) does not guarantee the order of the rows within the table in any way (although they can be explicitly sorted for display). Tables are grouped into databases, and a collection of databases managed by a single PostgreSQL server instance constitutes a database cluster.
|
Table of Contents
|
WRITEME
Quickstart
- creatuser
- createdb, dropdb
Introduction
- http://www.postgresql.org/about/advantages
- http://www.postgresql.org/docs/8.4/interactive/acronyms.html
- http://www.postgresql.org/docs/8.4/interactive/tutorial-arch.html
- The database server program is called
postgres. The
first/core/main database of any PostgreSQL cluster is also called
postgres. See
http://www.postgresql.org/docs/8.4/interactive/manage-ag-templatedbs.html
- We could for example run several PostgresSQL clusters (a server
process with its databases is called cluster) on the same
machine, each cluster having it is own
postgres main database.
- As is typical of client/server applications, the client and the
server can be on different hosts. In that case they communicate
over a TCP/IP network connection.
- The PostgreSQL server can handle multiple concurrent connections
from clients. To achieve this it starts ("forks") a new process
for each connection. From that point on, the client and the new
server process communicate without intervention by the original
postgres process. Thus, the master server process is always
running, waiting for client connections, whereas client and
associated server processes come and go. (All of this is of course
invisible to the user. We only mention it here for completeness.)
- A running PostgreSQL server can manage many databases. Typically,
a separate database is used for each project or for each user.
- PostgreSQL user accounts are distinct from operating system user
accounts. You will need to become the operating system user under
which PostgreSQL was installed (usually postgres) to create the
first user account.
- As an explanation for why this works: PostgreSQL user names are
separate from operating system user accounts. When you connect to
a database, you can choose what PostgreSQL user name to connect
as; if you don't, it will default to the same name as your
current operating system account. As it happens, there will
always be a PostgreSQL user account that has the same name as the
operating system user that started the server, and it also
happens that that user always has permission to create databases.
Instead of logging in as that user you can also specify the -U
option everywhere to select a PostgreSQL user name to connect as.
- http://www.postgresql.org/docs/8.4/interactive/user-manag.html
- From /etc/postgresql/8.4/main/posgresql.conf: Re-commenting a setting is NOT sufficient
to revert it to the default value; you need to reload the server.
This file is read on server startup and when the server receives a
SIGHUP signal. If you edit the file on a running system, you have
to SIGHUP the server for the changes to take effect, or use
"pg_ctl reload". Some parameters, which are marked below, require
a server shutdown and restart to take effect.
Installation and Configuration
- PGUSER Default connection parameters;
PGHOST environment variable
to the name of the database server machine. The environment
variable PGPORT
- All commands shipped by
postgresql-common have detailed manpages.
See postgresql-common(7) for the documentation of the database
client program wrapping, and user_clusters(5) and postgresqlrc(5)
for the cluster configuration.
- The documentation of the database server and client functions, SQL
commands, modules, etc. documented is shipped in the per-version
packages
postgresql-doc-<version>.
Resources
Memory
wks:~# sysctl -a 2> /dev/null | grep kernel.sh
kernel.shmmax = 536870912
kernel.shmall = 33554432
kernel.shmmni = 4096
wks:~#
Administration
- Before you can do anything, you must initialize a database storage
area on disk. We call this a database cluster. (SQL uses the term
catalog cluster.) A database cluster is a collection of databases
that is managed by a single instance of a running database server.
- After initialization, a database cluster will contain a database
named
postgres, which is meant as a default database for use by
utilities, users and third party applications. The database server
itself does not require the postgres database to exist, but many
external utility programs assume it exists.
- Another database created within each cluster during initialization
is called
template1. As the name suggests, this will be used as a
template for subsequently created databases; it should not be used
for actual work.
- Port assignment: Please note that the
pg_* tools automatically
manage the server ports unless you specify them manually. The first
cluster which is ever created (by any major version) will run on
the default port 5432, and each new cluster will use the next
higher free one. Please use pg_lsclusters for displaying the
cluster <-> port mapping, and please have a look at the
pg_createcluster manpage (the --port option) for details.
- In filesystem terms, a database cluster will be a single directory
under which all data will be stored. We call this the data
directory or data area. Because the data directory contains all the
data stored in the database, it is essential that it be secured
from unauthorized access.
- Locale: initdb also initializes the default locale for the database
cluster. Normally, it will just take the locale settings in the
environment and apply them to the initialized database. It is
possible to specify a different locale for the database; more
information about that can be found in Section 22.1. The default
sort order used within the particular database cluster is set by
initdb, and while you can create new databases using different sort
order, the order used in the template databases that initdb creates
cannot be changed without dropping and recreating them. There is
also a performance impact for using locales other than C or POSIX.
Therefore, it is important to make this choice correctly the first
time.
initdb also sets the default character set encoding for the
database cluster. Normally this should be chosen to match the
locale setting.
pg_createcluster vs. initdb: Please note that you can of course
also use the upstream tools for creating clusters, such as initdb.
However, please note that in this case you cannot expect any of
above pg_* tools to work, since they use different configuration
settings (SSL, data directories, etc.) and file locations (e. g.
/etc/postgresql/8.3/main/postgresql.conf). If in doubt, then do not
use initdb, but only pg_createcluster. Since merely installing
postgresql-X.Y will already set up a default cluster which is ready
to work, most people do not need to bother about initdb or
pg_createcluster at all.
- The database server program is called
postgres
- It is best not to use
SIGKILL to shut down the server. Doing so
will prevent the server from releasing shared memory and
semaphores, which might then have to be done manually before a new
server can be started. Furthermore, SIGKILL kills the postgres
process without letting it relay the signal to its subprocesses, so
it will be necessary to kill the individual subprocesses by hand as
well.
- PostgreSQL offers encryption at several levels, and provides
flexibility in protecting data from disclosure due to database
server theft, unscrupulous administrators, and insecure networks.
Encryption might also be required to secure sensitive data such as
medical records or financial transactions.
http://www.postgresql.org/docs/8.4/interactive/encryption-options.html
- Secure TCP/IP Connections with SSL: PostgreSQL has native support
for using SSL connections to encrypt client/server communications
for increased security. This requires that OpenSSL is installed on
both client and server systems and that support in PostgreSQL is
enabled at build time
- One way to set these parameters is to edit the file
postgresql.conf, which is normally kept in the data directory.
- A second way to set these configuration parameters is to give them
as a command-line option to the postgres command, such as:
postgres -c log_connections=yes -c
log_destination='syslog'; Command-line options override any
conflicting settings in postgresql.conf. Note that this means you
won't be able to change the value on-the-fly by editing
postgresql.conf, so while the command-line method might be
convenient, it can cost you flexibility later.
- Furthermore, it is possible to assign a set of parameter settings
to a user or a database. Whenever a session is started, the default
settings for the user and database involved are loaded. The
commands ALTER USER and ALTER DATABASE, respectively, are used to
configure these settings. Per-database settings override anything
received from the postgres command-line or the configuration file,
and in turn are overridden by per-user settings; both are
overridden by per-session settings.
- In addition to parameter settings, the
postgresql.conf file can
contain include directives, which specify another file to read and
process as if it were inserted into the configuration file at this
point. Include directives simply look like: include 'filename'; If
the file name is not an absolute path, it is taken as relative to
the directory containing the referencing configuration file.
Inclusions can be nested.
- The configuration file is reread whenever the main server process
receives a SIGHUP signal (which is most easily sent by means of
pg_ctl reload). The main server process also propagates this signal
to all currently running server processes so that existing sessions
also get the new value. Alternatively, you can send the signal to a
single server process directly.
- Some parameters can only be set at server start; any changes to
their entries in the configuration file will be ignored until the
server is restarted.
shared_preload_libraries use that one to load PostgreSQL procedural
language libraries for example. By preloading a shared library, the
library startup time is avoided when the library is first used.
However, the time to start each new server process might increase
slightly, even if that process never uses the library. So this
parameter is recommended only for libraries that will be used in
most sessions.
- Background Writer: There is a separate server process called the
background writer, whose function is to issue writes of "dirty"
shared buffers. The intent is that server processes handling user
queries should seldom or never have to wait for a write to occur,
because the background writer will do it. However there is a net
overall increase in I/O load, because a repeatedly-dirtied page
might otherwise be written only once per checkpoint interval, but
the background writer might write it several times in the same
interval. The parameters discussed in this subsection can be used
to tune the behavior for local needs.
- logging:
log_connections: Note: Some client programs, like psql,
attempt to connect twice while determining if a password is
required, so duplicate "connection received" messages do not
necessarily indicate a problem.
Client Authentication:
- PostgreSQL offers a number of different client authentication
methods. The method used to authenticate a particular client
connection can be selected on the basis of (client) host address,
database, and user.
- Client authentication is controlled by a configuration file, which
traditionally is named
pg_hba.conf and is stored in the database
cluster's data directory. (HBA stands for host-based
authentication.) A default pg_hba.conf file is installed when the
data directory is initialized by initdb
- A record is made up of a number of fields which are separated by
spaces and/or tabs. Fields can contain white space if the field
value is quoted. Records cannot be continued across lines.
- Note: Remote TCP/IP connections will not be possible unless the
server is started with an appropriate value for the
listen_addresses configuration parameter, since the default
behavior is to listen for TCP/IP connections only on the local
loopback address localhost.
- there is no real distinction between users and groups in
PostgreSQL; a + mark really means "match any of the roles that are
directly or indirectly members of this role", while a name without
a + mark matches only that specific role
- in config files, files can be included by the
@ constructs
- Since the pg_hba.conf records are examined sequentially for each
connection attempt, the order of the records is significant.
Typically, earlier records will have tight connection match
parameters and weaker authentication methods, while later records
will have looser match parameters and stronger authentication
methods. For example, one might wish to use trust authentication
for local TCP/IP connections but require a password for remote
TCP/IP connections. In this case a record specifying trust
authentication for connections from 127.0.0.1 would appear before a
record specifying password authentication for a wider range of
allowed client IP addresses.
- Tip: To connect to a particular database, a user must not only pass
the pg_hba.conf checks, but must have the
CONNECT privilege for the
database. If you wish to restrict which users can connect to which
databases, it's usually easier to control this by granting/revoking
CONNECT privilege than to put the rules into pg_hba.conf entries.
- Username maps: To use username mapping, specify
map=map-name in the options field in pg_hba.conf.
- There is no restriction regarding how many database users a given
operating system user can correspond to, nor vice versa. Thus,
entries in a map should be thought of as meaning "this operating
system user is allowed to connect as this database user", rather
than implying that they are equivalent.
- Authentication Methods, Password Authentication: PostgreSQL
database passwords are separate from operating system user
passwords. The password for each database user is stored in the
pg_authid system catalog.
- Passwords can be managed with the SQL commands
CREATE USER and
ALTER USER, e.g., CREATE USER foo WITH PASSWORD 'secret';.
- By default, that is, if no password has been set up, the stored
password is null and password authentication will always fail for
that user.
Database Roles and Privileges
- PostgreSQL manages database access permissions using the concept of
roles. A role can be thought of as either a database user, or a
group of database users, depending on how the role is set up. Roles
can own database objects (for example, tables) and can assign
privileges on those objects to other roles to control who has
access to which objects. Furthermore, it is possible to grant
membership in a role to another role, thus allowing the member role
use of privileges assigned to the role it is a member of.
- The concept of roles subsumes the concepts of "users" and "groups".
In PostgreSQL versions before 8.1, users and groups were distinct
kinds of entities, but now there are only roles. Any role can act
as a user, a group, or both.
- Database roles are global across a database cluster installation
(and not per individual database).
- the sql commands
CREATE ROLE name; and DROP ROLE name; can be used.
For convenience, the programs createuser and dropuser are provided
as wrappers around these SQL commands that can be called from the
shell command line: createuser name and dropuser name
- Every connection to the database server is made in the name of some
particular role, and this role determines the initial access
privileges for commands issued on that connection.
- The role name to use for a particular database connection is
indicated by the client that is initiating the connection request
in an application-specific fashion. For example, the psql program
uses the
-U command line option to indicate the role to connect as.
- Many applications assume the name of the current operating system
user by default (including createuser and psql). Therefore it is
often convenient to maintain a naming correspondence between roles
and operating system users.
- Role Attributes: A database role can have a number of attributes
that define its privileges and interact with the client
authentication system. see
http://www.postgresql.org/docs/8.4/interactive/role-attributes.html
- Database passwords are separate from operating system passwords.
Specify a password upon role creation with
CREATE ROLE name
PASSWORD 'string'.
- A role's attributes can be modified after creation with
ALTER
ROLE.
- It is good practice to create a role that has the CREATEDB and
CREATEROLE privileges, but is not a superuser, and then use this
role for all routine management of databases and roles. This
approach avoids the dangers of operating as a superuser for tasks
that do not really require it.
- To remove a role-specific default setting, use
ALTER ROLE rolename
RESET varname;
- Privileges: When an object is created, it is assigned an owner. The
owner is normally the role that executed the creation statement.
For most kinds of objects, the initial state is that only the owner
(or a superuser) can do anything with the object. To allow other
roles to use it, privileges must be granted. There are several
different kinds of privilege:
SELECT, INSERT, UPDATE, DELETE,
TRUNCATE, REFERENCES, TRIGGER, CREATE, CONNECT, TEMPORARY, EXECUTE,
and USAGE. For more information on the different types of
privileges supported by PostgreSQL, see the GRANT reference page.
- Role Membership: It is frequently convenient to group users
together to ease management of privileges: that way, privileges can
be granted to, or revoked from, a group as a whole. In PostgreSQL
this is done by creating a role that represents the group, and then
granting membership in the group role to individual user roles.
- To set up a group role, first create the role:
CREATE ROLE name;
Typically a role being used as a group would not have the LOGIN
attribute, though you can set it if you wish.
- The members of a role can use the privileges of the group role in
two ways.
- First, every member of a group can explicitly do SET ROLE to
temporarily "become" the group role. In this state, the database
session has access to the privileges of the group role rather than
the original login role, and any database objects created are
considered owned by the group role not the login role.
- Second, member roles that have the INHERIT attribute automatically
have use of privileges of roles they are members of.
Managing Databases:
- Note: The SQL standard calls databases catalogs, but there is no
difference in practice.
- Every instance of a running PostgreSQL server manages one or more
databases. Databases are therefore the topmost hierarchical level
for organizing SQL objects ("database objects").
- A database is a named collection of SQL objects ("database
objects"). More accurately, a database is a collection of schemas
and the schemas contain the tables, functions, etc. So the full
hierarchy is: server, database, schema, table (or some other kind
of object, such as a function).
- Generally, every database object (tables, functions, etc.) belongs
to one and only one database. (But there are a few system catalogs,
for example pg_database, that belong to a whole cluster and are
accessible from each database within the cluster.)
- When connecting to the database server, a client must specify in
its connection request the name of the database it wants to connect
to. It is not possible to access more than one database per
connection. But an application is not restricted in the number of
connections it opens to the same or other databases.
- Databases are physically separated and access control is managed at
the connection level. If one PostgreSQL server instance is to house
projects or users that should be separate and for the most part
unaware of each other, it is therefore recommendable to put them
into separate databases. If the projects or users are interrelated
and should be able to use each other's resources they should be put
in the same database, but possibly into separate schemas. Schemas
are a purely logical structure and who can access what is managed
by the privilege system.
- Databases are created with the SQL command
CREATE DATABASE
<dbname>; where dbname follows the usual rules for SQL identifiers.
The current role automatically becomes the owner of the new
database. It is the privilege of the owner of a database to remove
it later on (which also removes all the objects in it, even if they
have a different owner).
- Another, probably easier way is using
createdb <dbname>. Note that
createdb without any arguments will create a database with the
current user name, which might or might not be what you want.
- Since you need to be connected to the database server in order to
execute the
CREATE DATABASE command, the question remains how the
first database at any given site can be created. The first database
is always created by the initdb command when the data storage area
is initialized. This database is called postgres. So to create the
first ordinary database you can connect to postgres.
- A second database, template1, is also created by initdb. Whenever a
new database is created within the cluster, template1 is
essentially cloned.
- This means that any changes you make in
template1 are propagated to
all subsequently created databases. Therefore it is unwise to use
template1 for real work, but when used judiciously this feature can
be convenient.
- Sometimes you want to create a database for someone else. That role
should become the owner of the new database, so he can configure
and manage it himself. To achieve that, use one of the following
commands:
CREATE DATABASE dbname OWNER rolename; from the SQL environment,
or
createdb -O rolename dbname from the shell. You must be a
superuser to be allowed to create a database for someone else
(that is, for a role you are not a member of).
CREATE DATABASE actually works by copying an existing database. By
default, it copies the standard system database named template1.
Thus that database is the template from which new databases are
made.
- There is a second standard system database named
template0. This
database contains the same data as the initial contents of
template1, that is, only the standard objects predefined by your
version of PostgreSQL. template0 should never be changed after
initdb. By instructing CREATE DATABASE to copy template0 instead of
template1, you can create a virgin user database that contains none
of the site-local additions in template1. This is particularly
handy when restoring a pg_dump dump — the dump script should be
restored in a virgin database to ensure that one recreates the
correct contents of the dumped database, without any conflicts with
objects that might have been added to template1 later on.
- Another common reason for copying
template0 instead of template1 is
that new encoding and locale settings can be specified when copying
template0, whereas a copy of template1 must use the same settings
it does. This is because template1 might contain encoding-specific
or locale-specific data, while template0 is known not to.
- To create a database by copying
template0, use CREATE DATABASE
dbname TEMPLATE template0; from the SQL environment, or createdb -T
template0 dbname from the shell.
- It is possible to create additional template databases, and indeed
one can copy any database in a cluster by specifying its name as
the template for
CREATE DATABASE. It is important to understand,
however, that this is not (yet) intended as a general-purpose COPY
DATABASE facility. The principal limitation is that no other
sessions can be connected to the source database while it is being
copied. CREATE DATABASE will fail if any other connection exists
when it starts; otherwise, new connections to the source database
are locked out until CREATE DATABASE completes.
- Two useful flags exist in
pg_database for each database — the
columns datistemplate and datallowconn:
datistemplate can be set to indicate that a database is intended
as a template for CREATE DATABASE. If this flag is set, the
database can be cloned by any user with CREATEDB privileges; if it
is not set, only superusers and the owner of the database can
clone it. Both template0 and template1 should always be marked
with datistemplate = true.
- If
datallowconn is false, then no new connections to that database
will be allowed (but existing sessions are not killed simply by
setting the flag false). The template0 database is normally marked
datallowconn = false to prevent modification of it.
template1 and template0 do not have any special status beyond the
fact that the name template1 is the default source database name
for CREATE DATABASE. For example, one could drop template1 and
recreate it from template0 without any ill effects. This course of
action might be advisable if one has carelessly added a bunch of
junk in template1. To delete template1, it must have
pg_database.datistemplate = false.
- The postgres database is also created when a database cluster is
initialized. This database is meant as a default database for users
and applications to connect to. It is simply a copy of template1
and can be dropped and recreated if required.
- Databases are destroyed with the command
DROP DATABASE dbname; Only
the owner of the database, or a superuser, can drop a database.
Dropping a database removes all objects that were contained within
the database. The destruction of a database cannot be undone. For
convenience, there is also a shell program to drop databases dropdb
dbname. Unlike createdb, it is not the default action to drop the
database with the current user name.
- You cannot execute the
DROP DATABASE command while connected to the
victim database. You can, however, be connected to any other
database, including the template1 database. template1 would be the
only option for dropping the last user database of a given cluster.
- Tablespaces in PostgreSQL allow database administrators to define
locations in the file system where the files representing database
objects can be stored. Once created, a tablespace can be referred
to by name when creating database objects. By using tablespaces, an
administrator can control the disk layout of a PostgreSQL
installation. Tablespaces allow an administrator to use knowledge
of the usage pattern of database objects to optimize performance.
For example, an index which is very heavily used can be placed on a
very fast, highly available disk, such as an expensive solid state
device. At the same time a table storing archived data which is
rarely used or not performance critical could be stored on a less
expensive, slower disk system.
- PostgreSQL makes use of symbolic links to simplify the
implementation of tablespaces. This means that tablespaces can be
used only on systems that support symbolic links.
- To define a tablespace, use the
CREATE TABLESPACE command e.g.
CREATE TABLESPACE fastspace LOCATION '/mnt/sda1/postgresql/data';
The location must be an existing, empty directory that is owned by
the PostgreSQL system user. All objects subsequently created within
the tablespace will be stored in files underneath this directory.
- Note: There is usually not much point in making more than one
tablespace per logical file system, since you cannot control the
location of individual files within a logical file system.
However, PostgreSQL does not enforce any such limitation, and
indeed it is not directly aware of the file system boundaries on
your system. It just stores files in the directories you tell it
to use.
- If a database is created without specifying a tablespace for it,
it uses the same tablespace as the template database it is copied
from. Two tablespaces are automatically created by
initdb. The
pg_global tablespace is used for shared system catalogs. The
pg_default tablespace is the default tablespace of the template1
and template0 databases (and, therefore, will be the default
tablespace for other databases as well, unless overridden by a
TABLESPACE clause in CREATE DATABASE).
- Once created, a tablespace can be used from any database, provided
the requesting user has sufficient privilege. This means that a
tablespace cannot be dropped until all objects in all databases
using the tablespace have been removed.
- Tables, indexes, and entire databases can be assigned to particular
tablespaces. To do so, a user with the
CREATE privilege on a given
tablespace must pass the tablespace name as a parameter to the
relevant command. For example, the following creates a table in the
tablespace space1: CREATE TABLE foo(i int) TABLESPACE space1;
Alternatively, use the default_tablespace parameter: SET
default_tablespace = space1; CREATE TABLE foo(i int);
- When default_tablespace is set to anything but an empty string, it
supplies an implicit
TABLESPACE clause for C=REATE TABLE= and
CREATE INDEX commands that do not have an explicit one.
- To remove an empty tablespace, use the
DROP TABLESPACE command.
Localization:
- Do not confuse i18n and l10n with encoding also known as character
sets
- Note that the locale behavior of the server is determined by the
environment variables seen by the server, not by the environment of
any client. Therefore, be careful to configure the correct locale
settings before starting the server. A consequence of this is that
if client and server are set up in different locales, messages
might appear in different languages depending on where they
originated.
- When we speak of inheriting the locale from the execution
environment, this means the following on most operating systems:
For a given locale category, say the collation, the following
environment variables are consulted in this order until one is
found to be set:
LC_ALL, LC_COLLATE (or the variable corresponding
to the respective category), LANG. If none of these environment
variables are set then the locale defaults to C.
- Locale support refers to an application respecting cultural
preferences regarding alphabets, sorting, number formatting, etc.
- PostgreSQL uses the standard ISO C and POSIX locale facilities
provided by the server operating system.
- Locale support is automatically initialized when a database cluster
is created using initdb. initdb will initialize the database
cluster with the locale setting of its execution environment by
default, so if your system is already set to use the locale that
you want in your database cluster then there is nothing else you
need to do.
- There are following locale subcategories
LC_COLLATE, String sort order
LC_CTYPE, Character classification (What is a letter? Its upper-case equivalent?)
LC_MESSAGES, Language of messages
LC_MONETARY, Formatting of currency amounts
LC_NUMERIC, Formatting of numbers
LC_TIME, Formatting of dates and times
- The nature of some locale categories is that their value has to be
fixed when the database is created. You can use different settings
for different databases, but once a database is created, you cannot
change them for that database anymore.
LC_COLLATE and LC_CTYPE are
these categories. They affect the sort order of indexes, so they
must be kept fixed, or indexes on text columns will become corrupt.
- The other locale categories can be changed whenever desired by
setting the server configuration parameters that have the same name
as the locale categories. The values that are chosen by initdb are
actually only written into the configuration file
postgresql.conf
to serve as defaults when the server is started. If you delete
these assignments from postgresql.conf then the server will inherit
the settings from its execution environment.
- Client applications that handle server-side errors by parsing the
text of the error message will obviously have problems when the
server's messages are in a different language. Authors of such
applications are advised to make use of the error code scheme
instead.
- Character Set Support: The character set support in PostgreSQL
allows you to store text in a variety of character sets (also
called encodings), including single-byte character sets such as the
ISO 8859 series and multiple-byte character sets such as EUC
(Extended Unix Code), UTF-8, and Mule internal code.
- The default character set is selected while initializing your
PostgreSQL database cluster using
initdb. It can be overridden when
you create a database, so you can have multiple databases each with
a different character set.
- An important restriction, however, is that each database's
character set must be compatible with the database's
LC_CTYPE and
LC_COLLATE locale settings. For C or POSIX locale, any character
set is allowed, but for other locales there is only one character
set that will work correctly.
- On most modern operating systems, PostgreSQL can determine which
character set is implied by an LC_CTYPE setting, and it will
enforce that only the matching database encoding is used. On older
systems it is your responsibility to ensure that you use the
encoding expected by the locale you have selected. A mistake in
this area is likely to lead to strange misbehavior of
locale-dependent operations such as sorting.
- PostgreSQL supports automatic character set conversion between
server and client for certain character set combinations. The
conversion information is stored in the
pg_conversion system
catalog.
Maintenance
Routine Database Maintenance Tasks:
- http://bucardo.org/check_postgres/ is available for monitoring
database health and reporting unusual conditions.
check_postgres.pl
integrates with Nagios and MRTG, but can be run standalone too.
- PostgreSQL, like any database software, requires that certain tasks
be performed regularly to achieve optimum performance. The tasks
discussed here are required, but they are repetitive in nature and
can easily be automated using standard tools such as cron scripts
or Windows' Task Scheduler. But it is the database administrator's
responsibility to set up appropriate scripts, and to check that
they execute successfully.
- vacuuming
- PostgreSQL databases require periodic maintenance known as
vacuuming. For many installations, it is sufficient to let
vacuuming be performed by the autovacuum daemon
- PostgreSQL's VACUUM command has to process each table on a
regular basis for several reasons:
- To recover or reuse disk space occupied by updated or deleted rows.
- To update data statistics used by the PostgreSQL query planner.
- To protect against loss of very old data due to transaction ID
wraparound.
- There are two variants of
VACUUM: standard VACUUM and VACUUM
FULL. VACUUM FULL can reclaim more disk space but runs much more
slowly. Also, the standard form of VACUUM can run in parallel
with production database operations which is not true for VACUUM
FULL.
VACUUM creates a substantial amount of I/O traffic, which can
cause poor performance for other active sessions. There are
configuration parameters that can be adjusted to reduce the
performance impact of background vacuuming.
- The usual goal of routine vacuuming is to do standard VACUUMs
often enough to avoid needing
VACUUM FULL. The autovacuum daemon
attempts to work this way, and in fact will never issue VACUUM
FULL.
- If you have multiple databases in a cluster, do not forget to
VACUUM each one; the program vacuumdb might be helpful.
- Neither form of VACUUM is entirely satisfactory when a table
contains large numbers of dead row versions as a result of
massive update or delete activity. If you have such a table and
you need to reclaim the excess disk space it occupies, the best
way is to use CLUSTER or one of the table-rewriting variants of
ALTER TABLE. These commands rewrite an entire new copy of the
table and build new indexes for it. Like VACUUM FULL, they
require exclusive lock. Note that they also temporarily use extra
disk space, since the old copies of the table and indexes can't
be released until the new ones are complete. In the worst case
where your disk is nearly full, VACUUM FULL may be the only
workable alternative.
- If you have a table whose entire contents are deleted on a
periodic basis, consider doing it with TRUNCATE rather than using
DELETE followed by VACUUM. TRUNCATE removes the entire content of
the table immediately, without requiring a subsequent VACUUM or
VACUUM FULL to reclaim the now-unused disk space. The
disadvantage is that strict MVCC semantics are violated.
- updating statistics
- The PostgreSQL query planner relies on statistical information
about the contents of tables in order to generate good plans for
queries. These statistics are gathered by the
ANALYZE command,
which can be invoked by itself or as an optional step in VACUUM.
It is important to have reasonably accurate statistics, otherwise
poor choices of plans might degrade database performance.
- The autovacuum daemon, if enabled, will automatically issue
ANALYZE commands whenever the content of a table has changed
sufficiently. However, administrators might prefer to rely on
manually-scheduled ANALYZE operations, particularly if it is
known that update activity on a table will not affect the
statistics of "interesting" columns. The daemon schedules ANALYZE
strictly as a function of the number of rows inserted or updated;
it has no knowledge of whether that will lead to meaningful
statistical changes.
- It is possible to run ANALYZE on specific tables and even just
specific columns of a table, so the flexibility exists to update
some statistics more frequently than others if your application
requires it. In practice, however, it is usually best to just
analyze the entire database, because it is a fast operation.
ANALYZE uses a statistical random sampling of the rows of a table
rather than reading every single row.
- Tip: Although per-column tweaking of ANALYZE frequency might not
be very productive, you might well find it worthwhile to do
per-column adjustment of the level of detail of the statistics
collected by ANALYZE. Columns that are heavily used in WHERE
clauses and have highly irregular data distributions might
require a finer-grain data histogram than other columns. See
ALTER TABLE SET STATISTICS, or change the database-wide default
using the default_statistics_target configuration parameter.
- PostgreSQL has an optional but highly recommended feature called
autovacuum, whose purpose is to automate the execution of VACUUM
and ANALYZE commands.
- The "autovacuum daemon" actually consists of multiple processes.
There is a persistent daemon process, called the autovacuum
launcher, which is in charge of starting autovacuum worker
processes for all databases. The launcher will distribute the
work across time, attempting to start one worker on each database
every autovacuum_naptime seconds. One worker will be launched for
each database, with a maximum of autovacuum_max_workers processes
running at the same time. If there are more than
autovacuum_max_workers databases to be processed, the next
database will be processed as soon as the first worker finishes.
Each worker process will check each table within its database and
execute VACUUM and/or ANALYZE as needed.
- The default thresholds and scale factors are taken from
postgresql.conf, but it is possible to override them on a
table-by-table basis; see Storage Parameters for more
information.
- reindexing
- In PostgreSQL releases before 7.4, periodic reindexing was
frequently necessary to avoid "index bloat", due to lack of
internal space reclamation in B-tree indexes.
- In PostgreSQL 7.4 and later, index pages that have become
completely empty are reclaimed for re-use. There is still a
possibility for inefficient use of space: if all but a few index
keys on a page have been deleted, the page remains allocated. So
a usage pattern in which all but a few keys in each range are
eventually deleted will see poor use of space. For such usage
patterns, periodic reindexing is recommended.
- It might be worthwhile to reindex periodically just to improve
access speed. In some situations it is worthwhile to rebuild
indexes periodically with the REINDEX command.
- log file management
- backups
- As with everything that contains valuable data, PostgreSQL
databases should be backed up regularly. While the procedure is
essentially simple, it is important to have a clear understanding
of the underlying techniques and assumptions. There are three
fundamentally different approaches to backing up PostgreSQL data:
- SQL dump
- File system level backup
- Continuous archiving
- SQL dump: The idea behind this dump method is to generate a text
file with SQL commands that, when fed back to the server, will
recreate the database in the same state as it was at the time of
the dump. PostgreSQL provides the utility program pg_dump for
this purpose. The basic usage of this command is:
pg_dump dbname
> outfile
pg_dump does not operate with special permissions. In particular,
it must have read access to all tables that you want to back up,
so in practice you almost always have to run it as a database
superuser.
- Like any other PostgreSQL client application, pg_dump will by
default connect with the database user name that is equal to the
current operating system user name. To override this, either
specify the -U option or set the environment variable PGUSER.
Remember that pg_dump connections are subject to the normal
client authentication mechanisms
- Dumps created by pg_dump are internally consistent, that is, the
dump represents a snapshot of the database as of the time pg_dump
begins running. pg_dump does not block other operations on the
database while it is working. If your database schema relies on
OIDs (for instance as foreign keys) you must instruct pg_dump to
dump the OIDs as well. To do this, use the -o command line
option.
- The text files created by pg_dump are intended to be read in by
the psql program. The general command form to restore a dump is
psql dbname < infile=. where infile is what you used as outfile
for the pg_dump command. The database dbname will not be created
by this command, so you must create it yourself from template0
before executing psql (e.g., with createdb -T template0 dbname)
- Before restoring a SQL dump, all the users who own objects or
were granted permissions on objects in the dumped database must
already exist. If they do not, then the restore will fail to
recreate the objects with the original ownership and/or
permissions.
- By default, the psql script will continue to execute after an SQL
error is encountered. You might wish to use the following command
at the top of the script to alter that behaviour and have psql
exit with an exit status of 3 if an SQL error occurs:
\set
ON_ERROR_STOP
- you can specify that the whole dump should be restored as a
single transaction, so the restore is either fully completed or
fully rolled back. This mode can be specified by passing the -1
or —single-transaction command-line options to psql. When using
this mode, be aware that even the smallest of errors can rollback
a restore that has already run for many hours. However, that
might still be preferable to manually cleaning up a complex
database after a partially restored dump.
- The ability of pg_dump and psql to write to or read from pipes
makes it possible to dump a database directly from one server to
another, for example:
pg_dump -h host1 dbname \| psql -h host2
dbname
- The dumps produced by pg_dump are relative to template0. This
means that any languages, procedures, etc. added via template1
will also be dumped by pg_dump. As a result, when restoring, if
you are using a customized template1, you must create the empty
database from template0, as in the example above.
pg_dump dumps only a single database at a time, and it does not
dump information about roles or tablespaces (because those are
cluster-wide rather than per-database). To support convenient
dumping of the entire contents of a database cluster, the
pg_dumpall program is provided. pg_dumpall backs up each database
in a given cluster, and also preserves cluster-wide data such as
role and tablespace definitions. The basic usage of this command
is: pg_dumpall > outfile. The resulting dump can be restored
with psql: psql -f infile postgres
pg_dumpall works by emitting commands to re-create roles,
tablespaces, and empty databases, then invoking pg_dump for each
database. This means that while each database will be internally
consistent, the snapshots of different databases might not be
exactly in-sync.
- It is always necessary to have database superuser access when
restoring a
pg_dumpall dump, as that is required to restore the
role and tablespace information.
- If you use tablespaces, be careful that the tablespace paths in
the dump are appropriate for the new installation.
- Handling large databases: Use pg_dump's custom dump format. If
PostgreSQL was built on a system with the zlib compression
library installed, the custom dump format will compress data as
it writes it to the output file. This will produce dump file
sizes similar to using gzip, but it has the added advantage that
tables can be restored selectively. The following command dumps a
database using the custom dump format:
pg_dump -Fc dbname >
filename. A custom-format dump is not a script for psql, but
instead must be restored with pg_restore, for example: pg_restore
-d dbname filename. For very large databases, you might need to
combine split with one of the other two approaches.
- File System Level Backup:
http://www.postgresql.org/docs/8.4/interactive/backup-file.html
- If your database is spread across multiple file systems, there
might not be any way to obtain exactly-simultaneous frozen
snapshots of all the volumes. For example, if your data files and
WAL log are on different disks, or if tablespaces are on
different file systems, it might not be possible to use snapshot
backup because the snapshots must be simultaneous.
- Another option is to use rsync to perform a file system backup.
This is done by first running rsync while the database server is
running, then shutting down the database server just long enough
to do a second rsync. The second rsync will be much quicker than
the first, because it has relatively little data to transfer, and
the end result will be consistent because the server was down.
This method allows a file system backup to be performed with
minimal downtime.
- Continuous Archiving and Point-In-Time Recovery (PITR): At all
times, PostgreSQL maintains a write ahead log (WAL) in the
pg_xlog/ subdirectory of the cluster's data directory. The log
describes every change made to the database's data files. This
log exists primarily for crash-safety purposes: if the system
crashes, the database can be restored to consistency by
"replaying" the log entries made since the last checkpoint.
However, the existence of the log makes it possible to use a
third strategy for backing up databases: we can combine a
file-system-level backup with backup of the WAL files. If
recovery is needed, we restore the backup and then replay from
the backed-up WAL files to bring the backup up to current time.
- We do not need a perfectly consistent backup as the starting
point. Any internal inconsistency in the backup will be
corrected by log replay (this is not significantly different
from what happens during crash recovery). So we don't need file
system snapshot capability, just tar or a similar archiving
tool.
- Since we can string together an indefinitely long sequence of
WAL files for replay, continuous backup can be achieved simply
by continuing to archive the WAL files. This is particularly
valuable for large databases, where it might not be convenient
to take a full backup frequently.
- There is nothing that says we have to replay the WAL entries all
the way to the end. We could stop the replay at any point and
have a consistent snapshot of the database as it was at that
time. Thus, this technique supports point-in-time recovery: it
is possible to restore the database to its state at any time
since your base backup was taken.
- If we continuously feed the series of WAL files to another
machine that has been loaded with the same base backup file, we
have a warm standby system: at any point we can bring up the
second machine and it will have a nearly-current copy of the
database.
- As with the plain file-system-backup technique, this method can
only support restoration of an entire database cluster, not a
subset.
- Also, it requires a lot of archival storage: the base backup
might be bulky, and a busy system will generate many megabytes of
WAL traffic that have to be archived. Still, it is the preferred
backup technique in many situations where high reliability is
needed.
- To recover successfully using continuous archiving (also called
"online backup" by many database vendors), you need a continuous
sequence of archived WAL files that extends back at least as far
as the start time of your backup.
- In an abstract sense, a running PostgreSQL system produces an
indefinitely long sequence of WAL records. The system physically
divides this sequence into WAL segment files, which are normally
16MB apiece
- The segment files are given numeric names that reflect their
position in the abstract WAL sequence. When not using WAL
archiving, the system normally creates just a few segment files
and then "recycles" them by renaming no-longer-needed segment
files to higher segment numbers. When archiving WAL data, we need
to capture the contents of each segment file once it is filled,
and save that data somewhere before the segment file is recycled
for reuse.
- To provide the database administrator with as much flexibility as
possible, PostgreSQL tries not to make any assumptions about how
the archiving will be done. Instead, PostgreSQL lets the
administrator specify a shell command to be executed to copy a
completed segment file to wherever it needs to go. The command
could be as simple as a cp, or it could invoke a complex shell
script — it's all up to you.
archive_command = 'cp -i %p /mnt/server/archivedir/%f
</dev/null' The archive command will be executed under the
ownership of the same user that the PostgreSQL server is running
as.
- Since the series of WAL files being archived contains effectively
everything in your database, you will want to be sure that the
archived data is protected from prying eyes; for example, archive
into a directory that does not have group or world read access.
- It is important that the archive command return zero exit status
if and only if it succeeded. Upon getting a zero result,
PostgreSQL will assume that the file has been successfully
archived, and will remove or recycle it.
- The archive command should generally be designed to refuse to
overwrite any pre-existing archive file. This is an important
safety feature to preserve the integrity of your archive in case
of administrator error (such as sending the output of two
different servers to the same archive directory).
- Warm Standby Servers for High Availability: Continuous archiving
can be used to create a high availability (HA) cluster
configuration with one or more standby servers ready to take over
operations if the primary server fails. This capability is widely
referred to as warm standby or log shipping.
- The primary and standby server work together to provide this
capability, though the servers are only loosely coupled. The
primary server operates in continuous archiving mode, while each
standby server operates in continuous recovery mode, reading the
WAL files from the primary. No changes to the database tables are
required to enable this capability, so it offers low
administration overhead in comparison with some other replication
approaches. This configuration also has relatively low
performance impact on the primary server.
- Directly moving WAL records from one database server to another
is typically described as log shipping. PostgreSQL implements
file-based log shipping, which means that WAL records are
transferred one file (WAL segment) at a time. WAL files (16MB)
can be shipped easily and cheaply over any distance, whether it
be to an adjacent system, another system on the same site or
another system on the far side of the globe. The bandwidth
required for this technique varies according to the transaction
rate of the primary server.
- It should be noted that the log shipping is asynchronous, i.e.,
the WAL records are shipped after transaction commit. As a result
there is a window for data loss should the primary server suffer
a catastrophic failure: transactions not yet shipped will be
lost.
- The standby server is not available for access, since it is
continually performing recovery processing. Recovery performance
is sufficiently good that the standby will typically be only
moments away from full availability once it has been activated.
As a result, we refer to this capability as a warm standby
configuration that offers high availability. Restoring a server
from an archived base backup and rollforward will take
considerably longer, so that technique only offers a solution for
disaster recovery, not high availability.
- the hardware architecture must be the same — shipping from, say,
a 32-bit to a 64-bit system will not work.
- In general, log shipping between servers running different major
PostgreSQL release levels will not be possible. It is the policy
of the PostgreSQL Global Development Group not to make changes to
disk formats during minor release upgrades, so it is likely that
running different minor release levels on primary and standby
servers will work successfully. However, no formal support for
that is offered and you are advised to keep primary and standby
servers at the same release level as much as possible
- There is no special mode required to enable a standby server. The
operations that occur on both primary and standby servers are
entirely normal continuous archiving and recovery tasks. The only
point of contact between the two database servers is the archive
of WAL files that both share: primary writing to the archive,
standby reading from the archive.
- Care must be taken to ensure that WAL archives for separate
primary servers do not become mixed together or confused. The
archive need not be large, if it is only required for the standby
operation.
- The magic that makes the two loosely coupled servers work
together is simply a restore_command used on the standby that,
when asked for the next WAL file, waits for it to become
available from the primary. The restore_command is specified in
the recovery.conf file on the standby server. Normal recovery
processing would request a file from the WAL archive, reporting
failure if the file was unavailable. For standby processing it is
normal for the next WAL file to be unavailable, so we must be
patient and wait for it to appear.
- A working example of a waiting restore_command is provided as a
contrib module named
pg_standby
- PostgreSQL does not provide the system software required to
identify a failure on the primary and notify the standby system
and then the standby database server
- The size of the WAL archive can be minimized by using the %r
option of the restore_command. This option specifies the last
archive file name that needs to be kept to allow the recovery to
restart correctly. This can be used to truncate the archive once
files are no longer required, if the archive is writable from the
standby server.
- If the primary server fails and then immediately restarts, you
must have a mechanism for informing it that it is no longer the
primary. This is sometimes known as STONITH (Shoot the Other Node
In The Head), which is necessary to avoid situations where both
systems think they are the primary, which will lead to confusion
and ultimately data loss.
- Many failover systems use just two systems, the primary and the
standby, connected by some kind of heartbeat mechanism to
continually verify the connectivity between the two and the
viability of the primary.
- Regular switching from primary to standby is useful, since it
allows regular downtime on each system for maintenance. This also
serves as a test of the failover mechanism to ensure that it will
really work when you need it. Written administration procedures
are advised.
- Incrementally Updated Backups: In a warm standby configuration,
it is possible to offload the expense of taking periodic base
backups from the primary server; instead base backups can be made
by backing up a standby server's files. This concept is generally
known as incrementally updated backups, log change accumulation,
or more simply, change accumulation.
- http://www.postgresql.org/docs/8.4/interactive/migration.html and
/usr/share/doc/postgresql-8.4/README.Debian.gz —> Default
clusters and upgrading
High Availability, Load Balancing, and Replication:
- read-only database servers can be combined relatively easily too.
Unfortunately, most database servers have a read/write mix of
requests, and read/write servers are much harder to combine. This
synchronization problem is the fundamental difficulty for servers
working together. Because there is no single solution that
eliminates the impact of the sync problem for all use cases, there
are multiple solutions. Each solution addresses this problem in a
different way, and minimizes its impact for a specific workload.
- Some solutions deal with synchronization by allowing only one
server to modify the data. Servers that can modify data are called
read/write or "master" servers. Servers that can reply to read-only
queries are called "slave" servers. Servers that cannot be accessed
until they are changed to master servers are called "standby"
servers.
- Some solutions are synchronous, meaning that a data-modifying
transaction is not considered committed until all servers have
committed the transaction. This guarantees that a failover will not
lose any data and that all load-balanced servers will return
consistent results no matter which server is queried. In contrast,
asynchronous solutions allow some delay between the time of a
commit and its propagation to the other servers, opening the
possibility that some transactions might be lost in the switch to a
backup server, and that load balanced servers might return slightly
stale results. Asynchronous communication is used when synchronous
would be too slow.
- Solutions can also be categorized by their granularity. Some
solutions can deal only with an entire database server, while
others allow control at the per-table or per-database level.
- Performance must be considered in any choice. There is usually a
trade-off between functionality and performance. For example, a
full synchronous solution over a slow network might cut performance
by more than half, while an asynchronous one might have a minimal
performance impact.
- http://www.postgres-r.org/documentation/terms
Monitoring Database Activity:
- http://www.postgresql.org/docs/8.4/interactive/monitoring.html
- A database administrator frequently wonders, "What is the system
doing right now?" This chapter discusses how to find that out.
Several tools are available for monitoring database activity and
analyzing performance. Most of this chapter is devoted to
describing PostgreSQL's statistics collector, but one should not
neglect regular Unix monitoring programs such as ps, top, iostat,
and vmstat. Also, once one has identified a poorly-performing
query, further investigation might be needed using PostgreSQL's
EXPLAIN command.
- Standard Unix Tools: Unix monitoring programs such as ps, top,
iostat, and vmstat.
- The Statistics Collector: PostgreSQL's statistics collector is a
subsystem that supports collection and reporting of information
about server activity.
- the collector can count accesses to tables and indexes in both
disk-block and individual-row terms. It also tracks total numbers
of rows in each table, and the last vacuum and analyze times for
each table. It can also count calls to user-defined functions and
the total time spent in each one.
- PostgreSQL also supports determining the exact command currently
being executed by other server processes. This is an independent
facility that does not depend on the collector process.
- Since collection of statistics adds some overhead to query
execution, the system can be configured to collect or not collect
information. This is controlled by configuration parameters that
are normally set in
postgresql.conf. Normally these parameters are
set in postgresql.conf so that they apply to all server processes,
but it is possible to turn them on or off in individual sessions
using the SET command.
- The statistics collector communicates with the backends needing
information (including autovacuum) through temporary files. These
files are stored in the pg_stat_tmp subdirectory. When the
postmaster shuts down, a permanent copy of the statistics data is
stored in the global subdirectory.
- When using the statistics to monitor current activity, it is
important to realize that the information does not update
instantaneously. Each individual server process transmits new
statistical counts to the collector just before going idle; so a
query or transaction still in progress does not affect the
displayed totals.
- Viewing Locks: Another useful tool for monitoring database activity
is the pg_locks system table. It allows the database administrator
to view information about the outstanding locks in the lock
manager.
- Dynamic Tracing: PostgreSQL provides facilities to support dynamic
tracing of the database server. This allows an external utility to
be called at specific points in the code and thereby trace
execution. A number of probes or trace points are already inserted
into the source code. These probes are intended to be used by
database developers and administrators. By default the probes are
not compiled into PostgreSQL; the user needs to explicitly tell the
configure script to make the probes available.
Monitoring Disk Usage:
- Determining Disk Usage: Each table has a primary heap disk file
where most of the data is stored. If the table has any columns with
potentially-wide values, there is also a TOAST file associated with
the table, which is used to store values too wide to fit
comfortably in the main table
- There will be one index on the TOAST table, if present. There might
also be indexes associated with the base table. Each table and
index is stored in a separate disk file — possibly more than one
file, if the file would exceed one gigabyte.
- You can monitor disk space from three ways: using SQL functions
listed in Table 9-57, using VACUUM information, and from the
command line using the tools in contrib/oid2name.
- Disk Full Failure: The most important disk monitoring task of a
database administrator is to make sure the disk doesn't grow full.
A filled data disk will not result in data corruption, but it might
prevent useful activity from occurring. If the disk holding the WAL
files grows full, database server panic and consequent shutdown
might occur.
Reliability and the Write-Ahead Log:
- Reliability is an important property of any serious database
system, and PostgreSQL does everything possible to guarantee
reliable operation. One aspect of reliable operation is that all
data recorded by a committed transaction should be stored in a
nonvolatile area that is safe from power loss, operating system
failure, and hardware failure (except failure of the nonvolatile
area itself, of course).
- Successfully writing the data to the computer's permanent storage
(disk drive or equivalent) ordinarily meets this requirement. In
fact, even if a computer is fatally damaged, if the disk drives
survive they can be moved to another computer with similar hardware
and all committed transactions will remain intact.
- While forcing data periodically to the disk platters might seem
like a simple operation, it is not. Because disk drives are
dramatically slower than main memory and CPUs, several layers of
caching exist between the computer's main memory and the disk
platters. First, there is the operating system's buffer cache,
which caches frequently requested disk blocks and combines disk
writes. Fortunately, all operating systems give applications a way
to force writes from the buffer cache to disk, and PostgreSQL uses
those features.
- Next, there might be a cache in the disk drive controller; this is
particularly common on RAID controller cards. Some of these caches
are write-through, meaning writes are passed along to the drive as
soon as they arrive. Others are write-back, meaning data is passed
on to the drive at some later time. Such caches can be a
reliability hazard because the memory in the disk controller cache
is volatile, and will lose its contents in a power failure. Better
controller cards have battery-backed caches, meaning the card has a
battery that maintains power to the cache in case of system power
loss. After power is restored the data will be written to the disk
drives.
- And finally, most disk drives have caches. Some are write-through
while some are write-back, and the same concerns about data loss
exist for write-back drive caches as exist for disk controller
caches. Consumer-grade IDE and SATA drives are particularly likely
to have write-back caches that will not survive a power failure. To
check write caching on Linux use hdparm -I; it is enabled if there
is a * next to Write cache. hdparm -W to turn off write caching.
- Another risk of data loss is posed by the disk platter write
operations themselves. Disk platters are divided into sectors,
commonly 512 bytes each. Every physical read or write operation
processes a whole sector. When a write request arrives at the
drive, it might be for 512 bytes, 1024 bytes, or 8192 bytes, and
the process of writing could fail due to power loss at any time,
meaning some of the 512-byte sectors were written, and others were
not. To guard against such failures, PostgreSQL periodically writes
full page images to permanent storage before modifying the actual
page on disk. By doing this, during crash recovery PostgreSQL can
restore partially-written pages. If you have a battery-backed disk
controller or file-system software that prevents partial page
writes (e.g., ReiserFS 4), you can turn off this page imaging by
using the
full_page_writes parameter.
- Write-Ahead Logging (WAL) is a standard method for ensuring data
integrity. A detailed description can be found in most (if not all)
books about transaction processing. Briefly, WAL's central concept
is that changes to data files (where tables and indexes reside)
must be written only after those changes have been logged, that is,
after log records describing the changes have been flushed to
permanent storage. If we follow this procedure, we do not need to
flush data pages to disk on every transaction commit, because we
know that in the event of a crash we will be able to recover the
database using the log: any changes that have not been applied to
the data pages can be redone from the log records. (This is
roll-forward recovery, also known as REDO.)
- Using WAL results in a significantly reduced number of disk writes,
because only the log file needs to be flushed to disk to guarantee
that a transaction is committed, rather than every data file
changed by the transaction.
- Asynchronous Commit: Asynchronous commit is an option that allows
transactions to complete more quickly, at the cost that the most
recent transactions may be lost if the database should crash. In
many applications this is an acceptable trade-off.
- As described in the previous section, transaction commit is
normally synchronous: the server waits for the transaction's WAL
records to be flushed to permanent storage before returning a
success indication to the client. The client is therefore
guaranteed that a transaction reported to be committed will be
preserved, even in the event of a server crash immediately after.
However, for short transactions this delay is a major component of
the total transaction time. Selecting asynchronous commit mode
means that the server returns success as soon as the transaction is
logically completed, before the WAL records it generated have
actually made their way to disk. This can provide a significant
boost in throughput for small transactions.
- Asynchronous commit introduces the risk of data loss. There is a
short time window between the report of transaction completion to
the client and the time that the transaction is truly committed
(that is, it is guaranteed not to be lost if the server crashes).
The risk that is taken by using asynchronous commit is of data
loss, not data corruption. If the database should crash, it will
recover by replaying WAL up to the last record that was flushed.
The database will therefore be restored to a self-consistent state,
but any transactions that were not yet flushed to disk will not be
reflected in that state.
- Thus asynchronous commit should not be used if the client will take
external actions relying on the assumption that the transaction
will be remembered. As an example, a bank would certainly not use
asynchronous commit for a transaction recording an ATM's dispensing
of cash. But in many scenarios, such as event logging, there is no
need for a strong guarantee of this kind.
- If the database crashes during the risk window between an
asynchronous commit and the writing of the transaction's WAL
records, then changes made during that transaction will be lost.
The duration of the risk window is limited because a background
process (the "WAL writer") flushes unwritten WAL records to disk
every wal_writer_delay milliseconds. The actual maximum duration of
the risk window is three times wal_writer_delay because the WAL
writer is designed to favor writing whole pages at a time during
busy periods.
- An immediate-mode shutdown is equivalent to a server crash, and
will therefore cause loss of any unflushed asynchronous commits.
- Asynchronous commit provides behavior different from setting fsync
= off. fsync is a server-wide setting that will alter the behavior
of all transactions. It disables all logic within PostgreSQL that
attempts to synchronize writes to different portions of the
database, and therefore a system crash (that is, a hardware or
operating system crash, not a failure of PostgreSQL itself) could
result in arbitrarily bad corruption of the database state. In many
scenarios, asynchronous commit provides most of the performance
improvement that could be obtained by turning off fsync, but
without the risk of data corruption.
- WAL Configuration: Checkpoints are points in the sequence of
transactions at which it is guaranteed that the data files have
been updated with all information written before the checkpoint. At
checkpoint time, all dirty data pages are flushed to disk and a
special checkpoint record is written to the log file.
- WAL Internals: WAL is automatically enabled; no action is required
from the administrator except ensuring that the disk-space
requirements for the WAL logs are met, and that any necessary
tuning is done
- WAL logs are stored in the directory
pg_xlog under the data
directory, as a set of segment files, normally each 16 MB in size
(but the size can be changed by altering the --with-wal-segsize
configure option when building the server). Each segment is divided
into pages, normally 8 kB each (this size can be changed via the
--with-wal-blocksize configure option)
- Segment files are given ever-increasing numbers as names, starting
at
000000010000000000000000. The numbers do not wrap, at present,
but it should take a very very long time to exhaust the available
stock of numbers.
- It is of advantage if the log is located on another disk than the
main database files. This can be achieved by moving the directory
pg_xlog to another location (while the server is shut down, of
course) and creating a symbolic link from the original location in
the main data directory to the new location. The aim of WAL, to
ensure that the log is written before database records are altered,
can be subverted by disk drives that falsely report a successful
write to the kernel, when in fact they have only cached the data
and not yet stored it on the disk. A power failure in such a
situation might still lead to irrecoverable data corruption.
Administrators should try to ensure that disks holding PostgreSQL's
WAL log files do not make such false reports.
- After a checkpoint has been made and the log flushed, the
checkpoint's position is saved in the file pg_control. Therefore,
when recovery is to be done, the server first reads pg_control and
then the checkpoint record; then it performs the REDO operation by
scanning forward from the log position indicated in the checkpoint
record.
Regression Tests:
- The regression tests are a comprehensive set of tests for the SQL
implementation in PostgreSQL. They test standard SQL operations as
well as the extended capabilities of PostgreSQL.
- http://www.postgresql.org/docs/8.4/interactive/regress.html
- Some properly installed and fully functional PostgreSQL
installations can "fail" some of these regression tests due to
platform-specific artifacts such as varying floating-point
representation and message wording. The tests are currently
evaluated using a simple diff comparison against the outputs
generated on a reference system, so the results are sensitive to
small system differences.
Performance
- http://samiux.wordpress.com/2009/07/26/howto-performance-tuning-for-postgresql-on-ubuntudebian/
- http://www.postgresql.org/docs/8.4/interactive/runtime-config-resource.html
- from http://www.postgresql.org/docs/8.4/interactive/runtime-config-query.html
- Turning one of these settings off permanently is seldom a good
idea, however. Better ways to improve the quality of the plans
chosen by the optimizer include adjusting the Planner Cost
Constants, running
ANALYZE more frequently, increasing the value
of the default_statistics_target configuration parameter, and
increasing the amount of statistics collected for specific columns
using ALTER TABLE SET STATISTICS.
- http://www.postgresql.org/docs/8.4/interactive/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-ASYNC-BEHAVIOR
- http://www.postgresql.org/docs/8.4/interactive/runtime-config-wal.html
- from
http://www.postgresql.org/docs/8.4/interactive/runtime-config-query.html,
Planner Cost Constants: random_page_cost (floating point): Sets the
planner's estimate of the cost of a non-sequentially-fetched disk
page. The default is 4.0. Reducing this value relative to
seq_page_cost will cause the system to prefer index scans; raising
it will make index scans look relatively more expensive. You can
raise or lower both values together to change the importance of
disk I/O costs relative to CPU costs, which are described by the
following parameters.
- Tip: Although the system will let you set random_page_cost to less
than seq_page_cost, it is not physically sensible to do so.
However, setting them equal makes sense if the database is
entirely cached in RAM, since in that case there is no penalty for
touching pages out of sequence. Also, in a heavily-cached database
you should lower both values relative to the CPU parameters, since
the cost of fetching a page already in RAM is much smaller than it
would normally be.
- Tablespaces in PostgreSQL allow database administrators to define
locations in the file system where the files representing database
objects can be stored. Once created, a tablespace can be referred
to by name when creating database objects. By using tablespaces, an
administrator can control the disk layout of a PostgreSQL
installation. Tablespaces allow an administrator to use knowledge
of the usage pattern of database objects to optimize performance.
For example, an index which is very heavily used can be placed on a
very fast, highly available disk, such as an expensive solid state
device. At the same time a table storing archived data which is
rarely used or not performance critical could be stored on a less
expensive, slower disk system.
Programming for/with PostgreSQL
- http://www.postgresql.org/docs/8.4/interactive/client-interfaces.html
- http://www.postgresql.org/docs/8.4/interactive/biblio.html#MELT93
- http://www.postgresql.org/docs/8.4/interactive/biblio.html#DATE97
- Two dashes ("—") introduce comments. Whatever follows them is
ignored up to the end of the line. SQL is case insensitive about
key words and identifiers, except when identifiers are
double-quoted to preserve the case
- Constants that are not simple numeric values usually must be
surrounded by single quotes
- Many developers consider explicitly listing the columns better
style than relying on the order implicitly.
- You could also have used COPY to load large amounts of data from
flat-text files. This is usually faster because the COPY command is
optimized for this application while allowing less flexibility than
INSERT. An example would be COPY weather FROM
'/home/user/weather.txt'; where the file name for the source file
must be available to the backend server machine, not the client,
since the backend server reads the file directly.
- To retrieve data from a table, the table is queried. An SQL SELECT
statement is used to do this.
- The statement is divided into a select list (the part that lists
the columns to be returned),
- a table list (the part that lists the tables from which to
retrieve the data), and
- an optional qualification (the part that specifies any
restrictions).
- You can write expressions, not just simple column references, in
the select list.
- A query can be "qualified" by adding a WHERE clause that specifies
which rows are wanted. The WHERE clause contains a Boolean (truth
value) expression, and only rows for which the Boolean expression
is true are returned. The usual Boolean operators (AND, OR, and
NOT) are allowed in the qualification.
- While SELECT * is useful for off-the-cuff queries, it is widely
considered bad style in production code, since adding a column to
the table would change the results.
- Thus far, our queries have only accessed one table at a time.
Queries can access multiple tables at once, or access the same
table in such a way that multiple rows of the table are being
processed at the same time. A query that accesses multiple rows of
the same or different tables at one time is called a join query
- It is widely considered good style to qualify all column names in a
join query, so that the query won't fail if a duplicate column name
is later added to one of the tables.
- types of joins:
- self join
- inner join
- outer join
- left outer join
- right outer join
- full outer join
- An aggregate function computes a single result from multiple input
rows.
- It is important to understand the interaction between aggregates
and SQL's WHERE and HAVING clauses. The fundamental difference
between WHERE and HAVING is this: WHERE selects input rows before
groups and aggregates are computed (thus, it controls which rows go
into the aggregate computation), whereas HAVING selects group rows
after groups and aggregates are computed. Thus, the WHERE clause
must not contain aggregate functions; it makes no sense to try to
use an aggregate to determine which rows will be inputs to the
aggregates. On the other hand, the HAVING clause always contains
aggregate functions. (Strictly speaking, you are allowed to write a
HAVING clause that doesn't use aggregates, but it's seldom useful.
The same condition could be used more efficiently at the WHERE
stage.)
View:
- You can create a view over the query, which gives a name to the
query that you can refer to like an ordinary table. Views allow you
to encapsulate the details of the structure of your tables, which
might change as your application evolves, behind consistent
interfaces. Views can be used in almost any place a real table can
be used. Building views upon other views is not uncommon.
Transaction:
- Transactions are a fundamental concept of all database systems. The
essential point of a transaction is that it bundles multiple steps
into a single, all-or-nothing operation. The intermediate states
between the steps are not visible to other concurrent transactions,
and if some failure occurs that prevents the transaction from
completing, then none of the steps affect the database at all.
- A transaction is said to be atomic: from the point of view of other
transactions, it either happens completely or not at all.
- A transactional database guarantees that all the updates made by a
transaction are logged in permanent storage (i.e., on disk) before
the transaction is reported complete.
- Another important property of transactional databases is closely
related to the notion of atomic updates: when multiple transactions
are running concurrently, each one should not be able to see the
incomplete changes made by others. For example, if one transaction
is busy totalling all the branch balances, it would not do for it
to include the debit from Alice's branch but not the credit to
Bob's branch, nor vice versa. So transactions must be
all-or-nothing not only in terms of their permanent effect on the
database, but also in terms of their visibility as they happen. The
updates made so far by an open transaction are invisible to other
transactions until the transaction completes, whereupon all the
updates become visible simultaneously.
- PostgreSQL actually treats every SQL statement as being executed
within a transaction. If you do not issue a BEGIN command, then
each individual statement has an implicit BEGIN and (if successful)
COMMIT wrapped around it. A group of statements surrounded by BEGIN
and COMMIT is sometimes called a transaction block.
- It's possible to control the statements in a transaction in a more
granular fashion through the use of savepoints. Savepoints allow
you to selectively discard parts of the transaction, while
committing the rest. After defining a savepoint with SAVEPOINT, you
can if needed roll back to the savepoint with ROLLBACK TO. All the
transaction's database changes between defining the savepoint and
rolling back to it are discarded, but changes earlier than the
savepoint are kept.
- After rolling back to a savepoint, it continues to be defined, so
you can roll back to it several times. Conversely, if you are sure
you won't need to roll back to a particular savepoint again, it can
be released, so the system can free some resources. Keep in mind
that either releasing or rolling back to a savepoint will
automatically release all savepoints that were defined after it.
Window Functions:
- A window function performs a calculation across a set of table rows
that are somehow related to the current row. This is comparable to
the type of calculation that can be done with an aggregate
function. But unlike regular aggregate functions, use of a window
function does not cause rows to become grouped into a single output
row — the rows retain their separate identities. Behind the
scenes, the window function is able to access more than just the
current row of the query result.
- A window function call always contains an OVER clause following the
window function's name and argument(s).
- Window functions are permitted only in the SELECT list and the
ORDER BY clause of the query. They are forbidden elsewhere, such as
in GROUP BY, HAVING and WHERE clauses. This is because they
logically execute after the processing of those clauses. Also,
window functions execute after regular aggregate functions. This
means it is valid to include an aggregate function call in the
arguments of a window function, but not vice versa.
SQL
- SQL input consists of a sequence of commands. A command is composed
of a sequence of tokens, terminated by a semicolon (";"). The end
of the input stream also terminates a command. Which tokens are
valid depends on the syntax of the particular command. A token can
be a key word, an identifier, a quoted identifier, a literal (or
constant), or a special character symbol. Tokens are normally
separated by whitespace (space, tab, newline), but need not be if
there is no ambiguity (which is generally only the case if a
special character is adjacent to some other token type).
Additionally, comments can occur in SQL input. They are not tokens,
they are effectively equivalent to whitespace.
- the UPDATE command always requires a SET token to appear in a
certain position
- key words, that is, words that have a fixed meaning in the SQL
language
- identifiers. They identify names of tables, columns, or other
database objects, depending on the command they are used in.
- Key words and identifiers have the same lexical structure, meaning
that one cannot know whether a token is an identifier or a key word
without knowing the language.
- A convention often used is to write key words in upper case and
names in lower case, e.g. UPDATE my_table SET a = 5;
- The Unicode escape syntax works only when the server encoding is
UTF8. When other server encodings are used, only code points in the
ASCII range (up to \007F) can be specified.
- Quoting an identifier also makes it case-sensitive, whereas
unquoted names are always folded to lower case. For example, the
identifiers FOO, foo, and "foo" are considered the same by
PostgreSQL, but "Foo" and "FOO" are different from these three and
each other. (The folding of unquoted names to lower case in
PostgreSQL is incompatible with the SQL standard, which says that
unquoted names should be folded to upper case. Thus, foo should be
equivalent to "FOO" not "foo" according to the standard. If you
want to write portable applications you are advised to always quote
a particular name or never quote it.)
- There are three kinds of implicitly-typed constants in PostgreSQL:
strings, bit strings, and numbers. Constants can also be specified
with explicit types, which can enable more accurate representation
and more efficient handling by the system.
- Although standard_conforming_strings currently defaults to off, the
default will change to on in a future release for improved
standards compliance. Applications are therefore encouraged to
migrate away from using backslash escapes. If you need to use a
backslash escape to represent a special character, write the string
constant with an E to be sure it will be handled the same way in
future releases.
Value Expressions:
- Value expressions are used in a variety of contexts, such as in the
target list of the SELECT command, as new column values in INSERT
or UPDATE, or in search conditions in a number of commands. The
result of a value expression is sometimes called a scalar, to
distinguish it from the result of a table expression (which is a
table). Value expressions are therefore also called scalar
expressions (or even simply expressions). The expression syntax
allows the calculation of values from primitive parts using
arithmetic, logical, set, and other operations.
- A row constructor is an expression that builds a row value (also
called a composite value) using values for its member fields
Data Definition:
- Subsequently, we discuss how tables can be organized into schemas
- The number and order of the columns is fixed, and each column has a
name. The number of rows is variable — it reflects how much data is
stored at a given moment.
- SQL does not make any guarantees about the order of the rows in a
table.
- Furthermore, SQL does not assign unique identifiers to rows, so it
is possible to have several completely identical rows in a table.
- Each column has a data type. The data type constrains the set of
possible values that can be assigned to a column and assigns
semantics to the data stored in the column so that it can be used
for computations.
- When you create many interrelated tables it is wise to choose a
consistent naming pattern for the tables and columns. For instance,
there is a choice of using singular or plural nouns for table
names, both of which are favored by some theorist or other.
- There is a limit on how many columns a table can contain. Depending
on the column types, it is between 250 and 1600. However, defining
a table with anywhere near this many columns is highly unusual and
often a questionable design.
- A column can be assigned a default value. When a new row is created
and no values are specified for some of the columns, those columns
will be filled with their respective default values. A data
manipulation command can also request explicitly that a column be
set to its default value, without having to know what that value
is. If no default value is declared explicitly, the default value
is the null value.
- In general, a unique constraint is violated when there is more than
one row in the table where the values of all of the columns
included in the constraint are equal.
- A table can have at most one primary key. A primary key indicates
that a column or group of columns can be used as a unique
identifier for rows in the table. Note that a unique constraint
does not, by itself, provide a unique identifier because it does
not exclude null values. Relational database theory dictates that
every table must have a primary key. This rule is not enforced by
PostgreSQL, but it is usually best to follow it.
- A foreign key constraint specifies that the values in a column (or
a group of columns) must match the values appearing in some row of
another table. We say this maintains the referential integrity
between two related tables.
- Finally, we should mention that a foreign key must reference
columns that either are a primary key or form a unique constraint.
- When you create a database object, you become its owner. By
default, only the owner of an object can do anything with the
object. In order to allow other users to use it, privileges must be
granted. However, users that have the superuser attribute can
always access any object.
Schemas:
- A PostgreSQL database cluster contains one or more named databases.
Users and groups of users are shared across the entire cluster, but
no other data is shared across databases. Any given client
connection to the server can access only the data in a single
database, the one specified in the connection request.
- Users of a cluster do not necessarily have the privilege to access
every database in the cluster. Sharing of user names means that
there cannot be different users named, say, joe in two databases in
the same cluster; but the system can be configured to allow joe
access to only some of the databases.
- Often you will want to create a schema owned by someone else (since
this is one of the ways to restrict the activities of your users to
well-defined namespaces). The syntax for that is: CREATE SCHEMA
schemaname AUTHORIZATION username;
- Schema names beginning with pg_ are reserved for system purposes
and cannot be created by users.
- In the previous sections we created tables without specifying any
schema names. By default such tables (and other objects) are
automatically put into a schema named "public". Every new database
contains such a schema.
- By default, users cannot access any objects in schemas they do not
own. To allow that, the owner of the schema must grant the USAGE
privilege on the schema. To allow users to make use of the objects
in the schema, additional privileges might need to be granted, as
appropriate for the object.
- Note that by default, everyone has CREATE and USAGE privileges on
the schema public. This allows all users that are able to connect
to a given database to create objects in its public schema.
- In addition to public and user-created schemas, each database
contains a
pg_catalog schema, which contains the system tables and
all the built-in data types, functions, and operators. pg_catalog
is always effectively part of the search path. If it is not named
explicitly in the path then it is implicitly searched before
searching the path's schemas.
- In PostgreSQL versions before 7.3, table names beginning with pg_
were reserved. This is no longer true: you can create such a table
name if you wish, in any non-system schema. However, it's best to
continue to avoid such names, to ensure that you won't suffer a
conflict if some future version defines a system table named the
same as your table.
- http://www.postgresql.org/docs/8.4/interactive/ddl-schemas.html#DDL-SCHEMAS-PATTERNS
Inheritance
- PostgreSQL implements table inheritance, which can be a useful tool
for database designers. (SQL:1999 and later define a type
inheritance feature, which differs in many respects from the
features described here.)
- In PostgreSQL, a table can inherit from zero or more other tables,
and a query can reference either all rows of a table or all rows of
a table plus all of its descendant tables
- Inheritance does not automatically propagate data from INSERT or
COPY commands to other tables in the inheritance hierarchy.
- All check constraints and not-null constraints on a parent table
are automatically inherited by its children. Other types of
constraints (unique, primary key, and foreign key constraints) are
not inherited.
- A table can inherit from more than one parent table, in which case
it has the union of the columns defined by the parent tables.
- If the same column name appears in multiple parent tables, or in
both a parent table and the child's definition, then these columns
are "merged" so that there is only one such column in the child
table. To be merged, columns must have the same data types, else an
error is raised. The merged column will have copies of all the
check constraints coming from any one of the column definitions it
came from, and will be marked not-null if any of them are.
- Table inheritance is typically established when the child table is
created, using the INHERITS clause of the CREATE TABLE statement.
Alternatively, a table which is already defined in a compatible way
can have a new parent relationship added, using the INHERIT variant
of ALTER TABLE. To do this the new child table must already include
columns with the same names and types as the columns of the parent.
It must also include check constraints with the same names and
check expressions as those of the parent. One convenient way to
create a compatible table that will later be made a new child is to
use the LIKE clause in CREATE TABLE. This creates a new table with
the same columns as the source table. If there are any CHECK
constraints defined on the source table, the INCLUDING CONSTRAINTS
option to LIKE should be specified, as the new child must have
constraints matching the parent to be considered compatible.
- Similarly an inheritance link can be removed from a child using the
NO INHERIT variant of ALTER TABLE. Dynamically adding and removing
inheritance links like this can be useful when the inheritance
relationship is being used for table partitioning.
- A parent table cannot be dropped while any of its children remain.
- Neither can columns or check constraints of child tables be dropped
or altered if they are inherited from any parent tables.
- If you wish to remove a table and all of its descendants, one easy
way is to drop the parent table with the CASCADE option.
- ALTER TABLE will propagate any changes in column data definitions
and check constraints down the inheritance hierarchy. Again,
dropping columns that are depended on by other tables is only
possible when using the CASCADE option. ALTER TABLE follows the
same rules for duplicate column merging and rejection that apply
during CREATE TABLE.
- Caveats:
http://www.postgresql.org/docs/8.4/interactive/ddl-inherit.html —>
Caveats
- More generally, note that not all SQL commands are able to work on
inheritance hierarchies.
Partitioning:
- PostgreSQL supports basic table partitioning. Partitioning refers
to splitting what is logically one large table into smaller
physical pieces.
- Currently, PostgreSQL supports partitioning via table inheritance.
Each partition must be created as a child table of a single parent
table. The parent table itself is normally empty; it exists just to
represent the entire data set.
- The following forms of partitioning can be implemented in
PostgreSQL:
- Range Partitioning: The table is partitioned into "ranges" defined
by a key column or set of columns, with no overlap between the
ranges of values assigned to different partitions. For example one
might partition by date ranges, or by ranges of identifiers for
particular business objects.
- List Partitioning: The table is partitioned by explicitly listing
which key values appear in each partition.
- We will refer to the child tables as partitions, though they are in
every way normal PostgreSQL tables.
- Ensure that the constraint_exclusion configuration parameter is not
disabled in postgresql.conf. If it is, queries will not be
optimized as desired.
- Managing Partitions: Another option that is often preferable is to
remove the partition from the partitioned table but retain access
to it as a table in its own right: ALTER TABLE measurement_y2006m02
NO INHERIT measurement; This allows further operations to be
performed on the data before it is dropped. For example, this is
often a useful time to back up the data using COPY, pg_dump, or
similar tools. It might also be a useful time to aggregate data
into smaller formats, perform other data manipulations, or run
reports.
- Partitioning and Constraint Exclusion: Constraint exclusion is a
query optimization technique that improves performance for
partitioned tables
- Note that constraint exclusion is driven only by CHECK constraints,
not by the presence of indexes. Therefore it isn't necessary to
define indexes on the key columns. Whether an index needs to be
created for a given partition depends on whether you expect that
queries that scan the partition will generally scan a large part of
the partition or just a small part. An index will be helpful in the
latter case but not the former.
- Alternative Partitioning Methods: A rule has significantly more
overhead than a trigger, but the overhead is paid once per query
rather than once per row, so this method might be advantageous for
bulk-insert situations. In most cases, however, the trigger method
will offer better performance.
- Another disadvantage of the rule approach is that there is no
simple way to force an error if the set of rules doesn't cover the
insertion date; the data will silently go into the master table
instead
Dependency Tracking:
- When you create complex database structures involving many tables
with foreign key constraints, views, triggers, functions, etc. you
implicitly create a net of dependencies between the objects.
- To ensure the integrity of the entire database structure,
PostgreSQL makes sure that you cannot drop objects that other
objects still depend on.
- Data is conceptually inserted one row at a time. Of course you can
also insert more than one row, but there is no way to insert less
than one row. Even if you know only some column values, a complete
row must be created.
- The modification of data that is already in the database is
referred to as updating.
- You can update individual rows, all the rows in a table, or a
subset of all rows.
- Each column can be updated separately; the other columns are not
affected.
- It is not an error to attempt an update that does not match any
rows.
- You can update more than one column in an UPDATE command by listing
more than one assignment in the SET clause. For example: UPDATE
mytable SET a = 5, b = 3, c = 1 WHERE a > 0;
- Just as adding data is only possible in whole rows, you can only
remove entire rows from a table.
Queries:
- The process of retrieving or the command to retrieve data from a
database is called a query. In SQL the SELECT command is used to
specify queries. The general syntax of the SELECT command is
[WITH
with_queries] SELECT select_list FROM table_expression
[sort_specification]
- The method of retrieval depends on the client application. For
example, the psql program will display an ASCII-art table on the
screen, while client libraries will offer functions to extract
individual values from the query result
- In strict SQL, GROUP BY can only group by columns of the source
table but PostgreSQL extends this to also allow GROUP BY to group
by columns in the select list. Grouping by value expressions
instead of simple column names is also allowed.
- As shown in the previous section, the table expression in the
SELECT command constructs an intermediate virtual table by possibly
combining tables, views, eliminating rows, grouping, etc. This
table is finally passed on to processing by the select list. The
select list determines which columns of the intermediate table are
actually output.
- Column Labels: For protection against possible future keyword
additions, it is recommended that you always either write AS or
double-quote the output column name.
- After the select list has been processed, the result table can
optionally be subject to the elimination of duplicate rows. The
DISTINCT key word is written directly after SELECT to specify this:
SELECT DISTINCT select_list ... Obviously, two rows are considered
distinct if they differ in at least one column value. Null values
are considered equal in this comparison.
- Combining Queries: In order to calculate the union, intersection,
or difference of two queries, the two queries must be "union
compatible", which means that they return the same number of
columns and the corresponding columns have compatible data types
- WITH provides a way to write subqueries for use in a larger SELECT
query. The subqueries can be thought of as defining temporary
tables that exist just for this query.
Numeric Types:
- Specifying NUMERIC without any precision or scale creates a column
in which numeric values of any precision and scale can be stored,
up to the implementation limit on precision. The types decimal and
numeric are equivalent. Both types are part of the SQL standard.
- In addition to ordinary numeric values, the numeric type allows the
special value NaN, meaning "not-a-number". Any operation on NaN
yields another NaN.
- In most implementations of the "not-a-number" concept, NaN is not
considered equal to any other numeric value (including NaN). In
order to allow numeric values to be sorted and used in tree-based
indexes, PostgreSQL treats NaN values as equal, and greater than
all non-NaN values.
- In addition to ordinary numeric values, the floating-point types
have several special values: Infinity, Infinity, NaN
Character Types:
- There is no performance difference among these three types, apart
from increased storage space when using the blank-padded type, and
a few extra CPU cycles to check the length when storing into a
length-constrained column. While character(n) has performance
advantages in some other database systems, there is no such
advantage in PostgreSQL; in fact character(n) is usually the
slowest of the three because of its additional storage costs. In
most situations text or character varying should be used instead.
Binary Data Types:
- A binary string is a sequence of octets (or bytes). Binary strings
are distinguished from character strings in two ways: First, binary
strings specifically allow storing octets of value zero and other
"non-printable" octets (usually, octets outside the range 32 to
126). Character strings disallow zero octets, and also disallow any
other octet values and sequences of octet values that are invalid
according to the database's selected character set encoding.
Second, operations on binary strings process the actual bytes,
whereas the processing of character strings depends on locale
settings. In short, binary strings are appropriate for storing data
that the programmer thinks of as "raw bytes", whereas character
strings are appropriate for storing text.
Time Zones:
- To address these difficulties, we recommend using date/time types
that contain both date and time when using time zones. We do not
recommend using the type time with time zone (though it is
supported by PostgreSQL for legacy applications and for compliance
with the SQL standard). PostgreSQL assumes your local time zone for
any type containing only date or time.
- All timezone-aware dates and times are stored internally in UTC.
They are converted to local time in the zone specified by the
timezone configuration parameter before being displayed to the
client.
Boolean Types:
- boolean values are output using the letters t and f
- boolean uses 1 byte of storage.
Enum:
- Enum labels are case sensitive, so 'happy' is not the same as 'HAPPY
- An enum value occupies four bytes on disk. The length of an enum
value's textual label is limited by the NAMEDATALEN setting
compiled into PostgreSQL; in standard builds this means at most 63
bytes.
Arrays:
- Arrays are not sets; searching for specific array elements can be a
sign of database misdesign. Consider using a separate table with a
row for each item that would be an array element. This will be
easier to search, and is likely to scale better for a large number
of elements.
Composite Types:
- A composite type represents the structure of a row or record; it is
essentially just a list of field names and their data types.
PostgreSQL allows composite types to be used in many of the same
ways that simple types can be used. For example, a column of a
table can be declared to be of a composite type.
- Note: Remember that what you write in an SQL command will first be
interpreted as a string literal, and then as a composite. This
doubles the number of backslashes you need (assuming escape string
syntax is used). For example, to insert a text field containing a
double quote and a backslash in a composite value, you'd need to
write:
INSERT ... VALUES (E'("\\"\\\\")'); The string-literal
processor removes one level of backslashes, so that what arrives at
the composite-value parser looks like ("\"\\"). In turn, the string
fed to the text data type's input routine becomes "\. (If we were
working with a data type whose input routine also treated
backslashes specially, bytea for example, we might need as many as
eight backslashes in the command to get one backslash into the
stored composite field.) Dollar quoting (see Section 4.1.2.4) can
be used to avoid the need to double backslashes.
- Tip: The
ROW constructor syntax is usually easier to work with than
the composite-literal syntax when writing composite values in SQL
commands. In ROW, individual field values are written the same way
they would be written when not members of a composite.
Object Identifier Types:
- Object identifiers (OIDs) are used internally by PostgreSQL as
primary keys for various system tables. OIDs are not added to
user-created tables, unless WITH OIDS is specified when the table
is created, or the default_with_oids configuration variable is
enabled.
- There are also several alias types for oid: regproc, regprocedure,
regoper, regoperator, regclass, regtype, regconfig, and
regdictionary. Table 8-22 shows an overview.
- The oid type is currently implemented as an unsigned four-byte
integer. Therefore, it is not large enough to provide database-wide
uniqueness in large databases, or even in large individual tables.
So, using a user-created table's OID column as a primary key is
discouraged. OIDs are best used only for references to system
tables.
Functions and Operators:
- Do not write
expression = NULL because NULL is not
"equal to" NULL. (The null value represents an unknown value, and
it is not known whether two unknown values are equal.) This
behavior conforms to the SQL standard.
- Some applications might expect that
expression = NULL
returns true if expression evaluates to the null value. It is
highly recommended that these applications be modified to comply
with the SQL standard. However, if that cannot be done the
transform_null_equals configuration variable is available. If it
is enabled, PostgreSQL will convert x = NULL clauses
to x IS NULL.
- If the expression is row-valued, then IS NULL is true when the row
expression itself is null or when all the row's fields are null,
while IS NOT NULL is true when the row expression itself is
non-null and all the row's fields are non-null. Because of this
behavior, IS NULL and IS NOT NULL do not always return inverse
results for row-valued expressions, i.e., a row-valued expression
that contains both NULL and non-null values will return false for
both tests. This definition conforms to the SQL standard, and is a
change from the inconsistent behavior exhibited by PostgreSQL
versions prior to 8.2.
- A null input is treated as the logical value "unknown".
Index:
- To remove an index, use the DROP INDEX command. Indexes can be
added to and removed from tables at any time.
- Once an index is created, no further intervention is required: the
system will update the index when the table is modified, and it
will use the index in queries when it thinks doing so would be more
efficient than a sequential table scan.
- Indexes can also benefit UPDATE and DELETE commands with search
conditions. Indexes can moreover be used in join searches. Thus, an
index defined on a column that is part of a join condition can also
significantly speed up queries with joins.
- Creating an index on a large table can take a long time. By
default, PostgreSQL allows reads (selects) to occur on the table in
parallel with index creation, but writes (INSERTs, UPDATEs,
DELETEs) are blocked until the index build is finished.
- After an index is created, the system has to keep it synchronized
with the table. This adds overhead to data manipulation operations.
Therefore indexes that are seldom or never used in queries should
be removed.
- unique indexes: Indexes can also be used to enforce uniqueness of a
column's value, or the uniqueness of the combined values of more
than one column. The preferred way to add a unique constraint to a
table is ALTER TABLE ... ADD CONSTRAINT. The use of indexes to
enforce unique constraints could be considered an implementation
detail that should not be accessed directly. One should, however,
be aware that there's no need to manually create indexes on unique
columns; doing so would just duplicate the automatically-created
index.
- http://www.postgresql.org/docs/8.4/interactive/locking-indexes.html
Full Text Search:
- A lexeme is a string, just like a token, but it has been normalized
so that different forms of the same word are made alike. For
example, normalization almost always includes folding upper-case
letters to lower-case, and often involves removal of suffixes (such
as s or es in English).
- A text search configuration specifies all options necessary to
transform a document into a tsvector: the parser to use to break
text into tokens, and the dictionaries to use to transform each
token into a lexeme.
- A data type tsvector is provided for storing preprocessed
documents, along with a type tsquery for representing processed
queries
- Full text searches can be accelerated using indexes
- What Is a Document? A document is the unit of searching in a full
text search system; for example, a magazine article or email
message
- Ispell Dictionary; The Ispell dictionary template supports
morphological dictionaries, which can normalize many different
linguistic forms of a word into the same lexeme. For example, an
English Ispell dictionary can match all declensions and
conjugations of the search term bank, e.g., banking, banked, banks,
banks', and bank's. The standard PostgreSQL distribution does not
include any Ispell configuration files. Dictionaries for a large
number of languages are available from Ispell.
- GiST and GIN Index Types: There are two kinds of indexes that can
be used to speed up full text searches. Note that indexes are not
mandatory for full text searching, but in cases where a column is
searched on a regular basis, an index is usually desirable.
- limitations for 8.4 are
http://www.postgresql.org/docs/8.4/interactive/textsearch-limitations.html
Concurrency Control:
- http://www.postgresql.org/docs/8.4/interactive/mvcc-intro.html
- The main advantage of using the MVCC model of concurrency control
rather than locking is that in MVCC locks acquired for querying
(reading) data do not conflict with locks acquired for writing
data, and so reading never blocks writing and writing never blocks
reading.
- Table- and row-level locking facilities are also available in
PostgreSQL for applications that cannot adapt easily to MVCC
behavior. However, proper use of MVCC will generally provide better
performance than locks. In addition, application-defined advisory
locks provide a mechanism for acquiring locks that are not tied to
a single transaction.
Locales:
- Client applications that handle server-side errors by parsing the
text of the error message will obviously have problems when the
server's messages are in a different language. Authors of such
applications are advised to make use of the error code scheme
instead. http://www.postgresql.org/docs/8.4/interactive/locale.html
Miscellaneous
Samples
sa@wks:~$ ll /usr/share/postgresql/8.4/
total 832K
drwxr-xr-x 2 root root 4.0K 2009-07-17 16:50 contrib
-rw-r--r-- 1 root root 74K 2009-07-11 18:24 conversion_create.sql
-rw-r--r-- 1 root root 92K 2009-07-11 18:24 information_schema.sql
drwxr-xr-x 4 root root 4.0K 2009-07-16 18:41 man
-rw-r--r-- 1 root root 3.5K 2009-07-11 18:24 pg_hba.conf.sample
-rw-r--r-- 1 root root 1.6K 2009-07-11 18:24 pg_ident.conf.sample
-rw-r--r-- 1 root root 643 2009-07-11 18:25 pg_service.conf.sample
-rw-r--r-- 1 root root 438K 2009-07-11 18:24 postgres.bki
-rw-r--r-- 1 root root 79K 2009-07-11 18:24 postgres.description
-rw-r--r-- 1 root root 17K 2009-07-11 18:24 postgresql.conf.sample
-rw-r--r-- 1 root root 40 2009-07-11 18:24 postgres.shdescription
-rw-r--r-- 1 root root 211 2009-07-11 18:25 psqlrc.sample
-rw-r--r-- 1 root root 2.9K 2009-07-11 18:24 recovery.conf.sample
-rw-r--r-- 1 root root 14K 2009-07-11 18:25 snowball_create.sql
-rw-r--r-- 1 root root 32K 2009-07-11 18:24 sql_features.txt
-rw-r--r-- 1 root root 18K 2009-07-11 18:24 system_views.sql
drwxr-xr-x 2 root root 4.0K 2009-07-16 18:41 timezonesets
drwxr-xr-x 2 root root 4.0K 2009-07-16 18:41 tsearch_data
sa@wks:~$
Compilation Settings
sa@wks:~$ pg_config
BINDIR = /usr/lib/postgresql/8.4/bin
DOCDIR = /usr/share/doc/postgresql
HTMLDIR = /usr/share/doc/postgresql
INCLUDEDIR = /usr/include/postgresql
PKGINCLUDEDIR = /usr/include/postgresql
INCLUDEDIR-SERVER = /usr/include/postgresql/8.4/server
LIBDIR = /usr/lib
PKGLIBDIR = /usr/lib/postgresql/8.4/lib
LOCALEDIR = /usr/share/locale
MANDIR = /usr/share/postgresql/8.4/man
SHAREDIR = /usr/share/postgresql/8.4
SYSCONFDIR = /etc/postgresql-common
PGXS = /usr/lib/postgresql/8.4/lib/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--build=x86_64-linux-gnu' '--prefix=/usr' '--includedir=/usr/include' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--sysconfdir=/etc' '--localstatedir=/var' '--libexecdir=/usr/lib/postgresql-8.4' '--disable-maintainer-mode' '--disable-dependency-tracking' '--srcdir=.' '--mandir=/usr/share/postgresql/8.4/man' '--with-docdir=/usr/share/doc/postgresql-doc-8.4' '--sysconfdir=/etc/postgresql-common' '--datadir=/usr/share/postgresql/8.4' '--bindir=/usr/lib/postgresql/8.4/bin' '--includedir=/usr/include/postgresql/' '--enable-nls' '--enable-integer-datetimes' '--enable-thread-safety' '--enable-debug' '--disable-rpath' '--with-tcl' '--with-perl' '--with-python' '--with-pam' '--with-krb5' '--with-gssapi' '--with-openssl' '--with-libxml' '--with-libxslt' '--with-ldap' '--with-ossp-uuid' '--with-gnu-ld' '--with-tclconfig=/usr/lib/tcl8.5' '--with-tkconfig=/usr/lib/tk8.5' '--with-includes=/usr/include/tcl8.5' '--with-system-tzdata=/usr/share/zoneinfo' '--with-pgport=5432' 'CFLAGS=-g -O2 -g -Wall -O2 -fPIC' 'LDFLAGS= -Wl,--as-needed' 'build_alias=x86_64-linux-gnu' 'CC=cc' 'CPPFLAGS='
CC = cc
CPPFLAGS = -D_GNU_SOURCE -I/usr/include/libxml2 -I/usr/include/tcl8.5
CFLAGS = -g -O2 -g -Wall -O2 -fPIC -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -fwrapv -g
CFLAGS_SL = -fpic
LDFLAGS = -Wl,--as-needed -Wl,--as-needed
LDFLAGS_SL =
LIBS = -lpgport -lxslt -lxml2 -lpam -lssl -lcrypto -lkrb5 -lcom_err -lgssapi_krb5 -lz -lreadline -lcrypt -ldl -lm
VERSION = PostgreSQL 8.4.0
sa@wks:~$
Files/Directories
wks:/var/lib/monkeysphere# la /usr/lib/postgresql/8.4/bin/
total 6884
drwxr-xr-x 2 root root 4096 2009-07-17 16:50 .
drwxr-xr-x 4 root root 4096 2009-07-16 18:41 ..
-rwxr-xr-x 1 root root 60456 2009-07-11 18:26 clusterdb
-rwxr-xr-x 1 root root 60520 2009-07-11 18:26 createdb
-rwxr-xr-x 1 root root 56216 2009-07-11 18:26 createlang
-rwxr-xr-x 1 root root 60808 2009-07-11 18:26 createuser
-rwxr-xr-x 1 root root 56232 2009-07-11 18:26 dropdb
-rwxr-xr-x 1 root root 56216 2009-07-11 18:26 droplang
-rwxr-xr-x 1 root root 56232 2009-07-11 18:26 dropuser
-rwxr-xr-x 1 root root 80160 2009-07-11 18:26 initdb
-rwxr-xr-x 1 root root 22808 2009-07-11 18:26 oid2name
-rwxr-xr-x 1 root root 43608 2009-07-11 18:26 pgbench
-rwxr-xr-x 1 root root 22808 2009-07-11 18:26 pg_controldata
-rwxr-xr-x 1 root root 35608 2009-07-11 18:26 pg_ctl
-rwxr-xr-x 1 root root 281752 2009-07-11 18:26 pg_dump
-rwxr-xr-x 1 root root 77512 2009-07-11 18:26 pg_dumpall
-rwxr-xr-x 1 root root 31088 2009-07-11 18:26 pg_resetxlog
-rwxr-xr-x 1 root root 130088 2009-07-11 18:26 pg_restore
-rwxr-xr-x 1 root root 22768 2009-07-11 18:26 pg_standby
-rwxr-xr-x 1 root root 5305056 2009-07-11 18:26 postgres
lrwxrwxrwx 1 root root 8 2009-07-16 18:41 postmaster -> postgres
-rwxr-xr-x 1 root root 369040 2009-07-11 18:26 psql
-rwxr-xr-x 1 root root 60488 2009-07-11 18:26 reindexdb
-rwxr-xr-x 1 root root 31672 2009-07-11 18:26 vacuumdb
-rwxr-xr-x 1 root root 22784 2009-07-11 18:26 vacuumlo
wks:/var/lib/monkeysphere#
Datadir
wks:~# pg_lsclusters
Version Cluster Port Status Owner Data directory Log file
8.4 main 5432 online postgres /var/lib/postgresql/8.4/main /var/log/postgresql/postgresql-8.4-main.log
wks:~# la /var/lib/postgresql/8.4/main/
total 56
drwx------ 11 postgres postgres 4096 2009-07-25 07:45 .
drwxr-xr-x 3 root root 4096 2009-07-16 18:41 ..
drwx------ 6 postgres postgres 4096 2009-07-17 15:37 base
drwx------ 2 postgres postgres 4096 2009-07-25 07:45 global
drwx------ 2 postgres postgres 4096 2009-07-16 18:41 pg_clog
drwx------ 4 postgres postgres 4096 2009-07-16 18:41 pg_multixact
drwx------ 2 postgres postgres 4096 2009-07-29 13:36 pg_stat_tmp
drwx------ 2 postgres postgres 4096 2009-07-16 18:41 pg_subtrans
drwx------ 2 postgres postgres 4096 2009-07-16 18:41 pg_tblspc
drwx------ 2 postgres postgres 4096 2009-07-16 18:41 pg_twophase
-rw------- 1 postgres postgres 4 2009-07-16 18:41 PG_VERSION
drwx------ 3 postgres postgres 4096 2009-07-16 18:41 pg_xlog
-rw------- 1 postgres postgres 133 2009-07-25 07:45 postmaster.opts
-rw------- 1 postgres postgres 54 2009-07-25 07:45 postmaster.pid
lrwxrwxrwx 1 root root 36 2009-07-16 18:42 server.crt -> /etc/ssl/certs/ssl-cert-snakeoil.pem
lrwxrwxrwx 1 root root 38 2009-07-16 18:42 server.key -> /etc/ssl/private/ssl-cert-snakeoil.key
wks:~#
SSL
The PostgreSQL server packages support SSL, which provides encrypted and
authenticated network communication. SSL should be used if you have an
untrusted network between a database server and a client and these exchange
security sensitive data like passwords or confidential database contents.
postgresql-common makes use of the snakeoil SSL certificate that is
generated by the ssl-cert package, so that SSL works out of the box.
When a cluster is created with pg_createcluster, SSL support will
automatically be enabled in that cluster, and symlinks server.key
and server.crt are created in the data directory which point to the
snakeoil certificate in /etc/ssl; therefore all clusters use the same
certificate by default. Of course you can replace these symlinks by
cluster-specific certificates.
In addition, if /etc/postgresql-common/root.crt exists, a symbolic
link root.crt will be created in the data directory.
/etc/postgresql-common/root.crt is a dummy file by default, so that
client-side authentication is not performed. To enable it, you should
add some root certificates to it. A reasonable choice is to just
symlink the file to /etc/ssl/certs/ssl-cert-snakeoil.pem; in this
case, client certificates need to be signed by the snakeoil
certificate, which might be desirable in many cases. See
/usr/share/doc/postgresql-doc-<version>/html/ssl-tcp.html
Non-local Connections
from pg_hba.conf
# If you want to allow non-local connections, you need to add more
# "host" records. In that case you will also need to make PostgreSQL listen
# on a non-local interface via the listen_addresses configuration parameter,
# or via the -i or -h command line switches.
Python
Cluster / Replication / High Availability / Load Balancing
Additional Supplied Modules
Client Interfaces
Procedural Languages
Python
Extensions
Utilities
GUI
|