auf.suno
Connector, geek, tech evangelist, libertarian, business enabler/angel, globetrotter, sportsman, agnostic, cosmopolitan, funny finch ...

This is the website of Markus Gattol. It is composed and driven exclusively by Open Source Software. The speciality of this website is that
it is seamlessly integrating into my daily working environment (Emacs + Python + MongoDB + Linux) which therefore means it becomes
a fully fledged and automatized publishing and communication platform. It will be under construction until 2012.

Open Source / Free Software, because freedom is in everyone's language ...
Frihed Svoboda Libertà Vrijheid เสรีภาพ Liberté Freiheit Cê̤ṳ-iù Ελευθερία Свобода חרות Bebas Libertada 自由
auf.suno
Website Sections
Home
FAQs
About Me
Weblog
Photo Albums
PostgreSQL
Status: Just notes so far
Pagecode: T->2 A->SAml H->trsa[t,a,si]d[t,a,si] C->SA[ccceji]
Last changed: Friday 2010-09-03 [15:24 UTC]
Abstract:

PostgreSQL is a ORDBMS (Object-Relational Database Management System) i.e. it is a system for managing data stored in "relations" (essentially a mathematical term for table) plus, it has the notion of an object-oriented database model. The notion of storing data in tables is so commonplace today that it might seem inherently obvious, but there are a number of other ways of organizing databases. Files and directories on Unix-like operating systems form an example of a hierarchical database. A more modern development is the object-oriented database. Each table is a named collection of rows. Each row of a given table has the same set of named columns, and each column is of a specific data type. Whereas columns have a fixed order in each row, it is important to remember that SQL (Structured Query Language) does not guarantee the order of the rows within the table in any way (although they can be explicitly sorted for display). Tables are grouped into databases, and a collection of databases managed by a single PostgreSQL server instance constitutes a database cluster.
Table of Contents
Quickstart
Introduction
Installation and Configuration
Resources
Administration
Maintenance
Performance
Programming for/with PostgreSQL
SQL
Miscellaneous
Samples
Compilation Settings
Files/Directories
SSL
Non-local Connections
Python
Cluster / Replication / High Availability / Load Balancing
Additional Supplied Modules
Client Interfaces
Procedural Languages
Extensions
Utilities

WRITEME

Quickstart

  • creatuser
  • createdb, dropdb

Introduction

  • http://www.postgresql.org/about/advantages
  • http://www.postgresql.org/docs/8.4/interactive/acronyms.html
  • http://www.postgresql.org/docs/8.4/interactive/tutorial-arch.html
    • The database server program is called postgres. The first/core/main database of any PostgreSQL cluster is also called postgres. See http://www.postgresql.org/docs/8.4/interactive/manage-ag-templatedbs.html
      • We could for example run several PostgresSQL clusters (a server process with its databases is called cluster) on the same machine, each cluster having it is own postgres main database.
    • As is typical of client/server applications, the client and the server can be on different hosts. In that case they communicate over a TCP/IP network connection.
    • The PostgreSQL server can handle multiple concurrent connections from clients. To achieve this it starts ("forks") a new process for each connection. From that point on, the client and the new server process communicate without intervention by the original postgres process. Thus, the master server process is always running, waiting for client connections, whereas client and associated server processes come and go. (All of this is of course invisible to the user. We only mention it here for completeness.)
    • A running PostgreSQL server can manage many databases. Typically, a separate database is used for each project or for each user.
    • PostgreSQL user accounts are distinct from operating system user accounts. You will need to become the operating system user under which PostgreSQL was installed (usually postgres) to create the first user account.
      • As an explanation for why this works: PostgreSQL user names are separate from operating system user accounts. When you connect to a database, you can choose what PostgreSQL user name to connect as; if you don't, it will default to the same name as your current operating system account. As it happens, there will always be a PostgreSQL user account that has the same name as the operating system user that started the server, and it also happens that that user always has permission to create databases. Instead of logging in as that user you can also specify the -U option everywhere to select a PostgreSQL user name to connect as.
      • http://www.postgresql.org/docs/8.4/interactive/user-manag.html
    • From /etc/postgresql/8.4/main/posgresql.conf: Re-commenting a setting is NOT sufficient to revert it to the default value; you need to reload the server. This file is read on server startup and when the server receives a SIGHUP signal. If you edit the file on a running system, you have to SIGHUP the server for the changes to take effect, or use "pg_ctl reload". Some parameters, which are marked below, require a server shutdown and restart to take effect.

Installation and Configuration

  • PGUSER Default connection parameters; PGHOST environment variable to the name of the database server machine. The environment variable PGPORT
  • All commands shipped by postgresql-common have detailed manpages. See postgresql-common(7) for the documentation of the database client program wrapping, and user_clusters(5) and postgresqlrc(5) for the cluster configuration.
  • The documentation of the database server and client functions, SQL commands, modules, etc. documented is shipped in the per-version packages postgresql-doc-<version>.

Resources

Memory

  • http://www.postgresql.org/docs/8.4/interactive/kernel-resources.html
  • http://wiki.openvz.org/Postgresql_and_shared_memory
  • http://forum.openvz.org/index.php?t=msg&goto=33831&
    • well, I still guess that for openvz the HN and VE settings for shared memory have to be adapted not just the VE; below an example for the HN can be seen. I have to try to find out ...
wks:~# sysctl -a 2> /dev/null | grep kernel.sh
kernel.shmmax = 536870912
kernel.shmall = 33554432
kernel.shmmni = 4096
wks:~#
  • http://www.postgresql.org/docs/8.4/interactive/kernel-resources.html

Administration

  • Before you can do anything, you must initialize a database storage area on disk. We call this a database cluster. (SQL uses the term catalog cluster.) A database cluster is a collection of databases that is managed by a single instance of a running database server.
  • After initialization, a database cluster will contain a database named postgres, which is meant as a default database for use by utilities, users and third party applications. The database server itself does not require the postgres database to exist, but many external utility programs assume it exists.
  • Another database created within each cluster during initialization is called template1. As the name suggests, this will be used as a template for subsequently created databases; it should not be used for actual work.
  • Port assignment: Please note that the pg_* tools automatically manage the server ports unless you specify them manually. The first cluster which is ever created (by any major version) will run on the default port 5432, and each new cluster will use the next higher free one. Please use pg_lsclusters for displaying the cluster <-> port mapping, and please have a look at the pg_createcluster manpage (the --port option) for details.
  • In filesystem terms, a database cluster will be a single directory under which all data will be stored. We call this the data directory or data area. Because the data directory contains all the data stored in the database, it is essential that it be secured from unauthorized access.
  • Locale: initdb also initializes the default locale for the database cluster. Normally, it will just take the locale settings in the environment and apply them to the initialized database. It is possible to specify a different locale for the database; more information about that can be found in Section 22.1. The default sort order used within the particular database cluster is set by initdb, and while you can create new databases using different sort order, the order used in the template databases that initdb creates cannot be changed without dropping and recreating them. There is also a performance impact for using locales other than C or POSIX. Therefore, it is important to make this choice correctly the first time.
  • initdb also sets the default character set encoding for the database cluster. Normally this should be chosen to match the locale setting.
  • pg_createcluster vs. initdb: Please note that you can of course also use the upstream tools for creating clusters, such as initdb. However, please note that in this case you cannot expect any of above pg_* tools to work, since they use different configuration settings (SSL, data directories, etc.) and file locations (e. g. /etc/postgresql/8.3/main/postgresql.conf). If in doubt, then do not use initdb, but only pg_createcluster. Since merely installing postgresql-X.Y will already set up a default cluster which is ready to work, most people do not need to bother about initdb or pg_createcluster at all.
  • The database server program is called postgres
  • It is best not to use SIGKILL to shut down the server. Doing so will prevent the server from releasing shared memory and semaphores, which might then have to be done manually before a new server can be started. Furthermore, SIGKILL kills the postgres process without letting it relay the signal to its subprocesses, so it will be necessary to kill the individual subprocesses by hand as well.
  • PostgreSQL offers encryption at several levels, and provides flexibility in protecting data from disclosure due to database server theft, unscrupulous administrators, and insecure networks. Encryption might also be required to secure sensitive data such as medical records or financial transactions. http://www.postgresql.org/docs/8.4/interactive/encryption-options.html
  • Secure TCP/IP Connections with SSL: PostgreSQL has native support for using SSL connections to encrypt client/server communications for increased security. This requires that OpenSSL is installed on both client and server systems and that support in PostgreSQL is enabled at build time
  • One way to set these parameters is to edit the file postgresql.conf, which is normally kept in the data directory.
  • A second way to set these configuration parameters is to give them as a command-line option to the postgres command, such as: postgres -c log_connections=yes -c log_destination='syslog'; Command-line options override any conflicting settings in postgresql.conf. Note that this means you won't be able to change the value on-the-fly by editing postgresql.conf, so while the command-line method might be convenient, it can cost you flexibility later.
  • Furthermore, it is possible to assign a set of parameter settings to a user or a database. Whenever a session is started, the default settings for the user and database involved are loaded. The commands ALTER USER and ALTER DATABASE, respectively, are used to configure these settings. Per-database settings override anything received from the postgres command-line or the configuration file, and in turn are overridden by per-user settings; both are overridden by per-session settings.
  • In addition to parameter settings, the postgresql.conf file can contain include directives, which specify another file to read and process as if it were inserted into the configuration file at this point. Include directives simply look like: include 'filename'; If the file name is not an absolute path, it is taken as relative to the directory containing the referencing configuration file. Inclusions can be nested.
  • The configuration file is reread whenever the main server process receives a SIGHUP signal (which is most easily sent by means of pg_ctl reload). The main server process also propagates this signal to all currently running server processes so that existing sessions also get the new value. Alternatively, you can send the signal to a single server process directly.
  • Some parameters can only be set at server start; any changes to their entries in the configuration file will be ignored until the server is restarted.
  • shared_preload_libraries use that one to load PostgreSQL procedural language libraries for example. By preloading a shared library, the library startup time is avoided when the library is first used. However, the time to start each new server process might increase slightly, even if that process never uses the library. So this parameter is recommended only for libraries that will be used in most sessions.
  • Background Writer: There is a separate server process called the background writer, whose function is to issue writes of "dirty" shared buffers. The intent is that server processes handling user queries should seldom or never have to wait for a write to occur, because the background writer will do it. However there is a net overall increase in I/O load, because a repeatedly-dirtied page might otherwise be written only once per checkpoint interval, but the background writer might write it several times in the same interval. The parameters discussed in this subsection can be used to tune the behavior for local needs.
  • logging: log_connections: Note: Some client programs, like psql, attempt to connect twice while determining if a password is required, so duplicate "connection received" messages do not necessarily indicate a problem.

Client Authentication:

  • PostgreSQL offers a number of different client authentication methods. The method used to authenticate a particular client connection can be selected on the basis of (client) host address, database, and user.
  • Client authentication is controlled by a configuration file, which traditionally is named pg_hba.conf and is stored in the database cluster's data directory. (HBA stands for host-based authentication.) A default pg_hba.conf file is installed when the data directory is initialized by initdb
  • A record is made up of a number of fields which are separated by spaces and/or tabs. Fields can contain white space if the field value is quoted. Records cannot be continued across lines.
  • Note: Remote TCP/IP connections will not be possible unless the server is started with an appropriate value for the listen_addresses configuration parameter, since the default behavior is to listen for TCP/IP connections only on the local loopback address localhost.
  • there is no real distinction between users and groups in PostgreSQL; a + mark really means "match any of the roles that are directly or indirectly members of this role", while a name without a + mark matches only that specific role
  • in config files, files can be included by the @ constructs
  • Since the pg_hba.conf records are examined sequentially for each connection attempt, the order of the records is significant. Typically, earlier records will have tight connection match parameters and weaker authentication methods, while later records will have looser match parameters and stronger authentication methods. For example, one might wish to use trust authentication for local TCP/IP connections but require a password for remote TCP/IP connections. In this case a record specifying trust authentication for connections from 127.0.0.1 would appear before a record specifying password authentication for a wider range of allowed client IP addresses.
  • Tip: To connect to a particular database, a user must not only pass the pg_hba.conf checks, but must have the CONNECT privilege for the database. If you wish to restrict which users can connect to which databases, it's usually easier to control this by granting/revoking CONNECT privilege than to put the rules into pg_hba.conf entries.
  • Username maps: To use username mapping, specify map=map-name in the options field in pg_hba.conf.
  • There is no restriction regarding how many database users a given operating system user can correspond to, nor vice versa. Thus, entries in a map should be thought of as meaning "this operating system user is allowed to connect as this database user", rather than implying that they are equivalent.
  • Authentication Methods, Password Authentication: PostgreSQL database passwords are separate from operating system user passwords. The password for each database user is stored in the pg_authid system catalog.
  • Passwords can be managed with the SQL commands CREATE USER and ALTER USER, e.g., CREATE USER foo WITH PASSWORD 'secret';.
  • By default, that is, if no password has been set up, the stored password is null and password authentication will always fail for that user.

Database Roles and Privileges

  • PostgreSQL manages database access permissions using the concept of roles. A role can be thought of as either a database user, or a group of database users, depending on how the role is set up. Roles can own database objects (for example, tables) and can assign privileges on those objects to other roles to control who has access to which objects. Furthermore, it is possible to grant membership in a role to another role, thus allowing the member role use of privileges assigned to the role it is a member of.
  • The concept of roles subsumes the concepts of "users" and "groups". In PostgreSQL versions before 8.1, users and groups were distinct kinds of entities, but now there are only roles. Any role can act as a user, a group, or both.
  • Database roles are global across a database cluster installation (and not per individual database).
  • the sql commands CREATE ROLE name; and DROP ROLE name; can be used. For convenience, the programs createuser and dropuser are provided as wrappers around these SQL commands that can be called from the shell command line: createuser name and dropuser name
  • Every connection to the database server is made in the name of some particular role, and this role determines the initial access privileges for commands issued on that connection.
  • The role name to use for a particular database connection is indicated by the client that is initiating the connection request in an application-specific fashion. For example, the psql program uses the -U command line option to indicate the role to connect as.
  • Many applications assume the name of the current operating system user by default (including createuser and psql). Therefore it is often convenient to maintain a naming correspondence between roles and operating system users.
  • Role Attributes: A database role can have a number of attributes that define its privileges and interact with the client authentication system. see http://www.postgresql.org/docs/8.4/interactive/role-attributes.html
  • Database passwords are separate from operating system passwords. Specify a password upon role creation with CREATE ROLE name PASSWORD 'string'.
  • A role's attributes can be modified after creation with ALTER ROLE.
  • It is good practice to create a role that has the CREATEDB and CREATEROLE privileges, but is not a superuser, and then use this role for all routine management of databases and roles. This approach avoids the dangers of operating as a superuser for tasks that do not really require it.
  • To remove a role-specific default setting, use ALTER ROLE rolename RESET varname;
  • Privileges: When an object is created, it is assigned an owner. The owner is normally the role that executed the creation statement. For most kinds of objects, the initial state is that only the owner (or a superuser) can do anything with the object. To allow other roles to use it, privileges must be granted. There are several different kinds of privilege: SELECT, INSERT, UPDATE, DELETE, TRUNCATE, REFERENCES, TRIGGER, CREATE, CONNECT, TEMPORARY, EXECUTE, and USAGE. For more information on the different types of privileges supported by PostgreSQL, see the GRANT reference page.
  • Role Membership: It is frequently convenient to group users together to ease management of privileges: that way, privileges can be granted to, or revoked from, a group as a whole. In PostgreSQL this is done by creating a role that represents the group, and then granting membership in the group role to individual user roles.
  • To set up a group role, first create the role: CREATE ROLE name; Typically a role being used as a group would not have the LOGIN attribute, though you can set it if you wish.
  • The members of a role can use the privileges of the group role in two ways.
    • First, every member of a group can explicitly do SET ROLE to temporarily "become" the group role. In this state, the database session has access to the privileges of the group role rather than the original login role, and any database objects created are considered owned by the group role not the login role.
    • Second, member roles that have the INHERIT attribute automatically have use of privileges of roles they are members of.

Managing Databases:

  • Note: The SQL standard calls databases catalogs, but there is no difference in practice.
  • Every instance of a running PostgreSQL server manages one or more databases. Databases are therefore the topmost hierarchical level for organizing SQL objects ("database objects").
  • A database is a named collection of SQL objects ("database objects"). More accurately, a database is a collection of schemas and the schemas contain the tables, functions, etc. So the full hierarchy is: server, database, schema, table (or some other kind of object, such as a function).
  • Generally, every database object (tables, functions, etc.) belongs to one and only one database. (But there are a few system catalogs, for example pg_database, that belong to a whole cluster and are accessible from each database within the cluster.)
  • When connecting to the database server, a client must specify in its connection request the name of the database it wants to connect to. It is not possible to access more than one database per connection. But an application is not restricted in the number of connections it opens to the same or other databases.
  • Databases are physically separated and access control is managed at the connection level. If one PostgreSQL server instance is to house projects or users that should be separate and for the most part unaware of each other, it is therefore recommendable to put them into separate databases. If the projects or users are interrelated and should be able to use each other's resources they should be put in the same database, but possibly into separate schemas. Schemas are a purely logical structure and who can access what is managed by the privilege system.
  • Databases are created with the SQL command CREATE DATABASE <dbname>; where dbname follows the usual rules for SQL identifiers. The current role automatically becomes the owner of the new database. It is the privilege of the owner of a database to remove it later on (which also removes all the objects in it, even if they have a different owner).
  • Another, probably easier way is using createdb <dbname>. Note that createdb without any arguments will create a database with the current user name, which might or might not be what you want.
  • Since you need to be connected to the database server in order to execute the CREATE DATABASE command, the question remains how the first database at any given site can be created. The first database is always created by the initdb command when the data storage area is initialized. This database is called postgres. So to create the first ordinary database you can connect to postgres.
  • A second database, template1, is also created by initdb. Whenever a new database is created within the cluster, template1 is essentially cloned.
  • This means that any changes you make in template1 are propagated to all subsequently created databases. Therefore it is unwise to use template1 for real work, but when used judiciously this feature can be convenient.
  • Sometimes you want to create a database for someone else. That role should become the owner of the new database, so he can configure and manage it himself. To achieve that, use one of the following commands:
    • CREATE DATABASE dbname OWNER rolename; from the SQL environment, or
    • createdb -O rolename dbname from the shell. You must be a superuser to be allowed to create a database for someone else (that is, for a role you are not a member of).
  • CREATE DATABASE actually works by copying an existing database. By default, it copies the standard system database named template1. Thus that database is the template from which new databases are made.
  • There is a second standard system database named template0. This database contains the same data as the initial contents of template1, that is, only the standard objects predefined by your version of PostgreSQL. template0 should never be changed after initdb. By instructing CREATE DATABASE to copy template0 instead of template1, you can create a virgin user database that contains none of the site-local additions in template1. This is particularly handy when restoring a pg_dump dump — the dump script should be restored in a virgin database to ensure that one recreates the correct contents of the dumped database, without any conflicts with objects that might have been added to template1 later on.
  • Another common reason for copying template0 instead of template1 is that new encoding and locale settings can be specified when copying template0, whereas a copy of template1 must use the same settings it does. This is because template1 might contain encoding-specific or locale-specific data, while template0 is known not to.
  • To create a database by copying template0, use CREATE DATABASE dbname TEMPLATE template0; from the SQL environment, or createdb -T template0 dbname from the shell.
  • It is possible to create additional template databases, and indeed one can copy any database in a cluster by specifying its name as the template for CREATE DATABASE. It is important to understand, however, that this is not (yet) intended as a general-purpose COPY DATABASE facility. The principal limitation is that no other sessions can be connected to the source database while it is being copied. CREATE DATABASE will fail if any other connection exists when it starts; otherwise, new connections to the source database are locked out until CREATE DATABASE completes.
  • Two useful flags exist in pg_database for each database — the columns datistemplate and datallowconn:
    • datistemplate can be set to indicate that a database is intended as a template for CREATE DATABASE. If this flag is set, the database can be cloned by any user with CREATEDB privileges; if it is not set, only superusers and the owner of the database can clone it. Both template0 and template1 should always be marked with datistemplate = true.
    • If datallowconn is false, then no new connections to that database will be allowed (but existing sessions are not killed simply by setting the flag false). The template0 database is normally marked datallowconn = false to prevent modification of it.
  • template1 and template0 do not have any special status beyond the fact that the name template1 is the default source database name for CREATE DATABASE. For example, one could drop template1 and recreate it from template0 without any ill effects. This course of action might be advisable if one has carelessly added a bunch of junk in template1. To delete template1, it must have pg_database.datistemplate = false.
  • The postgres database is also created when a database cluster is initialized. This database is meant as a default database for users and applications to connect to. It is simply a copy of template1 and can be dropped and recreated if required.
  • Databases are destroyed with the command DROP DATABASE dbname; Only the owner of the database, or a superuser, can drop a database. Dropping a database removes all objects that were contained within the database. The destruction of a database cannot be undone. For convenience, there is also a shell program to drop databases dropdb dbname. Unlike createdb, it is not the default action to drop the database with the current user name.
  • You cannot execute the DROP DATABASE command while connected to the victim database. You can, however, be connected to any other database, including the template1 database. template1 would be the only option for dropping the last user database of a given cluster.
  • Tablespaces in PostgreSQL allow database administrators to define locations in the file system where the files representing database objects can be stored. Once created, a tablespace can be referred to by name when creating database objects. By using tablespaces, an administrator can control the disk layout of a PostgreSQL installation. Tablespaces allow an administrator to use knowledge of the usage pattern of database objects to optimize performance. For example, an index which is very heavily used can be placed on a very fast, highly available disk, such as an expensive solid state device. At the same time a table storing archived data which is rarely used or not performance critical could be stored on a less expensive, slower disk system.
  • PostgreSQL makes use of symbolic links to simplify the implementation of tablespaces. This means that tablespaces can be used only on systems that support symbolic links.
  • To define a tablespace, use the CREATE TABLESPACE command e.g. CREATE TABLESPACE fastspace LOCATION '/mnt/sda1/postgresql/data'; The location must be an existing, empty directory that is owned by the PostgreSQL system user. All objects subsequently created within the tablespace will be stored in files underneath this directory.
    • Note: There is usually not much point in making more than one tablespace per logical file system, since you cannot control the location of individual files within a logical file system. However, PostgreSQL does not enforce any such limitation, and indeed it is not directly aware of the file system boundaries on your system. It just stores files in the directories you tell it to use.
    • If a database is created without specifying a tablespace for it, it uses the same tablespace as the template database it is copied from. Two tablespaces are automatically created by initdb. The pg_global tablespace is used for shared system catalogs. The pg_default tablespace is the default tablespace of the template1 and template0 databases (and, therefore, will be the default tablespace for other databases as well, unless overridden by a TABLESPACE clause in CREATE DATABASE).
    • Once created, a tablespace can be used from any database, provided the requesting user has sufficient privilege. This means that a tablespace cannot be dropped until all objects in all databases using the tablespace have been removed.
  • Tables, indexes, and entire databases can be assigned to particular tablespaces. To do so, a user with the CREATE privilege on a given tablespace must pass the tablespace name as a parameter to the relevant command. For example, the following creates a table in the tablespace space1: CREATE TABLE foo(i int) TABLESPACE space1; Alternatively, use the default_tablespace parameter: SET default_tablespace = space1; CREATE TABLE foo(i int);
    • When default_tablespace is set to anything but an empty string, it supplies an implicit TABLESPACE clause for C=REATE TABLE= and CREATE INDEX commands that do not have an explicit one.
  • To remove an empty tablespace, use the DROP TABLESPACE command.

Localization:

  • Do not confuse i18n and l10n with encoding also known as character sets
  • Note that the locale behavior of the server is determined by the environment variables seen by the server, not by the environment of any client. Therefore, be careful to configure the correct locale settings before starting the server. A consequence of this is that if client and server are set up in different locales, messages might appear in different languages depending on where they originated.
    • When we speak of inheriting the locale from the execution environment, this means the following on most operating systems: For a given locale category, say the collation, the following environment variables are consulted in this order until one is found to be set: LC_ALL, LC_COLLATE (or the variable corresponding to the respective category), LANG. If none of these environment variables are set then the locale defaults to C.

      - Locale support refers to an application respecting cultural preferences regarding alphabets, sorting, number formatting, etc.

  • PostgreSQL uses the standard ISO C and POSIX locale facilities provided by the server operating system.
  • Locale support is automatically initialized when a database cluster is created using initdb. initdb will initialize the database cluster with the locale setting of its execution environment by default, so if your system is already set to use the locale that you want in your database cluster then there is nothing else you need to do.
  • There are following locale subcategories
    • LC_COLLATE, String sort order
    • LC_CTYPE, Character classification (What is a letter? Its upper-case equivalent?)
    • LC_MESSAGES, Language of messages
    • LC_MONETARY, Formatting of currency amounts
    • LC_NUMERIC, Formatting of numbers
    • LC_TIME, Formatting of dates and times
  • The nature of some locale categories is that their value has to be fixed when the database is created. You can use different settings for different databases, but once a database is created, you cannot change them for that database anymore. LC_COLLATE and LC_CTYPE are these categories. They affect the sort order of indexes, so they must be kept fixed, or indexes on text columns will become corrupt.
  • The other locale categories can be changed whenever desired by setting the server configuration parameters that have the same name as the locale categories. The values that are chosen by initdb are actually only written into the configuration file postgresql.conf to serve as defaults when the server is started. If you delete these assignments from postgresql.conf then the server will inherit the settings from its execution environment.
  • Client applications that handle server-side errors by parsing the text of the error message will obviously have problems when the server's messages are in a different language. Authors of such applications are advised to make use of the error code scheme instead.
  • Character Set Support: The character set support in PostgreSQL allows you to store text in a variety of character sets (also called encodings), including single-byte character sets such as the ISO 8859 series and multiple-byte character sets such as EUC (Extended Unix Code), UTF-8, and Mule internal code.
  • The default character set is selected while initializing your PostgreSQL database cluster using initdb. It can be overridden when you create a database, so you can have multiple databases each with a different character set.
  • An important restriction, however, is that each database's character set must be compatible with the database's LC_CTYPE and LC_COLLATE locale settings. For C or POSIX locale, any character set is allowed, but for other locales there is only one character set that will work correctly.
  • On most modern operating systems, PostgreSQL can determine which character set is implied by an LC_CTYPE setting, and it will enforce that only the matching database encoding is used. On older systems it is your responsibility to ensure that you use the encoding expected by the locale you have selected. A mistake in this area is likely to lead to strange misbehavior of locale-dependent operations such as sorting.
  • PostgreSQL supports automatic character set conversion between server and client for certain character set combinations. The conversion information is stored in the pg_conversion system catalog.

Maintenance

Routine Database Maintenance Tasks:

  • http://bucardo.org/check_postgres/ is available for monitoring database health and reporting unusual conditions. check_postgres.pl integrates with Nagios and MRTG, but can be run standalone too.
  • PostgreSQL, like any database software, requires that certain tasks be performed regularly to achieve optimum performance. The tasks discussed here are required, but they are repetitive in nature and can easily be automated using standard tools such as cron scripts or Windows' Task Scheduler. But it is the database administrator's responsibility to set up appropriate scripts, and to check that they execute successfully.
    • vacuuming
      • PostgreSQL databases require periodic maintenance known as vacuuming. For many installations, it is sufficient to let vacuuming be performed by the autovacuum daemon
      • PostgreSQL's VACUUM command has to process each table on a regular basis for several reasons:
        1. To recover or reuse disk space occupied by updated or deleted rows.
        2. To update data statistics used by the PostgreSQL query planner.
        3. To protect against loss of very old data due to transaction ID wraparound.
      • There are two variants of VACUUM: standard VACUUM and VACUUM FULL. VACUUM FULL can reclaim more disk space but runs much more slowly. Also, the standard form of VACUUM can run in parallel with production database operations which is not true for VACUUM FULL.
      • VACUUM creates a substantial amount of I/O traffic, which can cause poor performance for other active sessions. There are configuration parameters that can be adjusted to reduce the performance impact of background vacuuming.
      • The usual goal of routine vacuuming is to do standard VACUUMs often enough to avoid needing VACUUM FULL. The autovacuum daemon attempts to work this way, and in fact will never issue VACUUM FULL.
      • If you have multiple databases in a cluster, do not forget to VACUUM each one; the program vacuumdb might be helpful.
      • Neither form of VACUUM is entirely satisfactory when a table contains large numbers of dead row versions as a result of massive update or delete activity. If you have such a table and you need to reclaim the excess disk space it occupies, the best way is to use CLUSTER or one of the table-rewriting variants of ALTER TABLE. These commands rewrite an entire new copy of the table and build new indexes for it. Like VACUUM FULL, they require exclusive lock. Note that they also temporarily use extra disk space, since the old copies of the table and indexes can't be released until the new ones are complete. In the worst case where your disk is nearly full, VACUUM FULL may be the only workable alternative.
      • If you have a table whose entire contents are deleted on a periodic basis, consider doing it with TRUNCATE rather than using DELETE followed by VACUUM. TRUNCATE removes the entire content of the table immediately, without requiring a subsequent VACUUM or VACUUM FULL to reclaim the now-unused disk space. The

disadvantage is that strict MVCC semantics are violated.

  • updating statistics
    • The PostgreSQL query planner relies on statistical information about the contents of tables in order to generate good plans for queries. These statistics are gathered by the ANALYZE command, which can be invoked by itself or as an optional step in VACUUM. It is important to have reasonably accurate statistics, otherwise poor choices of plans might degrade database performance.
    • The autovacuum daemon, if enabled, will automatically issue ANALYZE commands whenever the content of a table has changed sufficiently. However, administrators might prefer to rely on manually-scheduled ANALYZE operations, particularly if it is known that update activity on a table will not affect the statistics of "interesting" columns. The daemon schedules ANALYZE strictly as a function of the number of rows inserted or updated; it has no knowledge of whether that will lead to meaningful statistical changes.
    • It is possible to run ANALYZE on specific tables and even just specific columns of a table, so the flexibility exists to update some statistics more frequently than others if your application requires it. In practice, however, it is usually best to just analyze the entire database, because it is a fast operation. ANALYZE uses a statistical random sampling of the rows of a table rather than reading every single row.
      • Tip: Although per-column tweaking of ANALYZE frequency might not be very productive, you might well find it worthwhile to do per-column adjustment of the level of detail of the statistics collected by ANALYZE. Columns that are heavily used in WHERE clauses and have highly irregular data distributions might require a finer-grain data histogram than other columns. See ALTER TABLE SET STATISTICS, or change the database-wide default using the default_statistics_target configuration parameter.
    • PostgreSQL has an optional but highly recommended feature called autovacuum, whose purpose is to automate the execution of VACUUM and ANALYZE commands.
    • The "autovacuum daemon" actually consists of multiple processes. There is a persistent daemon process, called the autovacuum launcher, which is in charge of starting autovacuum worker processes for all databases. The launcher will distribute the work across time, attempting to start one worker on each database every autovacuum_naptime seconds. One worker will be launched for each database, with a maximum of autovacuum_max_workers processes running at the same time. If there are more than autovacuum_max_workers databases to be processed, the next database will be processed as soon as the first worker finishes. Each worker process will check each table within its database and execute VACUUM and/or ANALYZE as needed.
    • The default thresholds and scale factors are taken from postgresql.conf, but it is possible to override them on a table-by-table basis; see Storage Parameters for more information.
  • reindexing
    • In PostgreSQL releases before 7.4, periodic reindexing was frequently necessary to avoid "index bloat", due to lack of internal space reclamation in B-tree indexes.
    • In PostgreSQL 7.4 and later, index pages that have become completely empty are reclaimed for re-use. There is still a possibility for inefficient use of space: if all but a few index keys on a page have been deleted, the page remains allocated. So a usage pattern in which all but a few keys in each range are eventually deleted will see poor use of space. For such usage patterns, periodic reindexing is recommended.
    • It might be worthwhile to reindex periodically just to improve access speed. In some situations it is worthwhile to rebuild indexes periodically with the REINDEX command.
  • log file management
    • http://www.postgresql.org/docs/8.4/interactive/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-WHERE
    • http://www.postgresql.org/docs/8.4/interactive/logfile-maintenance.html
  • backups
    • As with everything that contains valuable data, PostgreSQL databases should be backed up regularly. While the procedure is essentially simple, it is important to have a clear understanding of the underlying techniques and assumptions. There are three fundamentally different approaches to backing up PostgreSQL data:
      1. SQL dump
      2. File system level backup
      3. Continuous archiving
    • SQL dump: The idea behind this dump method is to generate a text file with SQL commands that, when fed back to the server, will recreate the database in the same state as it was at the time of the dump. PostgreSQL provides the utility program pg_dump for this purpose. The basic usage of this command is: pg_dump dbname > outfile
    • pg_dump does not operate with special permissions. In particular, it must have read access to all tables that you want to back up, so in practice you almost always have to run it as a database superuser.
    • Like any other PostgreSQL client application, pg_dump will by default connect with the database user name that is equal to the current operating system user name. To override this, either specify the -U option or set the environment variable PGUSER. Remember that pg_dump connections are subject to the normal client authentication mechanisms
    • Dumps created by pg_dump are internally consistent, that is, the dump represents a snapshot of the database as of the time pg_dump begins running. pg_dump does not block other operations on the database while it is working. If your database schema relies on OIDs (for instance as foreign keys) you must instruct pg_dump to dump the OIDs as well. To do this, use the -o command line option.
    • The text files created by pg_dump are intended to be read in by the psql program. The general command form to restore a dump is psql dbname < infile=. where infile is what you used as outfile for the pg_dump command. The database dbname will not be created by this command, so you must create it yourself from template0 before executing psql (e.g., with createdb -T template0 dbname)
    • Before restoring a SQL dump, all the users who own objects or were granted permissions on objects in the dumped database must already exist. If they do not, then the restore will fail to recreate the objects with the original ownership and/or permissions.
    • By default, the psql script will continue to execute after an SQL error is encountered. You might wish to use the following command at the top of the script to alter that behaviour and have psql exit with an exit status of 3 if an SQL error occurs: \set ON_ERROR_STOP
    • you can specify that the whole dump should be restored as a single transaction, so the restore is either fully completed or fully rolled back. This mode can be specified by passing the -1 or —single-transaction command-line options to psql. When using this mode, be aware that even the smallest of errors can rollback a restore that has already run for many hours. However, that might still be preferable to manually cleaning up a complex database after a partially restored dump.
    • The ability of pg_dump and psql to write to or read from pipes makes it possible to dump a database directly from one server to another, for example: pg_dump -h host1 dbname \| psql -h host2 dbname
    • The dumps produced by pg_dump are relative to template0. This means that any languages, procedures, etc. added via template1 will also be dumped by pg_dump. As a result, when restoring, if you are using a customized template1, you must create the empty database from template0, as in the example above.
    • pg_dump dumps only a single database at a time, and it does not dump information about roles or tablespaces (because those are cluster-wide rather than per-database). To support convenient dumping of the entire contents of a database cluster, the pg_dumpall program is provided. pg_dumpall backs up each database in a given cluster, and also preserves cluster-wide data such as role and tablespace definitions. The basic usage of this command is: pg_dumpall > outfile. The resulting dump can be restored with psql: psql -f infile postgres
      • pg_dumpall works by emitting commands to re-create roles, tablespaces, and empty databases, then invoking pg_dump for each database. This means that while each database will be internally consistent, the snapshots of different databases might not be exactly in-sync.
    • It is always necessary to have database superuser access when restoring a pg_dumpall dump, as that is required to restore the role and tablespace information.
    • If you use tablespaces, be careful that the tablespace paths in the dump are appropriate for the new installation.
    • Handling large databases: Use pg_dump's custom dump format. If PostgreSQL was built on a system with the zlib compression library installed, the custom dump format will compress data as it writes it to the output file. This will produce dump file sizes similar to using gzip, but it has the added advantage that tables can be restored selectively. The following command dumps a database using the custom dump format: pg_dump -Fc dbname > filename. A custom-format dump is not a script for psql, but instead must be restored with pg_restore, for example: pg_restore -d dbname filename. For very large databases, you might need to combine split with one of the other two approaches.
    • File System Level Backup: http://www.postgresql.org/docs/8.4/interactive/backup-file.html
    • If your database is spread across multiple file systems, there might not be any way to obtain exactly-simultaneous frozen snapshots of all the volumes. For example, if your data files and WAL log are on different disks, or if tablespaces are on different file systems, it might not be possible to use snapshot backup because the snapshots must be simultaneous.
    • Another option is to use rsync to perform a file system backup. This is done by first running rsync while the database server is running, then shutting down the database server just long enough to do a second rsync. The second rsync will be much quicker than the first, because it has relatively little data to transfer, and the end result will be consistent because the server was down. This method allows a file system backup to be performed with minimal downtime.
    • Continuous Archiving and Point-In-Time Recovery (PITR): At all times, PostgreSQL maintains a write ahead log (WAL) in the pg_xlog/ subdirectory of the cluster's data directory. The log describes every change made to the database's data files. This log exists primarily for crash-safety purposes: if the system crashes, the database can be restored to consistency by "replaying" the log entries made since the last checkpoint. However, the existence of the log makes it possible to use a third strategy for backing up databases: we can combine a file-system-level backup with backup of the WAL files. If recovery is needed, we restore the backup and then replay from the backed-up WAL files to bring the backup up to current time.
      • We do not need a perfectly consistent backup as the starting point. Any internal inconsistency in the backup will be corrected by log replay (this is not significantly different from what happens during crash recovery). So we don't need file system snapshot capability, just tar or a similar archiving tool.
      • Since we can string together an indefinitely long sequence of WAL files for replay, continuous backup can be achieved simply by continuing to archive the WAL files. This is particularly valuable for large databases, where it might not be convenient to take a full backup frequently.
      • There is nothing that says we have to replay the WAL entries all the way to the end. We could stop the replay at any point and have a consistent snapshot of the database as it was at that time. Thus, this technique supports point-in-time recovery: it is possible to restore the database to its state at any time since your base backup was taken.
      • If we continuously feed the series of WAL files to another machine that has been loaded with the same base backup file, we have a warm standby system: at any point we can bring up the second machine and it will have a nearly-current copy of the database.
    • As with the plain file-system-backup technique, this method can only support restoration of an entire database cluster, not a subset.
    • Also, it requires a lot of archival storage: the base backup might be bulky, and a busy system will generate many megabytes of WAL traffic that have to be archived. Still, it is the preferred backup technique in many situations where high reliability is needed.
    • To recover successfully using continuous archiving (also called "online backup" by many database vendors), you need a continuous sequence of archived WAL files that extends back at least as far as the start time of your backup.
    • In an abstract sense, a running PostgreSQL system produces an indefinitely long sequence of WAL records. The system physically divides this sequence into WAL segment files, which are normally 16MB apiece
    • The segment files are given numeric names that reflect their position in the abstract WAL sequence. When not using WAL archiving, the system normally creates just a few segment files and then "recycles" them by renaming no-longer-needed segment files to higher segment numbers. When archiving WAL data, we need to capture the contents of each segment file once it is filled, and save that data somewhere before the segment file is recycled for reuse.
    • To provide the database administrator with as much flexibility as possible, PostgreSQL tries not to make any assumptions about how the archiving will be done. Instead, PostgreSQL lets the administrator specify a shell command to be executed to copy a completed segment file to wherever it needs to go. The command could be as simple as a cp, or it could invoke a complex shell script — it's all up to you.
    • archive_command = 'cp -i %p /mnt/server/archivedir/%f </dev/null' The archive command will be executed under the ownership of the same user that the PostgreSQL server is running as.
    • Since the series of WAL files being archived contains effectively everything in your database, you will want to be sure that the archived data is protected from prying eyes; for example, archive into a directory that does not have group or world read access.
    • It is important that the archive command return zero exit status if and only if it succeeded. Upon getting a zero result, PostgreSQL will assume that the file has been successfully archived, and will remove or recycle it.
    • The archive command should generally be designed to refuse to overwrite any pre-existing archive file. This is an important safety feature to preserve the integrity of your archive in case of administrator error (such as sending the output of two different servers to the same archive directory).
    • Warm Standby Servers for High Availability: Continuous archiving can be used to create a high availability (HA) cluster configuration with one or more standby servers ready to take over operations if the primary server fails. This capability is widely referred to as warm standby or log shipping.
    • The primary and standby server work together to provide this capability, though the servers are only loosely coupled. The primary server operates in continuous archiving mode, while each standby server operates in continuous recovery mode, reading the WAL files from the primary. No changes to the database tables are required to enable this capability, so it offers low administration overhead in comparison with some other replication approaches. This configuration also has relatively low performance impact on the primary server.
    • Directly moving WAL records from one database server to another is typically described as log shipping. PostgreSQL implements file-based log shipping, which means that WAL records are transferred one file (WAL segment) at a time. WAL files (16MB) can be shipped easily and cheaply over any distance, whether it be to an adjacent system, another system on the same site or another system on the far side of the globe. The bandwidth required for this technique varies according to the transaction rate of the primary server.
    • It should be noted that the log shipping is asynchronous, i.e., the WAL records are shipped after transaction commit. As a result there is a window for data loss should the primary server suffer a catastrophic failure: transactions not yet shipped will be lost.
    • The standby server is not available for access, since it is continually performing recovery processing. Recovery performance is sufficiently good that the standby will typically be only moments away from full availability once it has been activated. As a result, we refer to this capability as a warm standby configuration that offers high availability. Restoring a server from an archived base backup and rollforward will take considerably longer, so that technique only offers a solution for disaster recovery, not high availability.
    • the hardware architecture must be the same — shipping from, say, a 32-bit to a 64-bit system will not work.
    • In general, log shipping between servers running different major PostgreSQL release levels will not be possible. It is the policy of the PostgreSQL Global Development Group not to make changes to disk formats during minor release upgrades, so it is likely that running different minor release levels on primary and standby servers will work successfully. However, no formal support for that is offered and you are advised to keep primary and standby servers at the same release level as much as possible
    • There is no special mode required to enable a standby server. The operations that occur on both primary and standby servers are entirely normal continuous archiving and recovery tasks. The only point of contact between the two database servers is the archive of WAL files that both share: primary writing to the archive, standby reading from the archive.
    • Care must be taken to ensure that WAL archives for separate primary servers do not become mixed together or confused. The archive need not be large, if it is only required for the standby operation.
    • The magic that makes the two loosely coupled servers work together is simply a restore_command used on the standby that, when asked for the next WAL file, waits for it to become available from the primary. The restore_command is specified in the recovery.conf file on the standby server. Normal recovery processing would request a file from the WAL archive, reporting failure if the file was unavailable. For standby processing it is normal for the next WAL file to be unavailable, so we must be patient and wait for it to appear.
    • A working example of a waiting restore_command is provided as a contrib module named pg_standby
    • PostgreSQL does not provide the system software required to identify a failure on the primary and notify the standby system and then the standby database server
    • The size of the WAL archive can be minimized by using the %r option of the restore_command. This option specifies the last archive file name that needs to be kept to allow the recovery to restart correctly. This can be used to truncate the archive once files are no longer required, if the archive is writable from the standby server.
    • If the primary server fails and then immediately restarts, you must have a mechanism for informing it that it is no longer the primary. This is sometimes known as STONITH (Shoot the Other Node In The Head), which is necessary to avoid situations where both systems think they are the primary, which will lead to confusion and ultimately data loss.
    • Many failover systems use just two systems, the primary and the standby, connected by some kind of heartbeat mechanism to continually verify the connectivity between the two and the viability of the primary.
    • Regular switching from primary to standby is useful, since it allows regular downtime on each system for maintenance. This also serves as a test of the failover mechanism to ensure that it will really work when you need it. Written administration procedures are advised.
    • Incrementally Updated Backups: In a warm standby configuration, it is possible to offload the expense of taking periodic base backups from the primary server; instead base backups can be made by backing up a standby server's files. This concept is generally known as incrementally updated backups, log change accumulation, or more simply, change accumulation.
    • http://www.postgresql.org/docs/8.4/interactive/migration.html and /usr/share/doc/postgresql-8.4/README.Debian.gz —> Default clusters and upgrading

High Availability, Load Balancing, and Replication:

  • read-only database servers can be combined relatively easily too. Unfortunately, most database servers have a read/write mix of requests, and read/write servers are much harder to combine. This synchronization problem is the fundamental difficulty for servers working together. Because there is no single solution that eliminates the impact of the sync problem for all use cases, there are multiple solutions. Each solution addresses this problem in a different way, and minimizes its impact for a specific workload.
  • Some solutions deal with synchronization by allowing only one server to modify the data. Servers that can modify data are called read/write or "master" servers. Servers that can reply to read-only queries are called "slave" servers. Servers that cannot be accessed until they are changed to master servers are called "standby" servers.
  • Some solutions are synchronous, meaning that a data-modifying transaction is not considered committed until all servers have committed the transaction. This guarantees that a failover will not lose any data and that all load-balanced servers will return consistent results no matter which server is queried. In contrast, asynchronous solutions allow some delay between the time of a commit and its propagation to the other servers, opening the possibility that some transactions might be lost in the switch to a backup server, and that load balanced servers might return slightly stale results. Asynchronous communication is used when synchronous would be too slow.
  • Solutions can also be categorized by their granularity. Some solutions can deal only with an entire database server, while others allow control at the per-table or per-database level.
  • Performance must be considered in any choice. There is usually a trade-off between functionality and performance. For example, a full synchronous solution over a slow network might cut performance by more than half, while an asynchronous one might have a minimal performance impact.
  • http://www.postgres-r.org/documentation/terms

Monitoring Database Activity:

  • http://www.postgresql.org/docs/8.4/interactive/monitoring.html
  • A database administrator frequently wonders, "What is the system doing right now?" This chapter discusses how to find that out. Several tools are available for monitoring database activity and analyzing performance. Most of this chapter is devoted to describing PostgreSQL's statistics collector, but one should not neglect regular Unix monitoring programs such as ps, top, iostat, and vmstat. Also, once one has identified a poorly-performing query, further investigation might be needed using PostgreSQL's EXPLAIN command.
  • Standard Unix Tools: Unix monitoring programs such as ps, top, iostat, and vmstat.
  • The Statistics Collector: PostgreSQL's statistics collector is a subsystem that supports collection and reporting of information about server activity.
  • the collector can count accesses to tables and indexes in both disk-block and individual-row terms. It also tracks total numbers of rows in each table, and the last vacuum and analyze times for each table. It can also count calls to user-defined functions and the total time spent in each one.
  • PostgreSQL also supports determining the exact command currently being executed by other server processes. This is an independent facility that does not depend on the collector process.
  • Since collection of statistics adds some overhead to query execution, the system can be configured to collect or not collect information. This is controlled by configuration parameters that are normally set in postgresql.conf. Normally these parameters are set in postgresql.conf so that they apply to all server processes, but it is possible to turn them on or off in individual sessions using the SET command.
  • The statistics collector communicates with the backends needing information (including autovacuum) through temporary files. These files are stored in the pg_stat_tmp subdirectory. When the postmaster shuts down, a permanent copy of the statistics data is stored in the global subdirectory.
  • When using the statistics to monitor current activity, it is important to realize that the information does not update instantaneously. Each individual server process transmits new statistical counts to the collector just before going idle; so a query or transaction still in progress does not affect the displayed totals.
  • Viewing Locks: Another useful tool for monitoring database activity is the pg_locks system table. It allows the database administrator to view information about the outstanding locks in the lock manager.
  • Dynamic Tracing: PostgreSQL provides facilities to support dynamic tracing of the database server. This allows an external utility to be called at specific points in the code and thereby trace execution. A number of probes or trace points are already inserted into the source code. These probes are intended to be used by database developers and administrators. By default the probes are not compiled into PostgreSQL; the user needs to explicitly tell the configure script to make the probes available.

Monitoring Disk Usage:

  • Determining Disk Usage: Each table has a primary heap disk file where most of the data is stored. If the table has any columns with potentially-wide values, there is also a TOAST file associated with the table, which is used to store values too wide to fit comfortably in the main table
  • There will be one index on the TOAST table, if present. There might also be indexes associated with the base table. Each table and index is stored in a separate disk file — possibly more than one file, if the file would exceed one gigabyte.
  • You can monitor disk space from three ways: using SQL functions listed in Table 9-57, using VACUUM information, and from the command line using the tools in contrib/oid2name.
  • Disk Full Failure: The most important disk monitoring task of a database administrator is to make sure the disk doesn't grow full. A filled data disk will not result in data corruption, but it might prevent useful activity from occurring. If the disk holding the WAL files grows full, database server panic and consequent shutdown might occur.

Reliability and the Write-Ahead Log:

  • Reliability is an important property of any serious database system, and PostgreSQL does everything possible to guarantee reliable operation. One aspect of reliable operation is that all data recorded by a committed transaction should be stored in a nonvolatile area that is safe from power loss, operating system failure, and hardware failure (except failure of the nonvolatile area itself, of course).
  • Successfully writing the data to the computer's permanent storage (disk drive or equivalent) ordinarily meets this requirement. In fact, even if a computer is fatally damaged, if the disk drives survive they can be moved to another computer with similar hardware and all committed transactions will remain intact.
  • While forcing data periodically to the disk platters might seem like a simple operation, it is not. Because disk drives are dramatically slower than main memory and CPUs, several layers of caching exist between the computer's main memory and the disk platters. First, there is the operating system's buffer cache, which caches frequently requested disk blocks and combines disk writes. Fortunately, all operating systems give applications a way to force writes from the buffer cache to disk, and PostgreSQL uses those features.
  • Next, there might be a cache in the disk drive controller; this is particularly common on RAID controller cards. Some of these caches are write-through, meaning writes are passed along to the drive as soon as they arrive. Others are write-back, meaning data is passed on to the drive at some later time. Such caches can be a reliability hazard because the memory in the disk controller cache is volatile, and will lose its contents in a power failure. Better controller cards have battery-backed caches, meaning the card has a battery that maintains power to the cache in case of system power loss. After power is restored the data will be written to the disk drives.
  • And finally, most disk drives have caches. Some are write-through while some are write-back, and the same concerns about data loss exist for write-back drive caches as exist for disk controller caches. Consumer-grade IDE and SATA drives are particularly likely to have write-back caches that will not survive a power failure. To check write caching on Linux use hdparm -I; it is enabled if there is a * next to Write cache. hdparm -W to turn off write caching.
  • Another risk of data loss is posed by the disk platter write operations themselves. Disk platters are divided into sectors, commonly 512 bytes each. Every physical read or write operation processes a whole sector. When a write request arrives at the drive, it might be for 512 bytes, 1024 bytes, or 8192 bytes, and the process of writing could fail due to power loss at any time, meaning some of the 512-byte sectors were written, and others were not. To guard against such failures, PostgreSQL periodically writes full page images to permanent storage before modifying the actual page on disk. By doing this, during crash recovery PostgreSQL can restore partially-written pages. If you have a battery-backed disk controller or file-system software that prevents partial page writes (e.g., ReiserFS 4), you can turn off this page imaging by using the full_page_writes parameter.
  • Write-Ahead Logging (WAL) is a standard method for ensuring data integrity. A detailed description can be found in most (if not all) books about transaction processing. Briefly, WAL's central concept is that changes to data files (where tables and indexes reside) must be written only after those changes have been logged, that is, after log records describing the changes have been flushed to permanent storage. If we follow this procedure, we do not need to flush data pages to disk on every transaction commit, because we know that in the event of a crash we will be able to recover the database using the log: any changes that have not been applied to the data pages can be redone from the log records. (This is roll-forward recovery, also known as REDO.)
  • Using WAL results in a significantly reduced number of disk writes, because only the log file needs to be flushed to disk to guarantee that a transaction is committed, rather than every data file changed by the transaction.
  • Asynchronous Commit: Asynchronous commit is an option that allows transactions to complete more quickly, at the cost that the most recent transactions may be lost if the database should crash. In many applications this is an acceptable trade-off.
  • As described in the previous section, transaction commit is normally synchronous: the server waits for the transaction's WAL records to be flushed to permanent storage before returning a success indication to the client. The client is therefore guaranteed that a transaction reported to be committed will be preserved, even in the event of a server crash immediately after. However, for short transactions this delay is a major component of the total transaction time. Selecting asynchronous commit mode means that the server returns success as soon as the transaction is logically completed, before the WAL records it generated have actually made their way to disk. This can provide a significant boost in throughput for small transactions.
  • Asynchronous commit introduces the risk of data loss. There is a short time window between the report of transaction completion to the client and the time that the transaction is truly committed (that is, it is guaranteed not to be lost if the server crashes). The risk that is taken by using asynchronous commit is of data loss, not data corruption. If the database should crash, it will recover by replaying WAL up to the last record that was flushed. The database will therefore be restored to a self-consistent state, but any transactions that were not yet flushed to disk will not be reflected in that state.
  • Thus asynchronous commit should not be used if the client will take external actions relying on the assumption that the transaction will be remembered. As an example, a bank would certainly not use asynchronous commit for a transaction recording an ATM's dispensing of cash. But in many scenarios, such as event logging, there is no need for a strong guarantee of this kind.
  • If the database crashes during the risk window between an asynchronous commit and the writing of the transaction's WAL records, then changes made during that transaction will be lost. The duration of the risk window is limited because a background process (the "WAL writer") flushes unwritten WAL records to disk every wal_writer_delay milliseconds. The actual maximum duration of the risk window is three times wal_writer_delay because the WAL writer is designed to favor writing whole pages at a time during busy periods.
  • An immediate-mode shutdown is equivalent to a server crash, and will therefore cause loss of any unflushed asynchronous commits.
  • Asynchronous commit provides behavior different from setting fsync = off. fsync is a server-wide setting that will alter the behavior of all transactions. It disables all logic within PostgreSQL that attempts to synchronize writes to different portions of the database, and therefore a system crash (that is, a hardware or operating system crash, not a failure of PostgreSQL itself) could result in arbitrarily bad corruption of the database state. In many scenarios, asynchronous commit provides most of the performance improvement that could be obtained by turning off fsync, but without the risk of data corruption.
  • WAL Configuration: Checkpoints are points in the sequence of transactions at which it is guaranteed that the data files have been updated with all information written before the checkpoint. At checkpoint time, all dirty data pages are flushed to disk and a special checkpoint record is written to the log file.
  • WAL Internals: WAL is automatically enabled; no action is required from the administrator except ensuring that the disk-space requirements for the WAL logs are met, and that any necessary tuning is done
  • WAL logs are stored in the directory pg_xlog under the data directory, as a set of segment files, normally each 16 MB in size (but the size can be changed by altering the --with-wal-segsize configure option when building the server). Each segment is divided into pages, normally 8 kB each (this size can be changed via the --with-wal-blocksize configure option)
  • Segment files are given ever-increasing numbers as names, starting at 000000010000000000000000. The numbers do not wrap, at present, but it should take a very very long time to exhaust the available stock of numbers.
  • It is of advantage if the log is located on another disk than the main database files. This can be achieved by moving the directory pg_xlog to another location (while the server is shut down, of course) and creating a symbolic link from the original location in the main data directory to the new location. The aim of WAL, to ensure that the log is written before database records are altered, can be subverted by disk drives that falsely report a successful write to the kernel, when in fact they have only cached the data and not yet stored it on the disk. A power failure in such a situation might still lead to irrecoverable data corruption. Administrators should try to ensure that disks holding PostgreSQL's WAL log files do not make such false reports.
  • After a checkpoint has been made and the log flushed, the checkpoint's position is saved in the file pg_control. Therefore, when recovery is to be done, the server first reads pg_control and then the checkpoint record; then it performs the REDO operation by scanning forward from the log position indicated in the checkpoint record.

Regression Tests:

  • The regression tests are a comprehensive set of tests for the SQL implementation in PostgreSQL. They test standard SQL operations as well as the extended capabilities of PostgreSQL.
  • http://www.postgresql.org/docs/8.4/interactive/regress.html
  • Some properly installed and fully functional PostgreSQL installations can "fail" some of these regression tests due to platform-specific artifacts such as varying floating-point representation and message wording. The tests are currently evaluated using a simple diff comparison against the outputs generated on a reference system, so the results are sensitive to small system differences.

Performance

  • http://samiux.wordpress.com/2009/07/26/howto-performance-tuning-for-postgresql-on-ubuntudebian/
  • http://www.postgresql.org/docs/8.4/interactive/runtime-config-resource.html
  • from http://www.postgresql.org/docs/8.4/interactive/runtime-config-query.html
    • Turning one of these settings off permanently is seldom a good idea, however. Better ways to improve the quality of the plans chosen by the optimizer include adjusting the Planner Cost Constants, running ANALYZE more frequently, increasing the value of the default_statistics_target configuration parameter, and increasing the amount of statistics collected for specific columns using ALTER TABLE SET STATISTICS.
  • http://www.postgresql.org/docs/8.4/interactive/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-ASYNC-BEHAVIOR
  • http://www.postgresql.org/docs/8.4/interactive/runtime-config-wal.html
  • from http://www.postgresql.org/docs/8.4/interactive/runtime-config-query.html, Planner Cost Constants: random_page_cost (floating point): Sets the planner's estimate of the cost of a non-sequentially-fetched disk page. The default is 4.0. Reducing this value relative to seq_page_cost will cause the system to prefer index scans; raising it will make index scans look relatively more expensive. You can raise or lower both values together to change the importance of disk I/O costs relative to CPU costs, which are described by the following parameters.
    • Tip: Although the system will let you set random_page_cost to less than seq_page_cost, it is not physically sensible to do so. However, setting them equal makes sense if the database is entirely cached in RAM, since in that case there is no penalty for touching pages out of sequence. Also, in a heavily-cached database you should lower both values relative to the CPU parameters, since the cost of fetching a page already in RAM is much smaller than it would normally be.
  • Tablespaces in PostgreSQL allow database administrators to define locations in the file system where the files representing database objects can be stored. Once created, a tablespace can be referred to by name when creating database objects. By using tablespaces, an administrator can control the disk layout of a PostgreSQL installation. Tablespaces allow an administrator to use knowledge of the usage pattern of database objects to optimize performance. For example, an index which is very heavily used can be placed on a very fast, highly available disk, such as an expensive solid state device. At the same time a table storing archived data which is rarely used or not performance critical could be stored on a less expensive, slower disk system.

Programming for/with PostgreSQL

  • http://www.postgresql.org/docs/8.4/interactive/client-interfaces.html
  • http://www.postgresql.org/docs/8.4/interactive/biblio.html#MELT93
  • http://www.postgresql.org/docs/8.4/interactive/biblio.html#DATE97
  • Two dashes ("—") introduce comments. Whatever follows them is ignored up to the end of the line. SQL is case insensitive about key words and identifiers, except when identifiers are double-quoted to preserve the case
  • Constants that are not simple numeric values usually must be surrounded by single quotes
  • Many developers consider explicitly listing the columns better style than relying on the order implicitly.
  • You could also have used COPY to load large amounts of data from flat-text files. This is usually faster because the COPY command is optimized for this application while allowing less flexibility than INSERT. An example would be COPY weather FROM '/home/user/weather.txt'; where the file name for the source file must be available to the backend server machine, not the client, since the backend server reads the file directly.
  • To retrieve data from a table, the table is queried. An SQL SELECT statement is used to do this.
    • The statement is divided into a select list (the part that lists the columns to be returned),
    • a table list (the part that lists the tables from which to retrieve the data), and
    • an optional qualification (the part that specifies any restrictions).
  • You can write expressions, not just simple column references, in the select list.
  • A query can be "qualified" by adding a WHERE clause that specifies which rows are wanted. The WHERE clause contains a Boolean (truth value) expression, and only rows for which the Boolean expression is true are returned. The usual Boolean operators (AND, OR, and NOT) are allowed in the qualification.
  • While SELECT * is useful for off-the-cuff queries, it is widely considered bad style in production code, since adding a column to the table would change the results.
  • Thus far, our queries have only accessed one table at a time. Queries can access multiple tables at once, or access the same table in such a way that multiple rows of the table are being processed at the same time. A query that accesses multiple rows of the same or different tables at one time is called a join query
  • It is widely considered good style to qualify all column names in a join query, so that the query won't fail if a duplicate column name is later added to one of the tables.
  • types of joins:
    • self join
    • inner join
    • outer join
    • left outer join
    • right outer join
    • full outer join
  • An aggregate function computes a single result from multiple input rows.
  • It is important to understand the interaction between aggregates and SQL's WHERE and HAVING clauses. The fundamental difference between WHERE and HAVING is this: WHERE selects input rows before groups and aggregates are computed (thus, it controls which rows go into the aggregate computation), whereas HAVING selects group rows after groups and aggregates are computed. Thus, the WHERE clause must not contain aggregate functions; it makes no sense to try to use an aggregate to determine which rows will be inputs to the aggregates. On the other hand, the HAVING clause always contains aggregate functions. (Strictly speaking, you are allowed to write a HAVING clause that doesn't use aggregates, but it's seldom useful. The same condition could be used more efficiently at the WHERE stage.)

View:

  • You can create a view over the query, which gives a name to the query that you can refer to like an ordinary table. Views allow you to encapsulate the details of the structure of your tables, which might change as your application evolves, behind consistent interfaces. Views can be used in almost any place a real table can be used. Building views upon other views is not uncommon.

Transaction:

  • Transactions are a fundamental concept of all database systems. The essential point of a transaction is that it bundles multiple steps into a single, all-or-nothing operation. The intermediate states between the steps are not visible to other concurrent transactions, and if some failure occurs that prevents the transaction from completing, then none of the steps affect the database at all.
  • A transaction is said to be atomic: from the point of view of other transactions, it either happens completely or not at all.
  • A transactional database guarantees that all the updates made by a transaction are logged in permanent storage (i.e., on disk) before the transaction is reported complete.
  • Another important property of transactional databases is closely related to the notion of atomic updates: when multiple transactions are running concurrently, each one should not be able to see the incomplete changes made by others. For example, if one transaction is busy totalling all the branch balances, it would not do for it to include the debit from Alice's branch but not the credit to Bob's branch, nor vice versa. So transactions must be all-or-nothing not only in terms of their permanent effect on the database, but also in terms of their visibility as they happen. The updates made so far by an open transaction are invisible to other transactions until the transaction completes, whereupon all the updates become visible simultaneously.
  • PostgreSQL actually treats every SQL statement as being executed within a transaction. If you do not issue a BEGIN command, then each individual statement has an implicit BEGIN and (if successful) COMMIT wrapped around it. A group of statements surrounded by BEGIN and COMMIT is sometimes called a transaction block.
  • It's possible to control the statements in a transaction in a more granular fashion through the use of savepoints. Savepoints allow you to selectively discard parts of the transaction, while committing the rest. After defining a savepoint with SAVEPOINT, you can if needed roll back to the savepoint with ROLLBACK TO. All the transaction's database changes between defining the savepoint and rolling back to it are discarded, but changes earlier than the savepoint are kept.
  • After rolling back to a savepoint, it continues to be defined, so you can roll back to it several times. Conversely, if you are sure you won't need to roll back to a particular savepoint again, it can be released, so the system can free some resources. Keep in mind that either releasing or rolling back to a savepoint will automatically release all savepoints that were defined after it.

Window Functions:

  • A window function performs a calculation across a set of table rows that are somehow related to the current row. This is comparable to the type of calculation that can be done with an aggregate function. But unlike regular aggregate functions, use of a window function does not cause rows to become grouped into a single output row — the rows retain their separate identities. Behind the scenes, the window function is able to access more than just the current row of the query result.
  • A window function call always contains an OVER clause following the window function's name and argument(s).
  • Window functions are permitted only in the SELECT list and the ORDER BY clause of the query. They are forbidden elsewhere, such as in GROUP BY, HAVING and WHERE clauses. This is because they logically execute after the processing of those clauses. Also, window functions execute after regular aggregate functions. This means it is valid to include an aggregate function call in the arguments of a window function, but not vice versa.

SQL

  • SQL input consists of a sequence of commands. A command is composed of a sequence of tokens, terminated by a semicolon (";"). The end of the input stream also terminates a command. Which tokens are valid depends on the syntax of the particular command. A token can be a key word, an identifier, a quoted identifier, a literal (or constant), or a special character symbol. Tokens are normally separated by whitespace (space, tab, newline), but need not be if there is no ambiguity (which is generally only the case if a special character is adjacent to some other token type). Additionally, comments can occur in SQL input. They are not tokens, they are effectively equivalent to whitespace.
  • the UPDATE command always requires a SET token to appear in a certain position
  • key words, that is, words that have a fixed meaning in the SQL language
  • identifiers. They identify names of tables, columns, or other database objects, depending on the command they are used in.
  • Key words and identifiers have the same lexical structure, meaning that one cannot know whether a token is an identifier or a key word without knowing the language.
  • A convention often used is to write key words in upper case and names in lower case, e.g. UPDATE my_table SET a = 5;
  • The Unicode escape syntax works only when the server encoding is UTF8. When other server encodings are used, only code points in the ASCII range (up to \007F) can be specified.
  • Quoting an identifier also makes it case-sensitive, whereas unquoted names are always folded to lower case. For example, the identifiers FOO, foo, and "foo" are considered the same by PostgreSQL, but "Foo" and "FOO" are different from these three and each other. (The folding of unquoted names to lower case in PostgreSQL is incompatible with the SQL standard, which says that unquoted names should be folded to upper case. Thus, foo should be equivalent to "FOO" not "foo" according to the standard. If you want to write portable applications you are advised to always quote a particular name or never quote it.)
  • There are three kinds of implicitly-typed constants in PostgreSQL: strings, bit strings, and numbers. Constants can also be specified with explicit types, which can enable more accurate representation and more efficient handling by the system.
  • Although standard_conforming_strings currently defaults to off, the default will change to on in a future release for improved standards compliance. Applications are therefore encouraged to migrate away from using backslash escapes. If you need to use a backslash escape to represent a special character, write the string constant with an E to be sure it will be handled the same way in future releases.

Value Expressions:

  • Value expressions are used in a variety of contexts, such as in the target list of the SELECT command, as new column values in INSERT or UPDATE, or in search conditions in a number of commands. The result of a value expression is sometimes called a scalar, to distinguish it from the result of a table expression (which is a table). Value expressions are therefore also called scalar expressions (or even simply expressions). The expression syntax allows the calculation of values from primitive parts using arithmetic, logical, set, and other operations.
  • A row constructor is an expression that builds a row value (also called a composite value) using values for its member fields

Data Definition:

  • Subsequently, we discuss how tables can be organized into schemas
  • The number and order of the columns is fixed, and each column has a name. The number of rows is variable — it reflects how much data is stored at a given moment.
  • SQL does not make any guarantees about the order of the rows in a table.
  • Furthermore, SQL does not assign unique identifiers to rows, so it is possible to have several completely identical rows in a table.
  • Each column has a data type. The data type constrains the set of possible values that can be assigned to a column and assigns semantics to the data stored in the column so that it can be used for computations.
  • When you create many interrelated tables it is wise to choose a consistent naming pattern for the tables and columns. For instance, there is a choice of using singular or plural nouns for table names, both of which are favored by some theorist or other.
  • There is a limit on how many columns a table can contain. Depending on the column types, it is between 250 and 1600. However, defining a table with anywhere near this many columns is highly unusual and often a questionable design.
  • A column can be assigned a default value. When a new row is created and no values are specified for some of the columns, those columns will be filled with their respective default values. A data manipulation command can also request explicitly that a column be set to its default value, without having to know what that value is. If no default value is declared explicitly, the default value is the null value.
  • In general, a unique constraint is violated when there is more than one row in the table where the values of all of the columns included in the constraint are equal.
  • A table can have at most one primary key. A primary key indicates that a column or group of columns can be used as a unique identifier for rows in the table. Note that a unique constraint does not, by itself, provide a unique identifier because it does not exclude null values. Relational database theory dictates that every table must have a primary key. This rule is not enforced by PostgreSQL, but it is usually best to follow it.
  • A foreign key constraint specifies that the values in a column (or a group of columns) must match the values appearing in some row of another table. We say this maintains the referential integrity between two related tables.
  • Finally, we should mention that a foreign key must reference columns that either are a primary key or form a unique constraint.
  • When you create a database object, you become its owner. By default, only the owner of an object can do anything with the object. In order to allow other users to use it, privileges must be granted. However, users that have the superuser attribute can always access any object.

Schemas:

  • A PostgreSQL database cluster contains one or more named databases. Users and groups of users are shared across the entire cluster, but no other data is shared across databases. Any given client connection to the server can access only the data in a single database, the one specified in the connection request.
  • Users of a cluster do not necessarily have the privilege to access every database in the cluster. Sharing of user names means that there cannot be different users named, say, joe in two databases in the same cluster; but the system can be configured to allow joe access to only some of the databases.
  • Often you will want to create a schema owned by someone else (since this is one of the ways to restrict the activities of your users to well-defined namespaces). The syntax for that is: CREATE SCHEMA schemaname AUTHORIZATION username;
  • Schema names beginning with pg_ are reserved for system purposes and cannot be created by users.
  • In the previous sections we created tables without specifying any schema names. By default such tables (and other objects) are automatically put into a schema named "public". Every new database contains such a schema.
  • By default, users cannot access any objects in schemas they do not own. To allow that, the owner of the schema must grant the USAGE privilege on the schema. To allow users to make use of the objects in the schema, additional privileges might need to be granted, as appropriate for the object.
  • Note that by default, everyone has CREATE and USAGE privileges on the schema public. This allows all users that are able to connect to a given database to create objects in its public schema.
  • In addition to public and user-created schemas, each database contains a pg_catalog schema, which contains the system tables and all the built-in data types, functions, and operators. pg_catalog is always effectively part of the search path. If it is not named explicitly in the path then it is implicitly searched before searching the path's schemas.
  • In PostgreSQL versions before 7.3, table names beginning with pg_ were reserved. This is no longer true: you can create such a table name if you wish, in any non-system schema. However, it's best to continue to avoid such names, to ensure that you won't suffer a conflict if some future version defines a system table named the same as your table.
  • http://www.postgresql.org/docs/8.4/interactive/ddl-schemas.html#DDL-SCHEMAS-PATTERNS

Inheritance

  • PostgreSQL implements table inheritance, which can be a useful tool for database designers. (SQL:1999 and later define a type inheritance feature, which differs in many respects from the features described here.)
  • In PostgreSQL, a table can inherit from zero or more other tables, and a query can reference either all rows of a table or all rows of a table plus all of its descendant tables
  • Inheritance does not automatically propagate data from INSERT or COPY commands to other tables in the inheritance hierarchy.
  • All check constraints and not-null constraints on a parent table are automatically inherited by its children. Other types of constraints (unique, primary key, and foreign key constraints) are not inherited.
  • A table can inherit from more than one parent table, in which case it has the union of the columns defined by the parent tables.
  • If the same column name appears in multiple parent tables, or in both a parent table and the child's definition, then these columns are "merged" so that there is only one such column in the child table. To be merged, columns must have the same data types, else an error is raised. The merged column will have copies of all the check constraints coming from any one of the column definitions it came from, and will be marked not-null if any of them are.
  • Table inheritance is typically established when the child table is created, using the INHERITS clause of the CREATE TABLE statement. Alternatively, a table which is already defined in a compatible way can have a new parent relationship added, using the INHERIT variant of ALTER TABLE. To do this the new child table must already include columns with the same names and types as the columns of the parent. It must also include check constraints with the same names and check expressions as those of the parent. One convenient way to create a compatible table that will later be made a new child is to use the LIKE clause in CREATE TABLE. This creates a new table with the same columns as the source table. If there are any CHECK constraints defined on the source table, the INCLUDING CONSTRAINTS option to LIKE should be specified, as the new child must have constraints matching the parent to be considered compatible.
  • Similarly an inheritance link can be removed from a child using the NO INHERIT variant of ALTER TABLE. Dynamically adding and removing inheritance links like this can be useful when the inheritance relationship is being used for table partitioning.
  • A parent table cannot be dropped while any of its children remain.
  • Neither can columns or check constraints of child tables be dropped or altered if they are inherited from any parent tables.
  • If you wish to remove a table and all of its descendants, one easy way is to drop the parent table with the CASCADE option.
  • ALTER TABLE will propagate any changes in column data definitions and check constraints down the inheritance hierarchy. Again, dropping columns that are depended on by other tables is only possible when using the CASCADE option. ALTER TABLE follows the same rules for duplicate column merging and rejection that apply during CREATE TABLE.
  • Caveats: http://www.postgresql.org/docs/8.4/interactive/ddl-inherit.html —> Caveats
    • More generally, note that not all SQL commands are able to work on inheritance hierarchies.

Partitioning:

  • PostgreSQL supports basic table partitioning. Partitioning refers to splitting what is logically one large table into smaller physical pieces.
  • Currently, PostgreSQL supports partitioning via table inheritance. Each partition must be created as a child table of a single parent table. The parent table itself is normally empty; it exists just to represent the entire data set.
  • The following forms of partitioning can be implemented in PostgreSQL:
    • Range Partitioning: The table is partitioned into "ranges" defined by a key column or set of columns, with no overlap between the ranges of values assigned to different partitions. For example one might partition by date ranges, or by ranges of identifiers for particular business objects.
    • List Partitioning: The table is partitioned by explicitly listing which key values appear in each partition.
  • We will refer to the child tables as partitions, though they are in every way normal PostgreSQL tables.
  • Ensure that the constraint_exclusion configuration parameter is not disabled in postgresql.conf. If it is, queries will not be optimized as desired.
  • Managing Partitions: Another option that is often preferable is to remove the partition from the partitioned table but retain access to it as a table in its own right: ALTER TABLE measurement_y2006m02 NO INHERIT measurement; This allows further operations to be performed on the data before it is dropped. For example, this is often a useful time to back up the data using COPY, pg_dump, or similar tools. It might also be a useful time to aggregate data into smaller formats, perform other data manipulations, or run reports.
  • Partitioning and Constraint Exclusion: Constraint exclusion is a query optimization technique that improves performance for partitioned tables
  • Note that constraint exclusion is driven only by CHECK constraints, not by the presence of indexes. Therefore it isn't necessary to define indexes on the key columns. Whether an index needs to be created for a given partition depends on whether you expect that queries that scan the partition will generally scan a large part of the partition or just a small part. An index will be helpful in the latter case but not the former.
  • Alternative Partitioning Methods: A rule has significantly more overhead than a trigger, but the overhead is paid once per query rather than once per row, so this method might be advantageous for bulk-insert situations. In most cases, however, the trigger method will offer better performance.
  • Another disadvantage of the rule approach is that there is no simple way to force an error if the set of rules doesn't cover the insertion date; the data will silently go into the master table instead

Dependency Tracking:

  • When you create complex database structures involving many tables with foreign key constraints, views, triggers, functions, etc. you implicitly create a net of dependencies between the objects.
  • To ensure the integrity of the entire database structure, PostgreSQL makes sure that you cannot drop objects that other objects still depend on.
  • Data is conceptually inserted one row at a time. Of course you can also insert more than one row, but there is no way to insert less than one row. Even if you know only some column values, a complete row must be created.
  • The modification of data that is already in the database is referred to as updating.
  • You can update individual rows, all the rows in a table, or a subset of all rows.
  • Each column can be updated separately; the other columns are not affected.
  • It is not an error to attempt an update that does not match any rows.
  • You can update more than one column in an UPDATE command by listing more than one assignment in the SET clause. For example: UPDATE mytable SET a = 5, b = 3, c = 1 WHERE a > 0;
  • Just as adding data is only possible in whole rows, you can only remove entire rows from a table.

Queries:

  • The process of retrieving or the command to retrieve data from a database is called a query. In SQL the SELECT command is used to specify queries. The general syntax of the SELECT command is [WITH with_queries] SELECT select_list FROM table_expression [sort_specification]
  • The method of retrieval depends on the client application. For example, the psql program will display an ASCII-art table on the screen, while client libraries will offer functions to extract individual values from the query result
  • In strict SQL, GROUP BY can only group by columns of the source table but PostgreSQL extends this to also allow GROUP BY to group by columns in the select list. Grouping by value expressions instead of simple column names is also allowed.
  • As shown in the previous section, the table expression in the SELECT command constructs an intermediate virtual table by possibly combining tables, views, eliminating rows, grouping, etc. This table is finally passed on to processing by the select list. The select list determines which columns of the intermediate table are actually output.
  • Column Labels: For protection against possible future keyword additions, it is recommended that you always either write AS or double-quote the output column name.
  • After the select list has been processed, the result table can optionally be subject to the elimination of duplicate rows. The DISTINCT key word is written directly after SELECT to specify this: SELECT DISTINCT select_list ... Obviously, two rows are considered distinct if they differ in at least one column value. Null values are considered equal in this comparison.
  • Combining Queries: In order to calculate the union, intersection, or difference of two queries, the two queries must be "union compatible", which means that they return the same number of columns and the corresponding columns have compatible data types
  • WITH provides a way to write subqueries for use in a larger SELECT query. The subqueries can be thought of as defining temporary tables that exist just for this query.

Numeric Types:

  • Specifying NUMERIC without any precision or scale creates a column in which numeric values of any precision and scale can be stored, up to the implementation limit on precision. The types decimal and numeric are equivalent. Both types are part of the SQL standard.
  • In addition to ordinary numeric values, the numeric type allows the special value NaN, meaning "not-a-number". Any operation on NaN yields another NaN.
  • In most implementations of the "not-a-number" concept, NaN is not considered equal to any other numeric value (including NaN). In order to allow numeric values to be sorted and used in tree-based indexes, PostgreSQL treats NaN values as equal, and greater than all non-NaN values.
  • In addition to ordinary numeric values, the floating-point types have several special values: Infinity, Infinity, NaN

Character Types:

  • There is no performance difference among these three types, apart from increased storage space when using the blank-padded type, and a few extra CPU cycles to check the length when storing into a length-constrained column. While character(n) has performance advantages in some other database systems, there is no such advantage in PostgreSQL; in fact character(n) is usually the slowest of the three because of its additional storage costs. In most situations text or character varying should be used instead.

Binary Data Types:

  • A binary string is a sequence of octets (or bytes). Binary strings are distinguished from character strings in two ways: First, binary strings specifically allow storing octets of value zero and other "non-printable" octets (usually, octets outside the range 32 to 126). Character strings disallow zero octets, and also disallow any other octet values and sequences of octet values that are invalid according to the database's selected character set encoding. Second, operations on binary strings process the actual bytes, whereas the processing of character strings depends on locale settings. In short, binary strings are appropriate for storing data that the programmer thinks of as "raw bytes", whereas character strings are appropriate for storing text.

Time Zones:

  • To address these difficulties, we recommend using date/time types that contain both date and time when using time zones. We do not recommend using the type time with time zone (though it is supported by PostgreSQL for legacy applications and for compliance with the SQL standard). PostgreSQL assumes your local time zone for any type containing only date or time.
  • All timezone-aware dates and times are stored internally in UTC. They are converted to local time in the zone specified by the timezone configuration parameter before being displayed to the client.

Boolean Types:

  • boolean values are output using the letters t and f
  • boolean uses 1 byte of storage.

Enum:

  • Enum labels are case sensitive, so 'happy' is not the same as 'HAPPY
  • An enum value occupies four bytes on disk. The length of an enum value's textual label is limited by the NAMEDATALEN setting compiled into PostgreSQL; in standard builds this means at most 63 bytes.

Arrays:

  • Arrays are not sets; searching for specific array elements can be a sign of database misdesign. Consider using a separate table with a row for each item that would be an array element. This will be easier to search, and is likely to scale better for a large number of elements.

Composite Types:

  • A composite type represents the structure of a row or record; it is essentially just a list of field names and their data types. PostgreSQL allows composite types to be used in many of the same ways that simple types can be used. For example, a column of a table can be declared to be of a composite type.
  • Note: Remember that what you write in an SQL command will first be interpreted as a string literal, and then as a composite. This doubles the number of backslashes you need (assuming escape string syntax is used). For example, to insert a text field containing a double quote and a backslash in a composite value, you'd need to write: INSERT ... VALUES (E'("\\"\\\\")'); The string-literal processor removes one level of backslashes, so that what arrives at the composite-value parser looks like ("\"\\"). In turn, the string fed to the text data type's input routine becomes "\. (If we were working with a data type whose input routine also treated backslashes specially, bytea for example, we might need as many as eight backslashes in the command to get one backslash into the stored composite field.) Dollar quoting (see Section 4.1.2.4) can be used to avoid the need to double backslashes.
  • Tip: The ROW constructor syntax is usually easier to work with than the composite-literal syntax when writing composite values in SQL commands. In ROW, individual field values are written the same way they would be written when not members of a composite.

Object Identifier Types:

  • Object identifiers (OIDs) are used internally by PostgreSQL as primary keys for various system tables. OIDs are not added to user-created tables, unless WITH OIDS is specified when the table is created, or the default_with_oids configuration variable is enabled.
  • There are also several alias types for oid: regproc, regprocedure, regoper, regoperator, regclass, regtype, regconfig, and regdictionary. Table 8-22 shows an overview.
  • The oid type is currently implemented as an unsigned four-byte integer. Therefore, it is not large enough to provide database-wide uniqueness in large databases, or even in large individual tables. So, using a user-created table's OID column as a primary key is discouraged. OIDs are best used only for references to system tables.

Functions and Operators:

  • Do not write expression = NULL because NULL is not "equal to" NULL. (The null value represents an unknown value, and it is not known whether two unknown values are equal.) This behavior conforms to the SQL standard.
    • Some applications might expect that expression = NULL returns true if expression evaluates to the null value. It is highly recommended that these applications be modified to comply with the SQL standard. However, if that cannot be done the transform_null_equals configuration variable is available. If it is enabled, PostgreSQL will convert x = NULL clauses to x IS NULL.
  • If the expression is row-valued, then IS NULL is true when the row expression itself is null or when all the row's fields are null, while IS NOT NULL is true when the row expression itself is non-null and all the row's fields are non-null. Because of this behavior, IS NULL and IS NOT NULL do not always return inverse results for row-valued expressions, i.e., a row-valued expression that contains both NULL and non-null values will return false for both tests. This definition conforms to the SQL standard, and is a change from the inconsistent behavior exhibited by PostgreSQL versions prior to 8.2.
  • A null input is treated as the logical value "unknown".

Index:

  • To remove an index, use the DROP INDEX command. Indexes can be added to and removed from tables at any time.
  • Once an index is created, no further intervention is required: the system will update the index when the table is modified, and it will use the index in queries when it thinks doing so would be more efficient than a sequential table scan.
  • Indexes can also benefit UPDATE and DELETE commands with search conditions. Indexes can moreover be used in join searches. Thus, an index defined on a column that is part of a join condition can also significantly speed up queries with joins.
  • Creating an index on a large table can take a long time. By default, PostgreSQL allows reads (selects) to occur on the table in parallel with index creation, but writes (INSERTs, UPDATEs, DELETEs) are blocked until the index build is finished.
  • After an index is created, the system has to keep it synchronized with the table. This adds overhead to data manipulation operations. Therefore indexes that are seldom or never used in queries should be removed.
  • unique indexes: Indexes can also be used to enforce uniqueness of a column's value, or the uniqueness of the combined values of more than one column. The preferred way to add a unique constraint to a table is ALTER TABLE ... ADD CONSTRAINT. The use of indexes to enforce unique constraints could be considered an implementation detail that should not be accessed directly. One should, however, be aware that there's no need to manually create indexes on unique columns; doing so would just duplicate the automatically-created index.
  • http://www.postgresql.org/docs/8.4/interactive/locking-indexes.html

Full Text Search:

  • A lexeme is a string, just like a token, but it has been normalized so that different forms of the same word are made alike. For example, normalization almost always includes folding upper-case letters to lower-case, and often involves removal of suffixes (such as s or es in English).
  • A text search configuration specifies all options necessary to transform a document into a tsvector: the parser to use to break text into tokens, and the dictionaries to use to transform each token into a lexeme.
  • A data type tsvector is provided for storing preprocessed documents, along with a type tsquery for representing processed queries
  • Full text searches can be accelerated using indexes
  • What Is a Document? A document is the unit of searching in a full text search system; for example, a magazine article or email message
  • Ispell Dictionary; The Ispell dictionary template supports morphological dictionaries, which can normalize many different linguistic forms of a word into the same lexeme. For example, an English Ispell dictionary can match all declensions and conjugations of the search term bank, e.g., banking, banked, banks, banks', and bank's. The standard PostgreSQL distribution does not include any Ispell configuration files. Dictionaries for a large number of languages are available from Ispell.
  • GiST and GIN Index Types: There are two kinds of indexes that can be used to speed up full text searches. Note that indexes are not mandatory for full text searching, but in cases where a column is searched on a regular basis, an index is usually desirable.
  • limitations for 8.4 are http://www.postgresql.org/docs/8.4/interactive/textsearch-limitations.html

Concurrency Control:

  • http://www.postgresql.org/docs/8.4/interactive/mvcc-intro.html
  • The main advantage of using the MVCC model of concurrency control rather than locking is that in MVCC locks acquired for querying (reading) data do not conflict with locks acquired for writing data, and so reading never blocks writing and writing never blocks reading.
  • Table- and row-level locking facilities are also available in PostgreSQL for applications that cannot adapt easily to MVCC behavior. However, proper use of MVCC will generally provide better performance than locks. In addition, application-defined advisory locks provide a mechanism for acquiring locks that are not tied to a single transaction.

Locales:

  • Client applications that handle server-side errors by parsing the text of the error message will obviously have problems when the server's messages are in a different language. Authors of such applications are advised to make use of the error code scheme instead. http://www.postgresql.org/docs/8.4/interactive/locale.html

Miscellaneous

Samples

sa@wks:~$ ll /usr/share/postgresql/8.4/
total 832K
drwxr-xr-x 2 root root 4.0K 2009-07-17 16:50 contrib
-rw-r--r-- 1 root root  74K 2009-07-11 18:24 conversion_create.sql
-rw-r--r-- 1 root root  92K 2009-07-11 18:24 information_schema.sql
drwxr-xr-x 4 root root 4.0K 2009-07-16 18:41 man
-rw-r--r-- 1 root root 3.5K 2009-07-11 18:24 pg_hba.conf.sample
-rw-r--r-- 1 root root 1.6K 2009-07-11 18:24 pg_ident.conf.sample
-rw-r--r-- 1 root root  643 2009-07-11 18:25 pg_service.conf.sample
-rw-r--r-- 1 root root 438K 2009-07-11 18:24 postgres.bki
-rw-r--r-- 1 root root  79K 2009-07-11 18:24 postgres.description
-rw-r--r-- 1 root root  17K 2009-07-11 18:24 postgresql.conf.sample
-rw-r--r-- 1 root root   40 2009-07-11 18:24 postgres.shdescription
-rw-r--r-- 1 root root  211 2009-07-11 18:25 psqlrc.sample
-rw-r--r-- 1 root root 2.9K 2009-07-11 18:24 recovery.conf.sample
-rw-r--r-- 1 root root  14K 2009-07-11 18:25 snowball_create.sql
-rw-r--r-- 1 root root  32K 2009-07-11 18:24 sql_features.txt
-rw-r--r-- 1 root root  18K 2009-07-11 18:24 system_views.sql
drwxr-xr-x 2 root root 4.0K 2009-07-16 18:41 timezonesets
drwxr-xr-x 2 root root 4.0K 2009-07-16 18:41 tsearch_data
sa@wks:~$

Compilation Settings

sa@wks:~$ pg_config
BINDIR = /usr/lib/postgresql/8.4/bin
DOCDIR = /usr/share/doc/postgresql
HTMLDIR = /usr/share/doc/postgresql
INCLUDEDIR = /usr/include/postgresql
PKGINCLUDEDIR = /usr/include/postgresql
INCLUDEDIR-SERVER = /usr/include/postgresql/8.4/server
LIBDIR = /usr/lib
PKGLIBDIR = /usr/lib/postgresql/8.4/lib
LOCALEDIR = /usr/share/locale
MANDIR = /usr/share/postgresql/8.4/man
SHAREDIR = /usr/share/postgresql/8.4
SYSCONFDIR = /etc/postgresql-common
PGXS = /usr/lib/postgresql/8.4/lib/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--build=x86_64-linux-gnu' '--prefix=/usr' '--includedir=/usr/include' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--sysconfdir=/etc' '--localstatedir=/var' '--libexecdir=/usr/lib/postgresql-8.4' '--disable-maintainer-mode' '--disable-dependency-tracking' '--srcdir=.' '--mandir=/usr/share/postgresql/8.4/man' '--with-docdir=/usr/share/doc/postgresql-doc-8.4' '--sysconfdir=/etc/postgresql-common' '--datadir=/usr/share/postgresql/8.4' '--bindir=/usr/lib/postgresql/8.4/bin' '--includedir=/usr/include/postgresql/' '--enable-nls' '--enable-integer-datetimes' '--enable-thread-safety' '--enable-debug' '--disable-rpath' '--with-tcl' '--with-perl' '--with-python' '--with-pam' '--with-krb5' '--with-gssapi' '--with-openssl' '--with-libxml' '--with-libxslt' '--with-ldap' '--with-ossp-uuid' '--with-gnu-ld' '--with-tclconfig=/usr/lib/tcl8.5' '--with-tkconfig=/usr/lib/tk8.5' '--with-includes=/usr/include/tcl8.5' '--with-system-tzdata=/usr/share/zoneinfo' '--with-pgport=5432' 'CFLAGS=-g -O2 -g -Wall -O2 -fPIC' 'LDFLAGS= -Wl,--as-needed' 'build_alias=x86_64-linux-gnu' 'CC=cc' 'CPPFLAGS='
CC = cc
CPPFLAGS = -D_GNU_SOURCE -I/usr/include/libxml2 -I/usr/include/tcl8.5
CFLAGS = -g -O2 -g -Wall -O2 -fPIC -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -fwrapv -g
CFLAGS_SL = -fpic
LDFLAGS = -Wl,--as-needed -Wl,--as-needed
LDFLAGS_SL =
LIBS = -lpgport -lxslt -lxml2 -lpam -lssl -lcrypto -lkrb5 -lcom_err -lgssapi_krb5 -lz -lreadline -lcrypt -ldl -lm
VERSION = PostgreSQL 8.4.0
sa@wks:~$

Files/Directories

  • http://www.postgresql.org/docs/8.4/interactive/storage-file-layout.html
  • See /etc/postgresql/<version>/main/postgresql.conf —> FILE LOCATIONS
  • from man 7 postgresql-common: ... This program is run only as a link to names which correspond to PostgreSQL programs in /usr/lib/postgresql/<version>/bin
wks:/var/lib/monkeysphere# la /usr/lib/postgresql/8.4/bin/
total 6884
drwxr-xr-x 2 root root    4096 2009-07-17 16:50 .
drwxr-xr-x 4 root root    4096 2009-07-16 18:41 ..
-rwxr-xr-x 1 root root   60456 2009-07-11 18:26 clusterdb
-rwxr-xr-x 1 root root   60520 2009-07-11 18:26 createdb
-rwxr-xr-x 1 root root   56216 2009-07-11 18:26 createlang
-rwxr-xr-x 1 root root   60808 2009-07-11 18:26 createuser
-rwxr-xr-x 1 root root   56232 2009-07-11 18:26 dropdb
-rwxr-xr-x 1 root root   56216 2009-07-11 18:26 droplang
-rwxr-xr-x 1 root root   56232 2009-07-11 18:26 dropuser
-rwxr-xr-x 1 root root   80160 2009-07-11 18:26 initdb
-rwxr-xr-x 1 root root   22808 2009-07-11 18:26 oid2name
-rwxr-xr-x 1 root root   43608 2009-07-11 18:26 pgbench
-rwxr-xr-x 1 root root   22808 2009-07-11 18:26 pg_controldata
-rwxr-xr-x 1 root root   35608 2009-07-11 18:26 pg_ctl
-rwxr-xr-x 1 root root  281752 2009-07-11 18:26 pg_dump
-rwxr-xr-x 1 root root   77512 2009-07-11 18:26 pg_dumpall
-rwxr-xr-x 1 root root   31088 2009-07-11 18:26 pg_resetxlog
-rwxr-xr-x 1 root root  130088 2009-07-11 18:26 pg_restore
-rwxr-xr-x 1 root root   22768 2009-07-11 18:26 pg_standby
-rwxr-xr-x 1 root root 5305056 2009-07-11 18:26 postgres
lrwxrwxrwx 1 root root       8 2009-07-16 18:41 postmaster -> postgres
-rwxr-xr-x 1 root root  369040 2009-07-11 18:26 psql
-rwxr-xr-x 1 root root   60488 2009-07-11 18:26 reindexdb
-rwxr-xr-x 1 root root   31672 2009-07-11 18:26 vacuumdb
-rwxr-xr-x 1 root root   22784 2009-07-11 18:26 vacuumlo
wks:/var/lib/monkeysphere#

Datadir

wks:~# pg_lsclusters
Version Cluster   Port Status Owner    Data directory                     Log file
8.4     main      5432 online postgres /var/lib/postgresql/8.4/main       /var/log/postgresql/postgresql-8.4-main.log
wks:~# la /var/lib/postgresql/8.4/main/
total 56
drwx------ 11 postgres postgres 4096 2009-07-25 07:45 .
drwxr-xr-x  3 root     root     4096 2009-07-16 18:41 ..
drwx------  6 postgres postgres 4096 2009-07-17 15:37 base
drwx------  2 postgres postgres 4096 2009-07-25 07:45 global
drwx------  2 postgres postgres 4096 2009-07-16 18:41 pg_clog
drwx------  4 postgres postgres 4096 2009-07-16 18:41 pg_multixact
drwx------  2 postgres postgres 4096 2009-07-29 13:36 pg_stat_tmp
drwx------  2 postgres postgres 4096 2009-07-16 18:41 pg_subtrans
drwx------  2 postgres postgres 4096 2009-07-16 18:41 pg_tblspc
drwx------  2 postgres postgres 4096 2009-07-16 18:41 pg_twophase
-rw-------  1 postgres postgres    4 2009-07-16 18:41 PG_VERSION
drwx------  3 postgres postgres 4096 2009-07-16 18:41 pg_xlog
-rw-------  1 postgres postgres  133 2009-07-25 07:45 postmaster.opts
-rw-------  1 postgres postgres   54 2009-07-25 07:45 postmaster.pid
lrwxrwxrwx  1 root     root       36 2009-07-16 18:42 server.crt -> /etc/ssl/certs/ssl-cert-snakeoil.pem
lrwxrwxrwx  1 root     root       38 2009-07-16 18:42 server.key -> /etc/ssl/private/ssl-cert-snakeoil.key
wks:~#

SSL

The PostgreSQL server packages support SSL, which provides encrypted and authenticated network communication. SSL should be used if you have an untrusted network between a database server and a client and these exchange security sensitive data like passwords or confidential database contents.

postgresql-common makes use of the snakeoil SSL certificate that is generated by the ssl-cert package, so that SSL works out of the box. When a cluster is created with pg_createcluster, SSL support will automatically be enabled in that cluster, and symlinks server.key and server.crt are created in the data directory which point to the snakeoil certificate in /etc/ssl; therefore all clusters use the same certificate by default. Of course you can replace these symlinks by cluster-specific certificates.

In addition, if /etc/postgresql-common/root.crt exists, a symbolic link root.crt will be created in the data directory. /etc/postgresql-common/root.crt is a dummy file by default, so that client-side authentication is not performed. To enable it, you should add some root certificates to it. A reasonable choice is to just symlink the file to /etc/ssl/certs/ssl-cert-snakeoil.pem; in this case, client certificates need to be signed by the snakeoil certificate, which might be desirable in many cases. See /usr/share/doc/postgresql-doc-<version>/html/ssl-tcp.html

Non-local Connections

from pg_hba.conf

# If you want to allow non-local connections, you need to add more
# "host" records. In that case you will also need to make PostgreSQL listen
# on a non-local interface via the listen_addresses configuration parameter,
# or via the -i or -h command line switches.

Python

  • http://www.sqlalchemy.org/
  • psycopg
  • py-postgresql

Cluster / Replication / High Availability / Load Balancing

  • file:///usr/share/doc/postgresql-common/architecture.html
  • http://www.postgres-r.org/

Additional Supplied Modules

  • http://www.postgresql.org/docs/8.4/interactive/contrib.html

Client Interfaces

  • http://www.postgresql.org/docs/8.4/interactive/external-interfaces.html
    • psycopg is a PostgreSQL database adapter for the Python programming language; http://www.initd.org

Procedural Languages

  • http://www.postgresql.org/docs/8.4/interactive/external-pl.html

Python

  • http://python.projects.postgresql.org/
  • http://wiki.github.com/jwp/py-postgresql

Extensions

  • http://www.postgresql.org/docs/8.4/interactive/external-extensions.html

Utilities

GUI

  • http://wiki.postgresql.org/wiki/Community_Guide_to_PostgreSQL_GUI_Tools
Creative Commons License
The content of this site is licensed under Creative Commons Attribution-Share Alike 3.0 License.