Thursday, May 21, 2015

[Postgres Toolkit] pt-table-usage/pt-index-usage commands

In this entry, I would like to explain how to use pt-table-usage and pt-index-usage commands in the Postgres Toolkit which I introduced in the previous entry.

If you have never heard about Postgres Toolkit, please visit the previous entry.
pt-table-usage and pt-index-usage are the commands which can be used to obtain usages of the tables/indexes of PostgreSQL.

If you are already familiar with PostgreSQL, you may know that several system views and system tables need to be combined in order to obtain the PostgreSQL statistics and usages.

So, obtaining those information is one of the essential talks for DBA, but it is one of the complicated and bothersome tasks.

pt-table-usage and pt-index-usage are designed to allow DBA to deal with those tasks in single command.

■pt-table-usage command


pt-table-usage command can be used to obtain the information about OID, owner, schema and table names. Number of blocks, access statistics and table space name which holds the table as well.

Here is an example that shows usage of the tables in the postgres database.

[snaga@devvm04 tmp]$ pt-table-usage -d postgres
+-------+-------+--------+------------------+-------+--------+----------+---------+--------+-------+--------+--------+---------------------+---------------------+------------+
|  OID  | OWNER | SCHEMA |      TABLE       |  BLKS |  SCAN  |  T_READ  |  T_INS  | T_UPD  | T_DEL | B_READ | B_HIT  |       VACUUMED      |       ANALYZED      | TABLESPACE |
+-------+-------+--------+------------------+-------+--------+----------+---------+--------+-------+--------+--------+---------------------+---------------------+------------+
| 27457 | snaga | public | pgbench_accounts | 16690 |     22 | 22011215 | 1000000 | 140965 |     0 | 598949 | 514771 | 2015-05-02 18:00:53 | 2015-05-02 21:21:33 | pg_default |
| 27460 | snaga | public | pgbench_branches |     8 | 183611 |  1405430 |      10 | 140959 |     0 |    106 | 673490 | 2015-05-03 16:36:48 | 2015-05-03 16:36:48 | pg_default |
| 27451 | snaga | public | pgbench_history  |    71 |      3 |   205759 |  140956 |      0 |     0 |   7259 | 143541 | 2015-05-02 18:00:47 | 2015-05-03 16:36:48 | spc1       |
| 27454 | snaga | public | pgbench_tellers  |    12 | 118516 | 11296500 |     100 | 140961 |     0 |    122 | 381979 | 2015-05-03 16:36:48 | 2015-05-03 16:36:48 | pg_default |
+-------+-------+--------+------------------+-------+--------+----------+---------+--------+-------+--------+--------+---------------------+---------------------+------------+
[snaga@devvm04 tmp]$

For example, the object size can be calculated by multiplying the number of blocks by the block size (8kB).

Also access statistics of tuples in the table, table scans (SCAN), tuple reads (T_READ), tuple inserts (T_INS), tuple updates (T_UPD) and tuple delete (T_DEL), can be obtained.

In addition to that, access statistics of blocks in the table, block reads (B_READ) and block hits (B_HIT), can be obtained. The block hits here is the number of read count from the shared buffer, not physical disk.

As I explain above, by using pt-table-usage, you can check all access statistics and usage of your tables at a glance with single command.

The command also shows the timestamp of the latest VACUUM/ANALYZE, which are necessary for maintaining databases.

"VACUUMED" column shows the latest timestamp of VACUUM (VACUUM command or auto vacuum). And "ANALYZED" column shows the latest timestamp of ANALYZE (ANALYZE command or auto analyze). Those fields will be empty if VACUUM/ANALYZE has not been executed ever.

The last column "TABLESPACE" shows name of the tablespace where the table is held, and "pg_default" would be shown if the table is held in the default tablespace. Seeing the tablespace name is one of the bothersome tasks without tool or script.

If you have lots of tables in single database, you can see only the selected tables by specifying owner and/or schema names. Also you can see only specific tables by specifying table names as you need.

[snaga@devvm04 tmp]$ pt-table-usage --help

Usage: pt-table-usage [option...]

Options:
    -h, --host=HOSTNAME        Host name of the postgres server
    -p, --port=PORT            Port number of the postgres server
    -U, --username=USERNAME    User name to connect
    -d, --dbname=DBNAME        Database name to connect

    -o, --owner=STRING         Database owner
    -n, --schema=STRING        Database schema
    -t, --table=STRING         Table name

    --help                     Print this help.

[snaga@devvm04 tmp]$

■pt-index-usage command


pt-index-usage command is similar to pt-table-usage command, and it can be used to obtain the index information, including object name and access statistics.

[snaga@devvm04 tmp]$ pt-index-usage -d postgres
+-------+-------+--------+------------------+-----------------------+------+--------+--------+--------+--------+--------+--------+------------+
|  OID  | OWNER | SCHEMA |      TABLE       |         INDEX         | BLKS |  SCAN  | T_READ | T_FTCH | B_READ | B_HIT  | STATUS | TABLESPACE |
+-------+-------+--------+------------------+-----------------------+------+--------+--------+--------+--------+--------+--------+------------+
| 27468 | snaga | public | pgbench_accounts | pgbench_accounts_pkey | 2745 | 281928 | 329225 | 281928 | 121564 | 867172 |        | pg_default |
| 27464 | snaga | public | pgbench_branches | pgbench_branches_pkey |    2 |    441 |   1202 |    441 |     39 |    626 |        | pg_default |
| 27466 | snaga | public | pgbench_tellers  | pgbench_tellers_pkey  |    2 |  28003 |  28802 |  28003 |     34 |  28489 |        | pg_default |
+-------+-------+--------+------------------+-----------------------+------+--------+--------+--------+--------+--------+--------+------------+
[snaga@devvm04 tmp]$

pt-index-usage shows both table and index names, and the status (STATUS) of the index.

This STATUS column shows the status of the index, such as "INVALID" and/or "NOTREADY". These statuses are obtained from indisvalid and indisready columns in pg_index view. For more details, please refer the PostgreSQL official manual.
pt-index-usage command can select indexes in the database to be shown by specifying schema name, owner and table names. So, if you have lots of indexes in single database, you can use those options to select specific indexes.

[snaga@devvm04 tmp]$ pt-index-usage --help

Usage: pt-index-usage [option...]

Options:
    -h, --host=HOSTNAME        Host name of the postgres server
    -p, --port=PORT            Port number of the postgres server
    -U, --username=USERNAME    User name to connect
    -d, --dbname=DBNAME        Database name to connect

    -o, --owner=STRING         Database owner
    -n, --schema=STRING        Database schema
    -t, --table=STRING         Table name
    -i, --index=STRING         Index name

    --help                     Print this help.

[snaga@devvm04 tmp]$

■Wrap-up


Obtaining the status and statistics of tables and indexes is one of the essential tasks for DBA, but it's too bothersome without any supportive tool or script.

So, I hope the PostgreSQL DBA to improve productivity and quality of the DBA tasks with handy tools, such as pt-table-usage and pt-index-usage.

2 comments:

Daniel Cristian said...

I liked your idea, but why not create a tool like ip?

$ pg table usage [database]
... info about all tables in database

$ pg index usage [database] [schema] [table]
... info about all index on specific table

:)

Satoshi Nagayasu said...

Yeah, I agree that it can be implemented and seem to be cool.

But once it is implemented, every DBA has to type lots of helps, like "foo --help" and "foo bar --help" a lot. :)