Wednesday, March 29, 2017

Change autovacuum_freeze_max_age without a restart (sort of…)

This blog post is kind of involved, so I'm giving a short version at the top, with some background for beginners at the bottom. The middle section explains the motivation for using this hack in the first place.

Short version

I came up with a useful and/or terrible hack the other day: setting autovacuum_freeze_max_age as a storage parameter. I definitely don't recommend doing this routinely, but it unblocked us during a critical maintenance window.

    ALTER TABLE my_table SET (autovacuum_freeze_max_age = 300000000);

Don't forget to set it back when you're done! Otherwise you will incur an even longer autovacuum freeze, probably when you least expect it.

Medium-length version

My colleague Kacey Holston was in the midst of upgrading a client from PostgreSQL 9.4 to 9.6, using Slony for minimal downtime. As planned, the client took a few minutes of downtime so Kacey could finish the cutover. She was ready to reverse the direction of replication (so the 9.6 server would replicate back to the 9.4 server, in case our client needed to fall back to it). But there was an autovacuum freeze (a.k.a. "autovacuum (to prevent wraparound)") that was keeping Slony from getting the brief ExclusiveLock it needed.

She knew from experience that this table takes three hours to freeze. But the client had only minutes of downtime scheduled – that was the whole point of using Slony!

If only it were possible to change autovacuum_freeze_max_age on the fly; then we could bump it up to stop that autovacuum. Unfortunately, you have to restart the database in order to change it. Except…

You can set it on a per-table basis, as follows. This took effect immediately:

    ALTER TABLE my_table SET (autovacuum_freeze_max_age = 300000000);

If you do this, don't forget to set it back to the normal value (by default, 200000000) once you're done! Otherwise autovacuum freezes on this table will come around less often and take even longer.
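Setting it back can be an explicit SET to the old value or, if the table should simply follow the server-wide setting again, a RESET of the table-level override. A minimal example, using the same my_table as above:

    ALTER TABLE my_table RESET (autovacuum_freeze_max_age);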

Background for beginners

When the oldest transaction ID on any row in a table is more than autovacuum_freeze_max_age old (200 million transactions, by default), an "autovacuum (to prevent wraparound)" process runs on the table to reclaim old transaction IDs. For large tables, this can be a problem, because it can generate a lot of CPU and I/O activity during busy hours. Also, as we saw here, it locks the table (in SHARE UPDATE EXCLUSIVE mode); this blocks DDL changes (a.k.a. migrations).
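If you're curious how close your tables are to that threshold, you can check the age of each table's relfrozenxid directly; a quick query against the standard catalogs lists the oldest ones first:

    SELECT relname, age(relfrozenxid) AS xid_age
      FROM pg_class
     WHERE relkind = 'r'
     ORDER BY age(relfrozenxid) DESC
     LIMIT 10;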

For more-technical background, see the official PostgreSQL docs on how transaction IDs work, and for a friendlier intro, see this series of blog posts.

Friday, January 29, 2016

Language thoughts

All the languages I've used heavily have one primitive data structure you fall into using as a golden hammer:

  • C: arrays
  • Lisp: lists
  • Perl: hashes
  • JS: "objects"
Et cetera. Python is borderline; it's tempting to abuse dicts, but there's a bit less friction to using proper objects than with Perl.

Saturday, January 16, 2016

Blocked by rdsadmin

One of our clients on RDS had a VACUUM FREEZE that was hanging for a long time. "I bet it's waiting on a lock," I thought. Yup, but the curious part is what it was waiting on: a mysterious process run by the rdsadmin role (see RECORD 2, below):
SELECT pg_stat_activity.*, pg_locks.mode
  FROM pg_stat_activity
  JOIN pg_locks USING (pid)
  JOIN pg_class ON pg_locks.relation = pg_class.oid
 WHERE pg_class.relname = 'users'
   AND pg_locks.mode IN ('ShareUpdateExclusiveLock', 'ShareLock', 'ShareRowExclusiveLock', 'ExclusiveLock', 'AccessExclusiveLock');

-[ RECORD 1 ]----+------------------------------
datid            | 1234
datname          | my_database
pid              | 14641
usesysid         | 16396
usename          | postgres
application_name | psql
client_addr      | 198.162.0.0
client_hostname  |
client_port      | 5430
backend_start    | 2016-01-15 22:05:06.161987+00
xact_start       | 2016-01-15 22:14:39.301425+00
query_start      | 2016-01-15 22:14:39.301425+00
state_change     | 2016-01-15 22:14:39.301429+00
waiting          | t
state            | active
query            | VACUUM FREEZE verbose users;
mode             | ShareUpdateExclusiveLock
-[ RECORD 2 ]----+------------------------------
datid            | 1234
datname          | my_database
pid              | 22328
usesysid         | 10
usename          | rdsadmin
application_name |
client_addr      |
client_hostname  |
client_port      |
backend_start    |
xact_start       |
query_start      |
state_change     |
waiting          |
state            |
query            | 
mode             | ShareUpdateExclusiveLock
Further examination showed that this process was locking only the users table (and its indexes):
SELECT locktype, relation::regclass AS tablename
  FROM pg_locks
  JOIN pg_stat_activity USING (pid)
 WHERE pid = 22328;

locktype    |                  tablename
------------+---------------------------------------------
 relation   | user_index_a
 relation   | user_index_b
 relation   | user_index_c
 relation   | users_pkey
 virtualxid |
 relation   | users
(13 rows)

Has anyone else seen such a process? I'm curious as to what it is. My current best guess is a vacuum. We opened a ticket with Amazon to ask them, so I'll update this post when Amazon replies.

EDIT 2016-01-17: the above is a braino for "an autovacuum." An Amazon rep wrote back the next day to say that it was, indeed, an autovacuum process, and included some vacuum-tuning tips. So good on them, although it's unfortunate that RDS's privilege system doesn't allow you to see the pg_stat_activity.query field when the rdsadmin role is working on a table you own.

Tuesday, October 6, 2015

Reflections on (Humans) Trusting (Humans') Trust

One thing I've learned from consulting is that you should trust a competent person's feels about a technology, even if they can't immediately come up with arguments to support them. Seemingly vague statements like "Oh, I hate the Foo webserver, it's so flaky" or "Bar is a really annoying language to work in" or "the Baz API just doesn't feel very user-friendly" are actually the distilled product of many hard-won experiences that the speaker may not be able to call to mind individually. I observed this last night at Larry Wall's Perl 6 talk, specifically the Q&A: most of the time he came up with great examples, but sometimes he just had to talk about his own language in broad, esthetic terms. And I'm no Larry Wall, but today I found myself in a similar position when advising a client about which of two virtualization platforms to use. Of course, this applies to people speaking from experience and not from prejudice; you have to know how good someone's judgment is in order to transitively-trust them to have good opinions. That's one soft skill that gives you a major edge in technology. But, contrary to stereotype, most technical people are good at making these kinds of meta-judgments.

Tuesday, June 9, 2015

Baidu Bot Blues

A client complained of mysterious connection spikes. "We think we're getting hit by bots, but we're not sure," they said; "Can you look into it?"

So I edited postgresql.conf, turned on log_connections and log_disconnections, did a 'pg_ctl -D MY_DATA_DIR reload', and waited (in my case I also set log_min_duration_statement = 0, but I don't recommend that unless you can watch to make sure the logs aren't filling the log partition, and make sure you're not going to hammer a slow log device).
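(If you're on 9.4 or later and would rather not edit postgresql.conf by hand, the same change can be made from psql as a superuser. This is just the equivalent sketch, not what I did at the time:)

ALTER SYSTEM SET log_connections = on;
ALTER SYSTEM SET log_disconnections = on;
SELECT pg_reload_conf();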

A while later, I did

ls -t data/pg_log/*.csv | head -5 | xargs grep -c connection

(That says "List log files in order from most recent to least recent, take the five most recent, and count the lines in each file containing 'connection'.")

I got lucky. A spike was obviously in progress; the current log file showed a count two orders of magnitude higher than the other log files'. Since the problem was in progress, no post mortem was required; I just mailed the client and asked "Do your web logs show bot traffic right now?"
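(While waiting to hear back, you can also check from the database side; pg_stat_activity shows where current connections are coming from. A minimal version of that check:)

SELECT client_addr, count(*) AS connections
  FROM pg_stat_activity
 GROUP BY client_addr
 ORDER BY count(*) DESC;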

The client mailed right back saying, "Yes, we see a bunch of connections from Baidu's web crawler; it's going nuts hitting our site."

Baidu is a Chinese search engine. It turns out its crawling bot is well-known for being aggressive. As that link explains, the bot doesn't respect Crawl-delay in robots.txt (that's annoying but unsurprising, since Crawl-delay is a non-standard extension).

So if this happens to you, you have these options:

  1. Register at Baidu's site (Chinese-only) to request less-aggressive crawling.
  2. Try to tune PostgreSQL connection pooling to handle the onslaught.
  3. Block Baidu entirely. This is an extreme option — not advised if you do business in China, or if you have Chinese-speaking customers elsewhere who are reaching you through Baidu. The latter requires analytics to figure out.

Friday, May 8, 2015

syslog Considered Dangerous

Edit 2015-10-01: there's a similar issue on RDS, where logs are written to the same partition as the data, which can cause trouble if you don't have enough provisioned IOPS (or disk space).

A client of ours was suffering mysterious, intermittent lock pileups on a large table. To diagnose the problem, I ran our lock-logging scripts.
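(Those scripts do more than I can show here, but a rough ad-hoc version of the kind of check involved, i.e. which backends are waiting on a relation lock and who currently holds a conflicting lock on that relation, might look like this; it's simplified and covers only relation-level locks:)

SELECT waiting.pid   AS waiting_pid,
       waiting.query AS waiting_query,
       holder.pid    AS holding_pid,
       holder.query  AS holding_query
  FROM pg_locks blocked
  JOIN pg_stat_activity waiting ON waiting.pid = blocked.pid
  JOIN pg_locks granted ON granted.relation = blocked.relation
                       AND granted.granted
                       AND granted.pid <> blocked.pid
  JOIN pg_stat_activity holder ON holder.pid = granted.pid
 WHERE NOT blocked.granted
   AND blocked.relation IS NOT NULL;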

Querying the resulting log_transaction_locks table, I saw lots of extend locks and RowExclusiveLocks piling up behind normal UPDATEs (and INSERTs); there were even extend locks waiting on other extend locks. This is pretty unusual; you'd expect that when one UPDATE completed, the extend lock would succeed, PostgreSQL would quickly extend the relation (in this case, the big table's TOAST relation), and queries would keep running smoothly. These were all single-row writes; they shouldn't have taken that long. However, a complicating factor was that a couple of columns in that big table were giant TEXT columns storing JSON strings. Was that the issue? The problem was unclear, and we needed more info.

At this point my colleague Christophe Pettus made the excellent suggestion to do 'strace -p' on one of the locked processes. The dump revealed lots of sendto() calls with giant strings, which he immediately recognized as syslog messages based on their formatting.

Now, this was a problem. syslog writes to /var, and /var was hanging off the root partition, which was an EBS volume (without provisioned IOPS). So basically the entire SQL of each query was being written across the network, including those giant JSON strings, and the query itself wouldn't terminate till the write was fsync()'ed. In the meantime, extend locks and other UPDATEs piled up behind it. The PostgreSQL data directory was on a big, fast software RAID across SSD instance storage, but performance was being bottlenecked by that single EBS volume.

The client elected to turn off log_lock_waits, and the problem went away. In addition, Christophe suggested prepending a minus sign to the filename in /etc/rsyslog.d/postgres.conf, like so, which tells rsyslog not to sync the file after each log message:

local0.* -/var/log/postgres.log

I thought that was neat: a potential fix to a huge performance problem, in one character(!)

Lessons learned:

1. If your log_destination contains syslog, be careful about filesystem layout. (If you stick to csvlog, this normally isn't a problem, because csvlogs, by default, go in pg_log/ under the PostgreSQL data dir.)

2. Logging info about a performance problem can sometimes make it worse. In this case, we decided to diagnose the pre-existing lock pileups by setting log_lock_waits = on and deadlock_timeout = 200. But, because of the syslog issue, those very settings exacerbated the lock problems. Turning them on was necessary to catch the problem, but it's something to keep in mind if you decide to use our lock-logging scripts. (A quick way to check how these settings are configured is shown below.)
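For reference, here's one way to see how those settings (and where your logs actually go) are configured on a given server, using the standard pg_settings view:

SELECT name, setting
  FROM pg_settings
 WHERE name IN ('log_destination', 'logging_collector', 'log_directory',
                'log_lock_waits', 'deadlock_timeout');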

Wednesday, September 17, 2014

RDS for Postgres: List of Supported Extensions

Today I learned that Amazon doesn't publish any list of the extensions supported in RDS PostgreSQL. Instead, their documentation tells you to start a psql session and run 'SHOW rds.extensions'. But that creates a chicken-and-egg situation if you have an app that needs extensions, and you're trying to decide whether to migrate.

So here's a list of extensions supported as of today, 2014-09-17 (RDS PostgreSQL 9.3.3). I'll try to keep this current.

btree_gin
btree_gist
chkpass
citext
cube
dblink
dict_int
dict_xsyn
earthdistance
fuzzystrmatch
hstore
intagg
intarray
isn
ltree
pgcrypto
pgrowlocks
pg_trgm
plperl
plpgsql
pltcl
postgis
postgis_tiger_geocoder
postgis_topology
sslinfo
tablefunc
tsearch2
unaccent
uuid-ossp