npm Blog (Archive)

The npm blog has been discontinued.

Updates from the npm team are now published on the GitHub Blog and the GitHub Changelog.

Publishing problem 2014-02-18

Today, for about 35 minutes from 4.14pm to 4.50pm, anyone attempting to publish an updated version of an existing package had a 1 in 3 chance of seeing an error something like this:

This was caused by a data inconsistency between our master CouchDB server and one of the read-only replicas. The root cause was a known bug in Couch replication (which has bitten us before). Per our previous blog post, we had set up alerting on replication status, so we were already addressing the issue when the first user reported the problem.

We resolved the user-facing errors by taking the affected read-only replica out of rotation. About 15 minutes later we permanently resolved the issue by replacing the version of Pound used on our write master, which allowed replication to resume. We then returned the read-only replica to rotation.

We believe only a handful of users saw this error in production, but we can still do better. We are adding replication status to the health checks used by our CDN, Fastly, which means that if a read replica falls behind in future, it will be automatically removed from rotation after a few minutes. However, replicas falling out of rotation should be pretty rare now that we have fixed the issue with Pound.
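
As a rough illustration of what a replication-aware health check could look like, here is a minimal sketch. It is not our production code. It assumes the check can fetch the database info document (the JSON returned by `GET /{db}`) from both the master and the replica, and that `update_seq` is an integer, as it is in CouchDB 1.x. The names `check_replica` and `http_status` and the `max_lag` threshold are made up for this example. The idea is that the endpoint answers with a non-200 status when the replica is too far behind, so a health checker that expects a 200 would drop it from rotation.

```python
from dataclasses import dataclass


@dataclass
class ReplicaHealth:
    healthy: bool
    lag: int  # number of update sequences the replica is behind the master


def check_replica(master_info: dict, replica_info: dict, max_lag: int = 100) -> ReplicaHealth:
    """Compare the update_seq of the master and a replica.

    Both arguments are database info documents (assumed to carry an
    integer-like `update_seq`). `max_lag` is a hypothetical threshold.
    """
    lag = int(master_info["update_seq"]) - int(replica_info["update_seq"])
    # A replica momentarily reporting a higher sequence is not "behind".
    lag = max(lag, 0)
    return ReplicaHealth(healthy=lag <= max_lag, lag=lag)


def http_status(health: ReplicaHealth) -> int:
    """Map the result to the status code a health check endpoint would return."""
    return 200 if health.healthy else 503


if __name__ == "__main__":
    master = {"db_name": "registry", "update_seq": 1000}
    stale_replica = {"db_name": "registry", "update_seq": 500}
    result = check_replica(master, stale_replica)
    print(result, http_status(result))
```

A threshold on lag, rather than requiring an exact match, avoids pulling a replica out of rotation just because it is a few writes behind during normal replication.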