
   Q. I can't upgrade any packages, run up2date or rpm. What is wrong?

Let me state this up front, and at the top. Just like the rpm -ba to rpmbuild -ba conversion, the fight is over: Upgrade!! All known and reproducible issues needing the discussion matter below are solved in the latest production releases of RPM.

There is a man page for rpm2cpio; there are all sorts of problems that rpm plainly and simply cannot do anything about. You want to run buggy software? Not rpm's problem.


A. The RPM database sometimes gets stuck. One cause of this problem is when a process working on the RPM database is killed, which leaves (thereafter inaccurate) lock state information behind, limiting what further processes can do to the database. So, a word to the wise: make sure such a process really is stuck before killing it - if a process is using a lot of CPU time (e.g. as reported by top) it is probably still doing useful work.

(A tip o' the hat to: M A Young <m.a.young durham.ac.uk> for parts of this summarization. revised 17 Jan 2004, after a helpful mailing list thread, pointing out a prior loose use of nomenclature by RPH. All remaining errors are mine alone.)

This lock state information has to be repaired manually to eliminate what are described as 'hangs', and to permit rpm to function properly. With the recent capability for several programs to be using, and writing into, the RPM database concurrently, RPM alone can no longer safely 'assume' it is safe to clear out arbitrary lock state information which it or its parent process did not create. Clearing it therefore relies on the external judgment of the system administrator, who must make the determination to reset and clear lock state.

What the nature of the underlying problem is, and how to fix it, are detailed below:


First informally: When a process _is_ so killed [sometimes due to a power-loss crash, sometimes due to a third party helper application dying without cleaning up lock state information in files at exit, sometimes due to an impatient admin], it may also leave some small corruption behind in the RPM 'SleepyCat' db based database. Usually the simple

    rm -f /var/lib/rpm/__db*

will remove the files which hold lock state information (these files start with that distinctive "__"). As these lock state holding files are automatically re-created if missing, this is a safe approach.
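For those who prefer to see what is being removed, the same cleanup can be wrapped in a small shell function. This is only a sketch: the name clear_rpm_locks and its optional directory argument are made up for illustration and are not part of rpm, and it assumes that no other process is using the database while it runs.

```shell
# Hypothetical helper (not part of rpm): remove stale lock-state files
# (__db*) from an RPM database directory. Defaults to /var/lib/rpm.
clear_rpm_locks() {
    dbdir="${1:-/var/lib/rpm}"
    if [ ! -d "$dbdir" ]; then
        echo "no such directory: $dbdir" >&2
        return 1
    fi
    for f in "$dbdir"/__db*; do
        [ -e "$f" ] || continue    # the glob matched nothing
        echo "removing $f"         # show the admin what is happening
        rm -f "$f"
    done
}
```

Run as root, with no argument, it acts on /var/lib/rpm; pass another directory to try it out on a copy first.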

It would be safest to clear these locks while still in single user mode; as a system is booting up, it is in single user mode before init can fire off any 'child' daemon processes. Checking, we can find:

    bash-2.05b$ nl /etc/rc.d/rc.sysinit | grep rpm
       556  rm -f /var/lib/rpm/__db*
    bash-2.05b$

That is, in the boot-scripts, recent Red Hat releases do exactly this (here at line 556 of /etc/rc.d/rc.sysinit). The safest approach which will do this for you, automatically, is to simply verify that a single user mode script indeed will do the cleanup at boot time when no other processes are running, and just reboot. [It is all right; that uptime record you were going for is less important than a clean RPM database -- try the reboot method.]
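The check above can be made repeatable with a short function. It is a sketch under assumptions: the function name is invented here, the default path is the Red Hat one quoted above (other distributions keep their boot scripts elsewhere), and the pattern only looks for an rm of the __db files on a line that is not commented out.

```shell
# Hypothetical helper: report whether a boot script removes the __db* files.
# Prints matching lines with their file line numbers (grep -n).
# Returns 0 if a match is found, 1 if not, 2 if the script is unreadable.
boot_script_clears_locks() {
    script="${1:-/etc/rc.d/rc.sysinit}"
    [ -r "$script" ] || { echo "cannot read $script" >&2; return 2; }
    grep -n '^[^#]*rm .*/var/lib/rpm/__db' "$script"
}
```

Note that grep -n counts every line of the file, whereas nl skips blank lines by default, so the number reported may differ from the 556 shown above.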

But sometimes there are still problems. If the simple removal of the '__*' files does not clear up the problem, it is also possible that other damage, such as corrupted linked lists, is separately present. Proceed carefully -- take a backup FIRST, before trying to have rpm repair the database.

It is also _usually_ safe to rebuild these lists, thus:

    rpm -vv --rebuilddb

[Note: we have added the option -vv here, so that there is visible progress and lots of detail during the rebuild process -- a worried admin is often impatient at a lack of visible progress -- this option causes the rebuild process to be quite verbose.]

The rpm --rebuilddb command works by creating a temporary directory under /var/lib/, next to the 'parent' one at /var/lib/rpm/, and attempting the rebuild there. At the very end of a normal rebuild, it puts the new content in place over the top of the old files.

Obviously, if some other process has independently filled the partition containing /var/lib/, and is hung but ready to run, waiting for an inode to come free, a 'race' exists which the rebuild process may lose. At that point, if the 'other' process 'wins' and grabs the 'last' newly freed available resource, the rebuild process cannot complete successfully and RPM database content may be lost. Ouch.

The lesson of this is to always take backups of /var/lib/rpm/ before doing a --rebuilddb [actually, to take periodic backups generally is of course always the better practice] and to take care not to rebuild where ENOSPC can occur. A quick and dirty way to take a temporary on-disk backup is:

    cd /var/lib
    mkdir rpm-backup
    rsync -av ./rpm/. ./rpm-backup/.

Note: there is a tar based approach below as well. Then run:

    cd /var/lib
    du | sort -n
    df

to 'take stock', and then stop and evaluate if a risk exists. Inspect the process table as well for a wayward process you may have forgotten about which uses or accesses the RPM database.
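The backup and the 'take stock' step can be sketched as two shell functions. Everything here is illustrative: the function names are invented, cp -Rp stands in for rsync so that nothing extra needs to be installed, and the sizing rule (at least as much free space as the database now occupies) is a rule of thumb assumed for this sketch, not a figure documented by rpm. It looks at free space only, not free inodes.

```shell
# Hypothetical helpers; the names and the sizing rule are assumptions.

# Copy the database directory aside before a rebuild.
# cp -Rp preserves modes and timestamps; it refuses to reuse an existing target.
rpmdb_backup() {
    src="${1:-/var/lib/rpm}"
    dest="${2:-/var/lib/rpm-backup}"
    [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
    [ -e "$dest" ] && { echo "refusing to overwrite $dest" >&2; return 1; }
    cp -Rp "$src" "$dest"
}

# Succeed only if the filesystem holding the database has at least as many
# free kilobytes as the database currently occupies.
rpmdb_has_room() {
    src="${1:-/var/lib/rpm}"
    used_kb=$(du -sk "$src" | awk '{print $1}')
    free_kb=$(df -Pk "$src" | awk 'NR==2 {print $4}')   # "Available" column
    echo "database: ${used_kb} KB, free: ${free_kb} KB"
    [ "$free_kb" -ge "$used_kb" ]
}
```

A cautious admin might run something like: rpmdb_backup && rpmdb_has_room && rpm -vv --rebuilddb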

During and after some rebuilds (as on Red Hat Linux 9), an unsightly but harmless warning message is produced, which may safely be ignored. The message looks like this:
[root@dhcp108 rpm]# rpm --rebuilddb
error: db4 error(16) from dbenv->remove: Device or resource busy
[root@dhcp108 rpm]#
This is scary (because the message uses the word 'error', looks like a 'hung' NFS mount error message, and does not describe itself as just a 'warning'). It breaks the *nix expectation that well-behaved processes run silently when they are free of errors. Ahhh, well ...


But sometimes, unfortunately, even more substantive database repair is needed, as described below the section with Jeff Johnson's email later in this discussion.

Think of a two way linked list (we ignore indices to simplify the presentation) for RPM Package description items thus:

    -> -> -> ->
    A B C D E
    <- <- <- <-

If we have just a couple of broken links, we do not have any 'orphan chains' and rpm --rebuilddb can repair things:

    -> -> ->
    A B C D E
    <- <- <-

But if substantial persistent corruption occurs, and is not repaired for a long time, we might get:

    -> ->
    A B C D E
    <- <-

And it becomes unclear where to re-attach A and B to the other chain. (Note: This is an analogy, and does not precisely describe the data structures, rebuild or dump processes from a formal standpoint.)



What else might cause the problem? Warren Togami has reviewed the trouble ticket trackers for the Conectiva apt for rpm package, and others (the Ximian updater, Gerald Teschl's autoupdate), and helped locate some interaction issues which are being worked on and discussed on the RH rpm-list.

There is also a reported, but not reproducible, issue involving corrupted signing keys; if you have such a report, and can write a test case to reliably, or even frequently, cause this corruption, please file a Bugzilla and attach the test case script.

More formally: see the Bugzilla master ticket on this issue, 73097

Separately, a testing release (issued 7 Oct 2002) for an updated RPM is available at Jeff Johnson's personal ftp site ftp://people.redhat.com/jbj/test-4.1/ -- Warning: This is NOT a formal Red Hat QA approved release.

The text marked with the strikethrough above was superseded in June 2003 -- Please view: /hintskinks/repairdb-2003-06/ which summarizes the presently proper RPM version - glibc - kernel combinations as of that time.



Date: Thu, 8 Aug 2002 09:10:12 -0400
From: Jeff Johnson <jbj@redhat.com>
Subject: Re: rpm database - how to repair it?

On Thu, Aug 08, 2002 at 01:37:52PM +0200, Robert Vojta wrote:
<snip>
> What is the safest way how to repair rpm database
> (if it's corrupted)? I was doing this job a long time
> ago and I forgot the whole process :(

Hmmm, "hangs" in select are usually stale locks. Fix by doing

    cd /var/lib/rpm
    rm -f /var/lib/rpm/__db*

Otherwise,

All that needs repairing is /var/lib/rpm/Packages, the indices can/will be rebuilt with rpm --rebuilddb later.

Save a copy just in case:

    cd /var/lib
    tar czvf /tmp/rpmdb.tar.gz rpm

Verify integrity with

    cd /var/lib/rpm
    db_verify Packages

[Note: if using rpm-4.2 as in Red Hat Linux 9 and later, use:

    cd /var/lib/rpm
    /usr/lib/rpm/rpmdb_verify Packages

instead, as the db4 version code used differs. Thanks again to M A Young for suggesting this caveat addition in June 2003.]

If there are any errors, repair by doing

    mv Packages Packages-ORIG
    /usr/lib/rpm/db_dump Packages-ORIG | \
        /usr/lib/rpm/db_load Packages

[Pending: This may differ in rpm-4.2 as in Red Hat Linux 9 and later -- suggestions?]

Read all the headers in Packages by doing

    rpm -qa

If you segfault here, make an entry at http://bugzilla.redhat.com and I'll tell you what to do.

Rebuild the indices

    rpm --rebuilddb

------------------------------------
HTH

73 de Jeff

--
Jeff Johnson ARS N3NPQ
jbj@redhat.com (jbj@jbj.org)
Chapel Hill, NC
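
[Editor's sketch: the steps in the message above can be strung together roughly as follows. This is not Jeff Johnson's script. The function names, the DRY_RUN switch and the DB_VERIFY / DB_DUMP / DB_LOAD variables are assumptions made for illustration; the tool paths are simply the ones quoted in the message, and on rpm-4.2 systems the verify tool at least is named differently, as noted above. By default nothing is executed; the commands are only printed.]

```shell
# Hypothetical wrapper around the steps from the message above.
# With DRY_RUN=1 (the default) it only prints what it would do.
DRY_RUN="${DRY_RUN:-1}"
DB_VERIFY="${DB_VERIFY:-db_verify}"
DB_DUMP="${DB_DUMP:-/usr/lib/rpm/db_dump}"
DB_LOAD="${DB_LOAD:-/usr/lib/rpm/db_load}"

# Print the command in a dry run, otherwise execute it through sh.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "+ $*"
    else
        sh -c "$*"
    fi
}

repair_rpmdb() {
    dbdir="${1:-/var/lib/rpm}"
    parent=$(dirname "$dbdir")
    name=$(basename "$dbdir")

    run "rm -f $dbdir/__db*" || return 1                          # stale locks
    run "cd $parent && tar czvf /tmp/rpmdb.tar.gz $name" || return 1   # save a copy

    verify_status=0
    run "cd $dbdir && $DB_VERIFY Packages" || verify_status=1
    # Dump and reload only if verification failed (always shown in a dry run).
    if [ "$DRY_RUN" = 1 ] || [ "$verify_status" -ne 0 ]; then
        run "cd $dbdir && mv Packages Packages-ORIG" || return 1
        run "cd $dbdir && $DB_DUMP Packages-ORIG | $DB_LOAD Packages" || return 1
    fi

    run "rpm --dbpath $dbdir -qa >/dev/null" || return 1          # read all headers
    run "rpm --dbpath $dbdir --rebuilddb"                         # rebuild indices
}
```

[Calling repair_rpmdb with no argument prints the sequence for /var/lib/rpm. Read it over, and only then consider DRY_RUN=0, as root, with a backup in hand.]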


Later:
On Thu, Aug 08, 2002 at 08:08:13AM -0400, Gene C. wrote:
>
> 2. There are circumstances that screw up the rpm "database" which
> are not recoverable ... you will need to re-install.

No, there aren't any cases the database is not recoverable. There are still times that a re-install is an easier recovery pathway, however.



Editor's Note added September 18 2002:

In moving to RPM-4.1, the above repair and test process needs an update of the db3-3.x packages from the RPM ftp site, if an error of the following sort shown in bold red is encountered:

[root@router root]# rpm -Uvh popt-1.7-1.07.7x.i386.rpm \
    rpm-4.1-1.07.7x.i386.rpm rpm-build-4.1-1.07.7x.i386.rpm \
    rpm-python-4.1-1.07.7x.i386.rpm
Preparing ...    ########################################### [100%]
   1:popt        ########################################### [ 25%]
   2:rpm         ########################################### [ 50%]
   3:rpm-build   ########################################### [ 75%]
   4:rpm-python  ########################################### [100%]
# > then try the 'test tip' at the RPM website:
# >
# >     /hintskinks/repairdb/
# >
[root@router root]# rpm --rebuilddb
[root@router root]# cd /var/lib ; tar czvf /tmp/rpmdb.tar.gz <snip>
[root@router lib]# cd /var/lib/rpm
[root@router rpm]# db_verify Packages
db_verify: Program version 3.2.9 doesn't match environment version 4.0.14
[root@router rpm]#
These earlier 'transition' packages are available here.
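
A script can spot that version mismatch by scanning the db_verify output for the message shown above. This is a sketch with an invented function name; it assumes the message wording is exactly as in the session above and makes no claim about other Berkeley DB releases.

```shell
# Hypothetical helper: read db_verify output on stdin and, if it contains
# the version-mismatch message, print "program=X environment=Y".
# Prints nothing when the message is absent.
db_version_mismatch() {
    sed -n "s/.*Program version \([0-9.]*\) doesn't match environment version \([0-9.]*\).*/program=\1 environment=\2/p"
}
```

For example: cd /var/lib/rpm ; db_verify Packages 2>&1 | db_version_mismatch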


Last updated: Sat, 16 Dec 2006 14:26:28 -0500

 

Maintained by Owl River Company -- Comments to: rpm editor, please.