Q. I can't upgrade any packages, run up2date or rpm.
What is wrong?
Let me state up front, and at the top. Just like the rpm -ba to
rpmbuild -ba conversion, the fight is over: Upgrade!! All known
and reproducible issues needing the discussion below are solved in
the latest production releases of RPM.
There is a man page for rpm2cpio; there are all sorts of
problems that rpm plainly and simply cannot do
anything about. You want to run buggy software? Not rpm's
problem.
A. The RPM database sometimes gets stuck. One cause of this problem is
when a process working on the RPM database is killed, which leaves
(thereafter inaccurate) lock state information behind, limiting what
further processes can do to the database. So, a word to the wise:
make sure such a process really is stuck before killing it -
if a process is using much CPU time (e.g. as reported by top), it is
probably still doing useful work.
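A minimal sketch of that check, assuming a Linux system with the procps version of ps. The helper name show_db_procs is hypothetical. Note that the CPU figure ps reports is an average over the life of the process, so top remains the better guide to what a process is doing right now.

```shell
#!/bin/sh
# Hypothetical helper: list PID, CPU share, elapsed time and command line for
# every process whose command name matches the argument (default: rpm).
show_db_procs() {
    name="${1:-rpm}"
    # -C selects by command name (procps ps); it exits non-zero when
    # nothing matches, in which case we say so instead of printing nothing.
    ps -C "$name" -o pid=,pcpu=,etime=,args= 2>/dev/null \
        || echo "no $name process found"
}
```

Running show_db_procs rpm and then show_db_procs up2date before reaching for kill gives a quick view of whether anything is still working on the database.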
(A tip o' the hat to: M A Young <m.a.young durham.ac.uk> for
parts of this summarization. revised 17 Jan 2004, after a helpful
mailing list thread, pointing out a prior loose use of nomenclature by RPH.
All remaining errors are mine alone.)
This lock state information has to be repaired manually for rpm
to eliminate what are described as 'hangs', and to permit rpm to
function properly. Now that several programs can concurrently
use, and write into, the RPM database, RPM alone can
no longer 'assume' it is safe to clear out arbitrary lock state
information which it or its parent process did not create. It therefore
relies on the external judgment of the system administrator to make the
determination to reset and clear lock state.
What the nature of the underlying problem is, and how to fix it,
are detailed below:
First informally: When a process _is_ so killed
[sometimes due to a crash after a loss of power,
sometimes due to a third party helper application dying without cleaning
up lock state information in files at exit,
sometimes due to an impatient admin],
it may also leave some small
corruption behind in the RPM 'SleepyCat' db based database.
Usually the simple
rm -f /var/lib/rpm/__db*
will remove the files which hold lock state information (these files start
with that distinctive "__"). As these lock state holding files are
automatically re-created if missing, this is a safe approach.
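A cautious variant of that removal might look like the sketch below. It assumes fuser (from the psmisc package) is installed; the helper name clear_rpm_locks is hypothetical. If fuser is missing, the in-use check is silently skipped and the function behaves like the plain rm -f above.

```shell
#!/bin/sh
# Hypothetical helper: remove the __db* lock-state files from an RPM database
# directory, but refuse if a running process still has a file there open.
clear_rpm_locks() {
    dbdir="${1:-/var/lib/rpm}"
    # fuser -s is silent and exits 0 when at least one named file is in use.
    if command -v fuser >/dev/null 2>&1 && fuser -s "$dbdir"/* 2>/dev/null; then
        echo "files in $dbdir are still in use; not removing locks" >&2
        return 1
    fi
    # Only the lock-state files are removed; Packages and the indices stay.
    rm -f "$dbdir"/__db*
}
```

Called with no argument it acts on /var/lib/rpm; run it as root, since other users' processes are not visible to fuser otherwise.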
It would be safest to clear these locks while still in single user mode; as
a system is booting up, it is in single user mode before init can
fire off any 'child' daemon processes. Checking, we can find:
bash-2.05b$ nl /etc/rc.d/rc.sysinit | grep rpm
556 rm -f /var/lib/rpm/__db*
bash-2.05b$
That is, in the boot-scripts, recent Red Hat releases do exactly this (here
at line 556 of /etc/rc.d/rc.sysinit). The safest approach which
will do this for you, automatically, is to simply verify that a single user
mode script indeed will do the cleanup at boot time when no other processes
are running, and just reboot. [It is all right; that uptime record you were
going for is less important than a clean RPM database -- try the reboot
method.]
But sometimes there are still problems. If the simple removal of the '__*'
files does not clear up the problem, it is also possible that other damage,
such as corrupted linked lists, is separately present. Proceed
carefully -- take a backup FIRST, before trying
to have rpm repair the database.
It is also _usually_ safe to rebuild these lists, thus:
rpm -vv --rebuilddb
[Note: we have added the option -vv here, so that there is visible
progress and lots of detail during the rebuild process -- A worried admin is
often impatient at a lack of visible progress -- this option causes the
rebuild process to be quite verbose.]
The way a rpm --rebuilddb command works, it creates a temporary
directory to work in under /var/lib/, next to the 'parent' one at
/var/lib/rpm/, and attempts a rebuild. As it gets to the very
bottom of a normal rebuild, it writes the new content over the top of the old
files.
Obviously, if some other process has independently filled the partition
containing /var/lib/, and is hung but ready to run, waiting for
an inode to come free, a 'race' exists which the rebuild process may lose.
At that point, if the 'other' process 'wins' and grabs the 'last' newly
freed available resource, the rebuild process cannot complete successfully
and RPM database content may be lost. Ouch.
The lesson of this is to always take backups of /var/lib/rpm/ before doing
a --rebuilddb [actually, to take periodic backups generally is of
course always the better practice] and to take care not to rebuild where
ENOSPC can occur. A quick and dirty way to take a temporary on-disk backup
is:
cd /var/lib
mkdir rpm-backup
rsync -av ./rpm/. ./rpm-backup/.
Note: there is a tar based approach below as well. Then run:
cd /var/lib
du | sort -n
df
to 'take stock' and then stop, and evaluate if a risk exists. Inspect the
process table as well for a wayward process you may have forgotten about which
uses or accesses the RPM database.
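The 'take stock' step can be turned into a rough pass/fail check, sketched below. The helper name rebuild_has_room and the factor-of-two margin are assumptions made for illustration, not figures taken from rpm. The sketch looks at disk blocks only; free inodes still need a separate look (GNU df accepts -i for that).

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the filesystem holding the parent of
# DBDIR has at least twice the space that DBDIR currently occupies.
rebuild_has_room() {
    dbdir="${1:-/var/lib/rpm}"
    parent=$(dirname "$dbdir")
    # du -sk: total size of the database directory in kilobytes.
    used_kb=$(du -sk "$dbdir" | awk '{print $1}')
    # df -Pk: POSIX one-line-per-filesystem output; column 4 is free kilobytes.
    avail_kb=$(df -Pk "$parent" | awk 'NR==2 {print $4}')
    echo "database uses ${used_kb} KB; ${avail_kb} KB free under $parent"
    [ "$avail_kb" -ge $((used_kb * 2)) ]
}
```

For example, rebuild_has_room && rpm -vv --rebuilddb only starts the rebuild when the check passes. It does not protect against another process filling the partition while the rebuild is running.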
During and after some rebuilds (as on Red Hat
Linux 9), an unsightly but harmless warning message is produced,
which may safely be ignored. The message looks like this:
[root@dhcp108 rpm]# rpm --rebuilddb
error: db4 error(16) from dbenv->remove: Device or resource busy
[root@dhcp108 rpm]#
This is scary (because the message uses the word 'error',
looks like a 'hung' NFS mount error message, and did not describe itself
as just a 'warning'). It breaks the *nix expectation that well-behaved
processes run silently when they are free of errors. Ahhh, well ...
But sometimes, unfortunately, even more substantive database repair is
needed, as described in Jeff Johnson's email
later in this discussion.
Think of a two way linked list (we ignore indices to simplify the
presentation) for RPM Package description items thus:
-> -> -> ->
A B C D E
<- <- <- <-
If we have just a couple of broken links, we do not have any 'orphan
chains' and rpm --rebuilddb can repair things:
-> -> ->
A B C D E
<- <- <-
But if substantial persistent corruption occurs, and is not repaired for a
long time, we might get:
-> ->
A B C D E
<- <-
And it becomes unclear where to re-attach A and B, to the other chain.
(Note: This is an analogy, and does not precisely describe the data
structures, rebuild or dump processes from a formal standpoint.)
What else might cause the problem? Warren Togami has reviewed the
trouble ticket trackers for the Conectiva apt for rpm package, and
others (the Ximian updater, Gerald Teschl's autoupdate), and helped
locate some interaction issues which are being worked on and
discussed on the RH rpm-list.
There is also a reported, but not reproducible, issue involving
corrupted signing Keys; if you have such a report, and can write a test case
to reliably, or even frequently, cause this corruption, please file a
Bugzilla and attach the test case script.
More formally: see the Bugzilla master ticket on this issue,
73097
Separately, a testing release (issued 7 Oct 2002) for an
updated RPM is available at Jeff Johnson's personal ftp site
ftp://people.redhat.com/jbj/test-4.1/
-- Warning: This is NOT a formal Red Hat QA approved
release.
The text marked with the strikethrough above was superseded
in June 2003 -- Please view:
/hintskinks/repairdb-2003-06/ which
summarizes the presently proper RPM version - glibc - kernel
combinations as of that time.
Date: Thu, 8 Aug 2002 09:10:12 -0400
From: Jeff Johnson <jbj@redhat.com>
Subject: Re: rpm database - how to repair it?
On Thu, Aug 08, 2002 at 01:37:52PM +0200, Robert Vojta wrote:
<snip>
> What is the safest way how to repair rpm database
> (if it's corrupted)? I was doing this job a long time
> ago and I forgot the whole process :(
Hmmm, "hangs" in select are usually stale locks. Fix by doing
cd /var/lib/rpm
rm -f /var/lib/rpm/__db*
Otherwise,
All that needs repairing is /var/lib/rpm/Packages, the indices can/will be
rebuilt with rpm --rebuilddb later.
Save a copy just in case:
cd /var/lib
tar czvf /tmp/rpmdb.tar.gz rpm
Verify integrity with
cd /var/lib/rpm
db_verify Packages
[Note: if using rpm-4.2 as in Red Hat Linux 9 and later, use:
cd /var/lib/rpm
/usr/lib/rpm/rpmdb_verify Packages
instead, as the db4 version code used differs. Thanks again to
M A Young for suggesting this caveat addition in June 2003.]
If there are any errors, repair by doing
mv Packages Packages-ORIG
/usr/lib/rpm/db_dump Packages-ORIG | \
/usr/lib/rpm/db_load Packages
[Pending: This may differ in rpm-4.2 as in Red Hat Linux 9 and later --
suggestions?]
Read all the headers in Packages by doing
rpm -qa
If you segfault here, make an entry at
http://bugzilla.redhat.com and
I'll tell you what to do.
Rebuild the indices
rpm --rebuilddb
------------------------------------
HTH
73 de Jeff
--
Jeff Johnson ARS N3NPQ
jbj@redhat.com (jbj@jbj.org)
Chapel Hill, NC
Later:
On Thu, Aug 08, 2002 at 08:08:13AM -0400, Gene C. wrote:
>
> 2. There are circumstances that screw up the rpm "database" which
> are not recoverable ... you will need to re-install.
No, there aren't any cases the database is not recoverable. There are still
times that a re-install is an easier recovery pathway, however.
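The verify-then-repair part of Jeff Johnson's message above can be sketched as a small shell function. The function name repair_packages and the DB_VERIFY, DB_DUMP and DB_LOAD variables are hypothetical; they exist only so the tool paths can be changed, since (as noted in the message) the right tools depend on the rpm release. Take the tar backup first; this sketch does not do it for you.

```shell
#!/bin/sh
# Tool locations are assumptions and can be overridden from the environment.
# With rpm-4.2, for instance, DB_VERIFY would be /usr/lib/rpm/rpmdb_verify.
: "${DB_VERIFY:=db_verify}"
: "${DB_DUMP:=/usr/lib/rpm/db_dump}"
: "${DB_LOAD:=/usr/lib/rpm/db_load}"

# Verify Packages in DBDIR; dump and reload it only when verification fails.
# The body runs in a subshell so the caller's working directory is unchanged.
repair_packages() (
    dbdir="${1:-/var/lib/rpm}"
    cd "$dbdir" || exit 1
    if "$DB_VERIFY" Packages; then
        echo "Packages verified cleanly; nothing to repair"
        exit 0
    fi
    # Keep the damaged file as Packages-ORIG, as in the message above.
    mv Packages Packages-ORIG || exit 1
    "$DB_DUMP" Packages-ORIG | "$DB_LOAD" Packages
)
```

After it returns successfully, the remaining steps from the message still apply: rpm -qa to read all the headers, then rpm --rebuilddb to rebuild the indices.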
Editor's Note added September 18 2002:
In moving to RPM-4.1, the above repair and test process needs an update
of the db3-3.x packages from the RPM ftp site, if an error of
the following sort (the db_verify version mismatch at the end of this
transcript) is encountered:
[root@router root]# rpm -Uvh popt-1.7-1.07.7x.i386.rpm \
rpm-4.1-1.07.7x.i386.rpm rpm-build-4.1-1.07.7x.i386.rpm \
rpm-python-4.1-1.07.7x.i386.rpm
Preparing ... ########################################### [100%]
1:popt ########################################### [ 25%]
2:rpm ########################################### [ 50%]
3:rpm-build ########################################### [ 75%]
4:rpm-python ########################################### [100%]
# > then try the 'test tip' at the RPM website:
# >
# > /hintskinks/repairdb/
# >
[root@router root]# rpm --rebuilddb
[root@router root]# cd /var/lib ; tar czvf /tmp/rpmdb.tar.gz
<snip>
[root@router lib]# cd /var/lib/rpm
[root@router rpm]# db_verify Packages
db_verify: Program version 3.2.9 doesn't match environment version 4.0.14
[root@router rpm]#
These earlier 'transition' packages are available here.
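To spot this mismatch from a script, the message can be picked out of the db_verify output. The sketch below assumes the wording shown in the transcript above; the helper name mismatch_versions is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read db_verify output on stdin and, if it contains the
# version-mismatch message, print "<program version> <environment version>".
# Prints nothing when the message is absent.
mismatch_versions() {
    sed -n "s/.*Program version \([0-9.]*\) doesn't match environment version \([0-9.]*\).*/\1 \2/p"
}
```

A typical use would be (cd /var/lib/rpm && db_verify Packages 2>&1) | mismatch_versions, where any output at all means the db tools and the database environment disagree.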