Best practice tips for Gentoo sysadmins

Currently, there are some critical ebuild dependency issues in Gentoo’s portage tree that might seriously hurt your box. Symptoms: When updating your system, portage might display an error message similar to this one:

[ebuild     U ] sys-fs/e2fsprogs-1.41.2 [1.40.9] USE="nls (-static%)" 4,263 kB
[ebuild  N    ] sys-libs/e2fsprogs-libs-1.41.2  USE="nls" 479 kB
[blocks B     ] sys-libs/ss (is blocking sys-libs/e2fsprogs-libs-1.41.2)
[blocks B     ] <sys-fs/e2fsprogs-1.41 (is blocking sys-libs/e2fsprogs-libs-1.41.2)
[blocks B     ] sys-libs/com_err (is blocking sys-libs/e2fsprogs-libs-1.41.2)
[blocks B     ] sys-libs/e2fsprogs-libs (is blocking sys-libs/ss-1.40.9, sys-libs/com_err-1.40.9)

The important thing: DON’T unmerge ss or com_err, as it will break wget and other essential parts of your system! Portage thus won’t be able to download e2fsprogs-libs-1.41.2 which is required to replace the removed ss and com_err libraries (which are part of e2fsprogs-libs starting with v1.41.2).
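
One way to notice this situation before touching anything is to scan the pretend output for hard-blocker lines like the ones shown above. The following is only a minimal sketch: the function name list_blockers is made up for illustration, and the pattern assumes blocker lines start with "[blocks B" as in the output quoted in this post.

```shell
#!/bin/sh
# list_blockers (hypothetical helper): reads `emerge --pretend` output on
# stdin and prints only the hard-blocker lines ("[blocks B ...]").
# Exit status is 0 if at least one blocker line was found, 1 otherwise.
list_blockers() {
    grep -E '^\[blocks +B'
}

# Example (assumes a Gentoo box with emerge in PATH):
#   emerge -puD world | list_blockers && echo "Blockers found, do not unmerge anything yet"
```

If the function prints anything, stop and read up on the blockers first instead of unmerging the packages portage complains about.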

Solution: Either wait until this issue gets resolved by the Gentoo core dev team or read through the following posts and Gentoo bug reports:

If you really know what you are doing, you might want to try this suggested quickfix (Important disclaimer: It looks reasonable and fine as a quick workaround to me, but I haven’t tried it yet. You apply it at your own risk, as usual! Note that this workaround doesn’t solve the real problem.)

As serious issues like these are quite common in Gentoo, here are some best practice tips for Gentoo sysadmins that help prevent some of the potential problems:

  • First of all, try to use stable ebuilds only. If this is not possible for some reason, try to minimize the number of unstable ebuilds (~amd64 etc.) on your system.
  • It’s rather tempting, but DON’T set up a cronjob to do automatic emerges! Portage only catches the most evident issues, and emerging new ebuilds is never without risk (not updating your system is risky too, however). The best approach would be to test any updates on a test box first before installing them on a production system. The second best approach is probably doing a monitored, manual update in small, incremental steps with immediate testing afterwards. This helps isolate problems, should they occur (it’s difficult to isolate a problem that was detected after an automatic update of hundreds of ebuilds).
  • Automate 'emerge --sync' by putting it in your daily crontab in order to refresh your portage tree regularly. That’s neither particularly safe nor unsafe, but it guarantees that you don’t emerge that weeks-or-months-old broken ebuild that has been fixed in the meantime.
  • Regularly fetch new source packages by setting up a cronjob for 'emerge -uDN --fetchonly world' (or -f). This way, portage uses some additional hard disk space for the package sources (always make sure you have enough free space and properly set up partitions/volumes!). It makes sense though, as one day you’ll use most of these source packages anyway, and having a source package locally can be very helpful in a situation like the one described in this post. IOW: If you aren’t able to download anything anymore due to a severely broken system, chances are that you can still solve the problem on localhost if you have source packages at hand.
  • Append “buildpkg” to the FEATURES variable in /etc/make.conf. This way, portage will additionally create binary packages in /usr/portage/packages/All when emerging new ebuilds. This will again require some spare free space on your hard disk, but having a prebuilt binary package at hand can be very helpful if there are any problems with the gcc toolchain or any other compiler chain needed. If you don’t want to enable this feature permanently, you can use the -b or --buildpkg option when executing emerge.
  • If you haven’t used the “buildpkg” feature so far, you can create binary packages of all the installed ebuilds on your system using the “quickpkg” utility and my quickpkg_all bash script.
  • Keep old, compiled kernel images in /boot and listed in your /boot/grub/menu.lst. Booting a new manually configured and compiled kernel is always a bit of an adventure (unless it was tested on an identical box before), and it’s good to keep previous kernels that are known to work. Even if it doesn’t work perfectly, it can take you to a console login prompt at least.
  • Instead of doing things the regular “remove old packages first, then install new packages” way, get used to the Gentoo way of doing things: “install new packages first, then remove old packages (if at all)”. This avoids the serious problems that can occur when you accidentally uninstall an old, seemingly unused package that other important packages depend on and don’t work without.
  • When merging new configuration files, use dispatch-conf instead of etc-update. dispatch-conf can use RCS to keep revisions of old config files (which can be a helpful source of information in some situations). See the “use-rcs” and “archive-dir” settings in /etc/dispatch-conf.conf.
  • Add files and directories to CONFIG_PROTECT, if in doubt. It’s better to have one ._cfg0000_XXX file too many than an important configuration file accidentally overwritten by portage.
  • Use emerge’s -D option for improved (deep) dependency checking.
  • Run revdep-rebuild regularly to check for broken dependencies and to re-emerge the affected ebuilds.
  • And of course, create automated, incremental backups of your systems regularly. You’ll sleep better, believe me ;)
  • Monitor your systems for errors. I do it with some custom bash scripts I wrote, but there are many full-fledged monitoring solutions for general purpose health monitoring.
  • As a fallback for (some) DNS problems with DHCP-based systems, I regularly send a heartbeat of a DHCP system to a box in another network, revealing the DHCP system’s last known/assigned IP address.
  • Not limited to Gentoo sysadmins: Having a (hardware) remote console accessible via a different IP address is worth a lot in case there are serious troubles with the operating system or the hardware.
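
The sync and fetch-only tips above can be combined into one small script for the daily crontab. This is only a sketch under assumptions: the function name daily_refresh and the overridable EMERGE variable are my own illustration, not part of portage. It deliberately never installs or removes anything, in line with the advice against automatic emerges.

```shell
#!/bin/sh
# Hypothetical helper combining the sync and fetch-only tips.
# EMERGE may be overridden (e.g. for a dry run); it defaults to the real emerge.
EMERGE="${EMERGE:-emerge}"

daily_refresh() {
    # Refresh the portage tree so that later steps see current ebuilds.
    # If the sync fails, stop here instead of fetching from a stale tree.
    $EMERGE --sync || return 1
    # Download sources for pending world updates without building or installing.
    $EMERGE -uDN --fetchonly world
}
```

A script that ends by calling daily_refresh could then be run once a day from cron. For a dry run that only prints the commands, set EMERGE to "echo emerge" before calling the function.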

Feel free to add other helpful best practice tips for Gentoo sysadmins!

4 Replies to “Best practice tips for Gentoo sysadmins”

  1. Indeed, Yoan! Sometimes, I even used a Gentoo install/livecd as a rescue disc for other distros, as it is very flexible (no automated installers, has nice kernels, modules, tools on the CD, CDs for various architectures).

    For servers where one doesn’t have direct access, one might consider setting up a virtual CDROM drive, PXE or similar using a hardware remote console box/card.

  2. quote: “having a remote console accessible via a different IP address” Yeah, I wish I had one. I have ten 1U (1HE) servers, but every one of them has a stupid RAID card in it, so there is no space for such a card. Every time I upgrade the kernel, I can’t sleep for the next 10 days ^^

    Greetings from Beijing!
    xuedi

  3. @xuedi: That must be a nightmare indeed! I’ve been in that situation too. Wishing you good luck with the kernel updates (maybe set up Xen or another VM to test the updated kernel in a DomU first? It doesn’t detect all problems, particularly not those with a RAID, but it may help raise confidence in some cases)! And have fun, nonetheless, in Beijing!
