removing obsolete packages from a local Debian repository

2012-02-21

background info

Bandwidth in South Africa is neither readily available nor cheap, so whenever I fetch and install Debian packages (e.g. with apt-get upgrade), I also keep them in a local custom repository. I use a great tool named reprepro for this. Given the location of the freshly downloaded package files, this is the command that updates the repository:

$ reprepro -vv --basedir ~/.custom_repo/ includedeb tshepang /var/cache/apt/archives/*deb

What's nice about the command is that, if the repository already holds an older version of a package I'm adding, that older version gets replaced and the package index is updated accordingly.
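For anyone who would rather drive this from a script, here is a minimal sketch of the same invocation in Python. The helper name build_includedeb_command is made up for illustration, and the paths and the codename are simply the ones from my command above. It only builds the argument list, so nothing is touched until you hand the result to subprocess.

```python
import glob
import os


def build_includedeb_command(basedir, codename, archive_dir):
    """Return the reprepro argument list for adding every .deb in archive_dir.

    Hypothetical helper: it mirrors the shell command above and does not
    run anything itself.
    """
    # Expand the *deb pattern ourselves, since no shell is involved.
    debs = sorted(glob.glob(os.path.join(archive_dir, "*deb")))
    return ["reprepro", "-vv", "--basedir", os.path.expanduser(basedir),
            "includedeb", codename] + debs


if __name__ == "__main__":
    cmd = build_includedeb_command("~/.custom_repo/", "tshepang",
                                   "/var/cache/apt/archives")
    print(" ".join(cmd))
```

Passing the list to subprocess.call(cmd) would then do the same job as the shell one-liner.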

and now to the topic at hand

I run that command more or less regularly, but over time the repository accumulates junk: packages that are no longer available in Wheezy but still sit in my repository.

To help with the cleanup, I have written the following simple script:

#!/usr/bin/env python3

import apt_pkg
import gzip
import subprocess

CUSTOM_REPO = ("/home/wena/.custom_repo/dists/tshepang/main/"
               "binary-i386/Packages.gz")
WHEEZY_REPO = ("/var/lib/apt/lists/ftp.de.debian.org_debian_dists_testing_{}_"
               "binary-i386_Packages")


def main():
    custom_repo = apt_pkg.TagFile(gzip.open(CUSTOM_REPO, "rb"))
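    # Gather the name of every package Wheezy knows about, in all archive
    # areas. A set makes the membership test further down cheap.
    wheezy_packages = set()
    for archive_area in "main contrib non-free".split():
        # These lists carry no .gz suffix, so they are opened as plain files.
        with open(WHEEZY_REPO.format(archive_area), "rb") as repo_file:
            for pkg in apt_pkg.TagFile(repo_file):
                wheezy_packages.add(pkg["Package"])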
    for package in custom_repo:
        package_name = package["Package"]
        if package_name not in wheezy_packages:
            # Show where apt thinks this package can come from.
            subprocess.call(["apt-cache", "policy", package_name])
            choice = input("remove from cache [Y/n]? ")
            # An empty answer counts as "yes", matching the [Y/n] prompt.
            if not choice or choice.lower().startswith("y"):
                subprocess.call(["reprepro", "-vv", "--basedir",
                                 "/home/wena/.custom_repo/",
                                 "remove", "tshepang", package_name])

if __name__ == "__main__":
    main()

And here's a snippet of its output:

cx-oracle:
  Installed: 5.1.1-2
  Candidate: 5.1.1-2
  Version table:
 *** 5.1.1-2 0
        500 file:/home/wena/.custom_repo/ tshepang/main i386 Packages
        100 /var/lib/dpkg/status
remove from cache [Y/n]?

What it does is look for packages that are available only in my custom repository and not in Wheezy (soon to be Debian 7). For each one, it shows the apt-cache policy output and asks whether to remove the package from the custom repository. Today, it helped me get rid of dozens of junk packages.
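The core of the script is just a set difference on package names. Here is a minimal, self-contained sketch of that idea that doesn't need apt_pkg. It picks out the Package fields by hand, which is a simplification (the real script relies on apt_pkg.TagFile), and the function names and sample stanzas are made up for illustration.

```python
def package_names(packages_text):
    """Yield the Package field of each stanza in Packages-style text."""
    for line in packages_text.splitlines():
        # Only the Package field matters here; continuation lines and
        # other fields are ignored.
        if line.startswith("Package:"):
            yield line.split(":", 1)[1].strip()


def only_in_custom(custom_text, upstream_texts):
    """Return sorted names found in the custom index but in no upstream one."""
    upstream = set()
    for text in upstream_texts:
        upstream.update(package_names(text))
    return sorted(set(package_names(custom_text)) - upstream)


# Made-up sample data standing in for the real index files.
CUSTOM = """\
Package: cx-oracle
Version: 5.1.1-2

Package: vim
Version: 1.0
"""

UPSTREAM_MAIN = """\
Package: vim
Version: 1.0
"""

print(only_in_custom(CUSTOM, [UPSTREAM_MAIN]))  # ['cx-oracle']
```

Anything this returns is a candidate for removal, which is why the real script still asks before touching the repository.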