Wednesday, May 21, 2014

GSoC 2014: mock improvements - week 1

My proposal for Google Summer of Code 2014 - improving the mock tool has been accepted. In the next few months I'm going to focus on improving this tool used in packaging to facilitate faster and more effective workflow for packagers.

What do I want to improve?
(This section is just recapitulation of the proposal, if you already read it, skip it)
  • Better caching - imporve the performance and provide easier mechanisms for cloning and manipulating buildroots. Currently, mock uses tarballs for keeping the buildroot cache and extracts them when needed. This is slow and not very flexible. Using snapshots would provide a way to conveniently save, restore or clone the current state of the buildroot. This could speed up the workflow of packagers who have to work with many packages at once. Current options are either cleaning the buildroot with each package which takes a long time to since dependencies have to be installed again. Or manually copying the buildroot which is also slow and inconvenient. Or not cleaning the mock at all which can make some packaging mistakes (missing BuildRequires) go unnoticed. My proposed modification could present another workflow that could be faster than cleaning with each rebuild but not error prone as not cleaning at all. For example Java packages built with Maven could save a buildroot snapshot with maven-local and its long chain of dependencies and reuse it for multiple packages.
  • Provide a way to revert recent changes without having to clean the buildroot and install all dependencies over again. A similar idea to the previous. Mock could automatically make a snapshot after installing package's build dependencies, but before actually building it. This could provide a way to do manual modifications to buildroot without having to reinstall dependencies after you turned your immediate changes into patches and you want to rebuild it to see whether they work.
  • Ability to work interactively - add realtime logging to the terminal. Packager should immediately see the output of the package management system and the build system on his terminal without having to look for the logs stored deep in the filesystem
  • Allow to use mock shell during running build - mock uses one global lock per buildroot which prevents you to use the mock shell when there is already a build in progress and vice versa. More finely-grained locking would be more appropriate, because packager usually doesn't want to interfere with the running build, but just query some information or copy files.
  • Allow using rpmbuild's short-circuit mechanisms to reinitiate failed build without having to start over - rpmbuild provides short-circuit option that starts the execution of the build in the given phase skipping previous phases. If the package built fine but the file verification failed, don't force the packager to repeat the whole process from the very beginnning.
  • Use DNF instead of YUM to install packages - DNF is a modular package manager that is meant to replace YUM in the future. Since it is already usable, mock should be able to use it instead.
  • Provide more ways to interact with the package management system within the mock - using DNF straight from the mock shell could be more straightforward than interacting with the package manager via mock command line interface.
  • Handle user interrupts without corrupting the buildroot or leaving still running processes - If you run a mock build and realize you made a mistake and want to stop it with <C-c>, there is a high probability that your buildroot will not be usable for the next build or there will be some remaining processes that weren't terminated when you interrupted. It also should provide a way to pause the whole build, in case you need more computational power or your battery is running low due to increased resource usage
The repository
I've setup a git fork of the upstream git repository based on the msuchy-work branch. You can see my changes here: https://github.com/msimacek/mock

What have I already done?
Started refactoring the mock backend, which currently lacks the abstraction needed for implementing aforementioned features. Mock started as a simple script and evolved as such. The functionality is scattered across the modules and lots of the commands are big monolithic sequences of function calls performing different things that could and should be split into smaller parts. My mentor already suggested making a set of small, atomic commands forming a low-level interface on which the high-level and more complex commands would be built. I began with refactoring the locking scheme. The current implementation allows only one running instance of mock per buildroot by using an exclusive file-based lock. The functions that operate on the buildroot are responsible for locking the buildroot, performing the initialization and the cleanup. And sometimes those actions are done directly from main. I'm trying to factor out the common parts, so the buildroot preparation and finalization is a responsibility of single function and the commands can assume that the builroot is already prepared and will be cleaned up after they finish without their direct intervention.
I changed the locking scheme to use the exclusive lock only when the operations are destructive (buildroot cleaning/deleting). For non-destructive commands the buildroot is locked exclusively only for its initialization and only if it wasn't initialized already. Then it is locked with a shared lock, allowing multiple instances at the same time to coexist. If the buildroot is locked with a shared lock, we know that there is still some mock process working. Thus, the initialization is done only by the first process to enter buildroot and the cleanup actions are performed only by the process that happens to be the last. This allows for example having a running build and still logging into the mock shell and examining the build environment. Further redesigning is needed to enable the possibility of having lvm-based storage with snapshot abilities, which will require slightly different initialization and finalization to work.

Another part where an additional abstraction will be needed is the package management system. Currently, only yum is supported and the command is hardcoded, therefore adding support for dnf needs some refactoring. I created an abstract class providing a common interface for package management, which will then have subclasses Yum and Dnf that would handle specifics of those two systems. The current implementation I have is just a skeleton denoting how it may look like, but it needs to be integrated to the rest of the mock and replace hardcoded invocations of yum.

What I plan to do next?
Continue with refactoring. Apropriate abstraction is essesntial to be able to add new features and fix existing issues. The whole initialization part is quite messy and mixes input parsing, config file parsing, initialization and performing the actual commands into one big part. The backend should be divided into low-level atomic commands and high-level ones, similarly to git which has a plumbing a layer and a porcelain layer built on top of it. It should also provide direct acces to the underlying package manager (currently yum) in order to not restrict users to a particular set of commands and enable them to utilize its full power.

No comments:

Post a Comment