Howto: access subuids for lxc for backup and tar

Sometimes it takes a while to arrive at the obvious solution. I did learn a lot about namespaces and related stuff along the way, but conclusions first: if you want something like fakeroot for lxc, to create backups without knowing about the mapped user IDs, or if you want to untar a privileged installation, you may want to use lxc-usernsexec.

cd ~/.local/lxc/yourvm/rootfs && lxc-usernsexec -- tar xvfz somearchive.tar.gz

You can easily switch to a different map, or map as root, with -m (u|g|b):0:startid:range, for example -m b:0:1738400:65535. lxc-usernsexec does not do a chroot, so you can use all the tools and data from the base system. However, keep in mind that all files outside your mapped range, including your own user's data, appear as owned by user nobody and group nogroup. That means ssh private keys and gpg data are not accessible unless you set the permissions accordingly.
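To make the arithmetic behind such a map concrete, here is a minimal sketch in C. The struct and function names are made up for illustration and are not part of lxc; the code only models how a single entry like b:0:1738400:65535 relates IDs inside the namespace to IDs on the host.

```c
#include <assert.h>

/* One map entry, as in "-m b:0:1738400:65535" (hypothetical type):
   IDs ns_start .. ns_start+range-1 inside the namespace correspond to
   host_start .. host_start+range-1 on the host. */
struct id_map {
    long ns_start;
    long host_start;
    long range;
};

/* Translate an ID seen inside the namespace to the host ID.
   Returns -1 if the ID lies outside the mapped range. */
long ns_to_host(const struct id_map *m, long ns_id)
{
    assert(m->range > 0);
    if (ns_id < m->ns_start || ns_id >= m->ns_start + m->range)
        return -1;
    return m->host_start + (ns_id - m->ns_start);
}

/* Translate a host ID to the ID seen inside the namespace.
   Returns -1 for unmapped IDs, i.e. the ones that show up as
   nobody/nogroup inside the namespace. */
long host_to_ns(const struct id_map *m, long host_id)
{
    assert(m->range > 0);
    if (host_id < m->host_start || host_id >= m->host_start + m->range)
        return -1;
    return m->ns_start + (host_id - m->host_start);
}
```

With the example map, root (0) inside the namespace is host ID 1738400, while a typical login user such as host ID 1000 has no counterpart inside, which is why its files look like they belong to nobody.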

Now a bit of theory:
The namespace change is done by cloning a process with the clone() system call. On Linux this is the basic call for creating processes; fork() and thread creation are built on top of it. With clone() you can additionally decide that the new process gets its own namespace. The user_namespaces man page provides a piece of C code doing exactly that. If you look at the code you might find a few confusing parts.

  • Uid and gid mappings that cover a range of IDs can only be written with root privileges in the parent namespace
  • Such mappings cannot be written by the cloned process itself, even though it is a root process inside the new namespace.
  • The uid and gid mappings cannot be preset during clone()

This means clone() creates a new process with a new user namespace, which then has to wait for the parent or a setuid root process to set the mappings. Once this is done, it can continue with its work, for example by executing the program that really does the job. In the example from the user_namespaces man page, this waiting is done by waiting for a pipe to be closed by the parent.


close(args->pipe_fd[1]);    /* Close our descriptor for the write
                               end of the pipe so that we see EOF
                               when parent closes its descriptor */
if (read(args->pipe_fd[0], &ch, 1) != 0) {
    fprintf(stderr,
            "Failure in child: read from pipe returned != 0\n");
    exit(EXIT_FAILURE);
}
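The snippet above is only the child's half of the handshake. The following is a minimal sketch of both halves together. To keep it runnable without any privileges it uses plain fork() instead of clone() with CLONE_NEWUSER, so no namespace is actually created; the function name handshake_demo is made up for this post, and the comments mark the places where a real tool would write the maps and exec the workload.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs the pipe handshake once.
   Returns the child's exit status (0 on success) or -1 on error. */
int handshake_demo(void)
{
    int pipe_fd[2];
    if (pipe(pipe_fd) == -1)
        return -1;

    /* The man page example calls clone() with CLONE_NEWUSER here. */
    pid_t pid = fork();
    if (pid == -1) {
        close(pipe_fd[0]);
        close(pipe_fd[1]);
        return -1;
    }

    if (pid == 0) {                 /* child */
        char ch;
        close(pipe_fd[1]);          /* otherwise we would never see EOF */
        /* Blocks until the parent closes its write end; 0 means EOF. */
        if (read(pipe_fd[0], &ch, 1) != 0)
            _exit(EXIT_FAILURE);
        /* A real tool would exec the actual workload here. */
        _exit(EXIT_SUCCESS);
    }

    assert(pid > 0);                /* parent from here on */
    close(pipe_fd[0]);
    /* A real tool would set up the uid and gid maps for pid here. */
    close(pipe_fd[1]);              /* EOF tells the child to continue */

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling handshake_demo() from a small main() should return 0: the child stays blocked in read() until the parent closes the write end, which is exactly the window in which the mappings get set.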

Because the experimental tool in the user_namespaces man page only does a plain write to /proc/PID/uid_map and /proc/PID/gid_map, it will only work as root, and there is no check whether the given map makes sense. This check is done by other tools like newuidmap and newgidmap. These also run setuid root, so they have to check whether the user is allowed to do the mapping before they actually set it. Keep that in mind before you implement your own code: it may be better to call these tools.
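If you go the route of calling newuidmap from your own program, the arguments are the target PID followed by triples of namespace start ID, host start ID and count. A minimal sketch of putting such a command line together for a single triple; the helper name format_newuidmap_cmd is hypothetical, and actually running the command (and the matching newgidmap call) is left out.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Formats "newuidmap <pid> <ns_start> <host_start> <count>" into buf.
   Returns 0 on success, -1 if buf is too small. */
int format_newuidmap_cmd(char *buf, size_t len, pid_t pid,
                         unsigned long ns_start,
                         unsigned long host_start,
                         unsigned long count)
{
    assert(buf != NULL && len > 0);
    int n = snprintf(buf, len, "newuidmap %ld %lu %lu %lu",
                     (long)pid, ns_start, host_start, count);
    /* snprintf reports the length it wanted; >= len means truncation. */
    if (n < 0 || (size_t)n >= len)
        return -1;
    return 0;
}
```

For PID 1234 and the map from the beginning of this post, the buffer ends up holding newuidmap 1234 0 1738400 65535. Whether the call is then allowed is decided by newuidmap itself, based on the ranges assigned to the user in /etc/subuid.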

You may also want to know about nsenter, a tool that simply enters the namespaces of an existing process, provided the calling process is allowed to do that.

You can find additional information in secondary literature and other blog posts. Regards

Posted on May 19, 2016, 5:39 pm By
Categories: code, software