AbcLinuxu portal, 1 May 2025, 00:45
Author: Petr Ročkai (AToL - PV208, FI.MUNI)
This is not really a report in the strict sense of the word; it is more of a technical proposal. The issue is that the partition detection code (with all the quirky metadata it has to parse) currently resides in the kernel. However, current storage systems (especially LVM2) tend to move metadata handling into userspace and use device-mapper to tell the kernel about the arrangement of the block devices.
This approach can be used for partition tables as well -- in fact, a userspace partition table parser already exists: kpartx. It uses device-mapper to map sections of block devices, in a manner similar to LVM2.
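To make the idea concrete, here is a minimal sketch of what such a userspace parser boils down to: read the four primary entries of a classic MBR and turn each one into a one-line device-mapper "linear" table (`<start> <length> linear <device> <offset>`). This is not kpartx's actual code. The function names, the `sdbp1`-style mapping names and the 512-byte sector assumption are all made up for illustration, and extended partitions are ignored.

```python
import struct

def parse_mbr(sector: bytes):
    """Return (index, start_lba, num_sectors) for each used primary MBR entry."""
    if len(sector) < 512 or sector[510:512] != b"\x55\xaa":
        raise ValueError("no MBR boot signature")
    parts = []
    for i in range(4):
        # The partition table starts at byte 446; each entry is 16 bytes.
        entry = sector[446 + 16 * i : 446 + 16 * (i + 1)]
        ptype = entry[4]
        # Bytes 8..15: little-endian start LBA and sector count.
        start_lba, num_sectors = struct.unpack_from("<II", entry, 8)
        if ptype != 0 and num_sectors > 0:
            parts.append((i + 1, start_lba, num_sectors))
    return parts

def dm_linear_tables(device: str, sector: bytes):
    """Map a (hypothetical) mapping name to a device-mapper 'linear' table line."""
    base = device.rsplit("/", 1)[-1]
    return {
        f"{base}p{idx}": f"0 {count} linear {device} {start}"
        for idx, start, count in parse_mbr(sector)
    }

def make_demo_mbr():
    """Build an in-memory MBR with one Linux partition, for demonstration only."""
    sector = bytearray(512)
    # boot flag, CHS start, type 0x83, CHS end, start LBA 2048, 204800 sectors
    struct.pack_into("<B3sB3sII", sector, 446,
                     0, b"\0\0\0", 0x83, b"\0\0\0", 2048, 204800)
    sector[510:512] = b"\x55\xaa"
    return bytes(sector)

if __name__ == "__main__":
    for name, table in dm_linear_tables("/dev/sdb", make_demo_mbr()).items():
        print(name, "->", table)
```

Each resulting table line could then be handed to device-mapper (for example through `dmsetup create <name> --table "<line>"`), which is the step that actually makes the partition visible as a block device.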
The obvious downside of this approach is that an initrd would be required to mount filesystems even from old-fashioned partitions (as opposed to logical volumes). However, since all modern distributions already use an initial ramdisk (or an equivalent), this is not really a big issue.
On the upside, this change would bring several advantages. First, duplicate code could be removed: we would need only one set of partition table parsers, and the quirky parsing code would leave the kernel (which is always a good thing in itself). Note that kpartx is already required by some setups: permanent storage attached over e.g. Fibre Channel and accessed through device-mapper/multipath relies on kpartx to scan for partitions, since the current kernel code cannot handle that case.
Moreover, the trend is to use udev for handling devices asynchronously, and using kpartx here would again lead to more unification. As block devices appear on the physical layer, the kernel generates udev events for them, and udev can call kpartx. From the kernel's point of view, this is the same as for all the other devices it knows about, and we can get rid of the partition scanning logic in the kernel altogether.
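The event-driven flow described above can be sketched as a small decision function: given the environment of a udev event, decide whether kpartx should be run. `SUBSYSTEM`, `DEVTYPE`, `ACTION` and `DEVNAME` are standard udev properties and `kpartx -a` adds partition mappings for a device. The function name and the policy (react only to whole disks being added) are assumptions for illustration, and cleanup on removal is left out.

```python
def kpartx_command(env):
    """Return the kpartx argv to run for a udev event, or None if not relevant.

    `env` is a dict of udev properties; in a real helper it would typically
    be os.environ of the program started by a udev rule.
    """
    # Only whole block devices are scanned; partitions themselves are skipped.
    if env.get("SUBSYSTEM") != "block" or env.get("DEVTYPE") != "disk":
        return None
    devname = env.get("DEVNAME")
    if not devname:
        return None
    # Assumed policy: create mappings only when a disk appears.
    if env.get("ACTION") == "add":
        return ["kpartx", "-a", devname]
    return None

if __name__ == "__main__":
    event = {"ACTION": "add", "SUBSYSTEM": "block",
             "DEVTYPE": "disk", "DEVNAME": "/dev/sdb"}
    print(kpartx_command(event))
```

A real helper would pass the returned list to `subprocess.run`; the sketch only builds the command so that it can be tried without root privileges.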
Another advantage is that, since kpartx routes all partition access through device-mapper, the kernel code exercised by the use of partitions would be the same code that is used for LVM -- meaning better exposure of bugs. This should indirectly benefit all users of device-mapper (LVM, dm-crypt and multipath) by stressing the common foundations.
Finally, the ability of device-mapper to change the properties of running, mounted devices can be of great benefit. Right now, it is not really possible to edit the partition table of a running system, let alone migrate an existing, mounted and in-use partition to a different drive. These possibilities all open up through clever use of device-mapper features (as LVM already uses them). In fact, if a physical volume signature could be written to a partition without interfering with the existing filesystem, it would be possible to migrate existing live partitions to LVM, with all its benefits (live migration, adding mirrors to a live logical volume and so on).
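As a rough illustration of why a device-mapper-backed partition can be changed while in use, the sketch below builds the `dmsetup` command sequence that replaces the table of a live mapping: suspend the device, load a new table into the inactive slot, then resume so the new table becomes live. The mapping name, the device path and the helper names are hypothetical. The sketch only swaps the mapping (here, to grow a partition in place) and does not copy any data, so on its own it is not a migration to another drive.

```python
def linear_table(length, device, offset):
    """One-line device-mapper 'linear' table: all sizes in 512-byte sectors."""
    return f"0 {length} linear {device} {offset}"

def live_retable_commands(name, new_table):
    """Return the dmsetup invocations that swap the table of a live mapping."""
    return [
        ["dmsetup", "suspend", name],                       # queue incoming I/O
        ["dmsetup", "reload", name, "--table", new_table],  # fill inactive slot
        ["dmsetup", "resume", name],                        # new table goes live
    ]

if __name__ == "__main__":
    # Hypothetical example: grow mapping "sdbp1" from 204800 to 409600 sectors.
    grown = linear_table(409600, "/dev/sdb", 2048)
    for argv in live_retable_commands("sdbp1", grown):
        print(" ".join(argv))
```

The filesystem on top would still have to be resized separately; the point is only that the block-level mapping can change under a mounted device.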
The tools needed for the more advanced aspects of such a move (especially using device-mapper to "hijack" partitions and turn them into logical volumes) might be fairly tricky to implement -- nevertheless, the possibility exists and might be exploited. Even without them, the remaining advantages of such a system are compelling enough. We intend to create a prototype implementation and, if it turns out to work well, propose it for inclusion in Fedora at some point.
$ wc -l fs/partitions/*.[ch]
...
5463 total
$ size fs/partitions/*.o | perl -e 'while(<>){if(/^((\s*\d+){4})/){$_=$1;@line=split;($text,$data,$bss,$dec)=($text+$line[0],$data+$line[1],$bss+$line[2],$dec+$line[3]);}} print "text:$text, data:$data, bss:$bss, dec:$dec\n";'
text:27707, data:420, bss:132, dec:28259
$ rm fs/partitions/built-in.o
$ size -t fs/partitions/*.o
   text    data     bss     dec     hex filename
   2891     144       0    3035     bdb fs/partitions/check.o
   1488       0       0    1488     5d0 fs/partitions/msdos.o
   4379     144       0    4523    11ab (TOTALS)
Most of check.o would have to stay in the kernel anyway, so this is really only about msdos.o, which is just under 1.5k. Do you really think it is worth building a userspace subsystem for something that small?
ISSN 1214-1267, (c) 1999-2007 Stickfish s.r.o.