19.6. Advanced Topics

19.6.1. Tuning

There are a number of tunables that can be adjusted to make ZFS perform best for different workloads.

  • vfs.zfs.arc_max - Maximum size of the ARC. The default is all RAM but 1 GB, or 5/8 of all RAM, whichever is more. However, a lower value should be used if the system will be running any other daemons or processes that may require memory. This value can be adjusted at runtime with sysctl(8) and can be set in /boot/loader.conf or /etc/sysctl.conf.

  • vfs.zfs.arc_meta_limit - Limit the portion of the ARC that can be used to store metadata. The default is one fourth of vfs.zfs.arc_max. Increasing this value will improve performance if the workload involves operations on a large number of files and directories, or frequent metadata operations, at the cost of less file data fitting in the ARC. This value can be adjusted at runtime with sysctl(8) and can be set in /boot/loader.conf or /etc/sysctl.conf.

  • vfs.zfs.arc_min - Minimum size of the ARC. The default is one half of vfs.zfs.arc_meta_limit. Adjust this value to prevent other applications from pressuring out the entire ARC. This value can be adjusted at runtime with sysctl(8) and can be set in /boot/loader.conf or /etc/sysctl.conf.

  • vfs.zfs.vdev.cache.size - A preallocated amount of memory reserved as a cache for each device in the pool. The total amount of memory used will be this value multiplied by the number of devices. This value can only be adjusted at boot time, and is set in /boot/loader.conf.

  • vfs.zfs.min_auto_ashift - Minimum ashift (sector size) that will be used automatically at pool creation time. The value is a power of two. The default value of 9 represents 2^9 = 512, a sector size of 512 bytes. To avoid write amplification and get the best performance, set this value to the largest sector size used by a device in the pool.

    Many drives have 4 KB sectors. Using the default ashift of 9 with these drives results in write amplification on these devices. Data that could be contained in a single 4 KB write must instead be written in eight 512-byte writes. ZFS tries to read the native sector size from all devices when creating a pool, but many drives with 4 KB sectors report that their sectors are 512 bytes for compatibility. Setting vfs.zfs.min_auto_ashift to 12 (2^12 = 4096) before creating a pool forces ZFS to use 4 KB blocks for best performance on these drives. A worked example of doing this is shown after this list.

    Forcing 4 KB blocks is also useful on pools where disk upgrades are planned. Future disks are likely to use 4 KB sectors, and ashift values cannot be changed after a pool is created.

    In some specific cases, the smaller 512-byte block size might be preferable. When used with 512-byte disks for databases, or as storage for virtual machines, less data is transferred during small random reads. This can provide better performance, especially when using a smaller ZFS record size.

  • vfs.zfs.prefetch_disable - Disable prefetch. A value of 0 is enabled and 1 is disabled. The default is 0, unless the system has less than 4 GB of RAM. Prefetch works by reading larger blocks than were requested into the ARC in hopes that the data will be needed soon. If the workload has a large number of random reads, disabling prefetch may actually improve performance by reducing unnecessary reads. This value can be adjusted at any time with sysctl(8).

  • vfs.zfs.vdev.trim_on_init - Control whether TRIM is run on new devices added to the pool. This can improve the performance and longevity of SSDs. If the device has already been secure erased, disabling this setting will make the addition of the new device faster. This value can be adjusted at any time with sysctl(8).

  • vfs.zfs.vdev.max_pending - Limit the number of pending I/O requests per device. A higher value will keep the device command queue full and may give higher throughput. A lower value will reduce latency. This value can be adjusted at any time with sysctl(8).

  • vfs.zfs.top_maxinflight - Maximum number of outstanding I/Os per top-level vdev. Limits the depth of the command queue to prevent high latency. The limit is per top-level vdev, meaning the limit applies to each mirror, RAID-Z, or other vdev independently. This value can be adjusted at any time with sysctl(8).

  • vfs.zfs.l2arc_write_max - Limit the write speed to the L2ARC. This tunable is designed to extend the longevity of SSDs by limiting the amount of data written to the device. This value can be adjusted at any time with sysctl(8).

  • vfs.zfs.l2arc_write_boost - The value of this tunable is added to vfs.zfs.l2arc_write_max and increases the write speed to the SSD until the first block is evicted from the L2ARC. This Turbo Warmup Phase is designed to reduce the performance loss from an empty L2ARC after a reboot. This value can be adjusted at any time with sysctl(8).

  • vfs.zfs.scrub_delay - Number of ticks to delay between each I/O during a scrub. To ensure that a scrub does not interfere with the normal operation of the pool, if any other I/O is happening the scrub will delay between each command. This value controls the limit on the total IOPS (I/Os Per Second) generated by the scrub. The granularity of the setting is determined by the value of kern.hz which defaults to 1000 ticks per second. This setting may be changed, resulting in a different effective IOPS limit. The default value is 4, resulting in a limit of: 1000 ticks/sec / 4 = 250 IOPS. Using a value of 20 would give a limit of: 1000 ticks/sec / 20 = 50 IOPS. The speed of scrub is only limited when there has been recent activity on the pool, as determined by vfs.zfs.scan_idle. This value can be adjusted at any time with sysctl(8).

  • vfs.zfs.resilver_delay - Number of milliseconds of delay inserted between each I/O during a resilver. To ensure that a resilver does not interfere with the normal operation of the pool, if any other I/O is happening the resilver will delay between each command. This value controls the limit of total IOPS (I/Os Per Second) generated by the resilver. The granularity of the setting is determined by the value of kern.hz which defaults to 1000 ticks per second. This setting may be changed, resulting in a different effective IOPS limit. The default value is 2, resulting in a limit of: 1000 ticks/sec / 2 = 500 IOPS. Returning the pool to an Online state may be more important if another device failing could Fault the pool, causing data loss. A value of 0 will give the resilver operation the same priority as other operations, speeding the healing process. The speed of resilver is only limited when there has been other recent activity on the pool, as determined by vfs.zfs.scan_idle. This value can be adjusted at any time with sysctl(8).

  • vfs.zfs.scan_idle - Number of milliseconds since the last operation before the pool is considered idle. When the pool is idle the rate limiting for scrub and resilver are disabled. This value can be adjusted at any time with sysctl(8).

  • vfs.zfs.txg.timeout - Maximum number of seconds between transaction groups. The current transaction group will be written to the pool and a fresh transaction group started if this amount of time has elapsed since the previous transaction group. A transaction group may be triggered earlier if enough data is written. The default value is 5 seconds. A larger value may improve read performance by delaying asynchronous writes, but this may cause uneven performance when the transaction group is written. This value can be adjusted at any time with sysctl(8).
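
As a worked example of how these tunables are applied, the following is a minimal sketch of forcing 4 KB blocks before creating a pool: query the current value, raise it at runtime with sysctl(8), then create the pool. The pool name mypool and the disks ada0 and ada1 are hypothetical placeholders.

# sysctl vfs.zfs.min_auto_ashift
vfs.zfs.min_auto_ashift: 9
# sysctl vfs.zfs.min_auto_ashift=12
vfs.zfs.min_auto_ashift: 9 -> 12
# zpool create mypool mirror ada0 ada1

To keep runtime-adjustable values across reboots, add them to /etc/sysctl.conf, for example:

vfs.zfs.min_auto_ashift=12
vfs.zfs.prefetch_disable=1

Values that can only be set at boot time, such as vfs.zfs.vdev.cache.size, must instead be placed in /boot/loader.conf.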

19.6.2. ZFS on i386

Some of the features provided by ZFS are memory intensive, and may require tuning for maximum efficiency on systems with limited RAM.

19.6.2.1. Memory

As a bare minimum, the total system memory should be at least one gigabyte. The amount of recommended RAM depends upon the size of the pool and which ZFS features are used. A general rule of thumb is 1 GB of RAM for every 1 TB of storage. If the deduplication feature is used, a general rule of thumb is 5 GB of RAM per TB of storage to be deduplicated. While some users successfully use ZFS with less RAM, systems under heavy load may panic due to memory exhaustion. Further tuning may be required for systems with less than the recommended RAM requirements.
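
As an illustration of these rules of thumb, a system with a 4 TB pool would call for about 4 GB of RAM, and enabling deduplication across the full 4 TB would raise the recommendation to roughly 20 GB.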

19.6.2.2. Kernel Configuration

Due to the address space limitations of the i386™ platform, ZFS users on the i386™ architecture must add this option to a custom kernel configuration file, rebuild the kernel, and reboot:

options        KVA_PAGES=512

This expands the kernel address space, allowing the vm.kvm_size tunable to be pushed beyond the currently imposed limit of 1 GB, or the limit of 2 GB for PAE. To find the most suitable value for this option, divide the desired address space in megabytes by four. In this example, it is 512 for 2 GB (2048 MB / 4 = 512).

19.6.2.3. Loader Tunables

The kmem address space can be increased on all FreeBSD architectures. On a test system with 1 GB of physical memory, success was achieved with these options added to /boot/loader.conf, and the system restarted:

vm.kmem_size="330M"
vm.kmem_size_max="330M"
vfs.zfs.arc_max="40M"
vfs.zfs.vdev.cache.size="5M"
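
After rebooting, the applied limits can be confirmed with sysctl(8), which reports the sizes in bytes. These are the same tunables set in the snippet above:

# sysctl vm.kmem_size vfs.zfs.arc_max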

For a more detailed list of recommendations for ZFS-related tuning, see http://wiki.freebsd.org/ZFSTuningGuide.
