Skip to content

Persistent L2ARC#9582

Merged
behlendorf merged 1 commit intoopenzfs:masterfrom
gamanakis:persist_l2arc_unified
Apr 10, 2020
Merged

Persistent L2ARC#9582
behlendorf merged 1 commit intoopenzfs:masterfrom
gamanakis:persist_l2arc_unified

Conversation

@gamanakis
Copy link
Copy Markdown
Contributor

@gamanakis gamanakis commented Nov 13, 2019

This commit makes the L2ARC persistent across reboots. It is
based on issue 3525 in Illumos by Saso Kiselkov, which was later
ported by Yuxuan Shui to zfsonlinux (#2672) with modifications by
Jorgen Lundman, and rebased to current master with fixes and new
features by me.

Co-authored-by: Saso Kiselkov [email protected]
Co-authored-by: Jorgen Lundman [email protected]
Co-authored-by: George Amanakis [email protected]
Ported-by: Yuxuan Shui [email protected]
Signed-off-by: George Amanakis [email protected]

Motivation and Context

This feature implements a light-weight persistent L2ARC metadata
structure that allows L2ARC contents to be recovered after a reboot.
This significantly eases the impact a reboot has on read performance
on systems with large caches.

Closes #3744

Description

This is a substantial rework of issue 3525 in Illumos and PR #2672
It includes numerous fixes and new features to be listed here.

How Has This Been Tested?

A number of tests have been added in ZTS.
Also running in a production server (VMs with ZVOLs, Samba, NFS, LXC) with unencrypted ZFS since 11/22/2019.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

@Crow-Control
Copy link
Copy Markdown
Contributor

Crow-Control commented Nov 13, 2019

You could change the commit message to:
"This commit makes the L2ARC persistent across reboots. It is largely
based on 3525 commit on Illumos"

The rest is already in the Authored and ported by tags.

That way it complies with the stylerule a little more.

edit
I've since come to realise i'm a retard and the style rule is 75 PER LINE, not 75 total. Please ignore this.

Copy link
Copy Markdown
Contributor

@Crow-Control Crow-Control left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While i'm not versed in ZoL enough to review some/most of the actual function calls.
I do want to note it's very neatly structured and byond-well commented.

@Crow-Control
Copy link
Copy Markdown
Contributor

Quick work fixing the commit message!
Seems only some space-tabs issues and the sort are left now :)
http://build.zfsonlinux.org/builders/Ubuntu%2017.10%20x86_64%20%28STYLE%29/builds/18872/steps/shell_2/logs/stdio

@gamanakis
Copy link
Copy Markdown
Contributor Author

@Ornias1993 I accidentally deleted your comment regarding disabling persistent L2ARC. I guess we could make it an option the user can set, ie a zfs property?

@Crow-Control
Copy link
Copy Markdown
Contributor

@gamanakis No worries that was me, I didnt notice I already submitted it and afterwards scrolled up and noted you already had a quite easy/nice hook to disable persistent L2Arc :)

(not a property yet though, so maybe that can be added indeed )

@ahrens
Copy link
Copy Markdown
Member

ahrens commented Nov 13, 2019

Awesome, thanks for picking this up!

based on 3525 commit on Illumos

I don't see that commit in illumos. Could you specify which codebase you started from? Is it #2672 ?

@ahrens ahrens added Type: Feature Feature request or new feature Status: Code Review Needed Ready for review and testing labels Nov 13, 2019
@Crow-Control
Copy link
Copy Markdown
Contributor

Crow-Control commented Nov 13, 2019

@ahrens My fault of being unclear with my advice to change the commit message. @gamanakis also accidentily changed the PR summary.
this was the original (that should be put back, in the PR summary but left out of the commit summary):

This commit makes the L2ARC persistent across reboots. It is largely
based on 3525 commit on Illumos by Saso Kiselkov, which was later
ported by Yuxuan Shui to zfsonlinux, and rebased to current master
with fixes by myself.

With regards to the Illumos commit...
I see the mistake now, it is based on a Ilumos Issue, not commit.
https://www.illumos.org/issues/3525

@gamanakis
Copy link
Copy Markdown
Contributor Author

@ahrens Yes, I picked up from #2672

@ahrens
Copy link
Copy Markdown
Member

ahrens commented Nov 13, 2019

Thanks for clarifying.

@Crow-Control
Copy link
Copy Markdown
Contributor

@behlendorf Do you have anyone in mind to request a review from?

@behlendorf
Copy link
Copy Markdown
Contributor

@gamanakis thanks for moving this work forward! I haven't had a chance to take a careful look at this PR to provide any detailed feedback. But when skimming I noticed that it doesn't appear to add any new test cases. One of the next steps for this PR would be add some to the tests/zfs-tests/tests/functional directory to verify everything is working as intended.

@gamanakis
Copy link
Copy Markdown
Contributor Author

@behlendorf I will work on that. Meanwhile I did a minor cleanup and implemented this as a zpool property, so that the user can set it. Is it ok if I push (or force-push?) those commits in this PR?

@Crow-Control
Copy link
Copy Markdown
Contributor

@gamanakis Usually those changes can get pushed to the same repo no problem, certainly before formal review :)

@scineram
Copy link
Copy Markdown

scineram commented Nov 16, 2019

Thanks for picking this up.

What is the point of another property?

@Crow-Control
Copy link
Copy Markdown
Contributor

@scineram I think @gamanakis means the earlier discussed option to enable/disable the persistance

@Crow-Control
Copy link
Copy Markdown
Contributor

@gamanakis Just as headsup: lookup if your git email on your machine is the same as the one setup on github. it doesn't recognise your commits as linked to your github account. you might want too ;)

@Crow-Control
Copy link
Copy Markdown
Contributor

@gamanakis You might need to add your property to the test suite too, test errors:
zpool_get_002_pos
Found zpool features not in the zpool_get test config 59/60.

@gamanakis
Copy link
Copy Markdown
Contributor Author

gamanakis commented Nov 17, 2019

I pushed a final functional commit finalizing the zpool property, and adding man pages.
I am working on the tests now.

@gamanakis gamanakis force-pushed the persist_l2arc_unified branch 4 times, most recently from 4374a5a to acf0bac Compare November 17, 2019 06:19
Comment thread man/man8/zpool.8 Outdated
Comment thread man/man8/zpool.8 Outdated
@gamanakis gamanakis force-pushed the persist_l2arc_unified branch from 687dae1 to e02d8f0 Compare November 17, 2019 15:26
@gamanakis
Copy link
Copy Markdown
Contributor Author

gamanakis commented Mar 14, 2020

6b1a0aa: Rebase to master, update tests, simplify the logic between dh_evict and l2ad_evict.

@gamanakis
Copy link
Copy Markdown
Contributor Author

gamanakis commented Mar 15, 2020

b8ef435: Cover one more case in l2arc_log_blkptr_valid() and keep cksum related code together in zdb.c.

8bdb6fe: Simplify l2arc_log_blkptr_valid() and do not evict a cache device when onlining with !l2arc_rebuild_enabled.

@adamdmoss
Copy link
Copy Markdown
Contributor

I've been testing this day-to-day on my desktop machine for a couple of months now. It's been behaving well.

Every few reboots it'll drop the cache and I'll need to warm it again, but since I'm rebuilding openzfs with fresher versions of this patch regularly I'm assuming this is expected as the cache header/format/version/validation changes.

@gamanakis
Copy link
Copy Markdown
Contributor Author

@adamdmoss Thank you for testing. I have been making some changes that could cause this. I am about to push a final one (hopefully) to cover some corner cases.

@gamanakis
Copy link
Copy Markdown
Contributor Author

gamanakis commented Mar 17, 2020

60c9ca3: avoid infinite loops after restoring a log block in l2arc_rebuild(), update zdb.c logic to reflect this, rebase to master.

@gamanakis
Copy link
Copy Markdown
Contributor Author

217790d: Make code more concise in zdb.c, improve l2arc_log_blkptr_valid() to detect evicted blocks like this:

l2ad_start =================================== l2ad_end
                               ^             ^
                      l2ad_hand     l2ad_evict
           ------^           ^----------------
           lbp_daddr      payload_start

@adamdmoss
Copy link
Copy Markdown
Contributor

adamdmoss commented Mar 25, 2020

Been getting what looks like infinite recursion since updating to current HEAD of persist_l2arc_unified_review (79fbfb6):

[   97.177110] Call Trace:
[   97.177136]  l2arc_evict+0x354/0x480 [zfs]
[   97.177160]  l2arc_evict+0x354/0x480 [zfs]
(many many repetitions removed)
[   97.180363]  l2arc_evict+0x354/0x480 [zfs]
[   97.180386]  l2arc_evict+0x354/0x480 [zfs]
[   97.180410]  l2arc_feed_thread+0x296/0x1780 [zfs]
[   97.180413]  ? kfree+0x1f7/0x210
[   97.180416]  ? __schedule+0x2b0/0x670
[   97.180439]  ? l2arc_remove_vdev+0x1f0/0x1f0 [zfs]
[   97.180445]  thread_generic_wrapper+0x74/0x90 [spl]
[   97.180447]  kthread+0x121/0x140
[   97.180451]  ? __thread_exit+0x20/0x20 [spl]
[   97.180452]  ? kthread_park+0xb0/0xb0
[   97.180454]  ret_from_fork+0x1f/0x40

@gamanakis
Copy link
Copy Markdown
Contributor Author

gamanakis commented Mar 25, 2020

@adamdmoss Please do not test persist_l2arc_unified_review. That branch contains not up-to-date experimental pushes. The one this PR is based upon is persist_l2arc_unified. Thank you.

@gamanakis
Copy link
Copy Markdown
Contributor Author

gamanakis commented Mar 25, 2020

While testing this PR I found a subtle bug in the L2ARC implementation, unrelated to this PR:
l2arc_feed_thread runs periodically like this: l2arc_write_size -> l2arc_evict(size) -> l2arc_write_buffers(size)

Assume l2ad_hand is approaching l2ad_end. l2arc_write_buffers will reset l2ad_hand at its end taking into account the write size calculated at l2arc_write_size at the start of this cycle. However if the user increases l2arc_write_{max,boost} this will happen in the next cycle when l2arc_write_size runs again. Now if l2arc_write_buffers does not have enough space to perform its writes (as it didn't account for the new write size), zio_write_phys panics.

This is unlikely to happen in a production environment as l2arc_write_{max,boost} are not constantly changed, and the cache size exceeds those parameters. However, it can be fixed by moving resetting of l2ad_hand to l2arc_evict so that the new write size is always accounted for.

I will make a new PR (#10154), and then we can rebase this one on that.

@gamanakis
Copy link
Copy Markdown
Contributor Author

gamanakis commented Mar 27, 2020

0ab68d3: Addresses a comment from @ahrens. Comment on padding of log entries, add a CTASSERT on the size of log blocks, set the size of log entries to 64 bytes (from 128) and size of log blocks to 64K (from 128K). Reduce the maximum amount of log entries in a log block to 1022 (from 1023). Also improve the tests.

eb7e847: Rebase to master, resolve a conflict in a man page.

@gamanakis
Copy link
Copy Markdown
Contributor Author

842a51d: Rebase to master. No functional changes. Incorporate #10154 and employ the newly introduced get_arcstat function in the tests.

@behlendorf
Copy link
Copy Markdown
Contributor

behlendorf commented Apr 1, 2020

As mentioned on the March 30th OpenZFS call there's still time to provide additional review feedback on this feature PR. Let's target getting this merged by April 10th, so if you have input please provide it by then.

@gamanakis
Copy link
Copy Markdown
Contributor Author

gamanakis commented Apr 1, 2020

f005091: No functional change. Improve the implementation of #10154 and add comments.

1ee5bac: Make L2BLK_*_COMPRESS use SPA_COMPRESSBITS, like BP_*_COMPRESS. Again no functional changes.

e5ef3fd: Minor style changes consolidating code in tests, zdb.c and arc.c. Also give the 007 test more time to retrieve the amount of log blocks using zdb after issuing the off-lining of the cache device. Report the amount of restored blocks in dbgmsg if the rebuild was successful.

303f413: Fix the placement of DTRACE_PROBE4 in l2arc_evict().

412d6f5: Prevent unnecessary iteration of l2arc_evict() if we evict all buffers.

fbf8134: If we encounter a log block with invalid cksum in zdb.c, print out its pointer.

e044ce2: Report the number of restored log blocks in dbgmsg even when the rebuild was aborted prematurely.

b74fa37: Minor fix in zdb.c when dumping log entries.

This commit makes the L2ARC persistent across reboots. We implement
a light-weight persistent L2ARC metadata structure that allows L2ARC
contents to be recovered after a reboot. This significantly eases the
impact a reboot has on read performance on systems with large caches.

Co-authored-by: Saso Kiselkov <[email protected]>
Co-authored-by: Jorgen Lundman <[email protected]>
Co-authored-by: George Amanakis <[email protected]>
Ported-by: Yuxuan Shui <[email protected]>
Signed-off-by: George Amanakis <[email protected]>
@adamdmoss
Copy link
Copy Markdown
Contributor

Still behaving well FWIW. :)

@behlendorf
Copy link
Copy Markdown
Contributor

Merged. Thanks to everyone who helped get this done!

@tycho
Copy link
Copy Markdown
Contributor

tycho commented Apr 10, 2020

Thanks for your hard work on this, folks! This is going to be a great benefit to several machines I maintain with ZOL.

@BrainSlayer
Copy link
Copy Markdown
Contributor

why is this patch enforcing lz4 compression, no matter which compression was selected? its also uses lz4 if compression is disabled.

@gamanakis
Copy link
Copy Markdown
Contributor Author

@BrainSlayer the persistent L2ARC does use LZ4 for the compression of the metadata (log blocks). Otherwise the L2ARC buffers (L2ARC data) use whatever compression the filesystem uses.

@gamanakis
Copy link
Copy Markdown
Contributor Author

gamanakis commented Apr 11, 2020

@BrainSlayer you can see what compression algorithm is used for L2ARC data by inspecting the metadata with:
zdb -lll device | less
This produces for example:

lb[   1]        le[   5]        DVA asize: 105472, vdev: 0, offset: 2484060160
|                               birth: 19
|                               lsize: 131072
|                               psize: 105472
|                               compr: 15
|                               type: 1
|                               protected: 0
|                               prefetch: 0
|                               address: 663780352
|

Which tells you: the 5th buffer (le[5]) referenced by the first log block (lb[1]) is using compression algorithm (compr) 15, which in master is lz4 (found in zio_compress.h) because I set the filesystem compression to lz4. If you run this with the ZSTD PR merged in, and in a filesystem where zstd is enabled you will see 16 (the value for zstd in zio_compress.h) in that PR.

@gamanakis gamanakis mentioned this pull request Apr 11, 2020
12 tasks
@BrainSlayer
Copy link
Copy Markdown
Contributor

the question is more why you enforce lz4 and why not using the configured algorithm (or even no compression if disabled)

@gamanakis
Copy link
Copy Markdown
Contributor Author

gamanakis commented Apr 11, 2020

@BrainSlayer L2ARC devices can cache from different filesystems with different compression algorithms. Doing what you suggested makes it very complex and prone to bugs. After all only the metadata on L2ARC used for restoring the buffers use LZ4, the actual cached buffers (which were fetched from ARC) still use the compression used for the filesystem.

@gamanakis
Copy link
Copy Markdown
Contributor Author

gamanakis commented Apr 11, 2020

@BrainSlayer actually your suggestion is impossible with the current implementation. The metadata (log blocks) consist of up to 1022 entries which refer to buffers cached from ARC. Each of those 1022 cached buffers may be using a different compression algorithm, dependent on which filesystem it comes from. Ie the log block can contain entries referencing buffers, some of which are LZ4, some ZSTD, some with compression disabled. So which compression algorithm would the log block use?
This is the reason the compression for log blocks (again these are the metadata on L2ARC only) was fixed to LZ4.

@BrainSlayer
Copy link
Copy Markdown
Contributor

okay. thanks for explaination. i will consider this in my rework of the zstd patch. so your own patch might be already good enough

@fedstryale
Copy link
Copy Markdown

Is it possible for l2arc to cache data even if it's still in ram and not evicted(duplicated cache)? For now, if the ram is still enough to hold cache, the data cached cannot be retrieved between reboots through persistent l2arc. It's hard to accept the most frequently used data in ram cannot be accelerated by persistent l2arc.

@gamanakis
Copy link
Copy Markdown
Contributor Author

gamanakis commented Apr 29, 2020

@fedstryale yes it is. Try setting l2arc_headroom = 0 in current master. That way the L2ARC will scan the full ARC lists for cacheable buffers, long before they are evicted. This is also documented in the zpoolconcepts.8 man page.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Status: Accepted Ready to integrate (reviewed, tested) Type: Feature Feature request or new feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

NEX-3514 Implement persistent L2ARC