Skip to content

A minimal runnable version of XArray.#1

Merged
cchanging merged 1 commit intoasterinas:mainfrom
cchanging:main
Jan 1, 2024
Merged

A minimal runnable version of XArray.#1
cchanging merged 1 commit intoasterinas:mainfrom
cchanging:main

Conversation

@cchanging
Copy link
Copy Markdown
Contributor

No description provided.

@cchanging cchanging merged commit 6d3cd05 into asterinas:main Jan 1, 2024
/// 用户需要确保该裸指针与原指针在内存布局上一致
/// 也就是原指针大小跟裸指针大小一致,且指向的位置一致
/// 同时需要确保原指针是aligned to 4的,因为XArray中的entry后两位要用于entry类型判断。
unsafe fn into_raw(self) -> *const u8;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method should be safe. For the same reason as Box::into_raw and Arc::into_raw being safe.


/// 用来抽象用户将要存入XArray的类型,这些存入XArray的obj称之为item。
/// 对于存入的item,要求其大小为4字节,可以是各种指针类型或者是usize、u64等整数类型。
pub(crate) trait ItemEntry {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A user-facing trait should be public. Write integration or doc tests to ensure all user-facing traits or types are public.

unsafe fn load_item<'a>(entry: &'a XEntry) -> Self::Target<'a>;
}

impl<I: PointerItem> ItemEntry for I {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This blanket implementation would be confusing for the user. Think about it. Both PointerItem and ItemEntry traits provide from_raw and into_raw for pointer types like Arc<_> and Box<_>, which themselves already implements from from_raw and into_raw. So here is the million-dollar question: what implementation does the following into_raw use?

use core::sync::Arc;
use crate::{ItemEntry, PointerItem};

let smart_ptr = Arc::new(123);
let _ = Arc::into_raw(smart_ptr);

/// - sibling entry (用于multi-index entries), empty entry, retry entry (用于异常处理)
/// - item entry 用户存储的item
/// - pointer entry 后两位为00, (tagged pointer entry)
/// - value entry 末位为1
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have decoded the type of user items as a type parameter of XArray. So I assume that we do not need to use runtime information to tell whether the user items are pointers or values.

/// - pointer entry 后两位为00, (tagged pointer entry)
/// - value entry 末位为1
///
/// XEntry没有所有权,可以copy,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't see why XEntry does not has ownership. XNode contains XEntry. XNode must ensure its contained entries refer to valid internal nodes or user-provided values. And when an XNode is dropped, all its contained entries should have been dropped, too. This is by definition ownership.

As an evidence, XNode's drop method transforms each and every contained XEntry objects into OwnedEntry without any extra condition to drop the entries manually. From this implementation, it is clear that XEntry is treated as OwnedEntry.

Here is yet another evidence that suggests having both XEntry and OwnedEntry is a bad idea. OwnedEntry has a safe constructor method called from_raw, which takes an XEntry, but the latter implements Copy. This means multiple OwnedEntry pointing to a single user-provided item or XEntry node may be created without calling any unsafe functions. And OwnedEntry implements Drop. A receipt of double free.

With all the reasons above, I suggest removing OwnedEntry.

When references to XArray entries are demanded, use &XEntry instead. Similar to how Cursor and CursorMut have used &XArray.

Copy link
Copy Markdown
Contributor Author

@cchanging cchanging Jan 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yesterday I tried to have XEntry take ownership and pass variables using &XEntry for specific operations. While sorting out lifetimes, I encountered an unavoidable issue, which is that since we need the clone functionality, all XNodes are managed by Locks, which leads to the lifetime of &XEntry being the same as that of LockGuard when reading XEntry from the node. As a result, holding &XEntry needs to maintain the LockGuard of the Node where XEntry resides. This requires delaying the release of locks for nodes along the path for some operations that go through multiple nodes from head to item, and also requires that the LockGuard for the node holding the item remains alive when the load() API returns &I. I feel that the implementation of this functionality would be quite ugly and could affect efficiency. Moreover, we can't refer to the implementation of BTreeMap in this aspect because not every node is managed by a lock. It might indeed be necessary to separate XEntry and something like OwnedEntry, and then let XNode and XArray hold the ownership of these entries.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get that you encountered some technical difficulties using &XEntry. Let's say that we keep some sort of XEntry and OwnedEntry, but their design really need an overhaul.

They API are not sound.

OwnedEntry has a safe constructor method called from_raw, which takes an XEntry, but the latter implements Copy. This means multiple OwnedEntry pointing to a single user-provided item or XEntry node may be created without calling any unsafe functions. And OwnedEntry implements Drop. A receipt of double free.

And they are ugly.

As an evidence, XNode's drop method transforms each and every contained XEntry objects into OwnedEntry without any extra condition to drop the entries manually.

I can't hand over a good solution to you, but I will definitely recognize one when I see it. Their current form is not. I am confident that you got the wit and patience to figure out a good solution.

///
/// It can support storing a range of indices at once (Multi-index entries, TODO).
/// It also supports a marking function, allowing for the stored contents to be marked with the required labels (TODO).
pub struct XArray<I: ItemEntry, M: XMark> {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not add trait bounds to type definition.

old_head
}

pub fn load<'a>(&'a self, index: u64) -> Option<I::Target<'a>> {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simplify the design by returning &I. For example, if I is Arc<MemoryArea>, then the load method returns &Arc<MemoryArea>, instead of &MemoryArea.

Returning I instead of I::Target brings two benefits. First, it gives all the capabilities that I possesses. For example, the user can call clone on an object of &Arc<_>. The following code snippet gives an concrete example.

const MY_KEY: u64 = 123;

let counter_table: Mutex<XArray<Arc<Mutex<u32>>>> = {
    let mut xarray = XArray::new());
    xarray.store(MY_KEY, Arc::new(Mutex::new(0)));
    Mutex::new(xarray)
};
let counter = counter_table.lock().load(MY_KEY).clone();
// After cloning, we can manipulate the item without locking the entire table
*counter.lock() += 1;

Second, it removes the extra complexities due to the extra level of indirection via I::Target. For example, there won't be a load_item method in the ItemEntry interface. And without the I::Target, I don't why see the reason to keeping the PointerItem trait.

unsafe { Some(I::load_item(entry)) }
}

pub fn store(&mut self, index: u64, value: I) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should return Option<I>.


use crate::*;

pub enum CurrentState {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bad name


pub struct Cursor<'a, I: ItemEntry, M: XMark> {
xa: &'a XArray<I, M>,
xas: State<I, M>,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think State is a bad name. Not clear what this state is for. Obviously not the entire XArray. But it does not seem to be the state of the cursor. Otherwise, there should be a module called cursor which includes both Cursor and State.


pub struct State<I: ItemEntry, M: XMark> {
pub index: u64,
pub shift: u8,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think storing a height field would be much easier to understand than doing shift.

    /// The node's height from the bottom of the tree. The height of a lead node,
    /// which stores the user-given items, is zero.
    height: u8,

I now understand what shift is, after watch a youtube video by the creator of XArray. He shows a figure depicting the shift values of nodes in a radix tree. After seeing the figure, the concept becomes obvious to me. While the concept of "shift" needs extra explanation, that of "depth" in a tree is understood by every undergraduate student.

Another reason why "shift" is harder to understand is that "shift" is about the low-level bit operations, while "depth" is about the high-level tree structure.

pub index: u64,
pub shift: u8,
pub sibs: u8,
pub offset: u8,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming offset_in_parent would make the meaning of this field self-explaining.

    /// This node is its parent's `offset_in_parent`-th child. This field is meaningless if this node is the root
    /// (no parent).
    offset_in_parent: u8

index,
shift: 0,
sibs: 0,
offset: 0,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the useless fields.

use crate::*;
pub const CHUNK_SHIFT: usize = 6;
pub const CHUNK_SIZE: usize = 1 << CHUNK_SHIFT;
pub const CHUNK_MASK: usize = CHUNK_SIZE - 1;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of all, these are internal parameters that are useful to the users. So they should not be made public.

Second, the notion of chunk has never been explained or referenced. Make the notions, concepts and terms used consistent throughout the codebase. I think the notion of chunk can be replaced by nodes. Having NODE_SHIFT or NODE_MASK may not quite make sense, but NODE_SIZE does. (Although calling it NODE_SIZE is a bit ambiguous because it is not clear whether the size is in bytes or about the number of entries).

@@ -0,0 +1,241 @@
use std::{
marker::PhantomData,
sync::{Arc, RwLock, Weak},
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not use RwLock. Use Mutex instead. RwLock should only be used when you are sure that the workload is dominated by read-only operations. This is not true for XArray.

let old_head_entry = xa.set_head(node_entry).unwrap();
let new_head_entry = xa.head();

let mut head_write = old_head_entry.as_node().write().unwrap();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please never ever name objects in verbs.

/// ```
///
/// It can support storing a range of indices at once (Multi-index entries, TODO).
/// It also supports a marking function, allowing for the stored contents to be marked with the required labels (TODO).
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I watched a talk given by the creator of XArray. He says XArray keeps the data structure of Linux's original radix tree, but fixes the interface. For example, it adopts the array metaphor, instead of tree. So I think we should reference the best article that describes Linux radix tree, which could give people the general idea behind our implementation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants