A minimal runnable version of XArray. by cchanging · Pull Request #1 · asterinas/xarray

cchanging · 2024-01-01T07:35:27Z

No description provided.

tatetian · 2024-01-02T04:52:23Z

src/entry.rs

+    /// 用户需要确保该裸指针与原指针在内存布局上一致
+    /// 也就是原指针大小跟裸指针大小一致，且指向的位置一致
+    /// 同时需要确保原指针是aligned to 4的，因为XArray中的entry后两位要用于entry类型判断。
+    unsafe fn into_raw(self) -> *const u8;


This method should be safe. For the same reason as Box::into_raw and Arc::into_raw being safe.

tatetian · 2024-01-02T04:53:30Z

src/entry.rs

+
+/// 用来抽象用户将要存入XArray的类型，这些存入XArray的obj称之为item。
+/// 对于存入的item，要求其大小为4字节，可以是各种指针类型或者是usize、u64等整数类型。
+pub(crate) trait ItemEntry {


A user-facing trait should be public. Write integration or doc tests to ensure all user-facing traits or types are public.

tatetian · 2024-01-02T04:55:35Z

src/entry.rs

+    unsafe fn load_item<'a>(entry: &'a XEntry) -> Self::Target<'a>;
+}
+
+impl<I: PointerItem> ItemEntry for I {


This blanket implementation would be confusing for the user. Think about it. Both PointerItem and ItemEntry traits provide from_raw and into_raw for pointer types like Arc<_> and Box<_>, which themselves already implements from from_raw and into_raw. So here is the million-dollar question: what implementation does the following into_raw use?

use core::sync::Arc; use crate::{ItemEntry, PointerItem}; let smart_ptr = Arc::new(123); let _ = Arc::into_raw(smart_ptr);

tatetian · 2024-01-02T05:13:57Z

src/entry.rs

+///     - sibling entry (用于multi-index entries), empty entry, retry entry (用于异常处理)
+/// - item entry 用户存储的item
+///     - pointer entry 后两位为00, (tagged pointer entry)
+///     - value entry 末位为1


We have decoded the type of user items as a type parameter of XArray. So I assume that we do not need to use runtime information to tell whether the user items are pointers or values.

tatetian · 2024-01-02T05:15:12Z

src/entry.rs

+///     - pointer entry 后两位为00, (tagged pointer entry)
+///     - value entry 末位为1
+///
+/// XEntry没有所有权，可以copy，


I can't see why XEntry does not has ownership. XNode contains XEntry. XNode must ensure its contained entries refer to valid internal nodes or user-provided values. And when an XNode is dropped, all its contained entries should have been dropped, too. This is by definition ownership.

As an evidence, XNode's drop method transforms each and every contained XEntry objects into OwnedEntry without any extra condition to drop the entries manually. From this implementation, it is clear that XEntry is treated as OwnedEntry.

Here is yet another evidence that suggests having both XEntry and OwnedEntry is a bad idea. OwnedEntry has a safe constructor method called from_raw, which takes an XEntry, but the latter implements Copy. This means multiple OwnedEntry pointing to a single user-provided item or XEntry node may be created without calling any unsafe functions. And OwnedEntry implements Drop. A receipt of double free.

With all the reasons above, I suggest removing OwnedEntry.

When references to XArray entries are demanded, use &XEntry instead. Similar to how Cursor and CursorMut have used &XArray.

Yesterday I tried to have XEntry take ownership and pass variables using &XEntry for specific operations. While sorting out lifetimes, I encountered an unavoidable issue, which is that since we need the clone functionality, all XNodes are managed by Locks, which leads to the lifetime of &XEntry being the same as that of LockGuard when reading XEntry from the node. As a result, holding &XEntry needs to maintain the LockGuard of the Node where XEntry resides. This requires delaying the release of locks for nodes along the path for some operations that go through multiple nodes from head to item, and also requires that the LockGuard for the node holding the item remains alive when the load() API returns &I. I feel that the implementation of this functionality would be quite ugly and could affect efficiency. Moreover, we can't refer to the implementation of BTreeMap in this aspect because not every node is managed by a lock. It might indeed be necessary to separate XEntry and something like OwnedEntry, and then let XNode and XArray hold the ownership of these entries.

I get that you encountered some technical difficulties using &XEntry. Let's say that we keep some sort of XEntry and OwnedEntry, but their design really need an overhaul.

They API are not sound.

OwnedEntry has a safe constructor method called from_raw, which takes an XEntry, but the latter implements Copy. This means multiple OwnedEntry pointing to a single user-provided item or XEntry node may be created without calling any unsafe functions. And OwnedEntry implements Drop. A receipt of double free.

And they are ugly.

As an evidence, XNode's drop method transforms each and every contained XEntry objects into OwnedEntry without any extra condition to drop the entries manually.

I can't hand over a good solution to you, but I will definitely recognize one when I see it. Their current form is not. I am confident that you got the wit and patience to figure out a good solution.

tatetian · 2024-01-02T05:19:19Z

src/xarray.rs

+///
+/// It can support storing a range of indices at once （Multi-index entries, TODO）.
+/// It also supports a marking function, allowing for the stored contents to be marked with the required labels (TODO).
+pub struct XArray<I: ItemEntry, M: XMark> {


Do not add trait bounds to type definition.

tatetian · 2024-01-02T05:25:47Z

src/xarray.rs

+        old_head
+    }
+
+    pub fn load<'a>(&'a self, index: u64) -> Option<I::Target<'a>> {


Simplify the design by returning &I. For example, if I is Arc<MemoryArea>, then the load method returns &Arc<MemoryArea>, instead of &MemoryArea.

Returning I instead of I::Target brings two benefits. First, it gives all the capabilities that I possesses. For example, the user can call clone on an object of &Arc<_>. The following code snippet gives an concrete example.

const MY_KEY: u64 = 123; let counter_table: Mutex<XArray<Arc<Mutex<u32>>>> = { let mut xarray = XArray::new()); xarray.store(MY_KEY, Arc::new(Mutex::new(0))); Mutex::new(xarray) }; let counter = counter_table.lock().load(MY_KEY).clone(); // After cloning, we can manipulate the item without locking the entire table *counter.lock() += 1;

Second, it removes the extra complexities due to the extra level of indirection via I::Target. For example, there won't be a load_item method in the ItemEntry interface. And without the I::Target, I don't why see the reason to keeping the PointerItem trait.

tatetian · 2024-01-02T05:26:11Z

src/xarray.rs

+        unsafe { Some(I::load_item(entry)) }
+    }
+
+    pub fn store(&mut self, index: u64, value: I) {


Should return Option<I>.

tatetian · 2024-01-02T05:26:42Z

src/state.rs

+
+use crate::*;
+
+pub enum CurrentState {


tatetian · 2024-01-02T05:28:45Z

src/xarray.rs

+
+pub struct Cursor<'a, I: ItemEntry, M: XMark> {
+    xa: &'a XArray<I, M>,
+    xas: State<I, M>,


I think State is a bad name. Not clear what this state is for. Obviously not the entire XArray. But it does not seem to be the state of the cursor. Otherwise, there should be a module called cursor which includes both Cursor and State.

tatetian · 2024-01-02T07:36:36Z

src/state.rs

+
+pub struct State<I: ItemEntry, M: XMark> {
+    pub index: u64,
+    pub shift: u8,


I think storing a height field would be much easier to understand than doing shift.

/// The node's height from the bottom of the tree. The height of a lead node, /// which stores the user-given items, is zero. height: u8,

I now understand what shift is, after watch a youtube video by the creator of XArray. He shows a figure depicting the shift values of nodes in a radix tree. After seeing the figure, the concept becomes obvious to me. While the concept of "shift" needs extra explanation, that of "depth" in a tree is understood by every undergraduate student.

Another reason why "shift" is harder to understand is that "shift" is about the low-level bit operations, while "depth" is about the high-level tree structure.

tatetian · 2024-01-02T07:42:09Z

src/state.rs

+    pub index: u64,
+    pub shift: u8,
+    pub sibs: u8,
+    pub offset: u8,


Naming offset_in_parent would make the meaning of this field self-explaining.

/// This node is its parent's `offset_in_parent`-th child. This field is meaningless if this node is the root /// (no parent). offset_in_parent: u8

tatetian · 2024-01-02T07:43:34Z

src/state.rs

+            index,
+            shift: 0,
+            sibs: 0,
+            offset: 0,


Remove the useless fields.

tatetian · 2024-01-02T07:48:29Z

src/xarray.rs

+use crate::*;
+pub const CHUNK_SHIFT: usize = 6;
+pub const CHUNK_SIZE: usize = 1 << CHUNK_SHIFT;
+pub const CHUNK_MASK: usize = CHUNK_SIZE - 1;


First of all, these are internal parameters that are useful to the users. So they should not be made public.

Second, the notion of chunk has never been explained or referenced. Make the notions, concepts and terms used consistent throughout the codebase. I think the notion of chunk can be replaced by nodes. Having NODE_SHIFT or NODE_MASK may not quite make sense, but NODE_SIZE does. (Although calling it NODE_SIZE is a bit ambiguous because it is not clear whether the size is in bytes or about the number of entries).

tatetian · 2024-01-02T08:01:07Z

src/state.rs

@@ -0,0 +1,241 @@
+use std::{
+    marker::PhantomData,
+    sync::{Arc, RwLock, Weak},


Do not use RwLock. Use Mutex instead. RwLock should only be used when you are sure that the workload is dominated by read-only operations. This is not true for XArray.

tatetian · 2024-01-02T08:10:25Z

src/state.rs

+            let old_head_entry = xa.set_head(node_entry).unwrap();
+            let new_head_entry = xa.head();
+
+            let mut head_write = old_head_entry.as_node().write().unwrap();


Please never ever name objects in verbs.

tatetian · 2024-01-02T08:23:50Z

src/xarray.rs

+/// ```
+///
+/// It can support storing a range of indices at once （Multi-index entries, TODO）.
+/// It also supports a marking function, allowing for the stored contents to be marked with the required labels (TODO).


I watched a talk given by the creator of XArray. He says XArray keeps the data structure of Linux's original radix tree, but fixes the interface. For example, it adopts the array metaphor, instead of tree. So I think we should reference the best article that describes Linux radix tree, which could give people the general idea behind our implementation.

A minimal runnable version of XArray.

fc0b17d

cchanging merged commit 6d3cd05 into asterinas:main Jan 1, 2024

tatetian reviewed Jan 2, 2024

View reviewed changes

Conversation

cchanging commented Jan 1, 2024

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cchanging Jan 3, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

cchanging Jan 3, 2024 •

edited

Loading