A minimal runnable version of XArray.#1
Conversation
| /// 用户需要确保该裸指针与原指针在内存布局上一致 | ||
| /// 也就是原指针大小跟裸指针大小一致,且指向的位置一致 | ||
| /// 同时需要确保原指针是aligned to 4的,因为XArray中的entry后两位要用于entry类型判断。 | ||
| unsafe fn into_raw(self) -> *const u8; |
There was a problem hiding this comment.
This method should be safe. For the same reason as Box::into_raw and Arc::into_raw being safe.
|
|
||
| /// 用来抽象用户将要存入XArray的类型,这些存入XArray的obj称之为item。 | ||
| /// 对于存入的item,要求其大小为4字节,可以是各种指针类型或者是usize、u64等整数类型。 | ||
| pub(crate) trait ItemEntry { |
There was a problem hiding this comment.
A user-facing trait should be public. Write integration or doc tests to ensure all user-facing traits or types are public.
| unsafe fn load_item<'a>(entry: &'a XEntry) -> Self::Target<'a>; | ||
| } | ||
|
|
||
| impl<I: PointerItem> ItemEntry for I { |
There was a problem hiding this comment.
This blanket implementation would be confusing for the user. Think about it. Both PointerItem and ItemEntry traits provide from_raw and into_raw for pointer types like Arc<_> and Box<_>, which themselves already implements from from_raw and into_raw. So here is the million-dollar question: what implementation does the following into_raw use?
use core::sync::Arc;
use crate::{ItemEntry, PointerItem};
let smart_ptr = Arc::new(123);
let _ = Arc::into_raw(smart_ptr);| /// - sibling entry (用于multi-index entries), empty entry, retry entry (用于异常处理) | ||
| /// - item entry 用户存储的item | ||
| /// - pointer entry 后两位为00, (tagged pointer entry) | ||
| /// - value entry 末位为1 |
There was a problem hiding this comment.
We have decoded the type of user items as a type parameter of XArray. So I assume that we do not need to use runtime information to tell whether the user items are pointers or values.
| /// - pointer entry 后两位为00, (tagged pointer entry) | ||
| /// - value entry 末位为1 | ||
| /// | ||
| /// XEntry没有所有权,可以copy, |
There was a problem hiding this comment.
I can't see why XEntry does not has ownership. XNode contains XEntry. XNode must ensure its contained entries refer to valid internal nodes or user-provided values. And when an XNode is dropped, all its contained entries should have been dropped, too. This is by definition ownership.
As an evidence, XNode's drop method transforms each and every contained XEntry objects into OwnedEntry without any extra condition to drop the entries manually. From this implementation, it is clear that XEntry is treated as OwnedEntry.
Here is yet another evidence that suggests having both XEntry and OwnedEntry is a bad idea. OwnedEntry has a safe constructor method called from_raw, which takes an XEntry, but the latter implements Copy. This means multiple OwnedEntry pointing to a single user-provided item or XEntry node may be created without calling any unsafe functions. And OwnedEntry implements Drop. A receipt of double free.
With all the reasons above, I suggest removing OwnedEntry.
When references to XArray entries are demanded, use &XEntry instead. Similar to how Cursor and CursorMut have used &XArray.
There was a problem hiding this comment.
Yesterday I tried to have XEntry take ownership and pass variables using &XEntry for specific operations. While sorting out lifetimes, I encountered an unavoidable issue, which is that since we need the clone functionality, all XNodes are managed by Locks, which leads to the lifetime of &XEntry being the same as that of LockGuard when reading XEntry from the node. As a result, holding &XEntry needs to maintain the LockGuard of the Node where XEntry resides. This requires delaying the release of locks for nodes along the path for some operations that go through multiple nodes from head to item, and also requires that the LockGuard for the node holding the item remains alive when the load() API returns &I. I feel that the implementation of this functionality would be quite ugly and could affect efficiency. Moreover, we can't refer to the implementation of BTreeMap in this aspect because not every node is managed by a lock. It might indeed be necessary to separate XEntry and something like OwnedEntry, and then let XNode and XArray hold the ownership of these entries.
There was a problem hiding this comment.
I get that you encountered some technical difficulties using &XEntry. Let's say that we keep some sort of XEntry and OwnedEntry, but their design really need an overhaul.
They API are not sound.
OwnedEntry has a safe constructor method called from_raw, which takes an XEntry, but the latter implements Copy. This means multiple OwnedEntry pointing to a single user-provided item or XEntry node may be created without calling any unsafe functions. And OwnedEntry implements Drop. A receipt of double free.
And they are ugly.
As an evidence, XNode's drop method transforms each and every contained XEntry objects into OwnedEntry without any extra condition to drop the entries manually.
I can't hand over a good solution to you, but I will definitely recognize one when I see it. Their current form is not. I am confident that you got the wit and patience to figure out a good solution.
| /// | ||
| /// It can support storing a range of indices at once (Multi-index entries, TODO). | ||
| /// It also supports a marking function, allowing for the stored contents to be marked with the required labels (TODO). | ||
| pub struct XArray<I: ItemEntry, M: XMark> { |
There was a problem hiding this comment.
Do not add trait bounds to type definition.
| old_head | ||
| } | ||
|
|
||
| pub fn load<'a>(&'a self, index: u64) -> Option<I::Target<'a>> { |
There was a problem hiding this comment.
Simplify the design by returning &I. For example, if I is Arc<MemoryArea>, then the load method returns &Arc<MemoryArea>, instead of &MemoryArea.
Returning I instead of I::Target brings two benefits. First, it gives all the capabilities that I possesses. For example, the user can call clone on an object of &Arc<_>. The following code snippet gives an concrete example.
const MY_KEY: u64 = 123;
let counter_table: Mutex<XArray<Arc<Mutex<u32>>>> = {
let mut xarray = XArray::new());
xarray.store(MY_KEY, Arc::new(Mutex::new(0)));
Mutex::new(xarray)
};
let counter = counter_table.lock().load(MY_KEY).clone();
// After cloning, we can manipulate the item without locking the entire table
*counter.lock() += 1;Second, it removes the extra complexities due to the extra level of indirection via I::Target. For example, there won't be a load_item method in the ItemEntry interface. And without the I::Target, I don't why see the reason to keeping the PointerItem trait.
| unsafe { Some(I::load_item(entry)) } | ||
| } | ||
|
|
||
| pub fn store(&mut self, index: u64, value: I) { |
|
|
||
| use crate::*; | ||
|
|
||
| pub enum CurrentState { |
|
|
||
| pub struct Cursor<'a, I: ItemEntry, M: XMark> { | ||
| xa: &'a XArray<I, M>, | ||
| xas: State<I, M>, |
There was a problem hiding this comment.
I think State is a bad name. Not clear what this state is for. Obviously not the entire XArray. But it does not seem to be the state of the cursor. Otherwise, there should be a module called cursor which includes both Cursor and State.
|
|
||
| pub struct State<I: ItemEntry, M: XMark> { | ||
| pub index: u64, | ||
| pub shift: u8, |
There was a problem hiding this comment.
I think storing a height field would be much easier to understand than doing shift.
/// The node's height from the bottom of the tree. The height of a lead node,
/// which stores the user-given items, is zero.
height: u8,I now understand what shift is, after watch a youtube video by the creator of XArray. He shows a figure depicting the shift values of nodes in a radix tree. After seeing the figure, the concept becomes obvious to me. While the concept of "shift" needs extra explanation, that of "depth" in a tree is understood by every undergraduate student.
Another reason why "shift" is harder to understand is that "shift" is about the low-level bit operations, while "depth" is about the high-level tree structure.
| pub index: u64, | ||
| pub shift: u8, | ||
| pub sibs: u8, | ||
| pub offset: u8, |
There was a problem hiding this comment.
Naming offset_in_parent would make the meaning of this field self-explaining.
/// This node is its parent's `offset_in_parent`-th child. This field is meaningless if this node is the root
/// (no parent).
offset_in_parent: u8| index, | ||
| shift: 0, | ||
| sibs: 0, | ||
| offset: 0, |
| use crate::*; | ||
| pub const CHUNK_SHIFT: usize = 6; | ||
| pub const CHUNK_SIZE: usize = 1 << CHUNK_SHIFT; | ||
| pub const CHUNK_MASK: usize = CHUNK_SIZE - 1; |
There was a problem hiding this comment.
First of all, these are internal parameters that are useful to the users. So they should not be made public.
Second, the notion of chunk has never been explained or referenced. Make the notions, concepts and terms used consistent throughout the codebase. I think the notion of chunk can be replaced by nodes. Having NODE_SHIFT or NODE_MASK may not quite make sense, but NODE_SIZE does. (Although calling it NODE_SIZE is a bit ambiguous because it is not clear whether the size is in bytes or about the number of entries).
| @@ -0,0 +1,241 @@ | |||
| use std::{ | |||
| marker::PhantomData, | |||
| sync::{Arc, RwLock, Weak}, | |||
There was a problem hiding this comment.
Do not use RwLock. Use Mutex instead. RwLock should only be used when you are sure that the workload is dominated by read-only operations. This is not true for XArray.
| let old_head_entry = xa.set_head(node_entry).unwrap(); | ||
| let new_head_entry = xa.head(); | ||
|
|
||
| let mut head_write = old_head_entry.as_node().write().unwrap(); |
There was a problem hiding this comment.
Please never ever name objects in verbs.
| /// ``` | ||
| /// | ||
| /// It can support storing a range of indices at once (Multi-index entries, TODO). | ||
| /// It also supports a marking function, allowing for the stored contents to be marked with the required labels (TODO). |
There was a problem hiding this comment.
I watched a talk given by the creator of XArray. He says XArray keeps the data structure of Linux's original radix tree, but fixes the interface. For example, it adopts the array metaphor, instead of tree. So I think we should reference the best article that describes Linux radix tree, which could give people the general idea behind our implementation.
No description provided.