The datasets in our experiments are derived from two sources:
- the raw metadata (e.g., titles, reviews) downloaded from the Amazon review dataset.
- the preprocessed interactions (i.e., item sequences) obtained from UniSRec.
Please preprocess the reviews and interaction records with the provided scripts. Taking the Office dataset as an example, the preprocessed dataset should have the following layout:
Office
├─title_review_summary_descroption
├──test.pkl
├──train.pkl
├──val.pkl
├─negative_title
├──user_item_negitem_nge_title_seq_test.pkl
├──user_item_negitem_nge_title_seq_train.pkl
├──user_item_negitem_nge_title_seq_val.pkl
├─Office_products_5.json
├─meta_Office_Products.json
└─stmap.pkl
Or you can download the processed datasets from here.
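
Whether you preprocess the data yourself or download it, a quick sanity check of the folder can save a failed run later. The sketch below only tests that the files named in the listing above exist. The names `EXPECTED_FILES` and `missing_files` and the `Office` path are hypothetical and not part of this repository, and the pickle files are not opened because their contents are not described here.

```python
from pathlib import Path

# Relative paths copied from the directory listing above.
EXPECTED_FILES = [
    "title_review_summary_descroption/test.pkl",
    "title_review_summary_descroption/train.pkl",
    "title_review_summary_descroption/val.pkl",
    "negative_title/user_item_negitem_nge_title_seq_test.pkl",
    "negative_title/user_item_negitem_nge_title_seq_train.pkl",
    "negative_title/user_item_negitem_nge_title_seq_val.pkl",
    "Office_products_5.json",
    "meta_Office_Products.json",
    "stmap.pkl",
]


def missing_files(root):
    """Return the expected relative paths that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    # "Office" is an assumed location; point this at your own dataset folder.
    missing = missing_files("Office")
    if missing:
        print("Missing files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All expected files are present.")
```

An empty result from `missing_files` means the folder matches the listing; it does not validate what is stored inside each file.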